LLM Multimodal • audiocraft • av (interfacing FFmpeg API) • Audio/Video/Bitstream

Published: 2023-07-10 21:46:29  Author: abaelhe

Whether it is ChatGPT, large language models (LLMs), or Meta's AI music generation, all of them need to process Audio, Video, and Bitstream data.

 

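Take audiocraft, open-sourced by Meta (the new name of Facebook), as an example of the tasks involved:

  1. ASR (speech to text: human-machine voice interaction and recognition),
  2. TTS (text-to-speech synthesis),
  3. NLP (natural language processing),
  4. NLG (natural language generation),
  5. Content Generation (intelligent generation of Text/Image/Audio/Video/…)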

 

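In audiocraft:

  • The NLP part uses the Python library spaCy;
  • The audio/video part uses the Python library av (a Cython wrapper around the FFmpeg C API), which makes it much easier for higher-level applications such as AI/machine learning to work with Audio/Video/Bitstream data.
  • Of course, one can also follow the example of Python's OpenCV / av libraries to wrap other multimodal content interfaces, achieving full media coverage (Article/Text/Image/Audio/Video/…).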

 

The av library (PyAV, https://pypi.org/project/av/#description) describes itself as follows:

 

PyAV is a Pythonic binding for the FFmpeg libraries. We aim to provide all of the power and control of the underlying library, but manage the gritty details as much as possible.

PyAV is for direct and precise access to your media via containers, streams, packets, codecs, and frames. It exposes a few transformations of that data, and helps you get your data to/from other packages (e.g. Numpy and Pillow). This power does come with some responsibility as working with media is horrendously complicated and PyAV can't abstract it away or make all the best decisions for you. If the ffmpeg command does the job without you bending over backwards, PyAV is likely going to be more of a hindrance than a help. But where you can't work without it, PyAV is a critical tool.

Installation

Due to the complexity of the dependencies, PyAV is not always the easiest Python package to install from source. Since release 8.0.0 binary wheels are provided on PyPI for Linux, Mac and Windows linked against a modern FFmpeg. You can install these wheels by running:

```bash
pip install av
```

If you want to use your existing FFmpeg, the source version of PyAV is on PyPI too:

```bash
pip install av --no-binary av
```
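
After installing, one way to confirm that the package is visible to the current interpreter is a small check using only the standard library. This is a sketch, not part of PyAV itself: the helper name `pyav_version` is hypothetical, and it assumes Python 3.8 or newer (for `importlib.metadata`) and that the distribution is registered under the name `av`, as on PyPI.

```python
import importlib.util
from importlib import metadata


def pyav_version():
    """Return the installed PyAV version string, or None if it is absent.

    Hypothetical helper: looks for an importable `av` module, then reads
    the version recorded in the installed distribution's metadata.
    """
    if importlib.util.find_spec("av") is None:
        return None
    try:
        return metadata.version("av")
    except metadata.PackageNotFoundError:
        # The module is importable but has no distribution metadata.
        return None


if __name__ == "__main__":
    version = pyav_version()
    print("PyAV not found" if version is None else f"PyAV {version}")
```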

Alternative installation methods

Another way of installing PyAV is via conda-forge:

```bash
conda install av -c conda-forge
```