Pythonic access to audio files: python-wavefile

Posted 2014-06-30

Last week, python-wavefile received a pull request from the PyDAW project to make it compatible with Python 3. So I woke the project up to merge the contributions and to address some of the old pending tasks.

I had not realized that python-wavefile had become more relevant than most of my GitHub projects: other people, not just me, are actually using it, and that’s cool. So I think I owe the project a blog entry… and maybe a logo.

python-wavefile is a Python module to read and write audio files in a pythonic way.

Instead of just exposing the C API of Erik de Castro Lopo’s powerful libsndfile, it enables common Python idioms and numpy bridging for signal processing. There are many Python modules around for audio files, most of them wrapping libsndfile, and there is even a standard one. At the end of the article I do a quick review of them and justify why I wrote yet another libsndfile Python wrapper.

History

This module was born to cover the needs I had while doing research for my PhD thesis on 3D audio. I needed floating point samples and multi-channel formats for Higher Order Ambisonics and multi-speaker mix-down. I also needed efficient block processing, as well as the inefficient, but sometimes convenient, Matlab-like load-it-all functionality.

This is why I proposed a warm-up exercise to Xavi Serra when he was starting his Master's thesis: mocking up Python bindings for the libsndfile library using different methods: Cython, CPython module, Boost, CTypes, SIP… That exercise resulted in several mock-ups for each binding method, and an almost full implementation using CPython. But the CPython code became too complex, which led to the double layer strategy Xavi finally used for iPyCLAM: a narrow lower layer making the C API available to Python as is, and a user layer adding all the Python sugar.

So I created python-wavefile by reimplementing the user API we defined with Xavier Serra but relying on the C-API wrapping defined in libsndfile-ctypes.

Python-wave, the official API and the root of all evil

Why did we do that? The root of all evil is Python's official module to deal with wave files. Unlike the wrappers reviewed below, it is a pure Python implementation that does not rely on libsndfile, and its API is crap, real crap:

:-) As standard lib it is available on every Python install, but…
:-( It has nasty accessors like getcomptype, getsampwidth…
- Names built from hard to read and hard to remember abbreviations
- Using getters instead of properties
:-( It just opens WAV files, and none of the many formats libsndfile supports
:-( It just opens Mono and Stereo audio.
:-( It just opens some limited encodings.
- A patch to implement floating point samples was rejected because… well, who knows
:-( Data is passed as coded byte strings.
- On writing, users are responsible for encoding samples, which is a low level and error prone task.
- Even worse, on reading, users have to implement decoding for every kind of encoding available.
- Libsndfile actually does all this stuff for you, so why the hell use the raw interface?
:-( It ignores Python constructs and idioms:
- Generators to access files progressively in iterations
- Context managers to deal safely with file resources
- Properties instead of getters and setters
:-( It allocates a new data block for each block you read, which is a garbage collector nightmare.
:-( It has no support for numpy
- A core lib cannot depend on numpy, but it is quite a convenient feature to have for signal processing
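
To make the complaint about coded byte strings concrete, here is a minimal sketch of a round trip through the standard wave module, using only the standard library. The helper names write_stereo_16bit and read_stereo_16bit are made up for this illustration, and the code only covers 16 bit stereo PCM; every other encoding would need its own packing and unpacking code.

```python
import io
import struct
import wave

def write_stereo_16bit(fileobj, frames, samplerate=44100):
    """Write (left, right) float pairs in [-1, 1] as 16 bit PCM."""
    with wave.open(fileobj, 'wb') as w:
        w.setnchannels(2)
        w.setsampwidth(2)           # bytes per sample
        w.setframerate(samplerate)
        # Encoding is up to the user: scale, clip and interleave by hand
        ints = [
            max(-32768, min(32767, int(round(sample * 32767))))
            for frame in frames
            for sample in frame
        ]
        w.writeframes(struct.pack('%dh' % len(ints), *ints))

def read_stereo_16bit(fileobj):
    """Read a 16 bit stereo WAV back as a list of (left, right) floats."""
    with wave.open(fileobj, 'rb') as r:
        assert r.getnchannels() == 2 and r.getsampwidth() == 2
        raw = r.readframes(r.getnframes())   # a plain byte string
    ints = struct.unpack('%dh' % (len(raw) // 2), raw)
    # Decoding is up to the user as well: de-interleave and rescale
    return [
        (ints[i] / 32767.0, ints[i + 1] / 32767.0)
        for i in range(0, len(ints), 2)
    ]

buf = io.BytesIO()
write_stereo_16bit(buf, [(0.0, 1.0), (0.5, -0.5), (-1.0, 0.25)])
buf.seek(0)
decoded = read_stereo_16bit(buf)
```

Three frames go in and three slightly quantized frames come out, at the cost of writing the codec yourself.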

Because of this, many programmers built their own libsndfile wrapper, but most of them fail, for one reason or another, to fulfill the interface I wanted. Instead of reinventing the wheel, I reused design and even code from others. At the end of the article I place an extensive list of such alternatives with their strong and weak points.

The API by example

Let’s introduce the API with some examples.

To try the examples you can install the module from the PyPI repository using the pip command.

$ pip install wavefile

Notes for Debian/Ubuntu users:

Use sudo or su to get administrative rights
If you want to install it for Python3 use pip3 instead

Writing example

Let’s create a stereo OGG file with some metadata and a synthesized sound inside:

from wavefile import WaveWriter, Format
import numpy as np

with WaveWriter('synth.ogg', channels=2, format=Format.OGG|Format.VORBIS) as w :
    w.metadata.title = "Some Noise"
    w.metadata.artist = "The Artists"
    data = np.zeros((2,512), np.float32)
    for x in range(100) :
        # Synthesize a kind of triangular sweep in one channel
        data[0,:] = (x*np.arange(512, dtype=np.float32)%512/512)
        # And a squared wave on the other
        data[1,512-x:] =  1
        data[1,:512-x] = -1

        w.write(data)

Playback example (using pyaudio)

Let’s play back an audio file given on the command line and show its metadata and format.

from __future__ import print_function  # same code for Python 2 and 3
import pyaudio, sys
from wavefile import WaveReader

p = pyaudio.PyAudio()
with WaveReader(sys.argv[1]) as r :

    # Print info
    print("Title:", r.metadata.title)
    print("Artist:", r.metadata.artist)
    print("Channels:", r.channels)
    print("Format: 0x%x"%r.format)
    print("Sample Rate:", r.samplerate)

    # open pyaudio stream
    stream = p.open(
            format = pyaudio.paFloat32,
            channels = r.channels,
            rate = r.samplerate,
            frames_per_buffer = 512,
            output = True)

    # iterator interface (reuses one array)
    # beware of the block size: 512 frames at most, the last block may be shorter
    for frame in r.read_iter(size=512) :
        stream.write(frame, frame.shape[1])
        sys.stdout.write("."); sys.stdout.flush()

    stream.close()

Processing example

Let’s process some file by lowering the volume and changing the title.

import sys
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r :
    with WaveWriter(
            'output.wav',
            channels=r.channels,
            samplerate=r.samplerate,
            ) as w :
        w.metadata.title = r.metadata.title + " (dull version)"
        w.metadata.artist = r.metadata.artist

        for data in r.read_iter(size=512) :
            sys.stdout.write("."); sys.stdout.flush()
            w.write(.8*data)

read_iter simplifies the code by transparently:

allocating the data block for you,
reusing such block for each read and thus reducing the memory overhead, and
returning a slice of it when the last incomplete block arrives.
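
The three points above can be illustrated outside the library with a few lines of standard Python. This is only a sketch of the idiom, not python-wavefile's code: iter_blocks is a hypothetical name, and it works on raw bytes from any object with a readinto method instead of numpy arrays of audio frames.

```python
import io

def iter_blocks(stream, size=512):
    """Yield successive blocks of `stream`, reusing a single buffer."""
    block = bytearray(size)     # allocated once, before the loop
    view = memoryview(block)    # slicing a memoryview does not copy
    while True:
        nread = stream.readinto(block)
        if not nread:
            return              # end of the stream
        # Full blocks are yielded as is, the last incomplete one as a slice
        yield view if nread == size else view[:nread]

stream = io.BytesIO(bytes(bytearray(range(10))))
sizes = [len(block) for block in iter_blocks(stream, size=4)]
```

The price of reusing the buffer is that each yielded block is overwritten by the next iteration, so a caller who needs to keep a block has to copy it.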

Masochist example

If you like you can still do things by hand using a more C-ish API:

import sys, numpy as np
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r :
    with WaveWriter(
            'output.wav',
            channels=r.channels,
            samplerate=r.samplerate,
            ) as w :
        w.metadata.title = r.metadata.title + " (masochist)"
        w.metadata.artist = r.metadata.artist

        data = r.buffer(512)   # equivalent to: np.empty((r.channels,512), np.float32, order='F')
        nframes = r.read(data)
        while nframes :
            sys.stdout.write("."); sys.stdout.flush()
            w.write(.8*data[:,:nframes])
            nframes = r.read(data)

Notice that with read you have to allocate the data block yourself, and the loop structure is somewhat more complex, with a duplicated read inside and outside the loop. You also have to slice to the number of frames actually read, since the last block usually does not have the size you asked for.

The API uses the channel as the first index for buffers. This is convenient because processing usually splits channels first. But audio files (WAV) interleave the samples of the different channels in the same frame:

f1ch1 f1ch2 f2ch1 f2ch2 f3ch1 f3ch2 ...

Reads are optimized by using a read buffer in Fortran order (F). Numpy handles the indexing transparently, but for the read buffer, and just for the read buffer, we recommend using the buffer() method. That is not needed for other buffers, for example the ones you write, and you don’t have to worry at all if you are using the read_iter API.
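
Why Fortran order helps can be checked with numpy alone. The following sketch uses made-up sample values and does not touch the library: it shows that a channel-first array in Fortran order has exactly the interleaved memory layout of the file, so the samples can land in the buffer without any rearrangement.

```python
import numpy as np

channels, nframes = 2, 3

# Interleaved samples as they sit in the file: f1ch1 f1ch2 f2ch1 f2ch2 ...
interleaved = np.array([11, 12, 21, 22, 31, 32], dtype=np.float32)

# Viewing the very same memory as (channels, frames) in Fortran order:
# the first index (channel) is the one that varies fastest in memory.
data = interleaved.reshape((channels, nframes), order='F')

first_channel = data[0]    # samples of channel 1 for every frame
second_channel = data[1]   # samples of channel 2 for every frame

# No copy was made: data is just another way of indexing the same bytes
same_memory = np.shares_memory(data, interleaved)
```

A C-ordered array of the same shape would instead keep each channel contiguous, which would force a transposing copy on every read.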

Load and save it all interface

This interface is not recommended for efficient processing, because it loads all the audio data in memory at once, but is sometimes convenient in order to have some code quickly working.

import wavefile

samplerate, data = wavefile.load("synth.ogg")

data = data[::-1,:] # invert channels

wavefile.save("output.flac", data, samplerate)

Newly introduced features

Python 3 support

That was the pull request from Jeff Hugges of the PyDAW project. Thanks a lot for the patches!

We managed to keep the Python 3 code compatible with Python 2 as well. So now the same code base works on both versions and passes the same tests.

Unicode in paths and tags

Besides Python3 compatibility, now the API deals transparently with Unicode strings both for file names and text tags such as title, artist…

If you encode the string before passing it to the API and pass it as a byte string, the API will take that encoding without question and use it. It is safer to just pass the Unicode string (unicode in Py2 and str in Py3). In that case the API encodes or decodes the string transparently. For file names, it uses the file system default encoding, available to Python as sys.getfilesystemencoding(). For text tags, it uses UTF-8, which is the standard for Vorbis based files (ogg, flac…).
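
The rules in the previous paragraph fit in a few lines. This is a sketch of the described behaviour, not the library's actual code, and encode_filename, encode_tag and decode_tag are hypothetical helper names chosen for the illustration.

```python
import sys

def encode_filename(name):
    """Byte strings pass through untouched; text uses the file system encoding."""
    if isinstance(name, bytes):
        return name
    return name.encode(sys.getfilesystemencoding())

def encode_tag(text):
    """Byte strings pass through untouched; text is encoded as UTF-8."""
    if isinstance(text, bytes):
        return text
    return text.encode('utf-8')

def decode_tag(raw):
    """Tags read from the file are assumed to be UTF-8."""
    return raw.decode('utf-8')

title = u'Can\u00e7\u00f3'           # a title with non-ASCII characters
stored = encode_tag(title)           # what would end up in the file
restored = decode_tag(stored)        # what a reader would get back
```

A caller who passes an already encoded byte string takes responsibility for the encoding, which is why passing text is the safer option.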

The WAV and AIFF standards only specify ASCII strings, and I had my concerns about using UTF-8 there. After a discussion with Erik de Castro Lopo, we settled that UTF-8 is a safe option for reading and a nice one to push as a de facto standard, but I am still not confident about the latter. The alternative would have been to raise a text encoding exception whenever a non-ASCII character is written to a WAV or AIFF file. I am still open to further arguments.

Seek, seek, seek

I also added an API to seek within the file. This enables features users asked for, like resetting the reading position in order to loop. I was uncertain about libsndfile's behaviour on seek. Now that behaviour is pinned down by the API unit tests:

Seeks can be a positive or negative number of frames from a reference frame
A frame holds one sample per channel, a sample being a digitally encoded audio level
The reference point for the seeking can be the beginning (SET), the end (END) or the current next sample to be read (CUR)
- That is, if your last read was a 10 frame block starting at 40, your current seek reference is 50
Seek returns the new frame position to be read if the jump is successful or -1 if not.
Jumps to the first frame after the last frame do not fail, even though that frame does not exist.
EOF status resets whenever you successfully seek
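
The rules above can be summarized as a toy model. The FrameCursor class below is hypothetical: it only tracks a position, reads no audio, and borrows the io.SEEK_SET, io.SEEK_CUR and io.SEEK_END constants as stand-ins for SET, CUR and END. The way it raises the EOF flag (on a short read) is an assumption of the sketch too.

```python
import io

class FrameCursor(object):
    """Toy model of the read position in a file holding `nframes` frames."""

    def __init__(self, nframes):
        self.nframes = nframes
        self.position = 0
        self.eof = False

    def read(self, count):
        """Advance by up to `count` frames and return how many were read."""
        nread = min(count, self.nframes - self.position)
        self.position += nread
        if nread < count:
            self.eof = True     # assumption: a short read flags EOF
        return nread

    def seek(self, offset, whence=io.SEEK_SET):
        """Return the new position, or -1 if the jump is not possible."""
        reference = {
            io.SEEK_SET: 0,
            io.SEEK_CUR: self.position,
            io.SEEK_END: self.nframes,
        }[whence]
        target = reference + offset
        # The first frame after the last one is still a valid target
        if not 0 <= target <= self.nframes:
            return -1
        self.position = target
        self.eof = False        # a successful seek resets EOF
        return target

cursor = FrameCursor(nframes=100)
cursor.seek(40)
cursor.read(10)                           # a 10 frame block starting at 40
current = cursor.seek(0, io.SEEK_CUR)     # the reference is now frame 50
```

Looping a file then amounts to calling seek(0) whenever a read comes back short.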

Why yet another…

A list of alternative implementations follows.

Official python-wave

Nothing to see. It is crap.

scikits.audiolab

Author: David Cournapeau
Web: http://cournape.github.io/audiolab/
PyPi: https://pypi.python.org/pypi/scikits.audiolab/
Source: git clone https://github.com/cournape/audiolab
Wrap: Cython
:-) Property accessors to format metadata and strings
:-) Matlab like functions
:-) Block processing
:-) Numpy integration
:-) Enumerable formats
:-( Not in-place read (generates a numpy array for each block)
:-( No context managers
:-| Part of a huge library (no dependencies, though)

ewave

Author: C Daniel Meliza
Web: https://github.com/melizalab/py-ewave
PyPi: https://pypi.python.org/pypi/ewave
Source: git clone git@github.com:melizalab/py-ewave.git
Wrap: Pure Python (not based on libsndfile)
:-( Just WAV’s and limited encodings (no 24bits)
:-) Support for floating point encodings, multichannel
:-) Memory mapping for long files
:-) Numpy support
:-) Context managers

pysndfile (savannah)

Author: ???
Web: http://savannah.nongnu.org/projects/pysndfile/
Wrap: Swig
:-( Ugly: Uses a metadata API similar to python-wave's
:-( Unusable: unfinished implementation, empty read/write methods in wrapper!
:-( Unmaintained since 2006

libsndfile-python

Author: Hedi Soula (current maintainer) / Rob Melby (original)
Web: http://code.google.com/p/libsndfile-python/
Source: svn checkout http://libsndfile-python.googlecode.com/svn/trunk/ libsndfile-python
Wrap: CPython
:-) NumPy
:-( Not in-place read (generates a numpy array for each block)
:-( Some edges are not that pythonic
:-) Implements ‘command’ sndfile interface

libsndfile-ctypes

http://code.google.com/p/pyzic/wiki/LibSndFilectypes
Author: Timothe Faudot
Source: svn checkout http://pyzic.googlecode.com/svn/trunk/libsndfile-ctypes
Wrap: CTypes
:-) no CPython module compilation required
:-) NumPy
:-) Context managers!
:-) Property accessors for format metadata
:-( Not in-place read (creates an array every block read)
:-( No property accessors for strings
:-( No generator idiom
:-( Windows only setup
:-( Text tags not as properties
:-( Long access to constants (scoping + prefixing)
:-( Single object mixing read and write API’s

python-wavefile

That’s the one. I used the implementation layer from libsndfile-ctypes. I really liked the idea of having a direct C mapping without having to compile a CPython module, and how nicely CTypes handled the numpy arrays. Then, on top of that implementation layer, I added a user level API implementing a pythonic interface, including the idioms supported by other wrappers and some new ones.

https://github.com/vokimon/python-wavefile
Author: David Garcia Garzon (with code from all the above)
Source: git clone git@github.com:vokimon/python-wavefile.git
PyPi: wavefile
Wrap: CTypes
:-) Property accessors to format metadata and strings
:-) Dual interface: matlab like and OO block processing
:-) No CPython module compilation required
:-) NumPy
:-) Context managers!
:-) Pythonic block iteration
:-) Reuses data blocks avoiding garbage collector nightmares
:-) Matlab load-all interface
:-) Unicode integration
:-) Works in Windows, Linux and Mac
:-) Python 2 and Python 3 support
:-( Command API not implemented
:-( No simultaneous Read/Write mode
:-( No seek on writing
:-( No format enumeration (yet!)
:-( Does not accept single dimensional arrays (nuisance)

Other wrappers I found afterwards and I didn’t check

Yet to be reviewed:

pysndfile (IRCAM)
- Author: Axel Roebel (IRCAM)
- Web: http://forge.ircam.fr/p/pysndfile/
- PyPi: https://pypi.python.org/pypi/pysndfile
- Source: No git/svn, Pyx file missing in the tarball
- Wrap: Cython
- :-) Command API implemented
- :-) Format enumeration
- :-) Interface to disable clipping
- :-) NumPy
- :-( Method read returns a new buffer
sndfileio
- Author: Eduardo Moguillansky
- Web: https://github.com/gesellkammer/sndfileio
- PyPi: https://pypi.python.org/pypi/sndfileio/
- Wrap: Relies on scikits.audiolab but, when missing, implements in pure Python simple WAV and AIFF formats
- :-) Iterator idiom
- :-) Plugin abstraction (could eventually allow MP3 using other backends than libsndfile)
- :-( Creates buffer per read (just like audiolab does)
PySoundFile
- https://github.com/bastibe/PySoundFile (First commit Aug 2013)
- Wrap: CFFI
- :-) Numpy
libsndfile-python
- Web: http://arcsin.org/softwares/libsndfile-python.html