Master Thesis: Visual Prototyping of Audio Applications

The prototyping architecture


This chapter describes the details of the prototyping architecture. First, an overview describes the relations and functionalities of the different architectural elements. Then, the chapter explains how each architectural element addresses the issues that must be solved to fulfill the required functionality.

Requirements

The family of applications the architecture is able to visually build comprises real-time audio processing applications as defined in sec:RealTimeAudioApplications. These include application archetypes such as real-time software synthesizers, real-time music analyzers (figure fig:TonalAnalysis), and audio effects and plugins (figure fig:SMSTransposition).

fig:TonalAnalysis
An example of an audio analysis application: tonal analysis with chord extraction. This application can be prototyped in CLAM in a matter of minutes; it analyzes incoming audio and extracts and represents its chords and tonal components.
fig:SMSTransposition
An example of a rapid-prototyped audio effect application: pitch transposition. Note that, apart from representing different signal components, this application has three sliders that control the process by interacting directly with the underlying processing engine.

The only limitation imposed on the target applications is that their logic must be restricted to starting and stopping the processing algorithm, configuring it, connecting it to the system streams (audio from devices, audio servers, plugin hosts, MIDI, files, OSC...), visualizing the inner data, and controlling some algorithm parameters while running. Note that these limitations are closely related to the explicit life-cycle of a 4MPS Processing object outlined in section~lifecycle.

Given those limitations, the defined architecture does not claim to visually build every kind of audio application. For example, audio authoring tools, which have a more complex application logic, are out of scope, although the architecture would still help to build important parts of such applications.

Besides that, the architecture provides the following features:

Main architecture

fig:Architecture
Visual prototyping architecture. The CLAM components that enable the user to visually build applications.

The proposed architecture (figure fig:Architecture) has three main components: an audio processing visual builder, a user interface visual builder, and a run-time engine that binds the definitions produced by the other two.

The key element is the run-time engine. It dynamically builds the definitions coming from both visual tools, relates them, and manages the application logic. We implemented this architecture using existing tools: CLAM NetworkEditor as the audio processing visual builder, and Trolltech's Qt Designer as the user interface definition tool. Qt Designer and CLAM NetworkEditor provide similar capabilities in their respective domains, user interface and audio processing, which are later exploited by the run-time engine.

Visual builders

Qt Designer can be used to define user interfaces by combining several widgets. The set of widgets is not closed; developers may define new ones that can be added to the visual tool as plugins. Figure fig:QtDesigner shows a Qt Designer session designing the interface for an audio application, using widgets related to CLAM data objects which CLAM provides as a Qt widgets plugin. Note that other CLAM data related widgets are available on the left panel list; for example, widgets to view spectral peaks, tonal descriptors, or spectra.

fig:QtDesigner
Qt Designer tool editing the interface of an audio application.

Interface definitions are stored as XML files with the ``.ui'' extension. Ui files can be rendered as source code or directly loaded by the application at run-time. Applications may also discover the structure of a run-time instantiated user interface by using introspection capabilities.
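As an illustration, the following minimal sketch shows how such a ``.ui'' file can be loaded and introspected at run-time using Qt's QtUiTools module; the file name is hypothetical.

#include <QtGui/QApplication>
#include <QtUiTools/QUiLoader>
#include <QtCore/QFile>
#include <QtCore/QtDebug>

int main(int argc, char * argv[])
{
    QApplication app(argc, argv);

    // Load the interface definition produced by Qt Designer
    QFile file("prototype.ui"); // hypothetical file name
    file.open(QFile::ReadOnly);
    QUiLoader loader;
    QWidget * window = loader.load(&file);
    file.close();

    // Introspect the run-time instantiated widget tree
    foreach (QWidget * child, window->findChildren<QWidget*>())
        qDebug() << child->metaObject()->className() << child->objectName();

    window->show();
    return app.exec();
}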

fig:NetworkEditor
NetworkEditor, the visual builder of the CLAM framework. It can be used not only as an interactive multimedia data-flow application, but also to build networks that can later run inside stand-alone applications or be embedded in other applications and plugins.

Analogously, the CLAM Network Editor (figure fig:NetworkEditor) allows the user to visually combine several processing modules into a processing network definition. The set of processing modules in the CLAM framework is also extensible with plugin libraries. Processing network definitions can be stored as XML files that applications can load later at run-time. Finally, the CLAM framework also provides introspection, so a loader application may discover the structure of a run-time loaded network.
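A minimal sketch of the processing side follows; it assumes CLAM's XmlStorage and Network interfaces, so the exact names and signatures should be taken as assumptions rather than a definitive reference.

#include <CLAM/Network.hxx>
#include <CLAM/XMLStorage.hxx>
#include <iostream>

int main()
{
    CLAM::Network network;
    // Restore the network definition saved by the NetworkEditor
    // (the file name is illustrative)
    CLAM::XmlStorage::Restore(network, "effect.clamnetwork");

    // Introspection: enumerate processings the loader knew nothing about
    CLAM::Network::ProcessingsMap::const_iterator it;
    for (it = network.BeginProcessings(); it != network.EndProcessings(); it++)
        std::cout << it->first << std::endl; // the processing instance name
    return 0;
}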

Run-time engine

If only a data-flow visual tool and a visual interface designer were provided, some programming would still be required to glue everything together and launch the application. The purpose of the run-time engine, called Prototyper in our implementation, is to provide this glue automatically. Next, we enumerate the problems that the run-time engine faces and how it solves them.

Dynamic building

Both component structures, the audio processing network and the user interface, have to be built up dynamically at run-time from an XML definition. The complexity to be addressed is how to perform such a task when the elements of the structure are not known beforehand because they are defined by add-on plugins.

Both the CLAM and Qt frameworks provide object factories that can build objects given a type identifier. Because we want interface and processing components to be extensible, factories must be able to incorporate new objects defined by plugin libraries. To enable the creation of a certain type of object, the class provider must register a creator on the factory at plugin initialization time.
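The following is a minimal sketch of such an extensible factory; the names are illustrative and do not reproduce CLAM's or Qt's actual classes.

#include <map>
#include <string>
#include <stdexcept>

class Processing { public: virtual ~Processing() {} };

// Creators are registered under a string type identifier and
// looked up later, when the structure is built from the XML
class ProcessingFactory
{
public:
    typedef Processing * (*Creator)();

    static ProcessingFactory & Instance()
    {
        static ProcessingFactory factory; // created on first use
        return factory;
    }
    void RegisterCreator(const std::string & type, Creator creator)
    {
        _creators[type] = creator;
    }
    Processing * Create(const std::string & type) const
    {
        std::map<std::string, Creator>::const_iterator it = _creators.find(type);
        if (it == _creators.end())
            throw std::runtime_error("Unknown processing type: " + type);
        return it->second(); // call the registered creator
    }
private:
    std::map<std::string, Creator> _creators;
};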

In order to assemble the components into a structure, both frameworks provide means for reflection, so the builder can discover the properties and structure of unknown objects. For instance, in the case of processing elements, the builder can browse the ports, the controls, and the configuration parameters using a generic interface, and it can determine the type compatibility of a given pair of ports or controls.

Relating processing and user interface

The run-time engine must relate components of both structures. For example, the spectrum view in the Transposition application (second panel in figure fig:SMSTransposition) needs to periodically access the spectrum data flowing through a given port of the processing network. The run-time engine first has to identify which components are to be connected, then decide whether the connection is feasible (for example, spectrum data cannot be viewed by a spectral peaks view), and finally perform the connection, all without the run-time engine knowing anything about spectra or spectral peaks.

The proposed architecture uses properties such as the component name to relate components on each side. The components are then located by using the introspection capabilities of each side's framework.

Once located, the run-time engine must ensure that the components are compatible and connect them. The run-time engine is not aware of the types of data that connected objects will handle; we deal with that by applying the {\sf Typed Connections} design pattern mentioned in section patterns. In a nutshell, this design pattern allows a type-dependent connection construct to be established between two components without the connector knowing the types, while still being type-safe. This is done by dynamically checking the handled type at connection time; once the type is checked, both sides are connected using statically type-checked mechanisms which communicate optimally at run-time.
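A condensed sketch of the idea follows, with illustrative class names: connections are requested through untyped base classes, the token type is compared dynamically once, and the link itself is established between the statically typed subclasses.

#include <typeinfo>
#include <stdexcept>

class InPortBase
{
public:
    virtual ~InPortBase() {}
    virtual const std::type_info & TokenType() const = 0;
};

template <typename Token>
class InPort : public InPortBase
{
public:
    const std::type_info & TokenType() const { return typeid(Token); }
    void Receive(const Token & token) { /* statically typed delivery */ }
};

class OutPortBase
{
public:
    virtual ~OutPortBase() {}
    virtual const std::type_info & TokenType() const = 0;
    void ConnectTo(InPortBase & in) // the connector needs no type knowledge
    {
        if (TokenType() != in.TokenType()) // single dynamic check
            throw std::runtime_error("Incompatible token types");
        DoConnect(in);
    }
protected:
    virtual void DoConnect(InPortBase & in) = 0;
};

template <typename Token>
class OutPort : public OutPortBase
{
public:
    OutPort() : _target(0) {}
    const std::type_info & TokenType() const { return typeid(Token); }
    void Send(const Token & token) { if (_target) _target->Receive(token); }
protected:
    void DoConnect(InPortBase & in)
    {
        // Safe: ConnectTo already verified the token type
        _target = static_cast< InPort<Token> * >(&in);
    }
private:
    InPort<Token> * _target;
};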

Thread safe communication in real-time

One of the main issues that typically needs extra programming effort is multi-threading. It is hard to program and to debug. In real-time audio applications based on a data-flow graph, the processing core is executed in a high priority thread while the rest of the application is executed in a normal priority one, following the {\sf Out-of-band and In-band partition} pattern [ManolescuDataflowPatterns]. Being in different threads, safe communication is needed, but traditional mechanisms for concurrent access are blocking, and the processing thread cannot be blocked. Fully lock-free structures are overkill here, as the conditions are loose: the reading thread (visualization) may block and even lose tokens. Indeed, the refresh rate of the screen is orders of magnitude lower than most token rates. Thus, new solutions, such as the one proposed by the {\sf Port Monitor} pattern in section patterns, are needed.

A Port Monitor is a special kind of processing component which double-buffers the input data and offers a thread-safe data source interface for the visualization widgets. A flag tells which buffer is being written and which one can be read. The processing thread does a try-lock to switch the writing buffer, while the visualization thread locks the flag when accessing the data. Because the processing thread just tries the lock, when the visualization thread holds it, the processing thread simply overwrites the same buffer again instead of blocking, thus fulfilling its real-time requirements.
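A minimal sketch of the pattern follows, using C++11 std::mutex for brevity; CLAM's actual implementation differs in its details.

#include <mutex>

template <typename Token>
class PortMonitor
{
public:
    PortMonitor() : _writeBuffer(0) {}

    // Called from the real-time processing thread: never blocks
    void Do(const Token & token)
    {
        _buffers[_writeBuffer] = token;  // overwrite the current write buffer
        if (_flag.try_lock())            // swap buffers only if the reader is out
        {
            _writeBuffer = 1 - _writeBuffer;
            _flag.unlock();
        }
        // if try_lock failed, the next token reuses the same buffer: no wait
    }

    // Called from the visualization thread: may block, may miss tokens
    Token Read()
    {
        std::lock_guard<std::mutex> lock(_flag); // holds the swap while reading
        return _buffers[1 - _writeBuffer];       // the buffer not being written
    }
private:
    Token _buffers[2];
    int _writeBuffer;
    std::mutex _flag;
};

Note that the writer never waits: in the worst case the visualization misses a token, which is acceptable given the screen refresh rate.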

Component extensibility

The architecture provides means to extend its capabilities by adding plug-ins. Such extensions cover different aspects of the architecture: processing modules, widgets, extended data types, binders...

The mechanism shared by those plug-ins is that they populate a singleton structure, such as a factory or a meta-data dictionary, by creating objects at global scope whose constructors perform the registration. As a plug-in library is loaded by CLAM, all those objects are created and their constructors register their payload into the singleton. We use a C++ idiom that ensures that the singleton is created before any use [Alexandrescu].
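A sketch of that self-registration idiom, reusing the illustrative ProcessingFactory above: the registrar's constructor runs when the plug-in library is loaded, and the function-local static inside Instance() guarantees the factory already exists at that point [Alexandrescu].

class MyFilter : public Processing { /* ... */ };

static Processing * CreateMyFilter() { return new MyFilter; }

class MyFilterRegistrar
{
public:
    MyFilterRegistrar()
    {
        // Instance() holds a function-local static, so the factory is
        // guaranteed to be created before this first use
        ProcessingFactory::Instance().RegisterCreator("MyFilter", &CreateMyFilter);
    }
};
static MyFilterRegistrar registrar; // constructor runs on library load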

Besides processing components and widgets, there are other kinds of plug-ins which are worth explaining in detail.

Binding Plug-ins

Binding plug-ins are used by the run-time engine to locate special widgets that need to be related somehow to the network. `Special' means meeting any criteria on type, name, properties, or any other aspect accessible through Qt's introspection capabilities. Binding plug-ins are the ones that bind port monitors and control senders, but also transport buttons, back-end and playback indicators...
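As an illustration, a binder might locate its widgets as follows; the OutPort__ naming convention shown here is hypothetical.

#include <QtGui/QWidget>
#include <QtCore/QRegExp>
#include <QtCore/QString>
#include <QtCore/QList>

// Locate 'special' widgets through Qt introspection and bind them
void bindPortMonitors(QWidget * ui /*, CLAM::Network & network */)
{
    QList<QWidget*> candidates =
        ui->findChildren<QWidget*>(QRegExp("^OutPort__.*"));
    foreach (QWidget * widget, candidates)
    {
        QString portName = widget->objectName().mid(QString("OutPort__").size());
        // ...check that the network exposes such an out port, that the
        // widget can display its token type, and insert a PortMonitor
        // between the port and the widget's refresh slot
    }
}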

Type Plug-ins

The {\sf Typed Connections} pattern does not require any type to be registered: if two ports share the token type they can be connected. Nevertheless, type plug-ins can be defined to extend the services available for a given type. Currently, such services just include port coloring.

System back-ends

Real-time audio applications may work in a heterogeneous set of contexts: system interfaces to the audio devices such as ALSA, OSS, CoreAudio, WMME, DirectSound or ASIO, wiring APIs such as JACK, or plugin systems such as LADSPA, VST, VSTi, DSSI, Audio Units... Audio applications have to perform a set of tasks that tightly depend on such a context: managing threads, providing callbacks, exploring devices, and feeding data from and to the application.

The architecture tames that complexity by enabling back-end plugins. Back-end plugins encapsulate context-dependent tasks into interchangeable objects, so that, by selecting a different back-end plugin, the application can be run in a different context. As a side effect, this also enables the extension of the architecture to future execution contexts.

Thus, back-end plugins address the often complex back-end setup, relate and feed the sources and sinks in a network with the real system sources and sinks, control the processing thread, and provide any required callbacks. Such plugins hide all that complexity behind a simple interface with operations such as setting up the back-end, binding a network, starting and stopping the processing, and releasing the back-end.
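The resulting interface can be as small as the following sketch; the names are illustrative, inspired by but not necessarily identical to CLAM's actual classes.

namespace CLAM { class Network; }

// Minimal interface a back-end plugin exposes; each subclass
// hides one execution context
class NetworkPlayer
{
public:
    virtual ~NetworkPlayer() {}
    virtual void SetNetwork(CLAM::Network & network) = 0; // bind system sources/sinks
    virtual void Start() = 0; // spawn the processing thread or register callbacks
    virtual void Stop() = 0;
    virtual bool IsPlaying() const = 0;
};
// Concrete back-ends such as a JACK or PortAudio player implement this
// interface; swapping the subclass runs the same network in another context.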

Some important aspects of the back-end plugin system must surface in the user interface to provide a set of functionalities: choosing the back-end among the available ones, choosing the device bound to each source and sink, changing the back-end status (playing, stopped, paused...), and displaying back-end information such as the status or error conditions... The architecture also defines graphical elements that perform such functions, which are connected upon binding.

Reusing components in non-real-time applications

Section sec:RealTimeAudioApplications defined what we consider real-time applications, which are the ones that can be modeled with the visual prototyping architecture. This excludes from visual prototyping any applications featuring:

Still, components developed in the present framework can be helpful to develop applications that fall outside those limits. For example, since sinks and sources are themselves a plug-in system, the full application, as is, can be used within a more complex host application, as in the case of an audio authoring tool or a DAW system.

Also, if a regular toolkit is used for the user interface, the interface definition can be reused and dynamically extended in a more complex application that introduces more application logic. All the means to bind the user interface and the processing algorithm in a transparent way are also available when programming.

Also, an in-memory or file representation could serve as streaming source or sink for a streaming process, including as the data source for instant views.

Finally, the streaming processing components can be building blocks of more complex processing patterns. Section ExecutionModes explained a way of building a non-streaming process by communicating summary computations between two different streaming processes. Both streaming and non-streaming processes could be the core process scheduled when processing multiple audio items.

Examples of some of those adaptations can be found in the use cases explained in chapter chap:Evaluation.