Master Thesis Proposal: Rapid prototyping for audio applications and plugins

Abstract: The goal of the proposed master thesis is to define an architecture that enables visual prototyping of real-time audio applications and plugins. Visual prototyping means that a developer can build a working application, including the user interface and the processing core, just by assembling elements and changing their properties visually. Specifically, this research addresses the problem of binding an interactive user interface to a real-time processing core when both are defined dynamically, the set of components is extensible, arbitrary types of data must be communicated in both directions between the interface and the processing core, and the real-time requirements of audio applications must still be fulfilled.

Keywords: Music and audio software engineering, Rapid application prototyping, Real-time data flow, User interface

Index

Introduction
Background

Frameworks and Visual builders
Data flow languages
Visual user interfaces builders
The problem of connecting both sides

Proposed work

Goals and scope
Methodology

Expected outcomes

Introduction

In music and audio research and industry, it is a long way from the conception of a novel processing algorithm to its arrival on the market. Reducing the duration of the overall process, the 'time to market' [TimeToMarket], gives a clear advantage over competitors [NewProductDevelopment]. This is true for traditional markets, and it is even more vital for success in the fast-paced technology market.

Having a proper development environment may increase development productivity and thus reduce the time to market [ReducingTimeToMarketWithPrototyping]. Development frameworks offer system models that let developers build systems in terms of the concepts of the target domain. Some also provide visual building tools, which further boost development productivity. In the audio and music domain, the approach of modeling systems with visual data-flow tools has been widely and successfully used in systems such as PD [PuckettePD96], Marsyas [TzanetakisMarsyasBook], Open Sound World [ChaudrayOSW] and CLAM [www-CLAM]. However, such environments are used to build only processing algorithms, not full applications ready for the public. A full application needs further development work addressing the user interface and the application workflow.

User interface design is supported by existing toolkits and visual prototyping tools for interfaces, which give flexibility similar to that which data-flow tools provide for building the processing core. Freely available examples of such environments are Qt Designer [QTProgramming], FLTK's FLUID [www-FLTK] and GTK's Glade [www-Glade]. But such tools only solve the composition of graphical components into a layout, plus some limited reactivity. They do not address much of the low-level programming needed to solve the typical problems of an audio application. Those problems are mostly related to the communication between the processing core and the user interface.

The proposed research addresses this gap. Its goal is to define an architecture that helps to efficiently develop full standalone audio and music applications.

This paper first describes the background needed to understand the research area. It then describes the proposed work, defining its scope and the methodology to evaluate the results. Finally, it describes the expected outcomes of the research.

Background

Frameworks and Visual builders

Tools are one of the factors of the development process that can be changed to improve its efficiency. Frameworks are especially valuable tools since they allow the reuse of both design and code.

A framework can be defined as a reusable design of all or part of a software system described by a set of abstract classes and the way instances of those classes collaborate. Roberts and Johnson [EvolvingFrameworksRobertsJohnson] explain the evolution patterns of a development framework.

According to them, frameworks should follow a certain evolution that involves incremental abstraction and refinement. In its first stages, a framework should consist of a few simple abstractions taken from several existing applications. In the last stages, the level of abstraction would be so high that the developer could build a system just by using a visual builder.

A good visual builder should allow a domain expert with no programming knowledge to build a system just by drawing the components together using the domain conventions.

Data flow languages

Data-flow models have a long tradition in systems engineering, and the signal processing field makes extensive use of them. Amatriain [AmatriainThesis] described a Metamodel for Multimedia Systems (4MS), an object-oriented metamodel for audio and music data-flow systems. Arumi [ArumiDea] compiled a set of design patterns that address several design challenges found when implementing data-flow systems in the audio domain.

Visual builders that follow the data-flow paradigm are often called data-flow languages. Several such data-flow languages exist for the audio and music domain. Besides being close to the domain of signal processing experts, data-flow languages have further advantages. First, being visual languages, they let a developer gain insight into the structure of the system at a glance. Data-flow languages also make it harder to build syntactically ill-formed systems, while the language their syntax generates is still large enough to express a wide set of systems. Large interconnected systems are hard to understand visually, but the black box idea allows a subset of interconnected subsystems to be grouped as a subsystem itself, which makes the implementation more scalable. Last but not least, having such well-defined interfaces between subsystems makes it easier to reuse them in a different system.
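The black box idea above can be illustrated with a minimal C++ sketch. It is not taken from any of the cited frameworks: the names `Processing`, `Gain` and `Chain` are hypothetical, and a real data-flow network is a graph with typed ports rather than the linear chain assumed here. The point is only that a group of connected elements exposes the same interface as a single element, so it can be nested and reused.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical interface: one processing element of a data-flow network.
struct Processing {
    virtual ~Processing() = default;
    // Processes a block of samples in place.
    virtual void process(std::vector<float>& block) = 0;
};

// A leaf element: scales every sample by a fixed factor.
class Gain : public Processing {
public:
    explicit Gain(float factor) : factor_(factor) {}
    void process(std::vector<float>& block) override {
        for (float& sample : block) sample *= factor_;
    }
private:
    float factor_;
};

// A black box: a chain of elements that is itself a Processing,
// so it can be placed inside a larger chain or reused elsewhere.
class Chain : public Processing {
public:
    void add(std::unique_ptr<Processing> element) {
        assert(element != nullptr);
        elements_.push_back(std::move(element));
    }
    void process(std::vector<float>& block) override {
        for (auto& element : elements_) element->process(block);
    }
private:
    std::vector<std::unique_ptr<Processing>> elements_;
};
```

Because `Chain` derives from `Processing`, a chain can be added to another chain exactly like a `Gain`, which is the grouping property the paragraph describes.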

Visual user interfaces builders

So, data-flow is successful at providing a design language for the processing algorithms of an application. What about building products ready for the public? Commonly, audio and music products need a user interface to give the user control over the application and to provide feedback on what is happening in the system. So, the interface has a twofold function: control and visualization.

Data-flow prototyping tools often offer integrated controls and visualizations that can be plugged into the data flow. So, you might consider releasing the data-flow prototyping tool itself as the product. But that would blur the functional intent of your product. Although this kind of interface could suit power users perfectly, it gives too much access to the internals of your product: the interface elements for building the data flow add noise to the interface elements that the user is intended to use, that is, the control and visualization interface.

A proper user interface can be prototyped visually. In fact, the user interface domain was one of the first domains to be provided with visual builders [PastPresentFutureOfUISoftwareTools]. Visual interface building consists of visually setting the layout of the set of graphical interface elements and setting their static properties. Some limited dynamic behaviour can be specified by using an event language [GreenEventLanguages].

This kind of prototyping brings to the user interface domain many of the advantages that data-flow based prototyping brings to the processing core. The resulting system is also a visual combination of domain entities, and the set of entities can be extended by the developer.

But a visual user interface builder does not cover the building of the full application. It only solves the layout of graphical elements, their static properties and some responses to events that can be resolved within the interface. Application logic has to be implemented by hand in the low-level language that the prototyping tool translates the prototype into.

The problem of connecting both sides

The first issue to solve is how to find which elements of two independently and dynamically created structures should be connected. One structure contains a set of processing elements that are created dynamically at run time. The other structure contains a set of widgets that are also built dynamically. When programming by hand, those elements are concrete objects known to the programmer, who can simply refer to them; here, however, the set of elements is not known in advance by the prototyping tool.

The next issue to solve is how the binding itself is done. This is not trivial, since both the set of processing components and the set of widgets may be extended, and the types of data they exchange may not be known in advance.
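One possible way to approach the two issues above, sketched here only as an illustration, is to match elements by name and to check data types at run time. The sketch assumes that each widget carries the name of the port it wants (for instance as a property set in the interface builder) and that both sides can report the type of data they handle. All names (`PortInfo`, `WidgetInfo`, `Binding`, `bindByName`) are hypothetical and do not correspond to the CLAM or Qt APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical description of a port published by the processing side.
struct PortInfo {
    std::type_index dataType;
};

// Hypothetical description of a widget found in a dynamically loaded interface.
struct WidgetInfo {
    std::string widgetName;
    std::string targetPort;    // name of the port the widget asks for
    std::type_index dataType;  // type of data the widget can handle
};

struct Binding {
    std::string widgetName;
    std::string portName;
};

// Matches widgets to ports by name and accepts a pair only if both sides
// agree on the data type. Widgets that cannot be matched are reported in
// `unbound` instead of being bound.
std::vector<Binding> bindByName(const std::map<std::string, PortInfo>& ports,
                                const std::vector<WidgetInfo>& widgets,
                                std::vector<std::string>& unbound) {
    std::vector<Binding> bindings;
    for (const WidgetInfo& widget : widgets) {
        auto found = ports.find(widget.targetPort);
        if (found != ports.end() && found->second.dataType == widget.dataType) {
            bindings.push_back({widget.widgetName, found->first});
        } else {
            unbound.push_back(widget.widgetName);
        }
    }
    return bindings;
}
```

Since the binder only compares names and `std::type_index` values, new port and widget types can be added without changing it, which is the extensibility requirement stated above. How the data is actually transported once a pair is bound is a separate question.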

One of the main issues that typically require extra programming is multi-threading. In real-time audio applications, the processing core is executed in a high-priority thread while the rest of the application is executed in a normal-priority one, following the Out-of-band and In-band Partition pattern [ManolescuDataflowPatterns]. Since they run in different threads, safe communication is needed, but traditional mechanisms for concurrent access are blocking and the processing thread cannot be blocked. Thus, new solutions are needed, such as the one proposed by the Port Monitor pattern [Plop06].
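To make the non-blocking requirement concrete, the following is a minimal sketch of one well-known technique: a single-producer, single-consumer ring buffer built on `std::atomic`. It is not the Port Monitor pattern from [Plop06]; it is a generic lock-free queue used here for illustration only. The class name `SpscQueue` is hypothetical, and the sketch assumes that exactly one thread pushes, exactly one thread pops, and that copying a `T` is cheap and does not allocate.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer, single-consumer ring buffer. Neither push() nor pop()
// takes a lock or allocates memory, so the calling thread never blocks.
// Assumption: exactly one thread calls push() and exactly one calls pop().
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot is always kept free");
public:
    // Called from the producer thread. Returns false if the queue is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }
    // Called from the consumer thread. Returns false if the queue is empty.
    bool pop(T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        value = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }
private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

In the setting described above, the interface thread could push control values and the processing thread could pop them at the start of each audio block, or the roles could be reversed for visualization data. When the queue is full, `push()` returns false instead of waiting, so the caller has to decide whether to drop or retry the value.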

Regarding the application logic, some work is needed to connect the algorithm to the system data sinks and sources, and to let the user decide how this is done. That is, connecting the inputs and outputs to targets such as the audio card or the plugin system.

Proposed work

Goals and scope

The proposed work consists of defining an architecture that helps to efficiently develop standalone applications based on audio and music technology. This statement needs some elaboration.

For the scope of the thesis, we limit the set of applications to real-time processing applications that have simple application logic. That is, just the application logic needed to start and stop a processing algorithm, to configure it, to connect it to the system streams (audio, MIDI, files, OSC, plugin system...), to visualize the inner processing data and to control it while running.

Given those limitations, the architecture does not claim to support building every kind of audio application. For example, audio authoring tools, which have more complex application logic, are out of scope, although the architecture would help to build important parts of such applications. The architecture only claims to support building applications such as synthesizers, real-time music analyzers or audio effects.

'Efficient development' means that the developer should only have to program novel components and, once all the components are available, the full application can be built by using the visual builders.

The architecture will provide the following features:

Communication of any kind of data and control objects between GUI and processing core (not just audio buffers)
Embedding of the prototype in a wider application with minimal effort
Plugin extensibility for:

Processing units
Graphical elements for data visualization and editing
System connectivity backends (ASIO, JACK, OSC, VST, LADSPA...)
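The plugin extensibility listed above is commonly achieved with a registry of creator functions, so that new component types can be added without modifying the tools that instantiate them. The following is a minimal sketch under that assumption. The names `Component`, `ComponentFactory` and `Oscillator` are hypothetical and are not the CLAM API, and the loading of shared libraries that a real plugin system needs is left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for any pluggable component.
struct Component {
    virtual ~Component() = default;
    virtual std::string kind() const = 0;
};

// Registry mapping a type name to a creator function. A plugin would call
// registerType() when it is loaded; a visual builder would call create()
// with the type name chosen by the user.
class ComponentFactory {
public:
    using Creator = std::function<std::unique_ptr<Component>()>;

    // Returns false if the name is already registered.
    bool registerType(const std::string& name, Creator creator) {
        return creators_.emplace(name, std::move(creator)).second;
    }
    // Returns nullptr for unknown names.
    std::unique_ptr<Component> create(const std::string& name) const {
        auto found = creators_.find(name);
        if (found == creators_.end()) return nullptr;
        return found->second();
    }
private:
    std::map<std::string, Creator> creators_;
};

// Example component that a plugin might contribute.
struct Oscillator : Component {
    std::string kind() const override { return "processing"; }
};
```

The same registry shape could serve each of the three extension points, with one factory for processing units, one for graphical elements and one for connectivity backends.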

Methodology

Evaluation is a tricky problem in software engineering. An ideal setting would be to have the same system developed in different ways, changing only the aspects of the process under evaluation, and to compare how each aspect affects development efficiency. This is not a viable evaluation method because building complex systems is expensive and, even in this ideal setting, we would not be taking into account the human factors that make two identical experiments differ. The human factor forces us to use a statistical approach and, thus, more cases are needed for the evaluation.

So, the classical approach is to analyze the development process of existing real projects. This approach is very limited in the world of proprietary software, where the set of projects a researcher is able to analyze tends to be small and biased. Fortunately, the wide availability of open source projects and the visibility they offer into their development process give us a chance to obtain more significant metrics [GrexEmpiricalSWEngineering].

The proposed evaluation method is to use the architecture to rebuild several existing open source audio and music applications and to compare the programming effort with that of the original implementation. But in order to rely on such an evaluation, some aspects must be considered carefully.

Most of the expected efficiency gains rely on component reuse. Of course, reuse is only viable when the component already exists. So we should provide a criterion to estimate the likelihood that a given component is already available, and evaluate the development cost of such a component consistently.

Another aspect to consider is that the new implementation would not need the exploration process that the first implementation surely went through. So, the two development processes are not directly comparable. This issue can be addressed either by considering metrics that evaluate only the final product, or by trying to reproduce the exploration process, which can also be valuable.

So, the proposed methodology is to iterate through the following steps:

Analyze the development process of a set of existing open source audio applications.
Adjust the definition of the architecture so that it supports their construction.
Implement such architecture.
Implement the analyzed audio applications by using the architecture.
Compare the development processes and draw conclusions.

Expected outcomes

The proposed work would lead to the following outcomes:

The definition of an architecture that allows the development of real-time audio and music processing applications.
A concrete implementation of such architecture within the CLAM framework.
An implementation of audio related widgets for the Qt framework that could be reused.
Re-implementations of existing open source audio applications by using the implemented architecture.
A quantitative analysis of the development process of a set of existing open source audio software, contrasted with the development process that uses the architecture.