This thesis addresses the general problem of how to develop audio and music software efficiently. It proposes an architecture that enables visual prototyping of real-time audio applications. We analyze the implementation issues that developers have to deal with when developing different archetypes of audio applications, such as user interface communication, multi-threading and real-time restrictions, and the means that software engineering provides to handle such complexities.
This chapter describes the context of this work, including its motivation, the research context, a summary of the work, and a description of the content of this thesis.
I have been working at the Music Technology Group of the Universitat Pompeu Fabra for six years. That dynamic research group, led by Xavier Serra, is involved in the development of innovative audio and music technology. One of my main functions there was providing software engineering support to other researchers by porting, testing and optimizing code, integrating systems and developing research prototypes. Those tasks gave me the opportunity to be involved in the development of very different kinds of audio applications: synthesizers, audio effects, authoring tools, music information retrieval systems, plug-ins, web applications...
One of the main outputs of that activity has been the CLAM framework, co-developed with Xavier Amatriain, Pau Arumí and others. The original intent of the framework was to join development efforts among different teams within the group. Every team was implementing similar algorithms and utilities, and the framework was a common place to share them. This way, testing and porting would be centralized and integration would be easier. But we paid for our initial lack of design experience. The framework was powerful enough to build the diverse set of applications we required, but it was hardly usable by non-experts. Moving existing code to the framework was a hard task.
Thus, over the last years, we focused on simplifying the use of the framework and providing tools that ease development with it. In this context, the work presented in this thesis represents a step beyond what similar frameworks are able to do: an architecture to visually build full real-time audio applications.
There is a long way from the conception of a novel audio processing algorithm to its release as an end-user product. As a very rough generalization, we can say that the typical process has two development levels. On the first level, several versions of the processing algorithm are evaluated and their parameters are tuned in order to get optimal results. Because this stage requires flexibility, algorithm researchers often use scripting languages such as Matlab. In a later stage, algorithms are integrated into an end-user product. Here computational performance is often a requirement, and the algorithm is usually ported to a low-level compiled language such as C or C++. This port carries not just cost and time, but also risks: translation errors, and the divergence that usually arises after the translation between the product and research versions of the algorithm. Another factor that makes this two-stage development hard is that an audio end-user product must address many low-level platform issues that may be out of the scope of an algorithm demonstrator. In order to obtain a quality end-user product, several refinement iterations of the interface, driven by user feedback, are needed. This not only makes the process longer but also leads to algorithm modifications not foreseen at the research level. This research proposes means to reduce the gap between the algorithm conception and a working end-user application.
Both industry and academia can benefit from such research. In industry, reducing the time of the overall process, the so-called `time to market'[TimeToMarket], gives a clear advantage over competitors[NewProductDevelopment]. This fact, already true for traditional markets, becomes even more vital for success in the fast-paced technology market. Academia could also benefit from having end-user applications of its technology. When conducting user experiments, an end-user application is more comfortable for the subjects of the experiment to use. An end-user application is also a good technology demonstrator. And working toward end-user applications eases technology transfer to society.
So, how to address this problem? A proper development environment may increase development productivity and thus reduce the time to market[ReducingTimeToMarketWithPrototyping]. Development frameworks offer system models that enable system development in terms of the concepts of the target domain. Some frameworks also provide visual building tools, which further boost development productivity [green96usability]. In the audio and music domain, the approach of modeling systems using visual data-flow tools has been widely and successfully used in frameworks such as PD [PuckettePD96], Marsyas [TzanetakisMarsyasBook], Open Sound World [ChaudrayOSW] and CLAM [www-CLAM].
However, such environments are used to build just processing algorithms, not full applications ready for the public. A full application needs further development work addressing the user interface and the application work-flow. User interface design is supported by existing toolboxes and visual interface builders, which give the user interface a flexibility similar to the one data-flow tools provide for building the processing core. Freely available examples of such environments are Qt Designer [QTProgramming], Fltk Fluid [www-FLTK] and Gtk's Glade [www-Glade]. But such tools solve only the composition of graphical components into a layout and offer limited reactivity. They still do not address much of the low-level programming needed to solve the typical problems of an audio application, which are mostly related to the communication between the processing core and the user interface.
The proposed research is to define an architecture that provides the logic to bind a data-flow definition built with a visual data-flow editor to a user interface defined with a visual GUI builder, in order to build full-featured real-time audio applications.
The challenges to be addressed are twofold. On one hand, the architecture should solve all the programming issues that developers of real-time audio applications currently have to face, and do so in a transparent and generalized way. On the other hand, we should face the new challenges introduced by the prototyping architecture itself.
Real-time audio programming issues are discussed in detail in chapter chap:AudioApplications. Examples of issues to be solved are multi-platform access to audio devices, communication between real-time and non-real-time threads, latency reduction and jitter handling, among others.
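As an illustration of one of these issues, communication between a non-real-time thread (the user interface) and the real-time audio thread is commonly handled with a fixed-size single-producer/single-consumer ring buffer, so that the audio callback never waits on a lock. The following is only a minimal sketch of that data structure, not the mechanism used by the architecture described in this thesis. The names `ControlQueue` and `process_block` are hypothetical, and Python is used purely for readability: a production version would be written in a compiled language with atomic indices and would avoid allocating memory inside the callback.

```python
class ControlQueue:
    """Fixed-capacity single-producer/single-consumer ring buffer (illustrative)."""

    def __init__(self, capacity):
        # One slot is kept empty so that "full" and "empty" can be told apart.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def push(self, value):
        """Producer side (GUI thread): returns False instead of blocking when full."""
        next_tail = (self._tail + 1) % len(self._slots)
        if next_tail == self._head:
            return False
        self._slots[self._tail] = value
        self._tail = next_tail
        return True

    def pop_all(self):
        """Consumer side (audio callback): drains pending values without waiting."""
        values = []
        while self._head != self._tail:
            values.append(self._slots[self._head])
            self._head = (self._head + 1) % len(self._slots)
        return values


def process_block(samples, queue, gain):
    """Hypothetical audio callback: adopt the latest queued gain, then scale the block."""
    pending = queue.pop_all()
    if pending:
        gain = pending[-1]  # only the most recent control value matters here
    return [s * gain for s in samples], gain
```

In this sketch the user interface calls `push` whenever a control changes, and the audio callback calls `process_block` once per block. When the queue is full the producer simply drops or retries the value, which keeps the real-time side free of waiting.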
Some of the issues that the prototyping architecture introduces are related to the fact that we need to locate and bind unknown elements of two dynamically created structures. When programming an audio application, the developer has direct access to the objects to relate and to their interfaces. The prototyping infrastructure, in contrast, has to discover which elements are meant to be related. Moreover, in order to allow extensibility, the architecture should not limit the kinds of elements it can deal with on either the processing or the user interface side.
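One possible way to let the infrastructure discover which elements are meant to be related is a naming convention: a widget whose name carries a known prefix followed by the name of a processing control is bound to that control. The sketch below assumes such a convention only for illustration. The prefix `InControl__`, the classes `Control` and `Slider`, and the function `bind` are all hypothetical names and do not describe the actual interface of any toolkit or of the proposed architecture.

```python
class Control:
    """Hypothetical processing-side control input."""

    def __init__(self, value=0.0):
        self.value = value


class Slider:
    """Hypothetical widget that notifies registered listeners when it changes."""

    def __init__(self):
        self._listeners = []

    def on_change(self, callback):
        self._listeners.append(callback)

    def set_value(self, value):
        for callback in self._listeners:
            callback(value)


PREFIX = "InControl__"  # assumed naming convention, for illustration only


def bind(widgets, controls):
    """Bind every widget named PREFIX + <control name> to the matching control.

    Both arguments are dictionaries built at run time (name -> object), so
    neither side is known when this function is written.
    Returns the names that were bound and those that found no target.
    """
    bound, unbound = [], []
    for name, widget in widgets.items():
        if not name.startswith(PREFIX):
            continue  # ordinary widget, not meant to be bound
        control = controls.get(name[len(PREFIX):])
        if control is None:
            unbound.append(name)  # report it rather than failing silently
            continue
        # Default argument captures the current control for this widget.
        widget.on_change(lambda value, c=control: setattr(c, "value", value))
        bound.append(name)
    return bound, unbound
```

Because the binder only inspects names and a small notification interface, new widget or control types can be added without changing it, which is the kind of extensibility the paragraph above asks for. Reporting unbound names gives the designer feedback when a name in the interface does not match any element of the processing network.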
The proposed architecture provides the following features:
The set of applications we want to support is that of real-time processing applications that have a simple application logic. That is, just the application logic needed for starting and stopping a processing algorithm, configuring it, connecting it to the system streams (audio, MIDI, files, OSC, plug-in system...), visualizing the inner processing data and controlling parameters while running.
Given that limitation, the architecture does not claim to build every kind of audio application. For example, audio authoring tools, which have a more complex application logic, are out of scope, although the architecture would help to build important parts of such applications, and the work in this thesis should help to define abstractions for developing visual frameworks in the future. The architecture only claims to build applications such as synthesizers, real-time music analyzers or audio effects.
Also, by `visual prototyping' we are not referring to a complete visual language that would allow building real-time systems without programming. We just mean that the developer should need to program only the novel processing and interface components. Once all components are available, the full application can be built with visual builders.
We present here an overview of the goals of this thesis:
An actual implementation of the architecture is needed to be able to evaluate its feasibility and usefulness. As most authors in recent software engineering literature suggest [BeckXP][EvolvingFrameworksRobertsJohnson], early generalizations may lead to over-design. They recommend iterative work on the implementation to arrive at a proper generalization. Thus, the work presented in this thesis is the result of an iterative process of refinement, considering different use cases and addressing different features at an incremental pace.
In this iterative process, several parallel activities have taken place (see figure CLAMDevelopmentProcess). While some activities pursued the goal of a more usable framework, others dealt with coming up with appropriate abstractions and constructs that can be reused beyond the framework. Two such abstractions are an object-oriented meta-model for multimedia processing, described in section 4mps and in more depth in [AmatriainThesis], and a pattern language for data-flow systems, described in section patterns and in more depth in [ArumiPlop06] and [ArumiDea].
Evaluation is a tricky problem in Software Engineering. An ideal setting would have the same system developed in different ways, changing only the aspects of the process to be evaluated and comparing how each aspect affects development efficiency. That evaluation method is not viable because building complex systems is expensive and, even in this ideal setting, it does not take into account the human factors that would make two identical experiments differ. Human factors force us to use a statistical approach and, thus, evaluation would require more cases.
Because the clean room approach is not viable, the classical approach is to analyze the development process of existing real projects. This approach is very limited within the world of proprietary software. The set of projects a researcher is able to analyze tends to be very small due to corporate and organizational boundaries and confidentiality requirements. Reproducing results is even harder, as it requires peer researchers to set up a similar set of data. Moreover, accessible data on the development of such projects has a high risk of bias, as the actors in the process tend to hide things that do not work well. Fortunately, the large availability of open source projects and the visibility they offer into their development process give us a chance to obtain more significant metrics. Robles[GrexEmpiricalSWEngineering] addresses different ways of exploiting such data sources in order to reach insightful conclusions about the development and evolution of existing free and open source software.
The proposed evaluation method is to use the architecture to build several audio and music applications, some of them from scratch and some of them as reimplementations of existing open source software. We will then evaluate the effectiveness of the architecture either by comparing the programming effort to the original one, when available, or by applying some systematic qualitative criteria, which are given throughout chapter sec:ApplicationDevelopment.
But in order to rely on such an evaluation, some aspects must be considered carefully. Most of the expected efficiency boosts rely on component reuse. Of course, reuse is viable only when the component already exists. So we should provide a criterion to estimate the likelihood that a given component is already present, and evaluate the development cost of such a component consistently.
Another aspect to consider is that a reimplementation does not need the exploration process that the first implementation had, so the two development processes are not directly comparable. This issue can be solved either by considering metrics that evaluate just the final artifacts, or by trying to reproduce the exploration process, which can also be valuable.
In summary, the proposed methodology is to iterate through the following steps
The proposed work would lead to the following outcomes:
This chapter has introduced the context of this work and it has set its goals and methodologies.
Chapter 2 describes related work on tools and methods to make software development more efficient. It also explains how other authors have faced the specific issues of audio software engineering and presents some of the tools that are available for that domain.
Chapter 3 performs a domain analysis of the audio applications family of systems. The goal is to obtain a set of abstractions and related engineering concerns that can be applied to analyze the engineering needs of a given audio application.
Chapter 4 describes the prototyping architecture at different levels of detail.
Chapter 5 evaluates the architecture by analyzing how it performs in several real use cases.
Finally, Chapter 6 presents the conclusions, the main contributions and perspectives for further research.