Master Thesis: Visual Prototyping of Audio Applications

Related Work: Application Development

This chapter reviews the state of the art in application development and, more concretely, in audio application development. First, we review several general approaches that enable the efficient development of applications, such as the use of frameworks, visual languages or domain-specific languages. Then we introduce two concrete technologies: data-flow languages for building audio processing applications, and visual GUI builders. We also review the state of the art in the evaluation of such tools. Finally, we review the literature on specific engineering concerns and existing development environments for the audio domain.

The evolution of frameworks

Tools are one of the factors of the development process that can be changed to improve its efficiency. Frameworks are especially valuable tools because they allow the reuse of both design and code. Johnson and Foote [JohnsonReusableClasses] define a framework as ``a set of classes that embodies an abstract design for solutions to a family of problems''. Another definition, which emphasizes the means more than the goal, is ``a reusable design of all or part of a software system described by a set of abstract classes and the way instances of those classes interact'' [Johnson-ACMCommunications1997]. The latter definition highlights the fact that a framework is not just a collection of classes but also a design for how they collaborate.
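
The second definition can be made concrete with a minimal Python sketch. It is only an illustration under our own assumptions: the names `Filter`, `Pipeline`, `Gain` and `Clip` are hypothetical and do not belong to any of the cited frameworks. The framework part supplies an abstract class and fixes how instances collaborate; the application part only fills in the abstract step.

```python
from abc import ABC, abstractmethod
from typing import List


class Filter(ABC):
    """Abstract class supplied by the (hypothetical) framework."""

    @abstractmethod
    def process(self, sample: float) -> float:
        """Application-specific step, filled in by the framework user."""


class Pipeline:
    """Framework-owned collaboration: how Filter instances interact."""

    def __init__(self, filters: List[Filter]) -> None:
        self.filters = filters

    def run(self, samples: List[float]) -> List[float]:
        out = []
        for sample in samples:
            for f in self.filters:  # the framework decides the call order
                sample = f.process(sample)
            out.append(sample)
        return out


# Application code: only the concrete steps are written by the user.
class Gain(Filter):
    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, sample: float) -> float:
        return sample * self.factor


class Clip(Filter):
    def process(self, sample: float) -> float:
        return max(-1.0, min(1.0, sample))


result = Pipeline([Gain(2.0), Clip()]).run([0.25, 0.75, -0.9])
```

The reused design is the `Pipeline` loop, not just the classes: the user never writes the code that decides when `process` is called.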

The history of software frameworks is very much related to the evolution of the multimedia field itself. Many of the most successful and well-known examples of software frameworks deal with graphics, images or multimedia. The first object-oriented frameworks to be considered as such were MVC (Model View Controller) for Smalltalk [MVC] and MacApp for Apple applications [MacApp]. Other important frameworks from this initial phase were ET++ [WeinandET++89] and InterViews. Most of these seminal frameworks were related to graphics or user interfaces.

Roberts and Johnson [EvolvingFrameworksRobertsJohnson] describe the evolution patterns of a development framework. According to them, a framework should evolve through incremental abstraction and refinement. In its first stages, a framework should consist of a few simple abstractions drawn from several existing applications. In later stages, the level of abstraction becomes high enough that the developer can build a system just by using a domain-specific language or a visual builder.

A good visual builder should allow a domain expert with no programming knowledge to build a system just by drawing the components together, using the graphical conventions of the domain.

Visual programming languages

A visual programming language (VPL) is a programming language that uses the relations and placement of objects on a 2D screen to articulate the execution of programs, as opposed to textual programming languages (TPLs), which use linear text streams. Some illustrative general-purpose VPLs are Prograph and LabVIEW.

Their superiority to text-based programming languages is often defended with arguments such as:

Pictures are superior to texts in a sense that they are abstract, instantly comprehensible and universal. [HakarawaIchikawa1994]

Some critical authors call those claims the superlativist hypothesis (VPLs are better than TPLs) and the accessibility hypothesis (information in a diagram is instantly comprehensible and universal). Menzies [MenziesSEKE2001] and others warn that those claims are based on intuition and lack scientific support. Menzies reports that studies comparing visual programming languages with textual programming languages often reach contradictory conclusions under similar conditions.

A study by Green, Petre and Bellamy [GreenPetreBellamyESP1991] rejected both claims. They rejected the accessibility hypothesis by observing that novices had more trouble reading a visual program than experts did, which implies that reading a diagram is a learned skill rather than an immediate one. They also rejected the superlativist claim by demonstrating that for some tasks TPLs outperformed VPLs. To explain why VPLs perform differently for different programming problems, Green and Petre formulated the match-mismatch conjecture, which states that the efficiency of a visual language depends on how well it maps onto the problem.

Moher et al. [MoherMakBlumenthalLeventhal-ESP1993] go further by stating that the effectiveness of a language depends not only on the target program but also on the task to be performed with the program. An illustrative example is forward versus backward reasoning. A notation may facilitate forward reasoning (given some inputs, what is the output?) while making backward reasoning hard (which inputs produced that output?). Backward reasoning is very useful for some programming tasks such as debugging.

In a later study, Green and Petre [green96usability] used a set of cognitive dimensions for evaluating notations [Green-HCI1989] to compare visual and textual languages. The dimensions were:

The closeness of mapping is the distance between the domain world and the programming world. The need to use programming-specific entities forces the programmer to think in programming terms instead of domain terms.

The viscosity is the effort that the user has to make to effect a small change. It is normally measured as the number of primitive actions performed but, as it is task dependent, it is hard to evaluate unless there is a clear trend. They identify the layout as a source of viscosity in visual languages.

The hidden dependencies criterion evaluates how a part of a program is affected by changes in another part in a way that is not obvious to the programmer; function side effects are an example. Green et al. notice that such dependencies are harder to highlight in textual languages but that they can also be found in visual languages.

The hard mental operations criterion evaluates how hard it is to understand a combination of primitives. For example, the amount of work needed to understand a conditional expression could differ depending on whether it is represented as English text, as a truth table or as a logic gate circuit. The kind of reasoning to be performed also affects the amount of work.

The premature commitment criterion tells whether the language forces decisions to be taken before the required information is available. For example, visual languages tend to force the user to guess the layout of the final design.

The secondary notation criterion evaluates the alternative means that programmers can use to communicate aspects of the program that are not explicit in the language notation. For example, although they are not required to, textual programmers indent to denote structure, insert blank lines to group code, or use naming conventions to denote aspects of the named entities. Green notes the opportunities visual languages offer for adding such secondary notation, but also that mastering secondary notations is hard for novices.

The visibility criterion evaluates the cognitive work required to make a given aspect of the program accessible. Total visibility is not convenient, but accessing a non-visible aspect should not be hard.

The consistency criterion evaluates how easy it is to infer one part of the language from knowledge of another part. Consistency is a key aspect in facilitating learning.

The progressive evaluation criterion evaluates whether the programmer is able to test a part of the program before having completed it.

Of course, all those criteria are still subjective and very dependent on the task and on the programmer's skills. But at least they provide a systematic and multi-faceted way of evaluating languages.

The previously cited works concluded that the effectiveness of visual languages depends on the task. Burnett et al. [BurnettEtalComputer1995] say that most visual languages, even when successful for their task, have to face what they call the scaling-up problem: as the problems to be solved get bigger and more complex, several issues tend to appear. They identified such issues and collected common solutions where available. The issues were classified as representation issues or programming language issues. Representation issues include the availability of a static representation, the effective use of the screen, and the suitability of the documentation facilities. Programming language issues include procedural abstraction, interactive visual data abstraction, type checking, persistence and efficiency.

In a later publication [Burnett-SEKE2001], Burnett describes how visual languages will require, or will enable, the development of new software engineering tools. She explicitly mentions new documentation and code comprehension tools, and new testing, debugging and reuse tools.

In my own opinion, textual languages are often specifications that are implemented by several vendors, each providing its own development environment. The development environment is not part of the language, but it greatly affects its usability. Conversely, the definition of a visual language is often coupled to a tool, and most of the previous concerns about the usability of the language actually refer to that tool.

This coupling to the tool leads to a further drawback of visual languages: a language coupled to a tool is often coupled to a single vendor. A rare exception is UML and its related standardization process, but the UML standard is not executable, although there is a trend toward using it as an executable specification [StarrExecutableUML].

Domain specific languages

Domain-specific languages (DSLs), as opposed to general-purpose languages (GPLs), are languages that are restricted to a concrete domain. Deursen et al. define them with the following statement:

A domain-specific language (DSL) is a programming language or executable specification language that offers, through appropriate notations and abstractions, expressive power focused on, and usually restricted to, a particular problem domain. [DeursenKintVisser-SIGPLAN2000]

The actual promise of DSLs is their focused expressive power. They allow solutions to be expressed in the idiom and at the level of abstraction of the problem domain. They are not usually focused on execution; instead, they tend to be declarative [LaddRamming-UVHLLS1994] about the facts of the domain. Some of the benefits of using DSLs are:

But there are also some drawbacks to their usage, such as:

According to Deursen and Klint [DeursenKint-LittleLanguages], the creation of a DSL typically involves the following steps:

The key aspect of the process is the analysis. Neighbors [Neighbors-IEEETSE1984] identifies the role of the domain analyst, who is similar to the system analyst but, instead of producing a single system, supports the development of families of related systems. This requires the expertise of having built several applications within the domain as a system analyst.

We can merge the idea of visual programming languages with the idea of domain-specific languages. Domain-specific visual languages should be able to use visual, domain-specific notations to specify a family of programs. By addressing a concrete family of programs and by using domain-specific notations, the problem mismatch introduced in the section on visual programming languages could be reduced. Likewise, using different languages to specify different aspects of a program could reduce the task-dependent mismatch.

In the following sections we present two common visual approaches that address different aspects of an audio application: data-flow languages, which address the processing aspect, and visual user interface builders, which produce user interfaces.

Data flow languages

Data-flow models have a long tradition in systems engineering. The signal processing field makes extensive use of them.

Visual builders that follow the data-flow paradigm are often called data-flow languages. Several such data-flow languages exist for the audio and music domain. Besides being close to the domain of signal processing experts, data-flow languages have further advantages. Firstly, being visual languages, they let a developer get, at a glance, insight into the structure of the system. Data-flow languages also make it more difficult to build syntactically ill-formed systems, while the language generated by their syntax is still large enough to express a wide set of systems. Large interconnected systems are hard to understand visually, but the black box idea enables grouping a set of interconnected subsystems into a subsystem in its own right, which makes designs more scalable. And last but not least, having strict interfaces between subsystems makes it easier to reuse them in a different system.

On the other hand, data-flow languages describe only the data dependencies. Procedural details of the modules and their semantics need to be indicated using a secondary notation such as distinct iconic representations, naming or port coloring.
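
The following Python sketch illustrates these properties in textual form. It is a toy model under our own assumptions (the `Graph` class and its method names are hypothetical and are not taken from any of the environments reviewed in this chapter): a network is a set of modules plus the wires that state their data dependencies, and a whole sub-network can be wrapped as a black box and reused as a single module.

```python
class Graph:
    """Modules plus the wires that state their data dependencies."""

    def __init__(self):
        self.funcs = {}  # module name -> callable computing its output
        self.wires = {}  # module name -> list of upstream module names

    def add(self, name, func, inputs=()):
        self.funcs[name] = func
        self.wires[name] = list(inputs)
        return self  # returned to allow chained construction

    def pull(self, name, sources):
        """Compute one module's output by first pulling its inputs."""
        if name in sources:  # module fed from outside the network
            return sources[name]
        args = [self.pull(up, sources) for up in self.wires[name]]
        return self.funcs[name](*args)

    def as_black_box(self, input_name, output_name):
        """Expose a whole sub-network as a single one-input module."""
        return lambda value: self.pull(output_name, {input_name: value})


# Sub-network: in -> gain -> clip
shaper = (Graph()
          .add("in", None)
          .add("gain", lambda x: x * 4.0, inputs=["in"])
          .add("clip", lambda x: max(-1.0, min(1.0, x)), inputs=["gain"]))

# Outer network: reuses the sub-network as one module and mixes dry and wet.
outer = (Graph()
         .add("source", None)
         .add("shaper", shaper.as_black_box("in", "clip"), inputs=["source"])
         .add("mix", lambda dry, wet: 0.5 * (dry + wet),
              inputs=["source", "shaper"]))

result = outer.pull("mix", {"source": 0.5})
```

Note that the wiring alone says nothing about what `gain` or `mix` do internally; only the chosen names hint at it, which is the role that secondary notation plays in a visual data-flow language.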

Amatriain [AmatriainThesis] described the Metamodel for Multimedia Processing Systems (4MPS), an object-oriented metamodel for audio and music data-flow systems. Arumi [ArumiDea] compiled a set of design patterns that address several design challenges found when implementing data-flow systems in the audio domain.

Visual user interfaces builders

So, data-flow succeeds in providing a design language for the processing algorithms of an application. But what about building products for the public? Commonly, audio and music products need a user interface to give the user control over the application and to provide feedback on what is happening in the system. The user interface thus has a twofold function: control and visualization.

Data-flow prototyping tools often offer integrated controls and visualizations that plug into the data-flow, so one might consider releasing the data-flow prototyping tool itself as the product. But that would blur the functional intent of the product. Although this kind of interface can be perfectly suited to power users, it gives too much access to the internals of the product: the user interface elements for building the data-flow add noise to the elements the user is actually meant to use, namely the control and visualization interface.

A proper user interface can be prototyped visually. In fact, the user interface domain was one of the first to be provided with visual builders [PastPresentFutureOfUISoftwareTools]. Visual interface building consists of visually setting the layout of a set of graphical interface elements and setting their static properties. Some limited dynamic behaviour can be specified by using an event language [GreenEventLanguages].

This kind of prototyping brings to the user interface domain many of the advantages that data-flow based prototyping brings to the processing core. The resulting system is also a visual combination of domain entities, which can be extended by the developer.

But a visual user interface builder does not cover the building of the full application. It just solves the layout of graphical elements, their static properties and those responses to events that can be resolved within the interface. The application logic has to be implemented by hand in the low-level language that the prototyping tool translates the prototype into.
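
This division of labour can be sketched in Python as follows. The sketch is hypothetical: the `LAYOUT` description, the `Widget` class and `build_app` are names invented for illustration and do not reproduce the output format of any real interface builder. The data structure stands for what the builder produces (layout and static properties), while the function at the end is the part that still has to be written by hand.

```python
# What a visual builder would typically produce: layout and static properties.
LAYOUT = {
    "type": "window",
    "title": "Gain control",
    "children": [
        {"type": "slider", "name": "gain", "min": 0.0, "max": 2.0, "value": 1.0},
        {"type": "meter", "name": "level"},
    ],
}


class Widget:
    """Minimal stand-in for a toolkit widget loaded from the description."""

    def __init__(self, spec):
        self.spec = dict(spec)
        self.name = spec.get("name")
        self.listeners = []  # callbacks fired when the value changes
        self.children = [Widget(child) for child in spec.get("children", [])]

    def find(self, name):
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None

    def set_value(self, value):
        self.spec["value"] = value
        for listener in self.listeners:
            listener(value)


def build_app(layout):
    """Hand-written application logic, outside the scope of the builder."""
    window = Widget(layout)
    slider = window.find("gain")
    meter = window.find("level")
    # Assumed logic: the meter shows a fixed 0.5 test level scaled by the gain.
    slider.listeners.append(lambda gain: meter.set_value(abs(0.5 * gain)))
    return window


window = build_app(LAYOUT)
window.find("gain").set_value(1.5)
level = window.find("level").spec["value"]
```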

Evaluating tools

Most of the tools and techniques presented so far promise some benefit to the development process. But in the past some promising tools did not succeed even though they provided a clear benefit.

Myers et al. [PastPresentFutureOfUISoftwareTools] analyze past trends in user interface tools and identify the traits that made each trend successful or not. They state that such traits can be used as criteria to foresee whether a current or future tool is going to be successful.

One of the main criteria is the target problem. To know whether a tool is at least promising, the key is to analyze whether the target problem is a key problem in development and whether the tool addresses it thoroughly and effectively. But that is not the only criterion for a tool to be successful.

Two other important and related criteria are the threshold and the ceiling of the tool. The threshold indicates how hard it is to learn the tool. The ceiling indicates how far you can go with it. The ideal tool would have a low threshold and a high ceiling, but the two concepts are closely related. When designing a tool, it often happens that raising the ceiling also raises the learning threshold and, conversely, that lowering the threshold reduces the ceiling. This happens naturally because a tool needs to bring more elements into play in order to model a wider range of applications.

Myers et al. observe that a cost-benefit analysis is not enough to justify a high threshold: most users will not get past it. They also give two means to get a high ceiling without raising the threshold too much. One is offering a trap door, which does not affect the usability of the regular tool. The other is offering a smooth path, which allows a progressive rise of both the threshold and the ceiling.

Another interesting criterion is the path of least resistance. It says that a tool has more chance of succeeding when it makes the proper way of doing things easier than the dirty or unsuitable one. That applies both to the resulting product and to the development artifacts. For example, toolkits made it easier to reuse components than to build them from scratch, and easier to have a consistent look and feel; visual builders made it easier to separate the application logic from the layout logic; and event languages made it easier to build mode-free interfaces than modal ones.

Last but not least is the moving target criterion. Technologies evolve fast and tools are reactive: the problem appears first, and only then does someone think of a tool to address it. Mastering a tool may also take a long time, so the target problem could have lost its importance before a critical mass of developers effectively uses the tool that addresses it.

They observe that, in the case of user interface tools, there has been an anomaly in this criterion for some years due to the standardization of the desktop user interface. This exceptional situation let the tools mature. They warn that this situation is likely to change in the following years with the appearance of ubiquitous computing and recognition-based interfaces.

Audio Software Engineering

Developing audio and music software implies addressing some specific issues. Extensive literature exists that analyzes the different software engineering aspects of audio applications. This section gives an overview of it.

Pennycook [Pennycook-ComputerSurveys1985] describes the challenges of developing interfaces for musicians, as they must support creativity instead of constraining it. He surveys several user interfaces for audio and music software. The reviewed software is now obsolete, but some of the insights are still valid. He identifies several categories: composition and synthesis languages, graphic score editors, performance instruments, digital audio processing tools, and computer-aided music instruction systems.

In several papers, Dannenberg et al. [DannenbergATIIS1989] [DannenbergAuraICMC04] [DannenbergInstrumentAndPerformanceModels] [DannenbergCMU] analyze software engineering concerns in real-time multimedia systems, including the handling of incoming events, timing and low latency, as well as more general engineering concerns such as portability, reliability and ease of development.

Real-time systems are commonly regarded as the most complex form of computer program due to parallelism, the use of special purpose input/output devices, and the fact that time-dependent errors are hard to reproduce. [DannenbergATIIS1989]

Dannenberg notes that the application should not block waiting for input, as time-dependent and data-dependent computations must take place meanwhile, so he proposes an event-driven architecture that inverts the control flow: instead of the program asking for incoming events, the system calls the program whenever an event arrives. This introduces new concerns about preemption and communication between threads. He also points out the problem of memory management, as the cost of standard memory allocation and deallocation is not deterministic. He proposes preallocating memory, together with an algorithm to handle such memory under real-time conditions.
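
Both ideas can be illustrated with a short Python sketch. It is not Dannenberg's algorithm: the `EventPool` below is a generic free-list pool, and all class and method names are our own assumptions. Python itself gives no real-time guarantees, so the sketch only shows the structure: handlers are registered in advance and called by the system, and event objects are taken from a pool that was allocated before the time-critical work started.

```python
class Event:
    """Reusable event record; fields are overwritten on each delivery."""
    __slots__ = ("kind", "value", "time")

    def __init__(self):
        self.kind = None
        self.value = 0.0
        self.time = 0.0


class EventPool:
    """Fixed set of Event objects created once, before real-time work starts."""

    def __init__(self, size):
        self._free = [Event() for _ in range(size)]

    def acquire(self):
        # Hands out an existing object; returns None instead of allocating.
        return self._free.pop() if self._free else None

    def release(self, event):
        self._free.append(event)

    def available(self):
        return len(self._free)


class Dispatcher:
    """The system owns the control flow and calls the program's handlers."""

    def __init__(self, pool):
        self.pool = pool
        self.handlers = {}  # event kind -> callback

    def on(self, kind, handler):
        self.handlers[kind] = handler

    def deliver(self, kind, value, time):
        event = self.pool.acquire()
        if event is None:
            return False  # pool exhausted: drop the event rather than allocate
        event.kind, event.value, event.time = kind, value, time
        try:
            handler = self.handlers.get(kind)
            if handler is not None:
                handler(event)  # inversion of control: the system calls us
        finally:
            self.pool.release(event)
        return True


received = []
pool = EventPool(size=2)
dispatcher = Dispatcher(pool)
dispatcher.on("note_on", lambda e: received.append((e.value, e.time)))
dispatcher.deliver("note_on", 60, 0.0)
dispatcher.deliver("note_on", 64, 0.5)
```

Handlers copy the fields they need because the event object returns to the pool as soon as the handler finishes.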

Hardware abstraction and portability are another source of engineering issues. Bencina [BencinaPortAudio03] abstracts the common services provided by audio devices under the PortAudio API. Scavone [RtAudioICMC02] offers similar services under an object-oriented API.

Audio analysis software has different needs from real-time software. While it does not have to deal with real-time restrictions, it has to deal with more complex processing flows, which are harder to generalize. Tzanetakis and Cook [TzanetakisMarsyasBook] describe the architectural needs of audio analysis applications for audio information retrieval (AIR) and present a general architecture to fit those needs.

Development environments for the audio domain

This section will give a brief survey of existing frameworks and environments for audio processing. Most of these environments are extensively reviewed in [AmatriainThesis].

The current arena presents a heterogeneous collection of systems that range from simple libraries to full-fledged frameworks and development environments. Unfortunately, it is very difficult to get a complete picture of the existing environments in order to choose one or to decide to design a new one.

In order to contextualize our survey, we will start by listing the relevant environments not only for audio but also for image and multimedia:

\begin{itemize}
\item \emph{Multimedia Processing Environments}: Ptolemy [PtolemyOverview], BCMT [BerkeleyCMT97], MET++ [AckermannMET++Time], MFSM [FrancoisMed2000], VuSystem [VuSystem96], Javelina [HebertJavelina], VDSP [MellingerVirtualDSP]
\item \emph{(Mainly) Audio Processing Environments}: CLAM [xamatICMC05], The Create Signal Library (CSL) [PopeCSL], Marsyas [TzanetakisMarsyas3D], STK [CookSTK96], Open Sound World (OSW) [ChaudrayOSW], Aura [DannenbergAuraICMC04], SndObj [LazzariniSndObjDafx01], FORMES [CointeFormesOOConcurrentProgramming], Siren [PopeSirenBookChapter], Kyma [ScalettiKymaOOPSLA], Max [PucketteMax2002], PD [PD]
\item \emph{(Mainly) Visual Processing Environments}: Khoros-Cantata [Cantata] (now VisiQuest), TiViPE [TiViPe], NeatVision [neatvision], AVS [avs], FSF [FrancoisMed2001]
\end{itemize}

If we now focus on the audio field, we can further classify the environments according to their scope and main purpose as follows:

\begin{enumerate}
\item \emph{Audio processing frameworks}: software frameworks that offer tools and practices that are particularized to the audio domain.
\begin{enumerate}
\item \emph{Analysis Oriented}: Audio processing frameworks that focus on the extraction of data and descriptors from an input signal. Marsyas by G. Tzanetakis is probably the most important framework in this sub-category, as it has been used extensively in several Music Information Retrieval systems [TzanetakisMarsyas3D].
\item \emph{Synthesis Oriented}: Audio processing frameworks that focus on generating output audio from input control signals or scores. STK by P. Cook [CookSTK96] has already been in use for more than a decade and it is fairly complete and stable.
\item \emph{General Purpose}: These audio processing frameworks offer tools both for analysis and synthesis. Of the ones in this sub-category, SndObj [LazzariniSndObjDafx01] and CSL [PopeCSL] are in a similar position, each having its own advantages and disadvantages. CLAM, the target framework of the experiments in this thesis, should be included in this sub-category.
\end{enumerate}
\item \emph{Music processing frameworks}: These are software frameworks that, instead of focusing on signal-level processing applications, focus on the manipulation of symbolic data related to music. Siren [PopeSirenBookChapter] is probably the most prominent example in this category.
\item \emph{Audio and Music visual languages and applications}: Some environments base most of their tools around a graphical metaphor that they offer as an interface to the end user. In this category we include important examples such as the Max [PucketteMax2002] family or Kyma [ScalettiKymaOOPSLA].
\item \emph{Music languages}: In this category we find different languages that can be used to express musical information (note that we have excluded those having a graphical metaphor, which are already in the previous category). Although several models of languages co-exist in this category, the Music-N family of languages is the most important one. \emph{Music-N languages} base their proposal on the separation of musical information into static information about \emph{instruments} and dynamic information about the \emph{score}, understanding this score as a sequence of time-ordered note events. Music-N languages are also based on the concept of the \emph{unit generator}. The most important language in this category, because of its acceptance, use and importance, is Csound [VercoeCSound].
\end{enumerate}
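
The Music-N separation between instruments and score can be sketched in a few lines of Python. This is not Csound syntax and does not reproduce any Music-N implementation; the function names, the note tuple layout and the 8000 Hz sample rate are assumptions made for the example. Unit generators are small signal-producing functions, an instrument is a static patch of unit generators, and the score is a list of time-ordered note events that refer to instruments.

```python
import math

SAMPLE_RATE = 8000  # Hz, an arbitrary low rate that keeps the example small


def sine_osc(freq, n_samples):
    """Unit generator: a sine oscillator."""
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(n_samples)]


def linear_decay(n_samples):
    """Unit generator: an envelope falling linearly from 1 towards 0."""
    return [1.0 - i / n_samples for i in range(n_samples)]


def beep(freq, amp, duration):
    """Instrument: a static patch connecting two unit generators."""
    n = int(duration * SAMPLE_RATE)
    return [amp * s * e for s, e in zip(sine_osc(freq, n), linear_decay(n))]


# Score: note events as (start time, instrument, frequency, amplitude, duration)
SCORE = [
    (0.0, beep, 440.0, 0.5, 0.25),
    (0.25, beep, 660.0, 0.5, 0.25),
]


def render(score):
    """Play every note of the score through its instrument and mix the result."""
    total = max(start + dur for start, _, _, _, dur in score)
    out = [0.0] * int(total * SAMPLE_RATE)
    for start, instrument, freq, amp, dur in sorted(score, key=lambda n: n[0]):
        offset = int(start * SAMPLE_RATE)
        for i, sample in enumerate(instrument(freq, amp, dur)):
            out[offset + i] += sample
    return out


audio = render(SCORE)
```

Changing the music only requires editing `SCORE`, while changing the timbre only requires editing `beep`, which is the separation of concerns that the Music-N family is built on.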

Summary, current directions and hypothesis

In this chapter we reviewed existing literature in several areas. On the one hand, we considered means to make the development of general applications more efficient. We observed that frameworks and domain-specific languages succeed in addressing a concrete domain, while visual languages give some benefit only if they are suited to the target program and to the development task. We reviewed literature about two domain-specific visual languages: data-flow systems and visual interface builders. We saw that each performs its own task well, but there is a gap between them that is still not covered by such tools.

A first hypothesis of this thesis is that we can formulate an architecture that fills the gap between the two visual builders, so that a full audio application can be built without textual programming.

On the other hand, we reviewed existing literature about the concrete engineering concerns of audio applications and how to address them.

A second hypothesis of this thesis is that such issues can be systematically addressed, and in some cases automatically covered, given a high-level description of the application logic requirements.