The DataTime
Processing Framework (DTPF) is a C++ framework facilitating the
creation of time-based data processing systems. While applicable to a
wide range of systems (audio/visual processing, sensory data
acquisition, digital control systems, etc.), the immediate intent is to
support the creation of computational models of sensory processing, for
example models of audio-visual sensory integration in developing human
infants. Such models tend to be modular, and studies involving them
typically examine a set of similar models with variations or extensions
to a basic form, deal with large data rates (e.g., audio/visual data)
and use computationally expensive algorithms that can benefit from
hardware parallelization. DTPF has the primary goal of making it easier
for various programmers
and researchers to create sensory models in a modular manner. The first
phase will involve implementing the core framework for off-line (as
opposed to real-time) processing. This phase will also not emphasize
graphical user interface (GUI) aspects, though it will provide features
for later GUI integration.
Motivation
The initial purpose of this project is to provide a
well-engineered foundation for creating computational models of
audio-visual
sensory
integration and attention in human infants. This specifically includes
Epigenetic
Sensory Models of Attention (ESMA) which are composed of distinct
components
performing specific functions, and share a general 'pipe and filter'
structure
with many other computational models which operate on sensory data.
Because these models are by nature highly modularized, there is much
potential for reuse of common modules in different experiments and models.
Modularization and reuse also directly facilitate the way in which the
models are used, since studies involving them typically examine a set of
similar models with variations or extensions to a basic form. These models
also work with
high
rates of data, performing more complex processing than traditional
multimedia
applications (e.g., audio-visual mutual information calculation,
Hershey
& Movellan, 2000).
Even
for non real-time situations, the computationally expensive algorithms
used
can benefit from hardware parallelization. Although the
focus
here is on sensory models that can run without real-time constraints,
we
are beginning to work with hardware devices as well (e.g., robotic
pan-tilt
cameras,
SoDiBot).
Introducing soft real-time requirements further enhances the importance
of being able to harness hardware and software parallelism. (Soft
real-time applications such as video players can tolerate some
indeterminacy in timing; hard real-time systems such as aircraft control
in general do not have that luxury.)
In previous work we have been developing customized
software programs (e.g.,
SoundStream,
SenseStream;
for applications of SenseStream see: Prince & Hollich, in press;
Prince et al., 2004).
While this can reduce the perceived initial amount of time and effort
required
to get a particular program up and running, it leads to more difficulty
in extending program functionality, maintenance problems, lower code
reusability
and many other software engineering evils. In short, while the
'one-shot'
approach can be seen as providing short term gains, any real gains
occur at the expense
of long term flexibility. Many of these issues were encountered in the
development and modification of SenseStream. SenseStream dealt with
many of the same processing issues as SoundStream did, but due to the
way these programs were designed and implemented, code from the earlier
SoundStream project found no reuse in SenseStream. The customized
nature of SenseStream has also made subsequent modifications more
difficult. One such addition to the SenseStream program was calculation
of Mel Frequency Cepstral Coefficients (MFCC) from audio data, which
involved modifications to the user interface, configuration, audio
processing and mutual information calculation program code. Another
modification that has proved more intrusive is the ongoing integration
of the SoDiBot data acquisition. This has involved circumventing the
original audio and video input (which read data from a file), as well
as modification of SenseStream and the SoDiBot controlling software in
order to communicate data and perform synchronization. A more detailed
discussion of the structure of SenseStream, changes that have been made
to it, and how the DTPF will be used is available
here.
The modifications to SenseStream could have been
made considerably easier if a framework that enabled and encouraged
modularization and reuse had been used for the SenseStream program.
Development could have been
more focused on the processing problems addressed rather than having to
deal with more mundane issues of configuration, data communication and
synchronization. The program code itself could have also been more
directly
applicable to future projects such as the sensory models mentioned
above.
In addition to these software engineering issues, as
our work has progressed we have begun
collaborating with a number of psychologists who use computers running
Macintosh OS X almost exclusively. While our group has some access to
OS X computers, there are a larger number of Solaris and Linux computers
available at our university. A solution supporting OS X while allowing
for some level of program code portability between different platforms
is desirable. However, the focus of this initial stage is on
flexibility; portability is a secondary objective.
These factors motivate the creation of a more
generalized framework for modular processing of time-based data,
capable of exploiting parallelism by concurrent execution of modules
across multiple processors and multiple computers, while allowing
multiple operating system platforms to be exploited. Such a framework,
if
well designed, could also be applied to the more traditional multimedia
domain or general problems involving processing of time-based data.
Stakeholders
The stakeholders involved with DTPF can roughly be
categorized as follows:
- End-users: The envisioned users of
DTPF are psychologists or other researchers conducting simulations and
experiments with sensory models. The non-programmer's interaction with
DTPF will of course be through end-user software. End-users are
considered stakeholders here to the extent that the framework must
support applications which meet their needs.
- Programmers: Programmers implementing model components or
applications
will interact with the framework through its application programming
interface (API) as well as end-user software.
- Implementors: Framework developers (who could be
considered application developers or end-users as well) will be
involved with implementing internals of the framework in addition to
making use of the API and end-user software.
The skills
and knowledge of these stakeholders can vary considerably, from student
programmers early in their undergraduate careers to professional
researchers whose programming background ranges from extensive to
non-existent.
Experience with different OS platforms and software packages can be
expected to show similar variation.
Risks
Several risk factors present themselves concerning
the creation of the DTPF. Planning and implementation of the first
phase discussed here entails considerable effort, potentially on the
order of one person-year. Limiting the scope of this first phase and
employing appropriate software engineering principles could reduce the
effort required. There is also the risk that the implemented DTPF will
not adequately meet our stakeholders' needs: programmers might find it
difficult to develop models using the API provided; end-users might
find the resulting end-user software too complicated or incomplete for
their purposes. This could be due, among other reasons, to missing,
incomplete or incorrect functionality that hinders the development of
sensory models or end-user software with the DTPF.
Even considering these
risks, the potential long-term benefits of creating the DTPF are still
attractive. Exposure to these risks can be reduced by keeping them in
mind through the stages of planning and implementation.
Existing Software
While this project intends to create a new
framework, considering existing software is important. Existing
software can reveal successful design approaches and desirable
functionality, as well as drawbacks and potential pitfalls. We are also
interested in software toolkits that may be useful in creating the DTPF
and facilitating future projects. The discussion of this software is
available
here.
The first phase of
DTPF shall run on single desktop/laptop computers running Macintosh OS
X with necessary
support software installed.
The requirements
presented here are quite dependent on the architectural model (an
architectural model gives a top-level view of the organization of a
system; see Figure 1), and in some cases even imply a specific
implementation. While this would be
inappropriate for some systems, specifying requirements of a framework
requires some reference to architectural structure.
Avoiding this would make clear specification of requirements much more
difficult. In short, for a framework we feel it is valid to require a
certain architectural model.
Figure 1: A top-level view of the DTPF architectural model. Note that the
connections shown between nodes are illustrative of a hypothetical
situation. In general any node can be connected to any other.
The architectural model adopted is roughly that of
the BeOS MediaKit. Processing is organized into nodes which can
be connected together to send processing data from one to another.
Processing data is stored in buffers, and this
data is transferred by sending references to buffers from a producer
node to a consumer node. A roster maintains records of various
resources, and controls certain shared resources. A client interface
allows program code (node implementations or client applications, for
example) to communicate with the roster through service requests. A
plug-in loader is responsible for loading plug-in nodes that can be
specified dynamically at run-time.
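The relationships among nodes, buffers and the roster can be made more concrete with a small sketch. The following C++ is illustrative only: every name in it (Buffer, BufferRef, Node, Source, Gain, Sink, Roster, run_demo) is hypothetical and is neither the DTPF API nor the BeOS MediaKit API. It assumes that buffer references are shared through std::shared_ptr and that nodes are registered by name. The roster is called directly rather than through service requests, and the plug-in loader is omitted entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; this is not the DTPF API.

// A buffer holds processing data plus the time it refers to.
struct Buffer {
    double timestamp;            // assumed unit: seconds from stream start
    std::vector<float> samples;  // payload
};
using BufferRef = std::shared_ptr<const Buffer>;

// A node receives buffer references and may send references downstream.
class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}
    virtual ~Node() = default;
    const std::string& name() const { return name_; }
    void connect(Node* consumer) {
        assert(consumer != nullptr);
        consumers_.push_back(consumer);
    }
    virtual void receive(const BufferRef& buffer) = 0;
protected:
    // Passes the reference, not a copy of the data, to each consumer.
    void send(const BufferRef& buffer) {
        for (Node* c : consumers_) c->receive(buffer);
    }
private:
    std::string name_;
    std::vector<Node*> consumers_;
};

// Pure producer: creates buffers and sends them on.
class Source : public Node {
public:
    using Node::Node;
    void receive(const BufferRef&) override {}  // ignores any input
    void produce(double t, std::vector<float> data) {
        BufferRef ref = std::make_shared<Buffer>(Buffer{t, std::move(data)});
        send(ref);
    }
};

// Filter: scales each sample into a new buffer and forwards it.
class Gain : public Node {
public:
    Gain(std::string name, float factor)
        : Node(std::move(name)), factor_(factor) {}
    void receive(const BufferRef& in) override {
        auto out = std::make_shared<Buffer>(*in);
        for (float& s : out->samples) s *= factor_;
        send(out);
    }
private:
    float factor_;
};

// Pure consumer: keeps every reference it receives.
class Sink : public Node {
public:
    using Node::Node;
    void receive(const BufferRef& buffer) override { received.push_back(buffer); }
    std::vector<BufferRef> received;
};

// The roster owns the nodes and keeps a record of them by name.
class Roster {
public:
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        auto node = std::make_unique<T>(std::forward<Args>(args)...);
        T* raw = node.get();
        nodes_[raw->name()] = std::move(node);
        return raw;
    }
    Node* find(const std::string& name) const {
        auto it = nodes_.find(name);
        return it == nodes_.end() ? nullptr : it->second.get();
    }
    std::size_t size() const { return nodes_.size(); }
private:
    std::map<std::string, std::unique_ptr<Node>> nodes_;
};

// Wires source -> gain -> sink and returns what the sink received.
std::vector<float> run_demo() {
    Roster roster;
    Source* mic = roster.create<Source>("mic");
    Gain* gain = roster.create<Gain>("gain", 2.0f);
    Sink* out = roster.create<Sink>("out");
    mic->connect(gain);
    gain->connect(out);
    mic->produce(0.0, {1.0f, 2.5f});
    return out->received.at(0)->samples;
}
```

Calling run_demo() from a program's main function returns the two samples doubled by the gain node (2.0 and 5.0). Sharing buffers by reference means a producer with several consumers hands each of them the same data without copying it; the const qualifier on BufferRef is one possible way to keep consumers from modifying a buffer that others may still be reading.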
Many of the concepts presented here are related in
one way or another. As a result, the current document contains
some redundancy between sections (particularly section IV.G, which
discusses the roster). This will be resolved as the document progresses.