DataTime Processing Framework
Project Requirements, Phase 1: Core Framework




i. Contents

I. Problem
II. Background
III. System Environment
IV. Functional Requirements
V. Non-functional Requirements
VI. References
VII. Glossary


I. Problem

    The DataTime Processing Framework (DTPF) is a C++ framework facilitating the creation of time-based data processing systems. While applicable to a wide range of systems (audio/visual processing, sensory data acquisition, digital control systems, etc.), the immediate intent is to support the creation of computational models of sensory processing, such as models of audio-visual sensory integration in developing human infants. Such models tend to be modular, and studies involving them typically examine a set of similar models with variations or extensions to a basic form, deal with large data rates (e.g., audio/visual data), and use computationally expensive algorithms that can benefit from hardware parallelization. The primary goal of DTPF is to make it easier for programmers and researchers to create sensory models in a modular manner. The first phase will implement the core framework for off-line (as opposed to real-time) processing. This phase will not emphasize graphical user interface (GUI) aspects, though it will provide features for later GUI integration.


II. Background

Motivation
    The initial purpose of this project is to provide a well-engineered foundation for creating computational models of audio-visual sensory integration and attention in human infants. This specifically includes Epigenetic Sensory Models of Attention (ESMA), which are composed of distinct components performing specific functions, and share a general 'pipe and filter' structure with many other computational models which operate on sensory data. As it is the nature of these models to be highly modularized, there is much potential for reuse of common modules in different experiments and models. Modularization and reuse also directly facilitate the style in which the models are used, as studies involving them typically examine a set of similar models with variations or extensions to a basic form. These models also work with high rates of data, performing more complex processing than traditional multimedia applications (e.g., audio-visual mutual information calculation, Hershey & Movellan, 2000). Even for non-real-time situations, the computationally expensive algorithms used can benefit from hardware parallelization. Although the focus here is on sensory models that can run without real-time constraints, we are beginning to work with hardware devices as well (e.g., robotic pan-tilt cameras, SoDiBot). Introducing soft real-time requirements (soft real-time applications such as video players can tolerate some indeterminacy in timing; hard real-time systems such as aircraft control in general do not have that luxury) further enhances the importance of being able to harness hardware and software parallelism.
    In previous work we have been developing customized software programs (e.g., SoundStream, SenseStream; for applications of SenseStream see: Prince & Hollich, in press; Prince et al., 2004). While this can reduce the perceived initial amount of time and effort required to get a particular program up and running, it leads to greater difficulty in extending program functionality, maintenance problems, lower code reusability, and many other software engineering evils. In short, while the 'one-shot' approach can be seen as providing short term gains, any real gains occur at the expense of long term flexibility. Many of these issues were encountered in the development and modification of SenseStream. SenseStream dealt with many of the same processing issues as SoundStream did, but due to the way these programs were designed and implemented, code from the earlier SoundStream project found no reuse in SenseStream. The customized nature of SenseStream has also made subsequent modifications more difficult. One such addition to the SenseStream program was calculation of Mel Frequency Cepstral Coefficients (MFCC) from audio data, which involved modifications to the user interface, configuration, audio processing and mutual information calculation program code. Another modification that has proved more intrusive is the ongoing integration of the SoDiBot data acquisition. This has involved circumventing the original audio and video input (which read data from a file), as well as modification of SenseStream and the SoDiBot controlling software in order to communicate data and perform synchronization. A more detailed discussion of the structure of SenseStream, changes that have been made to it, and how the DTPF will be used is available here.
    The modifications to SenseStream could have been made considerably easier if a framework that enabled and encouraged modularization and reuse had been used for the SenseStream program. Development could have been more focused on the processing problems addressed rather than having to deal with more mundane issues of configuration, data communication and synchronization. The program code itself could have also been more directly applicable to future projects such as the sensory models mentioned above.
    In addition to these software engineering issues, as our work has progressed we have begun collaborating with a number of psychologists who use computers running Macintosh OSX almost exclusively. While our group has some access to OSX computers there are a larger number of Solaris and Linux computers available at our university. A solution supporting OSX while allowing for some level of program code portability between different platforms is desirable. However, the focus of this initial stage is on flexibility; portability is a secondary objective.
    These factors motivate the creation of a more generalized framework for modular processing of time-based data, capable of exploiting parallelism by concurrent execution of modules across multiple processors and multiple computers, while allowing multiple operating system platforms to be exploited. Such a framework, if well designed, could also be applied to the more traditional multimedia domain or general problems involving processing of time-based data.

Stakeholders
    The stakeholders involved with DTPF can roughly be categorized as follows:
  1. End-users: The envisioned users of DTPF are psychologists or other researchers conducting simulations and experiments with sensory models. The non-programmer's interaction with DTPF will of course be through end-user software. End-users are considered stakeholders here to the extent that the framework must support applications which meet their needs.
  2. Programmers: Programmers implementing model components or applications will interact with the framework through its application programming interface (API) as well as end-user software.
  3. Implementors: Framework developers (who could be considered application developers or end-users as well) will be involved with implementing internals of the framework in addition to making use of the API and end-user software.
The skills and knowledge of these stakeholders can vary considerably, from student programmers early in their undergraduate careers to professional researchers whose programming backgrounds range from extensive to non-existent. Experience with different OS platforms and software packages can be expected to show similar variation.

Risks
    Several risk factors present themselves concerning the creation of the DTPF. Planning and implementation of the first phase discussed here entails considerable effort, potentially on the order of one person-year. Limiting the scope of this first phase and employing appropriate software engineering principles could reduce the effort required. There is also the risk that the implemented DTPF will not adequately meet our stakeholders' needs: programmers might find it difficult to develop models using the API provided; end-users might find the resulting end-user software too complicated or incomplete for their purposes. This could be due, among other reasons, to missing, incomplete or incorrect functionality that hinders the development of sensory models or end-user software with the DTPF.
    Considering these risks, the potential long-term benefits of creating the DTPF are still attractive. Exposure to these risks can be reduced by keeping them in mind through the stages of planning and implementation.

Existing Software
    While this project intends to create a new framework, considering existing software is important. Existing software can reveal successful design approaches and desirable functionality, as well as drawbacks and potential pitfalls. We are also interested in software toolkits that may be useful in creating the DTPF and facilitating future projects. The discussion of this software is available here.


III. System Environment

    The first phase of DTPF shall run on single desktop/laptop computers running Macintosh OS X with necessary support software installed.


IV. Functional Requirements for the Core Framework


    The requirements presented here are quite dependent on the architectural model (an architectural model gives a top-level view of the organization of a system, see Figure 1), and in some cases even imply specific implementation. While this would be inappropriate for some systems, specifying requirements of a framework requires some reference to architectural structure. Avoiding this would make clear specification of requirements much more difficult. In short, for a framework we feel it is valid to require a certain architectural model.

DTPF architectural overview.
Figure 1: A top-level view of the DTPF architectural model. Note that the connections shown between nodes are illustrative of a hypothetical situation. In general any node can be connected to any other.

    The architectural model adopted is roughly that of the BeOS MediaKit. Processing is organized into nodes which can be connected together to send processing data from one to another. Processing data is stored in buffers, and this data is transferred by sending references to buffers from a producer node to a consumer node. A roster maintains records of various resources, and controls certain shared resources. A client interface allows program code (node implementations or client applications, for example) to communicate with the roster through service requests. A plug-in loader is responsible for loading plug-in nodes that can be specified dynamically at run-time.

    Many of the concepts presented here are related in one way or another. As a result of this the current document contains some redundancy between sections (particularly section IV.G discussing the roster). This will be resolved as this document progresses.

IV.A:
Data processing is divided into nodes with interfaces for configuration, control and data I/O.
  1. Nodes are implemented as subclasses of a framework class, and may be statically linked in application code or dynamically linked from shared libraries.
  2. To make a node available to the framework it is registered with the roster using a framework provided method. This involves communicating information to the roster that is needed to identify and communicate with the node. (See § IV.G)
  3. To make a registered node unavailable to the framework it is unregistered using a framework provided method.
  4. A registered node can be in stopped, starting, started and stopping states. A node is transitioned into the starting and stopping states by slot methods that are called by the framework. When these slot methods complete the node is considered to have transitioned to its next state.
    Registered node state diagram.
    Figure 2: Registered node states.
    • In the stopped state a node may be requested to perform all query and configuration operations it supports.
    • In the started state a node is assumed to be actively processing. Query and configuration operations may fail at the discretion of the node implementation.
    • The starting and stopping states are transitional, the framework prevents query or configuration operations from being performed when a node is in either of these states.
  5. Methods are provided to query and update a node's configuration parameters. (See § IV.F)
  6. Methods are provided for a node to publish its data input and output capabilities. These capabilities are represented by outputs and inputs. A node informs the roster of changes to its published inputs or outputs.
  7. A protocol is provided for connecting nodes which produce output data to nodes which accept input data. A connection establishes an agreement that an output node will produce a particular type of data, and that an input node will consume that data. (See § IV.C)


IV.B:
Node execution is event driven.
  1. Nodes are controlled by events corresponding to the various actions that can be performed on a node.
  2. A framework provided event looper is responsible for receiving and dispatching events by calling framework defined hook and slot methods.


IV.C: For communication of processing data (audio samples, video frames, etc.), a node which outputs data can be connected to a node which accepts data input.
  1. To establish a connection an input and an output for a desired data format type must be identified. (See § IV.A & IV.E)
  2. Client code specifies details of the desired data format.
  3. Once the input, output and desired format have been determined the client makes a request to establish the connection. (See § IV.I) If either of the nodes involved is not in the stopped state an error condition is raised, terminating the connection process.
  4. Format information and output identification is sent to the output (producer) node. The output node can specify preferred values for unspecified format attribute values. At this point the output node can terminate the connection process for any implementation dependent reason by raising an error condition.
  5. If the output node did not terminate the connection process, format information from the output node and input identification is sent to the input (consumer) node. The input node can inspect the format configured by the output node. At this point the input node can terminate the connection process for any implementation dependent reason by raising an error condition.
  6. Provided neither node raises an error condition the connection process is considered complete after the input node has been consulted. Both nodes should be ready to participate in their respective roles as producer and consumer.
  7. A connection must be associated with a buffer group before the nodes involved can be successfully started.
  8. Two connected nodes can be disconnected provided both nodes are in the stopped state.


IV.D:
Data passed between nodes is stored in buffers, organized into buffer groups.
  1. The framework provides operations for buffer group allocation, deallocation, lookup of existing buffer groups, and allocation/deallocation of buffer and buffer group descriptor objects.
  2. Program code can request constraints on the memory addressing and alignment used for buffer data when a buffer group is created. These constraints include absolute addressing, aligned addressing (i.e., buffer data starting on a page boundary), and non-paged buffer data.
  3. Each connection between nodes is associated with a buffer group that is used for the data passed over the connection.
  4. A buffer group can be associated with more than one connection between nodes provided all connections involve the same data format.
  5. A buffer group has an associated data format (See § IV.E) that is common to all buffers in the buffer group. This format is the same as the format of any connections the buffer group is associated with.
  6. Access rights to a buffer are acquired from the owning buffer group through the framework before use. These rights are transferred from producer to consumer node when a buffer is sent over a connection, and are released when the buffer is no longer needed. This is needed to ensure process and thread synchronization. Exclusive modification rights are a convention to be adhered to by node implementations to ensure data integrity.
  7. A buffer is sent to a consumer node using a framework provided method which results in the consumer node being informed that the specific buffer is available.


IV.E:
Data format descriptions are handled by the framework in a manner independent of specific format implementations.
  1. Specific format types have a unique name composed from the 'supertype' and 'subtype' (e.g. 'audio/raw', 'video/packed', 'raw', etc.). Project level organization (e.g. 'esma/audio-features') of formats is also possible. The 'supertype/subtype' style is not a requirement on the naming, it is used here for illustration.
  2. The framework provides default formats for 'audio/sampled', 'video/packed', 'video/planar' and 'raw' types. Wildcard or 'don't care' values are provided for the specific format attributes where applicable.
    'audio/sampled'
      Description: Describes sampled audio data.
      Attributes: Sampling rate, samples per buffer, sample data type, quantization type (linear, µ-law, etc.), channel count and sample layout.
    'video/packed'
      Description: Describes video data organized into arrays of pixels containing all color components.
      Attributes: Colorspace, image dimensions, pixels per image row, storage size per image row and video frame rate.
    'video/planar'
      Description: Describes video data organized into contiguous 'planes' of data values with one plane for each color component.
      Attributes: Colorspace, image dimensions, pixels per plane row and storage size per plane row (both per plane), and video frame rate.
    'raw'
      Description: Describes data of primitive type.
      Attributes: Data type (int8, uint32, real32, etc.) and the number of elements per buffer.
    All data formats
      Attributes: Frame rate (buffers per unit time) of the data.


IV.F:
Nodes can publish a list of parameters, where each parameter is described by a parameter model, organized into parameter groups. This allows uniform configuration by other program code, and saving and restoration of configurations.
  1. Parameter groups can contain any number of parameter models and other parameter groups. There is one root parameter group for each node.
  2. Parameter models include a parameter name, value, textual label, and preferred user interface (UI) controller type to allow automatic construction of configuration UI's.
  3. Types of parameter models include on/off, finite set, mutually-exclusive group, bounded-range, file/directory, and single/multi-line text. The following table gives general descriptions of these models and the GUI components typically associated with them. The GUI components are listed here to help illustrate future uses of these models.
    On/off
      Description: Yes or no parameter values.
      Typical GUI components: Checkboxes.
    Group
      Description: List of choices allowing multiple selection.
      Typical GUI components: Lists, multiple selection combo boxes, groups of checkboxes.
    Mutually-exclusive group
      Description: List of choices allowing only one selection.
      Typical GUI components: Radio buttons, single selection combo boxes.
    Bounded range
      Description: Parameter values lying between two values, inclusive.
      Typical GUI components: Sliders, scrollbars.
    File/directory
      Description: Parameters that refer to file system entries.
      Typical GUI components: File selection dialogs.
    Text
      Description: Single or multiple line text for comments or display of information.
      Typical GUI components: Text input fields, textual labels.
  4. Parameter models indicate whether a parameter can be modified during processing (corresponding to a node in the started state), during setup only (corresponding to a node in the stopped state), or not at all (provided for informational purposes).
  5. Parameter models are configured to trigger an automatic parameter update upon modification, or to wait for program code to directly invoke an update. This configuration can be applied to parameter groups, affecting all parameter models and parameter groups contained within.
  6. Changes to a node's parameter values are propagated to the node by the framework. A node notifies the framework of changes the node makes to its own parameter values.


IV.G:
A roster manages framework resources, and acts as broker for framework services.
  1. A list of dormant nodes is maintained.
  2. A list of active nodes that are currently registered with the roster is maintained. Each registered active node is assigned a unique identifier by the roster. For each node lists of published inputs and outputs are maintained by the roster.
  3. A list of active connections between nodes is maintained. The connection information identifies the sending and receiving nodes, and the data format used.
  4. A list of buffer groups is maintained. Each buffer group is assigned a unique identifier. The roster tracks which connection(s) a buffer group is associated with.


IV.H:
A plug-in loader handles loading of dormant plug-in nodes.
  1. The plug-in loader handles requests to load a dormant node or to unload a previously loaded node.
  2. Multiple instances of the same plug-in node can be loaded, except where that is inappropriate for a node (e.g., a node requiring exclusive access to a hardware device). Node implementations enforce this constraint.


IV.I:
DTPF provides a client interface to the roster. The following operations are provided:
  1. Requesting a list of a node's inputs and outputs that are not yet connected, currently connected inputs and outputs, or all outputs and inputs. These operations can be constrained to inputs and outputs of a specified format type.
  2. Establishing a new connection between an output and an input.
  3. Breaking [ToDo: 'breaking' sounds bad, investigate alternative terminology...] a previously established connection.
  4. Registering a node with the roster.
  5. Unregistering a node.
  6. Requesting a list of nodes that have been registered with the roster. This can be constrained to nodes that support specified input and output format types.
  7. Requesting a list of dormant plug-in nodes that can be loaded by the plug-in loader. This can be limited to nodes that support specified input and output format types.
  8. Loading an instance of a dormant plug-in node.
  9. Releasing a plug-in node to allow the framework to deallocate the node.
  10. Requesting an active node's top-level parameter group.
  11. Applying changes to a node's parameters.
  12. Prerolling, starting and stopping an active node.
  13. Registering a listener to receive notification when:
    1. A node is registered or unregistered.
    2. A new connection is established or an existing connection is broken.
    3. Changes are made to a node's parameters.
    4. A node transitions to the prerolling, prerolled, starting, started, stopping or stopped states.
    Listeners can be registered to receive all notifications or a subset of notifications. The notifications delivered to a listener can also be limited to particular nodes.
  14. Unregistering a listener so it no longer receives notifications.


IV.J:
A driver program allows integration testing of the framework roster interface and provides a means to set up initial 'models'.


V. Non-functional Requirements for the Core Framework

V.A:
To facilitate use and reuse for diverse models, by diverse users, the framework design and code is 'well documented' for the programmer and user.
  1. The project development documentation will be maintained throughout the project. These documents include a project plan, project requirements (this document), design documents and diagrams, and test procedures and plans. [ToDo: Specify structure/content of these documents, link/refs to common formats, recommendations.]
  2. User documentation includes instructions on installing DTPF and needed support software, enumeration of supported and tested platforms and software configurations, and instructions detailing operation of framework applications.
  3. For model and application programmers the application programming interface (API) is thoroughly and consistently documented. This documentation includes the intent and proper use of classes, methods and members, and descriptions of method pre-conditions, arguments, return values and post-conditions. General discussion of important concepts, possible difficulties/problems and shortcomings of the software are also included. Design and other diagrams are to be included as appropriate.
  4. Documentation for project developers similar in detail to the API documentation is provided. This documentation covers the internals of the framework implementation.
  5. A source-level documentation system (such as Doxygen, see also: Williams, 2004) is used to generate programming interface documentation.


V.B:
The implementation language is ISO standardized C++. [ToDo: Make sure this is correct way to refer to 'standard' C++, link/ref]


V.C:
DTPF is released as open-source software under the Academic Free License, version 2.1 (AFL).
  1. All core framework libraries and applications are compatible with the AFL license.
  2. All software which the core framework depends upon is compatible with this license. ACE, YARP and MUSCLE satisfy this requirement.
  3. Tools used are not limited to a single platform. This includes tools used for planning, documentation, development and other tasks. Commercial or proprietary products are avoided where possible.
  4. Client applications and node implementations that are not part of the core framework, and are distributed separately from the framework, are not directly constrained by this license.
  5. DTPF is a SourceForge project. SourceForge's services are used for version control, website hosting, file releases and applicable aspects of project management.


V.D:
Platform details.
  1. The first phase of DTPF is implemented on Macintosh OS X.
  2. GNU/Linux is a secondary platform for this phase of DTPF. Primary development focuses on OS X; GNU/Linux support may be added in a later phase, and builds are not validated on GNU/Linux during this phase.
  3. Platform neutral coding practices are followed (i.e., a pointer is never cast to an integer, no assumptions are made about the size of 'long', no assumptions are made regarding byte ordering, etc.).
  4. For the first phase of implementation processing is not real-time. Any computer capable of running OS X is capable of running the framework. The nodes themselves are the major resource sinks and their requirements will impose stricter limitations.


V.E:
Allowances are made for future soft real-time processing and multiple machine distribution.


V.F:
The core framework consumes minimal resources.
  1. When not involved in handling requests for services the roster, plug-in loader, roster client and event loopers consume no more than 1% of processor time on a 500 MHz Macintosh G4 system running OS X. Ideally this will be 0% usage.
  2. The roster resources use no more than 1 KB of memory per entry. Each registered node, dormant plug-in node, input, output, format, connection, buffer group description, etc. is considered a resource here. Buffer groups themselves additionally require memory for the buffers they contain.




VI. References

Academic Free License:

ACE:

BeOS MediaKit:

DataTime 0:

Doxygen:

IKAROS:

JMF:

MUSCLE:

Mutual Information Algorithm:

OpenHRP:

SenseStream:

SoDiBot:

SoundStream:

SourceForge:

YARP:


VII. Glossary

hook method
Polymorphic method defining an interface for which subclasses may optionally provide an implementation.

plug-in
A software component that can be dynamically loaded at run-time.

slot method
Polymorphic method defining an interface that must be implemented by subclasses.




© 2005 Eric J. Mislivec, Last Modified: 12 July, 2005