MINDMAKERS Forum
September 08, 2010, 01:54:45 PM

Author Topic: Time in OpenAIR  (Read 1663 times)
eric
« on: March 20, 2007, 03:37:00 PM »

This new section will focus on timing in OpenAIR.

We will run across the unavoidable related issues:

- synchronization
- deterministic system response
- latency
- speed
- scalability
- clocks and system granularity


First, we have to identify what we need for sure. And what we don't.

So here are a few use cases we'll have to deal with (feel free to expand the list).


Requirements

use case 1: media streams handling
relevance: live video input stream analysis for machine vision, live audio input stream for speech analysis, in-sync audio/video output stream for expression (feedback for human interaction)
needs:
- sync modules to external source, ex: word clock
- sync modules between themselves on a user-defined granularity, ex: lip sync
- sync computers
- processing time prediction, ex: being able to provide a rough - yet correct - image analysis on time, while retaining the ability to refine it in the background and upgrade the answer later; this boils down to introducing a deadline concept, and the deadline must account for the networking overhead
- scalability, ex: adding a computer (or CPU, thread, GPU, or whatever) improves the processing speed
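To make the deadline idea above a bit more concrete, here is a minimal sketch. It is an illustration only, not part of OpenAIR: the `Stage` type, the per-stage `predicted_cost_s` estimates and the `answer_by_deadline` helper are all hypothetical names, and Python is used just for readability. Each refinement stage runs only if its predicted cost still fits in the time left once the networking overhead has been subtracted; whatever does not fit is reported as deferred, so it can be refined in the background later.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stage:
    name: str
    predicted_cost_s: float  # assumption: estimate obtained from earlier profiling
    run: Callable[[Optional[float]], float]  # takes the previous answer, returns a better one


def answer_by_deadline(stages, budget_s, overhead_s=0.0):
    """Run stages in order while their predicted cost fits in the budget.

    Returns (best answer so far, names of stages run, names of stages deferred).
    """
    remaining = budget_s - overhead_s  # the deadline includes the network overhead
    result = None
    done = []
    deferred = []
    for i, stage in enumerate(stages):
        if stage.predicted_cost_s > remaining:
            # Not enough time left: keep the rough answer, defer the rest.
            deferred = [s.name for s in stages[i:]]
            break
        result = stage.run(result)
        remaining -= stage.predicted_cost_s
        done.append(stage.name)
    return result, done, deferred


# Hypothetical example: a 40 ms frame budget, 20 ms of which is network overhead.
stages = [
    Stage("coarse", 0.005, lambda prev: 10.0),
    Stage("medium", 0.010, lambda prev: 10.4),
    Stage("fine", 0.030, lambda prev: 10.42),
]
result, done, deferred = answer_by_deadline(stages, budget_s=0.040, overhead_s=0.020)
print(result, done, deferred)
```

In this example the coarse and medium stages fit in the budget and the fine stage is deferred. A real system would also have to handle a stage that overruns its own prediction, which this sketch does not attempt.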


use case 2: interrupt handling
relevance: interfacing with a real-time device (ex: MIDI I/O manager)
needs:
- user-defined modules process an event in strictly less than 1 ms
- preempt current process on the local CPU
- speed and low latency


use case 3: path finding
relevance: tactical/strategic high-level motion planning: making and updating high-level decisions on high-level data
needs:
- processing time prediction, ex: being able to provide a rough - yet correct - motion direction on time and start moving, while refining it in the background and upgrading the plan later


use case 4: motion control
relevance: steering a robot in the real world: making high-level decisions on low-level input, on time
needs:
- get and process sensor inputs on time: interrupt handling and preemption; ex: being able to estimate the rotation speed of a revolving door and its current position
- processing time prediction, ex: being able to effectively start braking on time to avoid a collision requires having processed the inputs before the next (and maybe last!) user-defined clock tick (in this case, a safeguard timer based on the vehicle speed)
- sync with external device, ex: a braking system
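As a worked example of a safeguard timer based on the vehicle speed, here is a small sketch. It assumes constant deceleration, so the braking distance is v^2 / (2a); the function name, the parameters and the numbers are my own assumptions for illustration. The result is the time the system can still spend processing inputs before it has to start braking.

```python
def safeguard_deadline_s(distance_m, speed_mps, max_decel_mps2, actuation_delay_s=0.0):
    """Time left before braking must start, under a constant-deceleration model.

    distance_m:         distance to the obstacle
    speed_mps:          current speed towards it
    max_decel_mps2:     deceleration the brakes can deliver (assumed constant)
    actuation_delay_s:  delay between the decision and the brakes actually biting
    """
    if max_decel_mps2 <= 0:
        raise ValueError("max_decel_mps2 must be positive")
    if speed_mps <= 0:
        return float("inf")  # not approaching the obstacle: no deadline
    braking_distance_m = speed_mps ** 2 / (2 * max_decel_mps2)
    margin_m = distance_m - braking_distance_m
    # Time to cover the margin at the current speed, minus the actuation delay.
    return max(0.0, margin_m / speed_mps - actuation_delay_s)


# 10 m to the door, 2 m/s, brakes deliver 1 m/s^2, 0.5 s actuation delay:
# braking distance = 4 / 2 = 2 m, margin = 8 m, 8 / 2 = 4 s, minus 0.5 s = 3.5 s.
slow = safeguard_deadline_s(10.0, 2.0, 1.0, actuation_delay_s=0.5)

# Same door at 5 m/s: braking distance is 12.5 m, so the deadline has already passed.
fast = safeguard_deadline_s(10.0, 5.0, 1.0)
print(slow, fast)
```

The point is that the deadline shrinks as the speed goes up, so the tick that triggers the processing has to be derived from the vehicle state rather than fixed once and for all.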


use case 5: fault tolerance
relevance: providing safety in mission-critical operations
needs:
- having an accurate position of the system in its operation space, including time, ex: being able to fall back to emergency procedures before a crash can occur
- this means being able to anticipate the system's trajectory in its space: this requires time accuracy and system response predictability.


Background: experimental results

I've used/developed distributed systems in the following fields:
- manufacturing (welding robots, cellular workshops)
- live theatre (stage control, dynamic content analysis, dynamic content creation)
- real time graphics and broadcasting (holographic media on clusters)
- IT (not relevant for the current topic since those systems did not have to operate under critical constraints nor were they connected to external hardware devices)

The following are the results of those experiments and implementations.

Hard real-time requirements cannot be met on the selected platforms (Linux, WinXP and OSX), but soft real-time requirements can.

This meant:
- going as fast as possible, as long as we can keep in sync. Keeping in sync should come before achieving speed; even if speed can help keep things in sync, it should do so in a controllable way, i.e. the system should at least know when it is about to lose sync.

- must scale well (not merely by offloading a whole task to another processing unit, but by having one task use several resources simultaneously)

- be fine-grained:
    - for the reason above (fibering)
    - for ease of scheduling and predicting, since the underlying OS won't help us with that
    - for ease of load balancing
    - for robustness

- ultra-low latencies:
    - using the right network for the job: fat-tree switched fabric (which usually provides kernel-bypassing communication handling)
    - avoiding the overhead that comes from complying with unneeded requirements (like general-purpose processing and the OS built-in scheduler): for example, a commonly used design is a dedicated scheduler/RAM manager/IPC subsystem running as the main task in the underlying OS, with user modules being something like drivers that have access to a fixed-size RAM cell pool and to double-buffered, unprotected shared memory entry points.

- easing the task:
    - design for a friendly OS where all the features can be implemented (Linux/C++; for networking, kernel customization, system programming, acceptable driver availability, cost of operation, clustering capabilities and tools)
    - allow interoperation with unfriendly (closed, poor networking) platforms with only a subset of features (Windows, Mac, Java)
    - allow interoperation with friendly RTOSes (QNX, Softkernel, ...)
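To illustrate the first point of the list above (the system should at least know when it is about to lose sync), here is a minimal sketch of a slack monitor. The `SyncMonitor` class, its thresholds and the smoothing factor are hypothetical choices of mine, not something taken from the systems described. It smooths the measured processing cost per period and reports how much slack is left before the period is overrun.

```python
class SyncMonitor:
    """Tracks processing cost against a fixed period and flags shrinking slack."""

    def __init__(self, period_s, warn_fraction=0.25, alpha=0.5):
        self.period_s = period_s
        self.warn_fraction = warn_fraction  # warn when slack < this share of the period
        self.alpha = alpha                  # weight of the newest sample in the average
        self.avg_cost_s = None

    def record(self, cost_s):
        """Feed one measured processing time; return 'in_sync', 'at_risk' or 'lost'."""
        if self.avg_cost_s is None:
            self.avg_cost_s = cost_s
        else:
            self.avg_cost_s = self.alpha * cost_s + (1 - self.alpha) * self.avg_cost_s
        slack_s = self.period_s - self.avg_cost_s
        if slack_s < 0:
            return "lost"
        if slack_s < self.warn_fraction * self.period_s:
            return "at_risk"
        return "in_sync"


# Hypothetical 40 ms period (25 frames per second) with a rising processing cost.
monitor = SyncMonitor(period_s=0.040)
statuses = [monitor.record(cost) for cost in (0.020, 0.036, 0.040, 0.060)]
print(statuses)
```

The "at_risk" state is the useful one: it gives the application a chance to shed load or fall back to a rougher answer before sync is actually lost.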


On that basis, here are some suggestions and caveats.

Suggestions

Strong requirements

It is almost impossible, and not desirable either, to keep everything in sync with one external clock.
But we need the ability to plug in many of them (ex: crossbar video-sync + audio word-clock + ...) and publish them in the system so that they can be resolved (by user modules, not by the system) into meaningful timelines.
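One possible reading of "publish the clocks and let user modules resolve them" is sketched below. Everything in it is an assumption made for illustration: the `PublishedClock` and `ClockRegistry` names, the idea that each clock is described by a rate and an origin on a common reference timebase, and the 25 Hz / 48 kHz figures. The registry only stores the clocks; the conversion between timelines is done by the caller, i.e. by the user module.

```python
from fractions import Fraction


class PublishedClock:
    """An external clock described by its tick rate and its origin on a shared timebase."""

    def __init__(self, name, rate_hz, origin_s=0):
        self.name = name
        self.rate_hz = Fraction(rate_hz)
        self.origin_s = Fraction(origin_s)

    def to_seconds(self, tick):
        return self.origin_s + Fraction(tick) / self.rate_hz

    def from_seconds(self, t_s):
        return (Fraction(t_s) - self.origin_s) * self.rate_hz


class ClockRegistry:
    """The system side: it publishes clocks but does not resolve them."""

    def __init__(self):
        self._clocks = {}

    def publish(self, clock):
        self._clocks[clock.name] = clock

    def get(self, name):
        return self._clocks[name]


def convert(registry, tick, src, dst):
    """The user-module side: map a tick of clock `src` onto the timeline of clock `dst`."""
    t_s = registry.get(src).to_seconds(tick)
    return registry.get(dst).from_seconds(t_s)


registry = ClockRegistry()
registry.publish(PublishedClock("video_sync", 25))      # hypothetical 25 frames/s
registry.publish(PublishedClock("word_clock", 48000))   # hypothetical 48 kHz audio
sample_index = convert(registry, 10, "video_sync", "word_clock")
print(sample_index)
```

Exact fractions are used so that the conversion does not accumulate rounding error; real clocks drift, so a real implementation would have to re-estimate rate and origin continuously, which this sketch ignores.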

User-level processes must be given the ability to perform their tasks at different time granularities (think of a hydraulic clutch) while still retaining accuracy.
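A very small sketch of coexisting granularities follows, assuming a single base tick from which coarser ticks are derived by integer division. The `TickDivider` name and the divisor values are made up for the example. Modules running at a coarse granularity stay exactly aligned with the fine one, because both are counted on the same base tick.

```python
class TickDivider:
    """Derives coarser granularities from one base tick, keeping them aligned."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, divisor, callback):
        """Call `callback(n)` on every base tick n that is a multiple of `divisor`."""
        if divisor < 1:
            raise ValueError("divisor must be >= 1")
        self._subscribers.append((divisor, callback))

    def tick(self, n):
        for divisor, callback in self._subscribers:
            if n % divisor == 0:
                callback(n)


# Hypothetical setup: a 1 ms base tick, a control module on every tick,
# and a monitoring module on every 40th tick.
fast_calls, slow_calls = [], []
divider = TickDivider()
divider.subscribe(1, fast_calls.append)
divider.subscribe(40, slow_calls.append)
for n in range(100):
    divider.tick(n)
print(len(fast_calls), slow_calls)
```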

Low latency is a core issue to be solved for real-world interaction:
- critical for motion and robot self-conservation
- critical for interaction with humans and therefore for the system acceptance

As is processing time predictability.

Speed is not a strong requirement if scalability is provided; and, as far as I can tell, a well-designed system always scales well.

It should be possible to debug/monitor the system without adding any time overhead or latency at the debugged level: another vote for different - yet in sync - coexisting time granularities.


Nice-to-have features

Although they do not directly impact the timing per se, having the following will certainly ease the task.

Speed:
- broadens the spectrum of real-world utilization. Ex: if a robot is accurate but slow, it won't crash into an automatic door, but it won't be able to enter the building either. If it is accurate and fast, it will.
- allows drifting slightly away from hardcore issues in the early stages of development for non-critical parts of the system
- allows keeping away from hard-to-manage overheads (ex: TCP/IP congestion)

Scalability: allows the speed issue to be ranked second after time accuracy and predictability



Caveat

I learned to avoid:

- trying to please everybody and starting from the boundaries towards the core; it is easier to go the opposite way, especially when it comes to the design of a time-handling policy, which is actually the core of any system, instead of having to cope from the very beginning with platforms that are notoriously not time/latency-conscious (moreover, if the system is outstanding, people will accept it and comply with its requirements. Then it can be time to add interoperation flexibility, without endangering the concept).

- coarse grain: it has always been a plague threatening the systems I had to use/build, and it always had a significant impact on real-time operation. Sooner or later, it brings up issues like the following (and worse, demands that they be solved):
   - strong type hierarchies and then type repositories, dictionaries and fixed points (which are hard to maintain in fault tolerant critical systems) and then dynamic type updating issues etc...
   - difficulty to break up in small pieces for - real - distribution over computing resources (as opposed to the simple relocation of a whole piece of code to another network node)
   - overcomplicated scheduling and then coping with system overhead and then optimization in both speed and latency (and footprint too, when operating on scarce resources)
   - increasing cost of redundancy and fault tolerance (when we have to switch to an emergency procedure, on time)
   - the need for optimizations, which of course don't perform well when it comes to managing high-speed, high-frequency, lightweight and volatile data; typical examples: clock ticks, interrupts, RDMA flows, MPI transfers, etc. All of those get buried and then require system-defined, built-in, opaque circuitry to be handled correctly: some of the system semantics become inaccessible, requiring the system to provide a higher level of service (smart multi-tasking preemptive scheduling / message routing), whereas user-defined modules could have been programmed in a well-behaved, collaborative way, alleviating the requirements on the system (and thereby gaining simplicity, speed and response accuracy)
   - silent, easy slope towards client/server organisation or partitioned systems
   - difficulty in computing and injecting new ontologies at runtime, which is a necessity when it comes to building "intelligent" systems, which, to say the least, are expected to demonstrate some creativity, and therefore to produce knowledge or know-how incrementally, potentially in incomplete states
   - growth of modules (opaque code) and complexification of (explicit) data, which yields:
     - difficulty to dynamically evolve the code of a black box
     - difficulty for a module to learn from or about a black box that does too many smart things in too many circumstances
     - difficulty in replacing a black box everybody relies on (coupling) because the others react on types instead of patterns, and then having to take care of legacy messaging and, eventually, to perform data schema migration (cf. databases) while retaining the already produced knowledge, and therefore having to regenerate the system state.

- too-many-layers architecture: typical examples: the TCP/IP stack, centralized services. Although these used to be natural requirements for general-purpose OSes, they are not so compelling for us. Cf. for example kernel-less experimental OSes (GO, Pebble, MMLite)

- multiplicity of (overlapping) concepts: ex: clock signal vs message when it comes to triggering modules. For example, it would be easy to provide two distinct triggering mechanisms (clock tick and incoming message): this would not solve the problem of having the messages delivered within a predictable time, but would already add unnecessary complexity to solve part of a short term goal and be pushing towards a more intricate situation.

- focusing too strongly on backwards compatibility: while the system is still young and not so widely used, there is still time to correct things that need correcting, and to regain freedom of movement, elegance and efficiency for future development at the expense of a not-so-expensive recoding.

- mixing the operation domains: for example, human readability is a very nice feature to have, but implementing messages as readable text when, most of the time, they have to be read/invented/produced by machines over potentially narrow and slow networks is a choice which deserves a strongly motivated answer.
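To give a feel for the last point, here is a small size comparison between a readable text encoding and a packed binary one for the same tiny message. The message fields, the JSON stand-in for "readable text" and the 14-byte binary layout are all assumptions chosen for the example; none of them is the actual OpenAIR wire format.

```python
import json
import struct

# Hypothetical binary layout, little-endian, no padding:
# 2-byte message type id, 8-byte timestamp (double), 4-byte sequence number.
RECORD = struct.Struct("<HdI")


def encode_text(msg_type, timestamp_s, seq):
    """Readable encoding: easy to inspect, but verbose and needs parsing."""
    message = {"type": msg_type, "timestamp": timestamp_s, "seq": seq}
    return json.dumps(message).encode("utf-8")


def encode_binary(type_id, timestamp_s, seq):
    """Packed encoding: fixed size, but needs an agreed type-id table."""
    return RECORD.pack(type_id, timestamp_s, seq)


text_bytes = encode_text("sensor.tick", 1174405020.125, 42)
binary_bytes = encode_binary(7, 1174405020.125, 42)
print(len(text_bytes), len(binary_bytes))
```

The binary record is a fixed 14 bytes and the text form is several times larger, before any parsing cost is counted. The price is that the binary form needs a shared table mapping type ids to meanings, which is exactly the kind of trade-off that deserves an explicit decision.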


So, that's all for today. Please post your replies keeping in mind the definition and the must-haves of the system's time handling; then we'll be able to move on to the design, and to the expansion/modification of the OpenAIR spec.

cheers
« Last Edit: March 20, 2007, 04:10:15 PM by eric »
eric
« Reply #1 on: March 27, 2007, 12:37:08 PM »

Hi everybody,

I posted in the OpenAIR section a proposal for a specification extension addressing time-awareness, scheduling and application responsiveness:

"OpenAIR Time Awareness / Dynamic Scheduling - Specification Extension Proposal"


Let's kick off the discussion!


Cheers


eric