MINDMAKERS Forum
Topic: Digest of previous email discussions (part 6 of 7)
Posted by hannes on August 21, 2006, 05:51:41 PM

============================================================================
From: Hannes Pirker <hannes_(at)_ofai.at>
To: Stefan Kopp <skopp_(at)_techfak.uni-bielefeld.de>
Date: Wed, 12 Apr 2006 19:46:14 +0200 (CEST)

Dear all...

I cannot resist shedding some more light on the burning question of
"what (the hell) is and is not the gesticon", because explaining its
"Geistesgeschichte" (history of ideas) should allow you to better
understand the intentions behind it, and thus make the relation to
FML/BML and SAIBA clearer, and hopefully help us to streamline the paper...

I think it is necessary to reflect that throughout the ongoing
design process for the gesticon we (unintentionally!?) changed the 'bearing'
several times, and maybe even got lost a bit.

So where do/did we start at?

I think the motivations for the current gesticon can be summarized
as different goals (I am simplifying and talking of "gestures"
where multimodal behaviour is meant), and the weighting of these goals
changed along the time-line:

A) Associate MEANING with FORM:

E.g. provide a mapping from communicative goals, semantic content
etc. (i.e. what we now call FML) to a CERTAIN GESTURE / FORM.
I.e. the gesticon is the ONE repository to allow for the selection of
gestures to be used for a certain communicative situation. I'd call
this: specify the SEMANTICS of a gesture.

B) Provide information about the actual FORM of a gesture:

We called this the PHONETICS of a gesture. Here our view has shifted
considerably on what the real PURPOSE of this description is, and
thus, WHAT it should CONTAIN.

B.I) provide (minimal) information about temporal (min/max/default
durations) & spatial properties (which channel do they occupy, which
start- and end- positions do they have, in order to calculate
necessary transition times to other gestures). I.e. (minimal)
information that is needed by the planning components in order to
specify the overall timing ("Does X, of which I do NOT care what it
really is looking like, PHYSICALLY FIT into the slot between Y and
Z?"). (We might call this the PHONOLOGY of a gesture?, if that makes sense to you)

Yesss, right: that's just the sort of gesticon-design Andrew was talking
about! We more or less came to identical solutions - a good sign, I'd say!
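
The planning-level information described under B.I can be sketched in
Python as follows. This is only an illustration under stated assumptions:
`GesticonEntry`, `fits_slot`, `planned_duration`, the field names and all
numbers are hypothetical and not part of any agreed gesticon or BML format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GesticonEntry:
    """Planning-level (B.I) data for one behaviour; all field names are assumptions."""
    name: str
    channels: frozenset      # body channels the behaviour occupies
    min_duration: float      # seconds
    default_duration: float  # assumed to lie between min and max
    max_duration: float


def available_time(slot_start, slot_end, transition_in, transition_out):
    """Time left in the slot once the transitions to the neighbours are paid for."""
    return (slot_end - slot_start) - transition_in - transition_out


def fits_slot(entry, slot_start, slot_end, transition_in=0.0, transition_out=0.0):
    """Does the behaviour PHYSICALLY FIT between its neighbours?

    The planner does not care what the behaviour looks like; it only checks
    that the shortest playable version fits into the remaining time.
    """
    return available_time(slot_start, slot_end, transition_in, transition_out) >= entry.min_duration


def planned_duration(entry, slot_start, slot_end, transition_in=0.0, transition_out=0.0):
    """Use the default duration if it fits, shrink to the slot otherwise, None if impossible."""
    if not fits_slot(entry, slot_start, slot_end, transition_in, transition_out):
        return None
    free = available_time(slot_start, slot_end, transition_in, transition_out)
    return min(entry.default_duration, free)


# Invented example values.
greet = GesticonEntry("greet", frozenset({"right_arm"}), 0.8, 1.2, 2.0)
```

With these invented numbers, a 1.5 second slot with 0.25 seconds of
transition on each side leaves 1.0 second, so "greet" fits but has to be
played faster than its default.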

and there is

B.II) Provide a fine-grained description of what the gesture X is
really looking like. This is the part of the specification we have
been working on intensively when meeting in Paris in September, and
the text in the paper is focussing on that. I.e. as Stefan said: it's
working towards some sort of MURML -- let's call it MURML++ as it also
tries to cover the WHOLE body (facial expressions & postures). The
purpose of this is to come-up with a general
"gesture-specification"-language which allows to re-use gestures in
different application contexts and players, have a universal language
to communicate about the FORM of a gesture... Brig, Catherine, I
hopefully paraphrase the motivation for this approach correctly.

(Andrew, you see, being able to compose gestures of other snippets
would be a nice feature of this gesticon, but is neither the only nor
the main purpose)

Before I am going to stab a knife in my own back by advocating for
dropping B.II) for now...

here
_ SOME (HI)STORY _

For us (Brigitte & me) the whole gesticon-business started with
problem-cases we encountered in the NECA project, where we produced
TWO Web-based ECA-applications, using TWO different (very simple)
Graphic engines/clients. One was a Flash-based animation and one used the
Charamel player. The latter contained a fixed repertoire of
pre-fabricated whole-body movements of cartoon figures, and temporal
control was possible by changing the replay-velocity only.

To make things worse, "similar" animation-sequences ("gestures") for
the two cartoon figures contained in the player, differed in form,
naming and default-duration. I.e. "greet" for the MALE-person could
look different from "greet" for the FEMALE one. OR the "same" movement
could have completely different names for male & female etc.

I.e. a communicative goal ("greeting") had to be matched to "gest_1"
when actor A was involved and to "gest_2" when actor B was speaking.

Us being linguists ;-) we of course longed for a "general"/global
solution for the mapping-problem, also in the light of building
different application-scenarios. The "straightforward" thing was to
think of a "gesticon" in order to encode the mapping from MEANING
(e.g. FML) to FORM.

The FORM slot was more or less composed of the info of type (B.I),
i.e. min/max/default durations and a *very* rough description of the
wrist-position in space at the start and end of the 'gesture'.

What happened next was that we soon ran into problems when thinking
of ways to specify the MEANING slot. How to specify it? You see from
designing FML that you end up using a conglomerate of theories,
which are still only able to describe a portion of the whole problem,
and there always is the danger of missing things and ending up with
arbitrary solutions. E.g. for iconic gestures you probably need some
sort of lexical semantics in order to tie the gesture to lexical
concepts. I.e. you need to come up with a repertoire of basic
"semantic" properties for gestures, say "box-like", "circular",
"huge-size", "progression", "altitude" ... Another issue was
"multi-culturality" (strongly demanded by the EU). So think of the
famous "nod", which usually is used for "yes" or "acknowledge"... but
NOT in Turkey. Does that mean we need an extra attribute
turkish="FALSE"??? So where to draw the line??? ... You get the idea...

So here psychology kicked in, and in the context of the problem of
MEANING-encoding ("too difficult for now") it suddenly looked much
more promising to concentrate on the FORM aspect of the gesticon
first, and expand and refine the descriptions of type (B.I) and work
towards (B.II)

And this way purpose (A) somehow started to drift out of focus and
encoding B.I soon seemed to be too trivial, so we somehow got stuck in
B.II) and work on other aspects was postponed until a language for
B.II) was finished.

=================

NOW I'd suggest to re-evaluate this progress in development, as there seems
to be a discussion on how features from (gesticon.BII) are to be
"migrating" into BML -- if I correctly understand the current state of
affairs?

In my opinion the value of gesticon.BII is CRUCIALLY connected to the
availability of players/interpreters/renderers to make use of it. In the
absence of such technologies, it will be very hard to convince people
to actually use it, i.e. to write conversions to their players... if
possible at all.

I thus would opt for postponing this MURMLisation of the Gesticon and
BML. The Gesticon should for now NOT be the locus for the MAXIMALLY fine grained
physical description of behaviours (i.e. some sort of MURML++ or why
not H-ANIM+functional labels) but rather we should aim for finding the MINIMAL
INTERFACING CONDITIONS necessary to communicate between PLANNER and PLAYER.

But I think, after a loooong story, that we are quite close to this
anyway? I.e. I like the idea of providing mainly timing-information
via the "universal" start/ready/stroke/... points! Applications/players
that are going to use BML are thus requested to "implement this
interface" virtually in the same sense Java-classes "implement"
abstract interfaces.
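
The "implement this interface" idea can be sketched in Python with an
abstract base class. Everything named here is an assumption made for
illustration: `TimedBehaviour`, `CannedAnimation` and `sync_time` are
hypothetical, and only start, ready and stroke are named in the text above,
so the "end" point is an addition of this sketch.

```python
from abc import ABC, abstractmethod


class TimedBehaviour(ABC):
    """Hypothetical interface that a player-side behaviour would implement."""

    # Only start, ready and stroke come from the text; "end" is an assumption.
    SYNC_POINTS = ("start", "ready", "stroke", "end")

    @abstractmethod
    def sync_time(self, point):
        """Return the time (seconds after start) at which `point` is reached."""


class CannedAnimation(TimedBehaviour):
    """A pre-fabricated clip whose only temporal control is its replay speed."""

    def __init__(self, clip_times, speed=1.0):
        self._clip_times = dict(clip_times)  # sync point -> time at normal speed
        self._speed = speed

    def sync_time(self, point):
        if point not in self.SYNC_POINTS:
            raise ValueError(f"unknown sync point: {point}")
        # Playing the clip faster moves every sync point proportionally earlier.
        return self._clip_times[point] / self._speed


# Invented timings for a clip played at double speed.
thumbs_up = CannedAnimation(
    {"start": 0.0, "ready": 0.4, "stroke": 0.6, "end": 1.2}, speed=2.0
)
```

A planner written against `TimedBehaviour` can ask any player for its
timing points without knowing how the player renders the movement.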

The FORM part of the Gesticon in this view is then NOT containing
information to be subsequently rendered by a player (e.g. exact wrist
positions, angles etc.) but rather is there to provide minimal
information for the PLANNING component. I.e. we use (FORM B.I) which
typically is more or less EXTRACTED from the player-specific encoding.

E.g. say we have a function/meaning "acknowledge".
Then the gesticon specifies 3 mappings for this:

A: which is a nodding (but don't use it in Turkey)
B: is a smile plus eye-brow lifting (if you feel sympathetic)
C: a thumb-up 'o.k' sign (but don't use it in Iran!)

The gesticon does NOT know what C is "looking like", but it gives
possible ranges for the location of the start/ready/ etc. parts, and
tells that the channel "right arm+hand" is affected and thus blocked.
Instead of working on models of how to describe the orientation of the
thumb we should think about whether and how to further specify
alignment conditions for the stroke-phase. (E.g. it might be fine to
have the stroke on the NUCLEAR accent of the phrase: i.e. on : "That's
a GREAT idea." "This idea is really GREAT" "This is JUST what I also
wanted to say")
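
The "acknowledge" example can be sketched in Python as a meaning-to-form
lookup that knows only usage constraints and occupied channels, not what a
form looks like. The names `FormOption`, `GESTICON`, `select_forms` and the
channel labels are hypothetical, and the "if you feel sympathetic" condition
on option B is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormOption:
    """One candidate form for a meaning; field names are assumptions."""
    name: str
    channels: frozenset                      # channels blocked while it plays
    excluded_cultures: frozenset = frozenset()


GESTICON = {
    "acknowledge": [
        FormOption("nod", frozenset({"head"}), frozenset({"Turkey"})),
        FormOption("smile_eyebrow_lift", frozenset({"face"})),
        FormOption("thumb_up", frozenset({"right_arm_hand"}), frozenset({"Iran"})),
    ],
}


def select_forms(meaning, culture, busy_channels=frozenset()):
    """List the forms usable for `meaning` in this culture on the free channels."""
    return [
        option.name
        for option in GESTICON.get(meaning, [])
        if culture not in option.excluded_cultures
        and not (option.channels & busy_channels)
    ]
```

For instance, `select_forms("acknowledge", "Turkey")` leaves out the nod,
and passing `busy_channels=frozenset({"right_arm_hand"})` leaves out the
thumb-up sign because that channel is already blocked.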

ok ok ok

I think I rattled off MORE than enough for now.

What does this mean for the paper then?

Simpler things first:

I fully agree with Andrew's view on the speech-element. We were so
proud of our fancy XML-encoding of the phonological hierarchy... but
in the light of using BML as a MINIMAL and GENERAL interface I also
opt for getting rid of these and use the speech-element as described
on the Wiki. The good news is, that via the <SSML:mark> and especially
the <tm> element all the info on syllables/phonemes etc. still CAN be
encoded if available (which it is only in the context of certain
speech-synthesizers). (Note: Information we are losing this way -- I
think -- is that we no longer prescribe a certain inventory of
"phonological units". That "w1" is a "word" and "s1232" is an
*accented* syllable is not part of the encoding anymore! "w1" and
"s1232" at this stage are abstract/ meaningless time-markers.)
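
The effect of treating "w1" and "s1232" as abstract time-markers can be
sketched in Python: the planner only looks up a time for a marker name and
never asks what kind of unit it is. The function name
`gesture_start_for_stroke`, the dictionary format and the timings are
assumptions made for this sketch, not part of BML or SSML.

```python
def gesture_start_for_stroke(marker, marker_times, stroke_offset):
    """Return when a gesture must start so that its stroke lands on a marker.

    marker        -- an opaque marker name such as "s1232"; nothing in the
                     encoding says whether it is a word or an accented syllable
    marker_times  -- marker name -> time in seconds, as reported by whatever
                     speech synthesizer is in use (hypothetical format)
    stroke_offset -- seconds from the gesture's start to its stroke
    """
    if marker not in marker_times:
        raise KeyError(f"speech marker not available: {marker}")
    return marker_times[marker] - stroke_offset


# Invented timings; the data carries no hint of which marker is accented.
marker_times = {"w1": 0.25, "s1232": 1.0}
start = gesture_start_for_stroke("s1232", marker_times, stroke_offset=0.5)
```

Whoever chooses "s1232" as the alignment target has to know from elsewhere
that it is the accented syllable, which is exactly the information the note
above says is no longer in the encoding.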

I.e. I support replacing the "verbal behaviour" part with the info on
BML-speech-element from the Wiki. Andrew?

What else?

In my (too radical???) view this would also mean for the paper to skip
most of the text currently in "4.2 Behaviour form expression" -- and
replace or merge it with the info from the BML-wiki?

And re-reading Stefan's email, I find out, that this is more or less
what he already suggested, right?

Yours
"too late to apologize for a much too long email"

Hannes(P)
============================================================================
Date: Thu, 13 Apr 2006 00:41:13 +0200
From: Catherine Pelachaud <c.pelachaud_(at)_iut.univ-paris8.fr>
To: Hannes Pirker <hannes_(at)_ofai.at>
Subject: Re: representations paper

Dear all,

I would simply like to add to HannesP's description that Brigitte and I
did say at Reykjavik that we would be working on the gesticon shortly after the
meeting in Iceland as part of the Humaine project. Unfortunately we did
not report on our work. We did use wiki pages within the Humaine web page
but did not do a great job with them (personally I still struggle when
dealing with wiki pages).
Thanks to Andrew and Hannes for clarifying controversies about BML and
Gesticon!

In the paper (attached to this mail) I have added a small paragraph on
APML and had a comment on section 3.
...
All the best,

Catherine
============================================================================
Date: Wed, 12 Apr 2006 16:08:29 -0700
From: Andrew n marshall <amarshal_(at)_ISI.EDU>
To: Hannes Pirker <hannes_(at)_ofai.at>
Subject: Re: representations paper

I wanted to respond to HannesP, even if Gesticon material doesn't get
into the paper.

Hannes Pirker wrote:
> B.II) Provide a fine-grained description of what the gesture X is
> really looking like. This is the part of the specification we have
> been working on intensively when meeting in Paris in September, and
> the text in the paper is focussing on that. I.e. as Stefan said: it's
> working towards some sort of MURML -- let's call it MURML++ as it also
> tries to cover the WHOLE body (facial expressions & postures). The
> purpose of this is to come-up with a general
> "gesture-specification"-language which allows to re-use gestures in
> different application contexts and players, have a universal language
> to communicate about the FORM of a gesture... Brig, Catherine, I
> hopefully paraphrase the motivation for this approach correctly.
>
> (Andrew, you see, being able to compose gestures of other snippets
> would be a nice feature of this gesticon, but is neither the only nor
> the main purpose)
>
> Before I am going to stab a knife in my own back by advocating for
> dropping B.II) for now...   

I've thrown together an overview diagram that shows how I see these
roles mapping to the proposed architecture.  Very quickly (okay, not so
quickly), part A, the meaning to form mapping, is what Stacy refers to
when he means Gestuary and is the component used solely by the behavior
planner.

BML is the behavior realizer (a.k.a., player) independent behavior
description language.  This level of specification will not be a perfect
specification and will lead to many of the same problems HannesP refers
to.  It purposefully leaves several behaviors underspecified to prevent
over-constraining behavior realizer implementation.  Some of the
specification holes are designed to attach to other description
languages, like the speech behavior's relation to SABLE and SSML.  Other
specification holes are simple identifiers, such as animation names and
event names.

During the behavior realization, there needs to be a mapping from BML to
the primitives of the animation component.  Ideally, these should be the
components previously detailed in the paper's Gesticon section (also,
part B.2 above).  However, such an implementation represents an enormous
undertaking.  As such, I'd rather look to that specification as the holy
grail of behavior primitives.  Implementors should look at this
specification as a target list of behavior primitives, or at least a
list of ideas and design guidelines when they lack the resources to
implement them directly.  Given a single implementation, there should
exist a vague and indirect overlap between its behavior templates (the
forms of Gesticon A.) and the table of BML patterns used by the player
to look up primitives.
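
The realizer-side lookup described above can be sketched in Python as a
table from BML-level patterns to player primitives. This is a guess at the
shape of such a table, not the proposed template system itself: the names
`PRIMITIVE_TABLE` and `lookup_primitives`, the (kind, name) key and the
primitive names are all hypothetical.

```python
# Hypothetical table owned by one player: BML-level pattern -> its primitives.
PRIMITIVE_TABLE = {
    ("head", "nod"): ["play_clip:nod_01"],
    ("gesture", "thumb_up"): ["raise_right_arm", "extend_thumb"],
}


def lookup_primitives(kind, name, table=PRIMITIVE_TABLE):
    """Return the player primitives for a BML-level behaviour, or None.

    None means this player has no realization for the pattern; what to do
    then (skip, substitute, report back to the planner) is left open here.
    """
    primitives = table.get((kind, name))
    # Return a copy so callers cannot modify the player's table by accident.
    return list(primitives) if primitives is not None else None
```

Because the table belongs to the player, two players can map the same BML
pattern to different primitives without the planner being affected.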

Also, I expect the precise specification of behaviors of BML to
gradually include more and more Gesticon B.2 primitives as they are
proven useful and easy to implement.  This is a result of better
knowledge after implementation and experimentation, as well as the
result of better behavior/animation tools available over time.


The implications for the paper are:
 * The prior work of the Gesticon does have a mapping to the SAIBA
framework.  In particular, the message contents are analogous, but the
message structures need changing to fit the BML time reference notations.
 * The behavior realization does have its own library.  This library is
not the same as the behavior planner's.  Unfortunately, it is
underspecified for the paper at this point, but based upon the prior
Gesticon work and my proposed template system, probably more specified
than FML or the meaning-to-form  library.
 * There is such a thing as a behavior primitive, dependent on the world
rendering environment, sound and animation assets, and the available
controls for both.
 * Some of our future work involves specifying and developing a library
of these behavior primitives capable of realizing fine grained control
of the characters.

Anm
Last edit: August 21, 2006, 05:53:20 PM by hannes