MINDMAKERS Forum
Topic: Digest of previous email discussions (part 3 of 7)
hannes
« on: August 21, 2006, 05:37:22 PM »

============================================================================
From: Stacy Marsella <marsella_(at)_ISI.EDU>
Subject: Re: representations paper
Date: Wed, 29 Mar 2006 15:55:37 -0800
To: Catherine Pelachaud <c.pelachaud_(at)_iut.univ-paris8.fr>

Hi Catherine,

On Mar 29, 2006, at 2:52 PM, Catherine Pelachaud wrote:
> I agree with you. Several of us will be at ZIF next week. It was 
> discussed at length at a first workshop at ZIF what a 
> communicative gesture is, and when a gesture can be considered 
> communicative. There are many more gestures that are communicative 
> than the gestures we normally use for our ECAs! So we do need to 
> state that!

Yes, I think we need to be explicit about where it is going. What 
threw me a bit was that "Human communicative behavior" suggests a 
large scope but the examples later in the sentence were just the 
verbal and coverbal behaviors. So I am quite happy with the view 
that we should be ambitious in what BML encompasses.

By the way, I eventually would like BML to be more general than even 
"communicative behavior". One of the BML limitations we currently 
face is that our agents are not necessarily dialog-centric - they 
can take actions in a virtual world. So it would be helpful if BML 
also included behaviors that do not serve a communicative intent, 
whether explicitly or implicitly.
============================================================================
Date: Thu, 30 Mar 2006 23:40:13 +0200
From: Catherine Pelachaud <c.pelachaud_(at)_iut.univ-paris8.fr>
To: Stacy Marsella <marsella_(at)_ISI.EDU>
Subject: Re: representations paper

Hi Stacy,

Have you looked at PAR (Norman Badler)? In PAR, behaviors are viewed in
the broad sense and do not correspond solely to coverbal gestures.
Best,

Catherine
============================================================================
From: Stacy Marsella <marsella_(at)_ISI.EDU>
Subject: Re: representations paper
Date: Thu, 30 Mar 2006 14:19:57 -0800
To: Catherine Pelachaud <c.pelachaud_(at)_iut.univ-paris8.fr>

Oh, but of course I have looked at PAR (for example, we looked at it 
very closely when we defined the API for the old MRE body).
One of the working groups in Iceland (the one I was in) also spent a 
lot of time talking about PAR.

If someone isn't already doing it, once this paper gets off, some 
subset of people could perhaps look into how behaviors in this 
broader sense are expressed in BML. I have a kind of chicken-and-egg 
problem that is driving my secret agenda :)
We are building this animated vhuman body called SmartBody and I want 
it to be BML compliant but BML is missing some elements we will need 
soon. I would rather BML be extended to include things like walking, 
sitting, grasping as opposed to having non-BML messages being sent to 
SmartBody. I am just not certain if other people are interested in 
BML going this way.

Stacy
============================================================================
Date: Fri, 31 Mar 2006 09:18:36 +0200
From: "Stefan Kopp, Dr.-Ing." <skopp_(at)_techfak.uni-bielefeld.de>
To: Stacy Marsella <marsella_(at)_ISI.EDU>
Subject: Re: representations paper

Hey,

I think it is necessary that BML can eventually become powerful enough to
allow specifying more than verbal/nonverbal behaviors. But that merits
some discussion again, since several languages may already have some
means of describing such things (e.g. in MURML we have some constructs
to describe locomotion by giving via points and orientations). It
actually sounds good that Stacy can go about working on this by
building characters that are not purely dialog-centred. So, let's do the
chicken first ;)

BUT, for now and for this paper, let's be modest and leave this out and
concentrate on communicative behavior, i.e. any behavior that an ECA
would show in a f2f conversation. I know there are no clear demarcation
lines, as e.g. locomotion can surely have a communicative meaning (and
maybe also intention), but specifying verbal and nonverbal behavior with
its subtleties and connotations is challenging enough already.

I agree that we must state what BML is aiming at in the longer run, and
what it is focused on here.

I also share Stacy's comments on FML, and Catherine is right, many of
the tags are actually from APML, I think. In general, I think that we
need to have some section on FML where we state what it is supposed to
be and what is needed for it.  Collecting tags from existing languages
that pertain to this level is actually a good starting point to sort
things out and to demonstrate what has been done at this level so far.

As for the Gesticon questions, I suggest we discuss them here in Bielefeld
next week. I don't think that adding an arrow in the figure will be the
solution.  The graphics-independence is an argument that is also not
easy to see, because this is exactly what BML is supposed to be!
Graphics-specific realisation of behavior should not come into play
before the BR box, i.e. AFTER the BML spec.

Finally, I find this email exchange stimulating, but one is always
missing some arguments. If there are any comments that we do not take
care of in these email discussions, please put them in the paper!

best,
Stefan
============================================================================
Date: Fri, 31 Mar 2006 11:48:37 -0800
From: Andrew n marshall <amarshal_(at)_ISI.EDU>
To: Hannes Vilhjalmsson <hannes_(at)_ISI.EDU>, Stacy Marsella <marsella_(at)_ISI.EDU>,
Subject: Re: representations paper

I'm not sure how many of you know of me, but I'm building the BML processor in Stacy's SmartBody implementation.  As such, I have a lot to say about the specification and the paper.  Hannes showed me a recent draft and your comments, and I wanted to follow up with my views.  Sorry for the long email, but I'm coming in to the middle of this and have lots of opinions.  I've added headers, so feel free to jump to the areas of interest on your first pass.

First off, I want to note the lack of a unifying message structure.  On the ISI BML wiki (http://twiki.isi.edu/Public/BMLSpecification#Message_Structure), I've detailed what we are using in SmartBody.  In the examples there, I pulled out turn and participant from the <fml> and into the higher level <act> because they have value across the entire SAIBA pipeline.  That said, I've changed my mind about turn for reasons I will detail below, but I still believe participant is more of a semantic (and discourse) entity than it is functional.

Regarding the diagram in section 3, both behavior planning and behavior realization need access to the behavior library.  At the planning stage, it is a list of available behaviors and some constraints between them.  At the realization stage, it provides more detailed constraints and how they are composed from primitives.  Also, the arrows don't reflect the bidirectional nature of the links.  The feedback link should be drawn in a way that shows it is a smaller data channel and is not composed of FML and BML messages.  At some point we will need to clarify what they are composed of.

A minor point, but in the last paragraph of section 3, second to last sentence on my copy, there is a reference to "canned animations".  The phrase bugs me and belittles what we are trying to accomplish.  I prefer the term "animation sources", or in this case "primitive animation sources".  Canned implies static to me, and I don't want to imply we can't work with procedural sources.

* FML Structure

In the FML section, there are references to "basic semantic units" and discussion of the temporal scope, both of which are lacking from the spec.  Another recent conversation also brought up the issue of attaching functions to limited parts of the conversation.  For instance, how do we attach the right affect to the contrasted parts of the phrase "beautiful good things and disgusting bad things"?  In comparison, BEAT utilized references to the syntactic structure (and other grouping nodes inserted into the syntactic tree) so I could refer to the contrasted components individually.  Of course, this forced language generation to occur first.

After talking with Hannes (Vilhjalmsson, just noticed there are two) last night about these and other issues, he and I decided we needed some references into the discourse structure as attachment points for functions.  The discourse structure should provide a partial ordering for the functions before any language generation occurs.  Secondly, semantic units are insufficient because the functions associated with the semantics change over time.  For example, in the first sentence of this paragraph I make two references to Hannes ("Hannes" and "he", and maybe an implicit third time in "we").  Hannes is introduced and thus emphasized in the first prepositional clause, then is all but glossed over in the second clause.  The trick is that some functions do need to refer to a semantic entity represented multiple times in discourse.

Obviously a specific discourse language is beyond our goals, and numerous discourse structure languages already exist in the natural language domain.  If we can assume the language can be represented in XML (and thus both structured in a tree and something that can be inserted into a discourse structure placeholder inside <act>), then we can draw upon XPath (or a similar variant) to build references to sets.  For example, a discourse reference of "//Hannes" might refer to all references to the semantic entity "Hannes", while the reference "clause[1]//Hannes" would only refer to references in the first clause.  An XPath processor will return sets of elements for each XPath query, from which the FML processor can determine when functions refer to the same discourse entities by comparing overlap of the element sets.  Still, I'm concerned about whether the assumption is sufficient for the range of discourse languages.
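
As a rough illustration of the XPath idea, here is a Python sketch using the standard library's xml.etree.ElementTree, which supports only a limited XPath subset.  The <discourse> and <clause> element names, the form attribute, and the refs helper are invented for the example; no particular discourse language is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical discourse structure: element and attribute names are
# assumptions for illustration only.
DISCOURSE = """
<discourse>
  <clause><Hannes form="name"/><topic/></clause>
  <clause><Hannes form="pronoun"/><Hannes form="implicit"/></clause>
</discourse>
"""

root = ET.fromstring(DISCOURSE)


def refs(path):
    """Return the set of elements selected by a discourse reference."""
    return set(root.findall(path))


# ElementTree paths are relative to the root element, so the email's
# "//Hannes" is written ".//Hannes" here.
all_hannes = refs(".//Hannes")
first_clause_hannes = refs("clause[1]//Hannes")

# Two functions address the same discourse entities when their sets overlap.
shared = all_hannes & first_clause_hannes
print(len(all_hannes), len(first_clause_hannes), len(shared))
```

A full XPath processor would accept the "//Hannes" form directly; the set intersection is the part that corresponds to comparing overlap of the element sets.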

I believe such a structure replaces the need for an FML-level <content> tag, and might also fill the role of <participant> tags.  Neither is really a communicative function like the other tags.

* Turn Taking Function

Since SmartBody is a real-time implementation, I've shared the same concerns Kristinn mentioned in the comments.  The turn-taking commands of request, take, keep, and give are really insufficient to describe what really happens, but they do represent the agent's communicative intent, which is what FML is really about.  So we need another layer that describes how separate SAIBA messages compose with each other over time in a way shared by all the processing modules.  Offhand, I came up with four cases I would be interested in: replace previous acts immediately, additively integrate with previous acts immediately, interrupt then continue, and start after a specific act (immediately if already complete).

Separating this from turn taking reduces the state maintenance and inference needed at the behavior planning level.  It also returns control of realized turn-taking decisions back to the agent mind.  For example, if the agent is talking and receives a percept about another agent beginning to talk, does it choose to continue talking over the other agent (no new act), or does it interrupt itself to give the turn, or does it signal to keep the turn (stop hand sign at other agent while still talking, or clear throat and continue)?  The same applies to unexpected events outside the conversation: does the agent pause or stop after the loud noise, or does it briefly glance in the direction while continuing to talk?

* BML

I have a handful of details I want clarified in the BML spec that never made it from our local wiki (if they got that far) into the paper.  Some of these may be too detailed for the level of the paper, but I want to make sure people are aware of the constraints and the reasoning.  First, it isn't obvious that a behavior's time reference attributes are optional (section 5, second paragraph).  Our conversations at ISI have always assumed co-occurrences are optional to give the behavior realization the maximum flexibility in fitting to the constraints.  Related to that, behaviors can only refer to previously declared behaviors (in terms of the order within the XML).  This enforces an acyclic time dependency graph.  This does not constrain the expression (end="latter:start" becomes start="previous:end"), but drastically simplifies the implementation.
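
The declaration-order rule is easy to check mechanically.  Below is a minimal Python sketch, not SmartBody's actual code: the list of sync-point attribute names, the "id:syncpoint" reference syntax, and the sample messages are assumptions based on the examples in this thread.

```python
import xml.etree.ElementTree as ET

# Sync-point attribute names follow the examples in this thread; the exact
# list is an assumption.
TIME_ATTRS = ("start", "ready", "stroke", "relax", "end")


def forward_references(bml_xml):
    """List (behavior id, attribute, value) triples that refer to a behavior
    not yet declared, in document order."""
    declared = set()
    problems = []
    for behavior in ET.fromstring(bml_xml):
        for attr in TIME_ATTRS:
            value = behavior.get(attr)
            # Only "id:syncpoint" values are references to other behaviors.
            if value and ":" in value:
                target = value.split(":", 1)[0]
                if target not in declared:
                    problems.append((behavior.get("id"), attr, value))
        # The id is registered only after the check, so a behavior that
        # refers to itself is also flagged in this sketch.
        if behavior.get("id"):
            declared.add(behavior.get("id"))
    return problems


# Hypothetical messages: element names and ids are illustrative.
OK = '<bml><speech id="s1"/><gesture id="g1" start="s1:end"/></bml>'
BAD = '<bml><gesture id="g1" end="s1:start"/><speech id="s1"/></bml>'

print(forward_references(OK))
print(forward_references(BAD))
```

Because every reference points backwards in document order, a single pass suffices and no cycle detection is needed, which is the implementation simplification argued for above.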

The -1.0 default value for time references does not seem necessary.  In XML APIs, if you query a nonexistent attribute, it will return an empty string, and we should use that empty value as our marker for unspecified time references.  Secondly, time references aren't even strict numbers; they are abstract shared references with possible offsets.  As such, any mapping to data structures will likely utilize NULL (or equivalent) for unspecified values, rather than a placeholder number.
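
To make the NULL-versus-placeholder point concrete, here is a small Python sketch.  The TimeRef type, the parse_time_ref helper, and the "id:sync+offset" syntax are assumptions for illustration; the spec wiki may define the offset notation differently.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeRef:
    behavior: str   # id of the referenced behavior
    sync: str       # sync point on that behavior, e.g. "end"
    offset: float   # seconds relative to the sync point


# Assumed syntax: "id:sync" with an optional signed offset, e.g. "s1:end+0.5".
_REF = re.compile(r"^(\w+):(\w+)([+-]\d+(?:\.\d+)?)?$")


def parse_time_ref(raw: Optional[str]) -> Optional[TimeRef]:
    """Map a raw attribute value to a TimeRef, or None when unspecified."""
    if not raw:  # covers both None (ElementTree) and "" (DOM-style APIs)
        return None
    match = _REF.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized time reference: {raw!r}")
    behavior, sync, offset = match.groups()
    return TimeRef(behavior, sync, float(offset) if offset else 0.0)


gesture = ET.fromstring('<gesture id="g1" start="s1:end+0.5"/>')
print(parse_time_ref(gesture.get("start")))
print(parse_time_ref(gesture.get("end")))
```

An unspecified reference stays None all the way through, so no numeric sentinel such as -1.0 can be mistaken for a real time.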

I am unconvinced of the need for a cmd= attribute, and Hannes has not been able to clarify its use to me.  The "stop" and "reset" values seem very implementation specific, and I wonder what other people think of them.  Similarly, the lookup= attribute seems suspect for every behavior tag.  I understand the relation to the Gesticon (which I detail my views on below), but several BML tags (according to the BML specification wiki) already have name tags that serve this role.  Referencing an external entity either breaks the specification (everyone has their own means of each behavior hidden by a lookup id) or complicates and constrains the implementation (everyone needs to support the Gesticon view and surrounding infrastructure).

In the behaviors list, the <wait> behavior (http://twiki.isi.edu/Public/BMLSpecification#Behavior_Element_wait_code) is missing.  I think this was a critical innovation over previous BML specifications, not only allowing pauses (which can now also be generated via the shorthand offset notation), but beginning to provide hooks to the virtual environment's event system.  For systems where behaviors are also sources of perceptual events to other agents, it also provides the basis of multi-agent coordination.

Similarly, the <speech> tag is missing lots of detail.  It is one of the best documented behaviors at the wiki (http://twiki.isi.edu/Public/BMLSpecification#Behavior_Element_speech_code).  Like the Gesticon, the verbal behavior section of the paper dictates one view of how to implement speech.  By using the speech behavior as a container for other languages, I wanted to give BML implementations a choice of structure for what fits their needs.  Some implementations, like Tactical Language, might have a database of prerecorded speech indexed by plain text.  SmartBody uses SSML, and could trivially use VoiceXML.  Everything in the Verbal Behavior section could easily fit within the <speech> behavior as one possible implementation.

Speaking of coordination, should BML describe multi-agent acts in a single message?  If so, behaviors need an actor= attribute.

* Gesticon

Regarding the Gesticon, I have had a similar idea in the back of my head as I've implemented SmartBody, except mine has a stronger tie to BML structure.  The behavior library not only provides a means to compose multiple behaviors into one, but also needs to describe the metadata about each of the available behaviors.  This metadata includes the constraints of the behavior, the body channels it uses, and the minimum and maximum durations, which Gesticon begins to describe.  It also has to detail the BML time references in order to be used by the BML processor.  Lastly, it needs to fit into the BML representation in an obvious way.

Let me demonstrate with a couple of examples.  First, a library entry as a source of metadata:
Code:
    <!-- Parent element mimics BML behavior elements
        so it is obvious when this metadata should be used. -->
    <animation name="RArm_Beat_Low">
       <!-- Time refs refer to the times in the animation source,
          before any time scaling in the BML processor.
          Here, the source animation only describes ready to relax.
          In practice, SmartBody is beginning to
          put this info into the actual animation files.
       -->
       <ready time="0" />
       <stroke time="0.1" />
       <relax time="0.35" />
       <end time="0.35" />

       <!-- Duration in seconds here, but could also be a % of source duration. -->
       <duration default="0.35" min="0.3" max="2.1"/>

       <!-- Some description of body parts involved.
           Needs to relate back to the skeleton description, not presented here. -->
       <channels/>

       <!-- Posture hints may be provided to assist transitions. -->
       <posture start="RArm_GestureSpace_Low" end="RArm_GestureSpace_Low"/>
    </animation>
Now this animation may be used to build the possible micro-plans for BML beat gestures:
Code:
   <gesture type="BEAT" space="small">
       <ready time="0.1" />
       <stroke time="0.2" />
       <relax time="0.45" />
       <end time="0.65" />

       <!-- Component parts refer to parent time refs.
             Here, the animation only covers the critical ready to relax
             and start and end are undeclared.
       -->
       <animation name="RArm_Beat_Low"
            start="ready" stroke="stroke" end="relax"/>
       <!-- Posture and channel metadata is inferred
            from the components, if not specified.  -->
   </gesture>
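
To check how the two entries above fit together, here is a Python sketch that places the source animation on the gesture's timeline.  The numbers are copied from the entries; uniform linear time scaling and the place_animation helper are assumptions, not a description of what the SmartBody BML processor does.

```python
import math

# Values copied from the two library entries above.
SOURCE = {"ready": 0.0, "stroke": 0.1, "relax": 0.35, "end": 0.35}
DURATION = {"default": 0.35, "min": 0.3, "max": 2.1}
GESTURE = {"ready": 0.1, "stroke": 0.2, "relax": 0.45, "end": 0.65}


def place_animation(start, end):
    """Linearly map source sync times onto the span [start, end] of the
    parent gesture, rejecting spans outside the allowed duration."""
    span = end - start
    if not DURATION["min"] <= span <= DURATION["max"]:
        raise ValueError(f"duration {span:.2f}s outside allowed range")
    scale = span / DURATION["default"]
    return {name: start + t * scale for name, t in SOURCE.items()}


# <animation start="ready" end="relax"/> from the gesture entry.
placed = place_animation(GESTURE["ready"], GESTURE["relax"])
print(placed)

# With uniform scaling, the source stroke lands on the gesture's stroke.
print(math.isclose(placed["stroke"], GESTURE["stroke"]))
```

In this case the span from ready to relax equals the default duration, so the animation is simply shifted by 0.1 seconds; the stroke="stroke" constraint in the gesture entry would matter only when the scaling is non-uniform.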
Using a local id= attribute, components can refer to each other just like they can in BML.  For example, <animation id="anim2" .../> could use start="anim1:end", but those ids are only valid within the context of their parent element.

I'm also skeptical of the utility of the library as a cross-implementation standard.  At some point you need a set of primitive behaviors, and I don't believe you can guarantee the same primitives between implementations without over-constraining the research platform.  Gesticon, as presented, assumes all implementations will be able to compose all gestures from very small parts.  For the limited BML implemented in Hannes's Tactical Language project, all animations were full body animations, and for SmartBody we simply haven't had the time to build a sufficient library of micro-controllers and are currently relying on a library of full body animations and postures as a basis for the few controllers we do have.

Anm
« Last Edit: August 21, 2006, 05:53:54 PM by hannes »