hannes
« on: August 21, 2006, 05:49:04 PM »
============================================================================
Date: Tue, 04 Apr 2006 13:44:05 -0700
From: Andrew n marshall <amarshal_(at)_ISI.EDU>
To: Hannes Vilhjalmsson <hannes_(at)_ISI.EDU>
Subject: Re: representations paper
Hannes Vilhjalmsson wrote:
> Prosody elements can also use the same behavior notation (should we
> define it as a special BML element?):
>
>   <prosody id="p1" type="H*" stroke="s1:tm2"/>
>   <gesture id="g1" type="beat" stroke="p1:stroke"/>
>
> This also added a beat gesture with a stroke that coincided with the
> pitch accent.
Since prosody is a description of how to perform a speech, the speech can remain the time reference behavior:

  <speech id="s1" type="application/ssml+xml">
    Allows <mark name="wb1"/>
    <prosody pitch="high">
      word <mark name="wb2"/>
      break <mark name="wb3"/>
    </prosody>
    references.
  </speech>
  <gesture id="g1" type="beat" stroke="s1:wb2"/>
If we separate prosody (and other speech modifiers) from speech, then how would we implement this case:

  <speech id="s1" start="0.0" type="audio/x-wav" ref="utterance1.wav">
    <tm id="tm1" time="0.1" />
    <tm id="tm2" time="1.1" />
    ...
  </speech>
  <prosody start="s1:tm1" end="s1:tm2" ... />
Here we have described speech as a wave file. Is the prosody behavior a description of the audio already rendered, or does the author intend the BML processor to modify the audio to simulate the prosody effects (if that is even technically feasible)? By assuming prosody is strictly a subcomponent of the speech behavior, we only encounter it when we use a higher-level speech description language like SSML.
>> d) synchronization between communication partners. For instance, what
>> would be the realization of the following: communication partner A and
>> B;
>> B seeks eye contact with A; when eye contact is made, B points to some
>> cake and produces the utterance "give me the cake" with an eye brow
>> raise on cake. A's eye gaze follows the deictic movement of B. A
>> frowns while following B's gesture.
>
> <bml actor="A">
>   <gaze id="gzA" target="B"/>
>   <speech id="s1" type="application/ssml+xml" start="B:gzB1:stroke">
>     give <mark name="wb1"/>
>     me <mark name="wb2"/>
>     the <mark name="wb3"/>
>     cake <mark name="wb4"/>
>   </speech>
>   <gesture id="g1" type="point" target="cake1"
>            ready="s1:start" stroke="s1:wb2"/>
>   <face type="eyebrows" amount="0.9" stroke="s1:wb3" relax="s1:wb4"/>
> </bml>
> <bml actor="B">
>   <gaze id="gzB1" target="A" start="A:gzA:stroke+0.5"/>
>   <gaze id="gzB2" target="cake1" start="A:g1:ready+0.1"/>
>   <face type="mouth" shape="frown" ready="gzB2:stroke"/>
> </bml>

I'm much more hesitant to declare we have the multi-agent problem solved. And I'm also not sure HannesV's solution is the right approach. At ISI, we have discussed, but not yet implemented, a wait behavior that responds to events. Combined with an event-emitting behavior, we could provide a more general event interaction:
Character A:

  <bml>
    <gaze id="gzA" target="B"/>
    <event msg="AgentA looking at B" start="gzA:stroke" />
    <wait id="wA1" event="AgentB looking at A"/>
    <speech id="s1" type="application/ssml+xml" start="wA1">
      give <mark name="wb1"/>
      me <mark name="wb2"/>
      the <mark name="wb3"/>
      cake <mark name="wb4"/>
    </speech>
    <gesture id="g1" type="point" target="cake1"
             ready="s1:start" stroke="s1:wb2"/>
    <event msg="AgentA pointing at cake1" start="g1:ready" />
    <face type="eyebrows" amount="0.9" stroke="s1:wb3" relax="s1:wb4"/>
  </bml>
Character B:

  <bml>
    <wait id="wB1" event="AgentA looking at B" />
    <gaze id="gzB1" target="A" start="wB1+0.5"/>
    <event msg="AgentB looking at A" start="gzB1:stroke"/>
    <wait id="wB2" event="AgentA pointing at cake1" />
    <gaze id="gzB2" target="cake1" start="wB2+0.1"/>
    <face type="mouth" shape="frown" ready="gzB2:stroke"/>
  </bml>
This is the same interaction as HannesV provided, but now it is event based. The event language is undefined (this example assumes simple string matching), so it should hook into any simulation message passing pretty easily. The system is not limited to agent-to-agent interaction and so it can also handle reactions to world events, human interaction events (by broadcasting the results of the input device), or even simple monologue pauses (the wait can include a duration without an event).
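As a rough illustration of the simple string matching assumed here, the exchange can be sketched in Python. This is not the ISI implementation (the wait behavior was not yet implemented at the time of writing); the `EventBus` class and `run_cake_exchange` function are hypothetical names, and the time offsets (+0.5, +0.1) and sync points from the BML examples are left out.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Hypothetical message bus: events are plain strings, matched exactly."""

    def __init__(self) -> None:
        self._waiters: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)

    def wait(self, event: str, on_event: Callable[[], None]) -> None:
        # Register a one-shot callback, playing the role of <wait event="..."/>.
        self._waiters[event].append(on_event)

    def emit(self, event: str) -> None:
        # Plays the role of <event msg="..."/>: fire and clear matching waiters.
        for callback in self._waiters.pop(event, []):
            callback()


def run_cake_exchange() -> List[str]:
    bus = EventBus()
    log: List[str] = []

    # Character B: wait for A's gaze, then look back and announce it.
    def b_sees_a_looking() -> None:
        log.append("B: gaze at A")
        bus.emit("AgentB looking at A")

    bus.wait("AgentA looking at B", b_sees_a_looking)
    bus.wait("AgentA pointing at cake1",
             lambda: log.append("B: gaze at cake1, frown"))

    # Character A: speak and point once B returns the gaze.
    def a_sees_b_looking() -> None:
        log.append("A: speech 'give me the cake' + point")
        bus.emit("AgentA pointing at cake1")

    bus.wait("AgentB looking at A", a_sees_b_looking)

    # A opens the exchange by looking at B.
    log.append("A: gaze at B")
    bus.emit("AgentA looking at B")
    return log
```

Calling `run_cake_exchange()` returns the four steps in the order the two scripts imply. Neither agent refers to the other's behavior ids; they share only the message strings.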
Our wait behavior tag supports a no-event attribute to determine how to proceed if the event is not encountered (effectively, a timeout mechanism). The default response (a wait behavior with a duration but no no-event= attribute) is to continue as if the event did happen. We can emit a message before continuing, probably to alert the agent mind, by setting no-event to "MESSAGE: message-content". Or we can abort the remainder of the whole act by setting no-event to "FAIL" or "FAIL: message-content". I could also see a use for aborting only the behaviors with a dependency on the wait.
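To make the three cases concrete, here is a minimal Python sketch of how a processor might interpret the no-event value. The names `NoEventPolicy` and `parse_no_event` are hypothetical, and rejecting unknown keywords and accepting a bare "MESSAGE" are assumptions that the description above does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NoEventPolicy:
    action: str                    # "continue", "message", or "fail"
    message: Optional[str] = None  # optional message-content


def parse_no_event(value: Optional[str]) -> NoEventPolicy:
    """Interpret a no-event attribute value (None means the attribute is absent)."""
    if value is None:
        # Default: continue as if the event had happened.
        return NoEventPolicy("continue")
    head, sep, rest = value.partition(":")
    keyword = head.strip()
    message = rest.strip() if sep else None
    if keyword == "MESSAGE":
        # Emit the message, then continue.
        return NoEventPolicy("message", message)
    if keyword == "FAIL":
        # Abort the remainder of the act, optionally with a message.
        return NoEventPolicy("fail", message)
    # Assumption: anything else is treated as an error.
    raise ValueError(f"unrecognized no-event value: {value!r}")
```

For example, `parse_no_event("FAIL: no eye contact")` yields a policy with action `"fail"` and message `"no eye contact"`. Aborting only the behaviors that depend on the wait would need a fourth action, which is not modeled here.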
Also, let me be clear: these wait behaviors, like all other behaviors, overlap in time. That is, a wait behavior does not delay another behavior unless that behavior has a time dependency on the wait behavior. For example, while an agent is speaking, the wait behavior allows the character's gaze to react to others:
  <speech id="A_s1" ... />
  <wait id="A_w1" event="AgentB looking at A" start="A_s1:start" end="A_s1:end" />
  <gaze id="A_g1" target="B" start="A_w1"/>
In writing these examples, I referred to the behavior ("w1") instead of the behavior's time reference ("w1:end") only because its use is currently underspecified. In the last example, I use the wait behavior's end= attribute to bound the wait period, so starting A_g1 on A_w1:end would seem to bind it transitively to A_s1:end. It seems like w1:stroke is the best description of when a wait behavior receives the event. This also means the wait behavior's stroke cannot depend on other time references (the wait tag cannot have a stroke attribute), since it is inherently an unplanned time.
> Stacy and Andrew, do you want to comment further on the "par"/"seq"
> compatibility or issues with regards to this?
Early on, we considered the par and seq structure of SMIL and other languages. The first problem is that the notation only refers to the end points of an action. So to sync a beat gesture onto a word, you need to break the beat gesture into two behaviors. Related to this is the lack of a preparatory action. That is, you can start actions together by grouping them in parallel, but there is no description for stopping them together. While it could be added to the notation, it still assumes behaviors are broken up across their most important point, which can be tricky to animate smoothly. Instead, we describe a stroke point, along with ready and relax, to describe co-occurrence.
The more critical problem is the hierarchy of seq/par timing. Because of temporal overlap, the following cannot be described in seq/par notation:
  <speech id="s1">This <tm id="wb1"/> not <tm id="wb2"/> that. </speech>
  <gesture type="POINT" name="right_hand_point" target="THIS"
           stroke="s1:ready" relax="s1:wb2" />
  <gesture type="POINT" name="left_hand_point" target="THAT"
           start="s1:wb1" stroke="s1:wb2" />
Here, I want the right hand to remain in space between "this" and "that", and I don't want the left hand to move until I have completed my verbal reference to "this".
Anm

============================================================================
From: Hannes Vilhjalmsson <hannes_(at)_ISI.EDU>
Date: 5 April 2006 20:40:12 GMT+02:00
To: Andrew n marshall <amarshal_(at)_ISI.EDU>, Brigitte Krenn
Subject: Re: representations paper
Hi All,
The <wait> and <event> tags that Andrew described (I'm copying that portion of his message below) are in fact a very important addition we propose to BML. In addition to allowing agents to interact, I think it is a great way to (1) get notifications about behavior progress from the animation engine and (2) tie behaviors to non-agent events in the world.
Kris, I think this is particularly important for addressing your real-time concerns. This way you could, for example, know whether or not the stroke of a gesture had been reached when an interrupt occurs.
Does anyone have a better idea for solving these problems?
Cheers,
-= hannes

============================================================================
From: Stacy Marsella <marsella_(at)_ISI.EDU>
Subject: Re: representations paper
Date: Wed, 5 Apr 2006 14:33:05 -0700
To: "Kristinn R. Thorisson" <thorisson_(at)_ru.is>
Kris
You may recall that this mechanism also came up in our Iceland working group discussion.
Stacy

============================================================================
Date: Tue, 11 Apr 2006 17:18:14 +0200
From: Stefan Kopp <skopp_(at)_techfak.uni-bielefeld.de>
To: Hannes Vilhjalmsson <hannes_(at)_ISI.EDU>
Subject: Re: representations paper

...

---- GESTICON ----
The Gesticon is a lexicon of behavior definitions. It is NOT a new language and its entries define behaviors (like gestures) in a detailed, player-independent form in BML. This has several consequences for the paper:
1. The Gesticon can be thought of as part of behavior planning, whose job is to first derive an abstract BML specification, where occurrence and timing of behaviors are defined, and then to concretize it into descriptions of what the behaviors look like. During BOTH steps the description being fleshed out is in BML, which must thus provide appropriate elements for both purposes. This is now described in that way in the paper (I have changed Fig. 1 and Sect. 3 accordingly). Please check!
2. Behavior realization does NOT have access to the Gesticon, since it is supposed to be player-independent. Andrew wrote that you are using a lexicon with primitives etc. For sake of modularity and exchangeability, that should actually be a lexicon on its own and only for the realizer.
3. Gesticon is not a language on its own and its constructs should be fused with BML. Sect. 5+6 have now become one Sect. 4. I did NOT fuse any text, just changed the section structure. Brigitte, Catherine, Hannes and Andrew - can you take care of this part? I suggest doing the following:

...

---- ONGOING DISCUSSION ----
Let me address some issues briefly, along with suggestions for who could contribute to the paper on each part:
-- What kinds of behavior do we address with BML?
I think we agree that it is not at all clear what a communicative behavior is. However, as Brigitte has pointed out, we don't want to mix declarative definitions of behaviors with too much of a procedural definition of a script that an agent executes in the world. I suggest we restrict ourselves FOR NOW to any behavior that may be performed in F2F interaction. Also, BML should describe only a single-agent act in a single message. We can say that, at a later time, we want to come up with a more general agent scripting language and our current BML specs will be part of this.
...

-- Timing
In the discussion, timing has been a prominent issue and it is crucial for sure. I think we should, on the level of behavior planning and BML, ONLY use relative, event-based timing. All absolute time points would mean that we refer to the course of an eventual realization of a behavior or timeline. Also, I like the idea of having synchronization points/events instead of par/seq elements, which are simply not as flexible (see Andrew's mail). Yet, let's not go into this discussion on multi-agent behavior and events for this one, but concentrate on events between modalities within one single-agent act. I think <wait> and <event> are cool and important to have.
...

-- Speech
Like Hannes, I don't agree with Brigitte that speech synthesis belongs to behavior planning (not even partly). If we want to generate prosody, this is a task for behavior planning, based on the FML description of e.g. rheme/theme. The result will be prosodic commands like in ToBI or SABLE, which should thus be possible to impart to a BML description. I thus agree with Andrew that having a choice of structure would be good.
...
best,
Stefan

============================================================================
From: Stacy Marsella <marsella_(at)_ISI.EDU>
Subject: Re: representations paper
Date: Tue, 11 Apr 2006 08:36:48 -0700
To: Stefan Kopp <skopp_(at)_techfak.uni-bielefeld.de>
On Apr 11, 2006, at 8:18 AM, Stefan Kopp wrote:
> 1. The Gesticon can be thought of as part of behavior planning,
> whose job is to first derive an abstract BML specification, where
> occurrence and timing of behaviors are defined, and then to
> concretize it into descriptions of what the behaviors look like.
> During BOTH steps the description being fleshed out is in BML,
> which must thus provide appropriate elements for both purposes.
> This is now described in that way in the paper (I have changed Fig.
> 1 and Sect. 3 accordingly). Please check!
That is consistent with how we use a Gesticon, so that works for me.
> 2. Behavior realization does NOT have access to the Gesticon, since
> it is supposed to be player-independent. Andrew wrote that you are
> using a lexicon with primitives etc. For sake of modularity and
> exchangeability, that should actually be a lexicon on its own and
> only for the realizer.
Currently, we don't use it - it is more proposed work. I would suggest we don't muddy the waters discussing it. ...