Rich Representation Language

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by VanishedUserABC (talk | contribs) at 21:55, 4 February 2011 (Gesture assignment). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Rich Representation Language, often abbreviated as RRL, is a computer animation language specifically designed to facilitate the interaction of two or more animated characters.[1][2][3] The research effort was funded by the European Commission in the framework of the NECA Project.[3]

The language was specifically designed to reduce the training needed to model the interaction of multiple virtual agents and to drive much of the animation through words, possibly rendered as voice. Because components such as facial features depend on the spoken words, no animation is possible in the language without the text context.[4]

Language design issues

The application domain for RRL consists of scenes with two or more virtual characters. The representation of these scenes requires multiple information types such as body postures, facial expressions, semantic content of conversations, etc. The design challenge is that information of one type is often dependent on another type of information, e.g. the body posture, the facial expression and the semantic content of the conversation need to coordinate. For example, in an angry conversation the semantics of the conversation dictate body postures and facial expressions quite distinct from those of a joyful conversation. Hence any commands within the language to control facial expressions must inherently depend on the semantic content of the conversation.[3]

The different types of information used in RRL require different forms of expression within the language, e.g. while semantic information is represented by grammars, the facial expression component requires graphic manipulation primitives.[3]

A key goal in the design of RRL was the ease of development, to make scenes and interaction construction available to users without advanced knowledge of programming. Moreover, the design aimed to allow for incremental development in a natural form, so that scenes could be partially prototyped, then refined to more natural looking renderings.[3]

Scene description

Borrowing theatrical terminology, each interaction session between the synthetic characters in RRL is called a scene. A scene description specifies the content, timing, and emotional features employed within a scene. A specific module called the affective reasoner computes the emotional primitives involved in the scene, including the type and the intensity of the emotions, as well as their causes. The affective reasoner uses emotion dimensions such as intensity and assertiveness.[3]

Although XML is used as the base representation format, the scenes are described at a higher level within an object oriented framework. In this framework nodes (i.e. objects) are connected via arrows or links. For instance, a scene is the top level node which is linked to others. The scene may have three specific attributes: the agents/people who participate in the scene, the discourse representation which provides the basis for conversations and a history which records the temporal relationships between various actions.[3]
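The node-and-link structure described above can be illustrated with a short Python sketch. This is a hypothetical model for exposition only: the class and attribute names (Scene, Agent, DiscourseRepresentation) are assumptions, not actual RRL syntax, which is XML-based.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A participant in the scene (hypothetical illustration)."""
    name: str

@dataclass
class DiscourseRepresentation:
    """Basis for conversations: referents and conditions (hypothetical)."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

@dataclass
class Scene:
    """Top-level node, linked to its three attributes."""
    agents: list                          # the agents/people in the scene
    discourse: DiscourseRepresentation    # basis for conversations
    history: list = field(default_factory=list)  # temporal order of actions

# Building a minimal scene: two agents and one recorded action.
scene = Scene(agents=[Agent("Anna"), Agent("Ben")],
              discourse=DiscourseRepresentation())
scene.history.append(("greet", "Anna", "Ben"))
```

The point of the sketch is the object-oriented layering: the scene node does not hold animation commands directly, but links to the objects that do.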

The scene descriptions are fed to the natural language generation module which produces suitable sentences. The generation of natural flow in a conversation requires a high degree of representational power for the emotional elements. RRL uses a discourse representation system based on the standard method of referents and conditions. The affective reasoner supplies the suitable information to select the words and structures that correspond to specific sentences.[3]

Speech synthesis and emotive markers

The speech synthesis component is highly dependent on the semantic information and the behavior of the gesture assignment module. The speech synthesis component must operate before the gesture assignment system because it produces the timing information for the spoken words and interjections. After interpreting the natural language text to be spoken, this component adds prosodic structure such as rhythm, stress and intonation.[3]
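The kind of timed, prosodically enriched output this component hands onward can be sketched as follows. The field names and values are illustrative assumptions, not the actual RRL data format.

```python
# Hypothetical output of the speech synthesis component: each spoken
# word annotated with timing (needed later by gesture assignment)
# and prosodic structure (stress and intonation).
synthesized = [
    {"word": "hello", "start": 0.00, "end": 0.42,
     "stress": True,  "intonation": "rise"},
    {"word": "there", "start": 0.42, "end": 0.80,
     "stress": False, "intonation": "fall"},
]

# The utterance duration is what makes gesture alignment possible,
# which is why speech synthesis must run before gesture assignment.
total_duration = synthesized[-1]["end"]
```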

The speech elements, once enriched with suitable prosodic information and emotional markers, are passed to the gesture assignment system.[3] RRL supports three separate aspects of emotion management. First, specific emotion tags may be provided for scenes and specific sentences. A number of specific commands support the display of a wide range of emotions in the faces of animated characters.[3]

Second, there are built-in mechanisms for aligning specific facial features to emotive body postures. Third, specific emotive interjections such as sighs, yawns, chuckles, etc. may be interleaved within actions to enhance the believability of the character's utterances.[3]
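The three mechanisms of emotion management can be illustrated together in one hypothetical annotated utterance. The tag and field names here are invented for illustration and do not reproduce RRL's actual tag set.

```python
# Hypothetical annotated utterance combining the three mechanisms:
utterance = {
    "text": "I can't believe it!",
    # 1. An emotion tag attached to the sentence (type and intensity
    #    as computed by the affective reasoner).
    "emotion": {"type": "surprise", "intensity": 0.8},
    # 2. Facial features aligned to an emotive body posture.
    "alignment": {"face": "raised_eyebrows", "posture": "leaning_back"},
    # 3. Emotive interjections interleaved within the action,
    #    each with a time offset into the utterance.
    "interjections": [("gasp", 0.0), ("chuckle", 1.2)],
}
```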

Gesture assignment and body movements

In RRL the term gesture is used in a general sense and applies to facial expressions, body posture and proper gestures. Three levels of information are processed within gesture assignment:[3]

  • Assignment of specific gestures within a scene to specific modules, e.g. "turn taking" being handled in the natural language generation module.
  • Refinement and elaboration of gesture assignment following a first level synthesis of speech, e.g. the addition of blinking and breathing to a conversation.
  • Interface to external modules that handle player-specific renderings such as MPEG-4 facial animation parameters (FAPs).
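The three processing levels listed above can be sketched as a small pipeline. This is a hypothetical Python illustration; the function names, routing table, and data shapes are assumptions, not part of RRL.

```python
def module_for(gesture):
    """Level 1: route a gesture to the module responsible for it,
    e.g. turn taking is handled by natural language generation."""
    routing = {"turn_taking": "nlg"}  # hypothetical routing table
    return routing.get(gesture, "gesture_assignment")

def assign_gestures(scene_gestures, speech_timing):
    """Illustrative three-level gesture assignment pipeline."""
    # Level 1: assign each scene gesture to a module.
    routed = {g: module_for(g) for g in scene_gestures}
    # Level 2: refine after speech synthesis supplies timing,
    # e.g. add blinking and breathing to the conversation.
    refined = routed | {"blink": "gesture_assignment",
                        "breathe": "gesture_assignment"}
    # Level 3: emit records for an external, player-specific renderer
    # (e.g. conversion to MPEG-4 facial animation parameters).
    return [(g, m, speech_timing.get(g, 0.0)) for g, m in refined.items()]

result = assign_gestures(["turn_taking"], {"turn_taking": 1.5})
```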

The gesture assignment system distinguishes specific gesture types: body movements (e.g. a shrug of the shoulders indicating indifference versus hanging shoulders indicating sadness), emblematic movements (gestures that by convention signal yes/no), iconic gestures (e.g. imitating a telephone with the fingers), deictic gestures (pointing), contrast gestures (e.g. "on one hand, but on the other hand") and facial features (e.g. raised eyebrows, frowning, surprise or a gaze).[3]
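The gesture categories above can be collected into an enumeration for illustration. The names are a hypothetical rendering of the taxonomy, not identifiers defined by RRL itself.

```python
from enum import Enum

class GestureType(Enum):
    """Hypothetical enumeration of the RRL gesture taxonomy."""
    BODY_MOVEMENT = "body movement"  # e.g. a shrug for indifference
    EMBLEMATIC = "emblematic"        # conventional yes/no signals
    ICONIC = "iconic"                # e.g. miming a telephone with fingers
    DEICTIC = "deictic"              # pointing gestures
    CONTRAST = "contrast"            # "on one hand ... on the other hand"
    FACIAL = "facial feature"        # raised eyebrows, frowning, gaze

# A gesture instance would pair a type with its concrete realization.
gesture = (GestureType.DEICTIC, "point at the door")
```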

See also

References

  1. ^ Intelligent virtual agents: 6th international working conference by Jonathan Matthew Gratch, 2006, ISBN 3540375937, page 221
  2. ^ Data-driven 3D facial animation by Zhigang Deng, Ulrich Neumann, 2007, ISBN 1846289068, page 54
  3. ^ a b c d e f g h i j k l m n P. Piwek, et al., RRL: A Rich Representation Language for the Description of Agent Behaviour, in "Proceedings of the AAMAS-02 Workshop on Embodied Conversational Agents", 16 July 2002, Bologna, Italy.
  4. ^ Interactive storytelling: First Joint International Conference, edited by Ulrike Spierling, Nicolas Szilas, 2008, ISBN 3540894241, page 93