Faux Pas of an Unattended Avatar

Below is Chapter 1 from the book:
Virtual Body Language
by Jeffrey Ventrella


Catherine was one of the most popular residents in a new virtual world that had just come onto the scene. She was creative, friendly, and entrepreneurial. She was quite attractive, and she was good at programming various aspects of her little piece of the world. But soon after she had established herself, Catherine started to get a bad reputation: people were calling her a snob. This surprised Catherine, who had always made a point of being friendly to all the residents. Despite her extra efforts to be sociable, the reputation persisted.

Then one day she overheard one of the new residents talking about her: “That Catherine—she’s a snob. I had just gotten set up with my new account, and I decided to go find her, and introduce myself. As soon as I introduced myself, she turned her head to look at me, stared at me for a while without saying a word, and then turned her head back, like I wasn’t even there. How rude!”

At first it was a mystery. Catherine had no recollection of ever snubbing a newbie. She would never act this way in person, nor would she act this way in a virtual world. But eventually, Catherine figured out what was going on.

The virtual world I am describing is Second Life, and the woman is Catherine Omega. When she enters this virtual world she takes the form of an avatar—a digital character that represents her embodiment. The software engineers at Linden Lab, makers of Second Life, designed the system so that avatars would automatically respond to the utterances of other avatars by turning their heads towards them. This was meant to make the avatars appear more natural—after all, in the real world, people usually look at each other when they are talking. But there is one problem with this notion: Second Life is not the real world. In fact, it is very different! Let me explain.

Here is a typical scenario illustrating what was happening: Catherine was logged into Second Life, chatting away with other avatars, and doing the various things that people—as avatars—do in Second Life. Then Catherine (the real woman) stepped away from her computer for a moment, while Catherine (her avatar) was still standing there among other avatars. An unsuspecting newbie walked her avatar up to Catherine’s avatar and started chatting. Since the real Catherine was not present to respond with a chat, her avatar looked over at the new avatar (because of the automatic avatar “lookat” behavior). Then, because there was no communication coming from Catherine, her avatar’s lookat mode timed out, and the avatar resumed its usual gaze at nothing in particular. Catherine’s bad reputation, it turns out, resulted from Catherine not being there. Her avatar was generating unintended body language in her absence.
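To make the mechanism concrete, the automatic lookat behavior can be modeled as a tiny state machine. The Python sketch below is purely illustrative. The class, its method names, and the three-second timeout are hypothetical assumptions, not Linden Lab's actual code, which this chapter does not describe. It shows how an avatar whose owner has stepped away ends up turning toward a speaker and then turning away, with no human decision involved.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical timeout; the real Second Life value is not given in the text.
LOOKAT_TIMEOUT = 3.0  # seconds


@dataclass
class Avatar:
    name: str
    gaze_target: Optional[str] = None      # None means "gazing at nothing in particular"
    gaze_since: float = 0.0                 # when the current gaze target was set
    last_owner_input: float = float("-inf")  # last chat or action from the human owner

    def on_nearby_chat(self, speaker: str, now: float) -> None:
        """Automatic reaction (assumed): turn the head toward whoever just spoke."""
        self.gaze_target = speaker
        self.gaze_since = now

    def on_owner_input(self, now: float) -> None:
        """Record any chat or action coming from the person behind the avatar."""
        self.last_owner_input = now

    def update(self, now: float) -> None:
        """Drop the gaze target if the owner has not responded within the timeout."""
        if self.gaze_target is None:
            return
        owner_responded = self.last_owner_input >= self.gaze_since
        if not owner_responded and now - self.gaze_since > LOOKAT_TIMEOUT:
            self.gaze_target = None  # resume the idle gaze


# Catherine has stepped away from her computer; a newcomer says hello at t=0.
catherine = Avatar("Catherine")
catherine.on_nearby_chat("Newbie", now=0.0)
catherine.update(now=1.0)
print(catherine.gaze_target)  # Newbie: the avatar turns to look
catherine.update(now=4.0)
print(catherine.gaze_target)  # None: the avatar looks away, seemingly a snub
```

Note that nothing in this model knows whether the owner is present; the avatar's "stare, then turn away" sequence emerges entirely from the automatic rule and the timeout.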

Catherine, it seems, was already prone to this kind of virtual faux pas. Even though she was chatty in general, she rarely triggered avatar animations or used smileys and other visual signs of expression. Her responses tended to be short and widely spaced in time. The reason for this is not readily apparent unless you know her personally: Catherine’s mind is like a rotating mirror ball. She typically has several instant messaging channels going at once, and is always bouncing around between simultaneous conversations. She would be the perfect poster child for the ills of Continuous Partial Attention, were it not for the fact that she is a clever blogger and self-aware master of the internet. She deftly taps the pluralistic, fragmented, connectionist intelligence of the internet, fully aware of its nature.

So, the problem lies not with the fragmented attention of Catherine the woman, but with the apparent continuous attention of Catherine the avatar. The problem of presence in virtual worlds has been researched by such scholars as Ralph Schroeder, editor of The Social Life of Avatars (2002), who uses the term “copresence” to describe not just “being there” but “being there together”. This is the locus of virtual body language.

Mediated Selves
Am I suggesting that Catherine should be more attentive to her avatar? No—that would go against the nature of avatarhood. Most would agree that users want some form of detachment. It takes a lot of time and energy to be a full-time puppeteer, and most people would rather not be encumbered by this job. The avatar stands as a persistent external persona that allows the user to privately engage in a hundred distractions—to be sloppy, erratic, timid, anxious, or confused, or perhaps amused, sarcastic, and snickering. Or, in Catherine’s case, just plain busy. The avatar allows the user to step away for a “bio-break”. The avatar is also a way for people to express their multiple personalities, sometimes even having several avatars in-world simultaneously. While having different identities has always been a part of human social life, as highlighted by sociologist Erving Goffman in the 1950s, the phenomenon has increased due to the prevalence of new media. Sherry Turkle, author of Life on the Screen (1995), points out that our natural multiple personalities are easily expressed and manifested in cyberspace.

Jim Bumgardner, creator of the early virtual world, The Palace, was inspired by Scott McCloud’s concept of “masking” in comics (1993). Avatars, according to Bumgardner, allow users to maintain partial anonymity. This partial anonymity allows people to relax and become more expressive and creative in their interactions.

The Identity Leash
How we design avatar systems—and how people use them—determines where the avatar lies on a continuum of control. The low end of the continuum is mostly hands-off (the avatar is semi-autonomous). The high end of the continuum is detailed, focused control (the user is a full-time puppeteer). Early avatar-maker Steve DiPaola refers to this continuum as the “identity leash”. With avatars, “…identity itself is a dynamic construct that, like a leash, can be pulled in tight or given generous slack. In this way, one does not have to choose between the extremes of either playing a role or strictly being oneself, but instead can meander through identity space of this role of the self” (DiPaola 2000). A long identity leash means that the avatar has lots of autonomy. It’s allowed to do things that don’t represent what the user is doing from moment to moment. A short leash means that the avatar is being puppeteered with high focus and attention. Both short and long identity leashes are appropriate for different purposes and at different times. And these modes are actually reflected in our minds as we shift focus and attention from moment to moment.

Catherine prefers a long identity leash. It allows her to engage in various private activities offscreen, to still be herself, and not have to share her every twitch, her every glance, and those snickering side comments with her IM buddies. It allows her to be somewhat removed, but still maintain a perpetual proxy of herself in virtual space that she occupies on a part-time basis.

Hybrid Spaces
Let’s imagine that Catherine’s avatar had NO gaze behavior, and never turned its head in response to another avatar’s chat. On the one hand, her avatar would be boring and antisocial. But on the other hand, her avatar would not be in danger of giving false or ambiguous signals.

Now let’s step back for a moment and consider something peculiar. Catherine’s avatar responds to another avatar’s chats by shifting its gaze to that avatar’s head. Text chat in Second Life is normally configured to appear somewhere near the bottom of the screen, in a rectangular text area. So, wouldn’t it make more sense for Catherine’s avatar to set its gaze on the actual stimulus? Shouldn’t her avatar look at that 2D text window where the chats are coming in?

A chat appearing in a text window causes a gaze reaction in the 3D world

I’m asking this question, not because I think avatars should look at incoming lines of text, but simply to point out an ambiguity in the visual scene: a semi-realistic humanoid avatar makes a semi-realistic response (turning the head in response to some stimulus)… but the response is to a completely disembodied event (the creation of a text chat in an arbitrary 2D location on a computer screen). There is something wrong with this picture!

Most prominent virtual worlds incorporate the technologies of chat rooms, instant messaging, and other text-based conversation. We have a collision: computer-graphic humanoid 3D models with text chat. These are strange bedfellows, occupying different cognitive dimensions. This is illustrated in the screenshot below of the early virtual world, Cybertown. I had noticed in such worlds that people engaged in communication tend to keep their eyes focused on the text, and rarely glance up at the 3D scene. All the interesting action seems to be going on in Text Land, while the avatars stand idly like statues in a surreal wax museum.

3D avatar space contrasted to instant messaging text space in Cybertown

The avatars in the late virtual world, There.com, were able to send visual symbols (moodicons) back and forth to each other. It was a way to bring some embodied communication into the avatar space. But the way users typically triggered these visual expressions was by way of text, or “emotes”, such as :) or ‘shrug, which would appear in their chat balloons along with their normal text. But even if the user was not using text to generate these moodicons, the text would still appear in the chat balloons. WTF? So, you end up with the following strange scenario:

Generating moodicons causes redundant chat balloon text in There.com

One might argue that these convergent media collisions are the basis of innovation, and that the real solutions are yet to come. Let’s agree on one thing: the solutions to avatar interaction design will not be arrived at simply through a better understanding of human social behavior, as some might naïvely assume. They will require an understanding of the uses and constraints of these colliding technologies. Persson (2003) points out that avatar nonverbal communication doesn’t even have to be synchronous. In other words, instead of you and me expressing ourselves visually in real time, our expressions can be sent in chunks separated by arbitrary spans of time. With asynchronous communication (Twitter, Facebook, email, etc.), users are not compelled to respond to each other at natural conversational rates. The modality of asynchronous communication permits users to customize nonverbal avatar animations with the same degree of care and craft as they might compose their emails and blog posts.

Thus, virtual body language may just as naturally arise within the scattered temporal cracks of our busy lives, folded into the same time slices as our text-based communications. Time is fragmented. But so is presence—those bits and pieces of identity and attention that accumulate in the nooks and crannies of the internet. It’s a paradigm shift: “…once one accepts the state of distributed presence, inevitably this means acceptance of a group consciousness, which itself shifts our perception of time and even productivity” (Vesna 2004).

Help, I'm Lost And I Have No Pants
Verhulsdonck and Morie (2009) make a call for developing nonverbal communication standards in virtual worlds, yet they acknowledge that the range of communication affordances in virtual worlds causes confusion, and so these standards will have to evolve along with the technology. The technology we’re talking about is a cobbled hybrid mashup of computer games, cinema, texting, 3D rendering, and physical simulation. And the cloud (Boellstorff 2010). It is in a fledgling state. It has not yet been fully calibrated to our human natures and lifestyles. Catherine’s faux pas, and its underlying cause, has a solution. We just haven’t worked it out yet.

Susan Woita was a customer service employee at There.com. She recalls having to help users with some pretty strange problems. She had intended to write a book about her experiences, but never got around to it. That book was going to be called Help, I'm Lost and I Have No Pants!!—Real Life in a Virtual World. The title is quoted verbatim from a message she got from a newbie. Her job was to find out why the pants were missing from his avatar. It turned out that he had sold them to a scammer, and somehow didn't make the connection between that interaction and the fact that his pants were missing (Woita 2010).

In virtual reality, whether or not someone has just taken off your pants is not always obvious. The fact that gravity pulls things downward is not always a given. That has to be programmed. Or not. Internet visionary Ted Nelson sees today’s computer world as “…a nightmare honkytonk prison, noisy and colorful and wholly misbegotten” (2008). That is both a cause for anxiety and a cause for celebration. That is the honkytonk origin of avatars and virtual worlds.

References
Boellstorff, T. 2010. “Culture of the Cloud”. Journal of Virtual Worlds Research: The Metaverse Assembled. Vol. 2, No. 5, May 2010.

DiPaola, S. 2000. Notes from the SIGGRAPH 2000 Panel on Interactive Storytelling.

McCloud, S. 1993. Understanding Comics: The Invisible Art. Tundra Publishing.

Persson, P. 2003. “ExMS: an Animated and Avatar-based Messaging System for Expressive Peer Communication”. Proceedings of the GROUP 2003 Conference, Nov. 9-12, 2003.

Schroeder, R. (ed.). 2002. The Social Life of Avatars: Presence and Interaction in Shared Virtual Environments. Springer Verlag.

Turkle, S. 1995. Life on the Screen. Touchstone.

Verhulsdonck, G., and Morie, J. F. 2009. “Virtual Chironomia: Developing Non-verbal Communication Standards in Virtual Worlds”. Journal of Virtual Worlds Research. Vol. 2, No. 3, October 2009.

Vesna, V. 2004. “Community of People with No Time: Collaboration Shifts”. In First Person: New Media as Story, Performance, and Game, eds. Wardrip-Fruin, N., and Harrigan, P. MIT Press.

Woita, S. 2010. Personal communication. http://www.linkedin.com/in/susanwoita.
