Thinking About Accessibility

I attended a demo today in Saint Paul College’s virtual Wonderland classroom that showed how you can do a loose integration of a speech recognizer and the Wonderland chat pane. Computer Careers instructor Eric Level has a student who is deaf in his introductory Java class, so he decided to try using Dragon NaturallySpeaking to automatically transcribe everything he said in the class. To accomplish this, he brought up Wonderland on a laptop with Dragon installed and set input focus to the chat pane. Everything he said appeared in the chat pane, as shown below.

[Screenshot: speech recognition output appearing in the Wonderland chat pane]

Dragon can direct its recognition results to any application that accepts text input, which is what makes this loose integration possible.

While this technique does work, it got me thinking about how one might design a more streamlined solution. One idea is to put the recognizer on the server alongside the voice bridge, so that everyone’s audio can potentially be transcribed, not just the presenter’s. This would not only eliminate the need for a dedicated recognizer machine, but would also make it possible to annotate the recognition stream with speakers’ names, since the voice bridge can determine who is speaking at any given time. The downside to server-side recognition is that the audio may not be as clear as audio captured locally on the client, and it might not be as easy to train the recognizer on the presenter’s voice.
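To make the speaker-annotation idea concrete, here is a minimal sketch in Java. The class and method names are hypothetical, not Wonderland or voice bridge APIs: it assumes the bridge can notify us when the active talker changes, and that each recognizer result arrives as plain text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the voice bridge knows who holds the floor at any
// moment, so each recognition result can be stamped with the current
// speaker's name before being forwarded to clients.
public class AnnotatedTranscript {

    // One recognized utterance, tagged with its speaker.
    public record Caption(String speaker, String text) {
        @Override public String toString() {
            return "[" + speaker + "] " + text;
        }
    }

    private final List<Caption> captions = new ArrayList<>();
    private String currentSpeaker = "unknown";

    // Called by the (assumed) voice bridge when the active talker changes.
    public void speakerChanged(String name) {
        currentSpeaker = name;
    }

    // Called with each recognizer result; tags it with the current speaker.
    public Caption onRecognitionResult(String text) {
        Caption caption = new Caption(currentSpeaker, text);
        captions.add(caption);
        return caption;
    }

    public List<Caption> transcript() {
        return captions;
    }
}
```

The point of the sketch is simply that the tagging happens server-side, where the floor-control information already lives, rather than on each client.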

From the user interface point of view, in 0.5 it might be interesting to build a "subtitle channel" that users could subscribe to. Rather than having the recognition output typed into the chat window, which everyone sees, it could be displayed in an optional HUD panel that could be resized, positioned, or magnified as needed. This would leave the chat pane available for back-channel communication among in-world participants.
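The subtitle channel could be as simple as a publish/subscribe fan-out: clients that opt in receive caption text for their HUD panel, and everyone else is unaffected. A minimal sketch, with all names assumed rather than taken from any Wonderland API:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of an opt-in "subtitle channel": only subscribed
// clients receive captions, so the shared chat pane stays free for
// back-channel conversation.
public class SubtitleChannel {
    private final Set<Consumer<String>> subscribers = new LinkedHashSet<>();

    // A HUD panel registers itself here to start receiving captions.
    public void subscribe(Consumer<String> hudPanel) {
        subscribers.add(hudPanel);
    }

    public void unsubscribe(Consumer<String> hudPanel) {
        subscribers.remove(hudPanel);
    }

    // Fan a recognized caption out to subscribed HUD panels only.
    public void publish(String caption) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(caption);
        }
    }
}
```

Representing a subscriber as a plain `Consumer<String>` keeps the channel ignorant of how captions are rendered, so the same stream could feed a resizable HUD panel, a magnified view, or even a log file.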

Providing accessibility to Wonderland for users with a range of disabilities is an important area of focus. For anyone who might be interested in doing a project in this area, here are a few open source resources to start with:

- Sphinx-4 – open source speech recognizer
- eSpeak or FreeTTS – open source speech synthesizers
- GNOME Accessibility Project – accessibility solutions for graphical user interfaces

One Response to Thinking About Accessibility

  1. larrytek says:

    I don’t know how large these learn databases get but perhaps one idea is to have the user’s database accessible through either an "inventory" item carried by the avatar, or stored on the remote server and associated to the avatar name. Perhaps have 2 ways to populate the database: 1. upload a database already created, or 2. have an in-world trainer that can scroll text and record your voice at the push of a button.
