See also: other situated corpora collected in the SLaTe lab.
OSU Quake 2004 Dialog Corpus:
A publicly available corpus of collaborative dialog in a virtual world
This small corpus is intended for researchers who want to study
collaboration between humans working on a situated task. In this
case, two human partners perform a treasure-hunt task in a
graphically rendered virtual world displayed on their computer
monitors, and they speak to each other to coordinate their
activity on the task. The task was designed to simulate the
search-and-rescue domain.
A short sample of the interaction in this domain is available.
The corpus has the following properties:
- Two-party dialog about solving a search-and-rescue style task.
- The partners could move about in the world independently, so their knowledge of the world diverges.
- The partners could see each other and also see objects and events in the world, so they frequently make exophoric references to items in the world.
- Both partners could perform the same actions in the world, so the initiative for completing the task is left open.
- Dialog behaviors such as turn-taking and disfluencies occur naturally.
The technical report OSU-CISRC-8/05-TR57 describes the data collection conditions, subject instructions, etc.
If you publish results that include the use of this corpus, please cite:
Donna K. Byron and Eric Fosler-Lussier.
"The OSU Quake 2004 corpus of two-party situated problem-solving
dialogs."
In Proceedings of the 5th International Conference on Language
Resources and Evaluation (LREC'06), 2006.
The corpus
To date we have transcribed and annotated 5 problem-solving
sessions. The data is available to any researcher who wants to use it,
subject to the terms of use.
Links to tools we used
If you would like to perform a similar experiment, here are some of the tools you will need:
- Quake II software. The game engine client/server
code and the rendering software are open source and can be
downloaded from idsoftware.com. You also need
the game pak files from the original Quake II game CD, which can be
purchased on the used market. We used Quake II because it is open source, but you
could also use Neverwinter Nights, Halo, Half-Life, or Unreal Tournament. AI
researchers use the engines of all these games to study embodied
interaction.
- QE Radiant and Quark are the level-editing tools we used to build the small
Quake world. We used Wally to make the wall textures.
- Transcriber from LDC can be used to align the transcript with the
audio stream.
- Soundscriber is a transcription tool that automatically replays
each section of audio several times, which makes transcribing easier.
This distribution runs only on Windows.
- Annotation tool: MMAX
- Audio alignment: Sonic
- Our audio recording equipment setup is described in the TR.
Local Resources
Guidelines for transcribing the audio
SLaTe: Speech and LAnguage TEchnologies Lab
Dept. of Computer Science & Engineering
580 Dreese Labs
The Ohio State University
All content on this page is Copyright © 2005, THE OHIO STATE UNIVERSITY