OSU Quake 2004 Dialog Corpus:
This small corpus is intended for researchers who want to study
collaboration between humans working on a situated task. In this
case, two human partners performed a treasure-hunt task in a
graphically rendered virtual world that was portrayed on their computer
monitors. The partners spoke to each other in order to coordinate
their activity on the task. The problem domain was chosen to simulate
the search-and-rescue domain.
A publicly available corpus of collaborative dialog in a virtual world
Here's a little sample of the interaction in this domain.
The corpus has the following properties:
- Two-party dialog about solving a search-and-rescue style task.
- The partners could move about in the world independently, so their knowledge of the world diverges.
- The partners could see each other and also see objects and events in the world, so they frequently make exophoric reference to items in the world.
- The partners could both perform the same actions in the world, so the initiative for completing the task is left open.
- Dialog behaviors such as turn-taking and disfluencies happen naturally.
The technical report OSU-CISRC-8/05-TR57 describes the data collection conditions, subject instructions, etc.
If you publish results that include the use of this corpus, please cite:
Donna K. Byron and Eric Fosler-Lussier.
"The OSU Quake 2004 corpus of two-party situated problem-solving
In Proceedings of the 5th Language Resources and Evaluation
Conference (LREC'06), 2006.
» PDF, BibTeX
To date we have transcribed and annotated 5 problem-solving
sessions. The data is available to any researchers who want to use it, subject
Links to tools we used
If you would like to perform a similar experiment, here are some of the tools you will need:
- Quake II software. The game engine client/server
code and the rendering software are open source, and they can be
downloaded from idsoftware.com. You also need
the game pak files from the original Quake II game CD, which can be
purchased on the used market. We used Quake II because it is open source, but you
could also use Neverwinter Nights, Halo, Half-Life, or Unreal Tournament. These
are all game engines that are used by AI researchers to study embodied
- QE Radiant or QuArK are the level editing tools we used to build the little
Quake world. We used Wally to make the wall textures.
- Transcriber from LDC can be used to align the transcript with the
- Soundscriber is a tool for transcription that automatically loops
each section of audio a number of times. It makes transcribing easier.
This distribution only runs on Windows.
- Annotation tools: MMAX
- Audio alignment: SONIC
Our audio recording equipment setup is described in the TR.
Guidelines for transcribing the audio
All content on this page is Copyright © 2005, THE OHIO STATE UNIVERSITY
SLaTe: Speech and LAnguage TEchnologies Lab
Dept. of Computer Science & Engineering
580 Dreese Labs
The Ohio State University