A Platform for Multimodal Spoken Language Corpora
Proposal for the Swedish Language Technology Programme.
Contact: Prof. Jens Allwood, <jens@ling.gu.se>
The project in its planned form consists of a pilot project (currently running) and a larger full project, for which funding is currently being sought.
Full Project
The aim of the full project is to build a platform for a multimodal spoken language corpus, and to create such a corpus by adapting the existing Göteborg Spoken Language Corpus to this platform. The platform will consist of a set of tools enabling the establishment and use of a spoken language corpus.
The targeted users of the corpus and tools are linguistics researchers from both universities and industry who wish to analyze spoken language in all its facets, ranging from visual to statistical. Results from such research can be used in the development of practical applications such as multimodal human-machine interfaces in industry, education and entertainment.
The planned tools in the platform are the following:
1. A tool for digitizing audiovisual corpus data.
2. A tool for synchronous alignment of audio/video and transcribed text.
3. A multimodal corpus presentation tool, preferably accessible from the internet. What we have in mind is an audio/video presentation with simultaneous scrolling in a partiture-formatted transcription with highlighting of the phrase being spoken.
4. A transcription coding tool that will include simple corpus presentation using standard and/or partiture format, with optional use of an audio/video presentation extension.
5. A statistics tool able to process information from the coding tool and the corpus.
We will make use of existing tools for digitization and statistics. The project will thus be concerned with developing tools 2, 3 and 4.
All implementations should be platform-independent, and some, e.g. the presentation tool, should be accessible from the internet. This suggests that programming languages such as Java and Tcl/Tk should be used, in conjunction with Prolog.
Pilot Project (currently running)
The pilot project has the following aims:
Investigation and evaluation of available multimodal corpora, such as the Map Task Corpus [1], and of possible shell environments for the platform, e.g. GATE and the SVENSK grammar [2].
Digitization of video and audio tapes. Part of the large library of video and audio tapes held at the Department of Linguistics, which constitutes the source of the transcribed corpus, will be digitized and stored on CD-ROMs. Some recordings are audio only, but most are both video and audio.
A suitable format needs to be decided on, taking into account conflicting requirements such as sufficient resolution and efficient use of storage space.
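As a rough illustration of this trade-off, the following Python sketch estimates how many minutes of recording fit on one disc at a given combined bitrate. The function name, the disc capacity and the bitrates are assumptions chosen for the example, not figures from the project, and Python is used here only for illustration.

```python
def minutes_per_disc(capacity_mb: float,
                     video_kbit_s: float,
                     audio_kbit_s: float) -> float:
    """Estimate recording minutes that fit on one disc.

    capacity_mb  -- usable disc capacity in megabytes (10**6 bytes)
    video_kbit_s -- video bitrate in kilobits per second (0 for audio only)
    audio_kbit_s -- audio bitrate in kilobits per second
    """
    total_bits = capacity_mb * 1_000_000 * 8
    bits_per_second = (video_kbit_s + audio_kbit_s) * 1000
    if bits_per_second <= 0:
        raise ValueError("combined bitrate must be positive")
    return total_bits / bits_per_second / 60


if __name__ == "__main__":
    # Hypothetical settings: a 650 MB disc, with two candidate video bitrates.
    for video in (1200, 600):
        minutes = minutes_per_disc(650, video, 128)
        print(f"video {video} kbit/s: about {minutes:.0f} minutes per disc")
```

Halving the combined bitrate doubles the recording time per disc, so the choice of format directly determines how many discs the tape library will need.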
Specification of a tool for synchronization (alignment) of audio/video and text. This requires specification of a format for mapping between points in the transcription and points in the video/audio file. Also, a tool should be implemented enabling semi-automatic or (if possible) automatic alignment. Preferably, several different methods of alignment should be available.
The synchronization tool uses an audio/video viewer tool. This tool should be a standalone application that can be incorporated as a module in the coding tool and the multimodal corpus presentation program, as well as in other future tools that are to be used on the multimodal corpus.
Prototypes of both of these tools have already been implemented in Tcl/Tk, AppleScript and FaceSpan (the audio/video viewer tool prototype is currently available for the Macintosh only).
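The mapping between points in the transcription and points in the audio/video file could be represented as a list of anchor pairs. The sketch below assumes, purely for illustration, that transcription positions are character offsets and media positions are seconds. It shows one possible semi-automatic method: a user sets a few anchors by hand and the remaining positions are estimated by linear interpolation. The names `Anchor` and `estimate_time` are hypothetical and do not describe the existing prototypes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Anchor:
    """One alignment point tying a transcription position to a media time."""
    char_offset: int  # position in the transcription text (assumed unit)
    seconds: float    # position in the audio/video file


def estimate_time(anchors: List[Anchor], char_offset: int) -> float:
    """Estimate the media time for a transcription offset.

    Offsets between two anchors are linearly interpolated; offsets outside
    the anchored range are clamped to the first or last anchor.
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    pts = sorted(anchors, key=lambda a: a.char_offset)
    if char_offset <= pts[0].char_offset:
        return pts[0].seconds
    if char_offset >= pts[-1].char_offset:
        return pts[-1].seconds
    for left, right in zip(pts, pts[1:]):
        if left.char_offset <= char_offset <= right.char_offset:
            span = right.char_offset - left.char_offset
            if span == 0:
                return left.seconds
            fraction = (char_offset - left.char_offset) / span
            return left.seconds + fraction * (right.seconds - left.seconds)
    raise AssertionError("unreachable: offset lies within the anchored range")
```

Linear interpolation is only a crude guess, since speech rate varies and pauses occur, so adding more hand-set anchors improves the estimate. A fully automatic method would need to analyze the audio signal itself.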
Specification of a multimodal corpus presentation tool. This presentation should include several transcription formats, audio/video and possibly also some coding (see below). The end user should have access to several settings for controlling the presentation, which should be available using a WWW browser such as Netscape or Internet Explorer. Java seems like a good candidate for implementing the presentation tool.
In the case of presentation of coding, the presentation tool should be able to read and represent coding made in any of several different formats, e.g. TagLog [6] format, GATE format or simple tables.
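To highlight the phrase being spoken while the audio/video plays, the presentation tool needs to look up the transcription segment that covers the current playback time. The Python sketch below shows one way to do this with a binary search. It assumes segments are given as (start, end, phrase) tuples, sorted by start time and non-overlapping. The `Segment` layout and the function name `phrase_at` are hypothetical, and Python is used only for illustration rather than as a proposal for the implementation language.

```python
import bisect
from typing import List, Optional, Tuple

# (start_seconds, end_seconds, phrase): an assumed, simplified segment layout.
Segment = Tuple[float, float, str]


def phrase_at(segments: List[Segment], t: float) -> Optional[int]:
    """Return the index of the segment spoken at time t, or None if silent.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [seg[0] for seg in segments]
    # Index of the last segment whose start time is <= t.
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t < segments[i][1]:
        return i
    return None
```

Because overlapping speech is common in dialogue, a real tool would probably run such a lookup separately for each speaker line of the partiture transcription.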
Specification of a transcription coding tool. This application will include simple corpus presentation using standard and/or partiture format, with optional use of the audio/video presentation extension. This extension will be especially helpful when coding e.g. gestures and facial expressions.
The coding tool will allow interactive creation of tagging schemas and easy coding, requiring no special knowledge of programming or computers. In relation to TagLog, this can be seen as an updating client specialized for coding interaction (including e.g. gestures) in spoken language dialogue transcriptions in the Göteborg Spoken Language Corpora. The end result of coding will be TagLog databases (classification theories), which can then be analyzed using TagLog or a separate statistics tool and visualized using the presentation tool.
To implement this in a short time, a scripting language like Tcl/Tk may be suitable. A prototype of the coding tool, TRACTOR, is being implemented in Tcl/Tk and will shortly be available for Macintosh and UNIX.
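The idea of a user-defined tagging schema can be sketched as a mapping from category names to allowed values, against which each coded unit is checked. The Python example below is a hypothetical illustration only: it does not reproduce the TagLog database format or the TRACTOR prototype, and the category names and values are invented.

```python
from typing import Dict, List, Set

# Hypothetical schema layout: category name -> set of allowed values.
Schema = Dict[str, Set[str]]


def validate_coding(schema: Schema, coding: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one coded unit (empty if valid)."""
    problems = []
    for category, value in coding.items():
        if category not in schema:
            problems.append(f"unknown category: {category}")
        elif value not in schema[category]:
            problems.append(f"value {value!r} not allowed for {category}")
    return problems


if __name__ == "__main__":
    # Invented example categories for gesture and gaze coding.
    schema = {"gesture": {"nod", "shake"}, "gaze": {"up", "down"}}
    print(validate_coding(schema, {"gesture": "nod", "gaze": "sideways"}))
```

Checking each coding against the schema at entry time is one way to keep coding simple for users without programming experience, since mistakes are reported immediately.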
This page was built on an Apple Macintosh. Last modified 1997-10-15. Copyright © 1997, Dept. of Linguistics, Göteborg University. Comments to the webmaster: dario@cling.gu.se.