1. Project Objectives
Spoken language dialogue systems, such as automated telephone enquiry systems and hands-free in-car device control, are rapidly becoming a commercial reality. SIRIDUS aims to improve the understanding of what is required to provide reusable, robust and user-friendly spoken dialogue systems. The project demonstrators include an automated telephone operator, and an integrated toolset for dialogue researchers.
Particular concerns in SIRIDUS are:
- achieving robustness when user utterances are unpredictable, and speech recognition is noisy
- showing that generic strategies for dialogue management can be applied to a wide range of dialogues including "command" dialogues and negotiative dialogues, not just information seeking dialogues.
- providing architectures that allow appropriate sharing of information between modules, for example enabling dialogue systems to generate appropriately stressed output, e.g. "Did you mean the KITCHEN light or the HALL light?" vs. "Did you mean the kitchen LIGHT or the kitchen FAN?"
The partners in Siridus are the Universities of Gothenburg, Saarland and Seville, and Telefonica. SRI was the coordinator for the first two years. In the final year the administrative coordination was undertaken by the University of the Saarland, and the scientific coordination by the University of Gothenburg. To provide continuity, Gothenburg subcontracted Linguamatics Ltd. to provide the former SRI personnel to work on the project in the final year.
The partners contributed the following expertise:
|Partner|Expertise|
|---|---|
|University of Gothenburg|advanced, flexible dialogue; Swedish language|
|University of Seville|dialogue management and language engineering; Spanish language|
|Telefonica I+D|speech recognition and text to speech in Spanish|
|SRI International|language engineering; project management|
|University of the Saarland|information structure and advanced spoken language understanding|
2. User Requirements and Market Prospects
The market for dialogue systems has been developing rapidly, with proven return on investment in the call centre market. The VoiceXML standard is increasingly used for information seeking dialogues, and incorporates some flexibility, enabling users to answer more than one question at once. By pushing the limits of the kinds of flexibility that can be achieved in practical systems, the Siridus project provides a good basis for the future development of user-friendly dialogue systems, not only for information seeking dialogues but also for command and control, negotiative, and tutorial dialogue. By emphasising reconfigurability and robustness, we also hope to meet the challenge of providing greater user-friendliness without incurring lower reliability or higher deployment costs.
The market for telephone systems that allow dialling by name has been demonstrated by voice-operated personal assistants such as Wildfire. By allowing more natural voice-based exchanges, the Siridus telephone demonstrator provides a similar service that is appropriate for untrained users in a corporate environment. Deliverable D3.1 provided user requirements for the telephone operator system, and architecture requirements for advanced dialogue systems were provided in Deliverable D6.1.
3. Project Results and Achievements
3.1 Research Innovation
The Siridus project has provided innovative research that is applicable to real systems in the near term. Here we outline what we see as some of the key theoretical results.
Information state view of dialogue
The Siridus project adopted an overall view of dialogue management in terms of information state update. The key to this approach is to provide structured "information states", which can be regarded as modelling the mental states of the dialogue participants. For dialogue systems, this approach encourages the use of abstract data structures to encode dialogue state (rather than unanalysed program states). We believe that this not only allows better comparison of systems and theories, but also makes for more transparent and maintainable dialogue systems. It also makes it very simple to experiment with using state information in unconventional ways to improve recognition or synthesis. A key strand of work in Siridus has been using the information state to control prosody in synthesised output (see Deliverable D5.1).
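To make the idea concrete, here is a minimal Python sketch of a structured information state with two update rules. It is an illustration under simplifying assumptions, not TrindiKit's actual data structures or rule language: the fields (a stack of questions under discussion, a record of shared commitments, the last move) and the move types `ask` and `answer` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InformationState:
    """A hypothetical, much-simplified information state (not TrindiKit's own format)."""
    qud: list[str] = field(default_factory=list)               # questions under discussion; last item is topmost
    commitments: dict[str, str] = field(default_factory=dict)  # answers both participants now share
    last_move: tuple[str, str] | None = None                   # most recent (move type, content)


def update(state: InformationState, move: tuple[str, str]) -> InformationState:
    """Apply simple update rules for one dialogue move, given as (move type, content)."""
    move_type, content = move
    if move_type == "ask":
        # Rule 1: a question becomes the topmost question under discussion.
        state.qud.append(content)
    elif move_type == "answer" and state.qud:
        # Rule 2: an answer resolves the topmost question and is recorded as shared.
        question = state.qud.pop()
        state.commitments[question] = content
    state.last_move = move
    return state


state = InformationState()
update(state, ("ask", "destination"))
update(state, ("answer", "Paris"))
print(state.commitments, state.qud)  # {'destination': 'Paris'} []
```

Because the state is an explicit data structure rather than a position in program control flow, other modules (for example a recogniser or a synthesiser) can inspect fields such as `last_move` directly.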
Analysis of a broad range of dialogue phenomena
Much spoken dialogue research has concentrated on information seeking dialogue (especially flight booking). Siridus was keen to broaden the research, and has performed data collection, analysis and implementations for natural command and negotiative dialogue. Natural command language dialogues immediately take us outside simple slot filling, in which multiple parameters are filled in for a single task, since multiple tasks are often specified in the same utterance, e.g. "Call Heather and transfer incoming calls to Peter". The work on negotiative dialogue challenges conventional views of how dialogue is structured. Corpus examples show that questions can remain unanswered, and that users negotiate with the agent as to which parameters they want to fix. This requires new ways for a system to structure a dialogue. We also discussed how the challenges arising in modelling tutorial dialogue could be met, in particular hinting as a tutorial strategy.
User centred dialogue
Siridus has developed new approaches to allow more user centred dialogue. Primarily this has focussed on allowing users a wider range of expression, such as providing corrections or asking questions. It has also included work on providing more informative system responses. For example, if there is no flight at 4pm on US Air, it is more helpful to say that there is a flight at 5pm than simply to say that no flights are available. A range of dialogue phenomena is examined in Deliverable D1.4.
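A minimal sketch of such a cooperative response, assuming a hypothetical timetable of (airline, departure hour) pairs; choosing the departure nearest to the requested time is an illustrative heuristic, not necessarily the strategy used in Siridus:

```python
from __future__ import annotations


def respond(airline: str, requested_hour: int, flights: list[tuple[str, int]]) -> str:
    """Answer a flight query, offering the nearest alternative rather than a bare 'no'."""
    hours = [hour for carrier, hour in flights if carrier == airline]
    if requested_hour in hours:
        return f"There is a {airline} flight at {requested_hour}:00."
    if not hours:
        return f"There are no {airline} flights."
    # Cooperative fallback: report the departure closest to the requested time.
    nearest = min(hours, key=lambda hour: abs(hour - requested_hour))
    return f"There is no {airline} flight at {requested_hour}:00, but there is one at {nearest}:00."


# Hypothetical timetable: (airline, departure hour on a 24-hour clock).
FLIGHTS = [("US Air", 9), ("US Air", 17), ("Delta", 16)]
print(respond("US Air", 16, FLIGHTS))
# There is no US Air flight at 16:00, but there is one at 17:00.
```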
The Siridus project investigated the use of task-based methods for robust interpretation (such as keyword spotting) and reconstruction methods based on putting together parse tree fragments. Two novel techniques were developed. The first is a "semantic chart", which provides a distributed semantic representation. This enables the use of information from keywords, linguistic structure and the information state within a unified framework (see Deliverable D4.1). The second technique is "semantic-based composition". This uses ontological information to provide connections between concepts, and hence between utterance fragments (see Deliverable D4.4).
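The following toy Python sketch illustrates the general idea of combining keyword spotting with ontology-driven composition of fragments. It is not the algorithm of Deliverable D4.4: the lexicon, the ontology and the greedy attachment strategy are all hypothetical simplifications.

```python
from __future__ import annotations

# Hypothetical toy ontology: for each concept, its roles and the type each role accepts.
ONTOLOGY = {
    "switch_on": {"device": "device"},
    "light": {"location": "room"},
}

# Hypothetical keyword lexicon: spotted word -> (concept, type).
LEXICON = {
    "switch": ("switch_on", "action"),
    "light": ("light", "device"),
    "kitchen": ("kitchen", "room"),
}


def spot(utterance: str) -> list[tuple[str, str]]:
    """Keyword spotting: keep only words found in the lexicon, ignoring everything else."""
    return [LEXICON[word] for word in utterance.lower().split() if word in LEXICON]


def compose(fragments: list[tuple[str, str]]) -> list[dict]:
    """Greedily attach each fragment to a role of another fragment whose ontology entry accepts its type."""
    nodes = [{"concept": concept, "type": kind} for concept, kind in fragments]
    attached: set[int] = set()
    for head in nodes:
        for role, wanted_type in ONTOLOGY.get(head["concept"], {}).items():
            for i, dependent in enumerate(nodes):
                if dependent is not head and i not in attached and dependent["type"] == wanted_type:
                    head[role] = dependent  # link the two fragments through the ontology role
                    attached.add(i)
                    break
    # Fragments not attached anywhere are the roots of the composed meaning.
    return [node for i, node in enumerate(nodes) if i not in attached]


print(compose(spot("uh switch um the kitchen er light")))
```

For the disfluent input above, the sketch yields a single `switch_on` structure whose device is a `light` located in the `kitchen`. Because attachment is driven by concept types rather than word order, reordered input such as "light in the kitchen switch on" yields the same structure.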
3.2 Project Deliverables
All deliverables have been provided on time according to the original Technical Annex, and the new Technical Annex agreed after SRI's withdrawal. Note that in the new Technical Annex, Deliverable 5.2 was replaced with Deliverable 1.4 according to the advice of the mid-term reviewers.
Key deliverables since the last review are D1.4, D2.3, D3.3, D3.4, D4.4, D5.1 and D6.4.
D1.1 Dialogue Moves in Natural Command Languages
This reports on the notion of a dialogue move as it can be applied to Natural Command Languages. The notion is examined theoretically and in the light of a corpus of collected user utterances in Spanish for the telephone operation scenario. A classification scheme is proposed, and applied to example dialogues.
D1.2 Dialogue Moves in Negotiative Dialogues
This report provides a theoretical analysis of negotiative dialogue, plus a look at concrete case studies in travel planning and calendar appointment negotiation. The deliverable also includes some novel theoretical ideas for the analysis and implementation of negotiative dialogues.
D1.3 Dialogue Move Specifications for the Dialogue Move Engine
This report specifies the information state updates required for negotiative dialogues and natural command languages. For negotiative dialogues, the report concentrates on the scenario where the user negotiates with the system over which parameters to supply for a particular task (e.g. an arrival time, rather than a departure time). The domain chosen is travel planning. For natural command languages, the report specifies how GoDIS can be used for the telephone operation scenario.
D1.4 Flexible Dialogue
This deliverable presents work on flexible dialogue management, i.e. general mechanisms needed for dealing with dialogue phenomena that fall outside the scope of current commercial systems, such as those based on (plain) VoiceXML. Examples include feedback, reraising, clarifications and conditional responses.
D2.1 Associating the Dialogue Move Engine with Speech Input
This deliverable discusses some of the issues in using information states to improve recognition performance. It also provides a grid containing a summary of features of competing recognisers.
D2.2 Associating the Dialogue Move Engine with Speech Output
This report outlines the considerations when choosing and interfacing a speech synthesiser to a baseline system. There is a summary of features of competing synthesisers, and these are matched to the particular requirements of a dialogue system.
D2.3 Possibilities for Enhancing Speech Recognition by Consulting Information States
This deliverable looks at ways in which speech recognition can be improved by the introduction of knowledge from dialogue state information. It discusses the integration of speech recognition with semantics and the dialogue manager, and also looks into ways in which, with less integration, state information can be used to choose the correct utterance from an n-best list.
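One loosely integrated option of the kind mentioned above can be sketched as re-ranking the recogniser's n-best list, giving a bonus to hypotheses that mention a value the current question expects. The bonus weight, the substring matching and the example hypotheses are all hypothetical; a real system would derive the expected values from the information state and the grammar.

```python
from __future__ import annotations


def rerank(n_best: list[tuple[str, float]], expected_values: set[str], bonus: float = 0.3) -> list[tuple[str, float]]:
    """Re-rank recogniser hypotheses, rewarding those that mention a value the dialogue state expects."""
    def score(hypothesis: tuple[str, float]) -> float:
        text, confidence = hypothesis
        fits_state = any(value in text.lower() for value in expected_values)
        return confidence + (bonus if fits_state else 0.0)

    return sorted(n_best, key=score, reverse=True)


# Hypothetical n-best list after the system has asked "Who would you like to call?".
n_best = [("call one pair is", 0.52), ("call juan perez", 0.47), ("fall on paris", 0.40)]
expected_callees = {"juan perez", "maria lopez"}  # would come from the information state
print(rerank(n_best, expected_callees)[0][0])  # call juan perez
```

With no expectations from the dialogue state, the list simply stays in recogniser-confidence order.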
D3.1 User Requirements on a Natural Command Language Dialogue System
This report discusses general user requirements for an automated Spanish-language telephony assistant, and the functions required by users. There is also a description of the system requirements. Sample dialogues and corpora are presented, and three similarly targeted systems (Bell Labs, Watson and Wildfire) are examined.
D3.2 Design of a Natural Command Language Dialogue System
This report provides a detailed design for the telephony assistant dialogue system. The design includes the hardware architecture, a multi-agent distributed software architecture, the dialogue manager, and the natural language understanding modules. The report also includes a complete list of the dialogue moves and types for the application domain.
D3.3 Implementation of a Natural Command Language Dialogue System
This document describes the implementation of the telephony assistant dialogue system. Hardware and software architecture are described, as well as the implementation of the different software agents, including a detailed description of the dialogue manager. An installation and a user guide complete the description of the implemented system.
D3.4 Evaluation of Contribution of the Information State Based View of Dialogue
This deliverable describes the information state approach and the two instantiations developed during the Siridus project, the TrindiKit and the Delfos system. These are described at 4 levels: framework, basic-system, genre-specific, and application level. The deliverable shows how the information state view has informed and influenced the implementations.
D4.1 Robust Linguistic Processing Architecture (baseline)
This report describes two approaches to robust linguistic processing. The first approach aims to repair the input to get a fully specified syntactic analysis. In contrast, the second approach works directly with partially specified syntactic analyses, and tries to balance partial information from the utterance against what is expected from the context. The report concludes by discussing whether it makes sense to combine the approaches, and how this might be done.
D4.2 Adding Linguistic Value to Task-centred Dialogue Management
This report examines the nature of a linguistically motivated dialogue system, and the relationship between linguistic motivation and the generality and robustness of different approaches. The report examines why many demonstrators are linguistically oriented only in that they use computational linguistics technology, rather than demonstrating a systematic, theoretical approach. The main focus, however, is on methods of improving robustness, flexibility and similar properties in a genuinely linguistically motivated framework. The discussion centres on methods for selecting a particular linguistic analysis from a set of hypotheses using non-linguistic information, typically task-specific information.
D4.3 Adding Task-centred Information to Linguistically Orientated Dialogue Management
This report describes a task-centred dialogue system that has been augmented with linguistic knowledge and processing. It builds on the work described in D4.1 and presents a phrase-spotting component that uses additional sources of information wherever possible (including linguistic knowledge and task-centred knowledge) to produce an analysis that degrades gracefully from fully linguistically structured output to the spotting of single words.
D4.4 Exploiting the Advantages of Task and Linguistically Orientated Dialogue Management
This deliverable examines different approaches to interpreting spoken utterances, and establishes some of the circumstances which favour the use of grammar-based approaches, keyword spotting, or a full semantics. We discuss two novel techniques that have been developed in Siridus. The first is a "semantic chart", which provides a distributed semantic representation. The second technique is "semantic-based composition". This uses ontological information to provide connections between concepts, and hence between utterance fragments.
D5.1 Improving System Output Using the Information State (Associating Information Structure with Prosodically Varied Speech Output)
The default intonation patterns provided by current speech synthesisers often differ from what is required in a particular context. This deliverable considers the use of the information state as a resource for deciding on variations in the realisation for generated utterances, in particular for deciding on the appropriate prosody.
We employ information structure as a level of meaning representation that unifies a range of interacting contextually dependent aspects of utterance realization. We define a set of rules for determining the information structure partitioning of utterance meaning according to the information state in the GoDIS system. Our main concern is contextually appropriate variation of prosodic realization, but we also discuss the use of information structure to determine word order and to choose between full vs. short utterances.
For the generation of contextually varied spoken output, we use off-the-shelf text to speech synthesis systems, for which we define mappings from our internal information structure annotation to intonation annotation. We describe the variations we can produce in our experimental implementations for English and German in the GoDIS system, and for Spanish in the system developed at Telefonica.
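As an illustration of mapping a focus annotation onto intonation markup, the following sketch marks contrastive focus in the alternative questions from Section 1 ("KITCHEN light or HALL light" vs. "kitchen LIGHT or kitchen FAN") using the `emphasis` element of the W3C SSML standard. This is a hypothetical simplification: here focus is found by comparing the two alternatives word by word, whereas the deliverable derives information structure from the GoDIS information state, and the annotation accepted by the synthesisers actually used may differ.

```python
from __future__ import annotations

from xml.sax.saxutils import escape


def mark_focus(words: list[str], alternative: list[str]) -> str:
    """Wrap each word that differs from the contrasted alternative in an SSML emphasis element."""
    marked = []
    for i, word in enumerate(words):
        contrastive = i >= len(alternative) or word != alternative[i]
        text = escape(word)
        marked.append(f'<emphasis level="strong">{text}</emphasis>' if contrastive else text)
    return " ".join(marked)


def alternative_question(first: str, second: str) -> str:
    """Build 'Did you mean the X or the Y?' with the contrasting words emphasised."""
    a, b = first.split(), second.split()
    return f"Did you mean the {mark_focus(a, b)} or the {mark_focus(b, a)}?"


print(alternative_question("kitchen light", "hall light"))
print(alternative_question("kitchen light", "kitchen fan"))
```

The result is an SSML fragment; a complete SSML document would wrap it in a `speak` element with the required attributes.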
D6.1 SIRIDUS System Architecture and Interface Report (baseline)
This deliverable discusses the requirements for a dialogue system architecture for the Siridus project and, after evaluating different architectures (DARPA Communicator, Open Agent Architecture, Verbmobil and TrindiKit), makes a first proposal for the architecture.
D6.2 Implemented SIRIDUS system Architecture (baseline)
This report describes the baseline Siridus system architecture. After describing the architecture of asynchronous TrindiKit, on which the Siridus architecture is based, the different components of this architecture are described. User requirements are discussed, and the architecture is validated against the TRINDI Tick-List and DISC grids.
D6.3 SIRIDUS System Architecture and Interface Report (Enhanced Version)
This deliverable gives a description of the enhanced Siridus conceptual architecture and of the systems developed in accordance with it. TrindiKit, an integrated toolkit for building and experimenting with dialogue systems, is presented.
D6.4 Implemented SIRIDUS System Architecture (enhanced)
This deliverable consists of TrindiKit 3.0 and documentation. TrindiKit is a toolkit for building and experimenting with dialogue systems based on the Information State Update approach, intended to support reusability, reconfigurability and flexible dialogue. Note that this is a draft for the deliverable due in month 36.
D7.1 Installation of Current Trindi Software at Telefonica and the University of Seville
This was a very brief report outlining the steps involved in installing TrindiKit at Telefonica and Seville.
3.3 Project Demonstrators
The Siridus project has built two main demonstrators. The first is the telephone operator dialogue system mentioned above. This demonstrator is in Spanish, and allows a user to conduct a dialogue such as the following (translated from Spanish):
U: Hello, I would like to place a collect call
S: Please specify a destination for the collect call
U: To the number 123456789
S: Placing the collect call. Would you like to continue?
U: Yes please
S: Please specify a function
U: I would like to transfer my calls to Juan Perez
The demonstrator allows a user to call people by name (e.g. "Phone Fred Smith"), to transfer calls (e.g. "Transfer my calls to Fred Smith") and to arrange conference calls. This saves users the effort of first looking up a number, e.g. in the corporate directory on the web, before making a call. The dialogue history is also used to enable functions that are not primitive operations of the PABX exchange, e.g. "retry last call". The demonstrator is described in detail in Deliverable D3.3.
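A function such as "retry last call" only requires the dialogue system to remember what it has already done. A minimal sketch, assuming a hypothetical `DIAL` command string in place of the real PABX interface:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CallHistory:
    """Hypothetical record of calls kept by the dialogue manager, not by the exchange."""
    numbers: list[str] = field(default_factory=list)

    def place_call(self, number: str) -> str:
        self.numbers.append(number)
        return f"DIAL {number}"  # hypothetical primitive command sent to the PABX

    def retry_last_call(self) -> str:
        # "Retry last call" is not a PABX primitive: it is rebuilt from the dialogue history.
        if not self.numbers:
            return "There is no previous call to retry."
        return self.place_call(self.numbers[-1])


history = CallHistory()
history.place_call("123456789")
print(history.retry_last_call())  # DIAL 123456789
```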
The second demonstrator is a toolkit, TrindiKit, for dialogue researchers, based on the Information State update view of dialogue. It enables implementation and comparison of different theories of dialogue, and experimentation with new modules, e.g. interpreters or generation components which access dialogue state information such as the last move. TrindiKit provides a flexible and general system architecture including an extensible library of modules, allowing dialogue researchers to plug in their particular module to test its effect on a whole system. Existing modules include shells for speech recognition (e.g. Nuance and IBM ViaVoice) and synthesis (e.g. Nuance Vocaliser, ViaVoice, and Festival). TrindiKit can run serially or asynchronously, and there is an interface to SRI's Open Agent Architecture designed to make it easier to plug and play components written in different programming languages. TrindiKit is supported under Windows, Linux and Solaris. The current TrindiKit can be downloaded from the Siridus website.
On top of TrindiKit there are several example dialogue systems and applications, including a telephone operator system in Spanish, and a travel booking system in Swedish and English. Further demonstrators illustrate particular pieces of theoretical work. Robust interpretation is incorporated within the Linguamatics home demonstrator. Modules for controlling intonation, and for generating conditional responses have been implemented within the TrindiKit.
The project results have been disseminated widely in the computational linguistics and dialogue community through refereed conference papers presented at ACL 2000, Götalog 2000, Bi-Dialog 2001, NAACL 2001, SEPLN 2001, WISP 2001, IJCAI 2001, SIGDIAL 2002, Edilog 2002, and TSD 2002. Journal articles and book chapters are in preparation.
A particularly good showcase for the project was the advanced course "The Information State Approach to Dialogue Management: Theory and Implementation", which was held as part of the European Summer School in Logic, Language and Information in Helsinki in 2001. Work in both Siridus and its predecessor project Trindi was presented to a wide audience.
The project also organised two workshops. The first, in Seville in April 2002, invited three leading experts in dialogue systems (Johanna Moore, Candy Sidner and David Traum) for in-depth comparison and discussion of Siridus results. The second, a participants' forum at European Research 2002 in Brussels, presented the Siridus results to a less specialist audience. The project also gave an invited presentation at the Edilog 2002 conference (semantics and pragmatics of dialogue).
The project created an International Consultation and User Group (ICUG). All members were informed of project deliverables and of major project events. The project web site includes links to electronic versions of relevant publications and project deliverables. We intend to augment this with a project showcase including examples of demonstration systems before the end of the project. The site includes instructions for downloading the latest version of TrindiKit, which was developed during the Siridus project. There have been 75 registered downloads. TrindiKit is now also available through SourceForge.
5. European added value
The project has brought together complementary expertise from across Europe. The European dimension of the project has also enabled a multilingual perspective, with demonstrators in Spanish, Swedish and English. The work on prosody has been especially informed by cross-linguistic concerns, in particular, the differing ways of realising focus phenomena in German, Swedish, Spanish, Czech and English.
6. Outlook and Exploitation
The Siridus project is a basic research project, and much of its exploitation will be indirect, through improvements to European research in advanced spoken dialogue. We believe the project was especially timely, given the increasing interest in finding ways to move from detailed descriptions of potential dialogues to more natural, flexible dialogue based on abstract descriptions of a task or domain. By emphasising reconfigurability and robustness, the project has provided techniques that should yield more user-friendly dialogues without incurring lower reliability or higher deployment costs.
Several partners in the EU 5th Framework D'Homme project used the generic techniques developed in Siridus as the basis for demonstrators of spoken dialogue for control of networked home appliances. The project results have also fed into the Homey EU project, which is investigating spoken dialogue in medical applications.
The Siridus partners are keen to continue further with the research, and are actively exploring ways of working together within the 6th Framework Programme.
Siridus was a successful project, with widely disseminated research results and software tools that are being used extensively within the research community. We believe it has provided good foundations for the next generation of spoken dialogue systems.
8. Annex: Publications and deliverables
The SIRIDUS publications and deliverables are listed on the project web site, from which they can be downloaded.