|
1. Project Overview
Automated spoken dialogue provides a natural way of communicating with networked home devices. This has applications for the elderly and disabled, but will also have wider
applications as homes become networked as a matter of course, and we become used to services being provided by multiple devices. The D'Homme project addresses the theoretical challenges in language understanding and
dialogue management for controlling and querying multiple networked devices from inside or outside the home, but stops short of issues such as microphone placement or speaker identification (we assume the use of a
mobile phone or a hand-held remote controller with integrated microphone).
The partners in D'Homme are the Universities of Gothenburg, Edinburgh and Seville, and the companies SRI, netdecisions and Telia. The partners contributed the following expertise:
- Edinburgh: advanced knowledge representation and reasoning, language engineering.
- Gothenburg: advanced, flexible dialogue, Swedish language.
- Seville: dialogue management and language engineering, Spanish language.
- Telia: advanced dialogue and action management, networking real devices.
- netdecisions: language engineering (grammar based models), English language.
- SRI: language engineering (statistical models), project management.
Given the exploratory nature of this one-year project, and the existence of considerable background at each site, we decided that partners should adapt their own dialogue systems rather than attempting to build a
single new system. However, partners were encouraged to share modules wherever possible, and this has been widespread: partly due to the use of an agreed functional architecture, partly due to different partners
concentrating their efforts on different modules. The results of the project have been considerable for such a short time frame. There have been publications in 6 different refereed workshops/conferences,
and a large number of substantive project reports outlining the project achievements and challenges. The project demonstrators show the feasibility of spoken dialogue for networked devices, and show the directions for
the future. They have had wide visibility, both in presentations to companies and at academic events. The
Audio-Visual presentation gives a less-technical introduction to the project, and short slide shows of some of our demo systems in operation are available:
(Note that the HTML presentations have been generated using PowerPoint and may not work on all versions of all browsers. The PowerPoint presentations all contain audio.)

2. Project Objectives
D'Homme aimed to facilitate the use of natural spoken language
interfaces in communicating with small programmable devices in the home environment. Our key research questions were:
- What sorts of dialogues will humans want and be able to have with networked domestic programmable devices?
- What processing architectures and representations best support such dialogues?
- What demands do reconfigurable device networks and language processing components impose?
Specific objectives were to:
- Build baseline demonstrators in English, Spanish and Swedish as proof of concept
- Explore and evaluate methods for reconfigurability in the light of plug-and-play device networks
- Evaluate our progress, consortium profile and user involvement strategy with a view to a second, more ambitious project

3. Project Results and Achievements
The project as a whole has
shown the feasibility of using spoken dialogue with networked devices. It has also stimulated research in advanced dialogue more generally. The home control domain is very different from traditional information
seeking domains, and challenges many assumptions about the way in which spoken dialogue systems should be constructed. User-initiated complex commands, such as "turn on the light and the heater", which may require
follow up, do not fit easily into the form filling paradigm exemplified by the current standard for dialogue systems, VoiceXML. Dealing with plug and play issues, where devices may be plugged in and out of a network,
provides a huge challenge to conventional approaches to language modelling and knowledge management in dialogue systems. To address the plug and play issue for knowledge management, the D'Homme project
has provided proposals involving distributing information between individual devices or device types, involving the use of multiple dimensional inheritance to ensure maximum reuse of information, and involving the use
of semantic networks to conveniently embody necessary inferences (e.g. a dimmer is a kind of light). To address plug and play for language modelling in grammar based systems there is a proposed solution and
implementation using a high level generic grammar, with specialisations via a feature system. For statistical language modelling, the challenges are greater, and we are investigating hybrid statistical/grammar
based solutions. The key objectives of the project have all been met. We collected a corpus of commands and queries by getting users to interact with early versions of the project demonstrators. A Wizard of
Oz data collection effort was beyond the scope of a one-year project. We have provided discussion of the architectures and representations required in this domain, especially with regard to reconfigurability. Finally, we
have provided demonstrators in English, Spanish and Swedish which are much more advanced than the baseline demonstrators anticipated. Brief outlines of the public project deliverables
are provided below, followed by some discussion of the project dissemination activities.

D1.1 Standards in Home Automation
This deliverable examines leading emerging industry standards for common device interfaces and abstractions in Home Automation, and VoiceXML, the current standard for
dialogue specification. It concludes by discussing how these standards might be leveraged in the D'Homme project.

D2.2 A D'Homme Demonstrator in English, Spanish and Swedish
D2.2
describes the implementations built by each site (and across sites) and highlights the areas of particular interest for each, giving examples of each system in use for the command and control of real and simulated
devices. The functional architecture common to the systems is laid out along with the interfaces to key modules. Finally, the way in which modules have been put together to form each demonstrator system is discussed.

D3.1 Configuring Linguistic Components in a Plug and Play Environment
This report discusses the configuration of linguistic components for the home domain, including speech
recognition, interpretation and contextual interpretation. The deliverable provides an evaluation of grammar-based and statistical language modelling at the word, sentence and semantic level. This evaluation was
performed on a reasonably substantial corpus of utterances collected for the domain from users of prototype demonstrator systems. D3.1 also explores the notion of plug and play, and there is a
well worked-out proposal for grammar based approaches. The deliverable includes a comprehensive set of dialogue moves for the domain, and provides several interesting approaches to contextual interpretation.
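
The grammar based approach to plug and play (a generic grammar specialised through a feature system) can be illustrated with a toy sketch. This is not the D'Homme grammar formalism or implementation: the class names, the rule table and the feature names (`switchable`, `dimmable`) are all hypothetical, chosen only to show how generic rules can stay fixed while devices contribute feature-marked vocabulary as they join or leave the network.

```python
from dataclasses import dataclass

# Hypothetical generic rules: each command verb requires one feature
# on the device noun it applies to. These stay fixed as devices change.
GENERIC_RULES = {
    "turn on": "switchable",
    "turn off": "switchable",
    "dim": "dimmable",
}


@dataclass(frozen=True)
class DeviceLexicon:
    """Illustrative lexical entry a device contributes when plugged in."""
    noun: str
    features: frozenset


class PlugAndPlayGrammar:
    """Toy grammar whose coverage follows the devices currently present."""

    def __init__(self):
        self._lexicon = {}

    def plug_in(self, entry):
        # A new device adds vocabulary; the generic rules are untouched.
        self._lexicon[entry.noun] = entry

    def unplug(self, noun):
        # Removing a device removes its vocabulary from the grammar.
        self._lexicon.pop(noun, None)

    def parse(self, utterance):
        """Return (verb, noun) if the utterance is licensed, else None."""
        text = utterance.lower().strip()
        for verb, required in GENERIC_RULES.items():
            prefix = verb + " the "
            if text.startswith(prefix):
                noun = text[len(prefix):]
                entry = self._lexicon.get(noun)
                # The feature check specialises the generic rule.
                if entry is not None and required in entry.features:
                    return (verb, noun)
        return None
```

In this sketch a plain light would accept "turn on the light" but reject "dim the light", while plugging in a device marked `dimmable` makes the "dim" rule available for it without any change to the rules themselves.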

D4.1 Knowledge and Action Management in the Home Device Environment
This deliverable examines different aspects of knowledge and action management in the D'Homme domain. For complex devices
such as VCRs and mobile phones, we show how existing knowledge from menu-based systems can be reused to provide more flexible spoken dialogue control. For setting up a dialogue system in a particular home, a Home Setup
Agent which captures necessary inferential relationships is described, and for checking consistency of user instructions and the state of the home, a model building approach is developed. Finally, the deliverable
investigates modelling plug and play at the knowledge management level, looking at issues of distribution of knowledge and inheritance.

D5.1 The D'Homme Device Selection
D5.1 identifies sets of core and advanced devices for which interfaces in English, Spanish and/or Swedish have been built. Basic devices such as lights are core, while advanced devices include VCRs and autonomous
sensors. For each device, D5.1 documents its interface (using UPnP) and typical interaction scenarios.

D5.2 D'Homme Devices
This deliverable describes how the devices in D5.1
have been integrated into the demonstrators and outlines the practical and theoretical issues in demonstrating the various interaction scenarios. The devices covered include real appliances on X10 or LonWorks networks,
and simulated devices such as VCRs and autonomous sensors.

D6 Evaluating D'Homme (not yet completed)
D6 evaluates the D'Homme demonstrators with regard to general usability,
coverage of the intended interaction scenarios described in D5.1, plug and play characteristics and language portability.

Dissemination
A project web site has been set up at
http://www.ling.gu.se/projekt/dhomme. It includes links to a series
of overview slides for the project, electronic versions of all relevant publications and project deliverables, and demonstration systems. The demonstrations currently include dry-run screenshots, and a live, interactive
demo using a simulated home. Invitations to join the International Consultative and User Group (ICUG) have had a good response, and the ICUG has grown to around 20 members. The project has also been presented
at the ESSLLI Summer School, and to various potential industrial users (details are provided in the Dissemination and Use Plan). The publication rate for a one-year project has been excellent, with refereed papers at 6
different conferences and workshops (see the Annex for a listing).

4. Methodologies
In this section of the report we first consider direct competitors to the approach used in D'Homme, then two key alternative approaches: speech recognition for
individual devices, and menu based interaction with networked devices.

Spoken dialogue for networked devices
There are already some commercial offerings providing speech recognition
for home control, e.g. PowerHome and HomeVoice. These allow the user to set up the system, associating particular strings of words with particular commands, or sequences of commands. These systems do not have advanced
language understanding or dialogue capability, and do not support plug and play, except in the sense that the user can always change the association of strings and commands. Users need to be prepared to invest in the
initial set up of the system, and to remember the precise way in which to express a command. Slightly more advanced spoken dialogue capability is offered within Smart House demonstrators, e.g. the
Orange Smart House in the south of England. This uses a finite state grammar which makes it easier to map a large number of alternative utterances to a particular command. However, as far as we are aware the system does
not have plug and play capability or advanced language/dialogue capabilities.

Speech recognition embedded in individual devices
Recognition in individual devices has the advantage
that there is no need for the devices to be on a network, and there are no concerns over plug and play. However, this approach loses a lot of functionality: users can no longer interrogate the state of play of the whole
house (e.g. "have I left anything on") or switch off every light with one command. Embedded recognisers also tend to be of poorer quality. The approach is not suitable for remote control of the house, or for the
disabled who may want to control devices outside the room they are in, e.g. security lights, door locks, curtains etc.

Use of touch screens on a PDA
We can imagine controlling the house
through menus, or a touch screen. This may be quicker for e.g. turning on a single device, and the possibility is provided as an alternative to speech in one of the D'Homme demonstrators. This method can be used for
remote control of a networked home, and within the home. A disadvantage is that it requires more familiarity with computers, which may be a problem for the elderly. It is also not so suitable for complex commands or
queries. Once we start to want to program our devices, natural language provides much more flexibility, as shown by the work in D'Homme on converting from menus to dialogue.

5. European added value
The project has brought together complementary
expertise from across Europe: advanced dialogue management (Gothenburg), a Smart House with real networked appliances (Telia), statistical language modelling for dialogue systems (SRI), grammar based language
understanding (netdecisions), inference based knowledge management (Edinburgh), and linguistic engineering and semantic networks (Seville). Bringing together different research groups with different
theoretical leanings into a common objective has undoubtedly led to a project with a wider theoretical underpinning than one which would have resulted from a project based at a single site, or within a single country.
An early project meeting (9/10th January) and regular all-site meetings were crucially important in ensuring a fast start up and in maintaining connectivity between the sites. The European aspect
of the project has also allowed it to address multilinguality issues, with demonstrators in Spanish, Swedish and English. Portability between languages has been illustrated by English versions of the Spanish and Swedish
demonstrators.

6. Outlook
The project partners
are actively pursuing commercial opportunities with organisations for the disabled, and with home construction/security companies. We also anticipate that the project will feed into work in the 6th
Framework Programme on ambient intelligence, which has a similar emphasis on distributed device networks, and user friendly human computer interaction.

7. Conclusions
This has been a highly successful project. The key
objectives have been met, and the demonstrators are more advanced than expected. All sites have been very committed, and interest has increased as the project has proceeded. The home domain provides very clear and
interesting practical and research challenges to spoken dialogue systems. The section above has outlined some of the direct possibilities for exploitation of the project. Moreover, the solutions to problems in this
domain also have wider applicability. For home control it was obvious that we needed to address plug and play issues. However, any advanced dialogue system which deals with a changing domain (whether in modelling
different users, or in adapting to changing data) has elements of the same problem.

8. Annex
Publications and Deliverables
The D'Homme project publications and deliverables are available on the project web site.
Electronic Resources
Web site: http://www.ling.gu.se/projekt/dhomme/ which includes:
- Technical Showcase (this document)
- Audio-Visual Presentation
|