Technical Showcase

1. Project Overview

Automated spoken dialogue provides a natural way of communicating with networked home devices. This has applications for the elderly and disabled, but will also have wider applications as homes become networked as a matter of course, and we become used to services being provided by multiple devices. The D'Homme project addresses the theoretical challenges in language understanding and dialogue management for controlling and querying multiple networked devices from inside or outside the home, but stops short of issues such as microphone placement or speaker identification (we assume the use of a mobile phone or a hand held remote controller with integrated microphone).

The partners in D'Homme are the Universities of Gothenburg, Edinburgh and Seville, and the companies SRI, netdecisions and Telia. The partners contributed the following expertise:

  • Edinburgh: advanced knowledge representation and reasoning, language engineering.
  • Gothenburg: advanced, flexible dialogue; Swedish language.
  • Seville: dialogue management and language engineering; Spanish language.
  • Telia: advanced dialogue and action management, networking real devices.
  • netdecisions: language engineering (grammar based models); English language.
  • SRI: language engineering (statistical models), project management.

Given the exploratory nature of this one year project, and the existence of considerable background at each site, we decided that partners should adapt their own dialogue systems rather than attempting to build a single new system. However, partners were encouraged to share modules wherever possible, and such sharing has been widespread, partly because of the use of an agreed functional architecture and partly because different partners concentrated their efforts on different modules.

The results of the project have been considerable for such a short time frame. There have been publications in six different refereed workshops and conferences, and a large number of substantive project reports outlining the project achievements and challenges. The project demonstrators show the feasibility of spoken dialogue for networked devices, and indicate directions for the future. They have had wide visibility, both in presentations to companies and at academic events.

The Audio-Visual presentation gives a less technical introduction to the project, and short slide shows of some of our demo systems in operation are also available.

(Note that the HTML presentations have been generated using PowerPoint and may not work on all versions of all browsers. The PowerPoint presentations all contain audio.)

2. Project Objectives

D'Homme aimed to facilitate the use of natural spoken language interfaces in communicating with small programmable devices in the home environment. Our key research questions were:

  • What sorts of dialogues will humans want and be able to have with networked domestic programmable devices?
  • What processing architectures and representations best support such dialogues?
  • What demands do reconfigurable device networks and language processing components impose?

Specific objectives were to:

  • Build baseline demonstrators in English, Spanish and Swedish as proof of concept
  • Explore and evaluate methods for reconfigurability in the light of plug-and-play device networks
  • Evaluate our progress, consortium profile and user involvement strategy with a view to a second, more ambitious project

3. Project Results and Achievements

The project as a whole has shown the feasibility of using spoken dialogue with networked devices. It has also stimulated research in advanced dialogue more generally. The home control domain is very different from traditional information seeking domains, and challenges many assumptions about the way in which spoken dialogue systems should be constructed. User-initiated complex commands, such as "turn on the light and the heater", which may require follow up, do not fit easily into the form filling paradigm exemplified by the current standard for dialogue systems, VoiceXML. Dealing with plug and play issues, where devices may be plugged in and out of a network, provides a huge challenge to conventional approaches to language modelling and knowledge management in dialogue systems.
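As a rough illustration of why a conjoined command does not map onto a single form with one device slot, the sketch below splits such an utterance into one action per device. The vocabulary and the `parse_command` function are hypothetical and are not taken from any of the D'Homme systems; a real system would work from recogniser output and a proper grammar.

```python
import re

# Hypothetical vocabulary for illustration; not taken from the D'Homme systems.
ACTIONS = {"turn on": "on", "turn off": "off"}
DEVICES = {"light", "heater", "tv"}


def parse_command(utterance):
    """Split a conjoined command into a list of (action, device) pairs."""
    text = utterance.lower().strip()
    for phrase, action in ACTIONS.items():
        if text.startswith(phrase):
            rest = text[len(phrase):]
            break
    else:
        return []  # no known action phrase at the start
    pairs = []
    # Device mentions are assumed to be separated by "and" or commas.
    for part in re.split(r"\band\b|,", rest):
        words = [w for w in part.split() if w != "the"]
        if len(words) == 1 and words[0] in DEVICES:
            pairs.append((action, words[0]))
    return pairs


print(parse_command("Turn on the light and the heater"))
```

Each resulting pair may succeed or fail independently (the heater may already be on, or "the light" may be ambiguous), which is where the follow up dialogue mentioned above comes in.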

To address the plug and play issue for knowledge management, the D'Homme project has provided proposals that involve distributing information between individual devices or device types, using multiple dimensional inheritance to ensure maximum reuse of information, and using semantic networks to conveniently embody necessary inferences (e.g. a dimmer is a kind of light). To address plug and play for language modelling in grammar based systems, there is a proposed solution and implementation using a high level generic grammar, with specialisations via a feature system. For statistical language modelling, the challenges are greater, and we are investigating hybrid statistical/grammar based solutions.
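The role of a semantic network with inheritance along more than one dimension can be sketched as follows. Only the link "a dimmer is a kind of light" comes from the text; the other types, the command lists and the function names are assumptions made for illustration, and this is not the representation used in the project deliverables.

```python
# Hypothetical is-a links; only "dimmer is a kind of light" comes from the text.
IS_A = {
    "dimmer": ["light", "adjustable"],
    "light": ["switchable"],
    "heater": ["switchable", "adjustable"],
}

# Commands are stored once, on the most general type that supports them.
COMMANDS = {
    "switchable": ["turn on", "turn off"],
    "adjustable": ["turn up", "turn down"],
}


def ancestors(kind):
    """Return kind plus every type reachable through is-a links."""
    seen = []
    stack = [kind]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(IS_A.get(current, []))
    return seen


def is_a(kind, target):
    """True if kind is, directly or by inheritance, a kind of target."""
    return target in ancestors(kind)


def commands_for(kind):
    """Collect the commands a device type inherits from all its ancestors."""
    found = set()
    for t in ancestors(kind):
        found.update(COMMANDS.get(t, []))
    return sorted(found)


print(is_a("dimmer", "light"))
print(commands_for("dimmer"))
```

Because a newly plugged-in device type only has to declare its is-a links, it picks up existing commands without any of them being restated, which is the reuse the proposals aim for.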

The key objectives of the project have all been met. We collected a corpus of commands and queries by getting users to interact with early versions of the project demonstrators. A Wizard of Oz data collection effort was beyond the scope of a one-year project. We have provided discussion of the architectures and representations required in this domain, especially with regard to reconfigurability. Finally, we have provided demonstrators in English, Spanish and Swedish which are much more advanced than the baseline demonstrators anticipated.

Brief outlines of the public project deliverables are provided below, followed by some discussion of the project dissemination activities.

D1.1 Standards in Home Automation

This deliverable examines leading emerging industry standards for common device interfaces and abstractions in Home Automation, and VoiceXML, the current standard for dialogue specification. It concludes by discussing how these standards might be leveraged in the D'Homme project.

D2.2 A D'Homme Demonstrator in English, Spanish and Swedish

D2.2 describes the implementations built by each site (and across sites) and highlights the areas of particular interest for each, giving examples of each system in use for the command and control of real and simulated devices. The functional architecture common to the systems is laid out along with the interfaces to key modules. Finally, the way in which modules have been put together to form each demonstrator system is discussed.

D3.1 Configuring Linguistic Components in a Plug and Play Environment

This report discusses the configuration of linguistic components for the home domain, including speech recognition, interpretation and contextual interpretation. The deliverable provides an evaluation of grammar-based and statistical language modelling at the word, sentence and semantic level. This evaluation was performed on a reasonably substantial corpus of utterances collected for the domain from users of prototype demonstrator systems.

D3.1 also explores the notion of plug and play, and includes a well worked-out proposal for grammar based approaches. The deliverable includes a comprehensive set of dialogue moves for the domain, and provides several interesting approaches to contextual interpretation.
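One way to picture plug and play for a grammar based system is a set of generic rules that are specialised by the features of whichever devices are currently connected. The sketch below is only an illustration under that assumption: the rule wording, the feature names and the `Network` class are hypothetical and are not the D3.1 proposal itself.

```python
# Generic command rules, each tagged with the device feature it requires.
# Rule wording and feature names are hypothetical.
GENERIC_RULES = [
    ("turn on the {device}", "switchable"),
    ("turn off the {device}", "switchable"),
    ("dim the {device}", "dimmable"),
    ("record on the {device}", "recordable"),
]


class Network:
    """Tracks connected devices and derives the currently valid phrases."""

    def __init__(self):
        self.devices = {}  # device name -> set of features

    def plug_in(self, name, features):
        self.devices[name] = set(features)

    def unplug(self, name):
        self.devices.pop(name, None)

    def active_phrases(self):
        """Instantiate each generic rule for every device that licenses it."""
        phrases = []
        for name, features in sorted(self.devices.items()):
            for template, needed in GENERIC_RULES:
                if needed in features:
                    phrases.append(template.format(device=name))
        return phrases


net = Network()
net.plug_in("lamp", ["switchable", "dimmable"])
print(net.active_phrases())
```

Plugging in or removing a device changes the set of accepted phrases without any edit to the generic rules.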

D4.1 Knowledge and Action Management in the Home Device Environment

This deliverable examines different aspects of knowledge and action management in the D'Homme domain. For complex devices such as VCRs and mobile phones, we show how existing knowledge from menu-based systems can be reused to provide more flexible spoken dialogue control. For setting up a dialogue system in a particular home, a Home Setup Agent which captures necessary inferential relationships is described, and for checking consistency of user instructions and the state of the home, a model building approach is developed. Finally, the deliverable investigates modelling plug and play at the knowledge management level, looking at issues of distribution of knowledge and inheritance.
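The idea of reusing menu knowledge can be sketched by indexing the leaves of an existing menu tree so that a spoken request can reach a function directly, without stepping through each menu level. The VCR menu, the command codes and the function names below are invented for illustration and do not come from D4.1; the matching is simple substring search, not real language understanding.

```python
# Hypothetical VCR menu; the structure and command codes are illustrative only.
VCR_MENU = {
    "settings": {
        "clock": {"set time": "SET_TIME"},
        "language": {"english": "LANG_EN", "swedish": "LANG_SV"},
    },
    "recording": {"timer record": "TIMER_REC"},
}


def index_menu(menu, path=()):
    """Map every leaf label to (menu path, command code)."""
    index = {}
    for label, child in menu.items():
        here = path + (label,)
        if isinstance(child, dict):
            index.update(index_menu(child, here))
        else:
            index[label] = (here, child)
    return index


def find_command(utterance, index):
    """Return the entry for the longest leaf label mentioned in the utterance."""
    text = utterance.lower()
    for label in sorted(index, key=len, reverse=True):
        if label in text:
            return index[label]
    return None


index = index_menu(VCR_MENU)
print(find_command("I want to set time on the video", index))
```

The stored menu path is kept alongside the command code so that a dialogue manager could still fall back to menu-style prompting when the request is incomplete.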

D5.1 The D'Homme Device Selection

D5.1 identifies sets of core and advanced devices for which interfaces in English, Spanish and/or Swedish have been built. Basic devices such as lights are core, while advanced devices include VCRs and autonomous sensors. For each device, D5.1 documents its interface (using UPnP) and typical interaction scenarios.

D5.2 D'Homme Devices

This deliverable describes how the devices in D5.1 have been integrated into the demonstrators and outlines the practical and theoretical issues in demonstrating the various interaction scenarios. The devices covered include real appliances on X10 or LonWorks networks, and simulated devices such as VCRs and autonomous sensors.

D6 Evaluating D'Homme (not yet completed)

D6 evaluates the D'Homme demonstrators with regard to general usability, coverage of the intended interaction scenarios described in D5.1, plug and play characteristics and language portability.

Dissemination

A project web site has been set up at http://www.ling.gu.se/projekt/dhomme. It includes links to a series of overview slides for the project, electronic versions of all relevant publications and project deliverables, and demonstration systems. The demonstrations currently include dry-run screenshots and a live, interactive demo using a simulated home.

Invitations to join the International Consultative and User Group (ICUG) have had a good response, and the group has grown to around 20 members. The project has also been presented at the ESSLLI Summer School, and to various potential industrial users (details are provided in the Dissemination and Use Plan). The publication rate for a one year project has been excellent, with refereed papers at six different conferences and workshops (see the Annex for a listing).

4. Methodologies

In this section of the report we first consider direct competitors to the approach used in D'Homme, then two key alternative approaches: speech recognition for individual devices, and menu based interaction with networked devices.

Spoken dialogue for networked devices

There are already some commercial offerings providing speech recognition for home control, e.g. PowerHome and HomeVoice. These allow the user to set up the system, associating particular strings of words with particular commands, or sequences of commands. These systems do not have advanced language understanding or dialogue capability, and do not support plug and play, except in the sense that the user can always change the association of strings and commands. Users need to be prepared to invest in the initial set up of the system, and to remember the precise way in which to express a command.

Slightly more advanced spoken dialogue capability is offered within Smart House demonstrators, e.g. the Orange Smart House in the south of England. This uses a finite state grammar, which makes it easier to map a large number of alternative utterances to a particular command. However, as far as we are aware, the system does not have plug and play capability or advanced language/dialogue capabilities.
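The difference between associating fixed strings with commands and using a finite state grammar can be shown with a small sketch. A regular expression is used here as a compact way of writing a finite state grammar; the phrases, the command name and both lookup functions are assumptions for illustration and do not describe PowerHome, HomeVoice or the Orange Smart House.

```python
import re

# Hypothetical phrase table in the style of string-to-command association.
EXACT = {"kitchen light on": "KITCHEN_LIGHT_ON"}

# A regular expression is one compact way to write a finite state grammar.
# The wording alternatives below are illustrative assumptions.
KITCHEN_LIGHT_ON = re.compile(
    r"^(please )?(turn|switch) "
    r"(on the kitchen (light|lamp)|the kitchen (light|lamp) on)"
    r"( please)?$"
)


def exact_lookup(utterance):
    """Succeeds only if the user says the stored string word for word."""
    return EXACT.get(utterance.lower())


def grammar_lookup(utterance):
    """Succeeds for any wording the finite state pattern accepts."""
    if KITCHEN_LIGHT_ON.match(utterance.lower()):
        return "KITCHEN_LIGHT_ON"
    return None


print(exact_lookup("switch on the kitchen lamp"))
print(grammar_lookup("switch on the kitchen lamp"))
```

The single pattern accepts 32 distinct wordings of the same command, whereas the phrase table needs one entry per wording. Neither approach adapts by itself when a device is added or removed.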

Speech recognition embedded in individual devices

Recognition in individual devices has the advantage that there is no need for the devices to be on a network, and there are no concerns over plug and play. However, this approach loses a lot of functionality: users can no longer interrogate the state of the whole house (e.g. "have I left anything on?") or switch off every light with one command. Embedded recognisers also tend to be of poorer quality. The approach is not suitable for remote control of the house, or for disabled users who may want to control devices outside the room they are in, e.g. security lights, door locks or curtains.
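The whole-house functions mentioned above depend on having a single view of all device states, which isolated devices cannot provide. A minimal sketch, assuming a hypothetical `Device` record and an in-memory list standing in for the device network:

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical record for one networked device."""
    name: str
    kind: str
    room: str
    on: bool = False


def devices_left_on(devices):
    """Answer a query such as "have I left anything on?"."""
    return [d.name for d in devices if d.on]


def switch_off_all(devices, kind):
    """Switch off every device of the given kind; return the names changed."""
    changed = []
    for d in devices:
        if d.kind == kind and d.on:
            d.on = False
            changed.append(d.name)
    return changed


house = [
    Device("hall light", "light", "hall", on=True),
    Device("heater", "heater", "lounge", on=True),
    Device("porch light", "light", "porch"),
]
print(devices_left_on(house))
print(switch_off_all(house, "light"))
```

With recognition embedded in each device, neither function has anywhere to run, since no component knows about more than one device.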

Use of touch screens on a PDA

We can imagine controlling the house through menus, or a touch screen. This may be quicker for simple tasks such as turning on a single device, and the possibility is provided as an alternative to speech in one of the D'Homme demonstrators. This method can be used for remote control of a networked home, and within the home. A disadvantage is that it requires more familiarity with computers, which may be a problem for the elderly. It is also less suitable for complex commands or queries. Once we start to want to program our devices, natural language provides much more flexibility, as shown by the work in D'Homme on converting from menus to dialogue.

5. European added value

The project has brought together complementary expertise from across Europe: advanced dialogue management (Gothenburg), a Smart House with real networked appliances (Telia), statistical language modelling for dialogue systems (SRI), grammar based language understanding (netdecisions), inference based knowledge management (Edinburgh), and linguistic engineering and semantic networks (Seville).

Bringing together research groups with different theoretical leanings around a common objective has undoubtedly given the project a wider theoretical underpinning than would have resulted from a project based at a single site, or within a single country. An early project meeting (9/10th January) and regular all-site meetings were crucially important in ensuring a fast start up and in maintaining connectivity between the sites.

The European aspect of the project has also allowed it to address multilinguality issues, with demonstrators in Spanish, Swedish and English. Portability between languages has been illustrated by English versions of the Spanish and Swedish demonstrators.

6. Outlook

The project partners are actively pursuing commercial opportunities with organisations for the disabled, and with home construction/security companies. We also anticipate that the project will feed into work in the 6th Framework Programme on ambient intelligence, which has a similar emphasis on distributed device networks and user friendly human computer interaction.

7. Conclusions

This has been a highly successful project. The key objectives have been met, and the demonstrators are more advanced than expected. All sites have been very committed, and interest has increased as the project has proceeded. The home domain provides very clear and interesting practical and research challenges to spoken dialogue systems. The section above has outlined some of the direct possibilities for exploitation of the project. Moreover, the solutions to problems in this domain also have wider applicability. For home control it was obvious that we needed to address plug and play issues. However, any advanced dialogue system which deals with a changing domain (whether in modelling different users, or in adapting to changing data) has elements of the same problem.

8. Annex

Publications and Deliverables

The D'Homme project publications and deliverables are available from the project web site.

Electronic Resources

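Web site: http://www.ling.gu.se/projekt/dhomme/, which includes overview slides, electronic versions of the publications and deliverables, and the demonstration systems.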

 

 
