The Internet of Things (IoT) is expected to connect hundreds of billions of devices that sense and communicate information for a wide range of uses such as healthcare, vehicular systems, and industrial environments. As one of the most natural forms of communication, speech will increasingly serve as the primary mode of interaction between humans and IoT devices. In recent years, research has shown clear links between an individual's emotional and mental state and certain patterns in that individual's speech. If these patterns are detected in a timely fashion, it becomes possible to build emotion-aware IoT solutions, which could be used to adapt a system to better meet a user's needs, to prevent human error, to detect and deter potentially malicious user activities, and to initiate medical interventions. The overarching goal of this project is therefore to advance speech-based emotion analysis to enable the design of such emotion-aware IoT solutions. The project will also enrich the team's ongoing outreach and educational efforts, including mentorship of minority and high-school students, revision of existing courses and development of new ones aligned with the project's research challenges, and tight integration of research activities with undergraduate education.
The technical challenges of the project are organized into three main thrusts. First, the project will develop and evaluate multi-modal emotion detection systems, in which speech analysis is coupled with physiological metrics such as heart rate, galvanic skin response, or skin temperature to more accurately determine an individual's emotional state. Second, the work will apply topic modeling to perform context-aware analysis of speech data, which will also help differentiate short-term emotions (i.e., the current mood of an individual) from long-term emotional states (e.g., depression). Topic modeling is an increasingly popular technique for learning, recognizing, and extracting the topics of spoken commands or conversations, providing additional context for more accurate emotion analysis. The primary outcomes of the first two thrusts will be new insights into the design and development of emotion-aware systems. Achieving this goal, however, requires a comprehensive database of speech and physiological data annotated with the emotional states of the users; the third thrust of the project will therefore build such a database. When completed, the database will contain speech samples and other data from over 500 individuals and will be made available to the general scientific community to advance research beyond the team's institution.
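To make the first thrust concrete, the sketch below illustrates feature-level (early) fusion of speech-derived and physiological features for binary emotion classification. It is a minimal illustration only, not the project's method: the feature names, the synthetic data, and the arousal-style label are all assumptions introduced for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Hypothetical per-utterance features: e.g., pitch mean, energy, speaking
# rate, jitter (speech) and heart rate, galvanic skin response, skin
# temperature (physiological). Values here are synthetic placeholders.
speech = rng.normal(size=(n, 4))
physio = rng.normal(size=(n, 3))

# Synthetic binary "high/low arousal" label, defined for illustration only.
labels = (speech[:, 0] + physio[:, 0] > 0).astype(int)

# Feature-level fusion: concatenate both modalities into one vector,
# standardize, and train a single classifier on the fused representation.
X = np.hstack([speech, physio])
X = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(X, labels)
acc = clf.score(X, labels)
```

A late-fusion variant would instead train one classifier per modality and combine their predicted probabilities; which strategy works better is exactly the kind of question the first thrust would evaluate.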
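For the second thrust, topic modeling of spoken commands can be sketched with Latent Dirichlet Allocation (LDA) over a bag-of-words representation. This is a generic illustration of the technique, not the project's pipeline; the toy transcripts and the choice of two topics are assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy transcripts of spoken commands (placeholders for real speech data).
docs = [
    "turn up the thermostat it is cold in here",
    "play some relaxing music please",
    "the heating is too cold turn it up",
    "play my workout music playlist",
]

# Bag-of-words counts, dropping common English stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit LDA with two assumed latent topics; each document is mapped to a
# probability distribution over those topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_dist = lda.fit_transform(counts)
```

Each row of `topic_dist` sums to one; in an emotion-aware system, this per-utterance topic distribution would serve as the additional context feature described above.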
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.