20th International Conference on Speech and Computer


Tanja Schultz

Cognitive Systems Lab, University of Bremen, and Carnegie Mellon University, Pittsburgh

Advances in Biosignal-Based Spoken Communication

Abstract: Speech is a complex process emitting a wide range of biosignals, including, but not limited to, acoustics. These biosignals – stemming from the articulators, the articulator muscle activities, the neural pathways, and the brain itself – can be used to circumvent limitations of conventional speech processing in particular, and to gain insights into the process of speech production in general.

In my talk I will present ongoing research at the Cognitive Systems Lab (CSL), where we apply machine learning methods to process and interpret a variety of speech-related activities such as muscle and brain activities with the goal of creating biosignal-based speech processing devices for communication applications in everyday situations and of gaining a deeper understanding of spoken communication. Several applications will be described such as Silent Speech Interfaces that rely on articulatory muscle movements captured by electromyography to recognize and synthesize silently produced speech, Brain-to-text interfaces that use brain activity captured by electrocorticography to eventually recognize imagined speech, as well as brain computer interfaces based on near infrared spectroscopy.

Biography: Tanja Schultz received her doctoral and diploma degrees in Informatics from the University of Karlsruhe, Germany, in 2000 and 1995, respectively. Prior to these degrees, she completed the state exam in Mathematics, Sports, Physical and Educational Science at Heidelberg University, Germany, in 1989. She joined Carnegie Mellon University, Pittsburgh, PA, in 2000 and is an adjunct Research Professor at the Language Technologies Institute. From 2007 to 2015 she was a Full Professor in Informatics at the Karlsruhe Institute of Technology (KIT) in Germany before she became a Professor for Cognitive Systems at the University of Bremen, Germany, in April 2015. Since 2007, she has directed the Cognitive Systems Lab, where her research activities include multilingual speech processing and the processing, recognition, and interpretation of biosignals for human-centered technologies and applications.

Dr. Schultz is an Associate Editor of ACM Transactions on Asian Language Information Processing (since 2010), serves on the Editorial Board of Speech Communication (since 2004), and was Associate Editor of IEEE Transactions on Speech and Audio Processing (2002-2004). She was President (2014-2015) and elected Board Member (2006-2013) of ISCA, and a General Co-Chair of Interspeech 2006. She was elevated to Fellow of ISCA (2016) and to member of the European Academy of Sciences and Arts (2017). Dr. Schultz was the recipient of the Otto Haxel Award (2013), the Alcatel Lucent Award for Technical Communication (2012), the PLUX Wireless Biosignals Award (2011), and the Allen Newell Medal for Research Excellence (2002), and received the ISCA/EURASIP Speech Communication Best Paper Awards in 2001 and 2015.

Sebastian Möller

Quality and Usability Lab, TU Berlin, and German Research Center for Artificial Intelligence, DFKI

Quality Engineering of Speech and Language Services

Abstract: Speech- and language-based services have become highly popular and reach a growing group of users. To guarantee high acceptance in the long run, quality and user experience have to be considered systematically in the service development cycle. The term “quality engineering” has been used as an umbrella term for such systematic approaches.

In the talk, such approaches will be illustrated for two exemplary services. The first one is a spoken dialogue service where (synthetic) speech production, dialogue management, and speech perception need to be considered from a human quality point of view. Instrumental prediction of text-to-speech quality, dialogue simulation, and instrumental recognition of speech and speaker characteristics are building blocks for ensuring a high-quality experience. The second service is a language translation service, in which automatic and human intelligence are combined to reach the best possible outcome. Here, crowdsourcing approaches are used to translate and evaluate language data. For each of the two services, experimental data are presented that illustrate the state-of-the-art performance, along with open research questions that need to be answered to improve quality and user experience.

Biography: Sebastian Möller studied electrical engineering at the universities of Bochum (Germany), Orléans (France) and Bologna (Italy). From 1994 to 2005, he held the position of scientific researcher at the Institute of Communication Acoustics (IKA), Ruhr-University Bochum, and worked on speech signal processing, speech technology, communication acoustics, and speech communication quality aspects. From 2005 to 2015, he worked at Telekom Innovation Laboratories, an An-Institut of TU Berlin. He was appointed Full Professor for the subject "Quality and Usability" at TU Berlin in April 2007. From 2015 to 2017, he was Vice Dean for Research of the Faculty for Electrical Engineering and Computer Science at TU Berlin, and since April 2017, he has served as the Dean of this faculty. He also leads the research department "Language Technology" at the German Research Center for Artificial Intelligence, DFKI.

He received a Doctor-of-Engineering degree at Ruhr-University Bochum in 1999 for his work on the assessment and prediction of speech quality in telecommunications. In 2000, he was a guest scientist at the IDIAP in Martigny (Switzerland), where he worked on the quality of speech recognition systems. He gained the qualification needed to be a professor (venia legendi) at the Faculty of Electrical Engineering and Information Technology at Ruhr-University Bochum in 2004, with a book on the quality of telephone-based spoken dialogue systems. He worked as a Visiting Fellow/Visiting Professor at MARCS Auditory Laboratories, University of Western Sydney, at the Universidad de Granada (Spain), at the Ben Gurion University of the Negev in Be'er Sheva (Israel), and at NTNU in Trondheim (Norway). Since 2012, he has been Adjunct Professor at the University of Canberra. His most recent book on "Quality Engineering" was published in 2010, and his co-edited book on "Quality of Experience: Advanced Concepts, Applications and Methods" in 2014.

Sebastian Möller was awarded the GEERS prize in 1998 for his interdisciplinary work on the analysis of infant cries for early hearing-impairment detection, the ITG prize of the German Association for Electrical, Electronic & Information Technologies (VDE) in 2001, the Lothar-Cremer prize of the German Acoustical Association (DEGA) in 2003, a Heisenberg fellowship of the German Research Foundation (DFG) in 2005, and the Johann Philipp Reis prize in 2009. Since 1997, he has taken part in the standardisation activities of the International Telecommunication Union (ITU-T) on transmission performance of telephone networks and terminals. He acted as Rapporteur for question Q.8/12 from 2001 to 2016. He headed the special interest group on speech acoustics of DEGA from 2009 to 2015, and has been a board member of the ITG since 2015 and of the International Speech Communication Association (ISCA) since 2016. He served as General Chair for Interspeech 2015 in Dresden.

Dongheui Lee

Human Robot Interaction for Automation Systems Group, Technical University of Munich

Robot Learning through Physical Interaction and Human Guidance

Abstract: As a fundamental cornerstone in the development of intelligent robotic assistants, the research community on robot learning has addressed autonomous motor skill learning and control in complex task scenarios by working on a variety of fundamental sub-problems: movement primitive representation, reaction and adaptation, the link between perception and action, learning under supervision, and learning from self-practice. Imitation learning provides an efficient way to learn new skills through human guidance, which can reduce the time and cost of programming the robot. Robot learning architectures can provide a comprehensive framework for learning, recognition and reproduction of whole body motions. Such an architecture can also be integrated with different types of teaching modalities and be applied even in situations with incomplete measurement data. The inference mechanism can support learning not only the robot's free body motion but also physical interaction tasks, including human robot interaction. I will discuss incremental learning in different problem domains, including the refinement of learned skills via heterogeneous learning modalities, enhancement of human-robot cooperation tasks over time, and improvement of stability in bipedal walking by iterative learning control. Empirical evaluation on several robotic systems will illustrate the effectiveness and applicability of these methods for learning control of high-dimensional anthropomorphic robots.

Biography: Dongheui Lee is Associate Professor of Human-centered Assistive Robotics at the TUM Department of Electrical and Computer Engineering. She is also director of a Human-centered assistive robotics group at the German Aerospace Center (DLR). Her research interests include human motion understanding, human robot interaction, machine learning in robotics, and assistive robotics. Prior to her appointment as Associate Professor, she was an Assistant Professor at TUM (2009-2017), Project Assistant Professor at the University of Tokyo (2007-2009), and a research scientist at the Korea Institute of Science and Technology (KIST) (2001-2004). After completing her B.S. (2001) and M.S. (2003) degrees in mechanical engineering at Kyung Hee University, Korea, she went on to obtain a PhD degree from the department of Mechano-Informatics, University of Tokyo, Japan, in 2007. She was awarded a Carl von Linde Fellowship at the TUM Institute for Advanced Study (2011) and a Helmholtz professorship prize (2015). She is coordinator of both the euRobotics Topic Group on physical Human Robot Interaction and the TUM Center of Competence Robotics, Autonomy and Interaction.

Eduardo Mizraji

Universidad de la República de Uruguay, Montevideo

Improving neural models of language with input-output tensor contexts

Abstract: Tensor contexts enhance the performance and computational power of many neural models of language by generating a double filtering of incoming data. As an example, tensor contextualization allows a single-layer neural network to compute the exclusive-or operation efficiently. Applied to a linguistic domain, its implementation enables a very efficient disambiguation of polysemous and homonymous words. For the neurocomputational modeling of structured language, the simultaneous tensor contextualization of inputs and outputs inserts into the models strategic passwords that route words towards key natural targets, thus allowing for the creation of meaningful phrases. Moreover, these input-output tensor contexts provide us with a clustering procedure to pack topics efficiently in the representation of semantic spaces. In this keynote, we present several formal properties of these classes of contextualized models and describe possible ways to use these contexts to clarify the neural representation of organized sequences of words. We include an illustration of how these contexts generate topographic or thematic organization of the data. Finally, we show that double contextualization opens promising ways to explore the neural coding of episodes, one of the most challenging problems of neural computation.
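
The exclusive-or remark in the abstract can be illustrated with a minimal sketch. It is not taken from the keynote: it assumes that truth values are coded as orthonormal vectors, that one operand plays the role of the input and the other the role of the context, and that the single layer is a linear associative memory built as a sum of outer products. All names (TRUE, FALSE, kron, build_xor_memory, recall) are hypothetical and chosen only for this illustration.

```python
# Hypothetical illustration: XOR computed by one linear layer that acts on
# the Kronecker (tensor) product of an input vector and a context vector.

TRUE = [1.0, 0.0]   # assumed orthonormal coding of "true"
FALSE = [0.0, 1.0]  # assumed orthonormal coding of "false"


def kron(a, b):
    """Kronecker product of two vectors, returned as a flat list."""
    return [x * y for x in a for y in b]


def outer(out_vec, key_vec):
    """Outer product out_vec * key_vec^T, returned as a list of rows."""
    return [[o * k for k in key_vec] for o in out_vec]


def add(m1, m2):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def build_xor_memory():
    """Store one (input, context) -> output association per truth-table row."""
    table = [
        (TRUE, TRUE, FALSE),
        (TRUE, FALSE, TRUE),
        (FALSE, TRUE, TRUE),
        (FALSE, FALSE, FALSE),
    ]
    memory = [[0.0] * 4 for _ in range(2)]  # 2 outputs x 4 product features
    for inp, ctx, out in table:
        memory = add(memory, outer(out, kron(inp, ctx)))
    return memory


def recall(memory, inp, ctx):
    """Single linear layer: multiply the memory by the tensor-product key."""
    key = kron(inp, ctx)
    return [sum(w * k for w, k in zip(row, key)) for row in memory]


XOR_MEMORY = build_xor_memory()
```

With these assumptions, recall(XOR_MEMORY, TRUE, FALSE) returns the vector coding "true" and recall(XOR_MEMORY, TRUE, TRUE) returns the vector coding "false". The four product keys are mutually orthogonal, which is why a purely linear map suffices here, whereas a linear map on the two raw inputs alone cannot represent exclusive-or.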

Biography: Eduardo Mizraji obtained his MD from the Universidad de la República of Uruguay (UdelaR) and a DEA in Applied Mathematics from the University of Paris 5. He is Professor of Biophysics and Head of the Biophysics Section, Faculty of Sciences, UdelaR, Principal Investigator of the Programme for the Development of Basic Sciences (PEDECIBA), and Investigator Level III (the highest) of the National System of Investigators (SNI-ANII) of Uruguay. He is a member of the Editorial Boards of the Bulletin of Mathematical Biology and of The International Journal of General Systems.

For some years, his main research interest concentrated on the matrix memory models developed by many authors at the beginning of the 1970s. These models exhibited associative abilities and conceptualization powers remarkably similar to those displayed by human memory. Nevertheless, modeling the influence of contexts within these matrix memories was an extremely difficult task. In 1987, he developed one of the first models (published in 1989) capable of making the associations of matrix memories dependent on semantic contexts. His model used a tensorial interaction between inputs and contexts that drastically enlarges the neurocomputational abilities of the memories. The contexts refine the structure of these memories, allowing the representation of many cognitive activities as a network of networks. Language is a natural target for this kind of approach. Recently, using these context-sensitive memories, Mizraji and his colleagues demonstrated how a succession of linguistic clues about a given unknown object makes it possible to arrive at reliable diagnoses, provided that adequate information is stored in the memories. This modular approach also suggests operational analyses of the structure of language using graphical representations. In these representations, emerging discourse trajectories can be analyzed with techniques originating from information theory. He and his colleagues have recently applied this approach to the characterization of pathological alterations of discourse, specifically in schizophrenia.

As a byproduct, this context-dependent model has led him to the discovery of a novel approach to logical computation based on associative memories, an approach that promises a realistic neural theory of reasoning. Recently, Mizraji and colleagues started to explore some aspects of human reasoning via strategic associative memories. An interesting result emerging from these memory models is that language includes some words that act as “keywords” that give access to neural modules capable of performing these logical computations. In particular, an exciting discovery is that this “neural logic” can help to explain some highly complex neural computations (e.g., the computation of logical modalities or linguistic prepositions) that humans produce in an astonishingly simple way using some strategic words.