Data mining continuous speech: Modeling infant speech acquisition by extracting building blocks and patterns in spoken language


Complex use of language, and of speech in particular, is one of the defining characteristics of humans and sets us apart from other animals. In the last few decades, speech recognition has found many applications and is now, for example, a standard feature on modern smartphones. However, the flexible and powerful learning capacities of human infants have still not been equalled by any machine. Young children manage to make sense of all the speech they hear and to generalize from it, so that the patterns in the speech sounds can be disentangled, understood and repeated.

In a separate line of research, machine learning and data mining, algorithms have been developed to discover patterns in data. Extracting information from the available data has become an important part of business, for example in video recommendation systems and in the financial sector.

The idea of my research is to develop and study techniques, inspired by these data mining algorithms, that extract patterns from speech. This requires overcoming the inherent difficulties of continuous and noisy speech, which cannot simply be processed in the same way as discrete and exact data.

After adapting these methods and applying them to speech, I will use them in scientific research on the building blocks of speech, evaluating their relevance and validity. Furthermore, using these methods, I will investigate which aspects of speech children need, and subsequently use, to learn about these building blocks.

Involved members: