
Work Package 1: Speech databases for the study of phonetic convergence

Five speech databases are used to examine the phonetic and prosodic characteristics of imitation in conversational exchanges between adults and between children.
Two of them (the CID and CLAPI) already exist, while the other three are being recorded within the framework of the project.

The CID database (Corpus of Interactional Data):

The CID (Corpus of Interactional Data; Bertrand et al., 2006) is made up of eight 1-hour dialogues. In each dialogue, the talkers are of the same sex, are familiar with each other, and are asked to hold an informal conversation around two pre-established topics. The CID has been fully transcribed, annotated at the orthographic, morphosyntactic, intonational and phonemic levels, and segmented into words and phonemes. The analyses conducted for the present project have concentrated more specifically on developing an automatic tool for detecting hetero-repetitions (cases in which a speaker repeats part of what the interlocutor said earlier).
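To make the notion of hetero-repetition concrete, here is a minimal sketch in Python. It is not the project's tool. It assumes that turns are available as (speaker, orthographic text) pairs and that a repetition can be approximated by a word n-gram shared with the other speaker's immediately preceding turns. The names `find_hetero_repetitions`, `word_ngrams`, `n` and `window`, as well as the example turns, are hypothetical.

```python
from typing import List, Set, Tuple

Turn = Tuple[str, str]  # (speaker label, orthographic transcription); assumed format


def word_ngrams(words: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of all contiguous n-word sequences in `words`."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_hetero_repetitions(turns: List[Turn], n: int = 2, window: int = 1):
    """For each turn, list the word n-grams that also occur in the OTHER
    speaker's turns among the `window` preceding turns.

    Returns a list of (turn index, speaker, sorted shared n-grams).
    """
    results = []
    for i, (speaker, text) in enumerate(turns):
        current = word_ngrams(text.lower().split(), n)
        # Gather n-grams produced by the interlocutor in the preceding turns.
        previous: Set[Tuple[str, ...]] = set()
        for other_speaker, other_text in turns[max(0, i - window):i]:
            if other_speaker != speaker:
                previous |= word_ngrams(other_text.lower().split(), n)
        shared = sorted(current & previous)
        if shared:
            results.append((i, speaker, shared))
    return results


# Invented example turns, for illustration only.
example_turns = [
    ("A", "we went to the old harbour yesterday"),
    ("B", "the old harbour is lovely in summer"),
    ("A", "yes in summer it gets crowded"),
]
repetitions = find_hetero_repetitions(example_turns, n=2, window=1)
```

With these invented turns, speaker B's "the old harbour" and speaker A's "in summer" are flagged. An actual detector would probably also need to filter out function-word sequences and to work at other annotation levels (for instance lemmas or phonemes), which this sketch does not attempt.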

The CLAPI database:

CLAPI is a database of spoken language recorded in naturally occurring interactions. It consists of 120 hours of transcribed audio and video recordings. Substantial work has gone into selecting the corpus, identifying and classifying the data, and segmenting it into the specific phonemes relevant to the study of repetitions.

The Manipulated-Voice database:

This new set of speech data has been collected to examine more specifically potential convergence effects in fundamental frequency in conversations. In our experimental method, participants are visually separated from each other and communicate via microphones and headphones. Each participant's voice pitch is manipulated online over the course of the conversational exchange, using a protocol we developed.
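As an illustration of how convergence in fundamental frequency might be quantified, the sketch below compares the distance between two talkers' mean F0 early and late in a conversation. This is an assumption made for illustration, not the measure used in the project: the semitone scale, the split into two halves, and the names `semitone_distance` and `convergence_index` are all hypothetical, and the F0 values are toy numbers chosen for easy arithmetic.

```python
import math
from statistics import mean
from typing import Sequence


def semitone_distance(f0_a: float, f0_b: float) -> float:
    """Absolute distance between two frequencies (in Hz), in semitones."""
    return abs(12.0 * math.log2(f0_a / f0_b))


def convergence_index(f0_a: Sequence[float], f0_b: Sequence[float]) -> float:
    """Distance between the talkers' mean F0 in the first half of their
    samples minus the same distance in the second half.

    A positive value means the talkers are closer in F0 later on.
    """
    half_a, half_b = len(f0_a) // 2, len(f0_b) // 2
    early = semitone_distance(mean(f0_a[:half_a]), mean(f0_b[:half_b]))
    late = semitone_distance(mean(f0_a[half_a:]), mean(f0_b[half_b:]))
    return early - late


# Toy per-utterance mean F0 values in Hz (invented, not project data).
talker_a = [220.0, 220.0, 220.0, 220.0]
talker_b = [110.0, 110.0, 220.0, 220.0]
index = convergence_index(talker_a, talker_b)
```

In this toy case the talkers start one octave (12 semitones) apart and end at the same F0, so the index is positive. In the manipulated-voice setting, the same kind of comparison could in principle be made against the manipulated rather than the natural F0 of the interlocutor.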

The GMUP database:

GMUP, which stands for “Group ’em up!”, is an interactive task in which two participants engage in a collaborative game that leads them to produce a number of purpose-built names. These names were expected to be pronounced differently depending on the speaker’s dialect. The game allows us to elicit the production of accurately controlled phonological material in an interactional situation that preserves the spontaneity of the participants’ verbal exchanges.

The Children Map Task database:

This database contains recordings of dialogues that are elicited using a French version of the Map Task. In this kind of task, two talkers sit opposite one another and each has a geographical map which the other cannot see. Talkers must collaborate verbally to reproduce on one participant's map a route printed on the other's. We have designed a version of the Map Task adapted to 7- to 10-year-old French-speaking children. This entailed using a restricted set of landmark names on simplified maps.