Work Package 1: Speech databases for the study of phonetic convergence
Five speech databases are used to examine
the phonetic and prosodic characteristics of imitation in conversational exchanges
between adults and between children.
Two of them (CID and CLAPI) already existed, while the other three were recorded
as part of the project.
The CID database (Corpus of Interactional Data):
The CID (Corpus of Interactional Data; Bertrand et al., 2006) consists of
eight one-hour dialogues. In each dialogue, the two talkers are of the same sex,
are familiar with each other, and were asked to hold an informal conversation
on two pre-established topics. The CID has been fully transcribed, annotated
at the orthographic, morphosyntactic, intonational and phonemic levels, and
segmented into words and phonemes. For the purposes of the present project,
the analyses focused more specifically on the development of an automatic tool
for detecting hetero-repetitions, that is, cases in which a speaker repeats
part of what the interlocutor said earlier.
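As a rough illustration of what hetero-repetition detection involves, the sketch below flags word sequences in a turn that also occur contiguously in one of the interlocutor's recent turns. It is a minimal sketch under stated assumptions, not the project's tool: the `Turn` structure, the function `detect_hetero_repetitions`, and its parameters `min_len` (minimum number of repeated words) and `window` (how many preceding turns to search) are hypothetical, and tokens are assumed to be already normalized (for instance lowercased).

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical representation of one speaker turn as a list of word tokens."""
    speaker: str
    tokens: list


def ngrams(tokens, n):
    """Set of contiguous n-word sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def detect_hetero_repetitions(turns, min_len=2, window=3):
    """Return (turn_index, repeated_sequence) pairs.

    A sequence counts as a hetero-repetition if it appears contiguously in
    one of the other speaker's turns among the `window` turns (of either
    speaker) that precede the current turn. Assumption: simple exact matching.
    """
    results = []
    for idx, turn in enumerate(turns):
        # Interlocutor turns within the look-back window.
        previous = [t for t in turns[max(0, idx - window):idx]
                    if t.speaker != turn.speaker]
        if not previous:
            continue
        toks = turn.tokens
        i = 0
        while i < len(toks):
            matched = 0
            # Try the longest candidate first so a repeated phrase is reported once.
            for n in range(len(toks) - i, min_len - 1, -1):
                candidate = tuple(toks[i:i + n])
                if any(candidate in ngrams(p.tokens, n) for p in previous):
                    results.append((idx, candidate))
                    matched = n
                    break
            # Skip past a matched span, otherwise advance by one word.
            i += matched if matched else 1
    return results


turns = [
    Turn("A", "on prend le chemin du moulin".split()),
    Turn("B", "ah le chemin du moulin oui".split()),
    Turn("A", "oui voilà".split()),
]
print(detect_hetero_repetitions(turns))
```

Here speaker B's "le chemin du moulin" is reported as a repetition of A's previous turn, while A's single-word "oui" is ignored because it is shorter than `min_len`. A real tool working on the CID would also have to handle disfluencies, partial words and overlapping speech, which exact token matching does not address.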
The CLAPI database:
CLAPI is a database of spoken language recorded in naturally occurring interactions.
It consists of 120 hours of transcribed audio and video recordings. Substantial
work has been done to select the corpus, identify and classify the data, and
segment it into the specific phenomena relevant to the study of repetitions.
The Manipulated-Voice database:
This new set of speech data has been collected specifically to examine
potential convergence effects in fundamental frequency (F0) during conversation.
In the experimental method we use, participants are visually separated from
each other and communicate via microphones and headphones. The pitch of each
participant's voice is manipulated online throughout the conversational exchange,
using a protocol we developed.
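Pitch manipulations of this kind are commonly expressed in semitones, because a fixed semitone shift corresponds to a fixed frequency ratio (one octave, or 12 semitones, doubles F0). The sketch below shows that arithmetic together with a purely hypothetical schedule in which the shift grows linearly over the conversation; the function names and the ramp are illustrative assumptions, since the parameters of the project's own protocol are not described here.

```python
import math


def shift_f0(f0_hz: float, semitones: float) -> float:
    """Shift a fundamental frequency by a number of equal-tempered semitones."""
    return f0_hz * 2 ** (semitones / 12)


def semitone_difference(f_a: float, f_b: float) -> float:
    """Signed distance from F0 value f_a to f_b, in semitones."""
    return 12 * math.log2(f_b / f_a)


def ramped_shift(t_s: float, total_s: float, max_semitones: float) -> float:
    """Hypothetical schedule: the shift rises linearly from 0 to max_semitones
    between the start (t_s = 0) and the end (t_s = total_s) of the exchange."""
    fraction = min(max(t_s / total_s, 0.0), 1.0)
    return max_semitones * fraction


# Example: at 150 s into a 600 s exchange with a 2-semitone target,
# a 200 Hz voice would be shifted by 0.5 semitone.
step = ramped_shift(150, 600, 2)
print(step, shift_f0(200, step))
```

Working in semitones rather than hertz also makes F0 distances between talkers with different voice ranges easier to compare, which is relevant when looking for convergence.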
The GMUP database:
GMUP, which stands for “Group ’em up!”, is an interactive
task in which two participants engage in a collaborative game that leads
them to produce a number of purpose-built names. These names were designed so
that their pronunciation was expected to differ across dialects. The game
allows us to elicit accurately controlled phonological material
in an interactional situation that preserves the spontaneity of the participants’
verbal exchanges.
The Children Map Task database:
This database contains recordings of dialogues elicited
using a French version of the Map Task. In this kind of task, two talkers
sit opposite one another and each has a geographical map that the other cannot
see. The talkers must collaborate verbally to reproduce on one participant's map
a route printed on the other's. We have designed a version of the Map Task
adapted to French-speaking children aged 7 to 10. This entailed using
a restricted set of landmark names on simplified maps.