Corpus PAROLE


Heather Hilton
Department of Applied Foreign Linguistics
University of Lyon

website

John Osborne
Linguistics
Université de Savoie

website

Participants: 95
Type of Study: tasks / storytelling
Location: France
Media type: audio
DOI: doi:10.21415/T5TP4X

Browsable transcripts

Download transcripts

Media folder

Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by the above reference.


Project Description

The Corpus PAROLE (PARallèle Oral en Langue Etrangère) was compiled by members of the Langages research team (Laboratoire LLS) at the Université de Savoie (Chambéry, France) to investigate the characteristics of different L2 proficiency levels. What sets the corpus apart is our attempt to incorporate temporal elements of spoken production in the main transcription line, alongside more classic coding of errors and retracings.

PAROLE is composed of oral productions by 68 young adult learners of three foreign languages (English, French, Italian), as well as a benchmark corpus of productions by 27 native speakers performing the same tasks. Transcripts and recordings of three tasks (two summaries of a video clip immediately after viewing, and a short autobiographical narrative) will constitute the PAROLE corpus. Task details are provided in the PAROLE Manual (PAROLE_documents folder).

In addition to the speaking tasks, all the non-native subjects completed a battery of tests and questionnaires, furnishing complementary data on their L2 knowledge, experience, motivation for L2 study, and two aspects of language-learning aptitude (nonword repetition and morpho-syntactic analysis). Test results for the learner subjects are available in the subject_data file (PAROLE_documents folder), and references for the tests used are provided in the PAROLE Manual (same folder). PDF files of the subject profile and motivation questionnaires used with the English L2 subjects are also included in the documents folder.

PAROLE was funded through a global research grant awarded to the Laboratoire LLS by the French Ministère de l'Education Nationale, as part of the contrats quadriennaux between the Ministry and the Université de Savoie for 2003-2006 and 2007-2010. The Ministry also provided funds for two doctoral students working on the corpus.

Procedure

We began pre-testing production triggers and assembling test materials in 2003; most of the French L2 and English L2 subjects were recorded in 2005 and the native speakers in 2006, and transcription work began in earnest in 2006. Due to illness and a shortage of personnel, the Italian recordings and transcriptions are lagging behind English and French; the first wave of Italian files should be available on-line by the end of 2008 (and we apologize for this frustrating delay).

We have attempted to adhere to CHAT conventions as closely as possible; major innovations concern the scoped timing of "hesitation groups" (unbroken sequences of hesitation phenomena, such as silent pauses, filled pauses, and certain paralinguistic noises). We have also made a distinction between words produced in the learners' L1 (coded with the new suffix "@l1"), and words produced in another foreign language (coded "@s").

See the PAROLE Manual (PAROLE_documents folder) for detailed descriptions of our use of CHAT coding symbols, occasional additions to the code base, our criteria for utterance delimitation, error coding, etc.
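As a rough illustration of how the two suffixes described above could be tallied, the following Python sketch counts tokens marked "@l1" and "@s" on main (speaker) lines of a transcript. The function name, the regular expression, and the sample lines are illustrative assumptions and are not taken from the PAROLE tools or data; the PAROLE Manual remains the authority on the exact coding.

```python
import re
from collections import Counter

# Assumption: a marked token is a run of non-space characters followed by
# "@l1" or "@s"; an optional ":xxx" language code after the suffix is tolerated.
SWITCH = re.compile(r"(\S+?)@(l1|s)(?::\w+)?(?!\w)")


def count_switches(lines):
    """Count '@l1' and '@s' tokens on main lines (those starting with '*')."""
    counts = Counter()
    for line in lines:
        # Skip header lines ('@...') and dependent tiers ('%...').
        if not line.startswith("*"):
            continue
        for match in SWITCH.finditer(line):
            counts[match.group(2)] += 1
    return counts


# Invented sample lines, for illustration only (not from the corpus).
sample = [
    "@Begin",
    "*SUB:\tshe take the voiture@l1 and say ciao@s .",
    "%com:\tnot counted word@l1",
]
print(count_switches(sample))
```

Restricting the count to lines that begin with "*" keeps header and dependent-tier lines out of the tally.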

Participants in the learner corpus (54 females, 14 males):

Participants in the native-speaker corpus (20 females and 7 males):

All participants were enrolled in a French or Italian university (in either a regular or a study-abroad program) at the time of recording. See the subject_data file in the PAROLE_documents folder for detailed information on each participant.

The corpus consists of audio files (.wav format) and transcripts for each participant performing two short video summary tasks ("task A," "task C"), and one short autobiographical narrative ("task E"; on-line publication planned in late 2008). Sound files and transcripts are segmented according to task. All transcripts have been carefully linked to the digital sound files with bullet points in Sonic Mode. We recommend that researchers wishing to work with PAROLE organize their files with sound files and transcripts in the same folder, for optimal comparison between the transcripts and the productions. Carefully disambiguated tagged files are stored together in a special folder for each language.

Key to file names (three-digit numbers refer to each subject):

L2 English learners: 0
L2 Italian learners: 2
L2 French learners: 4
British and NZ English: N0
North American English: N1
Italian native-speakers: N2
French native-speakers: N4

The single letter (a, c, or e) following the subject number indicates which task is involved: file "010a.cha" is the CHAT transcript for English learner 010 performing task A (first video description); file "010a.wav" is the sound file corresponding to this transcript and task; file "010a.pst.cex" is the tagged transcript.
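The naming key above can be sketched as a small Python helper that splits a file name into subject, group, task, and file type. This is only a sketch under stated assumptions: the function and table names are hypothetical, the leading digit of a learner's three-digit number is assumed to identify the group (as in "010a.cha"), and native-speaker names are assumed to begin with the "N" code followed by further digits, which the description does not spell out.

```python
import re

# Group codes from the key to file names.
GROUPS = {
    "0": "L2 English learner",
    "2": "L2 Italian learner",
    "4": "L2 French learner",
    "N0": "British and NZ English native speaker",
    "N1": "North American English native speaker",
    "N2": "Italian native speaker",
    "N4": "French native speaker",
}
TASKS = {
    "a": "video summary (task A)",
    "c": "video summary (task C)",
    "e": "autobiographical narrative (task E)",
}
KINDS = {
    ".cha": "CHAT transcript",
    ".wav": "sound file",
    ".pst.cex": "tagged transcript",
}

# Assumption: subject id = optional "N" + digits, then one task letter, then extension.
NAME = re.compile(r"(?P<subject>N?\d+)(?P<task>[ace])(?P<ext>\.cha|\.wav|\.pst\.cex)")


def parse_name(filename):
    """Split a PAROLE-style file name into its parts (hypothetical helper)."""
    match = NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    subject = match.group("subject")
    # Native speakers carry a two-character code ("N0"); learners a single digit.
    prefix = subject[:2] if subject.startswith("N") else subject[0]
    if prefix not in GROUPS:
        raise ValueError(f"unknown group code in: {filename!r}")
    return {
        "subject": subject,
        "group": GROUPS[prefix],
        "task": TASKS[match.group("task")],
        "kind": KINDS[match.group("ext")],
    }


print(parse_name("010a.cha"))
```

Raising an error on unrecognized names, rather than guessing, makes it easier to spot files that do not follow the key.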

All recordings took place in a small, closed classroom or office, without distractions or interruptions. Video support materials ("triggers") were presented on a portable computer, integrated into .html pages that the subject manipulated directly. See the PAROLE Manual for details of interview structure, video presentation, interviewer behavior, recording equipment, etc.

Acknowledgements

Andrew Yankes reformatted this corpus to accord with current versions of CHAT.