FFLOC Corpus

Florence Myles, Department of Language and Linguistics, University of Essex, fmyles@essex.ac.uk
Emma Marsden, Department of Education, University of York, emma.marsden@york.ac.uk
Rosamond Mitchell, Applied Linguistics (emeritus), University of Southampton, r.f.mitchell@soton.ac.uk

Participants:	60
Type of Study:	tasks
Location:	UK
Media type:	audio
DOI:	doi:10.21415/T5NS31

Citation information

Myles, F. (2002). Full Report of Research Activities and Results. Linguistic Development in Classroom Learners of French. www.regard.ac.uk/research_findings/R000223421/report.pdf

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by the above reference.

Project Description

LINGDEV or Linguistic Development in Classroom learners of French: a Cross sectional Study: This directory contains sound files and corresponding transcripts from an ESRC-funded one-year project which ran from October 2001 to September 2002 (ESRC grant R000234754). One of its aims was to provide a database of learner language for Years 9, 10 and 11 of secondary education in the UK context. The Project Director was Florence Myles and the other team members were Emma Marsden, Rosamond Mitchell and Sarah Rule.

Three groups of twenty learners, one from each of Years 9, 10 and 11 (i.e. in their 3rd, 4th and 5th year respectively of learning French in the UK educational context; aged 13-14, 14-15 and 15-16 respectively), were tested in a local secondary school. In the LingDev files, children's ages are given as "13;" for Year 9, "14;" for Year 10 and "15;" for Year 11. The Progression data are from Years 7, 8 and 9, and ages are given as 12, 13 and 14.

A gender-balanced sample was drawn from the three year groups, covering the full ability range as judged by the teachers and the pupils' school grades. The sample is, however, slightly biased towards the top-ability pupils, as they are more likely to show signs of further development. The participants were numbered 1 to 20 within each year group. Because this was a short-term cross-sectional study, if a cohort pupil was absent a replacement pupil carried out the task; replacement pupils were given random numbers between 60 and 90. This ensured that the number of pupils in each year that carried out a particular task was always 20. In selecting and involving informants in the research, the project followed the Recommendations on Good Practice in Applied Linguistics of the British Association for Applied Linguistics (1994) on the responsibility of researchers in respecting the privacy of participants, ensuring confidentiality of personal details and maintaining openness about the goals of the research.

Procedure

Four oral tasks were administered to all 60 subjects, on a one-to-one basis with a researcher. The tasks were the same for all years, to enable comparison of results across year groups. Some of the tasks were also the same as those used in the 'Progression Project', to enable comparisons to be drawn between the two projects. The tasks were as follows:

Cartoon story (Loch Ness Monster): in this task, learners have to tell a story on the basis of a series of cartoon pictures. This task was developed and used in the Progression Project. It also provides valuable information on learners' developing discourse-level skills. Task Code L
Interrogative elicitation task: this task is an information gap activity in which the subjects have to find out missing information from the researcher in order to reconstruct a drawing. The main purpose of this task is to elicit interrogative constructions and pronominal reference, as well as gender markings. This task was also developed and used in the Progression Project. Task Code I
Photos task (one-to-one interview): this is a directed conversation with a researcher in which the subject has to respond to a number of questions, as well as ask questions based on photographs brought by the researcher. The main purpose of this task is to elicit a wide range of structures, with a particular focus on verbal morphology (past tense, future). A version of this task was used in the Progression Project, although we modified it to ensure elicitation of a range of temporal reference (as we were dealing with more advanced learners). Task Code P
Negative elicitation task: learners have to describe a famous person by saying what they do and do not do (following picture cues), and the researcher has to guess who the famous person is on the basis of the learner's description and a series of possible celebrities. Task Code N

All tasks were recorded digitally, and took around 15 minutes each, in a one-to-one situation with a researcher, making a total of around one hour of spoken language per pupil.

Additional Conventions

In this section, we describe some of the general decisions we have taken in transcribing French interlanguage oral data, as well as some of the adaptations we have made to the CHILDES system in the context of L2 data. Many of the decisions were dictated by our research agenda in both the Linguistic Development and the Progression projects, and by our choice to use the automatic morphosyntactic parser. As a result, the transcription sometimes deviates from the actual phonological shape of the words produced by learners. We felt this was not too much of a problem, as researchers interested in, for example, phonology can listen to the sound files as they read the transcripts and add their own level of coding. The data have been transcribed orthographically. This is necessary in order to use the French morphosyntactic parser on the completed transcripts, as it will not recognise non-words. There is no extensive coding of errors, and overlaps are not marked, since they can be heard in the sound files. Learner speech has been carefully segmented into distinct utterances, but this has not been done for the researcher.

If a participant exactly repeats the researcher (or another participant in the case of pair tasks), it has been coded as follows:
*32N: [- eng] how do you say he goes?
*ADR: il va
*32N: il@g va@g au cinema
@g is added after every repeated word. @g has been added to the special form marker file (sf.cut) in the French MOR program. It ensures that the imitation is not included in the analysis by the French morphosyntactic parser, as this could give misleading information about the current grammar of the learner.
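For researchers working with the transcripts outside CLAN, the following is a minimal Python sketch of how @g-marked imitations might be set aside before counting a learner's own words. The function name strip_imitations is hypothetical and is not part of CLAN or MOR; the sketch assumes words on the main tier are separated by whitespace and that the speaker code ends at the first colon.

```python
def strip_imitations(line: str) -> str:
    """Return a CHAT main-tier line without words marked as imitations (@g).

    Assumption: the speaker code ends at the first colon and the words of the
    utterance are whitespace-separated.
    """
    speaker, _, utterance = line.partition(":")
    # Keep only the words the learner produced without repeating the researcher.
    kept = [word for word in utterance.split() if not word.endswith("@g")]
    return f"{speaker}:\t{' '.join(kept)}"


# The learner line from the example above: only "au cinema" is retained.
print(strip_imitations("*32N:\til@g va@g au cinema"))
```

This only mirrors the effect described above; in CLAN itself the exclusion is handled by the sf.cut entry.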

In order for the French MOR program to ignore English, we coded whole English utterances as follows:
*SAR: [- eng] yes you begin by asking questions
*43P: [- eng] how do you say dog?
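The same convention can be used when processing the transcripts with other tools. The sketch below, in Python, separates utterances carrying the [- eng] code from the rest. The helper names is_english and drop_english are hypothetical, and the sketch assumes that the code appears at the very start of the utterance, as in the examples above.

```python
def is_english(line: str) -> bool:
    """True if a main-tier line is coded as a whole English utterance."""
    _, _, utterance = line.partition(":")
    return utterance.lstrip().startswith("[- eng]")


def drop_english(lines: list[str]) -> list[str]:
    """Keep main-tier lines (starting with '*') that are not coded [- eng]."""
    return [ln for ln in lines if ln.startswith("*") and not is_english(ln)]


transcript = [
    "*SAR:\t[- eng] yes you begin by asking questions",
    "*43P:\t[- eng] how do you say dog?",
    "*ADR:\til va",
]
print(drop_english(transcript))
```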

Indeterminate Forms

In beginner datasets, it is often difficult to determine which form a learner intended, as learners often produce something very approximate. Five kinds of indeterminate form occur consistently in our data, and we coded them as follows:

Definite articles which sound like something between le and la: le@z:m
Indefinite articles which sound like something between un and une: un@z:m, une@z:m
First person subject pronoun which sounds like something between je and j'ai: je@z:m
Something between il and ils: il@z:m, ils@z:m
A verb form which sounds like something between a and est: a@z:m
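To gauge how often such forms occur in a transcript, the @z:m marker can be searched for directly. The following Python sketch tallies the orthographic form attached to each marker. The function name count_indeterminate is hypothetical, and the utterance in the usage line is invented for illustration rather than taken from the corpus.

```python
import re
from collections import Counter

# A run of characters without whitespace or '@', immediately followed by @z:m.
INDETERMINATE = re.compile(r"([^\s@]+)@z:m(?!\w)")


def count_indeterminate(lines: list[str]) -> Counter:
    """Count the forms coded as indeterminate (@z:m) in a list of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(form.lower() for form in INDETERMINATE.findall(line))
    return counts


# Invented example utterance, not corpus data.
print(count_indeterminate(["*01L:\tle@z:m monstre a@z:m grand"]))
```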

The files are labelled in the following way:
Soundfiles: 01L9SAR.wav
Transcriptions: 01L9SAR.cha (01 is the number of the student, L is the task code, 9 is the student's year, SAR is the abbreviation for the researcher)
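The labelling scheme lends itself to automatic parsing. The Python sketch below splits a file name into its parts under several assumptions that go beyond the description above: student numbers are two digits, the year is one or two digits, and the researcher abbreviation is three capital letters. The names FileInfo and parse_filename are hypothetical. The replacement flag follows the statement in the Project Description that replacement pupils were numbered between 60 and 90.

```python
import re
from typing import NamedTuple

# Task codes as listed under Procedure.
TASKS = {
    "L": "Cartoon story (Loch Ness Monster)",
    "I": "Interrogative elicitation task",
    "P": "Photos task",
    "N": "Negative elicitation task",
}


class FileInfo(NamedTuple):
    student: int
    task: str
    year: int
    researcher: str
    is_replacement: bool


# Assumed layout: 2-digit student, task letter, 1-2 digit year, 3-letter researcher.
_NAME = re.compile(r"(\d{2})([LIPN])(\d{1,2})([A-Z]{3})\.(?:wav|cha)")


def parse_filename(name: str) -> FileInfo:
    """Split a name such as 01L9SAR.cha into its labelled components."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    student, task, year, researcher = match.groups()
    number = int(student)
    # Replacement pupils were given numbers between 60 and 90.
    return FileInfo(number, task, int(year), researcher, 60 <= number <= 90)


info = parse_filename("01L9SAR.cha")
print(info.student, TASKS[info.task], info.year, info.researcher)
```

File names that do not fit the assumed pattern raise a ValueError rather than being guessed at, so any files labelled differently would need the pattern adjusted.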

Acknowledgements

Anthony Kelly reformatted this corpus to accord with current versions of CHAT.