Vercellotti Corpus

Mary Lou Vercellotti
Department of English
Ball State University
mlvercellott@bsu.edu
website

Participants:	188
Type of Study:	classroom
Location:	Pittsburgh
Media type:	audio
DOI:	doi:10.21415/T5W88X

Citation information

Publications using these data should cite:

Vercellotti, M. L. (2017). The development of complexity, accuracy, and fluency in second language performance: A longitudinal study. Applied Linguistics, 38(1), 90-111.

Additional publications include:

Vercellotti, M. L. & Packer, J. (2016). Shifting structural complexity: The production of clause types in speeches given by English for Academic Purposes students. Journal of English for Academic Purposes, 22, 179-190.

Vercellotti, M. L. (2018). Finding variation: Assessing the development of syntactic complexity in ESL speech. International Journal of Applied Linguistics, 1-15. https://doi.org/10.1111/ijal.12225

Vercellotti, M. L. & McCormick, D. E. (2018). Self-correction profiles of L2 English learners: A longitudinal multiple-case study. TESL-EJ, 22(3), 1-25.

Vercellotti, M. L., Juffs, A., & Naismith, B. (2021). Multiword sequences in English language learners' speech: The relationship between trigrams and lexical variety across development. System, 98. https://doi.org/10.1016/j.system.2021.102494

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This project investigates the development of complexity, accuracy, and fluency in the speech of English language learners. The research is classroom-based and pedagogy-driven. Participants were adult learners who entered an Intensive English Program (IEP) in the United States in 2010. The longitudinal data were collected during class meetings, as part of the speaking curriculum at every instruction level in the IEP. The data, two-minute monologues on a given topic, were collected from each student in the IEP multiple times per academic semester, and many students remained in the IEP across multiple semesters. The topics varied by semester, by level, and sometimes by class section for pedagogical reasons. The speeches were transcribed by a native speaker experienced in transcribing non-native speech and segmented into sentence-level units (AS-units), following Foster, Tonkyn, and Wigglesworth (2000).

Filenames are structured as follows:

The first 4 digits are the anonymous ID of the speaker (ELI student).
The number after the underscore is the student's level in the ELI: 3 = low intermediate, 4 = high intermediate, and 5 = low advanced (one semester from matriculating into regular college courses).
The letter is the class section (which may have a different language instructor).
The last number indicates which speech of the semester it is. In Summer, the learners gave 2 speeches, but in Fall and Spring, they often gave 3 speeches, each in response to a different prompt. The prompt can usually be found in the first utterance of the speech because the ELI asked students to state the topic. The prompts are listed in Appendix A (p. 196) of the dissertation.
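
The naming scheme above can be unpacked programmatically. The following Python sketch is illustrative only: it assumes the components are written with no separators after the underscore (a hypothetical name such as 1234_3A2) and that files may carry a .cha extension. Neither assumption is stated in this description, and the function and class names are invented for the example.

```python
import re
from typing import NamedTuple, Optional

# Level codes as described in the filename notes.
LEVELS = {3: "low intermediate", 4: "high intermediate", 5: "low advanced"}

# ASSUMPTION: components follow one another directly, e.g. "1234_3A2",
# optionally followed by ".cha". Adjust the pattern if real files differ.
FILENAME_RE = re.compile(
    r"(?P<speaker>\d{4})_(?P<level>[345])(?P<section>[A-Za-z])(?P<speech>\d+)(?:\.cha)?"
)


class SpeechFile(NamedTuple):
    speaker_id: str      # anonymous 4-digit ID, kept as text to preserve leading zeros
    level: int           # 3, 4, or 5
    level_label: str     # human-readable level name
    section: str         # class section letter
    speech_number: int   # which speech of the semester


def parse_filename(name: str) -> Optional[SpeechFile]:
    """Return the parsed components, or None if the name does not fit the assumed pattern."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return None
    level = int(match.group("level"))
    return SpeechFile(
        speaker_id=match.group("speaker"),
        level=level,
        level_label=LEVELS[level],
        section=match.group("section"),
        speech_number=int(match.group("speech")),
    )


if __name__ == "__main__":
    # Hypothetical filename, not taken from the corpus.
    print(parse_filename("1234_3A2.cha"))
```

Returning None for names that do not match, rather than raising an error, makes it easy to skip unrelated files when scanning a directory.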

Acknowledgements

Andrew Yankes reformatted this corpus to accord with the current version of CHAT.