UVALAL Corpus

Sonja Mujcinovic
Department of English Philology
University of Valladolid


Raquel Fernández Fuertes
Department of English Philology
University of Valladolid


Participants: 106
Type of Study: experimental
Location: Bosnia, Denmark, Spain
Media type: no audio
DOI: doi:10.21415/CKSF-CH67

Mujcinovic, S. (2020). English subjects in the linguistic production of L1 Spanish, L1 Bosnian and L1 Danish speakers: comparative grammar, typological similarity and transfer [Unpublished doctoral dissertation]. University of Valladolid.

Mujcinovic, S. (2015). The analysis of subjects in the oral and written production of L2 English learners: transfer and language typology. In Pedro A. Fuertes-Olivera et al. (eds.), Current Work in Corpus Linguistics: Working with Traditionally-conceived Corpora and Beyond. Selected Papers from the 7th International Conference on Corpus Linguistics (CILC2015). Procedia Social and Behavioral Sciences. Amsterdam: Elsevier.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The aim of this project is to examine typological difference and similarity, with a focus on sentential subjects. To this end, both oral and written production data were elicited from a total of 106 sequential bilingual children. English was the L2 of all the children, who differed in their L1 (Spanish, Bosnian or Danish). This design makes it possible to examine L2 English production both when two typologically similar languages are in contact, as with Danish and English (both [-null subject] languages), and when two typologically different languages are in contact, as with Spanish or Bosnian and English (the former two being [+null subject] languages, while English is [-null subject]). Beyond typological similarity, the data were also elicited to address modality (oral versus written production) and time of exposure (participants with 2 and 4 years of exposure to English).

The purpose of this dataset is to examine the potential interactions between typology, exposure and modality in order to further characterize L1 transfer. The data elicitation was conducted in 2014 and 2015.


The 106 participants in this study are child learners of L2 English. They were first classified into three groups according to their L1 (Spanish, Danish or Bosnian) and then subdivided into two groups each according to how long they had received instruction in the L2 (2 years or 4 years) (see table below). The data were collected at the schools the participants attended in the countries where they lived (i.e., Spain, Denmark and Bosnia).

L1        Group   Participants   Age     Years of instruction
Spanish   1       13             9-10    2
Spanish   2       20             11-12   4
Bosnian   1       17             11-12   2
Bosnian   2       22             12-13   4
Danish    1       16             11-12   2
Danish    2       18             12-13   4
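
As a quick consistency check on the table above, the group sizes can be added up by L1 and by years of instruction. This is only a minimal sketch: the figures are copied from the table, while the variable names are illustrative and not part of the corpus.

```python
# Rows copied from the participant table:
# (L1, group, participants, age range, years of instruction)
GROUPS = [
    ("Spanish", 1, 13, "9-10", 2),
    ("Spanish", 2, 20, "11-12", 4),
    ("Bosnian", 1, 17, "11-12", 2),
    ("Bosnian", 2, 22, "12-13", 4),
    ("Danish", 1, 16, "11-12", 2),
    ("Danish", 2, 18, "12-13", 4),
]

# Overall number of participants.
total = sum(n for _, _, n, _, _ in GROUPS)

# Participants per L1 and per length of instruction.
by_l1 = {}
by_years = {}
for l1, _, n, _, years in GROUPS:
    by_l1[l1] = by_l1.get(l1, 0) + n
    by_years[years] = by_years.get(years, 0) + n

print(total)     # 106
print(by_l1)     # {'Spanish': 33, 'Bosnian': 39, 'Danish': 34}
print(by_years)  # {2: 46, 4: 60}
```

The total matches the 106 participants reported for the corpus.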

The criteria applied when selecting the participants were the following:

The BO folder is for Bosnian; the DK folder is for Danish; and the SP folder is for Spanish. The Spanish students came from Valladolid; the Danish students came from Soroe; and the Bosnian students came from Banja Luka. Students in folder 1 had 2 years of L2 English and students in folder 2 had 4 years of L2 English.
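
When processing the files, the folder codes described above can be turned into participant metadata. The following Python sketch assumes that the numbered folders sit inside the L1 folders (e.g., SP/1/...). That nesting, the example file name and the function name describe_path are assumptions made for illustration, not something the corpus documentation states.

```python
from pathlib import PurePosixPath

# Folder codes, cities and group numbers as given in the description above.
L1_BY_FOLDER = {"BO": "Bosnian", "DK": "Danish", "SP": "Spanish"}
CITY_BY_FOLDER = {"BO": "Banja Luka", "DK": "Soroe", "SP": "Valladolid"}
YEARS_BY_GROUP = {"1": 2, "2": 4}


def describe_path(path):
    """Return L1, city and years of L2 English for a corpus file path.

    Assumes a hypothetical layout '<L1 folder>/<group folder>/<file>'.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 2:
        raise ValueError(f"expected '<L1 folder>/<group folder>/...', got {path!r}")
    folder, group = parts[0], parts[1]
    if folder not in L1_BY_FOLDER or group not in YEARS_BY_GROUP:
        raise ValueError(f"unrecognized folder codes in {path!r}")
    return {
        "l1": L1_BY_FOLDER[folder],
        "city": CITY_BY_FOLDER[folder],
        "years_of_l2_english": YEARS_BY_GROUP[group],
    }


# 'example.cha' is a made-up file name used only for illustration.
print(describe_path("SP/1/example.cha"))
```

If the actual directory layout differs, only the way folder and group are read from the path needs to change; the three lookup tables follow the description directly.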

Oral task

The data from the oral task were elicited via a semi-guided interview. The questions asked were formulated so that the participants would answer with complete sentences. If this was not the case, they were encouraged to do so.

The participants were interviewed individually and audio-recorded at their schools. Each individual interview lasted 8 to 16 minutes. Although a protocol of proposed topics was established (e.g., family, hobbies, interests, school, preferences, music, friends, etc.), the participants were encouraged to talk about any topic they wished.

Written task

The data from the written task were elicited via a wordless picture sequence task adapted from the A1-ball story from the Edmonton Narrative Norms Instrument (ENNI) (Schneider et al. 2005). The story consists of five pictures that show an elephant and a giraffe playing with a ball.

The changes that have been made to the original ENNI story are related to the characters and their biological gender. Thus, the characters in the adapted version are Mary Giraffe and Tom Elephant.

The participants were instructed in their L1. The task was conducted in a classroom where the whole class participated together. The pictures were projected on a screen for all to see. After seeing the story, the participants had one hour in total to complete the task and they were allowed to ask for vocabulary which, in the case of verbs, was provided to them in a non-inflected form.


Audio recordings and transcriptions were done by Sonja Mujcinovic, Luis Miguel Toquero Pérez and Tamara Gómez Carrero. This study was conducted as part of the activities of the UVALAL (University of Valladolid Language Acquisition Lab) research group.

Funding was supplied by these sources: