Samantha Sie, Theoretical and Applied Linguistics, University of Cambridge, slws2@cam.ac.uk
Participants: | 175 |
Type of Study: | narrative |
Location: | Malaysia, United Kingdom |
Media type: | audio |
DOI: | doi:10.21415/VNRH-8760 |
Sie, S. (2023). SLABank Database: English - SME Corpus. DOI:10.21415/VNRH-8760
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The Standard Malaysian English (SME) corpus was compiled from November 2019 to November 2020 as part of the author’s main PhD project, which examined crosslinguistic influence in the ultimate acquisition of Standard English in the Postcolonial Englishes context of Malaysia. Malaysia was chosen as the primary research site for two main reasons. Firstly, it has a linguistically diverse makeup, with Malay, Chinese, and Tamil being some of the most widely spoken first languages (L1s) amongst the local speech communities. Secondly, it has a long-standing history with the English language, which not only enjoys an elevated, albeit restrictive, status as an official second language (L2) but has also undergone structural indigenisation due to linguistic and sociolinguistic factors such as protracted language contact, L2 acquisitional mechanisms, identity rewritings, and bilingual creativity. Accordingly, the PhD project set out to investigate the extent to which different L1s (including nativised English, which has a growing number of L1 speakers) played a facilitative or an adverse role in the ultimate acquisition of Standard English.
The corpus comprises 175 elicited narratives produced by adult Malaysians (n = 145; mean age = 20 years, SD = 1.21) and British controls (n = 30; mean age = 21 years, SD = 2.54). The participants were mostly university students studying at Universiti Malaya in Malaysia and at the University of Cambridge in the UK, respectively. In total, the corpus contains 103,607 words from about 14 hours of audio recordings.
The narrative task was carried out on a one-to-one basis. Participants were asked to watch an animated silent film called “Snack Attack” (2012) and narrate the story to the researcher. As the morphosyntactic features under inquiry were English finiteness morphemes (i.e., tense inflections, copula and auxiliary BE, auxiliary DO), four questions were presented in the following order to elicit them from participants:
Because Covid disruptions meant that data collection had to be conducted in two phases, the narrative sessions took place in two different settings, and the quality of the audio recordings consequently varies to some degree. In the first phase (November 2019 to March 2020; pre-Covid), the narrative sessions were held in person in a quiet, public room on campus (e.g., a research office or seminar room) and were recorded using a Sony ICD-UX560F sound recorder. In the second phase (November 2020; during Covid), the narrative sessions were conducted via Zoom and were recorded using the audio-recording function provided by the video-conferencing programme.
The PhD project was carried out in line with the ethical guidelines set by the Ethics Committee of the Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge. Participants were informed well in advance about the aims and requirements of the study and took part on a voluntary basis. Most of them gave signed consent to have their anonymised audio files made available in an open-access language corpus, such as TalkBank. Three individuals, however, did not give permission for their audio files to be uploaded publicly. The transcripts of these three individuals are therefore not audio-linked, whereas all the others are. Finally, for anonymisation purposes, unique IDs were assigned to all participants.
Notes in the transcripts’ @Comment line include:
Reference: Cadelago, A. (2012). Snack Attack. Metanoia Films and Arc Productions.