Langman Corpus

Juliet Langman
Division of Bicultural-Bilingual Studies
University of Texas at San Antonio
juliet.langman@utsa.edu

Participants:	11
Type of Study:	interview
Location:	Hungary
Media type:	audio
DOI:	doi:10.21415/T5C027

Citation information

Publications using these data should cite:

Langman, J. (1998). “Aha” as Communication Strategy: Chinese speakers of Hungarian. In Regan, V. (ed.) Contemporary Approaches to Second-language Acquisition in Social Context: Crosslinguistic Perspectives. Dublin: University College Dublin Press, 32–45.

Langman, J. (1997). Analyzing second-language learners’ communication strategies: Chinese speakers of Hungarian. Acta Linguistica Hungarica, 44, 277–299.

Langman, J. (1995-1996). The role of code-switching in achieving understanding: Chinese speakers of Hungarian. Acta Linguistica Hungarica, 43, 323–344.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This corpus consists of 10 files containing interviews conducted in 1994 with 11 Chinese immigrants living in Hungary. Most of the conversation is in Hungarian. English also appears in the interviews with those who speak it, and one transcript (KIN10) contains significant amounts of Chinese, with a Hungarian translation on a %tra dependent tier. The interviews focused on the participants' arrival in Hungary and on their daily activities. With the exception of KIN2 and KIN10, none of the participants had had formal training in Hungarian. The interviewers were the researcher and three Hungarian undergraduates. Data were collected with two purposes in mind: the analysis of communicative strategies among adult second-language learners learning in a nonstructured environment, and the analysis of the acquisition of morphology in an agglutinative language.

Partial support for data collection and analysis was provided through a grant awarded to Dr. Csaba Pléh, OTKA grant T018173, A magyar morfológia pszicholingvistikai vizsgálata (The psycholinguistic study of Hungarian morphology).

Special Coding

The following additional form markers have been used in the (*) speaker lines of the transcripts:
@e = English word, e.g., go@e
@c = Chinese word, e.g., xie@c
@a = adult-invented word, e.g., pigyilni@a
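
As a rough illustration of how these form markers could be picked out of a speaker line, the following Python sketch uses a regular expression. The function name, the sample line, and the speaker code in it are hypothetical and are not taken from the corpus; only the three markers and their example words come from the list above.

```python
import re

# Marker meanings are copied from the list above.
FORM_MARKERS = {
    "e": "English word",
    "c": "Chinese word",
    "a": "adult-invented word",
}

# A run of non-space characters followed by @e, @c, or @a at a word boundary.
MARKED_WORD = re.compile(r"(\S+?)@([eca])\b")


def find_marked_words(speaker_line):
    """Return (word, marker meaning) pairs for each marked word on a line."""
    return [
        (match.group(1), FORM_MARKERS[match.group(2)])
        for match in MARKED_WORD.finditer(speaker_line)
    ]


# Hypothetical speaker line assembled from the three example words above.
sample = "*KIN:\tgo@e xie@c pigyilni@a ."
for word, meaning in find_marked_words(sample):
    print(word, "->", meaning)
```

The pattern only recognizes the three markers documented here; other markers that may occur in a transcript are ignored.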

The following special codes have been used on the %lan tier:
$MIX utterance with some form of code-switching or borrowing
$CHI utterance in Chinese (used only in KIN10)

The following special codes have been used on the %rep (repetition) tier to identify:
1. whose speech is repeated

SRP self-repetition of immediately previous utterance
ORP other repetition of immediately previous utterance
SRE self-repetition of an utterance not immediately preceding
ORE other repetition of an utterance not immediately preceding

2. the function of the repetition

MIS misunderstanding, prompting, asking for clarification
VAL validation repetition of previous utterance
EXP explanation to ease understanding
COR correction and language learning functions

3. the form of the repetition

PAR partial
COM exact
TRA translation
PLU repetition including additional information

These three types of codes can be combined, as in: %rep: SRP:MIS:PAR
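
A minimal sketch of how a combined code might be split into its three parts is given below in Python. It assumes that every %rep value carries exactly three colon-separated components in the order shown in the example (whose speech, function, form); the corpus description does not state that this always holds. The function name and the returned dictionary keys are illustrative only.

```python
# Code tables copied from the three lists above.
REP_WHO = {"SRP", "ORP", "SRE", "ORE"}
REP_FUNCTION = {"MIS", "VAL", "EXP", "COR"}
REP_FORM = {"PAR", "COM", "TRA", "PLU"}


def parse_rep_code(value):
    """Split a value such as 'SRP:MIS:PAR' into who / function / form."""
    text = value.strip()
    # Accept the value with or without the tier name in front of it.
    if text.startswith("%rep:"):
        text = text[len("%rep:"):].strip()
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected three colon-separated codes: %r" % value)
    fields = ("who", "function", "form")
    tables = (REP_WHO, REP_FUNCTION, REP_FORM)
    for field, table, part in zip(fields, tables, parts):
        if part not in table:
            raise ValueError("unknown %s code: %r" % (field, part))
    return dict(zip(fields, parts))


print(parse_rep_code("%rep: SRP:MIS:PAR"))
```

Raising an error on unknown or missing components is a design choice for this sketch; a more permissive reader could instead return whatever components are present.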

Error coding focused exclusively on morphology and is represented on two separate tiers, %err and %mor. The %mor tier shows the actual target form for each error marked. The %err tier marks the types of errors using the following codes:
$OMI omission
$OMI:PAR partial omission
$INS insertion
$INS:PAR partial insertion
$SWI switched form
$SWI:PAR partially switched form
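
The %err codes pair an error type with an optional PAR suffix. The Python sketch below shows one way to read them, under the assumption that PAR is the only suffix in use and that a trailing colon on a bare code carries no meaning. The function name and the return format are hypothetical.

```python
# Error types copied from the list above.
ERR_TYPES = {"OMI": "omission", "INS": "insertion", "SWI": "switched form"}


def parse_err_code(code):
    """Return (error type, is_partial) for a code such as '$OMI:PAR'."""
    # Assumption: a trailing colon, as in '$OMI:', is not significant.
    text = code.strip().rstrip(":")
    if not text.startswith("$"):
        raise ValueError("error codes start with '$': %r" % code)
    parts = text[1:].split(":")
    kind = parts[0]
    if kind not in ERR_TYPES:
        raise ValueError("unknown error type: %r" % code)
    if len(parts) == 1:
        return kind, False
    if parts[1:] == ["PAR"]:
        return kind, True
    raise ValueError("unexpected suffix in error code: %r" % code)


for example in ("$OMI", "$INS:PAR", "$SWI"):
    kind, partial = parse_err_code(example)
    print(example, "->", ERR_TYPES[kind], "(partial)" if partial else "")
```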
