HKPU Corpus

Angel Chan
Department of Chinese and Bilingual Studies
Hong Kong Polytechnic University
angelwschan@gmail.com, angel.ws.chan@polyu.edu.hk

Participants:	20
Type of Study:	oral interviews
Location:	Hong Kong
Media type:	audio
DOI:	doi:10.21415/T5489G

Citation information

Chan, A., Feng, Z. H., & Yang, W. C. (2013). A new multimedia shared L2 spoken Mandarin Chinese corpus: Construction and linguistic analyses. Paper presented at the 21st Annual Meeting of the International Association of Chinese Linguistics (IACL-21), 7-9 June 2013, Taiwan.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This corpus builds on a completed MA dissertation by Jeff Zhen-Hui Feng, with technical support from doctoral student Wenchun Yang and research assistant Shanrong Xie, under the supervision of Angel Chan. It is a small video-linked corpus of L2 spoken Mandarin Chinese featuring 14 adult learners whose L1 is English and 6 adult L1 Mandarin speakers as controls. We hope that adding this corpus to TalkBank, a collaborative international data-sharing consortium, will further raise the visibility of SLA learner corpora with Chinese as the target language.

Participants

Fourteen adult L2 learners of Mandarin (L1: English) and six native Mandarin speakers (L1 controls), aged between 20 and 70, were recruited. The six L1 participants were selected to match six of the L2 participants on gender, age and education level, to enable systematic L1-L2 comparisons in future research. Tables 1 and 2 below provide background information on the L2 and L1 participants, respectively. All L2 participants can communicate verbally in Mandarin Chinese at least at the simple-sentence level. All participants gave written consent to participate in this project.

Table 1. Background Information of the 14 L2 Subjects (L1 English)

No.	Subject	Age	Gender	Education	Age_Started	Contexts	Other_Languages
1	Aa	23	M	Bachelor	22	Classroom/Conversation	French, Swedish
2	Al	34	M	Bachelor	30	Classroom/Conversation/Reading/Self-learning	None
3	Ba	20	M	Bachelor	18	Classroom/Conversation	Spanish, German
4	Ga	42	M	Doctor	19	Conversation/Reading/TV/movies	Spanish
5	Ge	34	M	Master	27	Conversation	Spanish
6	Ja	31	M	Master	24	Classroom/Conversation/Self-learning	Spanish, French
7	Je	38	M	Doctor	30	Conversation	French
8	Jo	37	M	High School	34	Classroom/Conversation	None
9	Mi	24	F	Master	18	Classroom/Conversation/Reading	German, French
10	Mo	70	M	Doctor	22	Classroom/Conversation/Reading/Self-learning	French
11	Na	23	M	Bachelor	20	Classroom/Conversation	French
12	Pa	48	M	Master	28	Classroom/Conversation/Self-learning	German, French
13	Ph	24	M	Bachelor	20	Classroom/Conversation/Self-learning	None
14	Ta	36	F	High School	33	Classroom/Conversation	None

Table 2. Background Information of the 6 L1 Mandarin Subjects

No.	Subject	Age	Education_Level	Gender	L2_Match
1	Do	35	Doctor	M	Je
2	Gu	38	Bachelor	M	Ga
3	Qi	49	Bachelor	M	Pa
4	Wu	35	Doctor	M	Jo
5	Ya	24	Master	F	Mi
6	Zh	32	Master	F	Ta
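
The L1-L2 matching recorded in the two tables above can be made explicit with a small lookup. The following is only a minimal sketch in Python, not part of the corpus tooling: the names `L2_LEARNERS`, `L1_CONTROLS` and `matched_pairs` are hypothetical, and the ages and genders are copied by hand from the tables.

```python
# Age and gender of the six matched L2 learners, copied from the L2 table above.
L2_LEARNERS = {
    "Ga": {"age": 42, "gender": "M"},
    "Je": {"age": 38, "gender": "M"},
    "Jo": {"age": 37, "gender": "M"},
    "Mi": {"age": 24, "gender": "F"},
    "Pa": {"age": 48, "gender": "M"},
    "Ta": {"age": 36, "gender": "F"},
}

# L1 controls with their L2 match, copied from the L1 table above.
L1_CONTROLS = {
    "Do": {"age": 35, "gender": "M", "l2_match": "Je"},
    "Gu": {"age": 38, "gender": "M", "l2_match": "Ga"},
    "Qi": {"age": 49, "gender": "M", "l2_match": "Pa"},
    "Wu": {"age": 35, "gender": "M", "l2_match": "Jo"},
    "Ya": {"age": 24, "gender": "F", "l2_match": "Mi"},
    "Zh": {"age": 32, "gender": "F", "l2_match": "Ta"},
}


def matched_pairs():
    """Return one record per L1 control, joined with its matched L2 learner."""
    pairs = []
    for l1_id, l1 in L1_CONTROLS.items():
        l2_id = l1["l2_match"]
        l2 = L2_LEARNERS[l2_id]
        pairs.append({
            "l1": l1_id,
            "l2": l2_id,
            "same_gender": l1["gender"] == l2["gender"],
            "age_gap": abs(l1["age"] - l2["age"]),
        })
    return pairs


if __name__ == "__main__":
    for pair in matched_pairs():
        print(pair)
```

A listing like this could serve as a starting point for selecting the matched transcripts when running L1-L2 comparisons.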

Procedures

Speech samples were collected individually from each of the 20 participants. Each participant engaged in a structured narrative task, retelling in Mandarin Chinese the classic frog story “Frog, Where Are You?” (Mayer 1969; Berman & Slobin 1994), which is commonly used in cross-linguistic research. Each session was videotaped with a high-quality audio track.

As is typical for studies using the frog story, each participant first reads the standard wordless storybook, which tells the story in 24 pictures, and is then asked to tell or retell the story in the target language. In this project, after reading the storybook once, each participant (L1 and L2 alike) also listened once to an audio recording of the story narrated in English from a standard script, to ensure familiarity with the story content before retelling it in Mandarin, the target language. The selected L1 Mandarin participants had sufficient L2 English proficiency to understand the English narration.

The corpus was constructed following the TalkBank format. We also conducted inter-rater reliability checks of the transcriptions, video-linking and synchronization of the data, and manual disambiguation of the automatic tagging.

Acknowledgment

The construction of this corpus was supported by a grant (project code: 1-ZVAQ) from the Hong Kong Polytechnic University to Angel Chan.
