Asian English Speech Corpus Project

The TaiWaN Asian English Speech cOrpus Project (TWNAESOP) is part of the ongoing multinational Asian English Speech cOrpus Project (AESOP), whose aim is to build a consortium of English speech corpora. Each research team will use a common recording setup, share an experimental task set, and develop a common, open-ended annotation system. The corpora collected by AESOP, which represent many varieties of English spoken in Asia, will be an open resource available to the research community at large. In 2009, the TWNAESOP team presented a data design to be used as core content by all AESOP collaborators, so that the consortium could construct databases with a small portion of overlapping content for commonality investigations as well as additional data to suit individual needs [1].
The 2009 AESOP core design focused on eliciting a large range of segmental and suprasegmental characteristics through 6 read speech tasks and 2 spontaneous speech tasks. The read speech tasks were (1) Target Words in Carrier Sentences, (2) Target Words at Phrase Boundaries, (3) Target Words in Contrastive Stress Positions, (4) Stressed and Unstressed Function Words, (5) Prosodic Disambiguation and (6) Reading a Passage ("The North Wind and the Sun"). The spontaneous speech tasks were (1) Computer-Prompted Dialogue and (2) Picture Description Task [1, 2, 3]. Since the Taiwan AESOP research team (TWNAESOP), which we represent, was among the first AESOP members to obtain research funds (TaiWaN Asian English Speech cOrpus Project, Chiang Ching-Kuo Foundation for International Scholarly Exchange (CCKF) project number DB002-D-08, 2009.7.1-2012.6.31), we were able to start data collection in the fall of 2009. Speech data of the core content from 500 speakers were successfully collected in 2010, including 12 American L1 speakers and 488 Taiwan Mandarin speakers [4].
However, while the core design covered the phonetic aspects of English segmental and suprasegmental features within one hour of recording time per speaker, the collected data did not include sufficient phonotactic variations or a wide range of prosodic and pragmatic aspects of L2 English. To collect more speech data that remedies these problems, we have designed a different set of materials. A review of some of the major corpora, including TIMIT, IViE, UME-ERJ and CASSAESOP, revealed the following picture.
The 1993 TIMIT corpus of read speech was designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems [5]. Its content includes read speech of (1) 460 phonetically compact sentences, (2) 2 calibration sentences and (3) 1890 randomly selected sentences [6]. The corpus has been widely used by the speech science community since its release and has proved quite suitable for general acoustic-phonetic research.
The 1998 IViE corpus by the Phonetics Lab, University of Oxford, was designed to investigate intonation variation in British English. Its content includes read speech of (1) 22 phonetically controlled sentences with different grammatical structures and (2) a passage of the Cinderella fairy tale, and spontaneous speech of (3) a retold version of the Cinderella fairy tale, (4) a map task and (5) a discussion on a given topic [7].
The 2000 UME-ERJ (Utilization of Multimedia to Promote Higher Education Reform-English Read by Japanese) corpus of read speech by Tokyo University was designed to support CALL (Computer Assisted Language Learning) research by collecting segmental and suprasegmental speech materials read by Japanese students. The UME-ERJ corpus design includes 8 sentence sets and 5 word sets.
The sentence sets consist of the TIMIT phonetically compact sentences, single intonational phrases, sentences with intonation markings and sentences with stress markings. The word sets consist of lists of single words, short phrases and compound words [8, 9].
The 2010 CASSAESOP corpus by the Chinese Academy of Social Sciences is by far the largest of these corpora in size and the most comprehensive in content. It consists of all of the phonetically compact sentences from TIMIT, the Cinderella fairy tale from IViE, the text pieces from UME-ERJ, the AESOP core sentences [1], and two additional sections of their own design, namely CASS-E (English) and CASS-C (Chinese). The CASS-E set was specially designed to cover a large variety of intonation variations.
Our goal is to collect sufficient phonotactic variations as well as more prosody and discourse data, while also serving the AESOP goal of open resources and data sharing. We held discussions with the CASSAESOP team and the Japan AESOP team at Waseda University and reached a consensus to collect some common data of mutual interest. As a result, the TWNAESOP2 project reported below is set to collect an entirely different set of speech data with the following components: (1) a large coverage of phonotactic variations through read speech of isolated words, (2) the passage of the Cinderella fairy tale, which is considerably longer than "The North Wind and the Sun" used before, (3) the broad/narrow focus sentences designed by CASSAESOP and (4) the DCT (Discourse Completion Task) discourse data developed at Waseda, collected through elicited dialogues. In the following sections, we report how we designed a set of phonotactic variations (Sec. 2.1), what we adopted from CASSAESOP (Sec. 2.2; 2.3), and how we tailored the Waseda DCT to suit Taiwan speakers (Sec. 2.4). The speech data we collect will be termed TWNAESOP2.
2. CORPUS DESIGN
2.1. Phonotactic Variations from Phonetically-rich Isolated Words
To generate a large coverage of phonotactic variations, we adopted the design rationale of PhoneBook, which aims to design a set of word lists that cover most if not all significant coarticulatory variants in a large variety of phonetic contexts [10]. PhoneBook used a two-stage process to generate word lists from the CMU electronic dictionary. In the first stage, a list of candidate words is generated through a manual filtering process that deletes unsuitable words, so that speakers do not run into reading difficulties. The deleted items are foreign words, difficult and obscure words, words with multiple acceptable pronunciations, potentially embarrassing words, words likely to be misread, homonym pairs and acronyms. The phonemic contexts, such as triphones and syllable-based templates, can then be enumerated from the candidate word list. In the second stage, the final word list is generated by choosing a subset of the candidate word list that captures each enumerated context at least once. The frequency of phonemic contexts is used as a reference for scoring. Following the PhoneBook design rationale but with automatic instead of manual filtering, we used the following open resources for the present project: (1) the syllabified CMU electronic dictionary [11] and (2) part-of-speech (POS) information from a word frequency list [12].
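
The second stage described above (choosing a subset of candidate words that captures each enumerated context at least once) can be illustrated with a minimal sketch. It is not the PhoneBook procedure or the authors' implementation: the greedy selection strategy, the inverse-frequency scoring, the `#` word-boundary symbol, the function names and the toy lexicon are all assumptions made for illustration. The text only states that context frequency serves as a reference for scoring.

```python
from collections import Counter


def triphones(phones):
    """Return the set of triphone contexts in one word.

    '#' is a hypothetical word-boundary symbol used as padding.
    """
    padded = ["#"] + list(phones) + ["#"]
    return {tuple(padded[i:i + 3]) for i in range(len(padded) - 2)}


def select_words(lexicon):
    """Greedily pick words until every triphone in the lexicon is covered.

    lexicon maps each candidate word to its phone sequence.
    """
    contexts = {word: triphones(phones) for word, phones in lexicon.items()}
    # How many candidate words contain each context.
    freq = Counter(c for ctx in contexts.values() for c in ctx)
    uncovered = set(freq)
    selected = []
    while uncovered:
        # Assumed scoring: rare, still-uncovered contexts weigh more.
        best = max(
            sorted(contexts),  # sorted for deterministic tie-breaking
            key=lambda w: sum(1.0 / freq[c] for c in contexts[w] & uncovered),
        )
        selected.append(best)
        uncovered -= contexts[best]
    return selected


# Toy candidate list with ARPAbet-style phones (illustrative only).
TOY_LEXICON = {
    "cat": ["K", "AE", "T"],
    "cats": ["K", "AE", "T", "S"],
    "at": ["AE", "T"],
}

if __name__ == "__main__":
    print(select_words(TOY_LEXICON))
```

In this toy list, every triphone of "cat" also occurs in "cats" or "at", so the sketch selects those two words and leaves "cat" out. A greedy pass like this does not guarantee the smallest possible word list; it only guarantees that each enumerated context is covered at least once.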
