Traditional education methodologies, where an instructor teaches non-native learners, can be expensive and constrained in time and space. However, frequent and repetitive practice is necessary for language acquisition. As a result, Computer-Assisted Language Learning (CALL) has recently received attention because CALL systems provide easy access to frequent and repetitive practice. Previous research on language acquisition shows that teaching only the pronunciation of segments fails to significantly improve the comprehensibility of non-native speakers’ spontaneous speech, whereas teaching prosody does improve comprehensibility. Prosody has also been shown to play a more important role than segments in judgments of the comprehensibility and/or accentedness of non-native speech.

Among prosodic features, we focus on pitch accents, which, in English, serve as a cue for identifying the prominence of a syllable within a word and are treated as a type of tonal event in the Tones and Break Indices (ToBI) framework, a set of conventions widely used for transcribing and annotating the prosody of speech. Because pitch accents in English are typically realized on words that carry important information in a sentence, non-native learners should practice pitch accents to acquire higher-level foreign-language speaking skills. In the proposed work, we focus simply on accented words, i.e., words carrying pitch accents within a sentence, to avoid the data sparseness caused by a detailed classification of accentuation for each syllable. Previous studies on pitch accents have mainly focused on techniques to improve detection performance as a way to evaluate learners’ proficiency. In contrast, this study focuses on the integration of technologies to build an automatic pitch accent feedback system, as well as on the feedback itself in terms of effectiveness and appropriateness.
In addition, previous studies used native speech as a reference for direct comparison with learners’ speech; in this study, by contrast, a predicted pitch accent pattern is taken as the reference and is represented by accented words rather than accented syllables. This means that any sentence can be employed for practice, so no prepared practice materials are required. This study examines the following items: (1) the construction of a CALL system for English pitch accents, (2) the adoption of predicted pitch accent patterns as a reference, (3) the adaptation of the pitch accent detection model to an English speech corpus spoken by Korean learners, and (4) the evaluation of learning effectiveness with the pitch accent CALL system. The remainder of this paper is structured as follows: Section 2 describes the materials used in our experiments, Section 3 introduces the technologies that predict and detect pitch accents, Section 4 presents a detailed description of our experimental designs and results, and Section 5 draws a conclusion.
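The feedback described above amounts to a word-level comparison between the predicted reference pattern and the accents detected in the learner’s speech. The following is a minimal illustrative sketch of such a comparison; the function name, the index-set representation, and the example accent sets are our own assumptions, not the system’s actual implementation.

```python
# Hypothetical sketch: compare the words the prediction model marks as
# accented (the reference) with the words the detection model finds
# accented in the learner's utterance, and report the mismatches.
def accent_feedback(words, predicted, detected):
    """words: tokens of the sentence; predicted/detected: sets of
    indices of words judged to carry a pitch accent."""
    feedback = []
    for i, word in enumerate(words):
        if i in predicted and i not in detected:
            feedback.append((word, "missing accent"))
        elif i not in predicted and i in detected:
            feedback.append((word, "unexpected accent"))
    return feedback

words = "For some reason I don't feel like going out".split()
predicted = {1, 2, 4, 5, 8}   # assumed reference: some, reason, don't, feel, out
detected = {1, 2, 5}          # assumed learner accents: some, reason, feel
print(accent_feedback(words, predicted, detected))
```

Because the reference is predicted rather than recorded, this comparison can be run on any sentence the learner chooses to practice.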
2.1. A corpus for the prediction model

Our pitch accent feedback system contains two models that require training data: (1) a pitch accent prediction model and (2) a pitch accent detection model. To train a prediction model that predicts the pitch accents to be imposed on a given sentence, we used the Boston University radio news corpus (BU corpus), in which pitch accents are annotated in each sentence. This corpus consists of seven hours of speech recorded from seven native announcers and includes orthographic transcriptions, phonetic alignments, part-of-speech (POS) tags, and prosodic labels, including pitch accents. In ToBI, pitch accents are annotated as H*, L*, L*+H, L+H*, and H+!H*. For example, in the sentence “In April, the S.J.C.’s current leader Edward Hennessy reaches a mandatory retirement age of seventy, …”, the accented words are “In”, “S.J.C.’s”, “leader”, “Edward”, “Hennessy”, “mandatory”, “retirement” and “seventy” (Fig. 1).

2.2. A corpus for the detection model

To train a detection model that can detect pitch accents in the speech of Koreans learning English, we used the Korean Learner’s English Accentuation corpus (KLEAC corpus). This corpus consists of six hours of English speech by 75 native Korean speakers (5,500 sentences) and includes orthographic transcriptions, accent marks, and proficiency levels. Accents were manually labeled by well-trained phonetics experts, annotated following approximately the same guidelines used for the BU corpus, and partly cross-checked with the Praat tool; the inter-annotator agreement for accents is 87.1%. For example, in the sentence “For some reason, I don’t feel like going out”, the accented words are “some”, “reason”, “I”, “don’t”, “feel”, “going”, and “out”, as indicated by the marks on the third tier (Fig. 2). A comparison of the two corpora shows that Korean speakers tend to produce more pitch accents, as shown in Table 1, presumably because low-proficiency Korean learners tend to attend to the exact pronunciation of individual words.
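Extracting word-level training labels from such ToBI-style annotations can be pictured as follows. This is a simplified sketch under the assumption that each word comes paired with its tonal label (or none); the pairing format and function name are hypothetical, not the corpora’s actual file layout.

```python
# Hypothetical sketch: derive word-level accent labels from ToBI-style
# annotation. Any word bearing one of the pitch-accent tones listed in
# the text (H*, L*, L*+H, L+H*, H+!H*) counts as an accented word.
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*"}

def accented_words(annotated):
    """annotated: list of (word, tone_label_or_None) pairs."""
    return [word for word, tone in annotated if tone in PITCH_ACCENTS]

# Assumed labels for the start of the BU-corpus example sentence.
annotated = [("In", "H*"), ("April", None), ("the", None),
             ("S.J.C.'s", "L+H*"), ("current", None), ("leader", "H*")]
print(accented_words(annotated))  # ['In', "S.J.C.'s", 'leader']
```

Collapsing the five tone categories into a single binary accented/unaccented word label is what avoids the data sparseness that a per-syllable, per-tone classification would incur.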