Preparation for a learner experiment
For the learner experiment, we trained the prediction model on the entire BU corpus and the detection model on the entire KLEAC corpus. In the feedback part, we set the decision boundaries θ1 and θ2 in Equation (1) to 0.3 and 0.5, respectively. Although these values were not guaranteed to produce the most effective feedback, we empirically judged them sufficient for the goal of the learner experiment, which was to observe any improvement in learners' pitch accent proficiency in a real CALL activity. To verify the learning effectiveness of the proposed system, we designed an experiment in which 10 Korean learners of English used the proposed system for one hour to practice pitch accents in 20 sentences. Four English experts then assessed the speech files recorded before and after practice without being told when each file had been recorded. The 20 sentences were randomly selected from news articles.
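Equation (1) is not reproduced in this section, so the following is only a hypothetical illustration of how two decision boundaries can split a score into the three feedback types (positive, negative, and empty) that appear in the feedback evaluation. The score used here, the absolute gap between an assumed predicted accent probability and an assumed detected accent probability, is a stand-in chosen for illustration and is not the paper's formula; the function name `feedback_for_word` is likewise hypothetical.

```python
# Hypothetical sketch: the real Equation (1) is not shown here. The score below
# (absolute gap between predicted and detected accent probabilities) is an
# ASSUMED stand-in, used only to show how theta1 and theta2 act as boundaries.
THETA1 = 0.3  # lower decision boundary (value from the text)
THETA2 = 0.5  # upper decision boundary (value from the text)


def feedback_for_word(p_predicted: float, p_detected: float,
                      theta1: float = THETA1, theta2: float = THETA2) -> str:
    """Map one word to 'positive', 'negative', or 'empty' feedback.

    p_predicted: assumed probability that the word should be accented.
    p_detected:  assumed probability that the learner actually accented it.
    """
    if not 0.0 <= theta1 <= theta2:
        raise ValueError("expected 0 <= theta1 <= theta2")
    gap = abs(p_predicted - p_detected)
    if gap <= theta1:
        return "positive"  # prediction and detection agree closely
    if gap >= theta2:
        return "negative"  # clear mismatch between target and production
    return "empty"         # uncertain zone between the boundaries: no feedback


if __name__ == "__main__":
    print(feedback_for_word(0.9, 0.8))  # small gap
    print(feedback_for_word(0.9, 0.2))  # large gap
    print(feedback_for_word(0.9, 0.5))  # gap between the two boundaries
```

Under this assumed scoring, widening the interval between θ1 and θ2 trades fewer wrong feedbacks for more empty ones, which is one way to read why empty feedback is later counted as "not wrong".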
Evaluation of the learner experiment
To evaluate the learning effectiveness of the proposed system, the experiment was conducted for each learner as follows:
① Before practice, the learner's utterances of the 20 sentences are recorded.
② For one hour, the learner practices pitch accents on the 20 sentences alone, using the proposed system.
③ After practice, the learner's utterances of the 20 sentences are recorded again.
The educational effect of the system is assessed through the experts' English proficiency evaluation, which considers pitch accent only.
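The recording procedure, combined with the blind assessment in which the experts were not told when each file was recorded, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Recording` type, the `build_blinded_set` helper, and the `uttNNN` anonymous IDs are hypothetical, and shuffling with a fixed seed is simply one way to hide the before/after order from the assessors.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    learner: int   # 1..10 in the experiment described above
    sentence: int  # 1..20
    phase: str     # "before" or "after" practice


def build_blinded_set(n_learners: int = 10, n_sentences: int = 20, seed: int = 0):
    """Return (anonymous_ids, key) for a blinded expert assessment.

    Hypothetical helper: experts receive only the anonymous IDs, in shuffled
    order; the key mapping each ID back to learner, sentence, and phase is
    kept by the experimenter until the ratings are collected.
    """
    recordings = [
        Recording(learner, sentence, phase)
        for learner in range(1, n_learners + 1)
        for sentence in range(1, n_sentences + 1)
        for phase in ("before", "after")
    ]
    rng = random.Random(seed)
    rng.shuffle(recordings)  # ID order no longer reveals the phase
    key = {f"utt{i:03d}": rec for i, rec in enumerate(recordings)}
    return list(key), key


if __name__ == "__main__":
    ids, key = build_blinded_set()
    print(len(ids), ids[:3])
```

With 10 learners, 20 sentences, and two recording sessions, the sketch yields 400 files, the same number of speech files the experts evaluated.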
Real performance of prediction and detection models in the learner experiment
To measure how the prediction and detection models actually performed in the learner experiment, the English experts evaluated each model separately. For the prediction model, the experts assigned pitch accents to the 20 sentences used in the learner experiment without any knowledge of the system's predictions, and their assignments were then compared with the system's predictions. As shown in Table 6, the accuracy and F-measure of the system's predictions were 91.84% and 91.89%, respectively. The prediction model is more accurate in the real learner experiment than in the model validation because the 20 sentences consist of relatively basic words that occur frequently in the BU corpus, which was used to train the prediction model. For the detection model, the experts evaluated the 400 speech files (10 learners × 20 sentences × 2 recording sessions) of learners' utterances recorded before and after practice. Comparing the system's detection outcomes with the experts' judgments, Table 6 shows that the detection model achieves an accuracy of 83.01% in the real learner experiment, slightly higher than in the model validation.
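The comparison against expert labels reduces to standard per-word metrics. The sketch below assumes each word is labeled 1 (accented) or 0 (unaccented) and that the reported F-measure is the F1 score of the accented class; the section does not state which F-measure variant was used, so that choice is an assumption. The same `accuracy` function applies to the detection model by comparing detected labels with the experts' judgments. The toy label lists are invented for illustration.

```python
def _check(system, reference):
    """Reject empty or mismatched label sequences (zip would silently truncate)."""
    if not system or len(system) != len(reference):
        raise ValueError("label lists must be non-empty and equal in length")


def accuracy(system, reference):
    """Fraction of words whose system label matches the expert label."""
    _check(system, reference)
    return sum(s == r for s, r in zip(system, reference)) / len(system)


def f_measure(system, reference, positive=1):
    """F1 of the `positive` class (ASSUMED here to be 'accented' = 1)."""
    _check(system, reference)
    pairs = list(zip(system, reference))
    tp = sum(s == positive and r == positive for s, r in pairs)
    fp = sum(s == positive and r != positive for s, r in pairs)
    fn = sum(s != positive and r == positive for s, r in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    system = [1, 0, 1, 1, 0, 0, 1, 0]     # toy system predictions (invented)
    reference = [1, 0, 0, 1, 0, 1, 1, 1]  # toy expert labels (invented)
    print(accuracy(system, reference))    # 0.625
    print(f_measure(system, reference))
```

If the paper instead macro-averaged F1 over both classes, the `f_measure` call would be made once per class and the two results averaged.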
Feedback performance in the learner experiment
Using the utterances and feedback obtained in the learner experiment, we evaluated the accuracy of the feedback produced by the feedback part. For this evaluation, the English experts judged the feedback part on its own, without regard to the outputs of the prediction and detection parts: we credited the feedback part whenever it reflected the correct result and thus gave the learner suitable feedback, even if the prediction or detection part had erred. Table 7 is compiled from the pitch accents that the four English experts (the same individuals who evaluated the learner experiment) identified as expected in each sentence (predicted) and as actually produced in the utterances recorded before and after practice (detected). The composition ratio in Table 7 is the proportion of each pair of expert-predicted and expert-detected pitch accent types; for example, the "unaccented-unaccented" pair has a composition ratio of 0.306, and the ratios over all pairs sum to 1. Table 7 also gives the proportion of each feedback type within each pair; for example, for the "unaccented-accented" pair, the proportions of positive, negative, and empty feedback are 0.323, 0.502, and 0.173, respectively. Assuming that the correct feedback for the "unaccented-unaccented", "unaccented-accented", "accented-unaccented", and "accented-accented" pairs is positive, negative, negative, and positive, respectively, the accuracy of correct feedback in Table 7 is 69.27%. If empty feedback is also regarded as not wrong, the accuracy of not-wrong feedback rises to 82.77%.
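Both feedback accuracies are weighted sums over the four pairs: each pair's composition ratio multiplied by the proportion of correct feedback in that pair (and, for the not-wrong accuracy, the proportion of correct plus empty feedback). The sketch below encodes the correct-feedback mapping stated in the text. Only two entries of Table 7 are quoted in this section, so the table in the usage example contains invented placeholder numbers, not the paper's data; the function name `feedback_accuracies` is hypothetical.

```python
# Correct feedback per (expert-predicted, expert-detected) pair, as stated in the text.
CORRECT_FEEDBACK = {
    ("unaccented", "unaccented"): "positive",
    ("unaccented", "accented"): "negative",
    ("accented", "unaccented"): "negative",
    ("accented", "accented"): "positive",
}


def feedback_accuracies(table):
    """Return (correct, not_wrong) accuracies from a Table 7-style summary.

    table maps each pair to (composition_ratio, {feedback_type: proportion}).
    Published ratios are rounded, so their sum is checked with a loose tolerance.
    """
    total_ratio = sum(ratio for ratio, _ in table.values())
    if abs(total_ratio - 1.0) > 0.01:
        raise ValueError("composition ratios must sum to 1")
    correct = not_wrong = 0.0
    for pair, (ratio, props) in table.items():
        hit = props.get(CORRECT_FEEDBACK[pair], 0.0)  # share of correct feedback
        empty = props.get("empty", 0.0)               # share of empty feedback
        correct += ratio * hit
        not_wrong += ratio * (hit + empty)
    return correct, not_wrong


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT the values in Table 7.
    toy = {
        ("unaccented", "unaccented"): (0.25, {"positive": 0.8, "negative": 0.1, "empty": 0.1}),
        ("unaccented", "accented"): (0.25, {"positive": 0.2, "negative": 0.6, "empty": 0.2}),
        ("accented", "unaccented"): (0.25, {"positive": 0.2, "negative": 0.6, "empty": 0.2}),
        ("accented", "accented"): (0.25, {"positive": 0.8, "negative": 0.1, "empty": 0.1}),
    }
    print(feedback_accuracies(toy))
```

The gap between the two returned values is the overall share of empty feedback, which is why counting empty feedback as not wrong raises the accuracy.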