From Corpus to Benchmark: Evaluating Pretrained Language Models for Indonesian-Javanese Krama Translation

Authors: Yuliawati, Arlisa; Alfina, Ika; Budi, Indra
Information
Journal: Proceedings of the 2025 International Conference on Asian Language Processing (IALP 2025)
Publisher: Institute of Electrical and Electronics Engineers Inc. (IEEE)
Pages: 199-204
Publication Year: 2025
ISBN: 979-833158979-0
Source Type: Scopus
Abstract
Javanese, a regional language of Indonesia, has an honorific system consisting of Ngoko and Krama. Ngoko is the more casual style, used between people of similar social status, while Krama conveys politeness and is usually used when addressing people of higher social status. However, the Javanese honorific system remains underexplored because of limited knowledge and the lack of available datasets. This study explores machine translation (MT) involving Javanese Krama by developing an Indonesian-Javanese Krama parallel corpus (NgokoKrama) and benchmarking three prominent pretrained language models: IndoBART-v2, Indo-T5, and NLLB-200. The three models were fine-tuned on NgokoKrama, which consists of 1000 Indonesian-Javanese Krama sentence pairs, and evaluated under low-resource conditions to determine their effectiveness and adaptability. Translation quality was measured with the SacreBLEU score. Of the three models, NLLB-200, a state-of-the-art multilingual MT model, adapted best, achieving the highest performance in both translation directions. IndoBART-v2 and Indo-T5 ranked second and third, respectively. These results show that, with a small but high-quality parallel corpus, translation involving Javanese Krama can be performed well by fine-tuning pretrained language models that were already trained on general Javanese data (dominated by, or entirely in, Ngoko). © 2025 IEEE.
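
The abstract reports SacreBLEU scores but does not describe the scoring setup. As a rough illustration of what a corpus-level BLEU score measures, the following is a minimal sketch and not the authors' evaluation code. It assumes whitespace tokenization, one reference per hypothesis, and no smoothing, so its output will not match SacreBLEU exactly; the function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not from the paper): whitespace tokenization, a single
    reference per hypothesis, and no smoothing. SacreBLEU applies its own
    tokenizer, so scores from this sketch are only indicative.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection clips each count to the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, a zero precision gives BLEU = 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(model_outputs, reference_translations)` once per translation direction would give one score per direction, which is how the two directions mentioned in the abstract could be compared under these assumptions.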
