Classification of Honorific Levels in Javanese: Comparison Between Rule-Based, Classical Machine Learning and Transformer-Based Methods

Authors: Amin, Iqbal Pahlevi; Yuliawati, Arlisa; Alfina, Ika
Information
Journal: Proceedings of the 2025 International Conference on Asian Language Processing (IALP 2025)
Publisher: Institute of Electrical and Electronics Engineers Inc. (IEEE)
Pages: 170-175
Publication Year: 2025
ISBN: 979-833158979-0
Source Type: Scopus
Abstract
We study the classification of honorific levels in Javanese, a regional language of Indonesia. The honorific levels are generally divided into Ngoko (the casual nuance) and Krama (the formal or polite nuance). We compare the performance of rule-based methods, classical machine learning approaches (Logistic Regression, Gaussian Naive Bayes, SVM, Random Forest, CatBoost), and Transformer-based methods (BERT with Word2Vec/fastText word embeddings). We also built a new dataset of 979 manually annotated sentences. This dataset exhibits severe class imbalance (a Ngoko:Krama ratio of 1:10.125), which we addressed with two oversampling techniques, SMOTE and Polynom-fit-SMOTE. The experimental results show that BERT achieved the highest performance, with 98.40% precision, 93.36% recall, and a 95.30% F1-score, significantly outperforming the other methods. This work demonstrates the effectiveness of language-specific pretraining for capturing Javanese honorific-level nuances. © 2025 IEEE.
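
As a rough illustration of the imbalance and the oversampling idea mentioned in the abstract, the sketch below derives the class sizes implied by 979 sentences at a 1:10.125 ratio (assuming the ratio is exact) and shows the basic SMOTE interpolation step on toy data. It is not the authors' implementation: the abstract does not say which features or library were used, so the function names `class_counts` and `smote`, the 2-D toy vectors, and the parameters `k` and `seed` are all hypothetical, and Polynom-fit-SMOTE is not covered.

```python
import math
import random

def class_counts(total, ratio):
    """Split `total` items into (minority, majority) for a 1:ratio imbalance."""
    minority = round(total / (1 + ratio))
    return minority, total - minority

def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic point is a random interpolation between
    a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Class sizes implied by the abstract's figures (assumes the ratio is exact).
ngoko, krama = class_counts(979, 10.125)
print(ngoko, krama)   # 88 891
print(krama - ngoko)  # 803 synthetic Ngoko samples would balance the classes

# Toy 2-D feature vectors standing in for minority-class sentences (hypothetical).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote(toy_minority, n_new=3, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, plain SMOTE never leaves the region the minority class already occupies.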