Classification of Honorific Levels in Javanese: Comparison Between Rule-Based, Classical Machine Learning and Transformer-Based Methods
Information
Journal: Proceedings of 2025 International Conference on Asian Language Processing, IALP 2025
Publisher: Institute of Electrical and Electronics Engineers Inc. (IEEE)
Pages: 170-175
Publication Year: 2025
ISBN: 979-833158979-0
Source Type: Scopus
Abstract
We study the classification of honorific levels in Javanese, a regional language of Indonesia. The honorific levels are generally divided into Ngoko (the casual nuance) and Krama (the formal or polite nuance). We compare the performance of rule-based methods, classical machine learning approaches (Logistic Regression, Gaussian Naive Bayes, SVM, Random Forest, CatBoost), and Transformer-based methods (BERT with Word2Vec/fastText word embeddings). We also built a new dataset of 979 manually annotated sentences. This dataset exhibits severe class imbalance (a Ngoko:Krama ratio of 1:10.125), which we addressed with two oversampling techniques, SMOTE and Polynom-fit-SMOTE. The experimental results show that BERT achieved the highest performance, with 98.40% precision, 93.36% recall, and a 95.30% F1-score, significantly outperforming the other methods. This work demonstrates the effectiveness of language-specific pretraining for capturing Javanese honorific-level nuances. © 2025 IEEE.
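To make the imbalance figures concrete, the following is a minimal Python sketch and not the authors' implementation. It first derives the per-class sentence counts implied by the reported total (979) and ratio (1:10.125); these counts are an inference from those two numbers and are not stated in the abstract. It then illustrates the core idea of SMOTE, which is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function names `class_counts` and `smote_like`, the parameter `k`, and the toy 2-D feature vectors are all hypothetical; a real experiment would operate on sentence features and would typically rely on an established library.

```python
import random
from fractions import Fraction


def class_counts(total, majority_per_minority):
    """Split `total` into (minority, majority) for a 1:r class ratio."""
    r = Fraction(majority_per_minority)   # exact arithmetic, e.g. "10.125"
    minority = Fraction(total) / (1 + r)
    return int(minority), int(minority * r)


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative assumption only).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Counts implied by the abstract's total and Ngoko:Krama ratio
ngoko, krama = class_counts(979, "10.125")
print(ngoko, krama)                       # 88 891

# Toy minority-class feature vectors (hypothetical, for illustration)
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(toy_minority, n_new=5)
print(len(new_points))                    # 5
```

Under these assumptions, balancing the two classes would require roughly 803 synthetic Ngoko samples (891 minus 88). Polynom-fit-SMOTE, the second technique named in the abstract, generates samples along fitted polynomial curves rather than straight segments and is not shown here.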