A Comparison between SymSpell and a Combination of Damerau-Levenshtein Distance with the Trie Data Structure

Authors: Audah, Hanif Arkan; Yuliawati, Arlisa; Alfina, Ika
Information
Journal: 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application, ICAICTA 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: -
Publication Year: 2023
ISBN: 979-835032991-9
Source Type: Scopus
Citations
Scopus: 2
Google Scholar: 3
PubMed: 3
Abstract
A non-word error is a spelling error that produces a string that is not a known dictionary word. This study compares two non-word error correction methods for Indonesian: SymSpell and a combination of Damerau-Levenshtein distance with the trie data structure (DLTrie). We evaluated both methods on isolated-word and context-dependent cases. For SymSpell, we implemented two variants: weighted and unweighted. We also enriched the KBBI V dictionary with additional words from Wiktionary to form an Indonesian dictionary of 91,557 words, and built a synthetic evaluation dataset containing 58,532 misspellings. The evaluation measures best-match accuracy, candidate accuracy, and run time. For isolated-word cases, SymSpell outperformed DLTrie, obtaining a higher best-match accuracy and a lower run time. The best-performing implementation was weighted SymSpell, with a best-match accuracy of 66.79%, a candidate accuracy of 99.33%, and a run time of 0.39 ms per word. For context-dependent cases, SymSpell obtained a slightly lower best-match accuracy than DLTrie (89.58% versus 89.93%) but was faster by several orders of magnitude. © 2023 IEEE.
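To make the SymSpell side of the comparison concrete, the following is a minimal Python sketch of a symmetric-delete lookup ranked by a restricted Damerau-Levenshtein (optimal string alignment) distance. It is an illustration under stated assumptions, not the authors' implementation: the function names (`osa_distance`, `deletes`, `build_index`, `suggest`), the toy word frequencies, and the use of frequency as a tie-breaker are all hypothetical, and the tie-breaker is not necessarily what the abstract means by the "weighted" variant.

```python
from collections import defaultdict
from itertools import combinations


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ka" <-> "ak".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def deletes(word: str, max_edits: int) -> set[str]:
    """All strings obtained by deleting up to max_edits characters from word."""
    out = {word}
    for k in range(1, min(max_edits, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            out.add("".join(word[i] for i in keep))
    return out


def build_index(freq: dict[str, int], max_edits: int = 2) -> dict[str, set[str]]:
    """Map every delete-variant of every dictionary word back to that word."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in freq:
        for variant in deletes(word, max_edits):
            index[variant].add(word)
    return index


def suggest(term: str, index: dict[str, set[str]], freq: dict[str, int],
            max_edits: int = 2) -> list[str]:
    """Candidates within max_edits, nearest first; ties broken by frequency (assumption)."""
    candidates: set[str] = set()
    for variant in deletes(term, max_edits):
        candidates |= index.get(variant, set())
    scored = sorted((osa_distance(term, w), -freq[w], w) for w in candidates)
    return [w for dist, _, w in scored if dist <= max_edits]


# Toy dictionary with made-up frequencies (illustrative only).
FREQ = {"makan": 50, "makna": 20, "malam": 30, "main": 10}
INDEX = build_index(FREQ)
print(suggest("mkaan", INDEX, FREQ))
```

The index is built once from the dictionary, so each lookup only generates deletes of the misspelled term and verifies a small candidate set with the distance function. A trie-based approach instead computes the distance while walking the dictionary's prefixes, which is the other method the abstract compares.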