Indo-WDSimpleQuAD2.0: an Indonesian Benchmark Dataset for Knowledge Graph Question Answering System

Authors: Yani, Mohammad; Setiawan, Wawan; Frihatmawati, Rizky; Atmamudin, Wali; Mustamiin, Muhamad
Information
Journal: International Journal of Computing
Publisher: Research Institute of Intelligent Computer Systems
Volume & Issue: Vol. 24, Issue 4
Pages: 1-10
Year of Publication: 2025
ISSN: 1727-6209
Source Type: Scopus
Abstract
We propose Indo-WDSimpleQuAD2.0, a silver-standard Indonesian-language benchmark dataset derived from SimpleQuestions and LC-QuAD 2.0 and based on Wikidata. The dataset is motivated by the current absence of a representative knowledge graph question answering (KGQA) benchmark dataset in the Indonesian language. SimpleQuestions and LC-QuAD 2.0 were chosen because, in terms of question type variety and complexity, they serve as supersets of other available datasets. Indo-WDSimpleQuAD2.0 comprises 27,924 questions from SimpleQuestions and 31,821 from LC-QuAD 2.0. It was developed through a rigorous translation process carried out by English language experts and native Indonesian speakers in three stages: initial translation, validation and verification, and finalization of the translation. To ensure the quality of the dataset, the authors applied four criteria: translation accuracy, writing quality, semantic integrity, and annotation process. Indo-WDSimpleQuAD2.0 can serve as the first Indonesian-language KGQA benchmark dataset based on Wikidata, thus supporting future research and development of Indonesian KGQA systems. © 2025, Research Institute of Intelligent Computer Systems. All rights reserved.
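The two subset sizes quoted in the abstract can be tallied with a short sketch. The combined total and the percentage shares below are derived here and are not stated in the abstract; they assume the two subsets are counted separately with no overlap. The names `SUBSET_SIZES` and `summarize` are illustrative only and do not come from the dataset or its paper.

```python
# Question counts per source dataset, as stated in the abstract.
SUBSET_SIZES = {
    "SimpleQuestions": 27_924,
    "LC-QuAD 2.0": 31_821,
}


def summarize(sizes):
    """Return the total question count and each subset's share of it.

    Assumes the subsets are disjoint, which the abstract does not state.
    """
    total = sum(sizes.values())
    shares = {name: count / total for name, count in sizes.items()}
    return total, shares


total, shares = summarize(SUBSET_SIZES)
print(f"Total questions: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption the benchmark holds 59,745 questions in total, with the LC-QuAD 2.0 portion slightly larger than the SimpleQuestions portion.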