Protein Sequence-Based COVID-19 Detection: A Comparative Study of Machine Learning Classification Methods

Authors: Aminah, Siti; Ardaneswari, Gianinna; Awang, Mohd Khalid; Yusaputra, Muhammad Ariq; Sari, Dian Puspita
Information
Journal: Journal of Electrical and Computer Engineering
Publisher: Hindawi Limited
Volume & Issue: Vol. 2024
Pages: -
Publication Year: 2024
ISSN: 2090-0147
Source Type: Scopus
Abstract
Coronaviruses, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), continue to pose a significant public health challenge globally, even in 2024. Despite advancements in vaccines and treatments, the accurate classification of coronavirus protein sequences remains crucial for monitoring variants, understanding viral behavior, and developing targeted interventions. In this study, we investigate the efficacy of various classification methods in accurately classifying coronavirus protein sequences. We explore the use of K-nearest neighbor (KNN), fuzzy KNN (FKNN), support vector machine (SVM), and SVM with particle swarm optimization (PSO-SVM) algorithms for classification, complemented by feature selection techniques including principal component analysis (PCA) and random forest-recursive feature elimination (RF-RFE). Our dataset comprises 2000 protein sequences, evenly split between SARS-CoV-2 and non-SARS-CoV-2 sequences. Through rigorous analysis, we evaluate the performance of each classification model in terms of accuracy, sensitivity, specificity, and receiver operating characteristic area under the curve (ROC-AUC). Our findings demonstrate consistently high performance across all models, reflecting their efficacy in classifying coronavirus protein sequences. Notably, the PCA + PSO-SVM model emerges as the top-performing model, exhibiting the highest classification accuracy, specificity, and ROC-AUC score, demonstrating its effectiveness in distinguishing between SARS-CoV-2 and non-SARS-CoV-2 sequences. Overall, our study highlights the importance of employing advanced classification methods and feature selection techniques in accurately classifying coronavirus protein sequences. The findings provide valuable insights for researchers and practitioners in the field of bioinformatics and contribute to ongoing efforts in understanding and combating the COVID-19 pandemic and its evolving challenges. Copyright © 2024 Siti Aminah et al.
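
The four evaluation measures named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `binary_metrics` and `roc_auc`, the label encoding (1 for SARS-CoV-2, 0 for non-SARS-CoV-2), and the toy labels and scores are all assumptions made for illustration only, and the sketch assumes both classes appear in the labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels.

    Assumed encoding (hypothetical): 1 = SARS-CoV-2, 0 = non-SARS-CoV-2.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity and specificity are reported separately because accuracy alone does not show which class the errors fall on, and ROC-AUC measures how well the classifier's scores rank positives above negatives independently of any single decision threshold.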