Visual question answering for Monas tourism object using deep learning

Authors: Siregar, Ahmad Hasan; Chahyati, Dina

Information

Journal: 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020

Publisher: Institute of Electrical and Electronics Engineers Inc.

Pages: 381 - 386

Publication Year: 2020

DOI: 10.1109/ICACSIS51025.2020.9263149

ISBN: 978-172819279-6

Source Type: Scopus

Citations

Scopus: 2

Abstract

Visual Question Answering (VQA) is a machine learning task in which, given an image and a natural language question about that image, a model must answer the question. No public VQA dataset in Bahasa Indonesia is currently known to be available. To address this gap, this research compiles a Monas VQA dataset whose questions are written in Bahasa Indonesia and whose images use Monas, an Indonesian memorial monument, as the image-specific context. This research also proposes methods for solving VQA that use a CNN for image embedding; techniques from the NLP field for sentence embedding, e.g., Bag-of-Words, fastText, BERT, and BiLSTM; and multimodal machine learning to let the two embeddings interact with each other. The best-performing model achieves 68.39% accuracy, and an analysis of the impact of architecture choices is presented. © 2020 IEEE.
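
The pipeline described in the abstract (image embedding, sentence embedding, multimodal interaction, answer prediction) can be illustrated with a minimal sketch. This is not the authors' implementation. Several things here are assumptions made only for illustration: the image features are a random placeholder standing in for CNN output, the question is encoded with Bag-of-Words only, the two embeddings are fused by an element-wise product (the abstract does not name a fusion method), all weights are random and untrained, and the vocabulary, answer list, and function names are hypothetical.

```python
import math
import random


def bag_of_words(question, vocab):
    """Count how often each vocabulary word occurs in the question."""
    tokens = question.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def linear(vec, weights):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def answer_probabilities(image_features, question, vocab, params):
    # Project each modality into a shared hidden space.
    img = linear(image_features, params["w_img"])
    txt = linear(bag_of_words(question, vocab), params["w_txt"])
    # Assumed fusion step: element-wise product of the two embeddings.
    fused = [a * b for a, b in zip(img, txt)]
    # Classify the fused vector over a fixed answer set.
    return softmax(linear(fused, params["w_out"]))


rng = random.Random(0)
vocab = ["apa", "warna", "monas", "berapa", "orang"]  # hypothetical vocabulary
answers = ["putih", "emas", "dua"]                    # hypothetical answer set
image_dim, hidden_dim = 8, 4

params = {
    "w_img": random_matrix(hidden_dim, image_dim, rng),
    "w_txt": random_matrix(hidden_dim, len(vocab), rng),
    "w_out": random_matrix(len(answers), hidden_dim, rng),
}

# Placeholder for a CNN image embedding; not produced by a real network.
image_features = [rng.random() for _ in range(image_dim)]

probs = answer_probabilities(image_features, "Apa warna Monas", vocab, params)
predicted = answers[probs.index(max(probs))]
```

Because the weights are untrained, the predicted answer is arbitrary; the sketch only shows how the two embeddings flow into a single answer distribution. Swapping `bag_of_words` for a fastText, BERT, or BiLSTM encoder would change only the text branch.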

Documents & Links

Scopus