A phoneme is the smallest unit of sound in a language: it carries no meaning of its own but plays a central role in forming meaning. Identifying phonemes from a video of an actor speaking Indonesian sentences is an important part of developing visual-to-text applications. Such an application translates mouth movements in a video into Indonesian text and can thereby facilitate communication for deaf people. This study aims to optimize classification performance on image data covering 32 phonemes extracted from video, in support of phoneme identification for visual-to-text applications in Indonesian. The classifier used in this study was a neural network trained with backpropagation. Three measures were proposed to optimize classification performance: comparing different dataset proportions, estimating the number of hidden layers, and reducing the dimensionality of the dataset with principal component analysis (PCA), which discards the less important components of the data while retaining most of the information. PCA reduced the data matrix from 1280 × 7100 to 1280 × 50. With these 50 components retained and a data proportion of 8:2, the classification accuracy was 87.16%.
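The pipeline summarized above (PCA reduction from 7100 to 50 dimensions, an 8:2 data proportion, and a backpropagation neural network over 32 classes) can be illustrated with a minimal NumPy sketch. It is not the study's implementation. The data are synthetic placeholders, and the single hidden layer of 64 tanh units, the learning rate, the epoch count, and the reading of 8:2 as a train/test split are all assumptions made for illustration. Fitting PCA on the training portion only is likewise a choice of this sketch.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the feature mean and the top principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T

def train_mlp(X, y, n_classes, n_hidden=64, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer network with full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, softmax output.
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass: gradients of the mean cross-entropy loss.
        d_logits = (P - Y) / len(X)
        dW2 = H.T @ d_logits
        db2 = d_logits.sum(axis=0)
        dH = (d_logits @ W2.T) * (1.0 - H ** 2)
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        # Gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

def predict(X, params):
    """Return the most likely class index for each sample."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Synthetic stand-in for the 1280 x 7100 image data with 32 phoneme classes.
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 1280, 7100, 32
y = np.arange(n_samples) % n_classes
centroids = rng.normal(size=(n_classes, n_features))
X = centroids[y] + rng.normal(scale=2.0, size=(n_samples, n_features))

# 8:2 split (assumed here to mean train:test).
order = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train_idx, test_idx = order[:n_train], order[n_train:]

# PCA fitted on the training portion, then applied to all samples.
mean, components = pca_fit(X[train_idx], n_components=50)
Z = pca_transform(X, mean, components)  # shape (1280, 50)
Z = Z / Z[train_idx].std(axis=0)        # scale components to unit variance

params = train_mlp(Z[train_idx], y[train_idx], n_classes)
accuracy = float(np.mean(predict(Z[test_idx], params) == y[test_idx]))
print(f"reduced shape: {Z.shape}, test accuracy: {accuracy:.4f}")
```

Because the data are synthetic, the accuracy this sketch prints says nothing about the 87.16% reported in the study. It only shows how the reduction, the split, and the classifier fit together.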
Copyrights © 2023