Rila Mandala
Sekolah Teknik Elektro dan Informatika Institut Teknologi Bandung Jl. Ganesha 10 Bandung 40132

Published: 2 Documents

Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients
Tjong Wan Sen; Bambang Riyanto Trilaksono; Arry Akhmad Arman; Rila Mandala
Journal of ICT Research and Applications Vol. 3 No. 2 (2009)
Publisher : LPPM ITB

DOI: 10.5614/itbj.ict.2009.3.2.4

Abstract

To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that adds robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Because the CWPT coefficients represent all the frequency bands of the input signal, decomposing the input signal into a complete CWPT tree also covers all the frequencies involved in the recognition process. For time-overlapping signals with different frequency content, e.g. a phoneme signal with noise, the CWPT coefficients are the combination of the CWPT coefficients of the phoneme signal and those of the noise. The CWPT coefficients of the phoneme signal therefore change according to the frequency components contained in the noise. Since the number of phonemes in every language is relatively small (limited) and already well known, one can easily derive principal component vectors from a clean training dataset using Principal Component Analysis (PCA). These principal component vectors can then be used to add robustness and minimize the effects of noise in the testing phase. Simulation results, using the Alpha Numeric 4 (AN4) database from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique can be used as a feature extractor that improves the robustness of phoneme-based ASR systems in various adverse noisy conditions while preserving performance in clean environments.
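
The abstract does not say exactly how the principal component vectors are applied at test time. As a rough illustration only, the sketch below assumes one plausible reading: learn a principal direction from clean training features, then project a noisy test feature onto it so that off-axis noise is discarded. The three-dimensional toy vectors stand in for CWPT-derived features, and the names `top_component` and `project` are hypothetical, not taken from the paper.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def top_component(rows, iterations=100):
    """First principal component of `rows`, found by power iteration."""
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    dim = len(mu)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # w = X^T (X v), which is proportional to (covariance matrix) @ v
        scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(s * r[j] for s, r in zip(scores, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v

def project(x, mu, v):
    """Rebuild x from its coordinate along the principal direction v."""
    score = sum((a - m) * b for a, m, b in zip(x, mu, v))
    return [m + score * b for m, b in zip(mu, v)]

# Toy "clean training" features (assumed data): all lie along the direction (1, 2, 0).
clean = [[t, 2.0 * t, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mu, v = top_component(clean)

# A "noisy test" feature: the clean vector (1, 2, 0) plus an off-axis disturbance.
noisy = [1.0, 2.0, 0.8]
denoised = project(noisy, mu, v)
print(denoised)
```

In this toy case the projection removes the component of the noise that lies outside the clean subspace. A real system would keep several components and work on much higher-dimensional features.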
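
Penerapan Convolutional Neural Network (CNN) dan Euclidean Distance Matrices (EDM) untuk Mengurangi False Positive pada Pengenalan Aktifitas Finger Point Call [Applying Convolutional Neural Networks (CNN) and Euclidean Distance Matrices (EDM) to Reduce False Positives in Finger Point Call Activity Recognition]
Rila Mandala; Mohammad Deny Safari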
JEPIN (Jurnal Edukasi dan Penelitian Informatika) Vol. 9 No. 1 (2023)
Publisher : Program Studi Informatika

DOI: 10.26418/jp.v9i1.61716

Abstract

Finger point call (FPC), which requires an operator to point at (finger point) and call out (call) a target before carrying out a process, is common practice in manufacturing, particularly at Japanese companies. FPC has been shown to reduce human error effectively, but operators often apply it inconsistently, so a system is needed to detect whether FPC has been performed properly and correctly. One approach to activity recognition uses convolutional neural networks (CNN) to classify human activities. However, an FPC activity can be declared valid or invalid only after confirming that the operator has pointed correctly toward the object and toward the reference, so several video frames must be analyzed. Using a CNN alone leads to a high false positive rate, because the CNN analyzes a single video frame at a time. This study aims to reduce false positives in FPC detection by further analyzing the detection results with Euclidean distance matrices (EDM). In the experiment performed by 1 person, false positives were reduced by up to 100%, with a precision of 1 and a recall of 0.96. In the experiment performed by 10 people, precision was 0.9 and recall was 0.9, better than the original version of YOLOv7, whose precision was only 0.5.
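
The abstract does not detail how the Euclidean distance matrices are built or thresholded. The sketch below is therefore only an illustration, under assumed inputs, of why a multi-frame distance check can reject detections that a single-frame classifier would accept. The fingertip and target coordinates, the `threshold` value, and the function names are all hypothetical. It also shows the standard precision and recall formulas behind the reported scores.

```python
import math

def distance_matrix(points_a, points_b):
    """Euclidean distance matrix: entry [i][j] is the distance from points_a[i] to points_b[j]."""
    return [[math.dist(p, q) for q in points_b] for p in points_a]

def fpc_is_valid(fingertips, targets, threshold):
    """Assumed rule: every target must be pointed at in at least one frame."""
    edm = distance_matrix(fingertips, targets)
    return all(
        any(row[j] <= threshold for row in edm)
        for j in range(len(targets))
    )

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Assumed pixel positions of the object and the reference in the camera view.
targets = [(100.0, 50.0), (300.0, 60.0)]

# One fingertip position per video frame (hypothetical detector output).
complete_gesture = [(102.0, 52.0), (200.0, 55.0), (298.0, 61.0)]
partial_gesture = [(102.0, 52.0), (110.0, 50.0)]  # never reaches the reference

print(fpc_is_valid(complete_gesture, targets, threshold=10.0))
print(fpc_is_valid(partial_gesture, targets, threshold=10.0))
print(precision_recall(tp=9, fp=1, fn=1))
```

A single-frame check would accept the first frame of `partial_gesture` because the fingertip is near the object. The multi-frame rule rejects the sequence because no frame comes close to the reference.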