Speech Recognizing Comparisons Between Web Speech API and FPT.AI API

People now use speech recognition services for many everyday purposes, such as learning foreign languages and communicating, so they need to decide which service to use. A speech recognition service with high accuracy and short processing time makes work more efficient...


Main Authors: Tran, D.C., Nguyen, D.L., Ha, H.S., Hassan, M.F.
Format: Article
Institution: Universiti Teknologi Petronas
Record Id: utp-eprints.28893
Published: Springer Science and Business Media Deutschland GmbH 2022
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85116480522&doi=10.1007%2f978-981-16-2406-3_64&partnerID=40&md5=a1ffad457cfbb1bb01be650287375c0d
http://eprints.utp.edu.my/28893/
id utp-eprints.28893
recordtype eprints
spelling utp-eprints.28893 2022-03-17T02:22:10Z Article NonPeerReviewed Tran, D.C. and Nguyen, D.L. and Ha, H.S. and Hassan, M.F. (2022) Speech Recognizing Comparisons Between Web Speech API and FPT.AI API. Lecture Notes in Electrical Engineering, 770. pp. 853-865. http://eprints.utp.edu.my/28893/
institution Universiti Teknologi Petronas
collection UTP Institutional Repository
description People now use speech recognition services for many everyday purposes, such as learning foreign languages and communicating, so they need to decide which service to use. A speech recognition service with high accuracy and short processing time makes work more efficient, because it reduces both the time spent re-checking output and the delay between recognition tasks. For Vietnamese speech recognition, Web Speech API and FPT.AI API are popular. Web Speech API supports multiple languages, while FPT.AI API focuses on Vietnamese, as FPT.AI's products are developed exclusively for the Vietnamese market. To help people choose a suitable Vietnamese speech recognition service, this paper compares the recognition accuracy and processing time of Web Speech API and FPT.AI API. 307 audio files containing Vietnamese speech, obtained from the FPT Open Speech Dataset, were chosen to test the accuracy and processing time of both APIs. In the accuracy test, FPT.AI API was 0.57 more precise than Web Speech API. However, in the processing time test, Web Speech API was 50.99 faster than FPT.AI API. Web Speech API was most accurate on audio files 12 to 14 seconds long, while FPT.AI API did best on files 2 to 4 seconds long. Audio files with durations between 2 and 8 seconds are optimal for both APIs to perform STT conversions. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
format Article
author Tran, D.C.
Nguyen, D.L.
Ha, H.S.
Hassan, M.F.
author_sort Tran, D.C.
title Speech Recognizing Comparisons Between Web Speech API and FPT.AI API
publisher Springer Science and Business Media Deutschland GmbH
publishDate 2022