• The Korean Institute of Electrical Engineers (대한전기학회)
Journal The Transactions of the Korean Institute of Electrical Engineers
Title A Performance Comparison of Commercial Speech Recognition APIs in Noisy Environments
Authors 이건희(Geonhui Lee) ; 이상화(Sanghwa Lee) ; 지수환(Suhwan Ji) ; 김아욱(Auk Kim) ; 임현승(Hyeonseung Im)
DOI https://doi.org/10.5370/KIEE.2022.71.9.1266
Page pp.1266-1273
ISSN 1975-8359
Keywords Speech recognition; Noisy environment; Word error rate; Character error rate
Abstract This paper compares the performance of five commercial speech recognition APIs in noisy environments, namely those provided by Amazon AWS, Microsoft Azure, Google, Kakao, and Naver. To this end, we used an open dataset for the development and evaluation of multi-channel noise processing technology, provided by AI Hub. We tested each API's performance with respect to the speaker's gender and location and the speech content, and measured the error rate using both word error rate (WER) and character error rate (CER). Except for the AWS API, the error rate was higher on female speakers' data than on male speakers' data, and higher on data recorded from the side than on data recorded from the front. The error rate was also relatively high when the test sentences contained proper nouns such as personal names and place names, and the shorter the sentence, the higher the error rate. Moreover, the Google API outperformed all the others in terms of both WER and CER, with error rates of 53% and 18%, respectively.
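The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definition of both as Levenshtein edit distance divided by reference length, computed over whitespace-separated words for WER and over characters for CER. The function names, the choice to ignore spaces in CER, and the sample sentences are assumptions made for illustration; the abstract does not state the text normalization the authors actually applied.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are ignored here (an assumption, not the paper's stated rule)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical transcript pair: the recognizer drops one particle character.
    reference = "오늘 날씨가 좋다"
    hypothesis = "오늘 날씨 좋다"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this made-up pair a single missing character makes one of three words wrong under WER, while it counts as only one of seven characters under CER, so the same recognition output scores a higher WER than CER.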