A VOICE RECOGNITION SYSTEM BASED ON THE K-NN ALGORITHM AND PEARSON CORRELATION (SISTEMA DE RECONOCIMIENTO DE VOZ BASADO EN UN MÉTODO DE APRENDIZAJE SUPERVISADO Y LA CORRELACIÓN DE PEARSON)

Anel Ramírez Álvarez; Luz A. Sánchez Gálvez; Mario Anzures García; Sully Sánchez Gálvez; Mariano Larios Gómez

Abstract
Automatic speech recognition is a discipline of artificial intelligence that aims to enable spoken communication between humans and computers. This paper proposes a voice recognition system based on extracting distinctive features of the voice and on the supervised learning method known as the k-NN (k-Nearest Neighbors) algorithm, which requires training the system. It also proposes computing k automatically by means of the Pearson correlation, so that k is not fixed and the recognition system is more efficient. Finally, the system is evaluated with the voices of well-known figures in order to assess its efficiency.
Keywords: feature extraction, k-nearest neighbors algorithm, Pearson correlation, training, voice recognition system.
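
The idea of a k-NN classifier whose k is not fixed in advance can be illustrated with a minimal sketch. The abstract does not say how k is derived from the Pearson correlation, so the rule used here is only an assumption: k is the number of training vectors whose correlation with the query reaches a threshold. The names `pearson`, `classify`, and `threshold`, and the toy feature vectors, are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant vector; treated as no correlation
    return cov / math.sqrt(var_x * var_y)


def classify(query, training, threshold=0.9):
    """Label `query` by majority vote among its most correlated neighbors.

    ASSUMPTION (not from the paper): k is the number of training vectors
    whose correlation with the query is at least `threshold`, minimum 1.
    """
    scored = sorted(
        ((pearson(query, features), label) for features, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    k = max(1, sum(1 for r, _ in scored if r >= threshold))
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0], k


# Toy feature vectors standing in for extracted voice features.
training = [
    ([1.0, 2.0, 3.0, 4.0], "speaker_a"),
    ([2.0, 4.1, 5.9, 8.0], "speaker_a"),
    ([4.0, 3.0, 2.0, 1.0], "speaker_b"),
    ([8.0, 6.1, 3.9, 2.0], "speaker_b"),
]
query = [1.1, 2.0, 3.2, 3.9]
label, k = classify(query, training)
print(label, k)
```

With this rule, k grows when many training samples resemble the query and shrinks to 1 when few do. Other ways of deriving k from the correlation are equally plausible.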

Full text:

743-764 PDF

License URL: https://creativecommons.org/licenses/by/3.0/deed.es

Pistas Educativas is licensed under a Creative Commons Attribution 3.0 Unported License.

TECNOLÓGICO NACIONAL DE MÉXICO / INSTITUTO TECNOLÓGICO DE CELAYA

Antonio García Cubas Pte #600 esq. Av. Tecnológico, Celaya, Gto. México

Tel. 461 61 17575 Ext. 5450 and 5146

pistaseducativas@itcelaya.edu.mx

http://pistaseducativas.celaya.tecnm.mx/index.php/pistas
