PROTOTIPO FUNCIONAL PARA CLASIFICACIÓN DE IMÁGENES CON SALIDA DE AUDIO EN UN SISTEMA EMBEBIDO CON RED NEURONAL CONVOLUCIONAL (FUNCTIONAL PROTOTYPE FOR CLASSIFICATION OF IMAGES WITH AUDIO OUTPUT IN AN EMBEDDED SYSTEM USING CONVOLUTIONAL NEURAL NETWORK)

Fidel López Saca, Andrés Ferreyra Ramírez, Carlos Avilés Cruz

Abstract

In recent years, convolutional neural networks (CNNs) have become very popular in image classification applications, mainly because they outperform traditional algorithms. However, their high computational cost makes them difficult to implement on embedded systems with limited resources, such as the Raspberry Pi 3. To overcome this problem, the recently released "Neural Compute Stick" (NCS) can be used: a device that integrates a dedicated vision processing unit onto which a pre-trained convolutional neural network can be loaded. This article presents a prototype based on the Raspberry Pi 3 that performs image classification with audio output. Classification is carried out with the GoogLeNet network, which is trained offline, loaded onto an NCS, and integrated with the Raspberry Pi 3 board. In the proposed system, an image captured through a webcam is classified and labeled by the convolutional network, and the label is then converted into audio by the embedded system to describe the object found in the image.
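As an illustration of the pipeline described above, the following Python sketch shows one possible way to connect the pieces; it is not the authors' code. It assumes the NCSDK v1 Python API (mvnc), a GoogLeNet graph previously compiled offline with the NCSDK tool mvNCCompile (for example, mvNCCompile deploy.prototxt -w googlenet.caffemodel -o googlenet.graph), a text file with one class label per line, OpenCV for webcam capture, and the eSpeak synthesizer for the audio output; the file names and the preprocessing values are assumptions, not details taken from the article.

    # Illustrative sketch (assumed names and values): capture one frame, classify it on the
    # Neural Compute Stick with a pre-compiled GoogLeNet graph, and speak the predicted label.
    import subprocess

    import cv2
    import numpy as np
    from mvnc import mvncapi as mvnc

    GRAPH_PATH = 'googlenet.graph'   # assumed output of mvNCCompile
    LABELS_PATH = 'categories.txt'   # assumed file with one class label per line
    INPUT_SIZE = (224, 224)          # GoogLeNet input resolution

    # Open the first Neural Compute Stick found on the USB bus.
    devices = mvnc.EnumerateDevices()
    device = mvnc.Device(devices[0])
    device.OpenDevice()

    # Load the pre-compiled GoogLeNet graph onto the NCS.
    with open(GRAPH_PATH, 'rb') as f:
        graph = device.AllocateGraph(f.read())

    labels = [line.strip() for line in open(LABELS_PATH)]

    # Capture a frame from the webcam and preprocess it for GoogLeNet.
    camera = cv2.VideoCapture(0)
    ok, frame = camera.read()
    camera.release()
    image = cv2.resize(frame, INPUT_SIZE).astype(np.float32)
    image -= 127.5   # assumed mean subtraction; must match the preprocessing used in training

    # Run inference on the NCS and take the most probable class.
    graph.LoadTensor(image.astype(np.float16), 'user object')
    output, _ = graph.GetResult()
    label = labels[int(np.argmax(output))]

    # Translate the label into audio with eSpeak through the Raspberry Pi audio output.
    subprocess.call(['espeak', label])

    # Release the NCS.
    graph.DeallocateGraph()
    device.CloseDevice()

In a real deployment the capture-classify-speak sequence would run in a loop, and the preprocessing (channel order, scaling, and mean subtraction) would have to match exactly what was used when the network was trained.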


Full text:

804-821 (PDF)

References


Caffe: http://caffe.berkeleyvision.org/, August 2018.

Ciresan, D. C., Meier, U., Masci, J., Gambardella, L., Schmidhuber, J., Flexible, high performance convolutional neural networks for image classification. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Vol. 2, 1237-1242, 2011.

eSpeak: http://espeak.sourceforge.net/commands.html, April 2018.

Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., Zisserman, A., The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html, April 2018.

Farabet, C., Martini, B., Akselrod, P., Talay, S., LeCun, Y., Culurciello, E., Hardware accelerated convolutional neural networks for synthetic vision systems. In Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), 257-260, 2010.

Girshick, R., Donahue, J., Darrell, T., Malik, J., Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 580-587, 2014.

He, K., Zhang, X., Ren, S., Sun, J., Deep residual learning for image recognition. In Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778, 2016.

Howard, A., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H., MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR, vol. abs/1704.04861, 2017.

Karpathy, A., Fei-Fei, L., Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3128-3137, 2015.

Krizhevsky, A., Sutskever, I., Hinton, G., ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 1097-1105, 2012.

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., Gradient-based learning applied to document recognition. In Proceedings of the IEEE, Vol. 86, Issue 11, 2278-2324, 1998.

Mollahosseini, A., Chan, D., Mahoor, M. H., Going deeper in facial expression recognition using deep neural networks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 1-10, 2016.

Neural Compute Application Zoo (NC App Zoo): https://github.com/movidius/ncappzoo/, April 2018.

NC SDK, Intel® Movidius™: https://movidius.github.io/ncsdk/tf_compile_guidance.html, May 2018.

NCS Quick Start, Intel® Movidius™: https://developer.movidius.com/start, March 2018.

Peemen, M., Setio, A., Mesman, B., Corporaal, H., Memory-centric accelerator design for convolutional neural networks. 31st International Conference on Computer Design, 13-19, 2013.

Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M., Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 246-253, 2013.

Raspberry Pi: https://www.raspberrypi.org/products/raspberry-pi-3-model-b/, March 2018.

Sankaradas, M., Jakkula, V., Cadambi, S., Chakradhar, S., Durdanovic, I., Cosatto, E., Graf, H., A massively parallel coprocessor for convolutional neural networks. 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 53-60, 2009.

Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y., OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229, 2014. Available: https://arxiv.org/pdf/1312.6229.pdf.

Stanford Vision Lab, Stanford University, ImageNet: http://imagenet.org, October 2017.

Strigl, D., Kofler, K., Podlipnig, S., Performance and scalability of GPU-based convolutional neural networks. In Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 317-324, 2010.

Suriyal, S., Druzgalski, C., Gautam, K., Mobile assisted diabetic retinopathy detection using deep neural network. 2018 Global Medical Engineering Physics Exchanges/Pan American Health Care Exchanges (GMEPE/PAHCE), March 2018.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

Tafjira, N. B., Shun-Feng, S., Towards self-driving car using convolutional neural network and road lane detector. 2nd International Conference on Automation, Cognitive Science, Optics, Micro Electro-Mechanical System, and Information Technology (ICACOMIT), 65-69, 2017.

TensorFlow: https://www.tensorflow.org/, August 2018.

Vasilache, N., Johnson, J., Mathieu, M., Chintala, S., Piantino, S., LeCun, Y., Fast convolutional nets with fbfft: A GPU performance evaluation. arXiv preprint arXiv:1412.7580, 2014. Available: https://arxiv.org/pdf/1412.7580.pdf.

Xu, X., Amaro, J., Caulfield, S., Forembski, A., Falcao, G., Moloney, D., Convolutional Neural Network on Neural Compute Stick for Voxelized Point-clouds Classification. 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2017.





