IMPACT OF CLASS IMBALANCE ON THE TRAINING OF CONVOLUTIONAL NEURAL NETWORKS FOR MULTI-CLASS PROBLEMS

Andrés Ferreyra Ramírez, Eduardo Rodríguez Martínez

Abstract


The class imbalance problem in machine learning occurs when the underlying training set contains an unequal number of samples per class, so that data from some classes clearly dominate. Most classifiers appear to learn to classify such datasets; however, they show poor generalization performance due to a strong bias towards the majority classes. This article presents a systematic study aimed at understanding how the class imbalance problem affects the performance of a convolutional neural network trained for an image classification task, and presents a methodology to correct the overtraining and increase the generalization performance of the network.
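The methodology itself is detailed in the full text; as an illustrative sketch only (not the paper's specific method), one standard countermeasure to the majority-class bias described above is cost-sensitive re-weighting, cf. Khan et al. (2018) in the references: each class contributes to the training loss in inverse proportion to its frequency. The helper name and the toy label set below are assumptions for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency, so that minority
    classes contribute as much to the training loss as majority ones.
    Uses w_c = N / (K * n_c), with N total samples, K classes,
    n_c samples in class c (the "balanced" heuristic)."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Toy imbalanced training set: 900 / 90 / 10 samples per class.
labels = ["cat"] * 900 + ["dog"] * 90 + ["bird"] * 10
weights = inverse_frequency_weights(labels)
# The rarest class ("bird") receives the largest loss weight,
# counteracting the network's bias towards the majority class.
```

Such per-class weights can then be passed to a weighted cross-entropy loss during CNN training; resampling the data instead (e.g. SMOTE, Chawla et al., 2002) is the complementary family of remedies cited in the references.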


Full text:

445-460 PDF

References


Ando, S., Huang, C., Deep over-sampling framework for classifying imbalanced data. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD’17), 770-785, 2017.

Buda, M., Maki, A., Mazurowski, M., A systematic study of the class imbalance problem in convolutional neural networks. arXiv preprint arXiv:1710.05381, 2017.

Caltech101, Computational Vision at Caltech: http://www.vision.caltech.edu/Image_Datasets/Caltech101/, accessed June 2018.

Chawla, N., Bowyer, K., Kegelmeyer, W., SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, Vol. 16, 321-357, 2002.

Dong, Q., Gong, S., Zhu, X., Class rectification hard mining for imbalanced deep learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17), 1869-1878, 2017.

Dong, Q., Gong, S., Zhu, X., Imbalanced deep learning by minority class incremental rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, Early Access, 2018.

He, H., Garcia, E., Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, Vol. 21, Issue 9, 1263-1284, 2009.

Hensman, P., Masko, D., The impact of imbalanced training data for convolutional neural networks. Degree project, 2015.

Huang, C., Li, Y., Loy, C., Tang, X., Learning deep representation for imbalanced classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16), 5375-5384, 2016.

Khan, S., Hayat, M., Bennamoun, M., Sohel, F., Togneri, R., Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, Issue 8, 3573-3587, 2018.

Pulgar, F., Rivera, A., Charte, F., del Jesus, M., On the impact of imbalanced data in convolutional neural networks performance. In Proceedings of the International Conference on Hybrid Artificial Intelligence Systems (HAIS’17), 220-232, 2017.

Song, J., Shen, Y., Jing, Y., Song, M., Towards deeper insights into deep learning from imbalanced data. In Proceedings of the Chinese Conference on Computer Vision (CCCV’17), 674-684, 2017.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, Vol. 15, 1929-1958, 2014.

Sze-To, A., Wong, A., A weight-selection strategy on training deep neural networks for imbalanced classification. In Proceedings of the International Conference on Image Analysis and Recognition (ICIAR’17), 3-10, 2017.

Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., Kennedy, P., Training deep neural networks on imbalanced data sets. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’16), 4368-4374, 2016.

Wang, S., Yao, X., Multiclass imbalance problems: Analysis and potential solutions. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, Vol. 42, Issue 4, 1119-1130, 2012.






License URL: https://creativecommons.org/licenses/by/3.0/deed.es


Pistas Educativas is licensed under a Creative Commons Attribution 3.0 Unported License.

TECNOLÓGICO NACIONAL DE MÉXICO / INSTITUTO TECNOLÓGICO DE CELAYA

Antonio García Cubas Pte #600 esq. Av. Tecnológico, Celaya, Gto. México

Tel. 461 61 17575, Ext. 5450 and 5146

pistaseducativas@itcelaya.edu.mx

http://pistaseducativas.celaya.tecnm.mx/index.php/pistas