UN ENFOQUE BASADO EN DATOS PARA PREDECIR EVENTOS DELICTIVOS EN CIUDADES INTELIGENTES (A DATA-DRIVEN APPROACH FOR PREDICTING CRIMINAL EVENTS IN SMART CITIES)
Resumen
Actualmente, uno de los retos de las instituciones gubernamentales es garantizar la seguridad de los habitantes. Este desafío también se presenta en el contexto de las ciudades inteligentes, pero con la ventaja de contar con sistemas de información de seguridad pública que recolectan datos de los eventos delictivos en tiempo real. Por lo tanto, se pueden diseñar enfoques basados en técnicas de minería de datos y aprendizaje automático que permitan predecir eventos delictivos a partir de datos históricos y del comportamiento identificado por zonas de una ciudad y en sus habitantes. En este trabajo se presenta un análisis predictivo de eventos criminales utilizando un conjunto de datos de 6.4 millones de registros, recolectados por un sistema de información implementado en una ciudad inteligente. El enfoque propuesto permite determinar la etiqueta de una clase binaria, la cual representa la probabilidad de que un individuo sea arrestado al cometer un delito. Además, se comparan dos algoritmos de clasificación, el árbol de decisión CART y el algoritmo de ensamble AdaBoost, con el fin de determinar cuál obtiene un mejor rendimiento mediante la métrica de precisión y una validación cruzada. Previamente, se aplica al conjunto de datos un método de selección de características para disminuir la dimensionalidad de los datos y el costo computacional de la ejecución de los algoritmos de clasificación.
Palabras clave: Clasificación, Selección de atributos, Ciudades inteligentes, Predicción, Árbol de decisión.
Abstract
Guaranteeing the safety of residents is currently one of the main challenges facing government institutions. The challenge persists in smart cities, which nevertheless have the advantage of public security information systems that collect data on criminal events in real time. Approaches based on data mining and machine learning can therefore be designed to predict criminal events from historical data and from behavior identified by city area and among inhabitants. This paper presents a predictive analysis of criminal events using a data set of 6.4 million records collected by an information system deployed in a smart city. The proposed approach predicts a binary class label that represents whether an individual is likely to be arrested when committing a crime. In addition, two classification algorithms, the CART decision tree and the AdaBoost ensemble, are compared to determine which performs better, using the precision metric and cross-validation. Beforehand, a feature selection method is applied to the data set to reduce its dimensionality and the computational cost of running the classification algorithms.
Keywords: Classification, Feature selection, Smart cities, Prediction, Decision tree.
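The workflow summarized in the abstract (feature selection, then a CART versus AdaBoost comparison scored by cross-validated precision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a small synthetic data set in place of the 6.4 million crime records, and recursive feature elimination (RFE) as a stand-in because the abstract does not name the feature selection method. The fold count, the number of retained features, and the helper name `build_pipeline` are likewise hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (assumption): stands in for the arrest / no-arrest label.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)


def build_pipeline(classifier, n_features=8):
    """Feature selection followed by a classifier (hypothetical helper).

    RFE is an assumed choice; keeping it inside the pipeline means the
    selection is refit on each training fold, so no test data leaks in.
    """
    selector = RFE(
        DecisionTreeClassifier(random_state=0), n_features_to_select=n_features
    )
    return make_pipeline(selector, classifier)


# scikit-learn's DecisionTreeClassifier is an optimized variant of CART.
models = {
    "CART": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# 5-fold stratified cross-validation (assumption), scored with precision.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(build_pipeline(model), X, y, cv=cv, scoring="precision")
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean precision = {fold_scores.mean():.3f}")
```

On synthetic data the ranking of the two models says nothing about the paper's results; the sketch only shows how such a comparison might be organized.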
Full text:
491-506 PDF
License URL: https://creativecommons.org/licenses/by/3.0/deed.es
Pistas Educativas is licensed under a Creative Commons Attribution 3.0 Unported License.
TECNOLÓGICO NACIONAL DE MÉXICO / INSTITUTO TECNOLÓGICO DE CELAYA
Antonio García Cubas Pte #600 esq. Av. Tecnológico, Celaya, Gto. México
Tel. 461 61 17575 Ext 5450 y 5146
pistaseducativas@itcelaya.edu.mx