Automatic Prediction of the Educational Level of Twitter Users in Mexico

Abstract:

To address this task, a set of features was extracted from the textual content of the tweets published by users and used to build machine learning models that predict whether a user holds a university degree. Both the features and the models were evaluated on a data set collected directly from Twitter, comprising more than one million tweets in Spanish from 195 users located in Mexico. The experiments followed a 10-fold cross-validation scheme, and performance was measured with the macro F1 score and the area under the ROC curve (AUC). The results indicate that the task is complex: the best-performing features were abbreviations, which reached values above 60% on both metrics, while the support vector machine and decision tree models showed similar performance.
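
The evaluation protocol named above can be illustrated with a minimal sketch. The abstract does not describe the implementation, so everything here is an assumption made for illustration: the function names (`macro_f1`, `roc_auc`, `k_fold_indices`) are hypothetical, the fold assignment is a simple round-robin split rather than whatever scheme the authors used, and the labels and scores are toy values, not data from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true, y_score, positive=1):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_items, k=10):
    """Round-robin assignment of item indices to k test folds (assumed scheme)."""
    return [list(range(i, n_items, k)) for i in range(k)]


# Toy labels: 1 = has a university degree, 0 = does not (illustrative only).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

f1 = macro_f1(y_true, y_pred)
auc = roc_auc(y_true, y_score)

# 195 users split into 10 folds; each fold serves once as the test set.
folds = k_fold_indices(195, k=10)
fold_sizes = [len(fold) for fold in folds]
```

In a full experiment, each fold would be held out in turn, a model trained on the remaining nine, and the two metrics averaged over the ten runs. Macro averaging gives both classes equal weight regardless of how many users fall into each, which matters when the classes are imbalanced.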
