In recent years, the spread of communalism and misogyny on social media has attracted increasing attention. The wide range of aggressive and hateful content on social media is both interesting and challenging to study in the context of India, a secular nation with religious, linguistic, and cultural heterogeneity. The aim of this project is to understand how communal and sexually threatening misogynistic content is linguistically and structurally constructed by aggressors and harassers, and how it is evaluated by the other participants in the discourse. We will use methods of micro-level discourse analysis, combining conversation analysis with the interactional model used in (im)politeness studies, to understand the construction and evaluation of aggression on social media.

We will use the insights from this study to develop a system that can automatically identify whether textual content on social media is sexually threatening or communal. The system will use multiple supervised text classification models trained on a dataset annotated at two levels: labels for sexual and communal aggression, and labels for its evaluation by the other participants. The dataset will contain data in at least two of the most widely spoken Indian languages, Hindi and Bangla, as well as code-mixed content in Hindi, Bangla, and English. It will be collected from social media platforms (such as Facebook and Twitter) as well as from comments on blogs and news/opinion websites. A minimal sketch of the two-level classification setup is given after this paragraph.
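To illustrate what training one model per annotation level could look like, here is a minimal sketch using a standard scikit-learn pipeline. The label sets, example texts, and the choice of TF-IDF with logistic regression are illustrative assumptions, not the project's actual annotation scheme or final architecture.

```python
# Minimal sketch of a two-level supervised classification setup.
# Labels and texts below are hypothetical placeholders, not the
# project's real annotation scheme.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data: each comment carries two labels, one for the
# type of aggression and one for how other participants evaluated it.
texts = [
    "example of a sexually threatening comment",
    "example of a communally aggressive comment",
    "example of a neutral comment",
]
aggression_labels = ["sexual_threat", "communal", "none"]  # level 1 (hypothetical)
evaluation_labels = ["condemned", "endorsed", "ignored"]   # level 2 (hypothetical)

def make_classifier():
    """One TF-IDF + logistic regression pipeline; character n-grams
    can help with code-mixed and transliterated Hindi/Bangla text."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

# Train one independent model per annotation level.
aggression_model = make_classifier().fit(texts, aggression_labels)
evaluation_model = make_classifier().fit(texts, evaluation_labels)

comment = "a new comment collected from social media"
print(aggression_model.predict([comment]))
print(evaluation_model.predict([comment]))
```

Training separate models per level keeps the two label sets independent; a shared multi-task model is an alternative design if the levels turn out to be strongly correlated.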

For more details about the project, please visit the Project Website.