Aug 3 2021 3:00 pm
An Unsupervised Deep Learning Approach for Hate-Speech Detection in Roman Urdu Text
Dr. Asim Karim
Zoom Meetings (Online)
MS Thesis defense
The volume of text generated on social media each day is growing rapidly. This enormous body of mostly unstructured content cannot easily be processed and understood by computers. Automatically organizing such text is a challenging problem, especially in low-resource languages like Roman Urdu (RU), because of the scarcity of data, annotated datasets, and language models for the task. In this research, we developed a framework that builds representations of RU text using a novel autoencoder (AE) approach named KATE (K-competitive Autoencoder for TExt), which is built on top of K-Sparse autoencoders. These representations are then clustered with K-means into hate-speech and non-hate-speech classes. We conclude that KATE trained with BERT (Bidirectional Encoder Representations from Transformers) and word2vec embeddings is more effective than KATE trained with word-count embeddings, and that the proposed model is more robust than the baselines.
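The two-stage pipeline described in the abstract (sparse representation, then clustering) can be illustrated with a minimal sketch. It is not the thesis implementation. The function names `k_sparse` and `k_means` and the toy activation values are assumptions made for illustration. The sparsity step is a plain k-sparse selection (keep the k largest-magnitude activations), which is simpler than KATE's full competition mechanism, and no autoencoder is actually trained here.

```python
import random

def k_sparse(activations, k):
    """Keep the k largest-magnitude activations and zero out the rest.

    Simplified stand-in for the competitive hidden layer; KATE itself
    uses a more elaborate winner/loser scheme.
    """
    if k >= len(activations):
        return list(activations)
    # Indices of the k entries with the largest absolute value.
    ranked = sorted(range(len(activations)),
                    key=lambda i: abs(activations[i]),
                    reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean_vector(points):
    return [sum(column) / len(points) for column in zip(*points)]

def k_means(points, n_clusters=2, n_iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(n_clusters),
                      key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels

# Hypothetical hidden-layer activations for four short texts (made-up values).
hidden = [
    [0.90, 0.10, -0.05, 0.80],
    [0.85, -0.20, 0.10, 0.70],
    [0.05, 0.90, 0.80, -0.10],
    [-0.10, 0.70, 0.95, 0.20],
]
codes = [k_sparse(h, k=2) for h in hidden]
labels = k_means(codes, n_clusters=2)
```

Because the clustering is unsupervised, the two cluster ids carry no meaning on their own. Deciding which cluster corresponds to hate speech requires inspecting the members of each cluster or comparing against a small labeled sample.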