Event date:
Oct 13 2021 3:00 pm

Transliteration and Translation of Code-Mixed Language (Roman Urdu and English) to Urdu Through Deep Learning Techniques

Dr. Asim Karim
Atta ur Rehman Shah
Zoom Meetings (Online)
MS Thesis defense


Urdu is among the most widely used languages of the subcontinent. Every day, millions of texts are generated on social media platforms and through messaging applications. Pakistan is linguistically diverse, and people speak different languages in different regions. Consequently, social media users often combine English with their native language when communicating. Mixing languages within a single conversation makes the text difficult to understand. Currently, no parallel corpus of such code-mixed text is publicly available on the web. In this work, we have created a novel dataset using the Twitter API. We present several deep learning techniques for transliterating and translating code-mixed language (Roman Urdu and English) into Urdu script. Specifically, we present three different transformer models and combine them through ensemble learning. Results show that the transformer ensemble outperforms both a Seq2Seq LSTM and the mT5 model.


Dr. Asim Karim (Advisor),

Dr. Agha Ali Raza

Zoom link: https://lums-edu-pk.zoom.us/j/93746342865?pwd=ZEZmc2FvRkhyVEdpa3Y0aFFveXJZZz09

Meeting ID: 937 4634 2865

Passcode: 367262