Towards Robust NLP

Event date:

Nov 12 2021 4:00 pm

Speaker(s)

Dr. Abdul Rafae Khan

Venue

Zoom/Online

Abstract

My research primarily aims at general, robust Natural Language Processing (NLP) that is agnostic to the test domain. NLP systems achieve strong results on closed benchmark datasets such as WMT and NIST but can fail when the input contains noise caused by, for example, mismatched domains or spelling errors. My talk will focus on using alternative forms of textual representation during model training when additional data is unavailable. I will propose several methods that simplify the input representation and reduce its ambiguity. I will first introduce an initial investigation of a statistical grouping method for lexical normalization, which uses string, phonetic, and semantic similarity features. Then, I will describe how this idea was extended to create groupings based on phonetic and logogram features as alternative representations for neural network-based NLP. Finally, I will talk about how the semantic diversity within these groupings is used to propose artificial methods that force diversity within a group. I will discuss three diverse grouping methods and their application to three NLP tasks, showing a significant performance boost on in-domain as well as noisy test data.

Dr. Abdul Rafae Khan will be talking about “Towards Robust NLP” in the next Computer Science (CS) Research Seminar on Friday, November 12, 2021 at 4pm PKST.

Join us via Zoom meeting link: https://lums-edu-pk.zoom.us/j/8353602730?pwd=emFaM21hNGtWNHhHZ0dmZm5ueFZqUT09

About the speaker:

Dr. Abdul Rafae Khan has been a Postdoctoral Fellow at Stevens Institute of Technology in New Jersey since 2020. He completed his Ph.D. in Robust Neural Machine Translation at the City University of New York. Dr. Khan holds a Master's degree from Lahore University of Management Sciences. His research interests are in neural network-based Natural Language Processing, with a focus on improving quality and robustness on noisy/informal text (commonly used in social media and text messages). His research has been published in leading Natural Language Processing venues, including JNLE, EMNLP, and NAACL. He has participated in and won Natural Language Processing competitions, including the WMT 2018 Biomedical Machine Translation Task and the SemEval 2019 Cross-lingual Semantic Parsing task. The SemEval organizers awarded Dr. Khan a student scholarship. Apart from his research duties, Dr. Khan has also supervised several undergraduate and graduate students in their Bachelor's and Master's theses, respectively.