Fact Verification in the Urdu Language
Abstract:
In today’s world, social media is the most common source of information for an ordinary person, and a significant amount of work has been done on verifying that information in English. However, no work has addressed the fact verification problem in Urdu, which millions of people use as a common language. This paper introduces the first fact verification dataset in the Urdu language, generated from Wikipedia essays, along with a machine-translated English version. The dataset consists of 7,364 manually generated claim-evidence pairs. We propose a complete pipeline based on the XLM-R model, arrived at after experiments with our own training set, the XNLI dataset, and the FEVER dataset. We achieve a final accuracy of 83.43% on our Urdu dataset and 85.17% on the machine-translated English version. We also analyze and compare the results of models first trained on datasets such as FEVER and XNLI, and assess how much this approach improves results in a low-resource language like Urdu.
Committee:
Dr. Agha Ali Raza (Advisor)
Dr. Asim Karim
Zoom link: https://lums-edu-pk.zoom.us/j/93746342865?pwd=ZEZmc2FvRkhyVEdpa3Y0aFFveXJZZz09
Meeting ID: 937 4634 2865
Passcode: 367262