Creation of Urdu WordNet
Abstract:
Urdu has around 170 million speakers and is regarded as the 11th most spoken language in the world. Given the large number of Urdu speakers across the globe, there is a pressing need to develop language resources for Urdu. Yet it remains a low-resource language, with very few research contributions toward the development of language tools. Despite this lack of resources, we aim to lay a foundation for an Urdu WordNet. We studied the approaches adopted in the past for WordNet creation and employed a translation-based technique to create a dataset of Urdu synonyms, antonyms, and hypernyms. Using the Google Translation API and nltk’s WordNet as our primary resources, we created a WordNet of around 7,000 unique words. This WordNet still has significant room for further development and improvement. We also present techniques and ideas that future work can follow to increase the WordNet's size and improve its results.
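As an illustration only, a translation-based approach of the kind described in the abstract might be sketched as follows. This is not the thesis implementation: the function name `build_urdu_entries`, the toy English relations, and the four-word dictionary are all hypothetical stand-ins. In a real pipeline the English relations would presumably come from nltk's WordNet and the `translate` callable would wrap a translation service such as the Google Translation API.

```python
from typing import Callable, Dict, List, Optional

Relations = Dict[str, List[str]]


def build_urdu_entries(
    english: Dict[str, Relations],
    translate: Callable[[str], Optional[str]],
) -> Dict[str, Relations]:
    """Translate English headwords and their related words into Urdu entries.

    `english` maps a headword to its synonyms, antonyms, and hypernyms.
    `translate` returns the Urdu translation of a word, or None if unknown.
    """
    urdu: Dict[str, Relations] = {}
    for word, relations in english.items():
        head = translate(word)
        if head is None:
            continue  # no translation available, so skip this headword
        entry = urdu.setdefault(
            head, {"synonyms": [], "antonyms": [], "hypernyms": []}
        )
        for relation, targets in relations.items():
            bucket = entry.setdefault(relation, [])
            for target in targets:
                translated = translate(target)
                # Drop untranslatable words, self-references, and duplicates.
                if translated and translated != head and translated not in bucket:
                    bucket.append(translated)
    return urdu


# Hypothetical toy data standing in for nltk's WordNet and a translation API.
TOY_ENGLISH = {
    "good": {"synonyms": [], "antonyms": ["bad"], "hypernyms": []},
    "dog": {"synonyms": [], "antonyms": [], "hypernyms": ["animal"]},
}
TOY_DICTIONARY = {"good": "اچھا", "bad": "برا", "dog": "کتا", "animal": "جانور"}

entries = build_urdu_entries(TOY_ENGLISH, TOY_DICTIONARY.get)
```

Passing the translator in as a callable keeps the sketch runnable offline and makes it easy to swap in a different translation source or add caching.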
Committee:
Dr. Agha Ali Raza (Advisor)
Dr. Asim Karim
Zoom link: https://lums-edu-pk.zoom.us/j/95241422118?pwd=NTZRUXk1UGdEU2RvSlQ4T3BlQVBxQT09
Meeting ID: 952 4142 2118
Passcode: 894508