Arabic Handwritten Text OCR
Abstract:
Modern applications need reliable systems that can process real-world data without manual inspection. Arabic is spoken in nearly 30 countries, including Saudi Arabia, the UAE, Oman, and Egypt. We aim to build a reliable digitization system for Arabic handwriting, because existing systems work well only on scanned documents and perform poorly on handwritten text. The literature we review discusses the challenges of Arabic text recognition, which arise in part because eastern languages have more than one writing style. Different "Rasm-ul-Khath" (writing scripts) have been in use over time, so it is very difficult to build a generic system that recognizes every style. Machine learning algorithms are built around data, so we will fill in missing values in the dataset after performing exploratory data analysis. We will start from an existing OCR (Optical Character Recognition) system, built by NUST on the Tesseract platform, and modify it to suit our dataset in order to improve its accuracy. We will also apply a transfer learning strategy (along with HMM-based OCR techniques) and well-known classification algorithms (KNN, Naive Bayes, etc.) to the existing OCR system to improve its correctness and reliability.
Committee:
- Dr. Agha Ali Raza (Advisor)
- Dr. Imdadullah Khan (Evaluator)