LSTM-BASED MASKED LANGUAGE MODEL USING BERT EMBEDDINGS
Abstract:
Most NLP tasks are now addressed by Transformer-based language models. In any language model, the essential ingredient for accurate prediction or generation is the context of a word within its sentence. We propose a single-prediction, LSTM-based conditional Masked Language Model (MLM) that uses left and right contextual token embeddings obtained from BERT. It takes sentences and their corresponding topical words as input and predicts the masked token at a randomized position in each sentence. The model outputs an ID from the BERT vocabulary for the masked token, which is compared with the true label to compute test accuracy. We built two LSTM models and, after applying a boosting technique to Model 2, were able to outperform the traditional BERT. Using BERT embeddings, our model is almost 100 times smaller than BERT-base-uncased, with far fewer parameters, and outperforms the state-of-the-art MLM accuracy by 4.49%.
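The following is a minimal illustrative sketch (not the authors' code) of the architecture the abstract describes: a frozen BERT supplies left and right contextual token embeddings, and a small bidirectional LSTM head predicts the ID of the masked token from the BERT vocabulary. The hyperparameters (hidden size 256, a single LSTM layer) are assumptions for illustration, and the topical-word conditioning mentioned in the abstract is omitted for brevity.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

class LSTMMaskedLM(nn.Module):
    """Illustrative LSTM MLM head on top of frozen BERT embeddings."""
    def __init__(self, bert_name="bert-base-uncased", hidden_size=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        for p in self.bert.parameters():       # use BERT only as a frozen contextual embedder
            p.requires_grad = False
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden_size,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_size, self.bert.config.vocab_size)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            emb = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        hidden, _ = self.lstm(emb)              # BiLSTM combines left and right context
        return self.out(hidden)                 # (batch, seq_len, vocab_size) logits

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = LSTMMaskedLM()

sentence = "The weather today is [MASK] and sunny."
enc = tokenizer(sentence, return_tensors="pt")
logits = model(enc["input_ids"], enc["attention_mask"])

# Predicted ID from the BERT vocabulary at the masked position,
# which would be compared against the true label to score accuracy.
mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
pred_id = logits[0, mask_pos].argmax().item()
print(tokenizer.convert_ids_to_tokens([pred_id]))
```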
Committee:
Dr. Agha Ali Raza (Advisor)
Dr. Asim Karim
Zoom link: https://lums-edu-pk.zoom.us/j/95241422118?pwd=NTZRUXk1UGdEU2RvSlQ4T3BlQVBxQT09
Meeting ID: 952 4142 2118
Passcode: 894508