
An Integrated In silico Approach to Characterize Proteoforms using Sequence and Structure Data
In the second phase of this work, we employed sequence-based information to develop a novel structural proteomics workflow. To that end, we obtained ~200 sequences of Hepatitis C Virus (HCV) NS3/4A of Genotype 3 (G3) and modeled each mutational variant for subsequent pharmacophore-based virtual screening (PBVS) followed by covalent docking. We then targeted the predicted structures using >100 ligands, which were selected after screening small-molecule databases including MolPort, ChEMBL, DrugBank, ZINC, PubChem, and Mcule. Mutations at 14 positions within the HCV NS3/4A G3 ligand-binding pocket, including F43L, H57R, Q80K, R123T/S, I132L, Y134C/R/S, S139P, R155G, A156T, V158A, C159V, D168Q, C525W/Y, and Q526H/R, were evaluated. Two of these mutations (H57R and S139P) lie within the catalytic triad. We applied several in silico methods to investigate the mutational variation in the binding pocket of G3 HCV NS3/4A and evaluated ligands for their ability to inhibit it effectively. cpd-217 (CHEMBL569970; PubChem45485999), which contains a chemical warhead, was identified as a potential covalent inhibitor targeting Ser139. The hit established a covalent bond (C–S) with the reactive Ser139 while forming favorable interactions with ligand-binding residues. The binding stability of cpd-217 was then confirmed by molecular dynamics simulation followed by MM/GBSA binding free energy calculation. The free energy decomposition analysis indicated that the resistant mutants alter the HCV NS3/4A-ligand interaction, resulting in an unbalanced energy distribution within the binding site that leads to drug resistance. cpd-217 was found to interact with all NS3/4A G3 variants, with covalent docking scores ranging from -6.5 to -4.1 kcal/mol. We concluded that cpd-217 is a potential inhibitor of HCV NS3/4A G3 variants that warrants further in vitro and in vivo studies.
The study paves the way for drug design and development targeting NS3/4A of HCV G3, a genotype that is highly prevalent in developing countries, including Pakistan.
Towards a translational application of the pipeline, a metaproteomics case study was developed to profile microbial and heavy metal contamination of the Hudiara drain (a large wastewater channel) in Lahore, Pakistan. Profiling of the microbiota within these water samples revealed the presence of bacterial and fungal genera including Bacillus, Exiguobacterium, Aspergillus, and Penicillium. Members of these genera have previously been reported to be resistant to environmental stresses, including high heavy metal concentrations.
In conclusion, this thesis proposes a multi-level proteomics pipeline that brings together different computational proteomics approaches for the sequence-structure-function analysis of proteins. The proposed pipeline paves the way for developing a next-generation integrative in silico platform for enhanced identification and characterization of proteoforms.
Final Defense Committee Members:
1. Dr. Basit Shafiq (Chair & Associate Professor, External Thesis Committee Member)
Syed Babar Ali School of Science and Engineering, Department of Computer Science, Lahore University of Management Sciences (LUMS), Pakistan.
2. Dr. Shamshad Zarina (Visiting Professor, External Thesis Committee Member)
Dr. Zafar H. Zaidi Center for Proteomics, University of Karachi, Pakistan.
3. Dr. Shaper Mirza (Associate Professor, Thesis Committee Member)
Syed Babar Ali School of Science and Engineering, Department of Life Sciences, Lahore University of Management Sciences (LUMS), Pakistan.
4. Dr. Muhammad Tariq (Associate Professor, Thesis Committee Member)
Syed Babar Ali School of Science and Engineering, Department of Life Sciences, Lahore University of Management Sciences (LUMS), Pakistan.
5. Dr. Safee Ullah Chaudhary (Associate Professor, PhD Supervisor)
Syed Babar Ali School of Science and Engineering, Department of Life Sciences, Lahore University of Management Sciences (LUMS), Pakistan.
Publications:
1. Basharat, Abdul Rehman, Kanzal Iman, Muhammad Farhan Khalid, Zohra Anwar, Rashid Hussain, Humnah Gohar Kabir, Maria Tahreem et al. "SPECTRUM–A MATLAB toolbox for proteoform identification from top-down proteomics data." Scientific Reports 9, no. 1 (2019): 1-14.
2. Khalid, Muhammad Farhan, Kanzal Iman, Amna Ghafoor, Mujtaba Saboor, Ahsan Ali, Urwa Muaz, Abdul Rehman Basharat et al. "PERCEPTRON: An open-source GPU-accelerated proteoform identification pipeline for top-down proteomics." Nucleic Acids Research (2021).
3. Kanzal Iman, Kyung-Hoon Kwon, Sung Hwan Kim, Kyu Hwan Park, Manhoi Hur, Yong Seong Cho, Hyun Sik Kim et al. "De novo-based Complementary Ion Search (COINS) Algorithm for Enhanced Proteoform Identification and Characterization in Top-Down Proteomics." Proteomics. (In Review)
4. Ashraf, Muhammad Usman, Kanzal Iman, Muhammad Farhan Khalid, Hafiz Muhammad Salman, Talha Shafi, Momal Rafi, Nida Javaid et al. "Evolution of efficacious pangenotypic hepatitis C virus therapies." Medicinal Research Reviews 39, no. 3 (2019): 1091-1136.
5. Kanzal Iman, Muhammad Usman Mirza, Fazila Sadia, Matheus Froeyen, Safee Ullah Chaudhary. "An integrative pharmacophore-based screening, covalent docking, molecular dynamics and MM-GBSA approach reveals a covalent inhibitor for targeting drug-resistant Genotype 3 variants of Hepatitis C Viral NS3/4A serine protease." PLOS ONE. (In Review)
6. Ambreen Sabir, Zainab Nasir, Zonaira Khalid, Hafiz Muhammad Salman, Muhammad Farhan Khalid, Muhammad Burhan Khalid, Fatima Arshad, Kanzal Iman et al. "Profiling microbial and heavy metal contamination of Hudiara drain and its adjoining areas in Lahore, Pakistan." Chemosphere. (In Review)