Optimization of de novo supertranscriptome assembly using RNA-seq data

Post Date

Mar 3 2026

Year

2024

Supervisor:

Dr. Aziz Mithani

Students:

Madiha Shabbir

MS/PhD

PhD

Reference / Filters

Life Sciences

Abstract: Transcriptomic studies of organisms for which no reference genome is available typically start by generating a de novo transcriptome or supertranscriptome assembly from raw RNA-seq reads. Assembling a (super)transcriptome is, however, a challenging task because of the widely varying abundance of mRNA transcripts, alternative splicing, and sequencing errors. As a result, popular de novo (super)transcriptome assembly tools generate tens or hundreds of partial, broken, or mis-assembled contigs, which reduces assembly accuracy. These assembly errors not only prevent accurate functional annotation but also affect downstream analyses. Commonly available tools for assembly improvement rely primarily on running BLAST against closely related species, which makes their accuracy and reliability dependent on the availability of data for closely related organisms. This thesis presents ROAST, a Linux-based tool for Reference-free Optimization of Assembled Supertranscriptomes. ROAST is an iterative tool that uses the paired-end information of reads produced on the Illumina sequencing platform, together with error signatures generated by RNA-seq alignment tools, including soft-clips, unexpected expression coverage, and reads with mates unmapped or mapped to a different contig. It uses these to identify and fix various supertranscriptome assembly errors, including incomplete and fragmented sequences, false chimeras, inversions, translocations, and other structural errors, without the aid of a reference sequence. The reference-free approach of ROAST makes it highly useful for studies of non-model organisms, for which a high-quality reference genome or transcriptome from a closely related species is usually not available.
The performance of ROAST is evaluated by generating and improving de novo (super)transcriptome assemblies of five model organisms (human, mouse, chicken, rice, and Arabidopsis), as well as by using simulated RNA-seq reads and errors generated from the reference supertranscriptomes of these model organisms. Results show that ROAST identifies assembly errors with high accuracy on simulated data and significantly improves assembly quality for the model organisms by identifying and fixing various assembly errors. Finally, the thesis demonstrates the utility of ROAST in refining de novo supertranscriptome assemblies of six non-model organisms: bluefin tuna, camu camu, cotton mealybug, Gilbert’s halosaurid fish, lobster cockroach, and vent mussel. ROAST is available for download at https://github.com/azizmithani/roast.
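
To make the alignment-based error signatures named in the abstract more concrete, the following is a minimal illustrative sketch. It is not ROAST's implementation. It counts, per contig, three of the signatures (soft-clipped reads, reads with an unmapped mate, and reads whose mate maps to a different contig) from SAM-format alignment records. The function name `error_signatures`, the `min_clip` threshold, and the example records are assumptions made for illustration only, and the coverage-based signature is left out.

```python
import re
from collections import defaultdict

# Standard SAM FLAG bits.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8

SOFT_CLIP = re.compile(r"(\d+)S")


def error_signatures(sam_lines, min_clip=5):
    """Count simple per-contig error signatures from SAM records.

    min_clip is a hypothetical threshold: soft clips shorter than this
    many bases are ignored.
    """
    counts = defaultdict(
        lambda: {"soft_clipped": 0, "mate_unmapped": 0, "mate_other_contig": 0}
    )
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        contig, cigar, mate_contig = fields[2], fields[5], fields[6]
        if flag & FLAG_UNMAPPED or contig == "*":
            continue  # the read itself is unmapped
        sig = counts[contig]
        # Soft clip: a CIGAR operation such as "10S" at a read end.
        if any(int(n) >= min_clip for n in SOFT_CLIP.findall(cigar)):
            sig["soft_clipped"] += 1
        if flag & FLAG_PAIRED:
            if flag & FLAG_MATE_UNMAPPED:
                sig["mate_unmapped"] += 1
            elif mate_contig not in ("=", "*", contig):
                # RNEXT names a contig other than the read's own.
                sig["mate_other_contig"] += 1
    return {name: dict(sig) for name, sig in counts.items()}


# Hypothetical records (SEQ and QUAL are given as "*").
example = [
    "@SQ\tSN:contigA\tLN:1000",
    "r1\t99\tcontigA\t10\t60\t50M\t=\t200\t240\t*\t*",
    "r2\t73\tcontigA\t950\t60\t40M10S\t*\t0\t0\t*\t*",
    "r3\t65\tcontigA\t900\t60\t50M\tcontigB\t5\t0\t*\t*",
    "r4\t129\tcontigB\t5\t60\t8S42M\tcontigA\t900\t0\t*\t*",
]

if __name__ == "__main__":
    for name, sig in error_signatures(example).items():
        print(name, sig)
```

In a sketch like this, a cluster of soft-clipped reads or unmapped mates near a contig end could hint at an incomplete sequence, and many mates mapping to one other contig could hint at fragmentation. How ROAST actually weighs and combines such signals is described in the thesis and the publication below.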

Publication:

Shabbir, M., & Mithani, A. (2024). ROAST: a tool for reference-free optimization of supertranscriptome assemblies. BMC Bioinformatics, 25(1), 2.

https://link.springer.com/article/10.1186/s12859-023-05614-4