RNA-Atlas assembles comprehensive knowledge on human transcriptome

Together with Ghent University, Amsterdam University of Medicine, National Chiao Tung University and Illumina, researchers at Baylor College of Medicine have built one of the most comprehensive catalogs of the human transcriptome ever. By combining complementary sequencing techniques, they have deepened our understanding of the function of known RNA molecules and discovered thousands of new RNAs.

Their research, published in Nature Biotechnology, is the result of more than five years of work to further unravel the complexity of the human transcriptome. A better understanding of our transcriptome is essential to study disease processes and uncover novel genes that may serve as therapeutic targets or biomarkers.

RNAs in all shapes and sizes

The transcriptome is the sum of all RNA molecules that are transcribed from the DNA strands that make up our genome. However, there is no one-to-one relationship between the genome and the transcriptome. Firstly, each cell and tissue type has a unique transcriptome, with varying RNA production and composition, including tissue-specific RNAs. Secondly, not all RNAs are transcribed from typical protein-coding genes that eventually produce proteins. Many of our RNA molecules are not used as templates to build proteins. They originate from what was once called junk DNA: long sequences of DNA with unknown functions.

These non-coding RNAs (ncRNAs) come in all kinds of shapes and sizes: short, long and even circular. Many of them lack the tail of adenine nucleotides (the polyA tail) that is typical of protein-coding RNAs.

300 human cell and tissue types and three sequencing methods

“There have been other projects to catalog our transcriptome but the RNA-Atlas project is unique because of the applied sequencing methods,” said Dr. Pieter Mestdagh, professor at the Center for Medical Genetics at Ghent University. “Not only did we look at the transcriptome of as many as 300 human cell and tissue types but, most importantly, we did so with three complementary sequencing technologies, one aimed at small RNAs, one aimed at polyadenylated (polyA) RNAs, and a technique called total RNA sequencing.”

This last sequencing technology led to the discovery of thousands of novel non-coding RNA genes, including a novel class of non-polyadenylated single-exon genes and many new circular RNAs. By combining and comparing the results of the different sequencing methods, the researchers were able to define three things for every measured RNA transcript: its abundance in the different cells and tissues, whether it has a polyA tail (for some genes this appears to differ from cell type to cell type), and whether it is linear or circular. The consortium also found important clues to the function of some of the ncRNAs. By comparing the abundance of different RNAs across cell types, they found correlations that indicate regulatory functions, and they could determine whether this regulation happens at the transcriptional level (by preventing or stimulating transcription of protein-coding genes) or at the post-transcriptional level (e.g. by breaking down RNAs).
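
For readers who want a feel for the two comparisons described above, the following is a minimal Python sketch, not the consortium's actual pipeline. It assumes a transcript's polyA status can be approximated by the ratio of its abundance in polyA-selected data to its abundance in total RNA data, and that a possible regulatory link shows up as a correlation in abundance across cell types. The function names, the 0.2 ratio threshold and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt


def polya_status(total_abundance, polya_abundance, min_ratio=0.2):
    """Label a transcript by comparing polyA-selected and total RNA abundance.

    min_ratio is an arbitrary illustrative cutoff, not a published value.
    """
    if total_abundance <= 0:
        return "not detected"
    ratio = polya_abundance / total_abundance
    return "polyadenylated" if ratio >= min_ratio else "non-polyadenylated"


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of abundances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Made-up abundances of one ncRNA and one protein-coding gene in four cell types.
ncrna_abundance = [1.0, 2.0, 3.0, 4.0]
target_abundance = [8.0, 6.0, 4.0, 2.0]

# Abundant in total RNA data but nearly absent after polyA selection.
print(polya_status(total_abundance=10.0, polya_abundance=0.5))

# A strongly negative value would be consistent with, but not proof of, repression.
print(pearson(ncrna_abundance, target_abundance))
```

A correlation of this kind only suggests a candidate relationship. Telling transcriptional from post-transcriptional regulation, as the researchers did, requires additional evidence that this sketch does not model.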

An invaluable resource for biomedical science

All data, analyses and results are available for download and interrogation in the R2 web portal, enabling the scientific community to use this resource as a tool for exploring non-coding RNA biology and function.

“By combining all data in one comprehensive catalog, we have created a new valuable resource for biomedical scientists around the world studying disease processes,” said Dr. Pavel Sumazin, associate professor of pediatrics – oncology at Baylor College of Medicine and member of the Dan L Duncan Comprehensive Cancer Center. “The age of RNA therapeutics is swiftly rising – we’ve all witnessed the impressive creation of RNA vaccines, and already the first medicines that target RNA are used in the clinic. I’m sure we’ll see lots more of these therapies in the next years and decades.”

The Baylor team contributed to this work by analyzing sequencing data and interpreting gene function, including the function of thousands of the new genes predicted in the work. The results led to the identification of new non-coding genes, including thousands of single-exon long non-coding RNAs that regulate key pathways in virtually every normal and diseased human cell type. RNA molecules of this type had been observed before but were disregarded and thought to carry no biological function.
