Where is the raw sequencing data stored publicly?

Data availability

Most peer-reviewed journals require studies dealing with bioinformatic data to include a data availability statement. This statement explains where the raw sequencing data and all important associated data sets are available. Publishing sequence data online is now mandatory and considered good practice, as it allows other scientists to repeat the analyses or use the data for their own studies.

Typically, the Data Availability Statement is located at the end of the manuscript. Sometimes it is also located at the very beginning. The information can also be integrated into the Materials and Methods section.

Sequence Read Archive (SRA)

Sequence data is published as raw data. The world’s largest public database for raw sequence data from high-throughput sequencing is the Sequence Read Archive (SRA). The SRA was founded in 2007 and is now jointly operated by the NCBI (USA), the EBI/ENA (Europe), and the DDBJ (Japan).

The SRA is the world’s largest collection of sequence data.

Visit the Sequence Read Archive (SRA)

  1. Visit https://www.ncbi.nlm.nih.gov/sra
  2. Search for SRA data entry SRR14313721

Next, you should see the SRA record entry.

What information can you see?

Can you find out the meaning of…

  • BioProject
  • BioSample
  • Run

SRA entry information

The SRA offers the possibility to retrieve a lot of information about the sequencing. We will look at this in the next step. To do this, we click directly on the run entry SRR14313721.

SRA entry up close

Click on SRR14313721.

This should open the detailed view, which provides information on the number of sequenced bases, the quality of the sequencing, and the format of the sequence data.

Taxonomy Analysis Tool

The SRA web interface also offers a simple analysis of the taxonomy. This is helpful if you want to know what is included in the sequencing. It can provide an indication of whether there may be contamination in the sequencing.

Taxonomy Analysis Tool - Window into the SRA dataset

At the bottom, there is a field called ‘View in Krona’, which we click on.

The Taxonomy Browser window will then open. This analysis is performed automatically by NCBI SRA for every dataset that is uploaded there. It is a rough metagenomic analysis based on the information stored at NCBI itself. You can see which organisms (including viruses) were found in the sequence data.

The web interface also offers an interactive way to analyse the composition. This tool is called Krona, which we will launch next.

The Krona tool for interactive analysis will then open. Play around with the interactive display to understand how Krona works!

Search SRA database

It’s time to put what you’ve learned into practice. How about searching the SRA database yourself? Can you think of any organisms that might have been sequenced?

Let’s find out. Search for organisms you work with!

Your own search

Which organism comes to your mind? Then search for it!
Examples:

  • Ascovirus
  • Homo sapiens
  • Beauveria bassiana
  • Silkworm

Visit https://www.ncbi.nlm.nih.gov/sra to start your search.

Also use Krona to analyse your search!

Let’s discuss your findings!

Who found something interesing?
Can you share your findings?