FAQ

Frequently Asked Questions about GeneLab

General

What is GeneLab?

The multi-year GeneLab project is both a science collaboration initiative and a data system effort to establish a public bioinformatics repository. The mission of GeneLab is to maximize the use of the valuable biological research resources aboard the International Space Station (ISS) by collecting genomic, transcriptomic, proteomic, and metabolomic data, collectively known as “omics” data. These omics data will enable exploration of the molecular network responses of terrestrial biology to the space environment, and will be made available to researchers worldwide. The ultimate goal of GeneLab is to enable space exploration by helping researchers understand the complex responses of biological systems to the space environment. GeneLab data may be useful for developing countermeasures, monitoring the microbes that colonize the space station, understanding how food plants could be modified to grow better in space, and unraveling the responses of humans and other organisms to the combined effects of altered gravity and space radiation.

What are omics data?

Omics is an umbrella term for the data types whose names are formed by adding the Latin suffix -ome ('totality') to a field of study. Data derived from DNA, which encodes the genes that make up the genome, are genomic data. Data from RNA transcribed from DNA are transcriptomic data; data from proteins are proteomic data; and data from metabolic substances are metabolomic data. Other data types include epigenomic and epitranscriptomic data, which describe modifications of DNA and RNA, respectively, that are involved in gene regulation, and lipidomic data, which characterize the lipids (fats) in a sample. Metagenomics is the study of genetic material recovered from an environmental or clinical sample. Such a sample may contain many different types of microbes (e.g., bacteria, archaea, fungi, protists, and viruses) and can be quite complex. Data sets generated by high-throughput analysis of these omics types are often very large.

What is the GeneLab Data System (GLDS)?

The GeneLab Data System (GLDS) is NASA’s premier open-access omics data platform for biological experiments. GLDS archives, houses, and freely distributes standards-compliant, high-throughput sequencing and other omics data from spaceflight-relevant experiments. The data are meticulously curated by GeneLab project bioinformaticians and enhanced by associated experimental metadata that include flight information, project details, sample/tissue processing protocols, omics analysis details and other ancillary metadata. For more information about various aspects of the GLDS or the GeneLab project please see our documentation.

Who sponsors GeneLab?

GeneLab is being developed at NASA’s Ames Research Center. Science direction is provided by the Space Life and Physical Sciences Research and Applications (SLPSRA) Division at NASA Headquarters. Project funding is provided jointly by SLPSRA and the International Space Station Research Integration Office (ISSRIO) at Johnson Space Center.

What data types does the GLDS support?

GeneLab accepts studies whose design types, assay types, and data types match those listed in the table below.

Study Design Type | Assay Type | Data Type | Accepted Data File Format(s)
Genotyping Design | Genotyping Assay | DNA Sequence Data | FASTA, FASTQ
Genotyping by Array Design | Genotyping by Array Assay | DNA Sequence Data | CEL, Tab-delimited Text
Genotyping by High Throughput Sequencing Design | Genotyping By High Throughput Sequencing Assay | DNA Sequence Data | FASTA, FASTQ, SAM, HDF4, NetCDF, SFF, GTF/GFF3/VCF/BED
Comparative Genome Hybridization by Array Design | Comparative Genomic Hybridization By Array Assay | DNA Sequence Data | CEL, Tab-delimited Text
ChIP-seq Design | ChIP-seq Assay | DNA Sequence Data | FASTA, FASTQ, SAM, HDF4, NetCDF, SFF, GTF/GFF3/VCF/BED
DNA Methylation Profiling By High Throughput Sequencing Design | DNA Methylation Profiling By High Throughput Sequencing Assay | DNA Sequence Data | FASTA, FASTQ, SAM, HDF4, NetCDF, SFF, GTF/GFF3/VCF/BED
MicroRNA Profiling by Array Design | MicroRNA Profiling by Array Assay | Transcription Profiling Data | CEL, Tab-delimited Text
MicroRNA Profiling by High Throughput Sequencing Design | MicroRNA Profiling by High Throughput Sequencing Assay | Transcription Profiling Data | FASTA, FASTQ, SAM, HDF4, NetCDF, SFF, GTF/GFF3/VCF/BED
Transcription Profiling by Array Design | Transcription Profiling by Array Assay | Transcription Profiling Data | CEL, Tab-delimited Text
Transcription Profiling by High Throughput Sequencing Design | Transcription Profiling by High Throughput Sequencing (RNA-seq) | Transcription Profiling Data | FASTA, FASTQ, SAM, HDF4, NetCDF, SFF, GTF/GFF3/VCF/BED
Transcription Profiling by RT-PCR Design | Transcription Profiling by RT-PCR Assay | Transcription Profiling Data | Tab-delimited Text
Transcription Profiling by Tiling Array Design | Transcription Profiling by Tiling Array Assay | Transcription Profiling Data | CEL, Tab-delimited Text
Translation Profiling Design | Poly(A)-Site Sequencing Assay; RNA Bind-n-Seq Assay; Ribosomal Profiling by Sequencing Assay; Self-transcribing Active Regulatory Region Sequencing Assay; Serial Analysis of Gene Expression; Transcript Leader Sequencing; Transcription Profiling by MPSS Assay; Translation-associated Transcript Leader Sequencing; Translation Profiling Assay | Transcription Profiling Data | FASTA, FASTQ, SAM, HDF4, NetCDF, SFF, GTF/GFF3/VCF/BED, CEL, Tab-delimited Text
Proteomic Profiling Design | Mass Spectrometry Assay | Mass Spectrometry Assay Data | mzML, mzQuantML, spML, pdb, pdbseq, pdbnuc, pdbnucseq, CID, HCD, mzData, DTA, PKL, MS2, MGF, ETD, RAW
Proteomic Profiling Design | Electrophoresis System | Electrophoresis System Data | GelML, spML
Proteomic Profiling Design | Western Blot Analysis | Western Blot Analysis Data | spML
Proteomic Profiling by Array Design | Proteomic Profiling By Array Assay | Protein Microarray Assay Data | CEL, Tab-delimited Text
Metabolomic Profiling Design | Mass Spectrometry Assay | Mass Spectrometry Assay Data | mzML, mzQuantML, spML, mzData, DTA, PKL, MS2, MGF, ETD, RAW
Metabolomic Profiling Design | NMR Spectroscopy Assay |  | Spectra: .doc, .docx, .txt, .pdf, .tif formats
Metabolomic Profiling Design | Other Metabolite Profiling Assay (TBD)[1] | Metabolomic Data (TBD)[1] | CSV, Excel, and (TBD)[2]

[1] Pending concept specification by the Ontology for Biomedical Investigations (OBI)

[2] Pending format(s) specification by life science research community
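Before submitting, it may help to check file names against the table above. The Python sketch below is a hypothetical helper, not a GeneLab tool. It covers only three assay types, and its mapping from format names to file extensions (for example, Tab-delimited Text to `.txt` or `.tsv`, FASTQ to `.fastq` or `.fq`) is an assumption, because the table lists formats rather than extensions. It also assumes a trailing `.gz` should be ignored, which the table does not state.

```python
from pathlib import Path

# Hypothetical mapping from a few assay types in the table to file
# extensions. The extension spellings are assumptions for illustration;
# the table itself names formats, not extensions.
ACCEPTED_EXTENSIONS = {
    "Transcription Profiling by High Throughput Sequencing (RNA-seq)": {
        ".fasta", ".fa", ".fastq", ".fq", ".sam", ".sff",
        ".gtf", ".gff3", ".vcf", ".bed",
    },
    "Transcription Profiling by Array Assay": {".cel", ".txt", ".tsv"},
    "Transcription Profiling by RT-PCR Assay": {".txt", ".tsv"},
}

# Assumption: gzip-compressed copies are checked by their inner extension.
COMPRESSION_SUFFIXES = {".gz"}


def format_extension(filename: str) -> str:
    """Return the lowercase format extension, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    return suffixes[-1] if suffixes else ""


def is_accepted(filename: str, assay_type: str) -> bool:
    """Check a file name (not its contents) against the hypothetical table."""
    allowed = ACCEPTED_EXTENSIONS.get(assay_type)
    if allowed is None:
        raise KeyError(f"no format list for assay type: {assay_type!r}")
    return format_extension(filename) in allowed
```

For example, `is_accepted("array01.CEL", "Transcription Profiling by Array Assay")` returns `True`, while `array01.pdf` is rejected for the same assay type. The check looks only at names; it cannot tell whether a file's contents are valid for the stated format.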

 

Submission

Submission process - How do I submit data?

Please see the GeneLab submission web page for instructions on how to submit data to GeneLab.

Post-submission process - How do I modify a dataset or metadata?

Please contact GeneLab. A GeneLab team member will assist you in making the corrections.

When will my data receive a GeneLab accession number?

Currently, a GeneLab accession number is assigned when the data are posted in the GLDS. In later GLDS phases, it will be possible to assign the GeneLab accession number before the data are posted, similar to how the Digital Object Identifier (DOI) system works. Our Version 1.0 software platform cannot be modified to provide this service; however, Version 2.0 will incorporate this functionality and is slated for release in fall 2017.

I'm a reviewer. How do I access and evaluate pre-publication data?

Currently, the GeneLab Data System cannot provide limited access to datasets; all data are publicly available immediately. We are evaluating ways to provide limited access in future releases of the GLDS.

Details

What is the data volume or size of the GeneLab data repository?

As of the latest GeneLab data release, 1.0.18.PD.1 (provisional data deployed on 8/31/2017), the data repository holds about 7.5 terabytes (TB) of compressed data. On average, text data files are about 3 to 4 times larger uncompressed than when compressed with GNU zip (gzip), which uses the DEFLATE compression algorithm.
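The compression ratio for a particular file can be measured directly. The sketch below uses Python's standard `zlib` module with `wbits=31`, which wraps the DEFLATE stream in the same gzip container format that gzip uses; it is an illustration, not GeneLab's measurement method, and compression level 6 (gzip's default) is an assumed setting. Files are read in chunks so large sequence files need not fit in memory. Ratios depend heavily on file content, so a given file need not fall in the 3-to-4 range.

```python
import zlib
from typing import Iterable


def gzip_ratio(chunks: Iterable[bytes], level: int = 6) -> float:
    """Return uncompressed size divided by gzip (DEFLATE) compressed size."""
    # wbits=31 tells zlib to emit a gzip header and trailer around the
    # DEFLATE stream instead of the default zlib wrapper.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    raw = packed = 0
    for chunk in chunks:
        raw += len(chunk)
        packed += len(comp.compress(chunk))
    packed += len(comp.flush())  # remaining compressed bytes plus trailer
    return raw / packed


def file_gzip_ratio(path: str, chunk_size: int = 1 << 20) -> float:
    """Measure the gzip ratio of an uncompressed file, reading 1 MiB at a time."""
    with open(path, "rb") as fh:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        return gzip_ratio(iter(lambda: fh.read(chunk_size), b""))
```

For example, `file_gzip_ratio("sample_R1.fastq")` (a hypothetical file name) reports the ratio for one uncompressed file. As a rough illustration only: if all 7.5 TB expanded 3 to 4 times, the uncompressed volume would be about 22.5 to 30 TB; because the quoted ratio applies to text files, this is not a measured total for the repository.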

How many distinct data sets/studies does GeneLab have?

As of the latest GeneLab data release, 1.0.18.PD.1 (provisional data deployed on 8/31/2017), there are 138 distinct data sets/studies.

What technologies is the GeneLab Data System (GLDS) built on?

The current GLDS Phase 1 system is built on a customized NASA web-based collaboration and knowledge-sharing software platform called the Center for Cross-discipline Collaboration (C3). C3 is developed by the Intelligent Systems Division (Code TI) at NASA Ames Research Center using the open-source Python web framework Django, the Python and JavaScript programming languages, and the MySQL relational database management system (RDBMS).

Based on the results of the realigned FY2016 GLDS software platform trade study, we are building a new, customizable GeneLab software platform for GLDS Phase 2 and beyond on top of the GenomeSpace integration platform from the Broad Institute of MIT and Harvard, to improve extensibility, scalability, modularity, and performance.

Why should I deposit data in GeneLab rather than in another online repository?

GeneLab's metadata are more meticulously curated than those of most other online bioinformatics data repositories. GeneLab also focuses on space biology data sets/studies, whereas other repositories may concentrate on other data domains. In the coming years, GeneLab plans to host a collaborative workspace that houses bioinformatics and other analysis and data display tools.

Can I get an accession number to include in my manuscript before the data are posted on GeneLab?

In the current GLDS Phase 1 system, GeneLab accession numbers are tied to database records and are issued only shortly before a dataset is publicly released. In the GLDS bioinformatics software platform for Phase 2 and beyond, due for release in September 2017, GeneLab accession numbers will be available soon after the data are submitted, well in advance of their public release in the GLDS, so researchers can include them in their manuscripts.

Does my dataset need to be related to ISS research, or does GeneLab also host suborbital and ground results?

All data sets must be relevant to spaceflight research or to the study of gravity as a continuum. This includes suborbital experiments and ground studies that simulate aspects of the spaceflight environment.