Getting Started with AWS and OSDR Data

This how-to guide is for new users who want to access and work with Open Science Data Repository (OSDR) data on the Registry of Open Data on AWS. Making OSDR data available on AWS is part of the Science Mission Directorate (SMD) Open-Source Science Initiative and aims to improve data accessibility for open science within the NASA space biology community. Through the OSDR S3 bucket, users can access a diverse range of data, including microbe, plant, fruit fly, rodent, human cell culture, ground, and commercial astronaut studies. This guide walks you through installing the AWS Command Line Interface (CLI) and using it to access OSDR data.

Introduction to OSDR Data on AWS

OSDR data is now available on the Registry of Open Data on AWS, providing increased accessibility to open science data across NASA and the space biology community. The OSDR S3 bucket offers a wide range of data types for exploration and analysis.

To get started, you have two primary methods to access and explore OSDR data:

  • AWS S3 Browser Interface: This web-based interface lets you visually browse and interact with OSDR data. To get started, open the browser interface at http://nasa-osdr.s3-website-us-west-2.amazonaws.com/.
  • AWS Command Line Interface (CLI): The AWS CLI is a tool for accessing AWS services, including the OSDR S3 bucket, from a terminal or from scripts. This guide focuses on using the CLI to access OSDR data.

Installing the AWS CLI

Before you can start working with OSDR data using the AWS CLI, you need to install it on your system. Follow these steps to install the AWS CLI:

  • Open a terminal window
  • Follow the installation instructions for your operating system on the AWS documentation page: Install or update the AWS CLI
  • Once the AWS CLI is installed, you're ready to start exploring OSDR data programmatically
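To confirm that the installation worked, you can run aws --version in a new terminal. The sketch below wraps that check in a small bash function; the name check_aws_cli is hypothetical and is not part of the AWS CLI.

```shell
# Hypothetical helper (not part of the AWS CLI): prints the installed
# AWS CLI version, or prints a hint and returns 1 if `aws` is not on PATH.
check_aws_cli() {
  if command -v aws >/dev/null 2>&1; then
    aws --version
  else
    echo "AWS CLI not found on PATH; see 'Install or update the AWS CLI'." >&2
    return 1
  fi
}

# Usage:
#   check_aws_cli
```

If the function reports that the CLI is not found, open a new terminal so your PATH is reloaded, then try again.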

Exploring OSDR Data

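Now that you have the AWS CLI installed, you can use it to explore OSDR data in the S3 bucket. Because the bucket is public, the --no-sign-request option lets you send requests anonymously, without configuring AWS credentials. Here are some basic commands to help you get started: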

  • To list the top-level contents of the OSDR S3 bucket: aws s3 ls --no-sign-request s3://nasa-osdr/
  • To list every file in a specific OSDR dataset (e.g., OSD-96), including its subdirectories: aws s3 ls --no-sign-request s3://nasa-osdr/OSD-96/ --recursive
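If you list datasets often, a small wrapper can save typing. The bash function below is a hypothetical convenience (osdr_ls is not an AWS CLI command); it combines the two commands above and adds the aws s3 ls options --human-readable and --summarize, which print file sizes in readable units and append the total object count and total size.

```shell
# Hypothetical helper (not part of the AWS CLI).
# Assumes the public bucket s3://nasa-osdr/ and anonymous access.
osdr_ls() {
  # Strip a trailing slash so "OSD-96" and "OSD-96/" behave the same
  local dataset="${1%/}"
  if [ -z "$dataset" ]; then
    # No argument: list the top-level entries of the bucket
    aws s3 ls --no-sign-request "s3://nasa-osdr/"
  else
    # With a dataset: list every object under it, with readable sizes and a total
    aws s3 ls --no-sign-request "s3://nasa-osdr/${dataset}/" \
      --recursive --human-readable --summarize
  fi
}
```

For example, osdr_ls OSD-96 runs the same recursive listing as the second command above, followed by a size summary.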

Copying Data Locally

To copy specific files from the OSDR S3 bucket to your local machine, you can use the aws s3 cp command. For example:

  • To copy a specific file to your current directory (the trailing . is the local destination): aws s3 cp --no-sign-request s3://nasa-osdr/OSD-96/version-6/rna_seq/GLDS-96_rna_seq_Dmel_Can-S_wo_GC_5th-gen-GC-der_1.5hr_GSM2350418_R2_raw.fastq.gz_trimming_report.txt .
  • To copy all files within a specific dataset into a local directory (here, ./OSD-96/): aws s3 cp --no-sign-request s3://nasa-osdr/OSD-96/ ./OSD-96/ --recursive
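Full datasets can be large, so it can help to download only part of one. With --recursive, aws s3 cp accepts --exclude and --include filters (applied in order, so a later filter takes precedence) and --dryrun, which prints the operations it would perform without copying anything. The bash sketch below uses these options through a hypothetical osdr_get function; the function name, the OSDR_DRYRUN variable, and the default destination ./<dataset> are illustrative assumptions, not part of the AWS CLI.

```shell
# Hypothetical helper (not part of the AWS CLI): recursively copy one OSDR
# dataset, or only the files in it that match a pattern.
#   $1  dataset prefix, e.g. OSD-96 (required)
#   $2  local destination directory (default: ./<dataset>)
#   $3  include pattern (default: "*", i.e. every file)
# Set OSDR_DRYRUN to any non-empty value to preview with --dryrun.
osdr_get() {
  local dataset="${1%/}"
  if [ -z "$dataset" ]; then
    echo "usage: osdr_get DATASET [DEST] [PATTERN]" >&2
    return 2
  fi
  local dest="${2:-./${dataset}}"
  local pattern="${3:-*}"
  # Exclude everything first, then re-include only keys matching the pattern.
  # ${OSDR_DRYRUN:+--dryrun} expands to --dryrun only when OSDR_DRYRUN is set.
  aws s3 cp --no-sign-request "s3://nasa-osdr/${dataset}/" "$dest" \
    --recursive --exclude "*" --include "$pattern" \
    ${OSDR_DRYRUN:+--dryrun}
}
```

For example, OSDR_DRYRUN=1 osdr_get OSD-96 ./OSD-96 '*_trimming_report.txt' previews which trimming reports would be downloaded; running the same call without OSDR_DRYRUN copies them. Quote the pattern so your shell does not expand the * itself.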

Additional Resources

If you're looking to delve deeper into OSDR data access and AWS capabilities, consider exploring the following resources:

This tutorial serves as a starting point for beginners to access and explore OSDR data using the AWS CLI. As you become more familiar with the tools and resources, you can expand your knowledge and take advantage of advanced features for data analysis and research.

Please contact us if you have any questions or need further assistance. Happy exploring the world of open science data on AWS!