Description
Cloud-based analysis of genomic datasets is increasingly vital for portability, reproducibility, and multi-institution collaboration, but transitioning to the cloud can be daunting. This workshop aims to remove some of the barriers to adopting cloud-based tools. Specifically, we will teach researchers how to access and use the Analysis, Visualization, and Informatics Lab-space (AnVIL), an environment that provides hosted data, reproducible tools, collaborative workspaces, and comprehensive documentation so that users can conduct research in the cloud. The workshop will demonstrate how to access and explore data in AnVIL. Participants will also learn to search for analysis tools in Dockstore, a platform for sharing portable, container-based tools and workflows written to be interoperable across local and cloud environments. Finally, they will analyze data in a Terra workspace, a dedicated space where researchers can access and organize these data and tools and run analyses.
This workshop will specifically explore and demonstrate open-access data from the Human Pangenome Reference Consortium (HPRC), an NHGRI-funded effort to create a more diverse and comprehensive human pangenome reference. We will present the data and methods produced and used within the first year of this project, which aims to release high-quality diploid genome assemblies from more than 350 ethnically diverse individuals over five years. Currently, raw data and assemblies from 45 individuals, along with associated Docker-based analysis workflows written in the Workflow Description Language (WDL), are available in AnVIL for researchers to explore and use. Data and workflows will continue to be publicly released as early as possible to promote open science. These data provide an excellent substrate for gaining hands-on experience with these data types and with new cloud workspaces and methods.
Using data and workflows from the HPRC, participants in this workshop will follow along with instructors to learn how to:
- Register for a Terra account and set up a project using $300 in free Google Cloud credits
- Set up a collaborative cloud workspace in Terra
- Access and explore Human Pangenome data hosted by AnVIL
- Search for bioinformatics workflows in Dockstore and export them to a Terra workspace
- Configure and launch a Docker-based WDL workflow to conduct a parallel analysis
- Monitor cloud costs associated with an analysis
After completing the workshop, attendees will be able to use AnVIL to analyze hosted datasets and launch reproducible, scalable analyses. Attendees will also be familiar with Human Pangenome data and resources.