Internet2


E-CAS: Researchers to Present on the "Development of BioCompute Objects for Integration into Galaxy in a Cloud Computing Environment"

Mar 31, 2020, by Jamie Sunderland
Tags: Applied Research, E-CAS, Exploring Clouds for Acceleration of Science (E-CAS) Project, Frontpage News, Research & Education Networks, Research Solutions

By Raja Mazumder, Jonathon Keeney, Charles Hadley King, and Janisha Patel from George Washington University 

On Wednesday, April 1, 2020, researchers from George Washington University (GW) are presenting the results of the past year’s work as part of the NSF-funded “Exploring Clouds for the Acceleration of Science” (E-CAS) project to a scientific panel and the general public. The online presentation features a live demo of the platform, including the execution of a publicly available coronavirus (COVID-19) analysis workflow.

BioCompute Objects (BCOs) allow researchers to record bioinformatic analyses composed of any number of algorithmic steps and variables, making the entirety of a computational experiment clearly understandable and easier to repeat.
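To make the idea concrete, here is a minimal sketch of what a BCO-style record might look like as a Python dictionary. The domain names follow the BioCompute layout as best we can summarize it here, but the identifier, parameter name, and all values are hypothetical placeholders, and most fields of a real BCO are omitted.

```python
import json

# Hypothetical, heavily abbreviated BCO-style record. The values below are
# placeholders for illustration and do not describe a real analysis.
bco = {
    "object_id": "https://example.org/BCO_000001",  # placeholder identifier
    "provenance_domain": {"name": "Example variant-calling analysis", "version": "1.0"},
    "usability_domain": ["Identify variants in a viral genome sample."],
    "description_domain": {
        "pipeline_steps": [
            {"step_number": 1, "name": "alignment"},
            {"step_number": 2, "name": "variant calling"},
        ]
    },
    # "seed_length" is an invented parameter name used only for this sketch.
    "parametric_domain": [{"step": "1", "param": "seed_length", "value": "14"}],
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
}


def step_names(record):
    """Return the pipeline step names in execution order."""
    steps = record["description_domain"]["pipeline_steps"]
    return [s["name"] for s in sorted(steps, key=lambda s: s["step_number"])]


print(json.dumps(step_names(bco)))
```

Because the steps and their parameters are recorded as structured data rather than free text, another researcher can read off what was run, and in what order, without access to the original platform.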

We set out to create a library of BCOs that describe bioinformatic workflows and make them available to the international research community through the Amazon Web Services (AWS) platform. Users of popular bioinformatics platforms, such as Galaxy or HIVE, can then access and contribute to this library.

Project Background

As one of the aims of this project, the BCO specification was extended for analyses in two specific platforms: HIVE and Galaxy. To accomplish this, we mapped the Galaxy schema and the HIVE schema to the BCO schema, enabling the data to be captured appropriately. A tool for each platform was then written to produce a properly formatted object according to the BCO schema.
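As a rough illustration of what such a schema mapping involves, the sketch below converts a simplified, Galaxy-style workflow dictionary into BCO-style pipeline steps. The input shape (steps keyed by string index, each with a `name` and optional `tool_version`) and the function name `galaxy_steps_to_bco` are assumptions made for this example; this is not the project's actual tool.

```python
def galaxy_steps_to_bco(workflow):
    """Map a simplified Galaxy-style workflow dict to BCO-style pipeline steps.

    Assumes workflow["steps"] is keyed by string indices ("0", "1", ...).
    """
    pipeline_steps = []
    # Sort keys numerically so that "10" comes after "2".
    for key in sorted(workflow["steps"], key=int):
        step = workflow["steps"][key]
        pipeline_steps.append({
            "step_number": int(key) + 1,  # BCO-style steps are numbered from 1 here
            "name": step["name"],
            "version": step.get("tool_version", "unknown"),
        })
    return {"description_domain": {"pipeline_steps": pipeline_steps}}


# Hypothetical two-step workflow used only to exercise the mapping.
example = {
    "name": "demo workflow",
    "steps": {
        "0": {"name": "aligner", "tool_version": "1.2"},
        "1": {"name": "variant caller"},
    },
}

for mapped in galaxy_steps_to_bco(example)["description_domain"]["pipeline_steps"]:
    print(mapped)
```

A real mapping would also need to carry inputs, outputs, and parameters into the corresponding BCO domains; only the step list is shown here.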

Examples of Galaxy and HIVE platform instances that are BioCompute write-capable are now available on GW institutional hardware and on AWS. Repeated tests verify that identical pipelines built in either of these platforms will produce the same BCO (albeit with different unique identifiers) on the respective cloud platforms.
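One simple way to picture this kind of check is to compare two BCOs after setting aside the fields that are expected to differ. The sketch below assumes the per-object identifiers live in top-level fields named `object_id` and `etag`; both the field names and the helper names are assumptions for illustration, not a description of the tests we actually ran.

```python
# Assumed names of the top-level fields that differ between otherwise
# identical BCOs (for example, a unique ID and a content hash).
IDENTIFIER_FIELDS = {"object_id", "etag"}


def without_identifiers(bco):
    """Return a shallow copy of the BCO with identifier fields removed."""
    return {k: v for k, v in bco.items() if k not in IDENTIFIER_FIELDS}


def same_analysis(bco_a, bco_b):
    """True if two BCOs match once their unique identifiers are ignored."""
    return without_identifiers(bco_a) == without_identifiers(bco_b)


# Hypothetical BCOs produced by the same pipeline on two different hosts.
local_bco = {
    "object_id": "https://example.org/BCO_000001",
    "etag": "aaa111",
    "description_domain": {"pipeline_steps": [{"step_number": 1, "name": "alignment"}]},
}
cloud_bco = {
    "object_id": "https://example.org/BCO_000002",
    "etag": "bbb222",
    "description_domain": {"pipeline_steps": [{"step_number": 1, "name": "alignment"}]},
}

print(same_analysis(local_bco, cloud_bco))
```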

We constructed a BCO database based on the BioCompute schema, with a graphical user interface that allows a user to search and display BCOs or create their own. We incorporated existing code for checking the conformance of a BCO into the database submission process. The BCO database has since been integrated into the local GW hardware, as well as the AWS platform.
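The sketch below shows, in simplified form, how a conformance check can gate submission to a database. The list of required domains, the function names, and the in-memory list standing in for the database are all assumptions for illustration; the existing conformance checker we reused is not reproduced here and validates far more than this.

```python
# Assumed set of domains a submission must contain in this sketch.
REQUIRED_DOMAINS = (
    "provenance_domain",
    "usability_domain",
    "description_domain",
    "execution_domain",
    "io_domain",
)


def conformance_errors(bco):
    """Return a list of human-readable problems; an empty list means it passed."""
    errors = ["missing required domain: " + d for d in REQUIRED_DOMAINS if d not in bco]
    description = bco.get("description_domain")
    if description is not None and not description.get("pipeline_steps"):
        errors.append("description_domain has no pipeline_steps")
    return errors


def submit(database, bco):
    """Append the BCO to the database only if it passes the conformance check."""
    errors = conformance_errors(bco)
    if errors:
        raise ValueError("; ".join(errors))
    database.append(bco)


# A plain list stands in for the real database in this example.
database = []
candidate = {
    "provenance_domain": {"name": "demo"},
    "usability_domain": ["Demonstration only."],
    "description_domain": {"pipeline_steps": [{"step_number": 1, "name": "alignment"}]},
    "execution_domain": {},
    "io_domain": {},
}
submit(database, candidate)
print(len(database))
```

Running the check before the write, rather than after, means a non-conforming object never reaches the searchable collection.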

The Application Programming Interface (API) code (and related documentation) for the Galaxy BCO is incorporated into a fork of the core Galaxy Project code on GitHub, which allows other research groups to integrate this code into their own platform. 

Online Presentation Details

On Wednesday, April 1, we are virtually presenting the results of this year’s work to a scientific panel and the general public. We will present the research in three parts: 

1.    An introduction to BioCompute Objects. 
2.    A discussion of the specific tools developed and how they led us to develop associated protocols and best practices. This includes tools and documentation for our HIVE AWS instance and our Galaxy AWS instance, as well as the cost estimation tools we have developed for these platforms.
3.    A live demo of a publicly available coronavirus (COVID-19) analysis workflow on our platform, called the Coronavirus Analysis. This pipeline consists of an alignment tool followed by a variant calling tool. We will then output a BioCompute Object and deposit it into our BCO Portal.

Here is information to connect with the online presentation.