Ocean Cloud Commons

The Ocean Cloud Commons (OCC) is a cloud-based resource and repository that allows researchers to query the Tara Oceans Expedition data in the cloud and makes comparative metagenomic tools available through the Ocean Treasure Box (OTB). The Tara Oceans Expedition has provided the largest contiguous, publicly available genomics dataset of any scientific project in the world. Using the research schooner Tara, modern sequencing, and state-of-the-art imaging technologies, a multinational team of scientists sampled microscopic plankton at hundreds of sites and depths in all the major oceanic regions. The Tara Oceans Expedition data have been released, but accessing, manipulating, and analyzing such large-scale resources remains a challenge for researchers. This project addresses that challenge by creating the OCC and the OTB.
The OCC and OTB build upon established partnerships with organizations such as CyVerse Cyberinfrastructure, the Agave Platform, and OpenCloud, and with computing facilities at the Texas Advanced Computing Center. The OCC uses a MapReduce-based algorithm to create a comparative metagenomics data resource in a Hadoop big data framework. Researchers can access the OCC through tools developed in the OTB and implemented as Apps in the CyVerse Cyberinfrastructure. Specifically, OTB tools deploy and compute on OCC data in OpenCloud via the Agave Platform and Developer API from CyVerse. Taken together, the OTB tools and OCC data resources enable researchers to address global-scale questions about how microbes that affect climate and ecosystem function are distributed across the sea.
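The abstract does not specify the MapReduce algorithm itself, so the following is only a minimal sketch of how a map, shuffle, and reduce pass could build a comparative resource across samples. It assumes k-mer counting as the unit of comparison and Jaccard similarity as the comparison measure; both are illustrative choices, not the project's stated method. The function names and the toy sample data are hypothetical, and plain in-memory Python stands in for a Hadoop cluster.

```python
from collections import defaultdict
from itertools import chain


def map_reads(sample_id, reads, k):
    """Map step: emit ((sample_id, kmer), 1) for every k-mer in every read."""
    for read in reads:
        for i in range(len(read) - k + 1):
            yield (sample_id, read[i:i + k]), 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values for each (sample_id, kmer) key."""
    return {key: sum(values) for key, values in groups.items()}


def kmer_profiles(samples, k):
    """Run map, shuffle, and reduce, then regroup the counts per sample."""
    mapped = chain.from_iterable(
        map_reads(sample_id, reads, k) for sample_id, reads in samples.items()
    )
    counts = reduce_counts(shuffle(mapped))
    profiles = defaultdict(dict)
    for (sample_id, kmer), n in counts.items():
        profiles[sample_id][kmer] = n
    return dict(profiles)


def jaccard(profile_a, profile_b):
    """Compare two samples by the fraction of distinct k-mers they share."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: two sampling stations with a few short reads each.
samples = {
    "station_A": ["ACGTAC", "GTACGT"],
    "station_B": ["ACGTTT"],
}
profiles = kmer_profiles(samples, k=3)
similarity = jaccard(profiles["station_A"], profiles["station_B"])
```

In a real Hadoop deployment the map and reduce functions would run in parallel across many nodes and the shuffle would be handled by the framework; the structure of the computation is the same.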
This award by the Advanced Cyberinfrastructure Division is jointly supported by the NSF Directorate for Biological Sciences (Division of Biological Infrastructure) and the NSF Directorate for Geosciences.