UA Data Lake

Exciting times! We are collaborating with the University of Arizona high-performance computing team to create the first-ever university-wide data lake. What is a data lake, you ask? In short, a data lake is a large pool of data stored in a Hadoop “big data” architecture that allows researchers to query and compute on enormous data sets. We are creating this amazing new data reservoir by making high-impact, large-scale datasets, such as Twitter and the human and earth microbiome projects, available in a linearly scaling Hadoop architecture. Researchers can then “buy in” to add data nodes that store their own “big data” sets, which can be used in conjunction with persistent university data resources. This data lake will allow researchers university-wide to query relevant “big data” resources and pair these data with metadata tags and their own data sets to answer ever-evolving data questions. Today at UA, researchers mine large-scale datasets from diverse areas, with research ranging from how to staff emergency rooms based on Twitter data to understanding the role of microbes in biogeochemical cycling in the world’s oceans.

iMicrobe Data Commons is up!

The iMicrobe Data Commons (funded by the Gordon and Betty Moore Foundation) is now available as an interactive website (see imicrobe.us and data.imicrobe.us). The project was initially funded to make the CAMERA microbial datasets available through an interactive data commons (data.imicrobe.us) and in the iPlant Data Store for use in the iPlant cyberinfrastructure (iplantcollaborative.org). Two months in, we have met those goals! We are now working hard on developing a queryable ontology for the data sets and making these data easily discoverable within the iPlant cyberinfrastructure. HOWTO docs are under development at wiki.imicrobe.us, and our first workshop is coming up at the Association for the Sciences of Limnology and Oceanography (ASLO) meeting in Granada, Spain. We hope to see you there (http://imicrobe.us/education).

Help Create a Federated Cyberinfrastructure for Ocean and Geobiology Environmental ‘omics

Aloha Oceanography and Geobiology science communities! Has your environmental research ‘dabbled’ in the realm of ‘omics? Are you interested in working with a broader community to plan the way forward and build widely available facilities and cyberinfrastructures that will enable ‘omics-based databases, analyses, and research for the whole community? Then please read on! We invite you to join the EarthCube Oceanography and Geobiology Environmental ‘Omics (ECOGEO) Research Coordination Network (RCN)! ECOGEO is a recently NSF-funded RCN led by Dr. Ed DeLong (MIT/UH Manoa). ECOGEO’s mission is to identify community needs and develop the plans necessary to create a federated cyberinfrastructure that enables ocean and geobiology environmental ‘omics. The website has links on how to join EarthCube and our RCN and how to sign up for our listserv. In addition to the RCN site, we are conducting a brief research survey aimed at identifying community needs with respect to ‘omics research. Please take 5-15 minutes to participate in the survey, as this will help create the foundation of our RCN’s mission. Thanks for your time! We look forward to working with you to create a new, community-supported way to do ‘omics research. If you have any questions, please contact Elisha Wood-Charlson, our communications project manager (email: ecogeo.rcn@gmail.com).

Lucky 13

I just became the 13th member of the steering committee for the EarthCube Oceanography and Geobiology Environmental ‘Omics (ECOGEO) Research Coordination Network (RCN). This project was born out of an EarthCube workshop in August 2013 called “Ocean ‘omics and technology cyberinfrastructure: current challenges and future requirements”. Through this RCN, we hope to define the infrastructure needs for the next generation of cyber-scientists in ‘omics-based environmental sciences.

ASM 2015 iMicrobe workshop

Bonnie Hurwitz will be presenting a full-day workshop on the iMicrobe project (coming soon) at the 2015 meeting of the American Society for Microbiology in New Orleans, LA. When registration opens on November 20 (Nov. 13 for Premium Members), be sure to reserve your seat at the workshop, to be held at the New Orleans Ernest N. Morial Convention Center on May 30 (8:30am-4:30pm). Come learn about our cyberinfrastructure project to support research in microbial ecology.

Surgeons have true grit

I was invited to speak at the Diabetic Limb Salvage Conference this month, and have come to the conclusion that surgeons have true grit in the battle against infection. It takes amazing strength of character to treat patients with severe wounds, both in delivering difficult news to patients about treatment options and the potential to lose limbs, and in working in the OR to physically remove infected tissue before it spreads. One of the surgeons I spoke to told me that he knows a fellow will be successful if he/she can make quick decisions on the fly and walk into unknown situations in the OR confidently. I “experienced” this myself in a live demo, watching surgeons at a remote OR at George Washington University make quick pivots and choices as they encountered unexpected damage when reconstructing a person’s foot. It was truly eye-opening to consider wounds in three dimensions, the choices involved in sampling, and how microbial communities organize in space and time. I was perhaps one of the only computational biologists or microbiologists in a room full of 1,000 clinicians and nurses. My conclusion: computational biologists need to interface with the real world of patient care on occasion to create better algorithms and sample prep considerations that can lead to antibiotic stewardship and directed care for infection.

Exploring Hadoop with MapR

Recently, Bonnie and I had a chance to meet with our colleague Dr. Allen Day. The three of us worked together many years ago at Cold Spring Harbor Lab under Dr. Lincoln Stein. Allen now works for MapR, helping organizations understand computing with Hadoop, an area we are actively developing. Traditional high-performance computing with clusters has been, and will long continue to be, very important to our work, but Hadoop promises major advances in how we bring together our data and our computing.

Programming for Biology

Last week I had the pleasure of acting as a teaching assistant for the Programming for Biology course at Cold Spring Harbor Laboratory. As I worked for 13 years at CSHL before joining UA, it was nice to get back to the campus and see old friends. It was either my third or fourth time participating in the course; I’ve lost count over the years. My first boss at CSHL, Lincoln Stein, created the course, and it’s now run by Simon Prochnik and Sofia Robb. I was honored that they asked me to help out again, and, as usual, I was highly impressed by the course organization and the high caliber of the students. Bonnie and I have aspirations to create a similar course at UA.