UA Data Lake

Exciting times! We are collaborating with the University of Arizona high-performance computing team to create the first-ever university-wide data lake. What is a data lake, you ask? In short, a data lake is a large pool of data stored in a Hadoop “big data” architecture that allows researchers to query and compute on enormous data sets. We are creating this new data reservoir by making high-impact, large-scale datasets, such as Twitter and the Human and Earth Microbiome Projects, available in a linearly scaling Hadoop architecture. Researchers can then “buy in” to add data nodes that store their own “big data” sets, which can be used in conjunction with persistent university data resources. This data lake will allow researchers university-wide to query relevant “big data” resources and pair these data with metadata tags and their own data sets to answer ever-evolving data questions. Today at UA, researchers mine large-scale datasets from diverse areas, with research ranging from how to staff emergency rooms based on Twitter data to understanding the role of microbes in biogeochemical cycling in the world’s oceans.