Project Description

Much of the research in the geosciences, such as projecting future changes in the environment and improving weather and flood forecasting, is conducted using computational models that simulate the Earth’s atmosphere, oceans, and land surfaces. These geoscience models are part of the full research workflow that leads to scientific discovery. There is strong agreement across the sciences that reproducible workflows are needed. Open and reproducible workflows not only strengthen public confidence in the sciences, but also result in more efficient community science, leading to faster time to science. However, recent efforts to standardize data sharing and archiving guidelines within research institutions, professional societies, and academic publishers make clear that the scientific community does not know what to do about data produced as output from the computational models. To date, the rule for reproducibility is to “save all the data”, but model data can be prohibitively large, particularly in a field like atmospheric science. The massive size of the model outputs, as well as the large computational cost to produce these outputs, makes this not only a problem of reproducibility, but also a “big data” problem. To achieve open and reproducible workflows in geoscience modeling research, this project will bring together modelers representing diverse research areas and application types, and representing modeling efforts from large to small.

Discussion across different modeling communities suggests that the answer to “what to do about model data” will look different depending on model descriptors. Examples of important model descriptors include reproducibility, storage vs. computational costs, and value to the community. Since the atmospheric model community is incredibly diverse, this project will organize community workshops to tackle the problem. These workshops will involve representatives from across the geoscience modeling spectrum, including both operations and research, and ranging across complexity and size. The ultimate goal of these workshops is to provide model data best practices to the community, including scientific journal publishers and funding agencies. To achieve this goal, this team of researchers proposes crafting rubrics based on the model descriptors that will help researchers and centers describe their model data in consistent terms, so that sound decisions can be made about archiving and retention.

Workshop Information

Additional details will be provided as they become available.

If you are interested in participating in one or both workshops, please provide the following information to Doug Schuster, schuster@ucar.edu:

Steering Committee

Adam Clark, NOAA/University of Oklahoma

Dr. Clark is a Research Meteorologist at NOAA and an Affiliate Associate Professor in the School of Meteorology at the University of Oklahoma. He is one of the lead planners and facilitators for the annual NOAA/Hazardous Weather Testbed Spring Forecasting Experiments, and his primary research interests are developing tools and improving forecast models used for predicting severe weather.

Gilbert Compo, Cooperative Institute for Research in Environmental Sciences (CIRES) at the University of Colorado Boulder

Dr. Compo is a Senior Research Scientist at CIRES who co-leads the ongoing NOAA 20th Century Reanalysis Project. Through his research, he has vast experience generating and working with historical ocean and atmospheric reanalysis products produced by numerical models.

Laura Condon, University of Arizona, Hydrology and Atmospheric Sciences

Dr. Condon researches water sustainability and the dynamics of hydrologic systems in the context of past development and future climate change. Her work combines physically based numerical modeling with statistical techniques to evaluate complex system behavior. She is a member of the ParFlow development team and teaches integrated modeling courses.

Gokhan Danabasoglu, NCAR, Climate and Global Dynamics Laboratory

Dr. Danabasoglu brings extensive experience as a leader, developer, and Chief Scientist for the Community Earth System Model (CESM) – an internationally recognized, data-intensive, community-developed, state-of-the-art global climate model.

Josh Hacker, Jupiter

Dr. Hacker is a co-founder of Jupiter Intelligence, Inc., and leads the Earth and Ocean Systems team at Jupiter. His research has spanned numerical weather prediction, predictability and data assimilation. His computing experience and interests span traditional HPC to public cloud platforms.

Michael A. Friedman, American Meteorological Society (AMS)

Dr. Friedman is the Senior Manager for Publishing Operations within the American Meteorological Society (AMS). He also serves as the AMS staff liaison to the AMS Board on Data Stewardship, and is involved with external initiatives to develop common approaches and standards for data citation, storage, and stewardship among publishers, funders, and repositories in the earth and space sciences.

Jared Marquis, University of North Dakota, Atmospheric Sciences

Mr. Marquis is a PhD student in the Department of Atmospheric Science at the University of North Dakota under Dr. Jianglong Zhang. His research experience includes estimating the impact of cirrus cloud contamination in infrared sea surface temperature retrieval using radiative transfer simulations, dust and cirrus biases in hyperspectral radiance assimilation, and various modifications of radiative transfer code microphysics and solar models. Other research interests include remote sensing, seasonal range forecasts, and mesoscale modeling.

Matthew Mayernik, NCAR, Library, co-PI

Dr. Mayernik leads NCAR initiatives on data citation, management, and policy. He is also a member of the AMS Board on Data Stewardship, and led an AMS initiative to write a “Data Archiving and Citation” recommendation that was instituted in 2015.

Gretchen Mullendore, University of North Dakota, Atmospheric Sciences, co-PI

Dr. Mullendore has used a range of mesoscale models throughout her research career (e.g., UW-COMMAS, WRF). She has used models for many different outcomes, ranging from idealized sensitivity studies to real-time operational forecasting. Dr. Mullendore teaches graduate-level classes in NWP and is a member of the Unidata Strategic Advisory Committee.

Elisa Murillo, University of Oklahoma, School of Meteorology

Ms. Murillo is a PhD student in the School of Meteorology at the University of Oklahoma working with Dr. Cameron Homeyer. Her research interests focus on tropopause-penetrating convection, and her modeling experience spans a variety of science objectives, such as idealized sensitivity studies with mesoscale models and global ensemble models (e.g., CESM).

Douglas Schuster, NCAR, Computational and Information Systems Laboratory, co-PI

Mr. Schuster is the manager for the Data Engineering and Curation Section at NCAR, which manages the NCAR Research Data Archive (RDA). He is also the current chair of the AMS Board on Data Stewardship.

Gary Strand, NCAR, Climate and Global Dynamics Laboratory

Mr. Strand is the CESM Data Manager. He brings considerable experience as a software engineer specializing in climate model data management (archiving, access, metadata, provenance, etc.), data analysis and visualization, and the many issues involved with terascale and petascale data.

Project Award Information

NSF Award 1929773

NSF Award 1929757