Complications in Climate Data Classification: The Political and Cultural Production of Variable Names


  • Nicholas M Weber University of Illinois at Urbana-Champaign
  • Andrea K Thomer University of Illinois at Urbana-Champaign
  • Gary Strand Climate Change Prediction Group, Climate and Global Dynamics division, National Center for Atmospheric Research



Model intercomparison projects are a unique and highly specialized form of data-intensive collaboration in the earth sciences. Typically, a set of pre-determined boundary conditions (scenarios) is agreed upon by a community of model developers, who then test and simulate each of those scenarios with individual 'runs' of a climate model. Because both the human expertise and the computational power needed to produce an intercomparison project are exceptionally expensive, the data they produce are often archived for the broader climate science community to use in future research. Outside of high energy physics and astronomy sky surveys, climate modeling intercomparisons are among the largest and most rapid methods of producing data in the natural sciences (Overpeck et al., 2010).

But, like any collaborative eScience project, the discovery and broad accessibility of these data depend on classifications and categorizations in the form of structured metadata, namely the Climate and Forecast (CF) metadata standard, which provides a controlled vocabulary to normalize the naming of a dataset's variables. Intriguingly, the CF standard's original publication notes, "…conventions have been developed only for things we know we need. Instead of trying to foresee the future, we have added features as required and will continue to do this" (Gregory, 2003). Yet we have observed, qualitatively, that this incremental approach does not keep pace: although the time period of intercomparison projects remains stable (2-3 years), the scale and complexity of models and their output continue to grow, and thus data creation and variable names consistently outpace the ratification of CF.
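To make the normalization role of the CF vocabulary concrete, the following is a minimal sketch in Python. The two CF standard names shown ("air_temperature", "precipitation_flux") are real entries in the CF standard name table, but the model-internal names ("TS", "PRECT", "SNOWLIQ") and the mapping itself are hypothetical, chosen only to illustrate how a model variable either resolves to a ratified CF name or falls into the gap the paper describes.

```python
# A small illustrative subset of the CF standard name table, mapping
# standard names to their canonical units. The real controlled
# vocabulary contains thousands of ratified entries.
CF_STANDARD_NAMES = {
    "air_temperature": "K",
    "precipitation_flux": "kg m-2 s-1",
    "sea_surface_temperature": "K",
}

# Hypothetical mapping from one model's internal variable names to CF
# standard names (an assumption for illustration, not any real model's table).
MODEL_TO_CF = {
    "TS": "air_temperature",
    "PRECT": "precipitation_flux",
}

def normalize(var_name):
    """Return the CF standard name for a model variable, or None when the
    variable has no ratified CF equivalent yet -- the lag between data
    creation and CF ratification discussed above."""
    cf_name = MODEL_TO_CF.get(var_name)
    if cf_name in CF_STANDARD_NAMES:
        return cf_name
    return None

print(normalize("TS"))       # -> air_temperature
print(normalize("SNOWLIQ"))  # -> None (no ratified CF name in this sketch)
```

In practice this lookup happens via each variable's `standard_name` attribute in a netCDF file; the point of the sketch is simply that discoverability depends on the controlled vocabulary containing an entry for every variable a model emits.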