Pythonic GRIB

If you work with meteorological data, at some point or another you will encounter the infamous GRIB. Once an acronym for “GRIdded Binary”, GRIB now stands for “General Regularly-distributed Information in Binary” form. The format is used for storing just about any weather or climate model outputs and is defined by the World Meteorological Organization (WMO).

I get a lot of questions about how to work with GRIBs. For a long time this was a pain. GDAL has supported opening GRIBs for some time now, though, and there are also great ways of working with GRIBs directly, including the GRIB API from ECMWF. In this post I’ll show you an extension of the GRIB API that provides a more comfortable interface to it from Python. This extension will eventually be incorporated into the GRIB API, but if you can’t wait for it you can also install it and use it on your own.

From a GIS perspective, GRIBs are basically a stack of self-contained layers. Each layer is called a message and is completely independent of all other layers. That means that in a single “GRIB file” you might have thousands or more individual GRIB messages, each of which could be from a different model or a different type of forecast, might contain a different variable, and might be encoded using a different coordinate system and/or compression method. The WMO standardizes this pretty well, meaning that the format is well-defined, but there is no reference implementation. That’s one of the reasons there are all kinds of different packages that can read and write GRIB, all of which have different strengths and weaknesses. If you’re interested in learning more about GRIB, you can always check the WMO site, but honestly, the best description of GRIB I’ve ever found was on Wikipedia, so I would start there.
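To make the "stack of independent messages" idea concrete, here is a minimal pure-Python sketch that walks a byte string and reports where each message starts. It relies only on the indicator section defined by the WMO: every message begins with the ASCII string "GRIB", carries its edition number in octet 8 and its total length in octets 5-7 (edition 1) or 9-16 (edition 2), and ends with "7777". The function name iter_grib_messages is my own invention, and the sketch ignores corner cases such as oversized edition 1 messages, so treat it as an illustration rather than a replacement for a real decoder.

```python
def iter_grib_messages(data):
    """Yield (offset, edition, length) for each GRIB message found in data.

    Hypothetical helper for illustration; a real decoder handles many
    more cases.
    """
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        if start == -1 or start + 8 > len(data):
            return
        edition = data[start + 7]
        if edition == 1:
            # Edition 1: total length is a 3-byte big-endian integer (octets 5-7).
            length = int.from_bytes(data[start + 4:start + 7], "big")
        elif edition == 2:
            # Edition 2: total length is an 8-byte big-endian integer (octets 9-16).
            if start + 16 > len(data):
                return
            length = int.from_bytes(data[start + 8:start + 16], "big")
        else:
            # "GRIB" appeared by chance, or the edition is unsupported; keep looking.
            pos = start + 4
            continue
        end = start + length
        # A plausible message fits in the data and ends with the marker "7777".
        if length >= 12 and end <= len(data) and data[end - 4:end] == b"7777":
            yield start, edition, length
            pos = end
        else:
            pos = start + 4
```

Because each message states its own length, a reader can hop from one message to the next without decoding anything in between, which is why wildly different messages can share a file.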

Let’s get to actually reading and writing GRIB. GDAL is your best bet for any general raster data, and it does have a GRIB driver, but it’s based on NOAA’s degrib library. degrib is a fine piece of software, but development on degrib has stopped – it continues to be maintained, but it’s not a high priority.

My best experience is with the ECMWF’s GRIB API. It comes with interfaces for C, Fortran and Python and also implements a number of general tools for peeking into GRIBs and examining their contents. The library is under active development and has grown by leaps and bounds over the past couple of years. ECMWF is planning to discontinue the GRIB API, but it will move the code for the GRIB API into ecCodes, which will continue to provide the same interface and be further developed for the foreseeable future. Since GRIB API is still current, and since code for the GRIB API is compatible with ecCodes, GRIB API is a library that you can use sustainably. It’s used in production code by a number of weather services to en- and decode their weather data in their operational numerical weather prediction, as well as in many other operational and research applications.

So why do people still ask how to work with GRIBs? Especially in Python, GRIBs are not the most comfortable files to work with. Using the GRIB API, you have full access to the underlying C code for accessing GRIBs. This is nice, but ECMWF provides that only by exposing the functions – for the most part unmodified – to Python. This makes for a lot of bookkeeping. For example, each message has its own handle that you have to manage on your own. It’s an integer that the GRIB API uses internally to access the message in question. So if you want to access each message in a GRIB file, it looks like this (example inspired by ECMWF’s examples):

from gribapi import (grib_count_in_file,
                     grib_new_from_file,
                     grib_get,
                     grib_release)

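# GRIB_FILE is the path to your GRIB file; GRIB is binary, so open it in "rb" mode
with open(GRIB_FILE, "rb") as gribfile:
    n_msg = grib_count_in_file(gribfile)
    # Request one handle per message in the file
    msg_ids = [grib_new_from_file(gribfile) for _ in range(n_msg)]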

This opens the file GRIB_FILE, counts the number of messages, and stores a gid – an ID for each message – by requesting a handle for each message in the file. But it doesn’t stop there. Now you still have to use the handles. And remember to release them when you’re finished. You could now get the shortName of the variable stored in each message like this:

for i in range(n_msg):
    msg_id = msg_ids[i]
    print("Shortname of message {}: {}".format(
        i + 1, grib_get(msg_id, "shortName")))
    # Release the handle once you are done with the message
    grib_release(msg_id)

If you hadn’t released the message at the end, the handle would still be open and running around in your RAM. You could use this technique to access any of the values stored in the GRIB, regardless of whether they’re actual data or metadata.
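One way to make sure a handle is always released, even when an exception interrupts your loop, is to wrap the request/release pair in a context manager. The following is only a sketch: grib_message is a name I made up, and the two library functions are passed in as arguments so that the snippet does not depend on a particular installation. With the GRIB API you would pass grib_new_from_file and grib_release.

```python
from contextlib import contextmanager


@contextmanager
def grib_message(gribfile, new_from_file, release):
    """Yield the next message handle from gribfile and always release it.

    Hypothetical helper: new_from_file and release are injected callables,
    for example grib_new_from_file and grib_release from the GRIB API.
    """
    msg_id = new_from_file(gribfile)
    try:
        yield msg_id
    finally:
        # Runs even if the body raises, so the handle cannot leak.
        # A handle of None means no message was available, so there is
        # nothing to release.
        if msg_id is not None:
            release(msg_id)


# Possible usage with the GRIB API (assumes gribapi is installed):
#
# with open(GRIB_FILE, "rb") as gribfile:
#     with grib_message(gribfile, grib_new_from_file, grib_release) as msg_id:
#         print(grib_get(msg_id, "shortName"))
```

This removes one piece of bookkeeping, but you still have to count messages and juggle integer handles yourself.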

That’s not very Pythonic – you first have to open a file, count the GRIBs, request handles for all of the messages stored in the file, then iterate over the individual handles, having your way with them and closing them manually when you’re finished. That’s why I wrote an extension that takes care of the bookkeeping for you: PythonicGRIB.

PythonicGRIB lets you have the full functionality of the GRIB API, but it makes working with GRIBs a lot easier. GRIB files track their own message handles, for example, and you can use them in a context manager. So if you want to do the same thing as the example above – open a file, iterate over its messages while reporting their shortNames, and then close the file and release the message handles – it looks like this:

from pyth_grib import GribFile, GribMessage

with GribFile(filename) as grib:
    print("Number of msgs in file: {}".format(len(grib)))
    for i in range(len(grib)):
        msg = GribMessage(grib)
        print(msg["shortName"])

Where’d all the bookkeeping go? And look what you can do with the messages, assuming you have one open:

print("Msg size in bytes: {}".format(msg.size))
print("Keys in msg:")
for key in msg.keys():
    print(key)
# Check if value is missing
msg.missing("scaleFactorOfSecondFixedSurface")
# Set a key
msg["scaleFactorOfSecondFixedSurface"] = 5
# Set a key to missing
msg.set_missing("scaleFactorOfSecondFixedSurface")
# Set array values
msg["values"] = [1, 2, 3]
# Write to file (opened in binary mode, since GRIB is a binary format)
with open(TEST_FILE, "wb") as test:
    msg.write(test)

That’s all there is to it.
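If you are curious how dictionary-style access such as msg["shortName"] can sit on top of handle-based functions, here is a minimal sketch of the general technique. It is not PythonicGRIB's actual implementation: the class name HandleMessage and its constructor arguments are hypothetical, and the getter, setter and release functions are injected so that the example runs without any GRIB library.

```python
class HandleMessage:
    """Hypothetical wrapper giving dict-style access to a handle-based API."""

    def __init__(self, handle, get, set_, release):
        self._handle = handle
        self._get = get          # e.g. a function like grib_get(handle, key)
        self._set = set_         # e.g. a function like grib_set(handle, key, value)
        self._release = release  # e.g. a function like grib_release(handle)

    def __getitem__(self, key):
        # msg[key] forwards to the injected getter with the stored handle.
        return self._get(self._handle, key)

    def __setitem__(self, key, value):
        # msg[key] = value forwards to the injected setter.
        self._set(self._handle, key, value)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the handle when the with-block ends, even on errors.
        self._release(self._handle)
        return False  # do not swallow exceptions
```

The point of the design is that the handle never leaves the object, so calling code cannot forget to pass it along or release it.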

Want to work on it? Fork it on GitHub. Want to use it? Install it from PyPI with:

pip install PythonicGRIB

About

My name’s Daniel Lee. I’m an enthusiast for open source and sharing. I grew up in the United States and did my doctorate in Germany. I've founded a company for planning solar power. I've worked on analog space suit interfaces, drones and a bunch of other things in my free time. I'm also involved in standards work for meteorological data. Now I work for the German Weather Service on improving forecasts for weather and renewable power production.
