Providing data access over HTTP using a ReST API in Django

At Full Stack Embedded we’re continuing our work on the weather server so that we can serve the observations we collect via a ReST API. Read on to see the project’s progress, or if you’ve ever asked yourself how to make a ReST API and/or serve raw data from a database you access with Django.

Using Django as a server application lets us persist objects in a database and interact with them inside of a program as if they were normal software objects at runtime. This is done using an object-relational mapper (ORM), which connects the representation of the objects in the program with the representation of the objects in the database. This has all kinds of advantages. If you want to know why many projects choose to take this path and how it’s done, I suggest the excellent Django documentation, which will tell you all you need to know about it.

Being able to access the objects in your database and manipulate them as if they’re a normal part of your program is only half the battle, though. You also need to be able to insert new ones into the database, as well as access those that are saved, manipulate them, etc. In this post I’ll show you how to access data that’s already in the database used by Django, transform it into other formats, and serve it to users over normal HTTP.

The data model

Django loosely follows the model-view-controller (MVC) pattern for all user interactions. That means that you define a model of your system, views for examining it, and a controller for manipulating it. (Django’s own documentation speaks of models, templates and views, but the idea is the same.) Today we’re talking about the model and view parts. This section is about the model of the data we interact with on the weather server.

Our basic premise when recording weather observations is that weather is observed at a station, which never moves, and each observation is associated with a single station. This is a classic 1:n relationship, where one station can be associated with any number of observations. We express this in the data model as follows in the models.py file of the Django project (this is, of course, a frozen copy; if you want the most current version, check it out on GitHub):

from django.db import models


class Station(models.Model):
    """Metadata for the observing station."""
    #: Unique station identifier
    station_id = models.IntegerField()
    #: Station's longitude in WGS84
    longitude = models.DecimalField(max_digits=7, decimal_places=4)
    #: Station's latitude in WGS84
    latitude = models.DecimalField(max_digits=6, decimal_places=4)
    #: Station's elevation over mean sea level in WGS84
    elevation = models.FloatField()
    #: Station's informal name
    name = models.CharField(max_length=80)
    #: Date of station activation.
    activated = models.DateTimeField('Station activated')
    #: Station's deactivation date. A reactivated station is a new station.
    deactivated = models.DateTimeField('Station deactivated',
                                       blank=True,
                                       null=True)
    description = models.CharField(max_length=200)

    def __str__(self):
        return self.name


class Observation(models.Model):
    """
    Weather observation.
    Observations are always in SI units.
    """
    obs_date = models.DateTimeField('observation date')
    #: Observing station
    station = models.ForeignKey(Station, on_delete=models.CASCADE)
    temperature = models.DecimalField(max_digits=5, decimal_places=2)
    #: In % (max_digits=3 with one decimal place caps the value at 99.9)
    relative_humidity = models.DecimalField(max_digits=3, decimal_places=1)
    #: In mm
    precipitation = models.IntegerField()
    #: In m/s
    wind_speed = models.DecimalField(max_digits=5, decimal_places=2)
    #: In degrees clockwise from cartographic north
    wind_direction = models.IntegerField()
    #: In hPa
    pressure = models.IntegerField()

As you can see, a Station just contains metadata like a station’s name, its location, a description, etc., whereas an Observation has a foreign key connecting it to a Station, as well as observations for various meteorological variables associated with a timestamp.

Once this is in place and you’ve got your Django project migrated so that the appropriate fields are in the database, you’re ready to set up ways for users to view the data.

The views

Views are ways of examining parts of your program – in this case, we’re defining views to just access the data, not to further interact with it in any way.

Our views are accessed using ReST (Representational State Transfer), which basically means that you access parts of the system by specifying what you want as a well-formed URL. Your entire request is encoded in the URL. The server parses that URL and formulates a response accordingly. This doesn’t require any knowledge outside of the request: the server doesn’t have to know where you’re from, what you’re interested in or what you’ve clicked beforehand. Thus, all requests are stateless, and statelessness is awesome because it helps us make robust systems that scale well.

URL patterns

Now that we’ve defined a proper model for the two objects in our program, stations and observations, we want to be able to access them as CSV files. CSV isn’t the prettiest way to exchange data on the Internet, but pretty much anybody can read it and it’s simple, which is why we implemented it first.

I’m writing this at a time when the server hasn’t reached 1.0 yet, so I can’t guarantee the API won’t be changed. For the current API, see the current implementation on GitHub or the server’s current documentation.

Django matches the requested URL against regular expressions to figure out which function should handle it. The entry point for accessing Django’s functions via HTTP is in urls.py. Right now, ours looks like this:

from django.conf.urls import include, url
from django.contrib import admin
from show_weather.views import index, csv_observation_request, csv_stations, png_observation_request

data_patterns = [
    url(r'^(?P<station>[0-9]+)/'                                         # Station ID
        r'(?P<start>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2})'    # Start date
        r' - '
        r'(?P<end>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2})/', include([  # End date
        url(r'csv$', csv_observation_request),
        url(r'png$', png_observation_request),])),
    url(r'^csv/stations/$', csv_stations),
]

urlpatterns = [
    url(r'^admin/', include(admin.site.urls)),
    url(r'^$', index),
    url(r'^data/', include(data_patterns))
]

The imports at the top are just to make the thing runnable. Line 3 imports the view functions that requests are mapped to. These can be arbitrary functions, which in this case are defined elsewhere in the project.

The action happens in urlpatterns at the bottom of the file. This is where individual URLs are mapped to functions. It works like this: urlpatterns is a list(-like) containing instances of url, produced by passing in a regular expression and a function that is called if that expression is matched. The list is processed sequentially, so if you have special cases you can put them up front and divert the program’s flow to the function you need. You can also pass a URL down the line for further processing to another pattern list of the same design. If you do this, Django chops off the part of the URL that has matched so far, so the handler down the line only receives the remainder. You’ll see this in action in a second.

The simplest example is the entry for index in urlpatterns. If the URL is empty at the point in the program where the app receives the URL, it matches the regex ^$: there’s nothing between the beginning and the end. In this case, the request is passed to the function index, which we wrote ourselves and import from show_weather.views.

The urlpattern for admin pages is similar. If the URL begins with the string admin/, that part of the URL is stripped off. The remaining components are passed to the handlers in admin.site.urls, which comes from the admin module imported in line 2.

You can see how that works in a bit more detail in the last entry of urlpatterns. There, the data/ prefix is stripped from all addresses that begin with it, and the rest is passed down to the handlers in data_patterns. This is defined on lines 5-13, where two things happen.

If the request begins with a number, followed by a date range, it’s passed down to a further handler which checks whether a csv or a png is being requested. The request is then handed down to the appropriate function accordingly.

If that’s not the case, then the request needs to be csv/stations/, in which case it’s handed to the function csv_stations. Otherwise the request is unparseable and the user gets a 404 error.
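To make the routing less abstract, here is a minimal sketch of the same matching logic using nothing but Python’s re module. It is not how Django is implemented internally: resolve() is a hypothetical helper, and the view names are returned as plain strings instead of calling the real functions. Only the regular expressions are copied from the urls.py above.

```python
import re

# Same regular expressions as in urls.py, used here without Django.
PREFIX = re.compile(r'^data/')
OBSERVATIONS = re.compile(
    r'^(?P<station>[0-9]+)/'                                       # Station ID
    r'(?P<start>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2})'     # Start date
    r' - '
    r'(?P<end>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2})/'      # End date
)
FORMATS = [
    (re.compile(r'csv$'), 'csv_observation_request'),
    (re.compile(r'png$'), 'png_observation_request'),
]
STATIONS = re.compile(r'^csv/stations/$')


def resolve(path):
    """Return (view name, captured parameters), or None for a 404.

    Hypothetical helper: it only imitates trying the patterns in order and
    cutting off a matched prefix before the rest is passed down the line.
    """
    prefix = PREFIX.search(path)
    if prefix is None:
        return None
    rest = path[prefix.end():]        # "data/" has been stripped off
    match = OBSERVATIONS.search(rest)
    if match:
        tail = rest[match.end():]     # station and date range stripped off
        for pattern, view_name in FORMATS:
            if pattern.search(tail):
                return view_name, match.groupdict()
        return None
    if STATIONS.search(rest):
        return 'csv_stations', {}
    return None


print(resolve('data/csv/stations/'))
print(resolve('data/0/2014-08-12 00:00 - 2014-08-12 23:00/csv'))
print(resolve('data/nonsense'))
```

The named groups station, start and end are what end up as keyword arguments of the view functions further down. Note that they arrive as strings, not as numbers or dates.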

This API is a bit inconsistent and will definitely change soon, so if you’re setting up your own weather server using our code, be aware of that.

Producing CSV output

Now our server knows how to route requests, but it still needs to produce the data the user wants. These are the views in the MVC pattern.

A view in Django is a function that receives a request, as well as any named parameters pulled out of the URL (if you look at the section above you’ll see examples for csv_observation_request and png_observation_request). Those parameters are processed using arbitrary code. The response needs to be a django.http.HttpResponse.

Like all other snippets introduced in this post, this is guaranteed to change. You can find the file in its entirety, frozen at the state I’m describing, on GitHub.

Discovering available stations

Since all observations are connected to stations, and you don’t have the metadata for where an observation took place without knowing where the station is, we offer the ability to look up the available stations in the server’s database. This is done by issuing a request like this, in this example running locally on my development machine:

me@linux:~> curl localhost:8000/data/csv/stations/
name,id,longitude,latitude,elevation,activated,deactivated,description
Daniel's apartment,0,-0.1419,51.5014,62.0,2014-01-01 00:00:00+00:00,,Where the party's going down!

This tells us that there’s 1 available station on the server, with the ID 0. We know that its name is Daniel's apartment. We know from the coordinates that it is at Buckingham Palace, and we know from the description that it is where the party’s going down. We also know that it’s been producing observations since 2014-01-01 00:00 and that it hasn’t stopped since.

This is accomplished like this:

def csv_stations(request):
    """Return CSV of available stations."""
    response = StringIO()
    response.write("name,id,longitude,latitude,elevation,activated,deactivated,"
                   "description\r\n")
    csv_renderer = writer(response)
    csv_renderer.writerows(
        ((station.name, station.station_id, station.longitude,
          station.latitude, station.elevation, station.activated,
          station.deactivated, station.description))
        for station in Station.objects.all()
    )
    response.seek(0)
    return HttpResponse(response, content_type="text/csv")

Short and sweet, this function makes a StringIO buffer to work efficiently with all that text, writes a CSV header into it, and then renders the relevant information into the buffer using Python’s csv.writer. Note the sweet little generator expression from lines 8-11, which hands the rows to the writer one at a time instead of building a list of them first. The Station objects are queried from the database on line 11, in which we just ask for all instances of Station in the DB. Finally, the buffer is rewound to the beginning so that when it’s read the entire result is returned, and it’s written into the HttpResponse we deliver back to the user at the end.
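If you want to play with the buffer-and-writer pattern without a database, here is a minimal sketch that runs on the standard library alone. FakeStation and stations_to_csv are hypothetical stand-ins that I made up for illustration, and the header is shortened to three columns.

```python
from collections import namedtuple
from csv import writer
from io import StringIO

# Hypothetical stand-in for the Station model, so that no database is needed.
FakeStation = namedtuple('FakeStation', 'name station_id description')


def stations_to_csv(stations):
    """Render an iterable of station-like objects as CSV text."""
    buffer = StringIO()
    buffer.write("name,id,description\r\n")
    csv_renderer = writer(buffer)
    # Generator expression: rows are produced one at a time while writing.
    csv_renderer.writerows(
        (s.name, s.station_id, s.description) for s in stations
    )
    buffer.seek(0)  # rewind, otherwise read() would start at the end
    return buffer.read()


text = stations_to_csv([
    FakeStation("Daniel's apartment", 0, "Where the party's going down!"),
    FakeStation("Roof, north side", 1, "Windy"),
])
print(text)
```

The second station has a comma in its name on purpose: csv.writer puts quotes around that field, which is the main reason to use it instead of joining the values with commas by hand.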

Obtaining observations

Observations are obtained in a similar fashion:

me@linux:~> # This data is totally random!
me@linux:~> # Address manually escaped - a browser does this for you automagically
me@linux:~> curl "localhost:8000/data/0/2014-08-12%2000:00%20-%202014-08-12%2023:00/csv"
date,station_id,temperature,relative_humidity,precipitation,wind_speed,wind_direction,pressure
2014-08-12 00:00:00+00:00,0,300.78,99.0,7,14.07,143,1016
2014-08-12 01:00:00+00:00,0,297.77,37.0,2,2.97,212,1003
2014-08-12 02:00:00+00:00,0,300.82,48.0,3,2.93,297,1015
2014-08-12 03:00:00+00:00,0,296.84,98.0,8,13.50,103,980
2014-08-12 04:00:00+00:00,0,299.11,49.0,3,9.65,172,1005
2014-08-12 05:00:00+00:00,0,295.46,40.0,2,10.18,162,984
2014-08-12 06:00:00+00:00,0,303.19,50.0,5,14.00,193,988
2014-08-12 07:00:00+00:00,0,297.20,57.0,8,2.21,97,990
2014-08-12 08:00:00+00:00,0,298.58,90.0,7,27.07,352,997
2014-08-12 09:00:00+00:00,0,300.39,47.0,9,29.88,147,990
2014-08-12 10:00:00+00:00,0,297.67,64.0,4,5.34,221,1004
2014-08-12 11:00:00+00:00,0,295.85,63.0,5,0.07,137,981
2014-08-12 12:00:00+00:00,0,303.64,99.0,3,13.81,12,989
2014-08-12 13:00:00+00:00,0,295.67,46.0,4,7.62,185,984
2014-08-12 14:00:00+00:00,0,300.99,86.0,1,20.52,285,989
2014-08-12 15:00:00+00:00,0,302.13,31.0,7,21.78,164,1001
2014-08-12 16:00:00+00:00,0,298.20,60.0,9,20.99,257,1003
2014-08-12 17:00:00+00:00,0,302.71,87.0,4,16.92,328,1009
2014-08-12 18:00:00+00:00,0,295.22,91.0,8,14.64,299,992
2014-08-12 19:00:00+00:00,0,295.23,77.0,5,23.53,200,997
2014-08-12 20:00:00+00:00,0,297.60,82.0,8,22.48,341,985
2014-08-12 21:00:00+00:00,0,297.18,73.0,0,23.66,216,1014
2014-08-12 22:00:00+00:00,0,299.97,55.0,8,25.16,120,1012
2014-08-12 23:00:00+00:00,0,302.51,42.0,9,4.63,122,1018

This is the code behind that:

def csv_observation_request(request, station, start, end):
    """Return CSV of observations in requested time range."""
    response = StringIO()
    response.write("date,station_id,temperature,relative_humidity,"
                   "precipitation,wind_speed,wind_direction,pressure\r\n")
    csv_renderer = writer(response)
    csv_renderer.writerows(
        ((obs.obs_date, obs.station.station_id, obs.temperature,
          obs.relative_humidity,
          obs.precipitation, obs.wind_speed, obs.wind_direction, obs.pressure))
        for obs
        in Observation.objects.filter(
            station__station_id=station,
            obs_date__gte=start,
            obs_date__lte=end
        )
    )
    response.seek(0)
    return HttpResponse(response, content_type="text/csv")

The only essential difference here is that the Observations are filtered (lines 12-16) according to the parameters in the request.
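On the client side, the request parameters have to be packed into the URL first. The following is a rough sketch of how that could look with the standard library; observation_path and in_range are hypothetical helpers that are not part of the weather server, and the path layout is taken from the curl call above.

```python
from datetime import datetime
from urllib.parse import quote, unquote

TIME_FORMAT = "%Y-%m-%d %H:%M"


def observation_path(station, start, end, kind="csv"):
    """Build the percent-encoded request path for a station and time range."""
    date_range = "{} - {}".format(start.strftime(TIME_FORMAT),
                                  end.strftime(TIME_FORMAT))
    # quote() turns each space into %20; ":" is declared safe so it stays as-is
    return "/data/{}/{}/{}".format(station, quote(date_range, safe=":"), kind)


def in_range(timestamp, start, end):
    """Inclusive on both ends, like obs_date__gte=start and obs_date__lte=end."""
    return start <= timestamp <= end


start = datetime(2014, 8, 12, 0, 0)
end = datetime(2014, 8, 12, 23, 0)
path = observation_path(0, start, end)
print(path)
print(unquote(path))
```

Because both bounds are inclusive, an observation taken exactly at the end of the range is part of the result, which is why the example output above ends with the observation from 23:00.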

Taken together with csv_stations, this is extremely WET, but I’d like to leave something for our students in Togo to refactor, so for the moment it’s staying.

Producing visualized output

Of course you can serve more than just text with Django. We’ve also added the ability to render temperature observations as a PNG and serve that. We’ll be extending this in the near future to include more observation types, and the code will probably be modified to consume the CSVs produced in the other functions. For now, the result looks like this when I visit http://localhost:8000/data/0/2014-08-12 00:00 - 2014-08-13 00:00/png:

The rudimentary temperature plot the weather server produces


As you can see, it’s still pretty rudimentary, but the point is that it works right now. It’s produced by the following code:

def png_observation_request(request, station, start, end):
    """Return PNG of observations in requested time range."""
    obs = Observation.objects.filter(station__station_id=station,
                                     obs_date__gte=start,
                                     obs_date__lte=end)

    obsdata = [o.temperature for o in obs]

    fig = plt.figure()
    ax1 = fig.add_subplot(1, 1, 1)
    ax1.plot(obsdata)

    response = HttpResponse(content_type="image/png")
    fig.savefig(response, format="png")
    # Close the figure so pyplot doesn't keep it in memory after the request
    plt.close(fig)
    return response

Here we do the same familiar filtering trick on lines 3-5. The temperature is pulled out of the observations and inserted into a list on line 7, and then the data is plotted using matplotlib. Note that the HttpResponse needs to be instantiated with image/png as its Content-Type, otherwise the client won’t know that the body is an image. Since an HttpResponse can be treated as a file-like object, it’s no problem to just write the plot to it.

And that’s all there is to it! In the next post I’ll be detailing the progress we’ve made on the hardware, including the drivers and scheduler we’ve developed.

About

My name’s Daniel Lee. I’m an enthusiast for open source and sharing. I grew up in the United States and did my doctorate in Germany. I've founded a company for planning solar power. I've worked on analog space suit interfaces, drones and a bunch of other things in my free time. I'm also involved in standards work for meteorological data. I worked for a while at the German Weather Service on improving forecasts for weather and renewable power production. I later led the team for data ingest there before I started my current job, engineering software and data formats at EUMETSAT.
