Embedding Dynamic GitHub-stored maps in websites


Summary:

How to poll GitHub for changes to repository-hosted GeoJSON and refresh a web map when they happen, using http eTags and JSON-P while respecting GitHub's rate limit.


Introduction:

Last time we looked at how we can take GeoJSON data stored on GitHub and build it into a mapping system external to the GitHub website, using JSON-P to get the data. However, we only looked at doing this once. Given that we've got a website that could potentially show updates, it would be nice to refresh the map when new data comes in.

As we don't have any server-side software that could build a persistent relationship with a web browser client, we can't push new data to our visitors -- instead we need their browsers to invisibly check whether there have been any changes since the page loaded, and request the data again if there have.

The only real issue is that, to prevent their servers being overloaded with requests, GitHub asks that you only poll for new data a certain number of times an hour. We need to set up our system to respect this constraint.


Checking GitHub for changes:

Polling GitHub for changes relies on understanding a couple of the headers associated with Hypertext Transfer Protocol (http) requests and responses. In particular, GitHub rolls most of the needed information into the x-ratelimit-limit header and the eTag header of the protocol. As a web user, you never see these, but they can be attached to any web requests or responses associated with browsing.

The eTag header allows us to check whether any changes have been made to our GitHub repository without bothering the server too much. When you request something from GitHub, the server sends you an eTag value in the response. This is an encoded representation of the resource -- a hash. The server regenerates it whenever the resource changes. Rather than requesting all the data and checking whether it has changed ourselves, we can ask the server only to send us the full data if the eTag has changed. The http protocol includes headers to do this -- we just need some javascript to make the http request. If the server thinks our eTag is the current one, it will just return an http status of 304, meaning "Not Modified". We can therefore look out for other statuses and act on them. Here's the basic code to get our first eTag. If there has been a change we store the eTag (and could update the map). As this is the first time we get the eTag, there will definitely be a change from the default "00" eTag. Later we can use the proper eTag we get back to make more checks:

var eTag = "00";
var req = new XMLHttpRequest();
req.open('GET', "https://api.github.com/repos/MassAtLeeds/RouteFactor/events", false);
req.setRequestHeader("If-None-Match", eTag);
req.send(null);
if (req.status != "304") {
   eTag = req.getResponseHeader("ETag");
   // Here we know we've had a change so we could also update the map. }

Note that the URL is the URL for the 'events' info for our "RouteFactor" repository -- this contains information on any changes to the repository.

So, how often can we do this? GitHub sets the x-ratelimit-limit header of the http response to the maximum number of requests that should be made within an hour. We can get this header, thus:

var req = new XMLHttpRequest();
req.open('GET', "https://api.github.com/repos/MassAtLeeds/RouteFactor/events", false);
req.send(null);   // the request must complete before we can read the response headers
var rate = req.getResponseHeader("x-ratelimit-limit");

We can now use our rate to set up how often we check the site:

var myVar = setInterval(function(){ check() }, 3600000 / rate);   // 3600000 ms in an hour

However, each time we check the site, we should also check that the x-ratelimit-limit hasn't changed, and change the checking interval if it has:

function check () {

   var req = new XMLHttpRequest();
   req.open('GET', "https://api.github.com/repos/MassAtLeeds/RouteFactor/events", false);
   req.send(null);
   var newRate = req.getResponseHeader("x-ratelimit-limit");

   if (rate != newRate) {
      rate = newRate;   // 'rate' and 'myVar' are the globals set up above
      window.clearInterval(myVar);
      myVar = setInterval(function(){ check() }, 3600000 / rate);
   }
}

Finally, if we find a change, we want to download the GeoJSON data again. To do this, we need to dynamically inject a new SCRIPT into the webpage, the src of which is a JSON-P request. This is exactly what we did to get our starting data, but this time, rather than hard-wiring the SCRIPT tags, we're going to get javascript to write them into the webpage. As soon as this is done, the script will run, generating the JSON-P code which will, in turn, call back our method, as we saw last time. Once we've updated the map we can remove this SCRIPT, so the page doesn't fill with SCRIPTs if there are a lot of updates.

function addScript () {

   var script = document.createElement('SCRIPT');
   script.type = 'text/javascript';
   script.async = false;
   script.src = 'https://api.github.com/repos/MassAtLeeds/RouteFactor/contents/test-routes.geojson?callback=getData';
   // Insert the new SCRIPT before the first existing one; it runs as soon as it loads.
   var firstScript = document.getElementsByTagName('SCRIPT')[0];
   firstScript.parentNode.insertBefore(script, firstScript);

}

function removeScript () {

   // As we inserted our SCRIPT before all the others, it is now the first one in the page.
   var firstScript = document.getElementsByTagName('SCRIPT')[0];
   firstScript.parentNode.removeChild(firstScript);

}
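
For completeness, we also need the getData() callback the injected SCRIPT calls back to. Something like the sketch below would do (this is an assumption about its shape rather than the code from last time; GitHub's contents API wraps its JSON-P responses as an object with meta and data fields, with the file itself base64-encoded in data.content):

function getData (response) {

   // The file arrives base64-encoded with embedded newlines, so strip whitespace, then decode.
   var text = atob(response.data.content.replace(/\s/g, ''));
   var geojson = JSON.parse(text);
   addFeatures(geojson);   // redraw the map (addFeatures() is discussed below)
   removeScript();         // tidy away the SCRIPT we injected

}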

So, now we just need to bring this all together. First we'll make our http request. If the data has changed we'll inject the new SCRIPT, update our map, and then remove the new SCRIPT.
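
Sketched out, the polling function our interval calls might combine the pieces above like this (removeScript() ends up being called from getData() once the map has been updated):

function check () {

   var req = new XMLHttpRequest();
   req.open('GET', "https://api.github.com/repos/MassAtLeeds/RouteFactor/events", false);
   req.setRequestHeader("If-None-Match", eTag);
   req.send(null);

   // Adjust the polling interval if GitHub has changed the rate limit.
   var newRate = req.getResponseHeader("x-ratelimit-limit");
   if (rate != newRate) {
      rate = newRate;
      window.clearInterval(myVar);
      myVar = setInterval(function(){ check() }, 3600000 / rate);
   }

   // Anything other than 304 means the repository has changed since we last looked.
   if (req.status != 304) {
      eTag = req.getResponseHeader("ETag");
      addScript();   // getData() will update the map and remove the SCRIPT
   }
}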


You can see an example working here.

This adjusts things slightly to save on code repetition (for example, it separates off the http requests into a getHTTP() method so it can be called at the start to initialise the eTag etc.), but you should recognise most of it. The main map update code is in addFeatures(), which clears the map of layers, and then adds each new feature as a layer to an array of layers, before finally adding each to the map (sketched below). I've added them to the array in part to ensure the casting, but also because there are other Leaflet methods that work with arrays of layers. Note that the code that first sets up the map now also uses this method; however, we still need to hardwire in the first SCRIPT call, otherwise the data isn't available early enough to set up the map the first time.
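
As a rough sketch of that update in Leaflet (assuming a global map object; apart from addFeatures() itself, the names here are illustrative rather than lifted from the example code):

var layers = [];

function addFeatures (geojson) {

   // Clear the layers added by the previous update.
   for (var i = 0; i < layers.length; i++) {
      map.removeLayer(layers[i]);
   }
   layers = [];

   // Cast each feature to a Leaflet layer, keep it in the array, and add it to the map.
   for (var j = 0; j < geojson.features.length; j++) {
      var layer = L.geoJson(geojson.features[j]);
      layers.push(layer);
      layer.addTo(map);
   }
}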

The other thing to note is that, while we currently just look for status responses that are not 304, we could interrogate the events stream more deeply to discover what changes have been made and respond in a more nuanced way.
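
For example, something like this sketch (assuming the standard events API response: a JSON array of event objects, newest first, each with a type field):

// Parse the events response and react only to pushes, say.
var events = JSON.parse(req.responseText);
if (events.length > 0 && events[0].type == "PushEvent") {
   // The latest event is a commit push -- refresh the data.
}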

So, that's how to use GitHub for some low-level, client-only dynamic mapping. I say "low-level" as it would be much better to use a dedicated client-server setup for dynamic mapping, not least because of the throttled checking rate for GitHub (when we last tried it was 60 hits an hour, so changes can take up to a minute to appear). However, if you want something relatively simple, it's a good start.


Andy : Last edited 30 Apr 2014