Embedding GitHub-stored maps in websites


Summary:

  • GitHub can store and track geographical data.
  • With added functionality, you can use this facility to build a crowdsourced mapping site.
  • To embed the mapped data within another site, you need to access the data through the JSON-P methodology.
  • Next time we'll look at making the map dynamically update as users add data.

Introduction:

GitHub is a 'free to those whose stuff is free' online storage repository. GitHub can be used to store, and allow editing access to, geographical data in the GeoJSON format (there is also functionality for tracking changes). GeoJSON is a relatively compact data format consisting of coordinates and attribute information as text. If you visit GeoJSON data on GitHub, you are first presented with a map of the data. This is non-editable as a map -- though you can edit the raw GeoJSON data within the repository, provided you have editing rights. Nevertheless, as we don't all see the world as a set of green GeoJSON text falling from the sky, the good people at Mapbox have created a free GitHub add-in that presents GeoJSON data as an editable map, allowing visitors to add, delete, or edit data through a map interface.

As you might imagine, this presents a trivially simple method for setting up a crowdsourced / community map. You can see an example here by Robin.

What, however, if you want to embed the results of this, or some other GeoJSON file on GitHub within another site? You can embed the map created by GitHub directly. However, embedding a map doesn't give you a lot of flexibility in terms of setting up the map and interacting with it.


Getting GitHub Data into a map:

If we want to use our GitHub data elsewhere, say with a different mapping package, there is an issue. Most mapping packages you can embed within a website work with JavaScript, but browsers enforce a 'same-origin policy': JavaScript can't generally read resources from a site other than the one that sent the webpage it is running in. So if your site is on www.mappygoodness.com, you won't be able to access resources on github.com.

The solution is to utilise a methodology called JSON-P (or "JSONP"). While you can't access resources on other sites from within JavaScript, what you can do is embed JavaScript from multiple sites within your webpage. While this JavaScript can still only access resources from your site (because it's as if your site is making a copy of the JavaScript and sending it with the webpage), JSON-P offers a cunning way of using this difference between scripts and resources to access resources on the server the embedded script is from. JSON-P is a service some servers offer, which embeds JSON data within a script as code. You can then embed this code within your webpage, and access the data. Neat, huh?

Here's a simple example:

<SCRIPT type="application/javascript"
   src="https://www.someJSONserver.com/test-routes.geojson?jsonp=functionName">
</SCRIPT>

This tells the web browser to expect a SCRIPT at this point, but to go and find the code at the src location. Notice that the end of the address contains the code ?jsonp=functionName. A server that farms out JSON-P will recognise this, and send a SCRIPT like the following (formatted here for clarity):

functionName({
 "type": "Feature",
 "geometry": {
  "type": "Point",
  "coordinates": [15.6, 10.1]
 },
 "properties": {
  "name": "Innsmouth"
 }
});

This is then embedded within the SCRIPT tags above as if it were sent from our server. It is worth noting that many uses of JSON-P get a script to dynamically create the SCRIPT tags above and inject them into the webpage. However, lots of mapping packages demand that the data is available early in the webpage loading, so it is often simpler to manually add the tags near the top of the page rather than working out which elements are running when.
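For reference, the dynamic alternative mentioned above might look something like the following minimal sketch. The function name injectJsonp and its arguments are assumptions made for illustration, not part of any library; the document object is passed in so that the function can be tried outside a browser.

```javascript
// Hypothetical helper: appends a SCRIPT element whose src points at a JSON-P address.
// 'doc' is the page's document object; 'url' already includes the callback parameter.
function injectJsonp(doc, url) {
   var script = doc.createElement("script");
   script.type = "application/javascript";
   script.src = url;
   // Appending the element is what makes the browser fetch and run the remote script.
   doc.head.appendChild(script);
   return script;
}
```

In a webpage this would be called as injectJsonp(document, "https://www.someJSONserver.com/test-routes.geojson?jsonp=functionName"), after the called-back method has been defined.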

Notice that the data is embedded within a call to another piece of code "functionName", a method (a chunk of code to do a job), the name of which is determined by ?jsonp=functionName (as we've told the JSON-P code the method to call, this is known as a 'callback'). If we now write this method into our webpage, the JSON-P code, when it runs, will call this method (which is all it does) and pass it the data. Here's an example:

<SCRIPT type="text/javascript">
   function functionName(response) {
      // For the example above, response is the GeoJSON Feature itself.
      var data = response;
      console.log(data); // dumps data to the browser's web developer console.
   }
</SCRIPT>

If you're familiar with JavaScript, you'll see that the data is passed to the method as an ordinary object, so you can ask for, say, data.properties.name or data.geometry.coordinates. (GitHub wraps its data in a further object, so there the data is found at response.data.content, as used below.)

Now, GitHub offers a JSON-P service for any JSON data stored in a GitHub repository. So, theoretically, all we have to do is request the JSON-P code by embedding it within our webpage, and we'll have our data, whichever server we actually want to work from.

In practice, however, things aren't quite so simple, especially if we decide that instead of the above we'll use an all-in-one get-and-parse JSON-P processing library. GitHub has its own callback=functionName callback structure, which means that standard JSON-P libraries that rely on jsonp=functionName often don't work, and it also wraps the data and sends it in an encoded form rather than as plain JSON. We therefore need to do a little work to get this running.

Firstly, our call to GitHub needs to be slightly non-standard, thus:

<SCRIPT type="application/javascript"
   src="https://api.github.com/repos/MassAtLeeds/RouteFactor/contents/test-routes.geojson?callback=getData">
</SCRIPT>
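If you need to build that address in code (for instance, for several files), a minimal sketch follows. The function name gitHubJsonpUrl is an assumption for illustration; the address pattern is simply the one shown in the SCRIPT tag above.

```javascript
// Hypothetical helper: builds the GitHub contents address used in the SCRIPT tag above.
function gitHubJsonpUrl(owner, repo, path, callbackName) {
   // Encode each path segment separately so that "/" separators survive.
   var encodedPath = path.split("/").map(function (segment) {
      return encodeURIComponent(segment);
   }).join("/");
   return "https://api.github.com/repos/" +
      encodeURIComponent(owner) + "/" +
      encodeURIComponent(repo) + "/contents/" +
      encodedPath + "?callback=" + encodeURIComponent(callbackName);
}
```

Calling gitHubJsonpUrl("MassAtLeeds", "RouteFactor", "test-routes.geojson", "getData") reproduces the address used above.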

Secondly, the data that comes back needs decoding into a text format our mapping packages can cope with. It is encoded with Base64 encoding. You can find some code to decode it into more standard text on NTT.cc. Assuming we put this in its own method, our called-back method would therefore look like this:

var jsonText = null;

function getData(response) {
   var data = response.data.content;
   jsonText = decode64(data);
   console.log(jsonText);
}

Note that we've added a page-level variable jsonText, which is going to end up with our data inside it in decoded form, and which will be available to scripts across the page to use. Make sure that your called-back method, plus the decoding method, come before the SCRIPT requesting the JSON-P, so the browser knows about them prior to loading the JSON-P SCRIPT.
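If you would rather not borrow a decoder, the job can also be done with built-ins. The following is a minimal sketch of a decode64 method, assuming atob and TextDecoder are available (they are in current browsers and recent Node.js releases); it is an illustration, not the NTT.cc code.

```javascript
// A sketch of decode64 using built-ins; assumes atob and TextDecoder exist.
function decode64(base64Text) {
   // Remove any line breaks or spaces present in the encoded text.
   var cleaned = base64Text.replace(/\s+/g, "");
   // atob yields a "binary string": one character per decoded byte.
   var binary = atob(cleaned);
   var bytes = new Uint8Array(binary.length);
   for (var i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i);
   }
   // Interpret the bytes as UTF-8 so that non-ASCII place names survive.
   return new TextDecoder("utf-8").decode(bytes);
}
```

The extra step through TextDecoder matters because atob on its own treats every byte as a separate character, which garbles accented or non-Latin text.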

You should now be able to use the GeoJSON data in jsonText (in the example above we dump it to the browser's web developer tools console).


You can find an example webpage that uses this technique to map with the popular Leaflet mapping package working here.

It uses jQuery to parse (split up and interpret) the decoded jsonText into Leaflet-friendly GeoJSON objects -- note that jQuery won't work with the Base64-encoded data, so you have to decode it first (but there are other advantages to parsing it with jQuery - see below).
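As a rough illustration of the parsing step, the sketch below uses the browser's built-in JSON.parse rather than jQuery (the Security section notes that jQuery hands over to JSON.parse where it is available). The function name parseGeoJson is an assumption for illustration, and the check is only a light sanity test, not full GeoJSON validation.

```javascript
// Hypothetical helper: turns decoded text into a GeoJSON object, or null if unusable.
function parseGeoJson(text) {
   var parsed;
   try {
      parsed = JSON.parse(text);
   } catch (e) {
      // Not valid JSON at all.
      return null;
   }
   // Every GeoJSON object carries a string "type" member.
   var looksLikeGeoJson = parsed !== null &&
      typeof parsed === "object" &&
      typeof parsed.type === "string";
   return looksLikeGeoJson ? parsed : null;
}
```

With Leaflet loaded and a map object already set up (both assumed here), the result could then be handed to Leaflet's GeoJSON layer, for example L.geoJson(parseGeoJson(jsonText)).addTo(map); later Leaflet releases also spell this L.geoJSON.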


Security:

One final word about security. If you know anything about Cross-Site Scripting (XSS) attacks, this probably seems like absolute madness. You want to build a website that embeds a SCRIPT, the source of which is on an open public repository! This seems like it is asking for someone to come and add something unpleasant to your GitHub repository which will be uploaded into all your users' browsers. Indeed, there is still a marginal risk -- JSON-P makes people nervous for just this reason. However, this use-case is more secure than most. Firstly, GitHub is in charge of sending out the JSON, and it sends it out in Base64 format with a wrapping object. This is interpreted as a String (text), so that while

var data = alert("Hello World");

in JavaScript will pop up an alert, even if response.data.content is an encoding of alert("Hello World");, the following should not:

   var data = response.data.content;
   jsonText = decode64(data);

unless you do something very daft, like evaluating the String as a function.
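To make the point concrete, here is a small sketch that decodes a hostile-looking payload and then tries to parse it. The function name inspectContent is hypothetical, and atob is assumed to be available; the encoded text stands in for what might arrive in response.data.content.

```javascript
// Hypothetical check: decode Base64 content and report whether it parses as JSON.
function inspectContent(base64Content) {
   // Decoding yields ordinary text; nothing is executed here.
   var text = atob(base64Content);
   var isJson = true;
   try {
      JSON.parse(text);
   } catch (e) {
      // JSON.parse rejects anything that is not valid JSON, such as code.
      isJson = false;
   }
   return { text: text, isJson: isJson };
}

// Base64 encoding of the text: alert("Hello World");
var hostile = inspectContent("YWxlcnQoIkhlbGxvIFdvcmxkIik7");
console.log(hostile.text, hostile.isJson);
```

The decoded text is just a String holding the code, and the parser refuses it rather than running it.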

Secondly, jQuery is used to parse the JSON. Theoretically one can evaluate JSON into an object, providing the opportunity for code to be executed, but jQuery's parse function should just analyse it as a series of data types and create an appropriate object without ever running any component of it. jQuery will utilise local JSON.parse functionality where it is available, and this should adhere to the ECMAScript standard for JSON parsers (see 15.12.2). This does not dictate runnable objects as a result of parsing, or that code should be run as part of the process.

Nevertheless, never say never. It is worth additionally setting your webpage's Content-Security-Policy to prevent Cross-Site Scripting attacks bouncing your users beyond your server and GitHub. More details at: this tutorial. Of course, if you don't want crowdsourcing, but just want somewhere your group can place data for the public to access, you can always also set your GitHub repository to be read-only.
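As a sketch of what such a policy might contain, the hypothetical helper below assembles a Content-Security-Policy value that only allows scripts from your own site plus a list of named hosts. The function name and the choice of hosts are assumptions for illustration; how the resulting value is sent as a Content-Security-Policy response header depends on your web server.

```javascript
// Hypothetical helper: assembles a script-src policy for the page's own origin
// plus any extra hosts that scripts may be loaded from.
function buildScriptPolicy(extraScriptHosts) {
   var scriptSources = ["'self'"].concat(extraScriptHosts);
   return "script-src " + scriptSources.join(" ");
}

// Allow scripts from our own site and from GitHub's JSON-P service only.
var policy = buildScriptPolicy(["https://api.github.com"]);
console.log(policy);
```

Note that a policy like this also blocks inline SCRIPT blocks, so the called-back and decoding methods would need to live in script files on your own site, and any other script hosts you rely on (for example, wherever Leaflet or jQuery are loaded from) would need adding to the list.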


Dynamic mapping:

So, this gets you a GeoJSON based map serving the data from GitHub. However, it will only load the data once, when the webpage starts. What if you want a crowdsourced map that updates as new data is added?

We'll look at this next.


Andy : Last edited 30 Apr 2014