Geospatial SPA using JavaScript only - part 1

If you have been following my blog, you may have come across a few of my other articles related to geospatial data. In those articles, I focused on the geospatial capabilities of MongoDB, a technology that I used back then. For nearly a year, though, I have been working for another NoSQL vendor, MarkLogic, whose product has won me over. As a result, I plan to write a series of articles explaining the ins and outs of my geospatial single page application while highlighting some of the cool features of using MarkLogic as the database for the application's backend.

I started this project in my own free time, out of an affection for geospatial data. The application itself was inspired by the amazing online Google+ Photo Editor. This tool not only lets you view your photos, but also shows additional metadata associated with them. The metadata is in a format called EXIF (Exchangeable Image File Format), a standard that specifies the metadata that images should contain when a photo is taken using either a digital camera (including DSLRs) or a smartphone. This information includes details of the camera, settings of the camera at the time the photo was created (for example, aperture and shutter speed), and - my favourite piece of information - GPS data, which is the latitude and longitude of the place where the photo was taken.

The online Google+ Photo Editor receives the information and then displays it. Here's a screenshot of the interface, if you're not familiar with it:

On top of displaying the image and the associated metadata, the editor allows you to edit the photo, providing sophisticated image re-touching and enhancing.

Now for the challenge: creating an application that is able to extract the EXIF information from photos and use the data for displaying the photos on a map.

I know you like screenshots, so here's one of the finished application, with all the photos in the database listed and with markers for each displayed on a map (I realize this doesn’t look as sophisticated as the Google example, but hey, I'm not a web designer):

In this first article, we are going to discuss the import script, a separate Node.js application that extracts metadata from the photos and loads documents into the database. In addition, we'll look at the database on the backend.

In fact, let's discuss the latter first.

In a previous article, I talked about how you can set up a MarkLogic database and then use the Node.js client API to connect to the database via Node.js.

To store the data, we are going to load JSON documents with the following structure into the database:

{  
  "filename": "IMG_1717.jpg",
  "location": {
    "type": "Point",
    "coordinates": [
      46.813167,
      17.769333
    ]
  },
  "make": "Apple",
  "model": "iPhone 4",
  "created": 1314206440000,
  "binary": "/binary/IMG_1717.jpg"
}

When you load a document into a MarkLogic database, it gets a URI, a unique identifier. We can control this URI; all the documents that the Node.js script inserts into the database follow the URI pattern /image/[filename].jpg.json.

You may be wondering - where is the actual image? Well, notice the last property in the JSON example -- binary. The value of this property contains the URI to retrieve the image that corresponds to this document. Images that get inserted into the database will have their own unique URI, with this pattern: /binary/[filename].jpg.
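To make the two patterns concrete, here is a small sketch (the uriFor helper is hypothetical and just for illustration; the actual import script builds these strings inline):

```javascript
// hypothetical helper illustrating the two URI patterns described above
function uriFor(filename) {
  return {
    json: '/image/' + filename + '.json', // the metadata document
    binary: '/binary/' + filename         // the image itself
  };
}

console.log(uriFor('IMG_1717.jpg'));
// { json: '/image/IMG_1717.jpg.json', binary: '/binary/IMG_1717.jpg' }
```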

One of the neat features of a MarkLogic database is that you can load JSON, XML, and text documents along with binary documents (as well as RDF triples) into the database, with no extra effort. That's right, you can insert a binary document with the same few lines that you use to insert a JSON document. The Node.js script loads binary documents into the database along with the JSON documents. And let me reiterate that we don't need to create a base64-encoded binary buffer.

The final piece of the database setup is an appropriate geospatial index. MarkLogic comes with a GUI administration interface, which makes adding indexes fairly effortless. (By the way, you can also add indexes using REST API calls, if you're into that sort of thing.) The type of geospatial index to select depends on how the geospatial data is represented in the dataset. If you look at the example document structure above, you see that the latitude/longitude pair lives in an array called 'coordinates', which is a property of the 'location' object. This structure means that the index should be a geospatial element child index. Here's a screenshot showing how I set this up in the MarkLogic Admin interface:

Now that our database is ready to go, let's discuss the Node.js script. The script should:

  • Accept a file or a folder as its first argument
  • Extract EXIF information from the photos
  • Skip photos that have no GPS EXIF information (these are not inserted into the database)
  • Insert both a JSON document and the binary image into the database

The script depends on a few other packages, including the JavaScript library that helps you extract EXIF information and the MarkLogic Node.js Client API.

var fs         = require('fs');  
var ExifImage  = require('exif-makernote-fix').ExifImage; //own package  
var marklogic  = require('marklogic');  
var connection = require('./../dbsettings').connection;  
var db         = marklogic.createDatabaseClient(connection);  

With the few lines above, we are making a connection to the database.

Please note that the original EXIF node package (https://github.com/gomfunkel/node-exif), which I first used, had some issues around retrieving the MakerNote metadata for some images (https://github.com/gomfunkel/node-exif/issues/32). Since it hasn't yet been fixed, I have implemented a fix of my own (removing the MakerNote extraction, as I didn't need it) and published my own package, called exif-makernote-fix.

Extracting EXIF data is amazingly easy using the package:

new ExifImage({ image: '/path/to/image.jpg'}, function(error, exifData) {  
  console.log(exifData); //returns object
});

If you look at the returned information, you will see something interesting under the GPS section:

gps:  
   { GPSLatitudeRef: 'N',
     GPSLatitude: [ 50, 46.92, 0 ],
     GPSLongitudeRef: 'W',
     GPSLongitude: [ 0, 58.42, 0 ],
     GPSAltitudeRef: 0,
     GPSAltitude: 0,
     GPSTimeStamp: [ 10, 18, 2777 ],
     GPSImgDirectionRef: 'T',
     GPSImgDirection: 116.82364729458918 }

Yes, the GPS data is stored as an array of numbers - e.g.: GPSLatitude: [ 10, 25, 22.682 ], representing degrees, minutes and seconds. In order for us to use these values, we need to convert them to decimal numbers.

If GPSLatitudeRef is 'S' (the location is south of the equator), the decimal latitude is negative; likewise, if GPSLongitudeRef is 'W' (the location is west of the prime meridian), the decimal longitude is negative. Note that the EXIF degree, minute, and second values themselves are unsigned: the reference letter is what carries the sign.

Here's a very basic system to help you visualise the sign change:

    N (+)
W (-)    E (+)
    S  (-)

What this also means is that there needs to be some calculation done to convert the degrees, minutes and seconds into decimal numbers. The following will achieve exactly that:

var extractAndConvertGPSData = function extractAndConvertGPSData(location) {

    // only proceed if the location is a valid data object
    if (typeof location !== 'object' || location === null) {
        return;
    }

    // everything south of the equator has a negative latitude and
    // everything west of the prime meridian has a negative longitude.
    // The sign is derived from the reference letter rather than by
    // negating the degree value: in JavaScript -0 < 0 is false, so a
    // western longitude with 0 degrees would otherwise lose its sign
    var latitudeSign  = location.latitudeReference  === 'S' ? -1 : 1;
    var longitudeSign = location.longitudeReference === 'W' ? -1 : 1;

    // work with scaled integers to keep six decimal places of precision
    var absoluteDegreeLatitude = Math.abs(Math.round(location.latitude[0] * 1000000));
    var absoluteMinuteLatitude = Math.abs(Math.round(location.latitude[1] * 1000000));
    var absoluteSecondLatitude = Math.abs(Math.round(location.latitude[2] * 1000000));

    var absoluteDegreeLongitude = Math.abs(Math.round(location.longitude[0] * 1000000));
    var absoluteMinuteLongitude = Math.abs(Math.round(location.longitude[1] * 1000000));
    var absoluteSecondLongitude = Math.abs(Math.round(location.longitude[2] * 1000000));

    // the object that will hold the new, decimal lat/long pair
    var decimalLocation = {};
    decimalLocation.latitude  = Math.round(absoluteDegreeLatitude + (absoluteMinuteLatitude / 60) + (absoluteSecondLatitude / 3600)) * latitudeSign / 1000000;
    decimalLocation.longitude = Math.round(absoluteDegreeLongitude + (absoluteMinuteLongitude / 60) + (absoluteSecondLongitude / 3600)) * longitudeSign / 1000000;
    return decimalLocation;
};
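To sanity-check the conversion against the GPS values shown earlier, here is a simplified, standalone equivalent of the same formula (plain floating point rounded to six decimal places, rather than the scaled-integer arithmetic; dmsToDecimal is my own illustrative helper, not part of the import script):

```javascript
// simplified, standalone DMS-to-decimal conversion for illustration;
// ref is the EXIF reference letter ('N'/'S' for latitude, 'E'/'W' for longitude)
function dmsToDecimal(dms, ref) {
  var sign = (ref === 'S' || ref === 'W') ? -1 : 1;
  var decimal = dms[0] + dms[1] / 60 + dms[2] / 3600;
  // keep six decimal places of precision
  return sign * Math.round(decimal * 1000000) / 1000000;
}

console.log(dmsToDecimal([50, 46.92, 0], 'N')); // 50.782
console.log(dmsToDecimal([0, 58.42, 0], 'W'));  // -0.973667
```

Note how the second value only comes out negative because the sign is taken from the 'W' reference letter; the degree component itself is 0.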

The script itself is executed by calling a 'main' function importProcess(). This function checks the arguments provided, making sure that either the file or folder (path) exists. If the directory is valid, the script iterates through all the jpg/jpeg files in the directory, extracting the GPS information from them, and inserting them into the database. For individual files, it makes sure that the file is valid and then does the same as explained above.

var importProcess = function importProcess(callback) {  
        // get the path as the first argument
        var arg    = process.argv[2];
        // make sure the argument exists (either file or folder)
        var exists = fs.existsSync(arg);
        // store the collection of files in an array
        var files  = [];

        if (exists) {

            // check whether the path is a directory
            if (fs.statSync(arg).isDirectory()) {
                fs.readdirSync(arg).forEach(function(file) {
                    // only process files with a jpg or jpeg extension
                    if (file.toLowerCase().substr(-4) === '.jpg' || file.toLowerCase().substr(-5) === '.jpeg') {
                        files.push(arg + '/' + file);
                    }
                });

                // extract the GPS data out of the files
                getGPSInfo(files, function(data) {

                    // insert data to database
                    insertData(data, arg);
                });
            }

            // handle the scenario where the argument is a file
            else if (fs.statSync(arg).isFile()) {

                // extract GPS data out of one file
                getGPSInfo(arg, function(data) {
                    // insert data to database
                    insertData(data, arg);
                });
            }
        }

        // invalid or no argument provided
        else {
            arg = arg === undefined ? 'not supplied' : arg;
            console.log('The argument ' + arg + ' is not a valid path/file.');
            process.exit(1);
        }
    };

I'm not going to dive into the getGPSInfo function in detail, as it's nearly 100 lines long, but, basically, it uses the ExifImage library to extract the required information out, makes use of the extractAndConvertGPSData function, and, as part of its callback, builds up the data structure that will be inserted into the database.
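Based on the document example shown earlier, the object that getGPSInfo's callback hands over for insertion looks roughly like this (the values are illustrative, taken from the sample JSON document above; the exact construction lives inside getGPSInfo):

```javascript
// sample of the data structure built by getGPSInfo's callback,
// mirroring the JSON document shown at the start of the article
var data = {
  filename: 'IMG_1717.jpg',
  location: {
    type: 'Point',
    coordinates: [46.813167, 17.769333] // decimal latitude, longitude
  },
  make: 'Apple',
  model: 'iPhone 4',
  created: 1314206440000,               // photo creation time in milliseconds
  binary: '/binary/IMG_1717.jpg'        // URI of the binary image document
};

console.log('/image/' + data.filename + '.json'); // /image/IMG_1717.jpg.json
```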

Let's take a closer look at the insert process. The function makes use of the MarkLogic db.documents.write() method:

var insertData = function insertData(data, path) {  
        // if the path already points at a file, use it as-is;
        // otherwise append the filename to the folder path
        var file;
        if (path.toLowerCase().substr(-4) === '.jpg' || path.toLowerCase().substr(-5) === '.jpeg') {
            file = path;
        } else {
            file = path + '/' + data.filename;
        }

        db.documents.write({
            uri: '/image/' + data.filename + '.json',
            contentType: 'application/json',
            collections: ['image'],
            content: data
        }).result(function (response) {
            console.log('Successfully inserted JSON doc: ', response.documents[0].uri);
        });

        var ws = db.documents.createWriteStream({
          uri: '/binary/' + data.filename,
          contentType: 'image/jpeg',
          collections: ['binary']
        });
        ws.result(function(response) {
          console.log('Successfully inserted JPEG doc: ' + response.documents[0].uri);
        });
        fs.createReadStream(file).pipe(ws);
    };

I would like you to notice a few things here:

  • First, that the URI can be built up dynamically.
  • Second, that the db.documents.createWriteStream() method used to insert the images into the database makes use of Node streams, which work similarly to Unix pipes in that they let you easily read data from a source and pipe it to a destination. Using this method makes the insert a lot faster.
  • Third, I'd like to draw your attention to the contentType key. This is what tells the MarkLogic database whether to handle a particular document as binary or as JSON.
  • Finally, note the collections key. Collections are tags that enable queries to efficiently target subsets of documents within a MarkLogic database. Or, in other words, I can now retrieve all the photo metadata just by reading all the JSON documents that belong to the image collection.

Finally, here's a GIF animation that shows the import process. In the Chrome window, you can see the MarkLogic Query Console, a nice interface for viewing the contents of your database:

In the next article in this series, we will look at how the data is being served from the database via Node.js and ExpressJS. Stay tuned!
