
Geospatial SPA using JavaScript only - part 1


Older Article

This article was published 11 years ago. Some information may be outdated or no longer applicable.

If you’ve been following my blog, you may have spotted a few other articles about geospatial data. Those focused on MongoDB’s geospatial capabilities. For nearly a year, though, I’ve been working for another NoSQL vendor, MarkLogic, whose product won me over. So I’m writing a series of articles that walk through my geospatial single page application while highlighting some of MarkLogic’s features.

I started this project in my own time, out of a genuine affection for geospatial data. The application was inspired by the Google+ Photo Editor. That tool lets you view your photos and see additional metadata stored in a format called EXIF (Exchangeable Image File Format). EXIF is a standard that specifies the metadata images should contain when captured by a digital camera (including DSLRs) or a smartphone. It includes camera details, settings at the time of capture (aperture, shutter speed), and my favourite bit: GPS data, that is, the latitude and longitude of where the photo was taken.

The Google+ Photo Editor pulls in that information and displays it. Here’s a screenshot if you haven’t seen it:

On top of displaying the image and metadata, the editor lets you retouch and enhance photos.

The challenge I set myself: build an application that extracts EXIF information from photos and plots each photo on a map.

Here’s a screenshot of the finished application, with all photos in the database listed and markers for each displayed on a map (it’s not as polished as the Google example, but I’m not a web designer):

In this first article, we’ll cover the import script (a separate Node.js application that loads documents into the database and extracts photo metadata) and the database backend.

Let’s start with the database.

In a previous article, I covered setting up a MarkLogic database and using the Node.js client API to connect to it.

We’ll load JSON documents with the following structure into the database:

{
  "filename": "IMG_1717.jpg",
  "location": {
    "type": "Point",
    "coordinates": [46.813167, 17.769333]
  },
  "make": "Apple",
  "model": "iPhone 4",
  "created": 1314206440000,
  "binary": "/binary/IMG_1717.jpg"
}

Each document in a MarkLogic database gets a URI (a unique identifier). We control this URI and use the pattern /image/[filename].jpg.json.

Where’s the actual image? Look at the last property: binary. Its value contains the URI to retrieve the corresponding image. Images inserted into the database get their own URI following the pattern /binary/[filename].jpg.

One of MarkLogic’s neat tricks: you can load JSON, XML, text, and binary documents (plus RDF triples) into the same database with no extra effort. The same few lines that insert a JSON document can insert a binary one, and the Node.js script loads both. And no, we don’t need to create a base64-encoded binary buffer.

The last piece on the database side is setting up a geospatial index. MarkLogic ships with a GUI administration interface that makes adding indexes painless (you can also add them via REST API calls, if that’s your preference). The index type depends on how the geospatial data sits in your dataset. Looking at our document structure, the latitude/longitude pair lives in an array called ‘coordinates’, which is a property of the ‘location’ object. That means we need a geospatial element child index. Here’s how I set it up in the MarkLogic Admin interface:

With the database ready, let’s look at the Node.js script. It should:

  • Accept a file or a folder as a first argument
  • Extract EXIF information from the photos
  • Skip images with no GPS EXIF data (don’t insert them)
  • Insert both a JSON document and the binary image into the database

The script depends on a few packages, including the JavaScript library for EXIF extraction and the MarkLogic Node.js Client API:

var fs = require('fs');
var ExifImage = require('exif-makernote-fix').ExifImage; // my own package, based on node-exif
var marklogic = require('marklogic');
var connection = require('./../dbsettings').connection;
var db = marklogic.createDatabaseClient(connection);

Those few lines establish the database connection.

Note that the original EXIF Node.js package (https://github.com/gomfunkel/node-exif), which I used first, had trouble retrieving the MakerNote metadata for some images (https://github.com/gomfunkel/node-exif/issues/32). Since that hasn’t been fixed yet, I implemented a fix of my own (removing the MakerNote extraction, as I didn’t need it) and published it as a separate package called exif-makernote-fix.

Extracting EXIF data is dead simple with the package:

new ExifImage({ image: '/path/to/image.jpg' }, function (error, exifData) {
  if (error) {
    return console.log('Error: ' + error.message);
  }
  console.log(exifData); // logs the extracted EXIF data as an object
});

Look at the returned information and you’ll spot something interesting under the GPS section:

gps:
    { GPSLatitudeRef: 'N',
      GPSLatitude: [ 50, 46.92, 0 ],
      GPSLongitudeRef: 'W',
      GPSLongitude: [ 0, 58.42, 0 ],
      GPSAltitudeRef: 0,
      GPSAltitude: 0,
      GPSTimeStamp: [ 10, 18, 2777 ],
      GPSImgDirectionRef: 'T',
      GPSImgDirection: 116.82364729458918 }

Each GPS coordinate is stored as an array of three numbers (e.g. GPSLatitude: [ 10, 25, 22.682 ]), representing degrees, minutes, and seconds. We need to convert those to decimal degrees.

If GPSLatitudeRef is South (latitude between 0 and -90), the sign flips. Same for GPSLongitudeRef when it’s West (longitude between 0 and -180).

A quick visual:

        N (+)
    W (-)    E (+)
        S (-)

That means we need a conversion function to turn degrees, minutes, and seconds into decimals:

var extractAndConvertGPSData = function extractAndConvertGPSData(location) {
  // only progress if the location is a valid data object
  // (typeof null is also 'object', so rule that out explicitly)
  if (typeof location !== 'object' || location === null) {
    return undefined;
  }

  // The sign comes from the reference letter, not from the degrees value:
  // negating 0 degrees gives -0, and -0 < 0 is false, so a position with
  // 0 degrees west would otherwise end up east of the prime meridian.

  // everything south of the equator has a negative latitude value
  var latitudeSign = location.latitudeReference === 'S' ? -1 : 1;
  // everything west of the prime meridian has a negative longitude value
  var longitudeSign = location.longitudeReference === 'W' ? -1 : 1;

  // turn [degrees, minutes, seconds] into decimal degrees,
  // rounded to six decimal places
  var toDecimal = function toDecimal(dms, sign) {
    var degrees =
      Math.abs(dms[0]) + Math.abs(dms[1]) / 60 + Math.abs(dms[2]) / 3600;
    return (sign * Math.round(degrees * 1000000)) / 1000000;
  };

  // the object that holds the new, decimal lat/long pair
  return {
    latitude: toDecimal(location.latitude, latitudeSign),
    longitude: toDecimal(location.longitude, longitudeSign),
  };
};
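As a quick check of the arithmetic, here is a minimal standalone sketch that converts the latitude and longitude from the sample gps block shown earlier. The helper name dmsToDecimal is hypothetical and not part of the import script; it takes the sign from the reference letter, which also covers positions where the degrees value is 0.

```javascript
// Hypothetical helper, not part of the import script: converts one EXIF
// [degrees, minutes, seconds] triple plus its reference letter to decimal degrees.
function dmsToDecimal(dms, reference) {
  // S and W are the negative hemispheres
  var sign = reference === 'S' || reference === 'W' ? -1 : 1;
  var degrees = dms[0] + dms[1] / 60 + dms[2] / 3600;
  // round to six decimal places
  return (sign * Math.round(degrees * 1000000)) / 1000000;
}

// values taken from the sample gps block above
var latitude = dmsToDecimal([50, 46.92, 0], 'N'); // 50 + 46.92 / 60
var longitude = dmsToDecimal([0, 58.42, 0], 'W'); // -(58.42 / 60)

console.log(latitude, longitude); // 50.782 -0.973667
```

Note that in this sample the minutes carry the fractional part and the seconds are 0; the same formula handles both layouts.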

The script kicks off by calling importProcess(). This function reads the first command-line argument and makes sure the file or folder path exists. If it’s a directory, it iterates through all jpg/jpeg files, extracts their GPS data, and inserts them into the database. For a single file, it confirms that the path points to a regular file and then does the same:

var importProcess = function importProcess(callback) {
  // get the path as the first argument
  var arg = process.argv[2];
  // make sure the argument exists (either file or folder)
  var exists = fs.existsSync(arg);
  // store the collection of files in an array
  var files = [];

  if (exists) {
    // check whether the path is a directory
    if (fs.statSync(arg).isDirectory()) {
      fs.readdirSync(arg).forEach(function (file) {
        // only process files with jpg extension
        if (
          file.toLowerCase().substr(-4) === '.jpg' ||
          file.toLowerCase().substr(-5) === '.jpeg'
        ) {
          files.push(arg + '/' + file);
        }
      });

      // extract the GPS data out of the files
      getGPSInfo(files, function (data) {
        // insert data to database
        insertData(data, arg);
      });
    }

    // handle the scenario where the argument is a file
    else if (fs.statSync(arg).isFile()) {
      // extract GPS data out of one file
      getGPSInfo(arg, function (data) {
        // insert data to database
        insertData(data, arg);
      });
    }
  }

  // invalid or no argument provided
  else {
    arg = arg === undefined ? 'not supplied' : arg;
    console.log('The argument ' + arg + ' is not a valid path/file.');
    process.exit(1);
  }
};
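The extension test shows up both here and in the insert function further down. One way to avoid repeating it is a small helper built on Node's path.extname; this is only a sketch, and the name isJpeg is hypothetical rather than something the script defines.

```javascript
var path = require('path');

// Hypothetical helper: true when the file name ends in .jpg or .jpeg (any case)
function isJpeg(file) {
  var extension = path.extname(file).toLowerCase();
  return extension === '.jpg' || extension === '.jpeg';
}

console.log(isJpeg('IMG_1717.JPG')); // true
console.log(isJpeg('holiday/photo.jpeg')); // true
console.log(isJpeg('notes.txt')); // false
```

One difference from the substr check: path.extname treats a name that consists only of a leading dot and letters, such as ".jpg", as a dotfile with no extension, so such a file would be skipped.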

I won’t walk through the getGPSInfo function in detail (it’s nearly 100 lines). In short, it uses ExifImage to extract the required info, runs it through extractAndConvertGPSData, and builds the data structure for insertion.
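To give a feel for the last of those steps, here is a rough sketch of building the document structure shown at the start of the article. It is not the author's getGPSInfo. The helper name buildImageDocument is hypothetical, and the sketch assumes the EXIF object exposes image.Make, image.Model and exif.CreateDate (formatted as 'YYYY:MM:DD HH:MM:SS'), and that the timestamp can be treated as UTC. The date string in the usage example is made up for illustration.

```javascript
// Hypothetical helper: combines EXIF fields and the converted location
// into the JSON document shape used in this article.
function buildImageDocument(filename, exifData, decimalLocation) {
  // split e.g. '2011:08:24 17:20:40' into its six numeric parts
  var parts = exifData.exif.CreateDate.split(/[: ]/).map(Number);

  return {
    filename: filename,
    location: {
      type: 'Point',
      // latitude first, matching the sample document in this article
      coordinates: [decimalLocation.latitude, decimalLocation.longitude]
    },
    make: exifData.image.Make,
    model: exifData.image.Model,
    // assumption: treat the EXIF timestamp as UTC; months are zero-based in Date.UTC
    created: Date.UTC(parts[0], parts[1] - 1, parts[2], parts[3], parts[4], parts[5]),
    // URI under which the binary image is stored
    binary: '/binary/' + filename
  };
}

// usage with hand-written input (the CreateDate value is illustrative)
var doc = buildImageDocument(
  'IMG_1717.jpg',
  {
    image: { Make: 'Apple', Model: 'iPhone 4' },
    exif: { CreateDate: '2011:08:24 17:20:40' }
  },
  { latitude: 46.813167, longitude: 17.769333 }
);
console.log(JSON.stringify(doc, null, 2));
```

EXIF timestamps carry no time zone, so a real implementation would have to decide how to interpret them; UTC is just the simplest choice for a sketch.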

Let’s look at the insert process. The function uses MarkLogic’s db.documents.write() method:

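var insertData = function insertData(data, path) {
  // if the argument is a single image, use it as is; otherwise it is a
  // folder, so append the filename to get the full path of the image
  var lowerCasePath = path.toLowerCase();
  var isImagePath =
    lowerCasePath.substr(-4) === '.jpg' ||
    lowerCasePath.substr(-5) === '.jpeg';
  var file = isImagePath ? path : path + '/' + data.filename;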

  db.documents
    .write({
      uri: '/image/' + data.filename + '.json',
      contentType: 'application/json',
      collections: ['image'],
      content: data,
    })
    .result(function (response) {
      console.log(
        'Successfully inserted JSON doc: ',
        response.documents[0].uri
      );
    });

  var ws = db.documents.createWriteStream({
    uri: '/binary/' + data.filename,
    contentType: 'image/jpeg',
    collections: ['binary'],
  });
  ws.result(function (response) {
    console.log('Successfully inserted JPEG doc: ' + response.documents[0].uri);
  });
  fs.createReadStream(file).pipe(ws);
};

A few things to notice here:

  • The URI can be built up dynamically
  • The db.documents.createWriteStream() method uses Node.js streams, which work like Unix pipes: read data from a source, pipe it to a destination. The image is sent in chunks instead of being loaded into memory in full, which keeps the insert fast
  • The contentType key tells MarkLogic whether to handle a document as binary or JSON
  • The collections key tags documents so queries can efficiently target subsets. In other words, I can now retrieve all JSON documents by reading everything in the image collection

Here’s a gif showing the import process. The Chrome window shows the MarkLogic query console, a handy interface for viewing your database contents:

In the next article, we’ll look at how the data gets served from the database via Node.js and ExpressJS. Stay tuned!