MarkLogic and Node.js

This article was published 11 years ago. Some information may be outdated or no longer applicable.

It’s finally here. The latest release of MarkLogic (the NoSQL database) is out. This release ships with a bunch of new features, and one of the biggest is the Node.js Client API. Node.js developers can now connect to the database through an npm package.

For this article, we’re assuming you’ve already loaded documents into the database.

There’s quite a lot we can do via this npm package. At a high level, it’s possible to:

  • Manage documents (CRUD operations)
  • Run searches against the database

Any type of document can be loaded: JSON, binary documents (images, PDFs), RDF triples, text and XML files.

Let’s look at a few examples. Install the package with npm install marklogic and we’re ready to go:

var marklogic = require('marklogic');
var connection = require('./connection').connection;
var db = marklogic.createDatabaseClient(connection);

The first line pulls in the npm package. The second line specifies the connection details. Here’s what connection.js looks like:

var connection = {
  host: 'localhost', //MarkLogic hostname
  port: 5040, //MarkLogic REST application port number
  user: 'tamas', //MarkLogic username
  password: 'mypass123', //MarkLogic password
};

module.exports.connection = connection;

To read more about setting up a REST application, refer to the MarkLogic documentation.

To get a database connection, call .createDatabaseClient and pass in the connection details.
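Hard-coding credentials in connection.js is fine for a local demo. As a minimal sketch of an alternative, the same object could be built from environment variables. The variable names (ML_HOST, ML_PORT, ML_USER, ML_PASSWORD) and the buildConnection helper are assumptions made for illustration; they are not part of the marklogic package:

```javascript
// Hypothetical helper: builds the same connection object as connection.js,
// reading each value from `env` and falling back to the demo defaults.
function buildConnection(env) {
  return {
    host: env.ML_HOST || 'localhost',
    port: parseInt(env.ML_PORT, 10) || 5040,
    user: env.ML_USER || 'tamas',
    password: env.ML_PASSWORD || 'mypass123'
  };
}

var connection = buildConnection(process.env);
```

The resulting object can be passed to .createDatabaseClient exactly like the hard-coded one.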

Once that’s done, we can read a document from the database. Documents inside MarkLogic are referenced by their URI (think of it like a primary key in an RDBMS).

Every time we want to work with documents (create, read, update or delete), we use methods in the db.documents namespace. So this statement fetches a document:

db.documents.read('/country/italy.json').result();

Calling .result() on what .read returns gives us a promise, so we need to call .then to see the actual data:

db.documents
  .read('/country/italy.json')
  .result()
  .then(function (doc) {
    console.log(doc);
  });

As you can see from the output, the database returns a lot more than just the document content. To see only the content, modify the console.log statement to grab the content key: console.log(doc[0].content).
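To make the shape concrete, here is a sketch that uses a hand-written stand-in for what the promise resolves to. The sample object is an assumption: its values are made up, and a real response may carry more keys than shown. The contentsOf helper is hypothetical too:

```javascript
// Stand-in for the array a single-document read resolves to (values are illustrative).
var sampleResult = [
  {
    uri: '/country/italy.json',
    content: { id: 'Italy', capital: 'Rome' }
  }
];

// Hypothetical helper: keep only the stored document bodies.
function contentsOf(documents) {
  return documents.map(function (doc) {
    return doc.content;
  });
}

console.log(contentsOf(sampleResult)[0].capital); // prints "Rome"
```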

The db.documents.read function also accepts a comma-separated list of URIs or an array. In that case, the promise returns multiple documents, so we iterate with .forEach:

var uris = [
  '/country/italy.json',
  '/country/hungary.json',
  '/country/colombia.json',
];
db.documents
  .read(uris)
  .result()
  .then(function (documents) {
    documents.forEach(function (document) {
      console.log(document.content.capital);
    });
  });
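When reading several URIs at once, it can be useful to check which ones actually came back. A minimal sketch, assuming each returned document descriptor carries a uri key; the missingUris helper is hypothetical:

```javascript
// Hypothetical helper: list the requested URIs that do not appear in the results.
function missingUris(requested, documents) {
  var found = documents.map(function (doc) {
    return doc.uri;
  });
  return requested.filter(function (uri) {
    return found.indexOf(uri) === -1;
  });
}
```

Inside the .then callback it could be called as missingUris(uris, documents).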

To retrieve all documents from the database, we use the query() method instead of .read(). But running the code below only returns 10 documents:

var marklogic = require('marklogic');
var connection = require('./connection').connection;
var db = marklogic.createDatabaseClient(connection);
var qb = marklogic.queryBuilder;

db.documents
  .query(qb.where())
  .result()
  .then(function (documents) {
    documents.forEach(function (document) {
      console.log(document.content.name.common);
    });
    console.log('Total documents: ' + documents.length);
  })
  .catch(function (error) {
    console.log(error);
  });

That’s expected. The API caps the number of returned documents at 10 by default. This prevents accidentally pulling every document from a database that might contain hundreds of thousands of records. We can tweak this by adding the .slice() method, which specifies the start index and page length. If the length isn’t specified, it defaults to 10. If the index isn’t specified, it defaults to 1.

Update the qb.where() section to qb.where().slice(1, 20) and re-run. You’ll see the first 20 documents returned.
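Since .slice() takes a start index that begins at 1, turning a page number into slice arguments is a small calculation. A sketch, with sliceArgs as a hypothetical helper name:

```javascript
// Hypothetical helper: map a 1-based page number to [start, pageLength],
// the two values that .slice(start, pageLength) expects.
function sliceArgs(page, pageLength) {
  var start = (page - 1) * pageLength + 1;
  return [start, pageLength];
}

console.log(sliceArgs(1, 20)); // [ 1, 20 ]
console.log(sliceArgs(3, 20)); // [ 41, 20 ]
```

The two values would then be passed on as qb.where().slice(args[0], args[1]).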

Notice how the documents come back in a seemingly random order? To fix that, we first need to create a string-type range index on the id element. Do this through the MarkLogic Admin Interface: select the appropriate database, find ‘Element Range Indexes’ in the menu, and add a range index with the string scalar type. (More on Range Indexes in MarkLogic)

Creating this range index with a string datatype means the data will now be sorted alphabetically. We also need to tell our query to use the index. Update the code and add qb.sort('id'): qb.where().orderBy(qb.sort('id')).slice(1, 20). The result should be immediate: the script now returns the first 20 countries in alphabetical order.

Next, let’s find out how many documents match the query in total. Replace .slice(1, 20) with .withOptions({ categories: 'none' }):

var marklogic = require('marklogic');
var connection = require('./connection').connection;
var db = marklogic.createDatabaseClient(connection);
var qb = marklogic.queryBuilder;

db.documents
  .query(qb.where().orderBy(qb.sort('id')).withOptions({ categories: 'none' }))
  .result()
  .then(function (documents) {
    console.log(documents);
  })
  .catch(function (error) {
    console.log(error);
  });

Now running the script no longer returns document content. Instead, it returns calculated values about the documents. The properties we care about are total, start and page-length.
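From those three properties we can derive the number of pages and the range of documents on the current page. A sketch, assuming a summary object with the total, start and page-length keys named above; the sample values and the describePage helper are made up:

```javascript
// Hypothetical helper: derive pagination facts from a summary object.
function describePage(summary) {
  var total = summary.total;
  var start = summary.start;
  var pageLength = summary['page-length'];
  return {
    // Round up so a partially filled last page is counted.
    totalPages: Math.ceil(total / pageLength),
    // Index of the last document on this page, capped at the total.
    end: Math.min(start + pageLength - 1, total)
  };
}

console.log(describePage({ total: 248, start: 1, 'page-length': 10 }));
// { totalPages: 25, end: 10 }
```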

We’ve now got enough pieces to build a basic Node.js/Express application that pulls all documents from the MarkLogic database, paginates 10 per page, and shows a country datasheet when you click on a country. I’ll skip the Express setup and jump straight into the router code.

router.route('/:page?').get(routes.index);
router.route('/country/:country').get(routes.country);

Two routes. The first takes an optional page parameter. The second accepts a country name. Here are the underlying functions:

var marklogic = require('marklogic');
var connection = require('./connection').connection;
var db = marklogic.createDatabaseClient(connection);
var qb = marklogic.queryBuilder;

var getPaginationData = function () {
  return db.documents
    .query(
      qb.where().orderBy(qb.sort('id')).withOptions({ categories: 'none' })
    )
    .result();
};
var getDocuments = function (from) {
  return db.documents
    .query(qb.where().orderBy(qb.sort('id')).slice(from))
    .result();
};

var getCountryInfo = function (uri) {
  return db.documents.read(uri).result();
};

getPaginationData() returns all the pagination information. getDocuments() returns documents from a starting position. getCountryInfo() returns a full document by URI.

Here’s the index route handler:

var index = function (req, res) {
  var pageData = {};
  // Fall back to page 1 when the parameter is missing or not a number.
  var page = parseInt(req.params.page, 10) || 1;
  getPaginationData()
    .then(function (data) {
      var totalDocuments = data.total;
      var perPage = data['page-length'];
      // Round up so a partially filled last page still counts.
      pageData.totalPages = Math.ceil(totalDocuments / perPage);
      // slice() starts at 1: page 1 begins at 1, page 2 at perPage + 1, and so on.
      return getDocuments(perPage * (page - 1) + 1);
    })
    .then(function (documents) {
      // Collect the ids in one pass; an empty page still renders.
      pageData.result = documents.map(function (document) {
        return document.content.id;
      });
      res.render('index', { data: pageData });
    })
    .catch(function (error) {
      console.log('Error', error);
      res.status(500).send('Something went wrong');
    });
};

The key thing to notice is how we build up the pageData object. This object gets sent to the rendering engine (Jade) so we can access its content from the frontend.
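Because the page number comes straight from the URL, it is worth normalizing it before it reaches the query. A minimal sketch; parsePage is a hypothetical helper, not something the app in this article defines:

```javascript
// Hypothetical helper: turn the raw :page parameter into a valid page number.
// Falls back to 1 for missing or non-numeric input and clamps to [1, totalPages].
function parsePage(raw, totalPages) {
  var page = parseInt(raw, 10);
  if (isNaN(page) || page < 1) {
    return 1;
  }
  return Math.min(page, Math.max(totalPages, 1));
}
```

A request for a page beyond the last one then shows the last page instead of an empty list.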

The country route is simpler. We just need to make sure we pass the right URI to getCountryInfo:

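var country = function (req, res) {
  var country = req.params.country;
  var referer = req.headers.referer;
  var uri = '/country/' + country.toLowerCase().replace(/\s/g, '') + '.json';
  getCountryInfo(uri)
    .then(function (countryInfo) {
      countryInfo[0].content.referer = referer;
      res.render('country', { data: countryInfo[0].content });
    })
    .catch(function (error) {
      // Also reached if countryInfo is empty, e.g. no document at that URI.
      console.log('Error', error);
      res.status(500).send('Something went wrong');
    });
};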
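The URI construction is the one piece of this handler that can be checked without a database. Pulled out into a function (countryUri is a hypothetical name), the same logic looks like this:

```javascript
// Same transformation as in the route handler: lowercase the name,
// strip all whitespace, then add the /country/ prefix and .json suffix.
function countryUri(name) {
  return '/country/' + name.toLowerCase().replace(/\s/g, '') + '.json';
}

console.log(countryUri('United Kingdom')); // prints "/country/unitedkingdom.json"
```

This only finds a document if it was loaded under a URI that follows the same convention.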

On the frontend, the pageData object is available as data, and we iterate through data.result using Jade’s each ... in iterator:

each country in data.result
    p
      a(href='/country/' + country) #{country}
  - var n = 1;
  nav
    ul.pagination
      while n

The [codebase for this app is on GitHub](https://github.com/tpiros/world). Have a look at the code and give the MarkLogic Node.js Client API a spin.