MarkLogic and Node.js

It's finally here: the latest release of the amazing NoSQL database MarkLogic is out. It comes with a lot of great features, one of which is the Node.js Client API. This means that Node.js developers can finally connect to the database using an npm package.

For the purposes of this article we are assuming that you have already loaded documents into the database. I wrote an article on how to utilise MarkLogic's Server Side JavaScript to load documents into the database.

There are quite a few things that we can do with this npm package. At a very high level it is possible, amongst other things, to:

  • Manage documents (CRUD operations)
  • Execute search against the database

Any type of document can be loaded into the database: JSON, XML, text, binary documents (such as images or PDF files) and RDF triples.

Let's have a look at a few examples. To install the package, run the standard npm command npm install marklogic, after which it is ready to use inside our code:

var marklogic = require('marklogic');  
var connection = require('./connection').connection;  
var db  = marklogic.createDatabaseClient(connection);  

The first line imports the npm package itself, and the second line loads the connection details for the database. The content of connection.js is the following:

var connection = {  
  host: 'localhost', //MarkLogic hostname
  port: 5040, //MarkLogic REST application port number
  user: 'tamas', //MarkLogic username
  password: 'mypass123' //MarkLogic password
};

module.exports.connection = connection;  
To read more on how to set up a REST application please read this article.

In order to get a database connection we need to call the .createDatabaseClient method and pass in our connection details.
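
Hard-coding credentials is fine for a demo, but a variation you may find useful is to read the connection details from environment variables. The sketch below is only an illustration: the buildConnection helper and the ML_HOST, ML_PORT, ML_USER and ML_PASSWORD variable names are hypothetical and not part of the MarkLogic API, and the fallback values are simply the ones from connection.js above.

```javascript
// Hypothetical helper: builds the connection object from environment
// variables, falling back to the values used in connection.js.
// The ML_* variable names are assumptions made for this sketch.
function buildConnection(env) {
  return {
    host: env.ML_HOST || 'localhost',
    port: parseInt(env.ML_PORT, 10) || 5040,
    user: env.ML_USER || 'tamas',
    password: env.ML_PASSWORD || 'mypass123'
  };
}

var connection = buildConnection(process.env);
console.log(connection.host + ':' + connection.port);
```

The resulting object has the same shape as the one exported from connection.js, so it could be passed to .createDatabaseClient in the same way.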

Once this is done, we can try to read a document from the database. Documents inside the MarkLogic database are referred to by their URI (think along the lines of a primary key in an RDBMS for accessing a particular record).

Every time we'd like to work with documents in the database (create, read, update or delete them) we have to use the methods in the db.documents namespace. In light of this, the following statement reads a document from the database:

db.documents.read('/country/italy.json').result();

The .result() method returns a promise, so in order to see the actual data we need to call .then on it:

db.documents.read('/country/italy.json').result().then(function(doc) {  
  console.log(doc);
});

As the logged output shows, the database returns a lot more information than just the content of the document. To see only the document's content we can modify the console.log statement and retrieve the content key from the first item of the returned array: console.log(doc[0].content);.
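
To make the shape of that response a little more concrete, here is a small sketch that works on a hand-written sample rather than a live database. The sample object is an assumption that mimics what the read call resolves with (an array of document descriptors, each with a content key). Its field values and the extractContents helper are made up for illustration.

```javascript
// Hand-written sample imitating the array that the read call resolves with.
// The values are illustrative only, not real output from MarkLogic.
var sampleResult = [
  {
    uri: '/country/italy.json',
    category: 'content',
    format: 'json',
    contentType: 'application/json',
    content: { id: 'Italy', capital: 'Rome' }
  }
];

// Hypothetical helper: keep only the content of each returned document.
function extractContents(documents) {
  return documents.map(function(doc) {
    return doc.content;
  });
}

console.log(extractContents(sampleResult)[0].capital);
```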

The db.documents.read function also accepts either a comma-separated list of URIs or an array of URIs. In this case the promise resolves with multiple documents, so we need to iterate through the result using .forEach:

var uris = ['/country/italy.json', '/country/hungary.json', '/country/colombia.json'];  
db.documents.read(uris).result().then(function(documents) {  
  documents.forEach(function(document) {
    console.log(document.content.capital); 
  });
});

In order to retrieve all the documents from the database we need to use the .query() method instead of .read(). However, running the code below will return only 10 documents:

var marklogic  = require('marklogic');  
var connection = require('./connection').connection;  
var db         = marklogic.createDatabaseClient(connection);  
var qb         = marklogic.queryBuilder;

db.documents.query(  
  qb.where()
)
.result()
.then(function(documents) {
  documents.forEach(function(document) {
    console.log(document.content.name.common);
  });
  console.log('Total documents: ' + documents.length);
})
.catch(function(error) {
  console.log(error);
});

This is the expected behaviour. The API limits the number of documents returned by the database, and the default limit is 10. The limit is in place to avoid accidentally selecting all documents from a database, which can of course contain a few hundred thousand documents. We can modify our code and add the .slice() method, which specifies the start index and the page length of the result slice. If the length is unspecified, it defaults to 10. If the index is unspecified, it defaults to 1.

Update the qb.where() section and change it to the following: qb.where().slice(1, 20). Re-run your script and you will see that the first 20 documents are returned from the database.
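
If you want to page through the results rather than only enlarge the first page, the start index passed to .slice() has to be derived from the page number. A minimal sketch, assuming 1-based page numbers; the sliceArgs helper is hypothetical and not part of the API:

```javascript
// Hypothetical helper: converts a 1-based page number into the
// (start, length) pair expected by .slice(). Start indexes are 1-based.
function sliceArgs(page, pageLength) {
  var start = (page - 1) * pageLength + 1;
  return [start, pageLength];
}

console.log(sliceArgs(1, 20)); // page 1 of 20 starts at index 1
console.log(sliceArgs(3, 20)); // page 3 of 20 starts at index 41
```

The pair can then be passed along as qb.where().slice(args[0], args[1]).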

Did you notice how these documents seem to be in a random order? To change this we first need to create a string-type range index on the id element. This can be achieved using the MarkLogic Admin Interface: select the appropriate database, find the 'Element Range Indexes' option in the menu and add a range index with a string scalar type. (More on Range Indexes in MarkLogic)

Creating this range index with a string datatype means that the data can now be sorted alphabetically. We also need to tell our query to make use of this index. Update the code and add the qb.sort('id') call: qb.where().orderBy(qb.sort('id')).slice(1, 20). The result should be immediately obvious: the script now returns the first 20 countries in alphabetical order.

As a next step we'll see how to ask the system how many documents match the query. Let's replace the .slice(1, 20) call with .withOptions({categories: 'none'}) and make a few additional changes:

var marklogic  = require('marklogic');  
var connection = require('./connection').connection;  
var db         = marklogic.createDatabaseClient(connection);  
var qb         = marklogic.queryBuilder;

db.documents.query(  
  qb.where().orderBy(qb.sort('id')).withOptions({categories: 'none'})
)
.result()
.then(function(documents) {
  console.log(documents);
})
.catch(function(error) {
  console.log(error);
});

Notice how running the script no longer returns the documents' content but instead some calculated values about the result set. The properties that we're going to reuse are total, start and page-length.
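
The number of pages follows from those properties with a bit of arithmetic. A sketch, assuming a summary object with total and page-length keys as described above; the pageCount helper and the sample numbers are hypothetical:

```javascript
// Hypothetical helper: number of pages needed to show every match.
// Math.ceil counts a partially filled last page as a page of its own.
function pageCount(summary) {
  return Math.ceil(summary.total / summary['page-length']);
}

// Made-up numbers for illustration only
var summary = { total: 245, start: 1, 'page-length': 10 };
console.log(pageCount(summary));
```

Rounding up matters here: with plain division, the last few documents would end up on a page that never gets a link.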

We now have enough knowledge to put together a very basic Node.js/Express application that selects all the documents from the MarkLogic database and uses pagination to show 10 documents per page. Clicking on a particular country shows a datasheet with some key facts about the country itself. I'm not going to spend time explaining how to set up a Node.js/Express app. Instead I'm going to jump straight in and talk about the Express router code.

router.route('/:page?').get(routes.index);  
router.route('/country/:country').get(routes.country);  

We have two routes defined: the first takes an optional page parameter, the second accepts a named country parameter. Let's have a look at the underlying functions:

var marklogic  = require('marklogic');  
var connection = require('./connection').connection;  
var db         = marklogic.createDatabaseClient(connection);  
var qb         = marklogic.queryBuilder;

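// Fetch only the search summary (total, start, page-length), no document content
var getPaginationData = function() {
  return db.documents.query(
    qb.where().orderBy(qb.sort('id')).withOptions({categories: 'none'})
  ).result();
};

// Fetch one page of documents, starting at the 1-based index `from`
// (the page length is left at its default of 10)
var getDocuments = function(from) {
  return db.documents.query(
    qb.where().orderBy(qb.sort('id')).slice(from)
  ).result();
};

// Read a single document by its URI
var getCountryInfo = function(uri) {
  return db.documents.read(uri).result();
};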

The getPaginationData() function returns the pagination information. getDocuments() returns documents from the database and accepts a from parameter, which determines the start index of the results. Finally, getCountryInfo() returns a whole document from the database based on its URI.

Let's see what the index route handler looks like:

var index = function(req, res) {
  var pageData = {};
  // Default to the first page when no valid page number is supplied
  var page = parseInt(req.params.page, 10) || 1;
  getPaginationData().then(function(data) {
    var totalDocuments  = data.total;
    var perPage         = data['page-length'];
    // Round up so that a partially filled last page is counted too
    pageData.totalPages = Math.ceil(totalDocuments / perPage);
    // Slice indexes are 1-based: page 1 starts at 1, page 2 at perPage + 1
    return getDocuments((page - 1) * perPage + 1);
  }).then(function(documents) {
    pageData.result = documents.map(function(document) {
      return document.content.id;
    });
    // Render even when the page is empty so the request never hangs
    res.render('index', {data: pageData});
  }).catch(function(error) {
    console.log('Error', error);
    res.status(500).send('Unable to load countries');
  });
};

The main thing to note here is how we build up the pageData object. This object is sent to the rendering engine (Jade) and allows us to access its content from our frontend.

The country route is a lot simpler - we have to make sure though that the right URI is passed in to the getCountryInfo function:

var country = function(req, res) {
  var country = req.params.country;
  var referer = req.headers.referer;
  // Lowercase the name and strip whitespace to build the document URI
  var uri     = '/country/' + country.toLowerCase().replace(/\s/g, '') + '.json';
  getCountryInfo(uri).then(function(countryInfo) {
    countryInfo[0].content.referer = referer;
    res.render('country', {data: countryInfo[0].content});
  }).catch(function(error) {
    console.log('Error', error);
  });
};
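
The URI-building step is easy to get wrong, so it can help to pull it into a small function and try it on a few names. A sketch; the toCountryUri name is hypothetical, and it assumes the documents were loaded under URIs of the form /country/<lowercased name without spaces>.json, as the handler above expects:

```javascript
// Hypothetical helper mirroring the handler's URI logic:
// lowercase the name, strip all whitespace, wrap it in /country/<name>.json
function toCountryUri(name) {
  return '/country/' + name.toLowerCase().replace(/\s/g, '') + '.json';
}

console.log(toCountryUri('Italy'));
console.log(toCountryUri('South Africa'));
```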

Inside the frontend we can iterate through the pageData object using Jade's each ... in iterator:

each country in data.result
  p
    a(href='/country/' + country) #{country}
- var n = 1;
nav
  ul.pagination
    while n <= data.totalPages
      li
        a(href='/' + n)= n++

Here's an animated GIF to see the final application in action:

The codebase for this app is on GitHub. I encourage you to have a look at the code and to give the MarkLogic Node.js Client API a go.
