MongoDB vs MySQL (aka MEAN vs LAMP) for a Big Data project

I posted a lot of articles in the past about the MEAN stack (MongoDB, ExpressJS, AngularJS, Node.js) discussing various projects and works that I have been developing. I come from a traditional LAMP (Linux, Apache, MySQL, PHP) background and have always worked with these technologies so it was an interesting change to work with only JavaScript. The reason why I mention this is because I started collaboration on a project that is originally built with LAMP and I have managed to convince the stakeholders to use the MEAN stack instead, and there are multiple benefits to this which I will try to explain in this article concentrating on mostly the 'M' part, that is, MongoDB vs MySQL.

The project in question is heavily utilising geolocation services, geo lookups and this means that it's executing frequent, complex database queries - queries that include joins and sorts, and this is, unfortunately the drawback of the project. The amount of data being stored in the various underlying database tables could potentially mean that this project is actually a 'big data' project. There are about 280k entries in a table that is being joined with two other tables each having about 10k rows.

Just a quick note here - if you ever want to count the number of rows in your MySQL database and break it down by table-by-table, you can do that by executing the following SQL query:

SELECT table_name, TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'yourDB';

Also note that the above will work well for MyISAM tables but not for innoDB, as per the documentation: "For InnoDB tables, the row count is only a rough estimate used in SQL optimization."

In mongoDB you would have to run the following small script to get a similar count, first to list all collections from the database:

var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/yourDB, function(err, db) {
    if(err) throw err;
    db.collectionNames({namesOnly: true}, function(err, coll) {
      if(err) throw err;
      console.log(coll)
  });
});

And then iterate through coll and call the count command.

Why would someone want to use MongoDB over MySQL at the first place - or in other words, what differences are there between the two database systems?

The no 's' - no schema, no SQL
In MongoDB you don't define schemas nor you run SQL queries. Any document that you insert into a collection can have different keys, and you can end up having inconsistent JSON documents. This is great if you have an application that you are extending or changing rapidly - adding a new field no longer means that you have to alter the table and run UPDATE commands.

Also, there are no SQL queries per se, all the data retrieval happens via standard CRUD calls (Create, Read, Update, Delete -Insert, Find, Update, Remove as they are called in MongoDB).

Document storage
MongoDB originally started to be a document management system and later it evolved to be a database system, henceforth every record is a JSON document within a collection (collections are tables in MySQL).

Geo-spatial indexes
In MongoDB indexes can be applied to pretty much everything, there's absolutely no limitation, furthermore 2D and 3D spatial indexes can be added as well.

The project in question is going to use latitude and location points (2D spatial indexing) and will find nearby points using MongoDB's built in $near feature:

db.places.find( { loc : { $near :
                           { $geometry :
                               { type : "Point" ,
                                 coordinates: [ 40 , 5 ] } },
                             $maxDistance : 100
                } } )

I can also add a limit to this call which will only return a subset of the JSON documents.

The project itself is ambitious and has a great potential to being used by hundreds of thousand people from mobile devices. I will keep on blogging about the progress and will get you insights about how it works - hopefully at the end I will be able to share a great tutorial as well on geospatial data used in MongoDB with more code examples.

I am also hoping to present the project, it's pitfalls and advantages at a conference in the near future.

Show Comments