MongoDB vs MySQL (aka MEAN vs LAMP) for a Big Data project

Older Article

This article was published 13 years ago. Some information may be outdated or no longer applicable.

I’ve written plenty about the MEAN stack (MongoDB, ExpressJS, AngularJS, Node.js) across various projects. I come from a traditional LAMP (Linux, Apache, MySQL, PHP) background and spent years building with those tools. Moving to pure JavaScript was a genuine shift. I mention this because I recently joined a project originally built on LAMP and convinced the stakeholders to switch to MEAN. There are real benefits to this, and I’ll walk through them here, focusing mostly on the “M” part: MongoDB vs MySQL.

The project leans heavily on geolocation services and geo lookups. That means frequent, complex database queries with joins and sorts, and that’s where the pain sits. The volume of data across the underlying tables pushes this into “big data” territory. One table holds about 280k entries, joined against two other tables each carrying roughly 10k rows.

Quick tip: if you want to count rows in your MySQL database broken down table by table, run the following SQL query:

SELECT table_name, TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'yourDB';

Note that this works well for MyISAM tables but not for InnoDB. Per the documentation: “For InnoDB tables, the row count is only a rough estimate used in SQL optimisation.” If you need an exact figure for an InnoDB table, run SELECT COUNT(*) against it directly, at the cost of a full table scan.

In MongoDB you’d run a small script to get a similar count. First, list all collections from the database:

var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/yourDB', function(err, db) {
  if (err) throw err;
  db.collectionNames({namesOnly: true}, function(err, coll) {
    if (err) throw err;
    console.log(coll);
  });
});

Then iterate over coll and call count() on each collection.
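The iteration itself is short. One wrinkle: depending on the driver version, collectionNames returns either plain strings or { name: ... } documents, and the names may come back prefixed with the database name. A small helper (the names yourDB, places and users below are hypothetical) normalises that before counting:

```javascript
// Normalise whatever collectionNames returned into bare collection
// names: accept strings or { name: ... } docs, strip the 'yourDB.'
// database prefix if present. Pure function, no server needed.
function bareNames(entries, dbName) {
  return entries.map(function (entry) {
    var full = typeof entry === 'string' ? entry : entry.name;
    return full.replace(dbName + '.', '');
  });
}

console.log(bareNames(['yourDB.places', { name: 'yourDB.users' }], 'yourDB'));
// → [ 'places', 'users' ]

// With a live connection (the db object from the snippet above),
// count the documents in each collection:
// bareNames(coll, 'yourDB').forEach(function (name) {
//   db.collection(name).count(function (err, n) {
//     if (err) throw err;
//     console.log(name, n);
//   });
// });
```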

So why pick MongoDB over MySQL? What actually separates them?

The no ‘s’ - no schema, no SQL

MongoDB doesn’t enforce schemas, and you don’t write SQL queries. Any document you insert into a collection can carry different keys, so you can end up with inconsistent JSON documents. That’s a feature, not a bug, when your application changes rapidly: adding a new field no longer means running an ALTER TABLE and backfilling existing rows with UPDATE statements.

There are no SQL queries as such. All data retrieval happens through standard CRUD calls (Create, Read, Update, Delete, or Insert, Find, Update, Remove in MongoDB’s terminology).
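To make the schemaless point concrete, here are two plain JavaScript objects that could both be inserted into the same hypothetical users collection, even though their keys differ (field names here are made up for illustration):

```javascript
// Two documents destined for the same collection. MongoDB would
// accept both inserts even though their keys don't match.
var userA = { name: 'Ada', email: 'ada@example.com' };
var userB = { name: 'Bob', signupSource: 'mobile', tags: ['beta'] };

// "Adding a field" is just setting a new key on the object before
// (or after) insert — no ALTER TABLE, no migration.
userA.tags = ['admin'];

console.log(Object.keys(userA));
// → [ 'name', 'email', 'tags' ]
```

In MySQL, userB’s extra signupSource column would have had to exist on the table up front; here each document simply carries whatever it needs.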

Document storage

MongoDB started life as a document management system before evolving into a full database. Every record is a JSON document living inside a collection (collections are the equivalent of tables in MySQL).
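A single document can also embed data that MySQL would normalise into separate, joined tables. A sketch of one such document (the places collection and its fields are this project’s hypothetical shape, not MongoDB requirements):

```javascript
// One 'places' document. The embedded reviews array replaces what
// would be a separate reviews table joined by place_id in MySQL.
var place = {
  name: 'Coffee House',
  loc: { type: 'Point', coordinates: [40, 5] },
  reviews: [
    { user: 'Ada', rating: 5 },
    { user: 'Bob', rating: 4 }
  ]
};

// Reading the "joined" data is just property access — no JOIN.
console.log(place.reviews.length);
// → 2
```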

Geo-spatial indexes

In MongoDB, indexes can be applied to practically any field, including fields inside embedded documents and arrays. Geospatial indexes are available too: 2d for flat (planar) coordinates and 2dsphere for spherical, Earth-like geometry.
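Before $near queries on GeoJSON points will work, the location field needs such an index. In the mongo shell that’s a one-liner (places and loc are this project’s hypothetical collection and field names):

```
db.places.ensureIndex({ loc: '2dsphere' });
```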

This project will use latitude and longitude points (2D spatial indexing) and find nearby points using MongoDB’s built-in $near operator. Note that GeoJSON coordinates are ordered longitude first, then latitude:

db.places.find({
  loc: {
    $near: {
      $geometry: { type: 'Point', coordinates: [40, 5] },
      $maxDistance: 100
    }
  }
});

I can also chain limit() onto this call to return only the first N matching documents, ordered nearest first.
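Conceptually, $near with a limit boils down to sorting points by their distance from the query point and keeping the closest N — MongoDB just does it through the index instead of a full scan. A naive pure-JavaScript sketch of that idea, using a planar distance approximation and made-up data:

```javascript
// Hypothetical in-memory stand-in for the 'places' collection.
var places = [
  { name: 'A', coordinates: [40, 5] },
  { name: 'B', coordinates: [41, 6] },
  { name: 'C', coordinates: [39.9, 5.1] }
];

// Straight-line distance between two [x, y] points. A real geo query
// would use spherical (great-circle) distance instead.
function planarDistance(a, b) {
  var dx = a[0] - b[0];
  var dy = a[1] - b[1];
  return Math.sqrt(dx * dx + dy * dy);
}

// What $near + limit does conceptually: sort ascending by distance
// from the query point, take the first N.
function nearest(query, docs, limit) {
  return docs
    .slice()
    .sort(function (p, q) {
      return planarDistance(query, p.coordinates) -
             planarDistance(query, q.coordinates);
    })
    .slice(0, limit);
}

console.log(nearest([40, 5], places, 2).map(function (p) { return p.name; }));
// → [ 'A', 'C' ]
```

This is an illustration of the semantics, not how the server implements it: the whole point of the 2dsphere index is to avoid scanning every document the way this sketch does.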

The project itself is ambitious and could end up serving hundreds of thousands of people from mobile devices. I’ll keep posting about the progress and sharing how it works. With any luck, I’ll have a full tutorial on geospatial data in MongoDB with more code examples by the end.

I’m also hoping to present the project (its pitfalls and advantages) at a conference soon.