
How is MarkLogic different from MongoDB?



This article was published 10 years ago. Some information may be outdated or no longer applicable.

Every conference. Every meetup. Someone asks me: “How is MarkLogic different from MongoDB?” (Or some variation of it.) So I figured I’d put all the answers in one place. Here’s the showdown.

Comparing these two databases means looking at both the technical and the business angle. We’ll focus on the technical differences here, though technical choices often trigger business-level decisions.

ACID transactions

You’ve probably heard someone say “NoSQL means no ACID.” Not true. MarkLogic has full ACID support.

Quick refresher on what ACID means. It’s an acronym: Atomicity, Consistency, Isolation, Durability. These properties guarantee reliable database transactions. Atomicity means every part of a transaction must succeed, or the whole thing fails. Consistency guarantees the database moves from one valid state to another. Isolation means concurrent transactions don’t step on each other. Durability means committed changes persist, even if the system crashes.
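To make atomicity concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module purely as a stand-in transactional store. This is not MarkLogic’s API, and the `transfer` function and `accounts` table are hypothetical. A transfer that fails part-way through leaves no trace.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two hypothetical accounts as one atomic transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back every statement in the block if anything raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Abort: the debit above is undone together with the rest.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    transfer(conn, "alice", "bob", 150)  # fails after the debit has run
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}: the partial debit was rolled back
```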

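There are two common ways to implement transactional isolation: locking or multiversioning. MarkLogic uses multiversion concurrency control (MVCC). Under MVCC, an update writes a new version of a document instead of overwriting it in place, so a reader keeps seeing the version that was current when it started. That means you can still read a document while it’s being written to, and reading doesn’t block writing.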
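The visibility rule behind MVCC fits in a few lines. This is a toy in-memory model, assuming a single global commit counter; `MVCCStore` and its methods are hypothetical names, not MarkLogic’s internal design. Writes append versions, and each reader sees only the newest version committed at or before its snapshot.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MVCCStore:
    """Toy multiversion store: every write appends a new version tagged with
    a commit timestamp; readers pick the newest version visible to them."""
    clock: int = 0
    versions: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def snapshot(self) -> int:
        # A reader's snapshot is simply the latest commit timestamp.
        return self.clock

    def write(self, key: str, value: str) -> None:
        # Writers never overwrite in place, so existing readers are unaffected.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key: str, as_of: int) -> Optional[str]:
        # Return the newest version committed at or before the snapshot.
        visible = [value for ts, value in self.versions.get(key, []) if ts <= as_of]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("doc1", "v1")
reader = store.snapshot()                     # a reader starts here
store.write("doc1", "v2")                     # a concurrent write lands afterwards
print(store.read("doc1", reader))             # v1: the reader's view stays stable
print(store.read("doc1", store.snapshot()))   # v2: new readers see the update
```

A real engine also has to garbage-collect versions no reader can still see; the sketch keeps them all for simplicity.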

MongoDB, by contrast, uses what’s called “eventual consistency.” Say you’ve got a cluster of MongoDB servers. Clients can hit any of them. You update a document. With MongoDB’s default settings, there’s no guarantee that update is durable before it’s acknowledged, and no guarantee it’ll be replicated to a majority of servers before someone reads.

Even with the strictest settings (all reads from a single primary server using majority read concern, all writes using majority write concern), you can still end up with stale or uncertain data. If the primary server records Update A but fails before replicating it to a majority of nodes, the client gets an error instead of an acknowledgement and assumes the write failed. But Update A might’ve reached some secondaries, and if one of those becomes the new primary, the update survives. The client has no way to know whether its write actually landed.
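The failure scenario above can be modelled as a toy simulation. This is a deliberately simplified sketch, not MongoDB’s replication or election protocol: `Node` and `write_then_crash` are hypothetical, the primary is assumed to reach exactly one secondary in a five-node set before crashing, and the model just lists what each possible new primary would mean for the write.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    log: list[str] = field(default_factory=list)

def write_then_crash(nodes: list[Node], update: str, replicate_to: int) -> bool:
    """Primary (nodes[0]) applies `update`, copies it to `replicate_to`
    secondaries, then crashes. Returns True only if a majority held the write,
    i.e. whether a majority write concern could have acknowledged it."""
    majority = len(nodes) // 2 + 1
    holders = nodes[: 1 + replicate_to]  # the primary plus the secondaries it reached
    for node in holders:
        node.log.append(update)
    return len(holders) >= majority

nodes = [Node(f"n{i}") for i in range(5)]
acknowledged = write_then_crash(nodes, "Update A", replicate_to=1)
survivors = nodes[1:]  # the old primary n0 is gone

# True: electing that node keeps Update A; False: electing it means Update A is lost.
outcomes = {n.name: "Update A" in n.log for n in survivors}
print(acknowledged)  # False: the client saw an error
print(outcomes)      # {'n1': True, 'n2': False, 'n3': False, 'n4': False}
```

The client only ever sees `acknowledged`; which entry of `outcomes` becomes reality depends on an election it cannot observe.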

Sharding & Clusters

How do these databases scale out? Ideally, you just add servers and everything keeps humming. MarkLogic does exactly that. You can mix cloud and physical server deployments and still maintain high availability.

MongoDB makes you provision all the hardware for a highly available cluster up front. You also need to choose a shard key: the field (or fields) that determines how documents get distributed across servers. Once you’ve picked that key, you’re stuck with it. Changing it requires a five-step process that starts with dumping all your data out of MongoDB. Sharding also breaks several critical features, including point-in-time recovery, in-document isolation, certain secondary indexes, and some performance options. MongoDB’s own docs tell you to plan around these limitations.
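Why is the shard key so hard to change? Every document’s placement is computed from it. The sketch below uses a hypothetical hash-based router (`shard_for`), a simplified stand-in rather than MongoDB’s actual chunk-based routing, to show that switching keys reassigns most documents, and that a low-cardinality key can never spread data across all shards.

```python
import hashlib

def shard_for(doc: dict, shard_key: str, num_shards: int) -> int:
    """Hypothetical hash-based router: pick a shard from the shard-key value."""
    digest = hashlib.sha256(str(doc[shard_key]).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

docs = [{"user_id": i, "region": ["eu", "us", "apac"][i % 3]} for i in range(1000)]

by_user = [shard_for(d, "user_id", 4) for d in docs]
by_region = [shard_for(d, "region", 4) for d in docs]

# Switching keys means recomputing placement for every document and moving the mismatches.
moved = sum(a != b for a, b in zip(by_user, by_region))
print(f"{moved} of {len(docs)} documents would change shards")
# Only three distinct region values, so at most three of the four shards receive data.
print("shards used when keyed on region:", sorted(set(by_region)))
```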

MarkLogic customers get auto-scaling tools that keep performance steady whether you’re running dozens of 3-node AWS clusters or a single fifty-node physical cluster. Because MarkLogic can act as an application server, database, and search engine simultaneously, topologies stay smaller and simpler.

Indexes

When you load a document into MarkLogic, the system automatically indexes word tokens and document structure. This index (the Universal Index) gives you search out of the box. You can add term list indexes for “yes or no” questions: do any documents contain the term ‘xyz’? You can enable wildcard search indexes too. For inequality queries (“show me documents where price < 25”), you’d set up range indexes against XML elements, attributes, or JSON properties. MarkLogic also supports geospatial indexes and triple indexes (yes, it’s a triplestore too, which matters if you work with semantics or RDF triples).
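Two of the index types above, a word-token index for yes/no term questions and a range index for inequality queries, can be sketched with plain Python structures. This is an illustration of the ideas only, assuming tiny in-memory data; it is not MarkLogic’s Universal Index or its API, and `contains_term` and `price_less_than` are hypothetical helpers.

```python
import bisect
import re
from collections import defaultdict

docs = {
    "d1": {"title": "Red apple tart", "price": 12},
    "d2": {"title": "Green apple pie", "price": 30},
    "d3": {"title": "Pear crumble", "price": 18},
}

# Word-token index: term -> set of document IDs (a toy stand-in for a term list).
term_index: dict[str, set[str]] = defaultdict(set)
for doc_id, doc in docs.items():
    for token in re.findall(r"\w+", doc["title"].lower()):
        term_index[token].add(doc_id)

# Range index: (value, doc ID) pairs kept sorted so inequality queries can binary-search.
price_index = sorted((doc["price"], doc_id) for doc_id, doc in docs.items())

def contains_term(term: str) -> bool:
    """Yes/no question answered from the index alone, without opening any document."""
    return bool(term_index.get(term.lower()))

def price_less_than(limit: float) -> list[str]:
    """Documents with price < limit; (limit,) sorts before (limit, id), so equal prices are excluded."""
    cut = bisect.bisect_left(price_index, (limit,))
    return [doc_id for _, doc_id in price_index[:cut]]

print(contains_term("apple"))  # True
print(contains_term("plum"))   # False
print(price_less_than(25))     # ['d1', 'd3']
```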

MongoDB? Its basic index covers a single field, and your queries can only use two indexes at a time. Anything more complex requires compound indexes, which are single structures referencing multiple fields. The catch: a compound index stores its keys in a fixed order and direction, which restricts how you sort. Want to sort on two keys ascending and one descending? Build another index. Different sort order? Yet another index. MongoDB’s own tech support recommends building compound indexes rather than querying across two indexes.
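The sort restriction can be written down as a rule. The sketch below models it in plain Python, as MongoDB documents it for sorts without equality filters: the sort fields must be a prefix of the index fields, and the directions must either all match or all be inverted (a backwards walk). `index_supports_sort` is a hypothetical helper, not MongoDB code.

```python
def index_supports_sort(index: list[tuple[str, int]], sort: list[tuple[str, int]]) -> bool:
    """True if walking `index` forwards or backwards yields documents in `sort` order.

    Both arguments are (field, direction) pairs with direction 1 or -1.
    Simplified: ignores equality filters on leading index fields.
    """
    if len(sort) > len(index):
        return False
    prefix = index[: len(sort)]
    if [f for f, _ in prefix] != [f for f, _ in sort]:
        return False  # sort fields must be a prefix of the index fields, in order
    same = all(d1 == d2 for (_, d1), (_, d2) in zip(prefix, sort))
    flipped = all(d1 == -d2 for (_, d1), (_, d2) in zip(prefix, sort))
    return same or flipped

idx = [("cuisine", 1), ("rating", 1), ("date", -1)]
print(index_supports_sort(idx, [("cuisine", 1), ("rating", 1), ("date", -1)]))   # True
print(index_supports_sort(idx, [("cuisine", -1), ("rating", -1), ("date", 1)]))  # True: reverse walk
print(index_supports_sort(idx, [("cuisine", 1), ("rating", -1)]))                # False: needs another index
print(index_supports_sort(idx, [("rating", 1)]))                                 # False: not a prefix
```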

Collections

Document-based NoSQL databases need a way to organise documents. In relational databases, you’ve got tables. In NoSQL, you’ve got collections: category labels that group documents together.

MongoDB only lets you put a document in one collection. That’s a limitation. What if you want a document in two or more? Consider a recipe for a vegetarian Asian dish. You could drop it into a “recipes” collection. Wouldn’t it be useful to also have it in “vegetarian” and “asian” without touching the document itself? MarkLogic collections let you slice your data however you like. Query all recipes, or just the vegetarian ones, or just the Asian ones when your dinner guests have specific tastes.
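The recipe example can be modelled by treating collections as a set of labels attached to each document URI, separate from the document body. This is a plain-Python sketch of the idea, not MarkLogic’s API; the URIs and the helpers `add_to_collections` and `in_collection` are hypothetical.

```python
from collections import defaultdict

# Each document URI maps to a set of collection names; the document itself is untouched.
collections: dict[str, set[str]] = defaultdict(set)

def add_to_collections(uri: str, *names: str) -> None:
    collections[uri].update(names)

def in_collection(*names: str) -> list[str]:
    """URIs that belong to every named collection."""
    return sorted(uri for uri, cols in collections.items() if set(names) <= cols)

add_to_collections("/recipes/mapo-tofu-veg.json", "recipes", "vegetarian", "asian")
add_to_collections("/recipes/beef-stew.json", "recipes")
add_to_collections("/recipes/lentil-soup.json", "recipes", "vegetarian")

print(in_collection("recipes"))              # all three URIs
print(in_collection("vegetarian"))           # ['/recipes/lentil-soup.json', '/recipes/mapo-tofu-veg.json']
print(in_collection("vegetarian", "asian"))  # ['/recipes/mapo-tofu-veg.json']
```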

Search

Search has been part of MarkLogic’s core from the beginning. It wasn’t bolted on after the fact.

When you run a query, documents come back ranked by relevance. By default, the score weighs how often the query terms appear in each document against how common those terms are across the full document set. You’ve got multiple ways to tweak relevance scoring, giving you fine control over what the final result set looks like.
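Here is a sketch of that kind of relevance ranking. It computes a generic log-scaled TF-IDF score, assuming simple word tokenization; it is not MarkLogic’s exact formula or tuning options, and `tokenize` and `rank` are hypothetical helpers.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document with a log-scaled TF-IDF sum over the query terms."""
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, tf in tokenized.items():
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for counts in tokenized.values() if term in counts)
            if tf[term] == 0 or df == 0:
                continue
            # Frequent in this document raises the score; common across the set lowers it.
            score += (1 + math.log(tf[term])) * math.log(n_docs / df)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {"a": "tofu curry with tofu", "b": "beef curry", "c": "green salad"}
ranking = rank("tofu curry", docs)
print([doc_id for doc_id, _ in ranking])  # ['a', 'b', 'c']
```

"tofu" appears in only one document, so it contributes more than "curry", which two documents share.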

The search toolkit is broad: geospatial data, snippets, facets, highlighted results, type-ahead, multi-language support, stemming. All built in.
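To give a feel for one item on that list, type-ahead can be built on a sorted lexicon of terms: binary-search to the first term at or after the typed prefix, then read forward while terms still match. This is a plain-Python sketch of the general technique, not how MarkLogic implements suggestions; `suggest` and the sample lexicon are hypothetical.

```python
import bisect

def suggest(lexicon: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` terms from a sorted lexicon that start with `prefix`."""
    start = bisect.bisect_left(lexicon, prefix)  # first term >= prefix
    out = []
    for term in lexicon[start:]:
        if not term.startswith(prefix) or len(out) == limit:
            break
        out.append(term)
    return out

lexicon = sorted(["tofu", "tomato", "tomatillo", "toast", "turmeric", "tahini"])
print(suggest(lexicon, "tom"))  # ['tomatillo', 'tomato']
```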

MongoDB’s query interface works like a standard database WHERE clause: a document either matches or it doesn’t. Its standard queries have no relevance ranking. Documents come back in natural order unless you sort by a property. MongoDB does offer text search via text indexes and the $text operator, but it’s limited and its scalability is questionable. For real search, you’ll need a third-party solution, which means a more complex architecture and more skills to hire for.

Security

MarkLogic 8 received Common Criteria certification. That’s the most widely recognised security certification for IT products globally, accepted by 25 countries including the United States, Canada, India, Japan, Australia, and many EU nations. MarkLogic is one of only six DBMS vendors to hold this certification, and the only NoSQL company in that group. Security’s been baked into the product from day one.

At the start of this post, I mentioned that technical understanding can drive business decisions. If you’ve read through these points, you can see why an organisation would want a database with full ACID support, built-in search, and a strong security model.

If you’re interested, download the latest version of MarkLogic. It’s available for multiple operating systems and comes with a free developer licence.