Converting your data from MySQL to MongoDB
Older Article
This article was published 13 years ago. Some information may be outdated or no longer applicable.
As I mentioned in my previous post, I’m working with a friend on a project that was originally built for LAMP and is now being ported to the MEAN stack.
The very first job was converting data from MySQL to MongoDB. That’s what this post covers.
Note that some of the code snippets in this article use Italian variable names. We’re working on renaming those to English for better readability.
When converting data, keep in mind there’s a massive difference in how you normalise data between MySQL and MongoDB. Tables disappear. Collections take their place. These collections can hold nested data (data that used to live in separate tables). Instead of writing JOIN statements, you’ll use JavaScript array/object operations.
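To make that concrete, here's a minimal sketch in plain JavaScript (hypothetical data, not the project's real schema): because the stops a line serves are embedded directly in the line document, the "does this line stop here?" question that needed a JOIN in MySQL becomes a single array operation.

```javascript
// Hypothetical denormalised line document: the stops it serves are
// embedded directly, where MySQL would keep them in a separate table.
var line = {
  Line: "105",
  Vehicle: "bus",
  Stops: [70101, 70102, 73565]
};

// "Does this line pass through stop 73565?" is now an array lookup,
// not a JOIN:
var servesStop = line.Stops.indexOf(73565) !== -1;
console.log(servesStop); // true
```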
At a high level, these are the tables that make up the existing database:
- Services: holds information about public transportation services (buses, trams, underground, etc.), such as line number, last stop, and type of transport
- Schedule: lists the departure schedules for the services at each stop
- Stops: holds all the possible stops and stores various information, such as the GPS position of each stop
- Route: stores the route of a particular transportation service, with ordinal stop numbers (1st to Nth)
Various joins are needed to extract the right information. For example, if I want to display departure times for a certain service, I need to join the Services and Schedule tables together.
Here’s what a “simple” query looks like right now to get a list of public transport vehicles passing through a given stop:
SELECT DISTINCT
    Linea     AS 'Line',
    Capolinea AS 'Direction',
    Tipo      AS 'Vehicle'
FROM
    percorsi AS routes,
    linee AS lines
WHERE
    lines.Percorso = routes.Percorso
    AND Palina = 73565
(For the super curious: bus stop 73565 is Via Collatina, Rome.)
The query looks short, but it takes a noticeable amount of time to execute. Once it finishes, the dataset still needs massaging into a JSON object.
So how does the same information sit in MongoDB? Instead of four tables, there are just two collections:
Stops collection
{
  "stopID": "ID",
  "name": "name",
  "GPS": {
    "type": "Point",
    "coordinates": ["long", "lat"]
  }
}
Note: the GPS field uses a 2dsphere index rather than the plain 2d index, which means it requires MongoDB 2.4 or newer. (GeoJSON also expects coordinates in [longitude, latitude] order.)
Line collection
{
  "Route": "routeID",
  "Line": "line",
  "Description": "Description",
  "Vehicle": "vehicle",
  "Stops": ["1st", "Nth"],
  "Departures": ["date", "date", "date"]
}
The whole structure is cleaner and faster. So the query shown above now becomes:
var lts = db.Lines.find({ Stops: 73565 }).toArray(); // Lines through the stop
for (var i = 0; i < lts.length; i++) {
    var lastStopID = lts[i].Stops[lts[i].Stops.length - 1]; // the last stop gives the direction
    var direction = db.Stops.findOne({ stopID: lastStopID }, { name: 1 }).name;
    print({ Line: lts[i].Line, Vehicle: lts[i].Vehicle, Direction: direction });
}
It might look dense at first glance, but it’s actually a fairly simple way of pulling out the same information.
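To see the shape of that logic without a running mongod, here's the same lookup over plain JavaScript arrays standing in for the two collections (hypothetical sample data):

```javascript
// In-memory stand-ins for the two collections (hypothetical sample data).
var Lines = [
  { Line: "105", Vehicle: "bus", Stops: [70101, 73565, 70999] }
];
var Stops = [
  { stopID: 73565, name: "Via Collatina" },
  { stopID: 70999, name: "Grotte Celoni" }
];

// Same steps as the shell query: find the lines through stop 73565,
// then look up the name of each line's last stop for the direction.
var results = Lines.filter(function (l) {
  return l.Stops.indexOf(73565) !== -1;
}).map(function (l) {
  var lastStopID = l.Stops[l.Stops.length - 1];
  var lastStop = Stops.filter(function (s) {
    return s.stopID === lastStopID;
  })[0];
  return { Line: l.Line, Vehicle: l.Vehicle, Direction: lastStop.name };
});

console.log(results);
```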
Now, the actual conversion. There’s no direct path from MySQL to MongoDB, but a few techniques work well.
If your database is simple and you only have individual tables to import (meaning no JOINs between them), you can export your data from MySQL to a CSV file and then import that CSV with mongoimport into MongoDB.
To export from MySQL to CSV, use a query like this:
SELECT * FROM books INTO OUTFILE '/tmp/books.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
If your MySQL server sits on a remote machine and you want to copy the books.csv file onto your server, have a look at automating such processes using scp and expect.
Once you have your CSV file, run a command like this. Note that SELECT ... INTO OUTFILE doesn't write a header row, so list the column names with --fields (adjust the names to match your table):
mongoimport --db users --collection books --type csv --fields "title,pages,author_id" --file /tmp/books.csv
This works nicely for single tables or sets of tables with no relationship between them. But relational databases usually have, well, relations. My advice for converting those:
Write a script in your preferred language (PHP, Python, Perl, etc.) that queries these tables using JOIN statements and returns the dataset in JSON format. Save that to disk and import it later with mongoimport. It's hard to give a universal example since datasets vary wildly, but here's the conceptual idea.
Say I have a Books table with book titles, page numbers, and an Author ID. I also have an Author table with a name, author ID, and publisher ID. And a Publisher table with a Publisher ID and name. The relationship: one Publisher can have multiple Authors. One Author belongs to one Publisher. One Author can have multiple books. Our MongoDB could look like this:
Book collection
{
  "title": title,
  "pages": pagenumbers,
  "author": authorID
}
Author collection
{
  "name": author-name,
  "ID": author-id
}
Publisher collection
{
  "name": publisher-name,
  "ID": publisher-id,
  "authors": [author-id1, author-id2]
}
You could probably stuff all this information into one big collection called ‘Books’ instead of creating three separate ones:
{
  "title": title,
  "pages": pagenumbers,
  "author": [{
    "id": author-id,
    "name": author-name
  }],
  "publisher": {
    "name": publisher-name
  }
}
But for this article we’re assuming three collections.
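As a sketch of what that script has to do (plain JavaScript, with made-up column names, operating on the flat rows a JOIN query would return), splitting the rows into the three collections looks roughly like this:

```javascript
// Flat rows as a JOIN across Books, Authors and Publishers might
// return them (hypothetical sample data and column names).
var rows = [
  { title: "Book A", pages: 200, author_id: 1, author_name: "Alice", publisher_name: "Acme" },
  { title: "Book B", pages: 350, author_id: 1, author_name: "Alice", publisher_name: "Acme" }
];

// Books collection: one document per row.
var books = rows.map(function (r) {
  return { title: r.title, pages: r.pages, author: r.author_id };
});

// Authors collection: de-duplicated by author ID.
var authorsById = {};
rows.forEach(function (r) {
  authorsById[r.author_id] = { name: r.author_name, ID: r.author_id };
});
var authors = Object.keys(authorsById).map(function (k) { return authorsById[k]; });

// Publishers collection: one document per publisher, listing its author IDs.
var publishersByName = {};
rows.forEach(function (r) {
  var p = publishersByName[r.publisher_name] || { name: r.publisher_name, authors: [] };
  if (p.authors.indexOf(r.author_id) === -1) { p.authors.push(r.author_id); }
  publishersByName[r.publisher_name] = p;
});
var publishers = Object.keys(publishersByName).map(function (k) { return publishersByName[k]; });
```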
The script’s job is to pull the right data in the right format from MySQL and save it. Or, better yet, the same script could generate a JSON file and immediately call mongoimport. (If you take this approach, I’d recommend a few test runs to make sure your JSON structure is correct before importing into MongoDB.) Here’s an example structure:
$my_file = 'data.json';
$con = mysqli_connect("localhost", "root", "rootpwd", "database");
if (mysqli_connect_errno()) {
    die("Failed to connect to MySQL: " . mysqli_connect_error());
}
$result = mysqli_query($con, "SELECT * FROM Books b JOIN Authors a ON b.author_id = a.id");
// Collect every row, not just the first one
$rows = array();
while ($row = mysqli_fetch_assoc($result)) {
    $rows[] = $row;
}
mysqli_close($con);
// Write the rows out as JSON for mongoimport
$handle = fopen($my_file, 'w') or die('Cannot open file: ' . $my_file);
fwrite($handle, json_encode($rows));
fclose($handle);
The example above is crude and should only give you a rough idea of how to approach this. You’ll probably need to massage your JSON data further.
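For instance, one common bit of massaging (again a hypothetical sketch, in plain JavaScript with made-up column names) is nesting the joined author columns inside each book document and emitting one JSON document per line, which is the format mongoimport accepts by default:

```javascript
// Flat JOINed rows (hypothetical column names).
var rows = [
  { title: "Book A", pages: 200, author_id: 1, author_name: "Alice" },
  { title: "Book B", pages: 350, author_id: 2, author_name: "Bob" }
];

// Nest the author columns into an embedded sub-document.
var docs = rows.map(function (r) {
  return {
    title: r.title,
    pages: r.pages,
    author: { id: r.author_id, name: r.author_name }
  };
});

// One JSON document per line: what mongoimport expects by default.
var output = docs.map(function (d) { return JSON.stringify(d); }).join("\n");
console.log(output);
```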
To sum up: when converting your database, make sure you do proper de-normalisation. You can likely store two or three tables’ worth of information in one collection as an array or nested object. If you need automated conversions, you’ll have to write the code yourself and format the CSV or JSON file to fit your needs.
In the next post I’ll write about MongoDB’s geospatial features, which are brilliant. Thanks for reading.