MongoDB Schema Design: Data as Documents
This is a write-up of Kyle Banker's talk at the 2010 MongoUK conference in London. It was very useful; I only wish I'd been able to note more of it down.
Introduction
Kyle's advice is to get to know the database by experimenting with it. Head over to try.mongodb.org and get comfortable just playing around with data.
It's important to think about the design goals of MongoDB. Mongo is something between a key-value store and a relational database. Key-value stores don't know anything about the content of their values; they're very simple, and that simplicity is why they scale extremely well.
A relational database, on the other hand, has a rich understanding of the structure of your data. Mongo aims for the middle ground: it scales really well, but still understands something about your data.
The basic unit of data in MongoDB is the document. There are various ways we can access the data in a document. For example, map-reduce allows us to access our documents in ways that you might not immediately expect from the structure of the document.
Data modelling
To model data in Mongo we need to think in terms of "rich documents". To explain the concept, Kyle showed a slide with a document containing several embedded documents. We can write queries that match documents based on the contents of their embedded documents.
We can update rich documents atomically:
// Append a line item to the order's line_items array in a single atomic update.
db.orders.update({'_id': order_id},
    {'$push': {'line_items':
        {'sku': 'wm-123',
         'price': 1299,
         'title': 'Wes Montgomery; Smokin\''}}});
The positional operator ($) refers to the array element matched by the query, so you can set a new quantity in a single operation:
db.orders.update(
    {'_id': order_id, 'line_items.sku': 'wm-123'},
    {'$set': {'line_items.$.quantity': 2}});
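To make the positional operator a little more concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB at all; setQuantityBySku is a hypothetical helper of my own, and the sample order is made up. It only mimics what the update above does to one order document: find the first line item whose sku matches, then set the quantity on that element alone.

```javascript
// In-memory illustration only; this is not the MongoDB API.
// setQuantityBySku is a hypothetical helper name.
function setQuantityBySku(order, sku, quantity) {
  // Query part: 'line_items.sku' matches if any array element has that sku.
  const index = order.line_items.findIndex((item) => item.sku === sku);
  if (index === -1) {
    return false; // no match, so the document is left untouched
  }
  // Update part: '$' stands for the index of the first matching element.
  order.line_items[index].quantity = quantity;
  return true;
}

// Made-up sample order with two line items.
const order = {
  _id: 1,
  line_items: [
    { sku: 'md-12', price: 2500 },
    { sku: 'wm-123', price: 1299 },
  ],
};

const updated = setQuantityBySku(order, 'wm-123', 2);
console.log(updated, order.line_items[1].quantity);
```

The other line item is never touched, which is the point of the operator: you don't need to know the element's position in advance.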
Map-reduce lets you build something similar to a SQL view. You define a map function that pulls some fields out of each document, and a reduce function that knows how to combine those fields. The results are stored in another collection, which you can then query.
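I didn't note down Kyle's map-reduce example, so the following is only a sketch of the general flow in plain JavaScript, run over an in-memory array rather than a real collection. The names mapReduce, mapOrder and sumValues, and the sample orders, are my own assumptions; this is not MongoDB's mapReduce command.

```javascript
// In-memory sketch of the map-reduce flow; not MongoDB's mapReduce command.
// All names here (mapReduce, mapOrder, sumValues) are hypothetical.
function mapReduce(documents, map, reduce) {
  const grouped = new Map();
  // Map phase: each document may emit any number of (key, value) pairs.
  for (const doc of documents) {
    map(doc, (key, value) => {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    });
  }
  // Reduce phase: collapse the values emitted for each key into one result,
  // giving documents that could be stored in an output collection.
  const output = [];
  for (const [key, values] of grouped) {
    output.push({ _id: key, value: reduce(key, values) });
  }
  return output;
}

// Pull the sku and the line total out of every line item of an order.
function mapOrder(order, emit) {
  for (const item of order.line_items) {
    emit(item.sku, item.price * item.quantity);
  }
}

// Sum the totals emitted for one sku.
function sumValues(key, values) {
  return values.reduce((total, value) => total + value, 0);
}

// Made-up sample data.
const orders = [
  { _id: 1, line_items: [{ sku: 'wm-123', price: 1299, quantity: 2 }] },
  { _id: 2, line_items: [
    { sku: 'wm-123', price: 1299, quantity: 1 },
    { sku: 'md-12', price: 2500, quantity: 1 },
  ] },
];

const revenueBySku = mapReduce(orders, mapOrder, sumValues);
console.log(revenueBySku);
```

The output array plays the role of the second collection: one document per sku, ready to be queried like any other.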
Rich documents
You can think of a rich document (i.e. a document with embedded documents within it) as a "pre-joined" data structure.
Arrays are very useful in MongoDB. The database knows how to index an array so that you can query it for documents whose arrays contain a specific value. Given a bunch of documents with this data structure and index:
link = {
    title: "MongoDB Schema Design",
    url: "http://cookbook.mongodb.org",
    tags: ['mongodb', 'schemas', 'denormalization']
}
db.links.save(link);
db.links.ensureIndex({tags: 1});
...you can query for documents that have specific tags:
db.links.find({tags: 'schemas'});
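The idea behind indexing an array is that every element gets its own index entry pointing back at the document. The sketch below shows that idea with a plain JavaScript Map. It is an in-memory illustration under my own assumptions, not how MongoDB actually stores its indexes; buildTagIndex, findByTag and the second sample link are hypothetical.

```javascript
// In-memory sketch of the idea behind indexing an array field.
// buildTagIndex and findByTag are hypothetical helper names.
function buildTagIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    // One index entry per array element, each pointing back at the document.
    for (const tag of doc.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc);
    }
  }
  return index;
}

// Look up a single tag without scanning every document.
function findByTag(index, tag) {
  return index.get(tag) || [];
}

const links = [
  { title: 'MongoDB Schema Design', url: 'http://cookbook.mongodb.org',
    tags: ['mongodb', 'schemas', 'denormalization'] },
  // Made-up second document so the lookup has something to exclude.
  { title: 'Another link', url: 'http://example.com', tags: ['mongodb'] },
];

const tagIndex = buildTagIndex(links);
const schemaLinks = findByTag(tagIndex, 'schemas');
console.log(schemaLinks.map((doc) => doc.title));
```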
Embedding vs referencing
When should you embed data within another document, and when should you store the potentially embedded data in a separate collection?
Embedded documents are good for fast queries. They are also always available with the parent document, without the need to run further queries to retrieve them.
Embedded and nested documents are good for storing complex hierarchies. Again, the document appears with the parent. They are, however, harder to query as you have to specify multiple levels when querying on inner documents.
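"Specifying multiple levels" means spelling out the full dotted path to the inner field. The plain JavaScript sketch below shows roughly what that involves. It runs over an in-memory array, the helpers getPath and findByPath and the sample products are my own inventions, and it deliberately ignores arrays, which MongoDB's dot notation also handles.

```javascript
// In-memory sketch of dotted-path lookup; not MongoDB's query engine.
// getPath and findByPath are hypothetical helpers. Array traversal is left out.
function getPath(doc, path) {
  // Walk one level per path segment, stopping early if a level is missing.
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

function findByPath(documents, path, expected) {
  return documents.filter((doc) => getPath(doc, path) === expected);
}

// Made-up sample data with a three-level hierarchy.
const products = [
  { sku: 'wm-123', details: { format: { medium: 'cd', discs: 1 } } },
  { sku: 'md-12', details: { format: { medium: 'vinyl', discs: 2 } } },
];

const vinyl = findByPath(products, 'details.format.medium', 'vinyl');
console.log(vinyl.map((doc) => doc.sku));
```

The deeper the hierarchy, the longer the path every query has to carry, which is the cost being described here.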
Normalised documents are a perfectly acceptable way to use MongoDB. Think about the needs of your app and the tradeoffs that you'll need to make when writing your queries. Normalised documents provide maximum flexibility.
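To compare the two layouts side by side, here is a small in-memory sketch in plain JavaScript. Nothing in it comes from Kyle's slides; the sample orders and the helpers loadEmbedded, loadReferenced and lineItemsBySku are hypothetical, and arrays stand in for collections.

```javascript
// In-memory sketch contrasting the two layouts; not MongoDB code.
// Embedded: line items travel with the order, so one lookup returns everything.
const embeddedOrders = [
  { _id: 1, customer: 'ann', line_items: [
    { sku: 'wm-123', price: 1299 },
    { sku: 'md-12', price: 2500 },
  ] },
];

// Referenced (normalised): line items live in their own collection and
// point back at their order through order_id.
const orders = [{ _id: 1, customer: 'ann' }];
const lineItems = [
  { _id: 10, order_id: 1, sku: 'wm-123', price: 1299 },
  { _id: 11, order_id: 1, sku: 'md-12', price: 2500 },
];

function loadEmbedded(orderId) {
  return embeddedOrders.find((order) => order._id === orderId);
}

function loadReferenced(orderId) {
  const order = orders.find((o) => o._id === orderId); // first lookup
  const items = lineItems.filter((item) => item.order_id === orderId); // second lookup
  return { ...order, line_items: items };
}

// The flexibility of the normalised layout: line items can be queried
// on their own, across all orders, without unpacking any parent document.
function lineItemsBySku(sku) {
  return lineItems.filter((item) => item.sku === sku);
}

console.log(loadEmbedded(1).line_items.length);
console.log(loadReferenced(1).line_items.length);
console.log(lineItemsBySku('md-12').map((item) => item.order_id));
```

Both loaders return the same shape of result; the difference is the number of lookups needed to build it and how easily the inner data can be queried by itself.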
E-commerce
Kyle gave lots of examples demonstrating Mongo's suitability for use in e-commerce applications. If you're interested you should take a look at the slides (you'll find the link below). There were lots of code examples that I didn't manage to note down during his talk.
Somebody asked about MongoDB's suitability for managing financial transactions, given the lack of multi-object transactions. Kyle has a blog post on handling e-commerce inventory with Mongo on his website. John Nunemaker has written a follow-up post that is also worth a read.
Summary
A very interesting talk from an excellent presenter. The slides are online. Kyle is @hwaet on Twitter.