Category Archives: MongoDB Basics

How to truncate multiple MongoDB collections in a Database

MongoDB has a drop() command that you can use to delete everything in a certain collection, but unfortunately this also drops the collection's indexes. What I wanted was a way to “truncate” the collection (borrowing from MySQL) and retain the indexes.

The following snippet will do that, plus it has a built in “Oops, I changed my mind” safety check in case you need to cancel the collection truncate command.

var dbName = 'myDB';
var targetDb = db.getSiblingDB(dbName);
targetDb.getCollectionNames().forEach(function(collName) {
    // Truncate all collections except system ones (indexes/profile)
    if (collName.indexOf("system.") !== 0) {
        // Safety net
        print("WARNING: going to truncate ["+dbName+"."+collName+"] in 5s .. hit Ctrl-C if you've changed your mind!");
        sleep(5000);
        // remove({}) deletes the documents but keeps the collection and its indexes
        targetDb.getCollection(collName).remove({});
    }
})

This would be best saved as a UDF in your mongo shell, and probably made to take a parameter for the db too…
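As a rough sketch of what such a helper might look like, here is one that takes the database as a parameter and empties each non-system collection while leaving its indexes alone. The name truncateAll is my own invention, and I am assuming the legacy shell's remove({}) call (newer shells offer deleteMany({}) for the same job). The 5 second safety pause is left out to keep it short.

```javascript
// Hypothetical helper (not a built-in): empties every non-system collection
// in `database` while leaving the collections and their indexes in place.
// `database` is whatever db.getSiblingDB(name) returns in the shell.
function truncateAll(database) {
    var emptied = [];
    database.getCollectionNames().forEach(function (collName) {
        if (collName.indexOf("system.") === 0) {
            return; // leave system collections alone
        }
        // Assumption: legacy shell API; remove({}) deletes documents, not indexes.
        database.getCollection(collName).remove({});
        emptied.push(collName);
    });
    return emptied; // names of the collections that were emptied
}
```

In the shell you would call it as truncateAll(db.getSiblingDB("myDB")), and it hands back the list of collection names it touched.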

MongoDB for everyone – Part 8 – dbrefs and denormalization

Most of the MongoDB drivers (as well as the database engine itself) support a concept called “dbrefs”. A dbref is a reference to a document in a different collection. It carries two pieces of information: the ID of the document that it references, and the name of the collection that the document is in.

Many ORMs and abstraction layers that you may use within your own projects will create dbrefs by default; some may be configured not to. I would suggest avoiding dbrefs as much as possible, and instead using denormalization techniques to make use of larger documents in your collections. Remember that you are now working with documents and not relationships, so that inbred fear of data duplication should be relaxed somewhat!

That being said, however, you should model your data to your application usage. Take, for instance, a blog with comments. You could have separate collections for users, comments, and posts with many dbrefs, or you could stick the entire post into a single document (comments included!). That may be a bit extreme though, so consider adding a user name field for each commenter inside your blog document, as opposed to just a user ID. That way, without adding very much data to your blog document, you avoid a lookup on a dbref and make your application faster! Win!
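To make the commenter-name idea concrete, here is a small sketch in plain JavaScript. All of the field names (userId, userName, comments and so on) are made up for illustration; the point is only that the post document carries enough information to be displayed without a second lookup.

```javascript
// Illustrative data only; none of these field names come from a real schema.
var users = { user1: { _id: "user1", name: "Paul" } };

// Build a comment that carries the commenter's name as well as the ID,
// so that rendering the post needs no lookup in the users collection.
function makeComment(user, text) {
    return { userId: user._id, userName: user.name, text: text };
}

var post = {
    _id: "post1",
    title: "Hello MongoDB",
    comments: [makeComment(users.user1, "Nice post!")]
};

// Rendering reads only the post document itself.
var rendered = post.comments.map(function (c) {
    return c.userName + ": " + c.text;
});
```

The trade-off is that if a user changes their name, every comment that embeds it needs updating, so this suits fields that rarely change.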

Thinking denormalised may take some getting used to, but it works well with document centric data stores. Just make sure you don’t hit those 16MB document size limits!

 

 

MongoDB basics for everyone – Part 7 – Arrays and embedded documents

One of the big things that you will notice when you start working with MongoDB, coming from a relational world, is the lack of joins. Joins are fun and all, but as we all probably know, not very scalable. This means that we usually end up doing “joins” in our applications anyway. MongoDB has no support for joins, although you do have access to so called “dbrefs”, which we usually do not make use of due to their [lack of] speed.

We can create a relation to another document of course, which also makes use of indexes and all the good things that we need in our apps, but this usually requires two find() commands to be executed. As an example, we could make a rudimentary “likes” database with a “things” collection:

use likes
db.things.insert({"_id":"thing1", "thing":"ice-cream"});
db.things.insert({"_id":"thing2", "thing":"cookies"});
db.things.insert({"_id":"thing3", "thing":"pumpkin"});

Let us assume we also have a people (users) collection:

db.users.insert({"_id":"user1", "name": "Paul"});
db.users.insert({"_id":"user2", "name": "John"});
db.users.insert({"_id":"user3", "name": "Fred"});

So the problem here is to relate the things that the users like to the users in some way. There are a number of ways to achieve this, without using joins!

Example 1: Using arrays:

We will simply create an array of the things that the users like within the user document. We can do that either as an array of the document IDs of the things (which will make maintenance a bit easier, as you only have to change the names in one place), or as an array of strings naming the things.

db.users.update({"_id":"user1"}, { $set: {"likes":["thing1", "thing2"]}});

You will notice that I have used the _id values from the “things” collection.

OR

db.users.update({"_id":"user2"}, { $set: {"likes":["ice-cream", "cookies"]}});

In the above example, I have simply set an array of the names of the things that the user likes. Please note that this is not as maintainable as the first example: if I rename “ice-cream” to “icecream”, for example, I will be forced to do a multi-document update across the entire “users” collection, which may take some time if I have a few million users.
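With the ID-based array from the first variant, turning the IDs back into the actual things is where the two find() calls come in. A minimal sketch, assuming the users and things collections from above; the helper name thingsLikedBy is hypothetical:

```javascript
// Hypothetical helper; `database` is the shell's db object (or anything
// exposing the same users and things collections).
function thingsLikedBy(database, userId) {
    // Query 1: fetch the user to get the array of thing IDs.
    var user = database.users.findOne({ "_id": userId });
    if (!user || !Array.isArray(user.likes)) {
        return [];
    }
    // Query 2: fetch every thing whose _id appears in that array.
    return database.things.find({ "_id": { $in: user.likes } }).toArray();
}
```

In the shell, thingsLikedBy(db, "user1") should hand back the ice-cream and cookies documents. This is the "join in the application" that was mentioned earlier.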

A third way of approaching the problem is to embed another document within my document. Remember that MongoDB documents are limited to a size of 16MB, so this may not be the best option for you, but in less data intensive collections (i.e. without images, video, or other GridFS types), it should do just fine!

db.users.update({"_id":"user3"}, { $set: {"likes":{"thing1":"ice-cream", "thing2":"cookies"}}});

Which will give me an embedded JSON document within my “user” document for user “Fred”.

Do a quick db.users.find().pretty() to view your handiwork!

We would now like to query our shiny new collection to find all the users that like “cookies”

The methods to do so are as below (we are also introducing the $in operator for working with arrays):

db.users.find({"likes":{$in:["thing2"]}})

which will return

{ "_id" : "user1", "likes" : [ "thing1", "thing2" ], "name" : "Paul" }

Next, we work with the array of things that were named (user2):

db.users.find({"likes":{$in:["cookies"]}})

which returns

{ "_id" : "user2", "likes" : [ "ice-cream", "cookies" ], "name" : "John" }

Finally, the embedded document query:

db.users.find({"likes.thing2":"cookies"})

which will return

{ "_id" : "user3", "likes" : { "thing1" : "ice-cream", "thing2" : "cookies" }, "name" : "Fred" }
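If the way $in behaves against an array field is not obvious, the following plain JavaScript sketch models it for the first two queries above. It is a simplification of my own (simple string and number values only; the real operator handles more cases), and matchesIn is a made-up name, not a MongoDB function.

```javascript
// Simplified model of how {field: {$in: candidates}} tests one field value.
// Assumption: values are plain strings or numbers.
function matchesIn(fieldValue, candidates) {
    // An array field matches if ANY of its elements is in the candidate list;
    // a non-array field is treated like a one-element array.
    var values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
    return values.some(function (v) {
        return candidates.indexOf(v) !== -1;
    });
}

var paul = { "_id": "user1", "likes": ["thing1", "thing2"] };
var john = { "_id": "user2", "likes": ["ice-cream", "cookies"] };

var paulMatches = matchesIn(paul.likes, ["thing2"]);   // true
var johnMatches = matchesIn(john.likes, ["thing2"]);   // false
```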

The key here is that although we have the power of dynamic schema within MongoDB, you still want to think about schema design and plan with that 16MB document size limit in mind!

MongoDB basics for everyone – Part 6 – modifiers and operators

In order to demonstrate the selector and modifier behaviour in MongoDB, we will insert a few slightly more complex documents to experiment on.

db.testcollection.insert({"name":"Rapheal", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Michelangelo", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Donatello", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Leonardo", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Picasso", "artist": true, "ninjaturtle": false})
db.testcollection.insert({"name":"Monet", "artist": true, "ninjaturtle": false})

You can pass in certain parameters to both find() and findOne() in order to select specific documents in the store. Let’s make an example of this by only selecting the documents that are about Ninja Turtles from above.

If you were to do a full find() on this collection, you should get the following back:

> db.testcollection.find()
{ "_id" : ObjectId("51ef62b7305e05be29bf242c"), "name" : "Rapheal", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "name" : "Michelangelo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242e"), "name" : "Donatello", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242f"), "name" : "Leonardo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf2430"), "name" : "Picasso", "artist" : true, "ninjaturtle" : false }
{ "_id" : ObjectId("51ef62b9305e05be29bf2431"), "name" : "Monet", "artist" : true, "ninjaturtle" : false }

However, our example will be to select back only the Ninja Turtles in the collection, so we pass in the “ninjaturtle” selector to the find command:

> db.testcollection.find({"ninjaturtle":true})
{ "_id" : ObjectId("51ef62b7305e05be29bf242c"), "name" : "Rapheal", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "name" : "Michelangelo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242e"), "name" : "Donatello", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242f"), "name" : "Leonardo", "artist" : true, "ninjaturtle" : true }

In this way, you can make your queries return fewer results and ensure that indexes are used correctly. This will also work on the “dynamic schema” that we spoke about earlier. If, for example, you wanted to add “sculptor” to the Michelangelo document (yes, I know about the others, this is not an art holy war, it is a Mongo example…), you could then get back all of the sculptors who are also Ninja Turtles.

First we will introduce the $set modifier to update the “Michelangelo” document:

db.testcollection.update({"name": "Michelangelo"}, {$set:{"sculptor":true}})

As you can see, we did not need to make any other schema changes, or add a NULL or something to all the other documents. Mongo allows us to carry on as we were:

db.testcollection.find()
{ "_id" : ObjectId("51ef62b7305e05be29bf242c"), "name" : "Rapheal", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242e"), "name" : "Donatello", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242f"), "name" : "Leonardo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf2430"), "name" : "Picasso", "artist" : true, "ninjaturtle" : false }
{ "_id" : ObjectId("51ef62b9305e05be29bf2431"), "name" : "Monet", "artist" : true, "ninjaturtle" : false }
{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "artist" : true, "name" : "Michelangelo", "ninjaturtle" : true, "sculptor" : true }

You will now notice that the Michelangelo document has an additional field. We can then select all the artists that are Ninja Turtles as well as sculptors:

db.testcollection.find({"ninjaturtle": true, "sculptor":true})

Which will give us:

{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "artist" : true, "name" : "Michelangelo", "ninjaturtle" : true, "sculptor" : true }

The other documents are ignored due to failing the criteria for selection, and of course, you see no errors due to schema mismatches! Great!

Let me just quickly now go through a few of the other modifiers that are commonly used in MongoDB.

$inc – increments a numeric field by a given amount. The general form is:

db.collection.update({field:value}, {$inc:{field1: amount}})

Let’s take the following document as our example:

db.monkeys.insert({"name":"vervet", "population":2})

So in order to update our “monkeys” collection, we can simply increment the population field:

db.monkeys.update({"name":"vervet"}, { $inc: {"population":1}})

which will result in

{ "_id" : ObjectId("51ef691c305e05be29bf2432"), "name" : "vervet", "population" : 3 }

The only other operators that I would like to outline here are $set and $unset. $set sets an item in much the same way that the SET keyword does in SQL, while $unset does the opposite.

As an example:

We suddenly have a new family of vervet monkeys moving into our collection, so instead of incrementing the population with arbitrary numbers, we simply $set the population number:

db.monkeys.update({"name":"vervet"}, { $set: {"population":7}})
db.monkeys.find()
{ "_id" : ObjectId("51ef691c305e05be29bf2432"), "name" : "vervet", "population" : 7 }

When $unset is used, the syntax is very similar.
Suppose we have discovered that not only vervet monkeys live in the area, so we must unset the name field:

db.monkeys.update({"name":"vervet"}, { $unset: {"name":""}})
db.monkeys.find()
{ "_id" : ObjectId("51ef691c305e05be29bf2432"), "population" : 7 }

Please keep in mind that these are very simple examples to demonstrate the principles behind the documents!
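If it helps to see what the three modifiers do to a document without a server in the way, here is a toy model in plain JavaScript. It is only a sketch under my own assumptions (top-level fields only, one document at a time), and applyUpdate is a made-up name, not part of MongoDB.

```javascript
// Toy model of $inc, $set and $unset on a single document.
// Assumption: top-level fields only; no dotted paths or other operators.
function applyUpdate(doc, update) {
    var result = {};
    Object.keys(doc).forEach(function (k) { result[k] = doc[k]; });
    var field;
    for (field in (update.$inc || {})) {
        // A missing field is treated as 0 before the amount is added.
        result[field] = (result[field] || 0) + update.$inc[field];
    }
    for (field in (update.$set || {})) {
        result[field] = update.$set[field];
    }
    for (field in (update.$unset || {})) {
        delete result[field]; // the value given to $unset is ignored
    }
    return result;
}

var vervet = { "name": "vervet", "population": 2 };
vervet = applyUpdate(vervet, { $inc: { "population": 1 } });   // population is now 3
vervet = applyUpdate(vervet, { $set: { "population": 7 } });   // population is now 7
vervet = applyUpdate(vervet, { $unset: { "name": "" } });      // name is gone
```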

MongoDB basics for everyone – Part 5 Using find() and findOne()

You would have noticed from the previous examples that the find() command returns a document from our tutorial collection with a rather strange looking _id field.
The _id field is always there and should be a unique identifier for the particular record in the collection. You have the option of overriding the _id field with an id of your choice, but please do keep in mind that it is required to be unique.

The find() command in MongoDB will always return a cursor. You can assign the cursor to a variable in Javascript to demonstrate this right in the mongo shell!

Take the following as an example:

First, let’s add another document to our collection:

db.testcollection.insert({"test": 2})

Now, we will assign find() to a JavaScript variable:

var mycursor = db.testcollection.find()

As you can see, there is no output, but we can check that the cursor has indeed found something with

mycursor.hasNext()

which should return true. Now let us iterate through the documents found and display them as we go along

mycursor.next()

will display the next document that the cursor has found, so we can manually have a look at all the documents by using hasNext() and next() in succession. Once we reach the end of the found documents in our cursor, hasNext() will simply return false and we can stop the iteration. A good homework assignment here would be to write a quick JavaScript function that will iterate through all of the found documents and display them as find() does.

(Spoiler alert) Answer:

for (var c = db.testcollection.find(); c.hasNext(); ) {
    // printjson shows the document's contents; print would only show its type
    printjson(c.next());
}

Or use the built-in forEach() method on the cursor.
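The forEach() route might look something like the sketch below. The helper name dumpCursor is made up; it takes the cursor and a display function as parameters so that you can pass the shell's printjson, or anything else you like.

```javascript
// Hypothetical helper: passes every document from `cursor` to `show`
// and returns how many documents were seen.
function dumpCursor(cursor, show) {
    var count = 0;
    cursor.forEach(function (doc) {
        show(doc);
        count += 1;
    });
    return count;
}
```

In the shell, dumpCursor(db.testcollection.find(), printjson) prints each document in turn. Because plain arrays also have forEach(), you can try the helper on an array of objects first.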

In later versions of the mongo shell, a find() that is not assigned to a variable prints only the first batch of results (20 documents by default), and you can type it at the prompt to fetch the next batch.

The findOne() command is very similar to find(), except that it does not return a cursor; it returns a single document. In general, it returns the first document, in natural order, that satisfies the selection criteria, which we will explore a little more in the next post.
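One practical difference worth knowing: when nothing matches, findOne() gives you null rather than an empty cursor. A tiny sketch of handling that follows; the helper name findOneOr is my own and not a shell built-in.

```javascript
// Hypothetical helper: returns the matching document, or `fallback`
// when findOne() finds nothing and returns null.
function findOneOr(collection, query, fallback) {
    var doc = collection.findOne(query);
    return doc === null ? fallback : doc;
}
```

For example, findOneOr(db.testcollection, {"test": 99}, {"test": 0}) would hand back the fallback document if no document has a test value of 99.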

MongoDB Basics for everyone – Part 4 Intro to the Mongo Shell

The command-line interface to MongoDB is a modified (and modifiable) JavaScript shell. This means that you can interact with the database using pretty simple JavaScript. This also means that it is really quite simple to define your own JavaScript functions to do certain things if need be.

You certainly don’t need to be a Javascript expert to work within the Mongo Shell, but some rudimentary knowledge will go a long way!

We will have a look at some of the more common commands used in the Mongo shell here, but I do encourage you to also have a look at the excellent docs at http://www.mongodb.org as well for further information.

We will start by opening up a terminal or command prompt and entering the Mongo shell with

mongo

This will give you a prompt showing the current version of Mongo shell installed and an information line saying that it has connected to a “test” database.

To display all the available databases, we use the command

show dbs

and it will list all available databases. In order to use a specific database, you will enter use <dbname>. For now, let us stay in the “test” database.

At any time whilst in the Mongo shell, you can use the built-in help command. In order to see the available help, use db.help(). This will display a list of available commands that you can use to interact with your database. As an aside, please note that if you omit the parentheses on the db commands, the shell will print out the function declaration, and not execute it! E.g. if we execute db.stats instead of db.stats() we will see:

> db.stats
function (scale) {
return this.runCommand({dbstats:1, scale:scale});
}

If we then go and execute the function as a function, we get:

> db.stats()
{
"db" : "test",
"collections" : 4,
"objects" : 18,
"avgObjSize" : 64,
"dataSize" : 1152,
"storageSize" : 24576,
"numExtents" : 4,
"indexes" : 3,
"indexSize" : 24528,
"fileSize" : 201326592,
"nsSizeMB" : 16,
"ok" : 1
}

Your stats will be different from mine, depending on how you have used Mongo up till now!
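This parentheses business is ordinary JavaScript behaviour rather than anything special to Mongo. A quick sketch with a stand-in object (fakeDb is made up; the real db.stats talks to the server):

```javascript
// A stand-in object for illustration only.
var fakeDb = {
    stats: function (scale) {
        return { "ok": 1, "scale": scale };
    }
};

var reference = fakeDb.stats;        // no parentheses: the function itself
var result = fakeDb.stats(1024);     // parentheses: the function actually runs

var kinds = [typeof reference, typeof result];   // ["function", "object"]
```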

In order to check your server status, you can use

db.serverStatus()

which will print out a long JSON document all about the currently running process. It will give host info, as well as current read and write lock information on each of your databases. If you would only like to see more information about your host, you can use the db.hostInfo() command. Always take note of the system.cpuAddrSize field, as a 32bit architecture is a lot more limiting than a full 64bit system! You should always ensure that your mongo server is running on 64bit architecture to gain all the goodness that goes with Mongo!

For the purposes of this introduction, we will only look at one other shell command, and that is the command to drop a database. This command is obviously very useful when testing and exploring, so that you don’t end up polluting your Mongo server with a bunch of trash databases that are not useful to anyone!

To drop a database, you need to create one first. Create a database with the “use” keyword i.e.

use monkeys

which will output:

> use monkeys
switched to db monkeys

If you then insert something into a collection (types) in the monkeys db, you will see that it has been created and is working.

db.types.insert({"name":"Vervet monkey"})
db.types.findOne()
{ "_id" : ObjectId("51de563f2ceec8ed658e7221"), "name" : "Vervet monkey" }

Great! Now let’s drop the monkeys db…

db.dropDatabase()
{ "dropped" : "monkeys", "ok" : 1 }

One last note! If you ever need to see which database you are currently working in, simply type the db command (without parentheses) and the shell will print out the database name.

I will do a further, more in depth exploration of the Mongo Shell later on, but that should keep you going for the time being!

MongoDB basics for everyone – Part 3 Database Commands

In a relational database, we would normally do something along the lines of CREATE DATABASE `somedb` and then USE that database. MongoDB is slightly different in this regard as MongoDB does not use a traditional “schema”, but more of a “dynamic schema”. We will get to more on this a little later, but do keep it in mind.

If you haven’t already, connect to the Mongo shell with mongo in a terminal window or command prompt. We will be working with a database called tutorial that will contain a number of collections. In order to connect to the database, enter the use tutorial command. You will notice that we did not need to create a database explicitly and we are now connected to the tutorial database. You can confirm this by simply entering db at the mongo shell prompt, which will return the current database.

Enter the command

show collections

into the shell now and see what it returns. There should be a single collection called system.indexes available to you. This is a system collection that stores the indexes for this database, so you need not worry too much about it for the purposes of this book.

Remember that dynamic schema? We are now going to exploit that again to create a new collection and insert a document to that collection in one smooth operation.

Now would probably be a good time to introduce the “query language” that MongoDB uses, but for now, we will simply note that all queries need to be structured in valid JSON (actually BSON, but we will cover that later too).

We will now quickly insert a document into a new collection called testcollection with a single key “test”, with the value of 1.

db.testcollection.insert({"test":1})

Simple as that! Let’s have a quick look at our collections again and you should see that we now have an additional collection. Great news! One last thing that we need to do is to check that our JSON document is indeed in the database, which we can determine with

db.testcollection.find()

which should return a single record. In the next chapter, we will look closer at the find() and findOne() commands for MongoDB. A last command that I would like to introduce quickly is the show dbs command. It will always print out a list of the databases wherever you are, much like the SHOW DATABASES; command in MySQL.
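If you would rather check from code that the insert really did create the collection, something along these lines should do. The helper name hasCollection is hypothetical; it relies only on the shell's getCollectionNames().

```javascript
// Hypothetical helper: true if `database` currently has a collection
// called `name`. `database` is the shell's db object.
function hasCollection(database, name) {
    return database.getCollectionNames().indexOf(name) !== -1;
}
```

Running hasCollection(db, "testcollection") before and after the insert shows the collection appearing without any explicit create step.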

MongoDB basics for everyone – Part 2 Installation

1.1 Ubuntu GNU/Linux

First off, there is a version of MongoDB available in the regular Ubuntu repositories, but this version is somewhat out of date and we will be using some features that require at least version 2.x of MongoDB.

With this in mind, we will install MongoDB from the 10gen maintained repositories for Ubuntu. Most of you will be using a relatively new version of Ubuntu, which will support upstart, so we will concentrate on that.

In order to avoid GPG key errors when updating and working with software sources, we need to import the 10gen public GPG key. In a terminal window, type:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

This may fail if your internet connection is bad, but keep on trying until it succeeds. You will not be able to take advantage of automatic updates and bug fixes through apt if you do not import the key. You will, however, still be able to install MongoDB from the apt repositories.

Edit the file

/etc/apt/sources.list.d/10gen.list

and add the line

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

to it.

You should now do a

sudo apt-get update

to refresh your apt sources and download the package lists.

Afterwards, you can simply install the packages with

sudo apt-get install mongodb-10gen

The service should start automatically, but if not, you can control it using Upstart in the following manner

sudo service mongodb start
sudo service mongodb stop
sudo service mongodb restart

If everything has gone well, you should be able to start the mongo shell with the mongo command.

If there is an issue in starting the service, you may need to manually create the /data/db directory, which Mongo defaults to using. You can also specify another directory by passing the --dbpath directive, or by configuring the dbpath in /etc/mongodb

1.2 Mac OS X

MongoDB on Mac OS X can be installed in one of two ways, using the package management tools MacPorts or Homebrew.

Using Homebrew is probably easier, as you simply need to open up a system terminal, and type in the following commands

brew update

to update your package manager,

brew install mongodb

and then, at a later stage to upgrade MongoDB

brew update
brew upgrade mongodb

Installing using MacPorts is also relatively simple, but due to the fact that the code will need to be compiled on your system, it may take some time. To start the build using MacPorts issue the command

port install mongodb

Bear in mind that neither MacPorts nor Homebrew comes with any of the control scripts. The mongod and mongo binaries will be on your system path if your PATH is configured correctly, but you will need to start the mongod process manually and connect to it via mongo.

You may also choose to get MongoDB directly from 10gen, in which case you simply need to download, extract and fire up the mongod process, and you are ready to go!

1.3 Windows

Installing MongoDB on Windows is as simple as on any of the other platforms. Please ensure that you download the latest stable version for your platform (64bit or 32bit).

MongoDB is self-contained and does not have any other system dependencies, so you may install and run it from any directory you choose (e.g. D:\database\mongodb).

Start up a command prompt by selecting the Start button, All Programs, Accessories and then Command Prompt. As MongoDB requires a data directory, please create one using

md data
md data\db

You may then start the mongod process. With no arguments, it uses the default data directory, \data\db on the current drive:

C:\mongodb\bin\mongod.exe

If you have created your db directory in an alternative location, please use the --dbpath parameter to specify where it is:

C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongoDBdata"

Do not allow mongod.exe to be accessible to public networks without running in “Secure Mode”. MongoDB is designed to be run in “trusted environments”, and the database does not enable authentication or “Secure Mode” by default.

MongoDB can also be set up as a Windows service. First, configure the system with

md C:\mongodb\log

Then create a configuration file for the logpath

echo logpath=C:\mongodb\log\mongo.log > C:\mongodb\mongod.cfg

Then, install and run the mongoDB service

C:\mongodb\bin\mongod.exe --config C:\mongodb\mongod.cfg --install
net start MongoDB

If you have followed all of the instructions on your particular platform, ensure that you can start the MongoDB shell with the mongo command. This will connect the shell to your running local server. If your server was installed on a different machine, you may connect to it by passing the IP address or fully qualified domain name of the MongoDB server into the mongo shell command, e.g.

mongo 192.168.1.101:27017

Where 27017 is the standard port number that the MongoDB process listens on.

You can always check which version of the server that you have connected to by entering

db.version()

into the shell and hopefully you will see the version number of the server that you have just installed.

Take note that the shell is a modified JavaScript shell. That means that some global commands will always be valid, like help and exit. The global database identifier is db, and it can be used to get additional information on the database that you are currently using. A good example of this would be to get the statistics for a database with db.stats(), which will return some useful information about the database.

MongoDB basics for everyone! Part 1

Introduction

As most of you will be coming from a relational database world, some of the NoSQL concepts that we use will be somewhat foreign to you. It is the aim of this simple (short) book to dispel some of the mysteries around MongoDB and make your life just a little easier with the power of NoSQL and MongoDB!

First off, let’s define a couple of concepts that will be used throughout this series:

NoSQL – Not Only SQL. It does NOT mean No SQL. This is a horrible misnomer and you should forget that you ever heard it. Not Only SQL means that these types of data stores can do more than simple SQL as will be demonstrated in later chapters.

MongoDB is a document-based database. This means that you will work with the concept of a document as analogous to a “row” and a collection as analogous to a “table”. The database is therefore made up of collections that contain documents, which in turn contain fields.
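As a rough picture of that hierarchy, here it is written out as a plain JavaScript object. The data is made up purely for illustration, and a real database is of course not stored like this.

```javascript
// Illustrative data only; the names are invented for this sketch.
var database = {
    monkeys: [                                   // a collection (like a table)
        { "name": "vervet", "population": 2 },   // a document (like a row)
        { "name": "baboon", "troop": "north" }   // documents may have different fields
    ]
};

var fieldsOfFirst = Object.keys(database.monkeys[0]);    // ["name", "population"]
var fieldsOfSecond = Object.keys(database.monkeys[1]);   // ["name", "troop"]
```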

Type conventions – all commands and things in general that you could conceivably copy and paste into either the Mongo shell or a terminal/command prompt window will be in this font (Monotype).