Open Source Contributions – the easy way

If you still believe that you are not able to make contributions to Free and Open Source Software, you are mistaken. Granted, not everyone can write tons of code, but you can even contribute in small ways, like writing docs, HOWTO’s or blog posts. Turns out that you can even contribute via Twitter as well these days, as the following image depicts!

[Image: json-stat]

MongoDB basics for everyone – Part 6 – modifiers and operators

In order to demonstrate the selector and modifier behaviour in MongoDB, we will insert a slightly more complex document to experiment on.

db.testcollection.insert({"name":"Rapheal", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Michelangelo", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Donatello", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Leonardo", "artist": true, "ninjaturtle": true})
db.testcollection.insert({"name":"Picasso", "artist": true, "ninjaturtle": false})
db.testcollection.insert({"name":"Monet", "artist": true, "ninjaturtle": false})

You can pass in certain parameters to both find() and findOne() in order to select specific documents in the store. Let’s make an example of this by only selecting the documents that are about Ninja Turtles from above.

If you were to do a full find() on this collection, you should get the following back:

> db.testcollection.find()
{ "_id" : ObjectId("51ef62b7305e05be29bf242c"), "name" : "Rapheal", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "name" : "Michelangelo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242e"), "name" : "Donatello", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242f"), "name" : "Leonardo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf2430"), "name" : "Picasso", "artist" : true, "ninjaturtle" : false }
{ "_id" : ObjectId("51ef62b9305e05be29bf2431"), "name" : "Monet", "artist" : true, "ninjaturtle" : false }

However, we only want to select the Ninja Turtles in the collection, so we pass the “ninjaturtle” selector to the find command:

> db.testcollection.find({"ninjaturtle":true})
{ "_id" : ObjectId("51ef62b7305e05be29bf242c"), "name" : "Rapheal", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "name" : "Michelangelo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242e"), "name" : "Donatello", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242f"), "name" : "Leonardo", "artist" : true, "ninjaturtle" : true }

In this way, you can make your queries return fewer results and ensure that indexes are used correctly. This also works with the “dynamic schema” that we spoke about earlier. If, for example, we add a “sculptor” field to the Michelangelo document (yes, I know about the others, this is not an art holy war, it is a Mongo example…), we can then get back all of the sculptors who are also Ninja Turtles.

First, we will use the $set modifier to update the “Michelangelo” document:

db.testcollection.update({"name": "Michelangelo"}, {$set:{"sculptor":true}})

As you can see, we did not need to make any schema changes or add a NULL to all the other documents. Mongo allows us to carry on as we were:

db.testcollection.find()
{ "_id" : ObjectId("51ef62b7305e05be29bf242c"), "name" : "Rapheal", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242e"), "name" : "Donatello", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf242f"), "name" : "Leonardo", "artist" : true, "ninjaturtle" : true }
{ "_id" : ObjectId("51ef62b7305e05be29bf2430"), "name" : "Picasso", "artist" : true, "ninjaturtle" : false }
{ "_id" : ObjectId("51ef62b9305e05be29bf2431"), "name" : "Monet", "artist" : true, "ninjaturtle" : false }
{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "artist" : true, "name" : "Michelangelo", "ninjaturtle" : true, "sculptor" : true }

You will now notice that the Michelangelo document has an additional field. We can then select (with selectors) all the artists that are Ninja Turtles as well as sculptors:

db.testcollection.find({"ninjaturtle": true, "sculptor":true})

Which will give us:

{ "_id" : ObjectId("51ef62b7305e05be29bf242d"), "artist" : true, "name" : "Michelangelo", "ninjaturtle" : true, "sculptor" : true }

The other documents are ignored due to failing the criteria for selection, and of course, you see no errors due to schema mismatches! Great!
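To make the selection rule concrete, here is a minimal sketch in plain JavaScript that imitates an equality selector over an in-memory array. The helpers matches and find are hypothetical names made up for this illustration; this is not how MongoDB implements find() internally, only a model of the behaviour described above.

```javascript
// Plain JavaScript model of an equality selector; NOT MongoDB's real implementation.
const docs = [
  { name: "Rapheal", artist: true, ninjaturtle: true },
  { name: "Michelangelo", artist: true, ninjaturtle: true, sculptor: true },
  { name: "Donatello", artist: true, ninjaturtle: true },
  { name: "Leonardo", artist: true, ninjaturtle: true },
  { name: "Picasso", artist: true, ninjaturtle: false },
  { name: "Monet", artist: true, ninjaturtle: false },
];

// Hypothetical helper: a document matches when every selector key
// is present in the document with an equal value.
function matches(doc, selector) {
  return Object.keys(selector).every((key) => doc[key] === selector[key]);
}

// Hypothetical stand-in for find(selector) on an in-memory array.
function find(collection, selector = {}) {
  return collection.filter((doc) => matches(doc, selector));
}

const turtles = find(docs, { ninjaturtle: true });
const sculptingTurtles = find(docs, { ninjaturtle: true, sculptor: true });

console.log(turtles.map((doc) => doc.name));
console.log(sculptingTurtles.map((doc) => doc.name));
```

Documents without a “sculptor” field simply fail the second selector. Nothing has to be declared up front for the query to work.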

Let me just quickly now go through a few of the other modifiers that are commonly used in MongoDB.

$inc – increment a numeric field by a given amount. The general form is:

db.collection.update({field:value}, {$inc:{field1: amount}})

Let’s take the following document as our example:

db.monkeys.insert({"name":"vervet", "population":2})

So in order to update our “monkeys” collection, we can simply increment the population field:

db.monkeys.update({"name":"vervet"}, { $inc: {"population":1}})

which will result in

{ "_id" : ObjectId("51ef691c305e05be29bf2432"), "name" : "vervet", "population" : 3 }

The only other operators that I would like to outline here are $set and $unset. $set sets a field in much the same way that the SET keyword does in SQL, while $unset does the opposite and removes the field from the document.

As an example:

We suddenly have a new family of vervet monkeys moving into our collection, so instead of incrementing the population with arbitrary numbers, we simply $set the population number:

db.monkeys.update({"name":"vervet"}, { $set: {"population":7}})
db.monkeys.find()
{ "_id" : ObjectId("51ef691c305e05be29bf2432"), "name" : "vervet", "population" : 7 }

When $unset is used, the syntax is very similar. Suppose we discover that vervet monkeys are not the only monkeys living in the area, so we must unset the name field:

db.monkeys.update({"name":"vervet"}, { $unset: {"name":""}})
db.monkeys.find()
{ "_id" : ObjectId("51ef691c305e05be29bf2432"), "population" : 7 }

Please keep in mind that these are very simple examples to demonstrate the principles behind the documents!
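If it helps to see the three modifiers side by side, the following is a rough sketch in plain JavaScript of what they do to a single document. The applyUpdate helper is a hypothetical name invented for this post, and it only models the behaviour shown above; the real server does far more (matching, validation, write concerns and so on).

```javascript
// Hypothetical helper that applies $inc, $set and $unset to a copy of a plain object.
// This is a model of the behaviour only, not MongoDB's implementation.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // A missing field is treated as 0, so it ends up equal to the amount.
    result[field] = (result[field] || 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value; // overwrite or create the field
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field]; // the value given to $unset is ignored
  }
  return result;
}

const original = { name: "vervet", population: 2 };
const afterInc = applyUpdate(original, { $inc: { population: 1 } });
const afterSet = applyUpdate(afterInc, { $set: { population: 7 } });
const afterUnset = applyUpdate(afterSet, { $unset: { name: "" } });

console.log(afterInc);
console.log(afterSet);
console.log(afterUnset);
```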

MongoDB basics for everyone – Part 5 Using find() and findOne()

You would have noticed from the previous examples that the find() command returns a document from our tutorial collection with a rather strange looking _id field.
The _id field is always there and should be a unique identifier for the particular record in the collection. You have the option of overriding the _id field with an id of your choice, but please do keep in mind that it is required to be unique.

The find() command in MongoDB will always return a cursor. You can assign the cursor to a variable in Javascript to demonstrate this right in the mongo shell!

Take the following as an example:

first, let’s add another document to our collection:

db.testcollection.insert({"test": 2})

now, we will assign find() to a Javascript variable:

var mycursor = db.testcollection.find()

as you can see, there is no output, but we can check that the cursor has indeed found something with

mycursor.hasNext()

which should return true. Now let us iterate through the documents found and display them as we go along

mycursor.next()

will display the next document that the cursor has found, so we can manually have a look at all the documents by using hasNext() and next() in succession. Once we reach the end of the found documents in our cursor,

hasNext()

will simply display false and we can stop the iteration. A good homework assignment here would be to write a quick Javascript function that will iterate through all of the found documents and display them as find() does.

(Spoiler alert) Answer:

// Walk the cursor and print each document as JSON
for (var c = db.testcollection.find(); c.hasNext(); ) {
    printjson(c.next());
}

or use built in

forEach()

In later versions of MongoDB find() will automatically paginate the resultset and allow you to use the it command within the shell to iterate over the documents found.
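For those who would like to play with the hasNext() and next() pattern without a running server, here is a small sketch of an in-memory cursor in plain JavaScript. The makeCursor function is a hypothetical stand-in written for this example; it only mimics the three methods discussed above and is not the shell's cursor object.

```javascript
// Hypothetical in-memory cursor that mimics the hasNext()/next()/forEach() protocol.
function makeCursor(documents) {
  let position = 0;
  return {
    hasNext: () => position < documents.length,
    next: () => {
      if (position >= documents.length) {
        throw new Error("cursor exhausted");
      }
      return documents[position++];
    },
    forEach(callback) {
      // Drain whatever is left in the cursor.
      while (this.hasNext()) {
        callback(this.next());
      }
    },
  };
}

// Manual iteration, as in the homework answer.
const cursor = makeCursor([{ test: 1 }, { test: 2 }]);
const seen = [];
while (cursor.hasNext()) {
  seen.push(cursor.next());
}
console.log(seen, cursor.hasNext());

// The same thing with the built-in style forEach().
const viaForEach = [];
makeCursor([{ test: 1 }, { test: 2 }]).forEach((doc) => viaForEach.push(doc));
console.log(viaForEach);
```

Note that a cursor is used up as you read it: once hasNext() returns false, you need a fresh find() to start again.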

The findOne() command is very similar to find(), except that it does not return a cursor; it returns only a single document. In general, it will always return the first document that satisfies all of the selection criteria and sorting criteria, which we will explore a little more in the next post.

MongoDB Basics for everyone – Part 4 Intro to the Mongo Shell

The commandline interface to MongoDB is a modified (and modifiable) JavaScript shell. This means that you can interact with the database using pretty simple Javascript. This also means that it is really quite simple to define your own javascript functions to do certain things if needs be.

You certainly don’t need to be a Javascript expert to work within the Mongo Shell, but some rudimentary knowledge will go a long way!

We will have a look at some of the more common commands used in the Mongo shell here, but I do encourage you to also have a look at the excellent docs at http://www.mongodb.org as well for further information.

We will start by opening up a terminal or command prompt and entering the Mongo shell with

mongo

This will give you a prompt showing the current version of Mongo shell installed and an information line saying that it has connected to a “test” database.

To display all the available databases, we use the command

show dbs

and it will list all available databases. In order to use a specific database, you will use

use <dbname>

For now, let us stay in the “test” database.

At any time whilst in the Mongo shell, you can use the built in help command. In order to see the available help use

db.help()

This will display a list of available commands that you can use to interact with your database. As an aside, please note that if you omit the parentheses on the db commands, the shell will print out the function declaration instead of executing it! For example, if we execute

db.stats

instead of

db.stats()

you will see:

> db.stats
function (scale) {
return this.runCommand({dbstats:1, scale:scale});
}

If we then go and execute the function as a function, we get:

> db.stats()
{
"db" : "test",
"collections" : 4,
"objects" : 18,
"avgObjSize" : 64,
"dataSize" : 1152,
"storageSize" : 24576,
"numExtents" : 4,
"indexes" : 3,
"indexSize" : 24528,
"fileSize" : 201326592,
"nsSizeMB" : 16,
"ok" : 1
}

Your stats will be different from mine, depending on how you have used Mongo up till now!
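The parentheses behaviour is ordinary JavaScript rather than anything Mongo specific. As a quick sketch, the object below (fakeDb, a made-up name with a made-up stats function) shows the difference between referring to a function and calling it:

```javascript
// Hypothetical object standing in for the shell's db; only the shape matters here.
const fakeDb = {
  name: "test",
  stats: function (scale) {
    return { db: this.name, scale: scale || 1, ok: 1 };
  },
};

const reference = fakeDb.stats; // no parentheses: we get the function itself
const result = fakeDb.stats();  // parentheses: the function is executed

console.log(typeof reference);  // the type of the un-called value
console.log(String(reference)); // the source text, much like the shell prints
console.log(result);            // the value returned by the call
```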

In order to check your server status, you can use

db.serverStatus()

which will print out a long JSON document all about the currently running process. It will give host info, as well as current read and write lock information on each of your databases. If you would only like to see more information about your host, you can use the

db.hostInfo()

command. Always take note of the

system.cpuAddrSize

field, as a 32bit architecture is a lot more limiting than a full 64bit system! You should always ensure that your mongo server is running on 64bit architecture to gain all the goodness that goes with Mongo!

For the purposes of this introduction, we will only look at one other shell command, and that is the command to drop a database. This command is obviously very useful when testing and exploring, so that you don’t end up polluting your Mongo server with a bunch of trash databases that are not useful to anyone!

To drop a database, you need to create one first. Create a database with the “use” keyword i.e.

use monkeys

which will output:

> use monkeys
switched to db monkeys

If you then insert something into a collection (types) in the monkeys db, you will see that it has been created and is working.

db.types.insert({"name":"Vervet monkey"})
db.types.findOne()
{ "_id" : ObjectId("51de563f2ceec8ed658e7221"), "name" : "Vervet monkey" }

Great! Now let’s drop the monkeys db…

db.dropDatabase()
{ "dropped" : "monkeys", "ok" : 1 }

One last note! If you ever need to see which database you are currently working in, simply type the

db

command (without parentheses) and the shell will print out the database name.

I will do a further, more in depth exploration of the Mongo Shell later on, but that should keep you going for the time being!

MongoDB basics for everyone – Part 3 Database Commands

In a relational database, we would normally do something along the lines of CREATE DATABASE `somedb` and then USE that database. MongoDB is slightly different in this regard as MongoDB does not use a traditional “schema”, but more of a “dynamic schema”. We will get to more on this a little later, but do keep it in mind.

If you haven’t already, connect to the Mongo shell with mongo in a terminal window or command prompt. We will be working with a database called tutorial that will contain a number of collections. In order to connect to the database, enter the use tutorial command. You will notice that we did not need to create a database explicitly and we are now connected to the tutorial database. You can confirm this by simply entering db at the mongo shell prompt, which will return the current database.

Enter the command show collections into the shell now and see what it returns. There should be a single collection called system.indexes available to you. This is a system collection that stores the indexes for this database, so you need not worry too much about it for the purposes of this book.

Remember that dynamic schema? We are now going to exploit that again to create a new collection and insert a document to that collection in one smooth operation.

Now would probably be a good time to introduce the “query language” that MongoDB uses, but for now, we will simply note that all queries need to be structured in valid JSON (actually BSON, but we will cover that later too).

We will now quickly insert a document to a new collection called testcollection with a single key “test”, with the value of 1.

db.testcollection.insert({"test":1})

Simple as that! Let’s have a quick look at our collections again and you should see that we now have an additional collection. Great news! One last thing that we need to do is to check that our JSON document is indeed in the database, which we can determine with

db.testcollection.find()

which should return a single record. In the next chapter, we will look closer at the find() and findOne() commands for MongoDB. A last command that I would like to introduce quickly is the show dbs command. Show dbs will always print out a list of the databases wherever you are, much like the SHOW DATABASES; command in MySQL.

MongoDB basics for everyone – Part 2 Installation

1.1 Ubuntu GNU/Linux

First off, there is a version of MongoDB available in the regular Ubuntu repositories, but this version is somewhat out of date and we will be using some features that require at least version 2.x of MongoDB.

With this in mind, we will install MongoDB from the 10gen maintained repositories for Ubuntu. Most of you will be using a relatively new version of Ubuntu, which will support upstart, so we will concentrate on that.

In order to avoid GPG key errors when updating and working with software sources, we need to import the 10gen public GPG key. In a terminal window, type:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

This may fail if your internet connection is bad, but keep on trying until it succeeds. You will not be able to take advantage of automatic updates and bug fixes through apt if you do not import the key. You will, however, still be able to install MongoDB from the apt repositories.

Edit the file

/etc/apt/sources.list.d/10gen.list

and add the line

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

to it.

You should now do a

sudo apt-get update

to refresh your apt sources and download the package lists.

Afterwards, you can simply install the packages with

sudo apt-get install mongodb-10gen

The service should start automatically, but if not, you can control it using Upstart in the following manner

sudo service mongodb start
sudo service mongodb stop
sudo service mongodb restart

If everything has gone well, you should be able to start the mongo shell with the mongo command.

If there is an issue in starting the service, you may need to manually create the

/data/db

directory which Mongo defaults to using. You can also specify the directory by passing the

--dbpath

directive to point to another directory, or configuring the dbpath in

/etc/mongodb

1.2 Mac OSX

MongoDB on Mac OSX can be installed in one of two ways, using the package management tools MacPorts or Homebrew.

Using Homebrew is probably easier, as you simply need to open up a system terminal, and type in the following commands

brew update

to update your package manager,

brew install mongodb

and then, at a later stage to upgrade MongoDB

brew update
brew upgrade mongodb

Installing using MacPorts is also relatively simple, but due to the fact that the code will need to be compiled on your system, it may take some time. To start the build using MacPorts issue the command

port install mongodb

Bear in mind that neither MacPorts nor Homebrew comes with any of the control scripts. If your PATH is configured correctly, the mongod and mongo commands will be available from your terminal, but you will need to start the mongod process manually and connect to it via mongo.

You may also choose to build MongoDB from the 10gen sources, in which case, you simply need to download, extract and fire up the mongod process, and you are ready to go!

1.3 Windows

Installing MongoDB on Windows is as simple as on any of the other platforms. Please ensure that you download the latest stable version for your platform (64bit or 32bit).

MongoDB is self-contained and does not have any other system dependencies, so you may install and run it from any directory you choose (e.g. D:\database\mongodb).

Start up a command prompt by selecting the Start button, All Programs, Accessories and then Command Prompt. As MongoDB requires a data directory, please create one using

md data
md data\db

You may then start the mongod process. With no arguments, it looks for the data directory at \data\db on the current drive.

C:\mongodb\bin\mongod.exe

If you have created your db directory in an alternative location, please use the --dbpath parameter to specify where it is

C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongoDBdata"

Do not allow mongod.exe to be accessible to public networks without running in “Secure Mode”. MongoDB is designed to be run in “trusted environments” and the database does not enable authentication or “Secure Mode” by default.

MongoDB can also be set up as a Windows service. First, configure the system with

md C:\mongodb\log

Then create a configuration file for the logpath

echo logpath=C:\mongodb\log\mongo.log > C:\mongodb\mongod.cfg

Then, install and run the mongoDB service

C:\mongodb\bin\mongod.exe --config C:\mongodb\mongod.cfg --install
net start MongoDB

If you have followed all of the instructions for your particular platform, ensure that you can start the MongoDB shell with the mongo command. This will connect the shell to your running local server. If your server was installed on a different machine, you may connect to it by passing the IP address or fully qualified domain name of the MongoDB server to the mongo shell command, e.g.

mongo 192.168.1.101:27017

Where 27017 is the standard port number that the MongoDB process listens on.

You can always check which version of the server that you have connected to by entering

db.version()

into the shell and hopefully you will see the version number of the server that you have just installed.

Take note that the shell is a modified JavaScript shell. That means that some global commands will always be valid, like help and exit. The global database identifier is the db keyword, and can be used to get additional help on the database that you are currently using. A good example of this would be to get the statistics for a database with db.stats(), which will return some useful information about the database.

MongoDB basics for everyone! Part 1

Introduction

As most of you will be coming from a relational database world, some of the NoSQL concepts that we use will be somewhat foreign to you. It is the aim of this simple (short) book to dispel some of the mysteries around MongoDB and make your life just a little easier with the power of NoSQL and MongoDB!

First off, let’s define a couple of concepts that will be used throughout this series:

NoSQL – Not Only SQL. It does NOT mean No SQL. This is a horrible misnomer and you should forget that you ever heard it. Not Only SQL means that these types of data stores can do more than simple SQL as will be demonstrated in later chapters.

MongoDB is a document based database. This means that you will work with the concept of a document as an analogy to a “row” and collections as analogous to “tables”. The database therefore is made up of collections, that contain documents that contain fields.
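As a rough picture of that hierarchy, here is a sketch using nothing but plain JavaScript objects. The names tutorial and testcollection are just example names, and the structure is an illustration of the concept, not how MongoDB stores anything on disk.

```javascript
// Plain JavaScript picture of database -> collections -> documents -> fields.
// The names are examples only; this is not MongoDB's storage format.
const database = {
  name: "tutorial",
  collections: {
    // A collection plays the role of a table...
    testcollection: [
      // ...and each document plays the role of a row made up of fields.
      { test: 1 },
      { test: 2, note: "documents in one collection need not share fields" },
    ],
  },
};

// List the field names of every document in every collection.
for (const [collectionName, documents] of Object.entries(database.collections)) {
  for (const document of documents) {
    console.log(collectionName, Object.keys(document));
  }
}
```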

Type conventions – all commands and things in general that you could conceivably copy and paste into either the Mongo shell or a terminal/command prompt window will be in this font (Monotype).

Mahout MongoDBDataModel setup in Spring-3.2.2

Reposted from old site – original date: Monday 3 June 2013

Mahout (http://mahout.apache.org) is a set of machine learning libraries licensed under the Apache Software License for doing a bunch of fun stuff with machine learning and recommendations (making use of the Mahout Taste libraries). This opens up a lot of very nice possibilities for data mining of various data sets within your application(s).

The setup of Mahout to work with spring-data-mongodb is a little tricky sometimes (especially when you are using Spring-3.2.2 or later) as the Mahout MongoDB driver is somewhat outdated. Luckily this is pretty easily remedied with a little bit of extra tweaking.

I will assume that your spring-data-mongodb is up and running in your project already, as the crux of this post is the integration bit…

That being said, you will need to import the Mahout libraries into your pom.xml file, starting with mahout-core and at least mahout-integration. You will also need uncommons-math and some other libs, depending on what you would like to achieve anyway.

The tricky part comes in with the dependency hierarchy of your POM file. If you take a look at the dependency hierarchy for the Mongo driver, you will notice that Mahout depends on a slightly (OK, very) outdated version of the MongoDB driver. This needs to be removed, so that the only MongoDB driver is the one that you get with the latest version of Spring-data-mongodb. This should be around 2.20 or so IIRC.

It is probably also a good idea to grab the latest mahout-integration code from github as well, to make sure that you are all up to date.

Once that is done, you should be able to follow any of the (admittedly sparse) Mahout tutorials on using the MongoDBDataModel. One huge caveat is that if you do not follow the schema defined by Mahout in your collection that you would like Mahout to connect to, you will need to override the default constructor with your configs.

example:

MongoDBDataModel dbm = new MongoDBDataModel(
                    String mongoHost,
                    int mongoPort,
                    String mongoDBName,
                    String collection,
                    boolean manage,
                    boolean finalRemove,
                    DateFormat dateFormat,
                    String userIdentifier,
                    String itemIdentifier,
                    String preferenceField,
                    String mappingCollection
);

Note that the data model construction will fail horribly without a specified mappingCollection. This is a dynamically created collection that will hold mapping results, so it can be called just about anything, but it must be declared!

So, taking the above connection constructor, we can provide an example constructor that looks something like:

private MongoDBDataModel getModel() throws UnknownHostException,
            MongoException {
        MongoDBDataModel dbm = new MongoDBDataModel(
                "localhost",
                27017,
                mydb,
                mydatacollection,
                false, // leave this as false unless you want Mahout to manage i.e. delete your entries in your collection!
                false,
                dateFormat,
                "customerId",
                "item",
                "rating",
                "likemap"
        );
        return dbm;
    }

You can then go ahead and use your MongoDBDataModel in the rest of your code. The rest of the API is pretty simple to understand, so if you would like some more details on the various bits and pieces of Mahout, please leave a comment and I will attempt to find time to get it done too!

YouTube has not won anything

Reposted from old site – original date: Monday 6 May 2013

This is a response to Eric Schmidt’s claim that YouTube has won the battle against TV before it started.

See article at http://www.thedenverchannel.com/news/u-s-world/youtube-has-already-won-battle-with-television-executive-claims

Firstly, what battle? Secondly, YouTube has won nothing, especially in Africa. Where it counts by the way. Why does that count? I’ll give you a billion reasons.
Let us disambiguate some of the statements.

  1. YouTube cannot even be compared to linear TV, so that whole argument is marketing troll, or stupidity. We’ll call it trolling for now.
  2. What I think Schmidt was thinking when he uttered this nonsense, is that the new way of doing TV is Video On Demand (VOD)
  3. YouTube as a service provider, with paid for VOD, could be a player, sometime in the near future.
  4. YouTube has a very long way to go before they are a serious contender in the space.
  5. Anyone paying USD for TV services in Africa is going to have a bad time.

Let me explain a bit more. People are used to using YouTube for 10 minute clips. There will have to be a major mindset change in user interaction to change that behaviour.
Most folks view YouTube clips on a PC or a small-screen device and have pretty much no idea how to consume YouTube video in any other way. People in Africa are still used to opening browser tabs, loading 3 videos overnight (buffering) and then playing them hours or even a day later. This may not be the case everywhere, of course, but I have seen this recently in my travels. People who consume YouTube via HTPC systems like XBMC are few and far between, certainly not the majority, and they want to see it in at least 1080i/p resolution, which requires an ADSL line faster than 4Mb/s (very expensive).

As an ADSL subscriber with a 4Mb/s line, I can tell you that (with discounts), it is more expensive than a Premium DStv package (includes Transactional VOD service).

On the other hand, we have TV. Let’s forget linear TV, because that is last season, for sure. People want the VOD style, “I want to watch this now”. What needs to happen is that we focus on getting cheaper devices into every home and make VOD services available to more people. This is currently happening (check DStv’s website for new packages) and with my own daily work, there are many more very good things in the pipeline…

If YouTube think that this is an easy task, they have a lot of surprises coming.

MongoDB Index creation trick that will save you time and frustration in production

Reposted from old site – original date: Sunday 25 November 2012

Right, so you have a huge dataset in MongoDB in production and your replicaSet is purring away serving all your data. What fun, when suddenly, someone decides that we need to add another index to the database. Quickly, you think, we have 2 options:

1. Lock up the database and do a fast foreground index create and hope we don’t lose too much money on failed/queued transactions OR
2. Do a really slow background index on the data and hope for the best with respect to new documents coming in.

Neither of these scenarios is ideal in any man’s language, so there is, of course, a trick…

Remember, you are running a replicaSet? Yeah, so what does a replicaSet do? Replicates, yeah! OK, so we can use that to our advantage!

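Take a single secondary node out of the replicaSet, run it as a standalone and build a nice foreground index on it. Then put the node back into the replicaSet and let it catch up on replication. An index built this way is not copied to the other nodes, so repeat the process for each remaining node in turn, stepping down the primary and doing it last. Simple, yet effective. No chewing up all your resources, waiting for hours or locking any databases!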

WIN!