Mahout MongoDBDataModel setup in Spring-3.2.2

Reposted from old site – original date: Monday 3 June 2013

Mahout (http://mahout.apache.org) is a set of machine learning libraries, licensed under the Apache Software License, for doing a bunch of fun stuff with machine learning and recommendations (the latter via the Mahout Taste libraries). This opens up a lot of very nice possibilities for mining the various data sets within your applications.

The setup of Mahout to work with spring-data-mongodb is a little tricky sometimes (especially when you are using Spring-3.2.2 or later) as the Mahout MongoDB driver is somewhat outdated. Luckily this is pretty easily remedied with a little bit of extra tweaking.

I will assume that your spring-data-mongodb is up and running in your project already, as the crux of this post is the integration bit…

That being said, you will need to add the Mahout libraries to your pom.xml file, starting with mahout-core and at least mahout-integration. You will also need uncommons-maths and possibly some other libs, depending on what you would like to achieve.

The tricky part comes in with the dependency hierarchy of your POM file. If you take a look at the dependency hierarchy for the Mongo driver, you will notice that Mahout depends on a slightly (OK, very) outdated version of the MongoDB driver. This needs to be excluded, so that the [b]only[/b] MongoDB driver on the classpath is the one that you get with the latest version of spring-data-mongodb. This should be around 2.20 or so IIRC.
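As a rough sketch of that exclusion in Maven terms, something like the fragment below should do it. The ${mahout.version} property is an assumption of mine (define it yourself or hard-code the version you use), and you should confirm the driver's coordinates against your own dependency tree before copying this in.

```xml
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-integration</artifactId>
  <!-- assumed property; set it to the Mahout version you are using -->
  <version>${mahout.version}</version>
  <exclusions>
    <!-- drop the old driver that Mahout pulls in transitively,
         leaving only the one that comes with spring-data-mongodb -->
    <exclusion>
      <groupId>org.mongodb</groupId>
      <artifactId>mongo-java-driver</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running mvn dependency:tree afterwards is a quick way to check that only one MongoDB driver is left.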

It is probably also a good idea to grab the latest mahout-integration code from GitHub, to make sure that you are all up to date.

Once that is done, you should be able to follow any of the (admittedly sparse) Mahout tutorials on using the MongoDBDataModel. One huge caveat: if the collection that you would like Mahout to connect to does not follow the schema that Mahout expects, you cannot rely on the default constructor. You will need to use the full constructor and pass in your own configs.

example (parameter types are shown for reference only, so this is not compilable as written):

MongoDBDataModel dbm = new MongoDBDataModel(
                    String mongoHost,
                    int mongoPort,
                    String mongoDBName,
                    String collection,
                    boolean manage,
                    boolean finalRemove,
                    DateFormat dateFormat,
                    String userIdentifier,
                    String itemIdentifier,
                    String preferenceField,
                    String mappingCollection
);

Note that the data model construction will fail horribly without a specified mappingCollection. This is a dynamically created collection that will hold mapping results, so it can be called just about anything, but [b]it must be declared![/b]
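To get a feel for why a mapping is needed at all: Mahout's Taste API identifies users and items by long values, while the IDs in a MongoDB collection are usually strings. The sketch below is [b]not[/b] Mahout's code. It is just an in-memory illustration of the kind of two-way lookup that has to be kept somewhere. The class name, method names and the sequential counter are all made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory illustration only; the real mapping lives in the mapping collection. */
class IdMappingSketch {
    private final Map<String, Long> idToLong = new HashMap<>();
    private final Map<Long, String> longToId = new HashMap<>();
    private long next = 0; // assumption: a simple counter hands out the long IDs

    /** Returns the long assigned to a string ID, assigning a new one on first sight. */
    long toLong(String id) {
        Long existing = idToLong.get(id);
        if (existing != null) {
            return existing;
        }
        long assigned = next++;
        idToLong.put(id, assigned);
        longToId.put(assigned, id);
        return assigned;
    }

    /** Returns the original string ID, or null if that long was never assigned. */
    String toId(long value) {
        return longToId.get(value);
    }

    public static void main(String[] args) {
        IdMappingSketch mapping = new IdMappingSketch();
        // hypothetical customer ID as it might appear in a document
        long user = mapping.toLong("51ac4f0e8f1b2c3d4e5f6a7b");
        System.out.println(user + " -> " + mapping.toId(user));
    }
}
```

The point is simply that both directions have to be stored so that results computed on longs can be turned back into your own IDs.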

So, filling in the constructor parameters listed earlier, a concrete example looks something like:

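import java.net.UnknownHostException;
import java.text.DateFormat;
import com.mongodb.MongoException;
import org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel;

// mydb, mydatacollection (Strings) and dateFormat (a DateFormat) are assumed
// to be fields defined elsewhere in your class
private MongoDBDataModel getModel() throws UnknownHostException, MongoException {
    MongoDBDataModel dbm = new MongoDBDataModel(
            "localhost",
            27017,
            mydb,             // your database name
            mydatacollection, // the collection holding your data
            false, // leave this as false unless you want Mahout to manage (i.e. delete) entries in your collection!
            false,
            dateFormat,
            "customerId",     // user identifier field
            "item",           // item identifier field
            "rating",         // preference field
            "likemap"         // mapping collection: any name, but it must be given
    );
    return dbm;
}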

You can then go ahead and use your MongoDBDataModel in the rest of your code. The rest of the API is pretty simple to understand, so if you would like some more details on the various bits and pieces of Mahout, please leave a comment and I will attempt to find time to get it done too!


  • Antony Jackson

    Could you provide a sample application for mahout and mongodb integration using spring?

    • Paul Scott

      Sure thing. Will try and do one soon and post it here