Tag Archives: Java

How to start an Android app at boot time

I wanted my custom Android ROM to boot up and immediately start an application specific to my needs. This is the way to accomplish it:

In your AndroidManifest.xml document (application part):

<receiver android:enabled="true" android:name=".BootUpReceiver"
        android:permission="android.permission.RECEIVE_BOOT_COMPLETED">
        <intent-filter>
                <action android:name="android.intent.action.BOOT_COMPLETED" />
                <category android:name="android.intent.category.DEFAULT" />
        </intent-filter>
</receiver>

You also need to set up a permission with

<uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED" />

and then create the BootUpReceiver class to handle it

public class BootUpReceiver extends BroadcastReceiver {
        @Override
        public void onReceive(Context context, Intent intent) {
                // Launch the activity in a new task, since we are starting from a broadcast (non-activity) context
                Intent i = new Intent(context, MyWhateverActivity.class);
                i.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
                context.startActivity(i);
        }
}

Spring data Neo4j using Java based config

There are very few examples in the wild on using Spring’s Java config methods to configure and start a Neo4j Graph Database service to use in your application. This post will serve as a primer to get you started on your Neo4j application and hopefully save you some bootstrap time as well!

The first thing that we need to do is make sure that you have a running Neo4j server up and ready for action, as well as a new Spring project that you can start with.

In your project POM XML file, you need to add a few dependencies to work with Neo4j. In this example, I have used Neo4j-1.9.5-community (Spring Data Neo4j for Neo4j 2.x was not available at the time of writing). I have used Spring Framework 3.2.3.RELEASE as the Spring version, and Spring-Data-Neo4j 2.3.2.RELEASE.

<dependencies>
		<!-- Spring and Transactions -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-context</artifactId>
			<version>${spring-framework.version}</version>
		</dependency>
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-tx</artifactId>
			<version>${spring-framework.version}</version>
		</dependency>

		<!-- Logging with SLF4J & LogBack  // clipped... -->
		<!-- JavaConfig needs this library -->
		<dependency>
			<groupId>cglib</groupId>
			<artifactId>cglib</artifactId>
			<version>2.2.2</version>
		</dependency>

		<!-- Test Artifacts // clipped -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-beans</artifactId>
			<version>${spring-framework.version}</version>
		</dependency>
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-core</artifactId>
			<version>${spring-framework.version}</version>
		</dependency>
		<dependency>
			<groupId>cglib</groupId>
			<artifactId>cglib-nodep</artifactId>
			<version>2.2</version>
		</dependency>
		<dependency>
			<groupId>org.hibernate.javax.persistence</groupId>
			<artifactId>hibernate-jpa-2.0-api</artifactId>
			<version>1.0.1.Final</version>
			<optional>true</optional>
		</dependency>
		<dependency>
			<groupId>javax.validation</groupId>
			<artifactId>validation-api</artifactId>
			<version>1.0.0.GA</version>
		</dependency>
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-expression</artifactId>
			<version>${spring-framework.version}</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.data</groupId>
			<artifactId>spring-data-neo4j-aspects</artifactId>
			<version>${spring-data-neo4j.version}</version>
			<exclusions>
				<exclusion>
					<groupId>org.hibernate.javax.persistence</groupId>
					<artifactId>hibernate-jpa-2.0-api</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>org.springframework.data</groupId>
			<artifactId>spring-data-neo4j-rest</artifactId>
			<version>${spring-data-neo4j.version}</version>
		</dependency>
		<dependency>
			<groupId>org.codehaus.mojo</groupId>
			<artifactId>aspectj-maven-plugin</artifactId>
			<version>1.2</version>
			<type>maven-plugin</type>
		</dependency>
	</dependencies>

NOTE: I have clipped some of the less relevant bits for testing and standard Spring dependencies, but if you would like a full POM example, please just let me know!

The next big thing is to define your graphDatabaseService as a bean that you can then use via the @Autowired annotation in the rest of your code:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Import;
import org.springframework.context.annotation.aspectj.EnableSpringConfigured;
import org.springframework.data.neo4j.aspects.config.Neo4jAspectConfiguration;
import org.springframework.data.neo4j.config.EnableNeo4jRepositories;
import org.springframework.data.neo4j.config.Neo4jConfiguration;
import org.springframework.transaction.annotation.EnableTransactionManagement;
import org.springframework.transaction.jta.JtaTransactionManager;

@Configuration
@Import(Neo4jAspectConfiguration.class)
@EnableTransactionManagement
@EnableNeo4jRepositories("com.company.your.repos")
@EnableSpringConfigured
public class AppConfig extends Neo4jConfiguration {

	@Bean
	public GraphDatabaseService graphDatabaseService() {
		// if you want to use Neo4j as a REST service
		//return new SpringRestGraphDatabase("http://localhost:7474/db/data/");
		// Use Neo4j as Odin intended (as an embedded service)
		GraphDatabaseService service = new GraphDatabaseFactory().newEmbeddedDatabase("/tmp/graphdb");
		return service;
	}
}

Great! You are just about done! Now create a simple entity with the @NodeEntity annotation and save some data to it! You now have a working graph application!
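For example, a minimal node entity and repository might look something like the following sketch (Person and PersonRepository are just illustrative names, and the repository interface should live in the package you passed to @EnableNeo4jRepositories):

import org.springframework.data.neo4j.annotation.GraphId;
import org.springframework.data.neo4j.annotation.Indexed;
import org.springframework.data.neo4j.annotation.NodeEntity;
import org.springframework.data.neo4j.repository.GraphRepository;

@NodeEntity
public class Person {

	@GraphId
	private Long id;

	@Indexed
	private String name;

	public Person() {
	}

	public Person(String name) {
		this.name = name;
	}
}

// In com.company.your.repos so that @EnableNeo4jRepositories picks it up
public interface PersonRepository extends GraphRepository<Person> {
}

With the repository @Autowired into a service, something like personRepository.save(new Person("Alice")) inside a transaction will create your first node.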

This is really easy once you know how; sometimes, though, getting to know how is the hard part!

If you enjoyed this post, or found it useful, please leave a comment and I will start a new series on Neo4j in Java/Spring!

SAMBA with VFS2

I needed to connect, with authentication, to a SAMBA shared drive to download and process data files for one of my applications. Thinking VFS2 would do the job, I had a go at it. The results were good, so it is now documented here:

private static void GetFiles() throws IOException {
	// Register the smb:// URL handler so jCIFS can resolve SMB URLs
	jcifs.Config.registerSmbURLHandler();

	// jCIFS credentials used for the SmbFile calls below
	NtlmPasswordAuthentication auth = new NtlmPasswordAuthentication(
			prop.getProperty("smbDomain"), prop.getProperty("smbUser"),
			prop.getProperty("smbPass"));

	// VFS2 authenticator, attached to the file system options
	StaticUserAuthenticator authS = new StaticUserAuthenticator(
			prop.getProperty("smbDomain"), prop.getProperty("smbUser"),
			prop.getProperty("smbPass"));

	FileSystemOptions opts = new FileSystemOptions();
	DefaultFileSystemConfigBuilder.getInstance().setUserAuthenticator(opts, authS);

	// List the share, then copy each file to the local working directory
	SmbFile smbFile = new SmbFile(prop.getProperty("smbURL"), auth);
	FileSystemManager fs = VFS.getManager();
	String[] files = smbFile.list();

	for (String file : files) {
		SmbFile remFile = new SmbFile(prop.getProperty("smbURL") + file, auth);
		SmbFileInputStream smbIn = new SmbFileInputStream(remFile);
		OutputStream out = new FileOutputStream(file);
		byte[] b = new byte[8192];
		int n;
		while ((n = smbIn.read(b)) > 0) {
			out.write(b, 0, n);
		}
		smbIn.close();
		out.close();
	}
}

As you can see from the above, this simply copies the files to the local machine, preserving the filenames, which is exactly what I want. If you need to write files, it is very similar; just use SmbFileOutputStream instead!
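As a rough sketch (reusing the same auth object and properties as above; the file name "upload.txt" is just a placeholder), a write could look like this:

// Copy a local file up to the share using SmbFileOutputStream
SmbFile remoteFile = new SmbFile(prop.getProperty("smbURL") + "upload.txt", auth);
SmbFileOutputStream smbOut = new SmbFileOutputStream(remoteFile);
FileInputStream localIn = new FileInputStream("upload.txt");

byte[] b = new byte[8192];
int n;
while ((n = localIn.read(b)) > 0) {
	smbOut.write(b, 0, n);
}
localIn.close();
smbOut.close();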

Nice! VFS2 comes through once again!

Java HTTPClient

Just a quick post with an example of using the Apache Commons HttpClient from Java to make requests to remote web servers. I did not find much in the way of succinct examples, so here is one:

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;

import java.io.*;

public class MyClient {

	public MyClient() {
	}

	private String url;
	
	public String getUrl() {
		return url;
	}

	public void setUrl(String url) {
		this.url = url;
	}

	public byte[] grok() {
		// Create an instance of HttpClient.
		HttpClient client = new HttpClient();

		// Create a method instance.
		GetMethod method = new GetMethod(url);

		// Provide a custom retry handler if necessary
		method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
				new DefaultHttpMethodRetryHandler(3, false));

		try {
			// Execute the method.
			int statusCode = client.executeMethod(method);

			if (statusCode != HttpStatus.SC_OK) {
				System.err.println("Method failed: " + method.getStatusLine());
			}

			// Read the response body.
			byte[] responseBody = method.getResponseBody();

			// Deal with the response.
			// Use caution: make sure the correct character encoding is used and
			// that the response is not binary data
			return responseBody;

		} catch (HttpException e) {
			System.err.println("Fatal protocol violation: " + e.getMessage());
			e.printStackTrace();
		} catch (IOException e) {
			System.err.println("Fatal transport error: " + e.getMessage());
			e.printStackTrace();
		} finally {
			// Release the connection.
			method.releaseConnection();
		}
		return null;
	}
}
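For completeness, hypothetical usage of the client above looks like this (the URL is just a placeholder):

MyClient client = new MyClient();
client.setUrl("http://example.com/some/page");
byte[] body = client.grok();
if (body != null) {
	// Assumes a text response; mind the character encoding for anything else
	System.out.println(new String(body));
}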

Done.

Downloading files via HDFS and the Java API

Last post covered uploading files, so I thought it would be useful to do a quick download client as well. Again, we are using DFSClient with a BufferedInputStream and BufferedOutputStream to do the work. I read the file in 1024 byte chunks via the byte array; for larger files you may want to increase that buffer size.

Enough jabbering, to the code!

public void downloadFile() {
		try {
			Configuration conf = new Configuration();
			conf.set("fs.defaultFS", this.hdfsUrl);
			DFSClient client = new DFSClient(new URI(this.hdfsUrl), conf);
			OutputStream out = null;
			InputStream in = null;
			try {
				if (client.exists(sourceFilename)) {
					in = new BufferedInputStream(client.open(sourceFilename));
					out = new BufferedOutputStream(new FileOutputStream(
							destinationFilename, false));

					byte[] buffer = new byte[1024];

					int len = 0;
					while ((len = in.read(buffer)) > 0) {
						out.write(buffer, 0, len);
					}
				}
				else {
					System.out.println("File does not exist!");
				}
			} finally {
				if (in != null) {
					in.close();
				}
				if (out != null) {
					out.close();
				}
				if (client != null) {
					client.close();
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

I use simple getters and setters to set the source and destination filenames, and have set hdfsUrl to my namenode URI on the correct port.
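For reference, the fields and accessors referred to above might look something like this (the names and the namenode URI are assumptions based on the snippet):

private String hdfsUrl = "hdfs://namenode.example.com:8020"; // hypothetical namenode URI
private String sourceFilename;
private String destinationFilename;

public void setSourceFilename(String sourceFilename) {
	this.sourceFilename = sourceFilename;
}

public void setDestinationFilename(String destinationFilename) {
	this.destinationFilename = destinationFilename;
}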

Cloudera Hadoop and HBase example code

Earlier, I posted about connecting to Hadoop via a Java based client. I decided to try out Cloudera’s offering (http://www.cloudera.com) where they provide a manager app as well as an easy way to set up Hadoop for both Enterprise (includes support) and a free version.

I downloaded the free version of the Cloudera Manager, and quickly set up a 4 node Hadoop cluster using their tools. I must say, that as far as easy to use goes, they have done an awesome job!

Once everything was up and running, I wanted to create a Java based remote client to talk to my shiny new cluster. This was pretty simple, once I had figured out how to use the Cloudera Maven repositories and which versions and combinations of packages to use.

I will save you the trouble and post the results here.

The versions in use are the latest at the time of writing:

hadoop version
Hadoop 2.0.0-cdh4.4.0
Subversion file:///var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.4.0-Packaging-Hadoop-2013-09-03_18-48-35/hadoop-2.0.0+1475-1.cdh4.4.0.p0.23~precise/src/hadoop-common-project/hadoop-common -r c0eba6cd38c984557e96a16ccd7356b7de835e79
Compiled by jenkins on Tue Sep  3 19:33:54 PDT 2013
From source with checksum ac7e170aa709b3ace13dc5f775487180
This command was run using /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar

With this information, we now know which versions of the packages to use from the Cloudera Maven repository.

<dependencies>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>2.0.0-cdh4.4.0</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>2.0.0-cdh4.4.0</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hbase</groupId>
			<artifactId>hbase</artifactId>
			<version>0.94.6-cdh4.4.0</version>
		</dependency>
	</dependencies>

I also make sure to add the Cloudera Maven repository in my pom.xml file

<repositories>
		<repository>
			<id>cloudera</id>
			<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
		</repository>
	</repositories>

That is pretty much the hard part. If you don’t need HBase, then leave it off; the “hadoop-client” dependency should do most of what you want.
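With those dependencies in place, a minimal smoke test of the remote connection could look something like the sketch below (the namenode URI is an assumption; point it at your own cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Point the client at the remote CDH4 namenode (hypothetical host and port)
		conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

		FileSystem fs = FileSystem.get(conf);
		// List the root of HDFS to prove the client can talk to the cluster
		for (FileStatus status : fs.listStatus(new Path("/"))) {
			System.out.println(status.getPath());
		}
		fs.close();
	}
}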

Hadoop HDFS “abstraction” class in Java

Recently I found myself working with Hadoop’s HDFS, the Hadoop file system, in a Java project. The docs for working with Hadoop (especially HDFS) are somewhat sparse, as I guess most folks prefer to keep their code pretty secretive. Well, seeing as the following will never give anyone a business advantage over anyone else, but will probably spare a few folks some sleepless nights, here you go!

First off, you need to start a new Maven project. I just used the simple archetype, as I was really only messing about. You then need to add the following dependencies to your POM.xml file (oh, and this uses Maven 3…)

<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>3.8.1</version>
			<scope>test</scope>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-core</artifactId>
			<version>1.1.2</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-tools</artifactId>
			<version>1.2.1</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hbase</groupId>
			<artifactId>hbase-client</artifactId>
			<version>0.95.0</version>
		</dependency>
		<dependency>
			<groupId>org.apache.zookeeper</groupId>
			<artifactId>zookeeper</artifactId>
			<version>3.3.2</version>
			<exclusions>
				<exclusion>
					<groupId>com.sun.jmx</groupId>
					<artifactId>jmxri</artifactId>
				</exclusion>
				<exclusion>
					<groupId>com.sun.jdmk</groupId>
					<artifactId>jmxtools</artifactId>
				</exclusion>
				<exclusion>
					<groupId>javax.jms</groupId>
					<artifactId>jms</artifactId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>commons-logging</groupId>
			<artifactId>commons-logging</artifactId>
			<version>1.0.3</version>
		</dependency>
	</dependencies>

You will notice a couple of excludes there. They are really important as some of that code is dead and discontinued.

Next up, we need a class to interact with our HBase store (which sits on top of HDFS).

public class HbaseExample {
	private static Configuration hBaseConfig = null;
	private static String hbaseHost = "ubuntu.local";
	private static String zookeeperHost = "ubuntu.local";

	/**
	 * Initialization
	 */
	static {
		hBaseConfig = HBaseConfiguration.create();
		hBaseConfig.setInt("timeout", 120000);
		hBaseConfig.set("hbase.master", "*" + hbaseHost + ":9000*");
		hBaseConfig.set("hbase.zookeeper.quorum", zookeeperHost);
		hBaseConfig.set("hbase.zookeeper.property.clientPort", "2181");
		// Keep a table pool around; without it, re-initialising the connection can take up to 5 seconds
		new HTablePool(hBaseConfig, 10);
	}

You will see that all is pretty standard and that I have defined a few static properties to hold the values of the Zookeeper and Hadoop hosts. Note that these will work off IP addresses, but Hadoop most definitely prefers FQDNs. The final line in the static block is a sticky one: basically, if you do not create a connection pool, your code can take up to 5 seconds to re-initialize the database connection, which is obviously not cool.

After that, there is nothing too tricksy. I will paste a copy of the whole class next so that you can check your imports etc. as well as have a look at full CRUD on an HBase store:

package za.co.paulscott.hdfstest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.exceptions.MasterNotRunningException;
import org.apache.hadoop.hbase.exceptions.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseExample {
	private static Configuration hBaseConfig = null;
	private static String hbaseHost = "ubuntu.local";
	private static String zookeeperHost = "ubuntu.local";

	/**
	 * Initialization
	 */
	static {
		hBaseConfig = HBaseConfiguration.create();
		hBaseConfig.setInt("timeout", 120000);
		hBaseConfig.set("hbase.master", "*" + hbaseHost + ":9000*");
		hBaseConfig.set("hbase.zookeeper.quorum", zookeeperHost);
		hBaseConfig.set("hbase.zookeeper.property.clientPort", "2181");
		// Keep a table pool around; without it, re-initialising the connection can take up to 5 seconds
		new HTablePool(hBaseConfig, 10);
	}

	/**
	 * Create a table
	 */
	public static void creatTable(String tableName, String[] familys)
			throws Exception {
		HBaseAdmin admin = new HBaseAdmin(hBaseConfig);
		boolean exists = false;
		try {
			exists = admin.tableExists(tableName);
		} catch (NullPointerException e) {
			exists = false;
		}
		if (exists) {
			System.out.println("table already exists!");
		} else {
			HTableDescriptor tableDesc = new HTableDescriptor(tableName);
			for (int i = 0; i < familys.length; i++) {
				tableDesc.addFamily(new HColumnDescriptor(familys[i]));
			}
			admin.createTable(tableDesc);
			System.out.println("create table " + tableName + " ok.");
		}
	}

	/**
	 * Delete a table
	 */
	public static void deleteTable(String tableName) throws Exception {
		try {
			HBaseAdmin admin = new HBaseAdmin(hBaseConfig);
			admin.disableTable(tableName);
			admin.deleteTable(tableName);
			System.out.println("delete table " + tableName + " ok.");
		} catch (MasterNotRunningException e) {
			e.printStackTrace();
		} catch (ZooKeeperConnectionException e) {
			e.printStackTrace();
		}
	}

	/**
	 * Put (or insert) a row
	 */
	public static void addRecord(String tableName, String rowKey,
			String family, String qualifier, String value) throws Exception {
		//System.out.print("Adding record to table:  " + tableName);
		try {
			HTable table = new HTable(hBaseConfig, tableName);
			Put put = new Put(Bytes.toBytes(rowKey));
			put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier),
					Bytes.toBytes(value));
			table.put(put);
			System.out.println("insert recored " + rowKey + " to table "
					+ tableName + " ok.");
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	/**
	 * Delete a row
	 */
	public static void delRecord(String tableName, String rowKey)
			throws IOException {
		HTable table = new HTable(hBaseConfig, tableName);
		List<Delete> list = new ArrayList<Delete>();
		Delete del = new Delete(rowKey.getBytes());
		list.add(del);
		table.delete(list);
		System.out.println("del recored " + rowKey + " ok.");
	}

	/**
	 * Get a row
	 */
	public static void getOneRecord(String tableName, String rowKey)
			throws IOException {
		HTable table = new HTable(hBaseConfig, tableName);
		Get get = new Get(rowKey.getBytes());
		Result rs = table.get(get);
		for (KeyValue kv : rs.raw()) {
			System.out.print(new String(kv.getRow()) + " ");
			System.out.print(new String(kv.getFamily()) + ":");
			System.out.print(new String(kv.getQualifier()) + " ");
			System.out.print(kv.getTimestamp() + " ");
			System.out.println(new String(kv.getValue()));
		}
	}

	/**
	 * Scan (or list) a table
	 */
	public static void getAllRecord(String tableName) {
		try {
			HTable table = new HTable(hBaseConfig, tableName);
			Scan s = new Scan();
			ResultScanner ss = table.getScanner(s);
			for (Result r : ss) {
				for (KeyValue kv : r.raw()) {
					System.out.print(new String(kv.getRow()) + " ");
					System.out.print(new String(kv.getFamily()) + ":");
					System.out.print(new String(kv.getQualifier()) + " ");
					System.out.print(kv.getTimestamp() + " ");
					System.out.println(new String(kv.getValue()));
				}
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	public static void main(String[] agrs) {
		try {
			String tablename = "scores";
			String[] familys = { "grade", "course" };
			HbaseExample.creatTable(tablename, familys);

			// add record paul
			HbaseExample.addRecord(tablename, "paul", "grade", "", "5");
			HbaseExample.addRecord(tablename, "paul", "course", "", "90");
			HbaseExample.addRecord(tablename, "paul", "course", "math", "97");
			HbaseExample.addRecord(tablename, "paul", "course", "art", "87");
			// add record caz
			HbaseExample.addRecord(tablename, "caz", "grade", "", "4");
			HbaseExample.addRecord(tablename, "caz", "course", "math", "89");

			System.out.println("===========get one record========");
			HbaseExample.getOneRecord(tablename, "paul");

			System.out.println("===========show all record========");
			HbaseExample.getAllRecord(tablename);

			System.out.println("===========del one record========");
			HbaseExample.delRecord(tablename, "caz");
			HbaseExample.getAllRecord(tablename);

			System.out.println("===========show all records========");
			HbaseExample.getAllRecord(tablename);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

}

As you can see, there is also a main method, so to execute this in Eclipse you can simply Run As… Java Application, or compile it as a jar (using the jar:jar goal) and run it.

NOTE: Make sure that you change the host(s)!

NOTE2: This is an example to connect to a remote hadoop cluster!

NOTE3: This is one too many notes, so go now and have fun!

Mahout MongoDBDataModel setup in Spring-3.2.2

Reposted from old site – original date: Monday 3 June 2013

Mahout (http://mahout.apache.org) is a set of machine learning libraries, licensed under the Apache Software License, for doing a bunch of fun stuff with recommendations and other machine learning tasks (making use of the Mahout Taste libraries). This opens up a lot of very nice possibilities for data mining of the various data sets within your application(s).

The setup of Mahout to work with spring-data-mongodb is a little tricky sometimes (especially when you are using Spring-3.2.2 or later) as the Mahout MongoDB driver is somewhat outdated. Luckily this is pretty easily remedied with a little bit of extra tweaking.

I will assume that your spring-data-mongodb is up and running in your project already, as the crux of this post is the integration bit…

That being said, you will need to import the Mahout libraries into your pom.xml file, starting with mahout-core and at least mahout-integration. You will also need uncommons-math and some other libs, depending on what you would like to achieve anyway.

The tricky part comes in with the dependency hierarchy of your POM file. If you take a look at the dependency hierarchy for the Mongo driver, you will notice that Mahout depends on a slightly (OK, very) outdated version of the MongoDB driver. This needs to be excluded, so that the only MongoDB driver on your classpath is the one that you get with the latest version of spring-data-mongodb. This should be around 2.20 or so IIRC.
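As an illustration, the Mahout dependencies with the old driver excluded might look something like this (the Mahout version and the exact artifact that drags the old driver in are assumptions; check your own dependency hierarchy):

<dependency>
	<groupId>org.apache.mahout</groupId>
	<artifactId>mahout-core</artifactId>
	<version>0.8</version>
</dependency>
<dependency>
	<groupId>org.apache.mahout</groupId>
	<artifactId>mahout-integration</artifactId>
	<version>0.8</version>
	<exclusions>
		<!-- Let spring-data-mongodb pull in the newer MongoDB Java driver instead -->
		<exclusion>
			<groupId>org.mongodb</groupId>
			<artifactId>mongo-java-driver</artifactId>
		</exclusion>
	</exclusions>
</dependency>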

It is probably also a good idea to grab the latest mahout-integration code from github as well, to make sure that you are all up to date.

Once that is done, you should be able to follow any of the (admittedly sparse) Mahout tutorials on using the MongoDBDataModel. One huge caveat: if the collection that you would like Mahout to connect to does not follow the schema Mahout expects, you will need to use the full constructor and pass in your own configuration.

For example, the full constructor takes the following parameters:

MongoDBDataModel(
                    String mongoHost,
                    int mongoPort,
                    String mongoDBName,
                    String collection,
                    boolean manage,
                    boolean finalRemove,
                    DateFormat dateFormat,
                    String userIdentifier,
                    String itemIdentifier,
                    String preferenceField,
                    String mappingCollection
);

Note that the data model construction will fail horribly without a specified mappingCollection. This is a dynamically created collection that will hold mapping results, so it can be called just about anything, but it must be declared!

So, taking the above connection constructor, we can provide an example constructor that looks something like:

private MongoDBDataModel getModel() throws UnknownHostException,
            MongoException {
        MongoDBDataModel dbm = new MongoDBDataModel(
                "localhost",
                27017,
                "mydb",
                "mydatacollection",
                false, // leave this as false unless you want Mahout to manage, i.e. delete, the entries in your collection!
                false,
                dateFormat, // a java.text.DateFormat matching the dates stored in your collection
                "customerId",
                "item",
                "rating",
                "likemap"
        );
        return dbm;
    }

You can then go ahead and use your MongoDBDataModel in the rest of your code. The rest of the API is pretty simple to understand, so if you would like some more details on the various bits and pieces of Mahout, please leave a comment and I will attempt to find time to get it done too!
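As a starting point, here is a sketch of plugging the model into Mahout’s user-based Taste recommender (the similarity measure, neighbourhood size and recommendation count are purely illustrative, and Mahout works with long ids internally, so your Mongo ids may need translating):

import java.net.UnknownHostException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public List<RecommendedItem> recommendFor(long userId) throws TasteException,
        UnknownHostException {
    DataModel model = getModel();
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
    // Top 5 recommendations for the given (Mahout-internal) user id
    return recommender.recommend(userId, 5);
}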

Low level MongoDB in Java Spring Framework

Reposted from old site – original date: Friday 31 August 2012

I recently discovered that doing lower level functions like distinct() and some of the $near (i.e. spatial) functions in MongoDB using the Spring Framework is not all that easy.

After a little poking around, I have come up with the following solution, which is OK and works, but can probably be improved upon. I am still pretty new to Spring, so if you have a better idea, then please do let me know in comments.

Anyway, here it is:

First off, we use the @Autowired annotation to bring in the base MongoTemplate from spring-data-mongodb

@Autowired
MongoTemplate mongoTemplate;

Once we have that, we can use it to make some queries. Note that this is the slightly smelly part because you have to tell Spring what the return type is and it doesn’t really like that…

// Get the distinct stuff from MongoDB
List<String> coll = mongoTemplate.getCollection("mycollection").distinct("myfield");

In the above code you will notice that I have defined a List<String> variable called coll that uses the @Autowired MongoTemplate to get a collection and then the distinct values of a field. This is analogous to db.mycollection.distinct("myfield") in the Mongo shell.
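The same low-level approach works for the $near queries mentioned earlier. A sketch, assuming a collection called "places" with a 2d index on a "loc" field (both names are made up for the example; the DBObject types come from com.mongodb):

// Build a { loc: { $near: [x, y] } } query directly against the driver collection
DBObject near = new BasicDBObject("$near", Arrays.asList(18.4239, -33.9253));
DBObject query = new BasicDBObject("loc", near);

DBCursor cursor = mongoTemplate.getCollection("places").find(query).limit(10);
while (cursor.hasNext()) {
	System.out.println(cursor.next());
}
cursor.close();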

Hope this saves someone else a few minutes at least, and as I said, if you have a better idea, please do share!

Social Deposits using SWORD

Reposted from old site – original date: Wednesday 7 October 2009
Social Deposits using SWORD and Fedora commons

Based on the work of Stuart Lewis (http://blog.stuartlewis.com/), I would like to introduce the concept of social deposits.
In Stuart’s blog post on the subject, he makes the example of faculty members doing their own deposits through Facebook (http://www.facebook.com) to a DSpace repository.
Recently, I see that the Fedora commons SWORD API has stabilised to a usable degree and is now relatively easy to build as well. This opens the possibility of using SWORD deposits from a number of applications, including #Chisimba and #Facebook, as well as other apps like Microsoft #Word and OpenOffice. This basically means that you will be able to deposit to the institutional repository from almost anywhere that you can send an email from.

The main advantage that Fedora Commons has over the DSpace route (HINT: it isn’t simplicity of setup) is that it automatically creates usable RDF triples from the data, thereby skipping a processing step that would need to be done in DSpace.

The bigger picture here for institutions and institutional repositories is that students can also make deposits of much more unstructured data. This is almost a whole other post/concept, but I would like to see loads of very unstructured (contextually unstructured but not temporally unstructured) data being added to create mine-able (is that a word?) repositories of information generated during day to day experience at the institution. I am, of course, talking about SMS messages and MXit conversations as well. This can be a valuable resource to gauge the level of engagement at the institution, as well as provide great temporal and historical data on the daily influences that may be affecting students. It could very well also become a great resource for researchers to use for more academic pursuits…