Monthly Archives: May 2014

Using OpenCV for great customer service

OpenCV is an Open Source Computer Vision library that can be used in a variety of applications. There are a few wrappers for it that will expose the OpenCV API in a number of languages, but we will look at the Python wrapper in this post.

One application that I was thinking could be done very quickly and easily, would be to use facial recognition to look up a customer before servicing them. This can easily be achieved using a simple cheap webcam mounted at the entrance to a service centre that captures people’s faces as they enter the building. This can then be used to look up against a database of images to identify the customer and all their details immediately on the service centre agent’s terminal. If a customer is a new customer, the agent could then capture the info for next time.

Privacy issues aside, this should be relatively easy to implement.

#!/usr/bin/python
import sys
import cv2.cv as cv
from optparse import OptionParser

# Parameters for haar detection
# From the API:
# The default parameters (scale_factor=2, min_neighbors=3, flags=0) are tuned
# for accurate yet slow object detection. For a faster operation on real video
# images the settings are:
# scale_factor=1.2, min_neighbors=2, flags=CV_HAAR_DO_CANNY_PRUNING,
# min_size=<minimum possible face size

min_size = (20, 20)
image_scale = 2
haar_scale = 1.2
min_neighbors = 2
haar_flags = 0

def detect_and_draw(img, cascade):
    # allocate temporary images
    gray = cv.CreateImage((img.width,img.height), 8, 1)
    small_img = cv.CreateImage((cv.Round(img.width / image_scale),
                   cv.Round (img.height / image_scale)), 8, 1)

    # convert color input image to grayscale
    cv.CvtColor(img, gray, cv.CV_BGR2GRAY)

    # scale input image for faster processing
    cv.Resize(gray, small_img, cv.CV_INTER_LINEAR)

    cv.EqualizeHist(small_img, small_img)

    if(cascade):
        t = cv.GetTickCount()
        faces = cv.HaarDetectObjects(small_img, cascade, cv.CreateMemStorage(0),
                                     haar_scale, min_neighbors, haar_flags, min_size)
        t = cv.GetTickCount() - t
        print "detection time = %gms" % (t/(cv.GetTickFrequency()*1000.))
        if faces:
            for ((x, y, w, h), n) in faces:
                # the input to cv.HaarDetectObjects was resized, so scale the
                # bounding box of each face and convert it to two CvPoints
                pt1 = (int(x * image_scale), int(y * image_scale))
                pt2 = (int((x + w) * image_scale), int((y + h) * image_scale))
                cv.Rectangle(img, pt1, pt2, cv.RGB(255, 0, 0), 3, 8, 0)

    cv.ShowImage("result", img)

if __name__ == '__main__':

    parser = OptionParser(usage = "usage: %prog [options] [filename|camera_index]")
    parser.add_option("-c", "--cascade", action="store", dest="cascade", type="str", help="Haar cascade file, default %default", default = "../data/haarcascades/haarcascade_frontalface_alt.xml")
    (options, args) = parser.parse_args()

    cascade = cv.Load(options.cascade)

    if len(args) != 1:
        parser.print_help()
        sys.exit(1)

    input_name = args[0]
    if input_name.isdigit():
        capture = cv.CreateCameraCapture(int(input_name))
    else:
        capture = None

    cv.NamedWindow("result", 1)

    if capture:
        frame_copy = None
        while True:
            frame = cv.QueryFrame(capture)
            if not frame:
                cv.WaitKey(0)
                break
            if not frame_copy:
                frame_copy = cv.CreateImage((frame.width,frame.height),
                                            cv.IPL_DEPTH_8U, frame.nChannels)
            if frame.origin == cv.IPL_ORIGIN_TL:
                cv.Copy(frame, frame_copy)
            else:
                cv.Flip(frame, frame_copy, 0)

            detect_and_draw(frame_copy, cascade)

            if cv.WaitKey(10) >= 0:
                break
    else:
        image = cv.LoadImage(input_name, 1)
        detect_and_draw(image, cascade)
        cv.WaitKey(0)

    cv.DestroyWindow("result")

So as you can see, by using the bundled OpenCV Haar detection XML documents for frontal face detection, we are almost there already! Try it with:

python ./facedetect.py -c /usr/local/share/OpenCV/haarcascades/haarcascade_frontalface_alt.xml 0

Where 0 is the index of the camera you wish to use.

An introduction to Apache Spark

What is Apache Spark?

Apache Spark is a fast and general engine for large-scale data processing.

Related documents

http://spark.apache.org

You can find the latest Spark documentation, including a programming
guide, on the project webpage at http://spark.apache.org/documentation.html.

Setup

Spark needs to be downloaded and installed on your local machine
Spark requires Scala 2.10. The project is built using Simple Build Tool (SBT),
which can be obtained (http://www.scala-sbt.org). If SBT is installed we
will use the system version of sbt otherwise we will attempt to download it
automatically. To build Spark and its example programs, run:

./sbt/sbt assembly

Once you’ve built Spark, the easiest way to start using it is the shell:

./bin/spark-shell

Or, for the Python API, the Python shell (`./bin/pyspark`).

Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> <params>`. For example:

./bin/run-example org.apache.spark.examples.SparkLR local[2]

will run the Logistic Regression example locally on 2 CPUs.

Each of the example programs prints usage help if no params are given.

All of the Spark samples take a `<master>` parameter that is the cluster URL
to connect to. This can be a mesos:// or spark:// URL, or “local” to run
locally with one thread, or “local[N]” to run locally with N threads.

Tests

Testing first requires building Spark. Once Spark is built, tests
can be run using:

`./sbt/sbt test`

Hadoop versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
You can change the version by setting the `SPARK_HADOOP_VERSION` environment
when building Spark.

For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
versions without YARN, use:

# Apache Hadoop 1.2.1
$ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly

# Cloudera CDH 4.2.0 with MapReduce v1
$ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly

For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
with YARN, also set `SPARK_YARN=true`:

# Apache Hadoop 2.0.5-alpha
$ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly

# Cloudera CDH 4.2.0 with MapReduce v2
$ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly

# Apache Hadoop 2.2.X and newer
$ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

When developing a Spark application, specify the Hadoop version by adding the
“hadoop-client” artifact to your project’s dependencies. For example, if you’re
using Hadoop 1.2.1 and build your application using SBT, add this entry to
`libraryDependencies`:

“org.apache.hadoop” % “hadoop-client” % “1.2.1″

If your project is built with Maven, add this to your POM file’s `<dependencies>` section:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>1.2.1</version>
</dependency>

Spark could be very well suited for more in depth data mining from social streams like Twitter/Facebook
Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing.
Write applications quickly in Java, Scala or Python.
Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala and Python shells.
Combine SQL, streaming, and complex analytics.
Spark powers a stack of high-level tools including Shark for SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these frameworks seamlessly in the same application.
Spark can run on Hadoop 2′s YARN cluster manager, and can read any existing Hadoop data.
If you have a Hadoop 2 cluster, you can run Spark without any installation needed. Otherwise, Spark is easy to run standalone or on EC2 or Mesos. It can read from HDFS, HBase, Cassandra, and any Hadoop data source.

Examples

Once Spark is built, open an interactive Scala shell with

bin/spark-shell

You can then start working with the engine

We will do a quick analysis of an apache2 log file (access.log)

// Load the file up for analysis
val textFile = sc.textFile("/var/log/apache2/access.log")
// Count the number of lines in the file
textFile.count()
// Display the first line of the file
textFile.first()
// Display the Number of lines containing PHP
val linesWithPHP = textFile.filter(line => line.contains("PHP"))
// Count the lines with PHP
val linesWithPHP = textFile.filter(line => line.contains("PHP")).count()
// Do the classic MapReduce WordCount example
val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCounts.collect()

Apps should be run as either Maven packaged Java apps, or Scala apps. Please refer to documentation for a HOWTO

Conclusion

Overall a good product, but some Hadoop expertise is required for successful set up and working
Mid level or senior level developer required
Some Scala language expertise is advantageous

Customising F-Droid client (Android repo)

This is part 2 of a 2 part series about rolling out your own “Play Store” based on F-Droid for Android devices.

Server set up instructions can be found at http://paulscott.co.za/blog/setting-up-a-private-android-appstore-with-fdroid/

The first thing that we need to do, is to get hold of the source for the client at gitorius.

Do a git clone of the code with

git clone https://git.gitorious.org/f-droid/fdroidclient.git fdroid-client

which will give you a checkout in the fdroid-client directory.
Directly afetr the checkout completes, cd into your new fdroid-client directory and grab the git submodules with:

git submodule update --init

and then we need to prepare for our ant biulds with:

./ant-prepare.sh # This runs 'android update' on the libs and the main project
ant clean release

Which will then build the stock release version of the client. Now, the point here is to do some customising right? OK, so let’s start with that then…

The first step here is to make sure that you can install your new package manager client alongside FDroid client too. In order to do this, edit tools/change-package-name.sh and change the parameters that you need to (package name and identifier)

FDROID_PACKAGE=${1:-org.your.fdroid}
FDROID_NAME=${2:-Your FDroid}

Change these values to something closer to what you want i.e.

FDROID_PACKAGE=${1:-za.co.paulscott.fdroid}
FDROID_NAME=${2:-Pauls FDroid Client}

Execute the package name changer script and get yourself a cookie for getting this far!

Now would be a good tme to import the code into an IDE like Eclipse or Android Studio. Note, however, that due to the way this project is built and laid out, it will probably not compile, and Eclipse will moan about it. No matter though, you are only modifying code in Eclipse, and will rebuild via the command line once done.

One thing that you really should change is to edit res/values/default-repo.xml and modify it to at least include your new server repo that you set up in part 1.
You probably also want to change no-trans.xml and the “about” clause to reflect your own installations too.
If you are rolling out a complete solution, don’t forget to check out strings.xml as well, so that your application makes sense!

Once that is done, you are free to look through the code to modify whatever else you think may be useful, but I think that the above should do for most situations. If you want to change the launcher icon etc, then create a new Android icon project in Eclipse and do it that way, as it is far easier than manually.

Once all your changes are done, rebuild and ship your brand new client!

ant clean emma debug install [test]

Done!