Monday 1 April 2013

Image mosaic with Hadoop and Scoobi


I have been recently working on some prototypes for applying hadoop and map-reduce for image processing.  One useful albeit somewhat trivial use for hadoop is batch processing of an image set with an embarrassingly parallel function. I was however looking for a problem that would involve both map and reduce steps and in addition be simple and visually compelling.

Eventually I settled on creating image mosaics like this:


that is to recreate an image from a potentially huge collection of other images.


The problem is essentially simple. We split the input image into small square cells, and then for each cell we find an image that matches it best (according to some measure of sameness) and then replace the cell with the best fit image in the output mosaic.
Simple enough except that to get good results one needs a huge set of reference images. How big the set needs to be depends on the dimensions of the cell the input image is split into.  For example if we consider 16x16 cells there is (256 ^3) ^ (16 x 16)  = 2^17088  different combinations of how the cell may look like. Try to even cover 1% of that!

In general the larger set of reference images the better results and here is where hadoop comes to the rescue, because the mosaic creation can be quite naturally expressed as a map reduce computation. 

In the map phase we for each image of the reference set we calculate its distance (or sameness) with all the cells of the input image. The result of the map phase can be visualized as heat map with green cells representing a better fit (one heat map per one reference image)


In the reduce phase we for each cell we find the image with the best fit (the greenest one in the heat map) and choose it as a replacement.
This is a bit sketch and the actual implementation my does not involve heat maps, but hopefully it’s easy enough to understand.
A working implementation using Scoobi can be found here: https://bitbucket.org/piotrszul/scoobi-mosaic

Why Scoobi? Well because it allows expressing map-reducing in functional programming paradigm.

Here are some sample mosaics created from ~10M images harvested from Flicker.

 



To create them however I need to switch from java graphics to a C image processing library (opencv with javacv bindings) as the calculation of image sameness is quite computationally intensive and the java version was just way to slow.

Using a C library with Scoobi and optimizing the cluster for image processing were interesting challenges by themselves so I will write about them in next posts.