Tuesday 11 February 2014

Hadoop and OpenCV

Compiling OpenCV with Java


Instructions for compiling OpenCV with Java support are available at:
http://docs.opencv.org/doc/tutorials/introduction/desktop_java/java_dev_intro.html

Here are some additional notes for opencv-2.4.8:

  • Ubuntu 12.04: It compiles and works as is. However, some extra packages are required, including build-essential, libjpeg-dev, python-dev, ant, libpng-dev and perhaps others (see the cmake output). JAVA_HOME must point to a JDK.
  • CentOS 6.2: cmake needs to be upgraded to 2.8.x (2.8.12.2). Some extra packages may be required, e.g. libjpeg-devel, python-devel etc. There is a bug in opencv-2.4.8 that causes a SEGV when the library is loaded into the Java VM. To fix it, apply this patch before compilation: https://github.com/djetter99/opencv/commit/6bf599b1bca8a58c7a656ddc169f7be0fc3094c6
  • SUSE Linux Enterprise Server 11 SP2 (bragg cluster): apply the patch (see CentOS). Load the cmake and gcc modules (OpenCV does not compile with the Intel compiler).


To apply the patch, run the following in the OpenCV source root directory:

# wget https://github.com/djetter99/opencv/commit/6bf599b1bca8a58c7a656ddc169f7be0fc3094c6.patch
# git apply 6bf599b1bca8a58c7a656ddc169f7be0fc3094c6.patch

Loading custom native libraries in Hadoop


Native libraries need to be on the path defined by the java.library.path Java system property. How to pass it to Hadoop workers is a bit confusing, as it depends on the mode (local vs distributed) and on the Hadoop version.
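When a load fails, it can help to see which files the JVM would actually look for. The following is only a minimal diagnostic sketch: the class name NativeLibCheck is made up for illustration, and the default library name opencv_java248 is an assumption about what the 2.4.8 build produces (check the build output for the actual file name). It uses only standard JDK calls and does not load anything.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenCV or Hadoop): lists where the JVM
// would look for a native library on a given library path.
final class NativeLibCheck {

    // Returns one candidate file per non-empty entry of libraryPath.
    static List<File> candidates(String libraryPath, String libName) {
        // Maps e.g. "foo" to "libfoo.so" on Linux (platform dependent).
        String fileName = System.mapLibraryName(libName);
        List<File> result = new ArrayList<>();
        if (libraryPath == null) {
            return result;
        }
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (!dir.isEmpty()) {
                result.add(new File(dir, fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed default name for the OpenCV 2.4.8 Java bindings.
        String libName = args.length > 0 ? args[0] : "opencv_java248";
        String path = System.getProperty("java.library.path");
        System.out.println("java.library.path=" + path);
        for (File f : candidates(path, libName)) {
            System.out.println((f.isFile() ? "found   " : "missing ") + f);
        }
    }
}
```

Running the same check from inside a task (for example from a mapper's setup code) shows the path the worker JVM sees, which may differ from the one on the submitting machine.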

For local runs (those that do not spawn child JVMs):

  • set the additional path in the JAVA_LIBRARY_PATH environment variable


For distributed runs, use one of the following:

  • copy the library to the standard Hadoop native path (e.g. /usr/lib/hadoop/lib/native)
  • use the distributed cache as described at: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/NativeLibraries.html
  • in versions 0.20 and 1.0, pass the additional paths (they need to be available on the worker nodes) in mapred.child.java.opts by defining java.library.path, e.g.: -Dmapred.child.java.opts="-Djava.library.path=/usr/local/lib"
  • in version 2.0, pass the additional paths in the mapred.child.env property by defining the JAVA_LIBRARY_PATH environment variable, e.g.: -Dmapred.child.env="JAVA_LIBRARY_PATH=/usr/local/lib"
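
The two version-specific options above can be captured in a small helper, which may be handy when the same job is launched against different clusters. This is just a sketch: the class and method names are hypothetical, the version check is a simple prefix match, and it only builds the -D argument string exactly as listed above without calling any Hadoop API.

```java
// Hypothetical helper: builds the generic -D option that passes a native
// library directory to child JVMs, following the per-version notes above.
final class ChildLibraryPath {

    static String option(String hadoopVersion, String libDir) {
        if (hadoopVersion.startsWith("2.")) {
            // Hadoop 2.0: set JAVA_LIBRARY_PATH in the child environment.
            return "-Dmapred.child.env=JAVA_LIBRARY_PATH=" + libDir;
        }
        if (hadoopVersion.startsWith("0.20") || hadoopVersion.startsWith("1.")) {
            // Hadoop 0.20 and 1.0: set java.library.path in the child JVM options.
            return "-Dmapred.child.java.opts=-Djava.library.path=" + libDir;
        }
        throw new IllegalArgumentException("no rule for Hadoop version " + hadoopVersion);
    }

    public static void main(String[] args) {
        System.out.println(option("1.0.4", "/usr/local/lib"));
        System.out.println(option("2.2.0", "/usr/local/lib"));
    }
}
```

The returned string is a single argument, so the shell quotes shown in the list are not needed when it is passed programmatically (for instance as one element of an argument array); they are only required when typing the option on a command line.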