Harish Mallipeddi

Avid Pythonista with a secret love for Erlang.

harish.mallipeddi at gmail


Wed, Aug 19th

LZO compression for Hadoop-0.20+

As of Hadoop-0.20, all code related to LZO compression has been removed from the Hadoop source tree, because the LZO code is licensed under the GPL and is therefore incompatible with Hadoop’s Apache license. You should also know that LZO compression is only supported via a native library (AFAIK there’s no pure Java implementation of it). LzoCodec and LzopCodec are almost the same; the difference is that LzopCodec is compatible with the output of the lzop Unix utility.

Here are the steps to get LzopCodec working with Hadoop-0.20 (see the gist embed below). I’m assuming you’ve already downloaded and installed the Hadoop-0.20 release tarball. We’ll be adding the compiled library to Hadoop-0.20’s lib/ folder. Once that’s done, repackage the Hadoop directory into a tarball, push it to your cluster using whatever magic you use, and you should have LZO compression working.
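For the last two steps (dropping the compiled library into lib/ and repackaging), here is a rough Python sketch. It is only an illustration, not the contents of the gist: the function name `add_libs_and_repackage` and every path in the usage comment are hypothetical, and it assumes you have already built the LZO artifacts yourself.

```python
import shutil
import tarfile
from pathlib import Path


def add_libs_and_repackage(hadoop_home, built_files, tarball_path):
    """Copy already-built LZO artifacts into <hadoop_home>/lib and re-tar the install.

    hadoop_home  -- directory of the unpacked Hadoop-0.20 release (assumed layout)
    built_files  -- paths of the compiled library files to add (hypothetical names)
    tarball_path -- where to write the new .tar.gz; keep it OUTSIDE hadoop_home,
                    otherwise the archive would try to include itself
    """
    hadoop_home = Path(hadoop_home)
    lib_dir = hadoop_home / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)

    for src in map(Path, built_files):
        # copy2 keeps timestamps and permission bits of the built files
        shutil.copy2(str(src), str(lib_dir / src.name))

    # Keep the top-level directory name (e.g. hadoop-0.20.0/) inside the archive
    with tarfile.open(str(tarball_path), "w:gz") as tar:
        tar.add(str(hadoop_home), arcname=hadoop_home.name)

    return Path(tarball_path)


# Example call (all paths are made up for illustration):
# add_libs_and_repackage(
#     "/opt/hadoop-0.20.0",
#     ["/tmp/build/my-lzo-build.jar"],
#     "/tmp/hadoop-0.20.0-lzo.tar.gz",
# )
```

The resulting tarball is what you would then push out to the cluster nodes.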
