Cloudera's Distribution for Apache Hadoop (CDH) 3 Beta 4 Released

Cloudera is happy to announce the release of Beta 4 of version 3 of our Distribution for Apache Hadoop (CDH3 Beta 4).

== Core Hadoop, HDFS, and MapReduce ==

As the final beta in the CDH3 series, this release focuses its major changes on stability, security, and scalability.

Stability and bug fixes: CDH3 Beta 4 fixes several important bugs in both MapReduce and HDFS, based on our experiences deploying CDH3 in production environments as well as important backports from the upstream Apache project. This release also includes several improvements that ease the diagnosis of configuration problems and simplify the configuration of permissions for local storage directories.

Security: CDH3 Beta 4 includes a fix for one major security vulnerability that was known in Beta 3. Cloudera encourages users running in a secured environment to upgrade as soon as possible. HDFS now also includes support for the "sticky bit" as found in POSIX file permissions. When the sticky bit is set on a directory, files and directories within that directory may only be deleted or renamed by the item's owner, the directory's owner, or the superuser. Cloudera recommends that any directories currently set to mode 777 (e.g. /tmp/) be chmodded to mode 1777 to include the sticky bit.

Scalability: CDH3 Beta 4 merges many of the scalability improvements contributed by Yahoo! in their 0.20.100 branch of Apache Hadoop. These include a reduction in the amount of memory required by the NameNode, improvements to MapReduce scheduling throughput, and more scalable RPC servers.

== New Component Versions ==

CDH3 Beta 4 updates several components to new upstream version numbers:

HBase: Updated from 0.89.20100924 to 0.90.1. This new release continues to focus on stability, but also includes a few important performance improvements and an experimental feature to reduce the frequency of lengthy garbage collection pauses.

ZooKeeper: Updated from 3.3.1 to 3.3.2.
This is a bug fix release that addresses several important bugs, including two fixes for connection cleanup in clients and one fix for a server-side thread leak.

Flume: Updated from 0.9.1 to 0.9.3. This release broadens support to include Flume nodes on Windows machines, exposes in-depth metric information via a JSON interface, improves the performance of tail and exec sources, enables support for Apache Avro based RPCs, and patches issues related to multi-master configurations, output format plugins, and manual failover chains.

Pig: Updated from 0.7.0 to 0.8.0. This release includes new features such as scalar types, custom partitioners, Python UDFs, a new unit testing framework for Pig scripts, better statistics on Pig jobs, and support for integrating custom MapReduce jobs into a Pig flow.

Hive: Updated from 0.5.0 to 0.7.0-rc0. This release includes important bug fixes as well as new features and performance improvements. These include support for views, multiple databases, dynamic partitions, automatic merging of small files, new and improved join strategies, storage handlers, local execution modes, and archiving, as well as significant improvements to the JDBC driver.

Sqoop: Updated from 1.1.0 to 1.2.0. This release includes important bug fixes as well as some improvements and new features, including support for Oracle catalog views, the ability to load command line options from configuration files, and several other usability improvements.

Oozie: Updated from 2.2.1 to 2.3.0. This release includes several usability improvements and bug fixes. New features include Oozie sharelib and Oozie HTTP Kerberos authentication. Oozie sharelib bundles the Hadoop streaming, Pig, Hive, and Sqoop JARs, simplifying workflow application deployment. Oozie HTTP Kerberos authentication provides secure authentication for Oozie clients and the Oozie web console. The Oozie examples have been rewritten to make them simpler.

Whirr: Updated from 0.1.0 to 0.3.0.
This release includes support for Apache HBase, improvements in startup time, and recipes for common configurations.

Hue: Updated from 1.1 to 1.2. Version 1.2.0 is a minor release, largely focused on bug fixes and compatibility with Cloudera's Distribution for Apache Hadoop 3, Beta 4.

== Upgrading ==

*** If you are upgrading from a release of CDH prior to CDH3 Beta 3, please read the release notes for the Beta 3 release as well as those for CDH3 Beta 4.

- This release changes the wire format for Hadoop's RPC mechanism. Thus, you must upgrade any existing Hadoop client software at the same time as you upgrade the server.

- In CDH3 Beta 4, if you set mapred.child.ulimit, its value must be more than twice the heap size set in mapred.child.java.opts. For example, if you set a 1 GB heap, set mapred.child.ulimit to 2.5 GB. Child processes are now guaranteed to fork at least once, and the fork momentarily requires twice the overhead in virtual memory.
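
The mapred.child.ulimit rule above can be checked with a little arithmetic. The following is a minimal Python sketch, not part of CDH: the function names are hypothetical, and it assumes that the heap size is given as a -Xmx flag inside mapred.child.java.opts and that mapred.child.ulimit is expressed in kilobytes (our reading of the Hadoop 0.20 mapred-default.xml; please verify against your own configuration reference). The 2.5 factor is taken from the 1 GB heap example above.

```python
import re

# Multipliers that convert a -Xmx suffix to kilobytes.
_UNIT_TO_KB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def heap_kb_from_java_opts(java_opts):
    """Return the -Xmx heap size in KB, or None if no -Xmx flag is present."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    size = int(match.group(1))
    unit = match.group(2).lower()
    if unit == "":
        # A bare number is a size in bytes.
        return size // 1024
    return size * _UNIT_TO_KB[unit]


def ulimit_is_safe(java_opts, ulimit_kb):
    """True if ulimit_kb is strictly more than twice the configured heap."""
    heap_kb = heap_kb_from_java_opts(java_opts)
    if heap_kb is None:
        raise ValueError("no -Xmx flag found in: %r" % java_opts)
    return ulimit_kb > 2 * heap_kb


def suggested_ulimit_kb(java_opts, factor=2.5):
    """Suggest a ulimit in KB, using the 2.5x factor from the example above."""
    heap_kb = heap_kb_from_java_opts(java_opts)
    if heap_kb is None:
        raise ValueError("no -Xmx flag found in: %r" % java_opts)
    return int(heap_kb * factor)


if __name__ == "__main__":
    opts = "-Xmx1g"  # example value for mapred.child.java.opts
    print(suggested_ulimit_kb(opts))      # 2621440 KB, i.e. 2.5 GB
    print(ulimit_is_safe(opts, 2097152))  # False: exactly 2x is not enough
```

Under these assumptions, a 1 GB heap corresponds to a mapred.child.ulimit of 2621440. A value of exactly twice the heap is rejected because the rule requires strictly more than twice the heap size.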