CDH 2 Release Notes
This update removes the Thrift jars packaged with the Hadoop package so that
they no longer conflict with the Thrift jars packaged with Hive. If you have
written a MapReduce program that uses Thrift, you will need to package a
Thrift jar with your program.
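One common way to package a dependency with a MapReduce program is to place it under a lib/ directory inside the job jar, since Hadoop adds jars found there to the task classpath. The following is a minimal sketch of that layout, not a prescribed build step. The file names (libthrift.jar, myjob.jar, com/example/MyJob.class) are hypothetical placeholders, and a real build would normally use the jar tool or Ant rather than a script like this.

```python
import os
import tempfile
import zipfile


def build_job_jar(jar_path, class_files, lib_jars):
    """Write a job jar: classes at their package paths, dependency jars under lib/.

    class_files maps the path inside the jar to a file on disk.
    lib_jars is a list of dependency jar files on disk (for example a Thrift jar).
    """
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for arcname, src in class_files.items():
            jar.write(src, arcname)
        for src in lib_jars:
            # Each dependency keeps its file name and lands in lib/.
            jar.write(src, "lib/" + os.path.basename(src))
    return jar_path


def list_entries(jar_path):
    """Return the sorted entry names of a jar, to check its layout."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(jar.namelist())


if __name__ == "__main__":
    work = tempfile.mkdtemp()

    # Placeholder inputs standing in for a compiled class and a real Thrift jar.
    class_file = os.path.join(work, "MyJob.class")
    with open(class_file, "wb") as f:
        f.write(b"\xca\xfe\xba\xbe")
    thrift_jar = os.path.join(work, "libthrift.jar")
    with open(thrift_jar, "wb") as f:
        f.write(b"placeholder")

    job_jar = build_job_jar(
        os.path.join(work, "myjob.jar"),
        {"com/example/MyJob.class": class_file},
        [thrift_jar],
    )
    for name in list_entries(job_jar):
        print(name)
```

The resulting jar would then be submitted as usual. The sketch does not write a manifest, so the main class would have to be named on the command line.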
The following list contains all Apache Hadoop JIRAs included in CDH 2 that are
not included in the Apache Hadoop base version 0.20.1. The
hadoop-0.20.1+169.113.CHANGES.txt file lists all changes included in CDH 2.
The patch for each change can be found in the cloudera/patches directory in
the release tarball.
Changes Not In Hadoop 0.20.1
CDH
Bug
- [DISTRO-27] - CombineFileInputFormat incompatible
Common
Bug
- [HADOOP-6881] - The efficient comparators aren't always used except for BytesWritable and Text
- [HADOOP-5861] - s3n files are not getting split by default
- [HADOOP-6928] - Fix BooleanWritable comparator in 0.20
- [HADOOP-6833] - IPC leaks call parameters when exceptions thrown
- [HADOOP-6762] - exception while doing RPC I/O closes channel
- [HADOOP-6724] - IPC doesn't properly handle IOEs thrown by socket factory
- [HADOOP-6723] - unchecked exceptions thrown in IPC Connection orphan clients
- [HADOOP-6254] - s3n fails with SocketTimeoutException
- [HADOOP-6522] - TestUTF8 fails
- [HADOOP-6643] - Set executable bit for python cloud scripts in the distribution
- [HADOOP-2366] - Space in the value for dfs.data.dir can cause great problems
- [HADOOP-6453] - Hadoop wrapper script shouldn't ignore an existing JAVA_LIBRARY_PATH
- [HADOOP-6460] - Namenode runs out of memory due to memory leak in ipc Server
- [HADOOP-6498] - IPC client bug may cause rpc call hang
- [HADOOP-6506] - Failing tests prevent the rest of test targets from execution.
- [HADOOP-6505] - sed in build.xml fails
- [HADOOP-6503] - contrib projects should pull in the ivy-fetched libs from the root project
- [HADOOP-5647] - TestJobHistory fails if /tmp/_logs is not writable to. Testcase should not depend on /tmp
- [HADOOP-6462] - contrib/cloud failing, target "compile" does not exist
- [HADOOP-6315] - GzipCodec should not represent BuiltInZlibInflater as decompressorType
- [HADOOP-5623] - Streaming: process provided status messages are overwritten every 10 seconds
- [HADOOP-5759] - IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
- [HADOOP-6184] - Provide a configuration dump in json format.
- [HADOOP-6269] - Missing synchronization for defaultResources in Configuration.addResource
- [HADOOP-5891] - If dfs.http.address is default, SecondaryNameNode can't find NameNode
- [HADOOP-4655] - FileSystem.CACHE should be ref-counted
- [HADOOP-6231] - Allow caching of filesystem instances to be disabled on a per-instance basis
- [HADOOP-6097] - Multiple bugs w/ Hadoop archives
- [HADOOP-5981] - HADOOP-2838 doesn't work as expected
- [HADOOP-5738] - Split waiting tasks field in JobTracker metrics to individual tasks
- [HADOOP-5442] - The job history display needs to be paged
- [HADOOP-5650] - Namenode log that indicates why it is not leaving safemode may be confusing
- [HADOOP-5805] - problem using top level s3 buckets as input/output directories
- [HADOOP-5656] - Counter for S3N Read Bytes does not work
- [HADOOP-3327] - Shuffling fetchers waited too long between map output fetch re-tries
- [HADOOP-5611] - C++ libraries do not build on Debian Lenny
- [HADOOP-5612] - Some c++ scripts are not chmodded before ant execution
Improvement
- [HADOOP-6714] - FsShell 'hadoop fs -text' does not support compression codecs
- [HADOOP-1849] - IPC server max queue size should be configurable
- [HADOOP-3659] - Patch to allow hadoop native to compile on Mac OS X
- [HADOOP-6667] - RPC.waitForProxy should retry through NoRouteToHostException
- [HADOOP-5687] - Hadoop NameNode throws NPE if fs.default.name is the default value
- [HADOOP-6454] - Create setup.py for EC2 cloud scripts
- [HADOOP-6444] - Support additional security group option in hadoop-ec2 script
- [HADOOP-6426] - Create ant build for running EC2 unit tests
- [HADOOP-5625] - Add I/O duration time in client trace
- [HADOOP-5222] - Add offset in client trace
- [HADOOP-6400] - Log errors getting Unix UGI
- [HADOOP-5640] - Allow ServicePlugins to hook callbacks into key service events
- [HADOOP-6312] - Configuration sends too much data to log4j
- [HADOOP-6279] - Add JVM memory usage to JvmMetrics
- [HADOOP-6133] - ReflectionUtils performance regression
- [HADOOP-2838] - Add HADOOP_LIBRARY_PATH config setting so Hadoop will include external directories for jni
- [HADOOP-5733] - Add map/reduce slot capacity and lost map/reduce slot capacity to JobTracker metrics
- [HADOOP-4842] - Streaming combiner should allow command, not just JavaClass
- [HADOOP-6267] - build-contrib.xml unnecessarily enforces that contrib projects be located in contrib/ dir
- [HADOOP-4936] - Improvements to TestSafeMode
- [HADOOP-4675] - Current Ganglia metrics implementation is incompatible with Ganglia 3.1
- [HADOOP-5450] - Add support for application-specific typecodes to typed bytes
- [HADOOP-1722] - Make streaming to handle non-utf8 byte array
- [HADOOP-6166] - Improve PureJavaCrc32
- [HADOOP-6148] - Implement a pure Java CRC32 calculator
- [HADOOP-5968] - Sqoop should only print a warning about mysql import speed once
- [HADOOP-5967] - Sqoop should only use a single map task
- [HADOOP-5613] - change S3Exception to checked exception
- [HADOOP-5240] - 'ant javadoc' does not check whether outputs are up to date and always rebuilds
New Feature
- [HADOOP-6466] - Add a ZooKeeper service to the cloud scripts
- [HADOOP-6392] - Run namenode and jobtracker on separate EC2 instances
- [HADOOP-6108] - Add support for EBS storage on EC2
- [HADOOP-4368] - Superuser privileges required to do "df"
- [HADOOP-5257] - Export namenode/datanode functionality through a pluggable RPC layer
- [HADOOP-5170] - Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
- [HADOOP-5469] - Exposing Hadoop metrics via HTTP
- [HADOOP-5745] - Allow setting the default value of maxRunningJobs for all pools
- [HADOOP-5887] - Sqoop should create tables in Hive metastore after importing to HDFS
- [HADOOP-5528] - Binary partitioner
- [HADOOP-5175] - Option to prohibit jars unpacking
- [HADOOP-4829] - Allow FileSystem shutdown hook to be disabled
- [HADOOP-5518] - MRUnit unit test library
- [HADOOP-5844] - Use mysqldump when connecting to local mysql instance in Sqoop
- [HADOOP-5815] - Sqoop: A database import tool for Hadoop
HDFS
Bug
- [HDFS-1240] - TestDFSShell failing in branch-20
- [HDFS-909] - Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log
- [HDFS-611] - Heartbeats times from Datanodes increase when there are plenty of blocks to delete
- [HDFS-612] - FSDataset should not use org.mortbay.log.Log
- [HDFS-1024] - SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
- [HDFS-761] - Failure to process rename operation from edits log due to quota verification
- [HDFS-961] - dfs_readdir incorrectly parses paths
- [HDFS-908] - TestDistributedFileSystem fails with Wrong FS on weird hosts
- [HDFS-127] - DFSClient block read failures cause open DFSInputStream to become unusable
- [HDFS-877] - Client-driven block verification not functioning
- [HDFS-793] - DataNode should first receive the whole packet ack message before it constructs and sends its own ack message for the packet
- [HDFS-187] - TestStartup fails if hdfs is running in the same machine
- [HDFS-464] - Memory leaks in libhdfs
- [HDFS-861] - fuse-dfs does not support O_RDWR
- [HDFS-860] - fuse-dfs truncate behavior causes issues with scp
- [HDFS-859] - fuse-dfs utime behavior causes issues with tar
- [HDFS-858] - Incorrect return codes for fuse-dfs
- [HDFS-857] - Incorrect type for fuse-dfs capacity can cause "df" to return negative values on 32-bit machines
- [HDFS-856] - Hardcoded replication level for new files in fuse-dfs
- [HDFS-185] - Chown, chgrp, chmod operations allowed when namenode is in safemode.
- [HDFS-686] - NullPointerException is thrown while merging edit log and image
- [HDFS-423] - Unbreak FUSE build and fuse_dfs_wrapper.sh
- [HDFS-790] - c++ utils doesn't compile
- [HDFS-727] - bug setting block size hdfsOpenFile
- [HDFS-596] - Memory leak in libhdfs: hdfsFreeFileInfo() in libhdfs does not free memory for mOwner and mGroup
- [HDFS-677] - Rename failure due to quota results in deletion of src directory
- [HDFS-732] - HDFS files are ending up truncated
Improvement
- [HDFS-1205] - FSDatasetAsyncDiskService should name its threads
- [HDFS-1161] - Make DN minimum valid volumes configurable
- [HDFS-1160] - Improve some FSDataset warnings and comments
- [HDFS-457] - better handling of volume failure in Data Node storage
- [HDFS-455] - Make NN and DN handle comma-separated configuration strings in an intuitive way
- [HDFS-412] - Hadoop JMX usage makes Nagios monitoring impossible
- [HDFS-758] - Improve reporting of progress of decommissioning
- [HDFS-496] - Use PureJavaCrc32 in HDFS
New Feature
- [HDFS-528] - Add ability for safemode to wait for a minimum number of live datanodes
MapReduce
Bug
- [MAPREDUCE-1280] - Eclipse Plugin does not work with Eclipse Ganymede (3.4)
- [MAPREDUCE-1182] - Reducers fail with OutOfMemoryError while copying Map outputs
- [MAPREDUCE-1505] - Cluster class should create the rpc client only when needed
- [MAPREDUCE-118] - Job.getJobID() will always return null
- [MAPREDUCE-1163] - hdfsJniHelper.h: Yahoo! specific paths are encoded
- [MAPREDUCE-1536] - DataDrivenDBInputFormat does not split date columns correctly.
- [MAPREDUCE-1480] - CombineFileRecordReader does not properly initialize child RecordReader
- [MAPREDUCE-1395] - Sqoop does not check return value of Job.waitForCompletion()
- [MAPREDUCE-1313] - NPE in FieldFormatter if escape character is set and field is null
- [MAPREDUCE-1394] - Sqoop generates incorrect URIs in paths sent to Hive
- [MAPREDUCE-1155] - Streaming tests swallow exceptions
- [MAPREDUCE-433] - TestReduceFetch failed.
- [MAPREDUCE-1212] - Mapreduce contrib project ivy dependencies are not included in binary target
- [MAPREDUCE-1310] - CREATE TABLE statements for Hive do not correctly specify delimiters
- [MAPREDUCE-1174] - Sqoop improperly handles table/column names which are reserved sql words
- [MAPREDUCE-1235] - java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
- [MAPREDUCE-1146] - Sqoop dependencies break Eclipse build on Linux
- [MAPREDUCE-1148] - SQL identifiers are a superset of Java identifiers
- [MAPREDUCE-1285] - DistCp cannot handle -delete if destination is local filesystem
- [MAPREDUCE-764] - TypedBytesInput's readRaw() does not preserve custom type codes
- [MAPREDUCE-1293] - AutoInputFormat doesn't work with non-default FileSystems
- [MAPREDUCE-1131] - Using profilers other than hprof can cause JobClient to report job failure
- [MAPREDUCE-1059] - distcp can generate uneven map task assignments
- [MAPREDUCE-1128] - MRUnit Allows Iteration Twice
- [MAPREDUCE-112] - Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API
- [MAPREDUCE-1089] - Fair Scheduler preemption triggers NPE when tasks are scheduled but not running
- [MAPREDUCE-1070] - Deadlock in FairSchedulerServlet
- [MAPREDUCE-968] - NPE in distcp encountered when placing _logs directory on S3FileSystem
- [MAPREDUCE-693] - Conf files not moved to "done" subdirectory after JT restart
- [MAPREDUCE-683] - TestJobTrackerRestart fails with Map task completion events ordering mismatch
- [MAPREDUCE-416] - Move the completed jobs' history files to a DONE subdirectory inside the configured history directory
- [MAPREDUCE-971] - distcp does not always remove distcp.tmp.dir
- [MAPREDUCE-923] - Sqoop's ORM uses URLDecoder on a file, which replaces plus signs in a jar file name with spaces
- [MAPREDUCE-840] - DBInputFormat leaves open transaction
- [MAPREDUCE-825] - JobClient completion poll interval of 5s causes slow tests in local mode
- [MAPREDUCE-792] - javac warnings in DBInputFormat
- [MAPREDUCE-716] - org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle
- [MAPREDUCE-799] - Some of MRUnit's self-tests were not being run
- [MAPREDUCE-685] - Sqoop will fail with OutOfMemory on large tables using mysql
- [MAPREDUCE-703] - Sqoop requires dependency on hsqldb in ivy
- [MAPREDUCE-415] - JobControl Job always has an unassigned name
- [MAPREDUCE-680] - Reuse of Writable objects is improperly handled by MRUnit
- [MAPREDUCE-714] - JobConf.findContainingJar unescapes unnecessarily on Linux
Improvement
- [MAPREDUCE-1785] - Add streaming config option for not emitting the key
- [MAPREDUCE-1423] - Improve performance of CombineFileInputFormat when multiple pools are configured
- [MAPREDUCE-364] - Change org.apache.hadoop.examples.MultiFileWordCount to use new mapreduce api.
- [MAPREDUCE-1467] - Add a --verbose flag to Sqoop
- [MAPREDUCE-1224] - Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
- [MAPREDUCE-370] - Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
- [MAPREDUCE-999] - Improve Sqoop test speed and refactor tests
- [MAPREDUCE-967] - TaskTracker does not need to fully unjar job jars
- [MAPREDUCE-814] - Move completed Job history files to HDFS
- [MAPREDUCE-906] - Updated Sqoop documentation
- [MAPREDUCE-907] - Sqoop should use more intelligent splits
- [MAPREDUCE-885] - More efficient SQL queries for DBInputFormat
- [MAPREDUCE-876] - Sqoop import of large tables can time out
- [MAPREDUCE-918] - Test hsqldb server should be memory-only.
- [MAPREDUCE-875] - Make DBRecordReader execute queries lazily
- [MAPREDUCE-750] - Extensible ConnManager factory API
- [MAPREDUCE-749] - Make Sqoop unit tests more Hudson-friendly
- [MAPREDUCE-910] - MRUnit should support counters
- [MAPREDUCE-797] - MRUnit MapReduceDriver should support combiners
- [MAPREDUCE-782] - Use PureJavaCrc32 in mapreduce spills
- [MAPREDUCE-789] - Oracle support for Sqoop
- [MAPREDUCE-816] - Rename "local" mysql import to "direct"
- [MAPREDUCE-710] - Sqoop should read and transmit passwords in a more secure manner
- [MAPREDUCE-713] - Sqoop has some superfluous imports
- [MAPREDUCE-674] - Sqoop should allow a "where" clause to avoid having to export entire tables
- [MAPREDUCE-675] - Sqoop should allow user-defined class and package names
- [MAPREDUCE-692] - Make Hudson run Sqoop unit tests
New Feature
- [MAPREDUCE-1017] - Compression and output splitting for Sqoop
- [MAPREDUCE-768] - Configuration information should generate dump in a standard format.
- [MAPREDUCE-551] - Add preemption to the fair scheduler
- [MAPREDUCE-987] - Exposing MiniDFS and MiniMR clusters as a single process command-line
- [MAPREDUCE-461] - Enable ServicePlugins for the JobTracker
- [MAPREDUCE-938] - Postgresql support for Sqoop
- [MAPREDUCE-798] - MRUnit should be able to test a succession of MapReduce passes
- [MAPREDUCE-800] - MRUnit should support the new API
- [MAPREDUCE-705] - User-configurable quote and delimiter characters for Sqoop records and record reparsing
Test