CDH 3 Release Notes
The following list contains all Apache Hadoop Jiras included in CDH 3
that are not included in the Apache Hadoop base version 0.20.2. The
hadoop-20-0.20.2+320-changes.log file lists all changes included in CDH 3.
The patch for each change can be found in the cloudera/patches directory
of the release tarball.
Changes Not In Hadoop 0.20.2
Common
Bug
- [HADOOP-6643] - Set executable bit for python cloud scripts in the distribution
- [HADOOP-2366] - Space in the value for dfs.data.dir can cause great problems
- [HADOOP-6453] - Hadoop wrapper script shouldn't ignore an existing JAVA_LIBRARY_PATH
- [HADOOP-6460] - Namenode runs out of memory due to memory leak in ipc Server
- [HADOOP-6505] - sed in build.xml fails
- [HADOOP-6503] - contrib projects should pull in the ivy-fetched libs from the root project
- [HADOOP-5647] - TestJobHistory fails if /tmp/_logs is not writable. Testcase should not depend on /tmp
- [HADOOP-6462] - contrib/cloud failing, target "compile" does not exist
- [HADOOP-6184] - Provide a configuration dump in json format.
- [HADOOP-6269] - Missing synchronization for defaultResources in Configuration.addResource
- [HADOOP-5891] - If dfs.http.address is default, SecondaryNameNode can't find NameNode
- [HADOOP-4655] - FileSystem.CACHE should be ref-counted
- [HADOOP-5981] - HADOOP-2838 doesn't work as expected
- [HADOOP-5738] - Split waiting tasks field in JobTracker metrics to individual tasks
- [HADOOP-5442] - The job history display needs to be paged
- [HADOOP-5650] - Namenode log that indicates why it is not leaving safemode may be confusing
- [HADOOP-5805] - problem using top level s3 buckets as input/output directories
- [HADOOP-5656] - Counter for S3N Read Bytes does not work
- [HADOOP-3327] - Shuffling fetchers waited too long between map output fetch re-tries
Improvement
- [HADOOP-5687] - Hadoop NameNode throws NPE if fs.default.name is the default value
- [HADOOP-6454] - Create setup.py for EC2 cloud scripts
- [HADOOP-6444] - Support additional security group option in hadoop-ec2 script
- [HADOOP-6426] - Create ant build for running EC2 unit tests
- [HADOOP-5625] - Add I/O duration time in client trace
- [HADOOP-5222] - Add offset in client trace
- [HADOOP-6400] - Log errors getting Unix UGI
- [HADOOP-5640] - Allow ServicePlugins to hook callbacks into key service events
- [HADOOP-6312] - Configuration sends too much data to log4j
- [HADOOP-6279] - Add JVM memory usage to JvmMetrics
- [HADOOP-6133] - ReflectionUtils performance regression
- [HADOOP-2838] - Add HADOOP_LIBRARY_PATH config setting so Hadoop will include external directories for jni
- [HADOOP-5733] - Add map/reduce slot capacity and lost map/reduce slot capacity to JobTracker metrics
- [HADOOP-4842] - Streaming combiner should allow command, not just JavaClass
- [HADOOP-6267] - build-contrib.xml unnecessarily enforces that contrib projects be located in contrib/ dir
- [HADOOP-4936] - Improvements to TestSafeMode
- [HADOOP-4675] - Current Ganglia metrics implementation is incompatible with Ganglia 3.1
- [HADOOP-5450] - Add support for application-specific typecodes to typed bytes
- [HADOOP-1722] - Make streaming handle non-UTF-8 byte arrays
- [HADOOP-6166] - Improve PureJavaCrc32
- [HADOOP-6148] - Implement a pure Java CRC32 calculator
- [HADOOP-5968] - Sqoop should only print a warning about mysql import speed once
- [HADOOP-5967] - Sqoop should only use a single map task
- [HADOOP-5613] - change S3Exception to checked exception
- [HADOOP-5240] - 'ant javadoc' does not check whether outputs are up to date and always rebuilds
New Feature
- [HADOOP-4012] - Providing splitting support for bzip2 compressed files
- [HADOOP-4368] - Superuser privileges required to do "df"
- [HADOOP-6466] - Add a ZooKeeper service to the cloud scripts
- [HADOOP-6392] - Run namenode and jobtracker on separate EC2 instances
- [HADOOP-6108] - Add support for EBS storage on EC2
- [HADOOP-5257] - Export namenode/datanode functionality through a pluggable RPC layer
- [HADOOP-5170] - Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
- [HADOOP-5469] - Exposing Hadoop metrics via HTTP
- [HADOOP-5745] - Allow setting the default value of maxRunningJobs for all pools
- [HADOOP-5887] - Sqoop should create tables in Hive metastore after importing to HDFS
- [HADOOP-5528] - Binary partitioner
- [HADOOP-5175] - Option to prohibit jars unpacking
- [HADOOP-4829] - Allow FileSystem shutdown hook to be disabled
- [HADOOP-5518] - MRUnit unit test library
- [HADOOP-5844] - Use mysqldump when connecting to local mysql instance in Sqoop
- [HADOOP-5815] - Sqoop: A database import tool for Hadoop
HDFS
Bug
- [HDFS-961] - dfs_readdir incorrectly parses paths
- [HDFS-908] - TestDistributedFileSystem fails with Wrong FS on weird hosts
- [HDFS-877] - Client-driven block verification not functioning
- [HDFS-464] - Memory leaks in libhdfs
- [HDFS-861] - fuse-dfs does not support O_RDWR
- [HDFS-860] - fuse-dfs truncate behavior causes issues with scp
- [HDFS-859] - fuse-dfs utime behavior causes issues with tar
- [HDFS-858] - Incorrect return codes for fuse-dfs
- [HDFS-857] - Incorrect type for fuse-dfs capacity can cause "df" to return negative values on 32-bit machines
- [HDFS-856] - Hardcoded replication level for new files in fuse-dfs
- [HDFS-423] - Unbreak FUSE build and fuse_dfs_wrapper.sh
- [HDFS-727] - bug setting block size hdfsOpenFile
- [HDFS-686] - NullPointerException is thrown while merging edit log and image
- [HDFS-127] - DFSClient block read failures cause open DFSInputStream to become unusable
Improvement
- [HDFS-1013] - Miscellaneous improvements to HTML markup for web UIs
- [HDFS-455] - Make NN and DN handle comma-separated configuration strings in an intuitive way
- [HDFS-412] - Hadoop JMX usage makes Nagios monitoring impossible
- [HDFS-630] - In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
- [HDFS-496] - Use PureJavaCrc32 in HDFS
New Feature
- [HDFS-528] - Add ability for safemode to wait for a minimum number of live datanodes
Test
- [HDFS-696] - Java assertion failures triggered by tests
MapReduce
Bug
- [MAPREDUCE-1436] - Deadlock in preemption code in fair scheduler
- [MAPREDUCE-1375] - TestFileArgs fails intermittently
- [MAPREDUCE-1469] - Sqoop should disable speculative execution in export
- [MAPREDUCE-1395] - Sqoop does not check return value of Job.waitForCompletion()
- [MAPREDUCE-1327] - Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
- [MAPREDUCE-1394] - Sqoop generates incorrect URIs in paths sent to Hive
- [MAPREDUCE-1313] - NPE in FieldFormatter if escape character is set and field is null
- [MAPREDUCE-1155] - Streaming tests swallow exceptions
- [MAPREDUCE-1258] - Fair scheduler event log not logging job info
- [MAPREDUCE-1212] - Mapreduce contrib project ivy dependencies are not included in binary target
- [MAPREDUCE-1310] - CREATE TABLE statements for Hive do not correctly specify delimiters
- [MAPREDUCE-1235] - java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
- [MAPREDUCE-1174] - Sqoop improperly handles table/column names which are reserved sql words
- [MAPREDUCE-1146] - Sqoop dependencies break Eclipse build on Linux
- [MAPREDUCE-1148] - SQL identifiers are a superset of Java identifiers
- [MAPREDUCE-1285] - DistCp cannot handle -delete if destination is local filesystem
- [MAPREDUCE-764] - TypedBytesInput's readRaw() does not preserve custom type codes
- [MAPREDUCE-1293] - AutoInputFormat doesn't work with non-default FileSystems
- [MAPREDUCE-1131] - Using profilers other than hprof can cause JobClient to report job failure
- [MAPREDUCE-1059] - distcp can generate uneven map task assignments
- [MAPREDUCE-1128] - MRUnit Allows Iteration Twice
- [MAPREDUCE-112] - Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API
- [MAPREDUCE-1089] - Fair Scheduler preemption triggers NPE when tasks are scheduled but not running
- [MAPREDUCE-968] - NPE in distcp encountered when placing _logs directory on S3FileSystem
- [MAPREDUCE-693] - Conf files not moved to "done" subdirectory after JT restart
- [MAPREDUCE-683] - TestJobTrackerRestart fails with Map task completion events ordering mismatch
- [MAPREDUCE-416] - Move the completed jobs' history files to a DONE subdirectory inside the configured history directory
- [MAPREDUCE-971] - distcp does not always remove distcp.tmp.dir
- [MAPREDUCE-923] - Sqoop's ORM uses URLDecoder on a file, which replaces plus signs in a jar file name with spaces
- [MAPREDUCE-840] - DBInputFormat leaves open transaction
- [MAPREDUCE-825] - JobClient completion poll interval of 5s causes slow tests in local mode
- [MAPREDUCE-792] - javac warnings in DBInputFormat
- [MAPREDUCE-716] - org.apache.hadoop.mapred.lib.db.DBInputFormat not working with Oracle
- [MAPREDUCE-799] - Some of MRUnit's self-tests were not being run
- [MAPREDUCE-685] - Sqoop will fail with OutOfMemory on large tables using mysql
- [MAPREDUCE-703] - Sqoop requires dependency on hsqldb in ivy
- [MAPREDUCE-415] - JobControl Job always has an unassigned name
- [MAPREDUCE-680] - Reuse of Writable objects is improperly handled by MRUnit
- [MAPREDUCE-714] - JobConf.findContainingJar unescapes unnecessarily on Linux
New Feature
- [MAPREDUCE-1341] - Sqoop should have an option to create hive tables and skip the table import step
- [MAPREDUCE-707] - Provide a jobconf property for explicitly assigning a job to a pool
- [MAPREDUCE-698] - Per-pool task limits for the fair scheduler
- [MAPREDUCE-1168] - Export data to databases via Sqoop
- [MAPREDUCE-706] - Support for FIFO pools in the fair scheduler
- [MAPREDUCE-1017] - Compression and output splitting for Sqoop
- [MAPREDUCE-768] - Configuration information should generate dump in a standard format.
- [MAPREDUCE-551] - Add preemption to the fair scheduler
- [MAPREDUCE-987] - Exposing MiniDFS and MiniMR clusters as a single process command-line
- [MAPREDUCE-461] - Enable ServicePlugins for the JobTracker
- [MAPREDUCE-938] - Postgresql support for Sqoop
- [MAPREDUCE-798] - MRUnit should be able to test a succession of MapReduce passes
- [MAPREDUCE-800] - MRUnit should support the new API
- [MAPREDUCE-705] - User-configurable quote and delimiter characters for Sqoop records and record reparsing