CDH 3 Release Notes
The following lists all Apache Hadoop JIRA issues included in CDH 3 that are not included in the Apache Hadoop base version, 0.20.2. The hadoop-0.20.2+887.CHANGES.txt file lists all changes included in CDH 3. The patch for each change can be found in the cloudera/patches directory of the release tarball.
Changes Not In Apache Hadoop 0.20.2
CDH
Bug
- [DISTRO-90] - FUSE can pick up the wrong libjvm.so
- [DISTRO-44] - Hadoop core POM missing jackson dependency
- [DISTRO-73] - FileSystems leaked when user logs on different FS URI than submit dir
- [DISTRO-38] - Autotools cannot find libssl on fedora
- [DISTRO-27] - CombineFileInputFormat incompatible
Improvement
- [DISTRO-29] - Figure out whether HBase Thrift server or HUE gets to own port 9090
- [DISTRO-32] - Make the default example conf support Hue
- [DISTRO-1] - Support default-jre in order to be able to install the hadoop packages using the default jvm in Ubuntu
Common
Bug
- [HADOOP-7140] - IPC Reader threads do not stop when server stops
- [HADOOP-6899] - RawLocalFileSystem#setWorkingDir() does not work for relative names
- [HADOOP-6669] - zlib.compress.level ignored for DefaultCodec initialization
- [HADOOP-7115] - Add a cache for getpwuid_r and getpwgid_r calls
- [HADOOP-5489] - hadoop-env.sh still refers to java1.5
- [HADOOP-5050] - TestDFSShell fails intermittently
- [HADOOP-7122] - Timed out shell commands leak Timer threads
- [HADOOP-7118] - NPE in Configuration.writeXml
- [HADOOP-5836] - Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail
- [HADOOP-7093] - Servlets should default to text/plain
- [HADOOP-7101] - UserGroupInformation.getCurrentUser() fails when called from non-Hadoop JAAS context
- [HADOOP-7089] - Fix link resolution logic in hadoop-config.sh
- [HADOOP-5476] - calling new SequenceFile.Reader(...) leaves an InputStream open, if the given sequence file is broken
- [HADOOP-7070] - JAAS configuration should delegate unknown application names to pre-existing configuration
- [HADOOP-7082] - Configuration.writeXML should not hold lock while outputting
- [HADOOP-6663] - BlockDecompressorStream gets EOF exception when decompressing a file compressed from an empty file
- [HADOOP-6496] - HttpServer sends wrong content-type for CSS files (and others)
- [HADOOP-6907] - Rpc client doesn't use the per-connection conf to figure out server's Kerberos principal
- [HADOOP-6815] - refreshSuperUserGroupsConfiguration should use server side configuration for the refresh
- [HADOOP-6951] - Distinct minicluster services (e.g. NN and JT) overwrite each other's service policies
- [HADOOP-6946] - SecurityUtils' TGT fetching does not fall back to "login" user
- [HADOOP-6881] - The efficient comparators aren't always used except for BytesWritable and Text
- [HADOOP-6939] - Inconsistent lock ordering in AbstractDelegationTokenSecretManager
- [HADOOP-5861] - s3n files are not getting split by default
- [HADOOP-6925] - BZip2Codec incorrectly implements read()
- [HADOOP-6928] - Fix BooleanWritable comparator in 0.20
- [HADOOP-6833] - IPC leaks call parameters when exceptions thrown
- [HADOOP-6781] - security audit log shouldn't have exception in it.
- [HADOOP-6776] - UserGroupInformation.createProxyUser's javadoc is broken
- [HADOOP-6760] - WebServer shouldn't increase port number in case of negative port setting caused by Jetty's race
- [HADOOP-6756] - Clean up and add documentation for configuration keys in CommonConfigurationKeys.java
- [HADOOP-6715] - AccessControlList.toString() returns empty string when we set acl to "*"
- [HADOOP-6757] - NullPointerException for hadoop clients launched from streaming tasks
- [HADOOP-6631] - FileUtil.fullyDelete() should continue to delete other files despite failure at any level.
- [HADOOP-6701] - Incorrect exit codes for "dfs -chown", "dfs -chgrp"
- [HADOOP-6640] - FileSystem.get() does RPC retries within a static synchronized block
- [HADOOP-6710] - Symbolic umask for file creation is not consistent with posix
- [HADOOP-6670] - UserGroupInformation doesn't support use in hash tables
- [HADOOP-6716] - System won't start in non-secure mode when krb5.conf (edu.mit.kerberos on Mac) is not present
- [HADOOP-6706] - Relogin behavior for RPC clients could be improved
- [HADOOP-6718] - Client does not close connection when an exception happens during SASL negotiation
- [HADOOP-6545] - Cached FileSystem objects can lead to wrong token being used in setting up connections
- [HADOOP-6687] - user object in the subject in UGI should be reused in case of a relogin.
- [HADOOP-5958] - Use JDK 1.6 File APIs in DF.java wherever possible
- [HADOOP-6682] - NetUtils:normalizeHostName does not process hostnames starting with [a-f] correctly
- [HADOOP-6656] - Security framework needs to renew Kerberos tickets while the process is running
- [HADOOP-6653] - NullPointerException in setupSaslConnection when browsing directories
- [HADOOP-6652] - ShellBasedUnixGroupsMapping shouldn't have a cache
- [HADOOP-6649] - login object in UGI should be inside the subject
- [HADOOP-6648] - Credentials should ignore null tokens
- [HADOOP-6647] - balancer fails with "is not authorized for protocol interface NamenodeProtocol" in secure environment
- [HADOOP-6644] - util.Shell getGROUPS_FOR_USER_COMMAND method name - should use common naming convention
- [HADOOP-6634] - AccessControlList uses full-principal names to verify acls causing queue-acls to fail
- [HADOOP-6642] - Fix javac, javadoc, findbugs warnings
- [HADOOP-6638] - try to relogin in a case of failed RPC connection (expired tgt) only in case the subject is loginUser or proxyUgi.realUser.
- [HADOOP-6613] - RPC server should check for version mismatch first
- [HADOOP-5592] - Hadoop Streaming - GzipCodec
- [HADOOP-6627] - "Bad Connection to FS" message in FSShell should print message from the exception
- [HADOOP-6598] - Remove verbose logging from the Groups class
- [HADOOP-6620] - NPE if renewer is passed as null in getDelegationToken
- [HADOOP-6612] - Protocols RefreshUserToGroupMappingsProtocol and RefreshAuthorizationPolicyProtocol will fail with security enabled
- [HADOOP-6603] - Provide workaround for issue with Kerberos not resolving cross-realm principal
- [HADOOP-6609] - Deadlock in DFSClient#getBlockLocations even with the security disabled
- [HADOOP-5561] - Javadoc-dev ant target runs out of heap space
- [HADOOP-6549] - TestDoAsEffectiveUser should use ip address of the host for superuser ip check
- [HADOOP-6558] - archive does not work with distcp -update
- [HADOOP-6577] - IPC server response buffer reset threshold should be configurable
- [HADOOP-6551] - Delegation tokens when renewed or cancelled should throw an exception that explains what went wrong
- [HADOOP-6572] - RPC responses may be out-of-order with respect to SASL
- [HADOOP-6560] - HarFileSystem throws NPE for har://hdfs-/foo
- [HADOOP-6552] - KEYTAB_KERBEROS_OPTIONS in UserGroupInformation should have options for automatic renewal of keytab based tickets
- [HADOOP-6521] - FsPermission:SetUMask not updated to use new-style umask setting.
- [HADOOP-6544] - fix ivy settings to include JSON jackson.codehaus.org libs for .20
- [HADOOP-6520] - UGI should load tokens from the environment
- [HADOOP-6495] - Identifier should be serialized after the password is created in the Token constructor
- [HADOOP-4041] - IsolationRunner does not work as documented
- [HADOOP-5737] - UGI checks in testcases are broken
- [HADOOP-6132] - RPC client opens an extra connection for VersionedProtocol
- [HADOOP-5824] - remove OP_READ_METADATA functionality from Datanode
- [HADOOP-6441] - Prevent remote CSS attacks in Hostname and UTF-7.
- [HADOOP-5582] - Hadoop Vaidya throws number format exception due to changes in the job history counters string format (escaped compact representation).
- [HADOOP-4933] - ConcurrentModificationException in JobHistory.java
- [HADOOP-6234] - Permission configuration files should use octal and symbolic
- [HADOOP-6344] - rm and rmr fail to correctly move the user's files to the trash prior to deleting when they are over quota.
- [HADOOP-6227] - Configuration does not lock parameters marked final if they have no value.
- [HADOOP-5780] - Fix slightly confusing log from "-metaSave" on NameNode
- [HADOOP-5420] - Support killing of process groups in LinuxTaskController binary
- [HADOOP-5488] - HADOOP-2721 doesn't clean up descendant processes of a jvm that exits cleanly after running a task successfully
- [HADOOP-5980] - LD_LIBRARY_PATH not passed to tasks spawned off by LinuxTaskController
- [HADOOP-5801] - JobTracker should refresh the hosts list upon recovery
- [HADOOP-5818] - Revert the renaming from checkSuperuserPrivilege to checkAccess by HADOOP-5643
- [HADOOP-5739] - After JobTracker restart, Capacity Scheduler does not schedule pending tasks from already running tasks.
- [HADOOP-5203] - TT's version build is too restrictive
- [HADOOP-6762] - exception while doing RPC I/O closes channel
- [HADOOP-6722] - NetUtils.connect should check that it hasn't connected a socket to itself
- [HADOOP-6724] - IPC doesn't properly handle IOEs thrown by socket factory
- [HADOOP-6723] - unchecked exceptions thrown in IPC Connection orphan clients
- [HADOOP-6254] - s3n fails with SocketTimeoutException
- [HADOOP-6522] - TestUTF8 fails
- [HADOOP-6643] - Set executable bit for python cloud scripts in the distribution
- [HADOOP-2366] - Space in the value for dfs.data.dir can cause great problems
- [HADOOP-6453] - Hadoop wrapper script shouldn't ignore an existing JAVA_LIBRARY_PATH
- [HADOOP-6460] - Namenode runs out of memory due to memory leak in ipc Server
- [HADOOP-6505] - sed in build.xml fails
- [HADOOP-6503] - contrib projects should pull in the ivy-fetched libs from the root project
- [HADOOP-5647] - TestJobHistory fails if /tmp/_logs is not writable to. Testcase should not depend on /tmp
- [HADOOP-6462] - contrib/cloud failing, target "compile" does not exist
- [HADOOP-6184] - Provide a configuration dump in json format.
- [HADOOP-6269] - Missing synchronization for defaultResources in Configuration.addResource
- [HADOOP-5891] - If dfs.http.address is default, SecondaryNameNode can't find NameNode
- [HADOOP-4655] - FileSystem.CACHE should be ref-counted
- [HADOOP-5981] - HADOOP-2838 doesn't work as expected
- [HADOOP-5738] - Split waiting tasks field in JobTracker metrics to individual tasks
- [HADOOP-5442] - The job history display needs to be paged
- [HADOOP-5650] - Namenode log that indicates why it is not leaving safemode may be confusing
- [HADOOP-5805] - problem using top level s3 buckets as input/output directories
- [HADOOP-5656] - Counter for S3N Read Bytes does not work
- [HADOOP-3327] - Shuffling fetchers waited too long between map output fetch re-tries
Improvement
- [HADOOP-6879] - Provide SSH based (Jsch) remote execution API for system tests
- [HADOOP-7114] - FsShell should dump all exceptions at DEBUG level
- [HADOOP-6713] - The RPC server Listener thread is a scalability bottleneck
- [HADOOP-6859] - Introduce additional statistics to FileSystem
- [HADOOP-6864] - Provide a JNI-based implementation of ShellBasedUnixGroupsNetgroupMapping (implementation of GroupMappingServiceProvider)
- [HADOOP-6818] - Provide a JNI-based implementation of GroupMappingServiceProvider
- [HADOOP-6882] - Update the patch level of Jetty
- [HADOOP-3953] - Sticky bit for directories
- [HADOOP-7110] - Implement chmod with JNI
- [HADOOP-7072] - Remove java5 dependencies from build
- [HADOOP-6578] - Configuration should trim whitespace around a lot of value types
- [HADOOP-6813] - Add a new newInstance method in FileSystem that takes a "user" as argument
- [HADOOP-6985] - Suggest that HADOOP_OPTS be preserved in hadoop-env.sh.template
- [HADOOP-6995] - Allow wildcards to be used in ProxyUsers configurations
- [HADOOP-6988] - Add support for reading multiple hadoop delegation token files
- [HADOOP-6950] - Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template
- [HADOOP-6745] - adding some java doc to Server.RpcMetrics, UGI
- [HADOOP-6693] - Add metrics to track kerberos login activity
- [HADOOP-6674] - Performance Improvement in Secure RPC
- [HADOOP-6661] - User document for UserGroupInformation.doAs
- [HADOOP-6632] - Support for using different Kerberos keys for different instances of Hadoop services
- [HADOOP-6526] - Need mapping from long principal names to local OS user names
- [HADOOP-6633] - normalize property names for JT/NN kerberos principal names in configuration
- [HADOOP-6569] - FsShell#cat should avoid calling unnecessary getFileStatus before opening a file to read
- [HADOOP-6584] - Provide Kerberized SSL encryption for webservices
- [HADOOP-6589] - Better error messages for RPC clients when authentication fails
- [HADOOP-6599] - Split RPC metrics into summary and detailed metrics
- [HADOOP-6596] - Should add version to the serialization of DelegationToken
- [HADOOP-6579] - A utility for reading and writing tokens into a URL safe string.
- [HADOOP-6543] - Allow authentication-enabled RPC clients to connect to authentication-disabled RPC servers
- [HADOOP-6467] - Performance improvement for liststatus on directories in hadoop archives.
- [HADOOP-6583] - Capture metrics for authentication/authorization at the RPC layer
- [HADOOP-6559] - The RPC client should try to re-login when it detects that the TGT expired
- [HADOOP-2141] - speculative execution start up condition based on completion time
- [HADOOP-5879] - GzipCodec should read compression level etc from configuration
- [HADOOP-6161] - Add get/setEnum to Configuration
- [HADOOP-6204] - Implementing aspects development and fault injection framework for Hadoop
- [HADOOP-6299] - Use JAAS LoginContext for our login
- [HADOOP-5771] - Create unit test for LinuxTaskController
- [HADOOP-4656] - Add a user to groups mapping service
- [HADOOP-6203] - Improve error message when moving to trash fails due to quota issue
- [HADOOP-5675] - DistCp should not launch a job if it is not necessary
- [HADOOP-6343] - Stack trace of any runtime exceptions should be recorded in the server logs.
- [HADOOP-6304] - Use java.io.File.set{Readable|Writable|Executable} where possible in RawLocalFileSystem
- [HADOOP-6284] - Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full
- [HADOOP-5976] - create script to provide classpath for external tools
- [HADOOP-5784] - The length of the heartbeat cycle should be configurable.
- [HADOOP-5419] - Provide a way for users to find out what operations they can do on which M/R queues
- [HADOOP-5396] - Queue ACLs should be refreshed without requiring a restart of the job tracker
- [HADOOP-6714] - FsShell 'hadoop fs -text' does not support compression codecs
- [HADOOP-1849] - IPC server max queue size should be configurable
- [HADOOP-3659] - Patch to allow hadoop native to compile on Mac OS X
- [HADOOP-4885] - Try to restore failed replicas of Name Node storage (at checkpoint time)
- [HADOOP-6667] - RPC.waitForProxy should retry through NoRouteToHostException
- [HADOOP-5687] - Hadoop NameNode throws NPE if fs.default.name is the default value
- [HADOOP-6454] - Create setup.py for EC2 cloud scripts
- [HADOOP-6444] - Support additional security group option in hadoop-ec2 script
- [HADOOP-6426] - Create ant build for running EC2 unit tests
- [HADOOP-5625] - Add I/O duration time in client trace
- [HADOOP-5222] - Add offset in client trace
- [HADOOP-6400] - Log errors getting Unix UGI
- [HADOOP-5640] - Allow ServicePlugins to hook callbacks into key service events
- [HADOOP-6312] - Configuration sends too much data to log4j
- [HADOOP-6279] - Add JVM memory usage to JvmMetrics
- [HADOOP-6133] - ReflectionUtils performance regression
- [HADOOP-2838] - Add HADOOP_LIBRARY_PATH config setting so Hadoop will include external directories for jni
- [HADOOP-5733] - Add map/reduce slot capacity and lost map/reduce slot capacity to JobTracker metrics
- [HADOOP-4842] - Streaming combiner should allow command, not just JavaClass
- [HADOOP-6267] - build-contrib.xml unnecessarily enforces that contrib projects be located in contrib/ dir
- [HADOOP-4936] - Improvements to TestSafeMode
- [HADOOP-4675] - Current Ganglia metrics implementation is incompatible with Ganglia 3.1
- [HADOOP-5450] - Add support for application-specific typecodes to typed bytes
- [HADOOP-1722] - Make streaming to handle non-utf8 byte array
- [HADOOP-6166] - Improve PureJavaCrc32
- [HADOOP-6148] - Implement a pure Java CRC32 calculator
- [HADOOP-5968] - Sqoop should only print a warning about mysql import speed once
- [HADOOP-5967] - Sqoop should only use a single map task
- [HADOOP-5613] - change S3Exception to checked exception
- [HADOOP-5240] - 'ant javadoc' does not check whether outputs are up to date and always rebuilds
New Feature
- [HADOOP-5913] - Allow administrators to be able to start and stop queues
- [HADOOP-6889] - Make RPC have an option to time out
- [HADOOP-5170] - Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
- [HADOOP-6408] - Add a /conf servlet to dump running configuration
- [HADOOP-5752] - Provide examples of using offline image viewer (oiv) to analyze hadoop file systems
- [HADOOP-5467] - Create an offline fsimage image viewer
- [HADOOP-6832] - Provide a web server plugin that uses a static user for the web UI
- [HADOOP-6568] - Authorization for default servlets
- [HADOOP-6600] - mechanism for authorization check for inter-server protocols
- [HADOOP-6580] - UGI should contain authentication method.
- [HADOOP-6573] - Delegation Tokens should be persisted.
- [HADOOP-6586] - Log authentication and authorization failures and successes
- [HADOOP-6566] - Hadoop daemons should not start up if the ownership/permissions on the directories used at runtime are misconfigured
- [HADOOP-6332] - Large-scale Automated Test Framework
- [HADOOP-6547] - Move the Delegation Token feature to common since both HDFS and MapReduce needs it
- [HADOOP-6510] - doAs for proxy user
- [HADOOP-6419] - Change RPC layer to support SASL based mutual authentication
- [HADOOP-6538] - Set hadoop.security.authentication to "simple" by default
- [HADOOP-6337] - Update FilterInitializer class to be more visible and take a conf for further development
- [HADOOP-6517] - Ability to add/get tokens from UserGroupInformation
- [HADOOP-4268] - Permission checking in fsck
- [HADOOP-6415] - Adding a common token interface for both job token and delegation token
- [HADOOP-4359] - Access Token: Support for data access authorization checking on DataNodes
- [HADOOP-5643] - Ability to blacklist tasktracker
- [HADOOP-4490] - Map and Reduce tasks should run as the user who submitted the job
- [HADOOP-4930] - Implement setuid executable for Linux to assist in launching tasks as job owners
- [HADOOP-6433] - Add AsyncDiskService that is used in both hdfs and mapreduce
- [HADOOP-6382] - publish hadoop jars to apache mvn repo.
- [HADOOP-4012] - Providing splitting support for bzip2 compressed files
- [HADOOP-4368] - Superuser privileges required to do "df"
- [HADOOP-6466] - Add a ZooKeeper service to the cloud scripts
- [HADOOP-6392] - Run namenode and jobtracker on separate EC2 instances
- [HADOOP-6108] - Add support for EBS storage on EC2
- [HADOOP-5257] - Export namenode/datanode functionality through a pluggable RPC layer
- [HADOOP-5469] - Exposing Hadoop metrics via HTTP
- [HADOOP-5745] - Allow setting the default value of maxRunningJobs for all pools
- [HADOOP-5887] - Sqoop should create tables in Hive metastore after importing to HDFS
- [HADOOP-5528] - Binary partitioner
- [HADOOP-5175] - Option to prohibit jars unpacking
- [HADOOP-4829] - Allow FileSystem shutdown hook to be disabled
- [HADOOP-5518] - MRUnit unit test library
- [HADOOP-5844] - Use mysqldump when connecting to local mysql instance in Sqoop
- [HADOOP-5815] - Sqoop: A database import tool for Hadoop
Test
- [HADOOP-6637] - Benchmark overhead of RPC session establishment
- [HADOOP-5457] - Failing contrib tests should not stop the build
- [HADOOP-6176] - Adding a couple private methods to AccessTokenHandler for testing purposes
HDFS
Bug
- [HDFS-1597] - Batched edit log syncs can reset synctxid and throw assertions
- [HDFS-1085] - hftp read failing silently
- [HDFS-1364] - HFTP client should support relogin from keytab
- [HDFS-1153] - The navigation to /dfsnodelist.jsp with invalid input parameters produces NPE and HTTP 500 error
- [HDFS-1101] - TestDiskError.testLocalDirs() fails
- [HDFS-1589] - In secure mode, Datanodes should shutdown if they come up on non-privileged ports
- [HDFS-1560] - dfs.data.dir permissions should default to 700
- [HDFS-1542] - Deadlock in Configuration.writeXml when serialized form is larger than one DFS block
- [HDFS-1250] - Namenode accepts block report from dead datanodes
- [HDFS-1464] - Fix reporting of 2NN address when dfs.secondary.http.address is default (wildcard)
- [HDFS-1377] - Quota bug for partial blocks allows quotas to be violated
- [HDFS-1301] - TestHDFSProxy need to use server side conf for ProxyUser stuff.
- [HDFS-1404] - TestNodeCount logic incorrect in branch-0.20
- [HDFS-1267] - fuse-dfs does not compile
- [HDFS-1000] - libhdfs needs to be updated to use the new UGI
- [HDFS-446] - Offline Image Viewer Ls visitor incorrectly says 'output file' instead of 'input file'
- [HDFS-1164] - TestHdfsProxy is failing
- [HDFS-1313] - HdfsProxy changes from HDFS-481 missed in y20.1xx
- [HDFS-1007] - HFTP needs to be updated to use delegation tokens
- [HDFS-1157] - Modifications introduced by HDFS-1150 are breaking aspect's bindings
- [HDFS-1130] - Pass Administrator acl to HTTPServer for common servlet access.
- [HDFS-1146] - Javadoc for getDelegationTokenSecretManager in FSNamesystem
- [HDFS-1136] - FileChecksumServlets.RedirectServlet doesn't carry forward the delegation token
- [HDFS-1006] - getImage/putImage http requests should be https for the case of security enabled.
- [HDFS-1104] - Fsck triggers full GC on NameNode
- [HDFS-1010] - HDFSProxy: Retrieve group information from UnixUserGroupInformation instead of LdapEntry
- [HDFS-481] - Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
- [HDFS-955] - FSImage.saveFSImage can lose edits
- [HDFS-1080] - SecondaryNameNode image transfer should use the defined http address rather than local ip address
- [HDFS-1044] - Cannot submit mapreduce job from secure client to unsecure server
- [HDFS-1045] - In secure clusters, re-login is necessary for https clients before opening connections
- [HDFS-1039] - Service should be set in the token in JspHelper.getUGI
- [HDFS-1036] - in DelegationTokenFetch dfs.getURI returns no port
- [HDFS-1038] - In nn_browsedfscontent.jsp fetch delegation token only if security is enabled.
- [HDFS-1015] - Intermittent failure in TestSecurityTokenEditLog
- [HDFS-1020] - The canceller and renewer for delegation tokens should be long names.
- [HDFS-1019] - Incorrect default values for delegation tokens in hdfs-default.xml
- [HDFS-1017] - browsedfs jsp should call JspHelper.getUGI rather than using createRemoteUser()
- [HDFS-1014] - Error in reading delegation tokens from edit logs.
- [HDFS-965] - TestDelegationToken fails in trunk
- [HDFS-111] - UnderReplicationBlocks should use generic types
- [HDFS-938] - Replace calls to UGI.getUserName() with UGI.getShortUserName()
- [HDFS-195] - Need to handle access token expiration when re-establishing the pipeline for dfs write
- [HDFS-781] - Metrics PendingDeletionBlocks is not decremented
- [HDFS-625] - ListPathsServlet throws NullPointerException
- [HDFS-587] - Test programs support only default queue.
- [HDFS-1260] - 0.20: Block lost when multiple DNs trying to recover it to different genstamps
- [HDFS-1254] - 0.20: mark dfs.support.append to be true by default for the 0.20-append branch
- [HDFS-1240] - TestDFSShell failing in branch-20
- [HDFS-1207] - 0.20-append: stallReplicationWork should be volatile
- [HDFS-1197] - Blocks are considered "complete" prematurely after commitBlockSynchronization or DN restart
- [HDFS-1118] - DFSOutputStream socket leak when cannot connect to DataNode
- [HDFS-1186] - 0.20: DNs should interrupt writers at start of recovery
- [HDFS-915] - Hung DN stalls write pipeline for far longer than its timeout
- [HDFS-1218] - 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
- [HDFS-445] - pread() fails when cached block locations are no longer valid
- [HDFS-1204] - 0.20: Lease expiration should recover single files, not entire lease holder
- [HDFS-1202] - DataBlockScanner throws NPE when updated before initialized
- [HDFS-606] - ConcurrentModificationException in invalidateCorruptReplicas()
- [HDFS-1141] - completeFile does not check lease ownership
- [HDFS-1215] - TestNodeCount infinite loops on branch-20-append
- [HDFS-1122] - client block verification may result in blocks in DataBlockScanner prematurely
- [HDFS-1057] - Concurrent readers hit ChecksumExceptions if following a writer to very end of file
- [HDFS-561] - Fix write pipeline READ_TIMEOUT
- [HDFS-611] - Heartbeat times from Datanodes increase when there are plenty of blocks to delete
- [HDFS-894] - DatanodeID.ipcPort is not updated when existing node re-registers
- [HDFS-142] - In 0.20, move blocks being written into a blocksBeingWritten directory
- [HDFS-988] - saveNamespace can corrupt edits log
- [HDFS-101] - DFS write pipeline : DFSClient sometimes does not detect second datanode failure
- [HDFS-909] - Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log
- [HDFS-612] - FSDataset should not use org.mortbay.log.Log
- [HDFS-1024] - SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
- [HDFS-961] - dfs_readdir incorrectly parses paths
- [HDFS-908] - TestDistributedFileSystem fails with Wrong FS on weird hosts
- [HDFS-877] - Client-driven block verification not functioning
- [HDFS-464] - Memory leaks in libhdfs
- [HDFS-861] - fuse-dfs does not support O_RDWR
- [HDFS-860] - fuse-dfs truncate behavior causes issues with scp
- [HDFS-859] - fuse-dfs utime behavior causes issues with tar
- [HDFS-858] - Incorrect return codes for fuse-dfs
- [HDFS-857] - Incorrect type for fuse-dfs capacity can cause "df" to return negative values on 32-bit machines
- [HDFS-856] - Hardcoded replication level for new files in fuse-dfs
- [HDFS-423] - Unbreak FUSE build and fuse_dfs_wrapper.sh
- [HDFS-727] - bug setting block size hdfsOpenFile
- [HDFS-686] - NullPointerException is thrown while merging edit log and image
- [HDFS-127] - DFSClient block read failures cause open DFSInputStream to become unusable
Improvement
- [HDFS-1601] - Pipeline ACKs are sent as lots of tiny TCP packets
- [HDFS-1114] - Reducing NameNode memory usage by an alternate hash table
- [HDFS-1119] - Refactor BlocksMap with GettableSet
- [HDFS-599] - Improve Namenode robustness by prioritizing datanode heartbeats over client requests
- [HDFS-1298] - Add support in HDFS to update statistics that tracks number of file system operations in FileSystem
- [HDFS-1315] - Add fsck event to audit log and remove other audit log events corresponding to FSCK listStatus and open calls
- [HDFS-1383] - Better error messages on hftp
- [HDFS-1061] - Memory footprint optimization for INodeFile object.
- [HDFS-1307] - Add start time, end time and total time taken for FSCK to FSCK report
- [HDFS-1626] - Make BLOCK_INVALIDATE_LIMIT configurable
- [HDFS-1353] - Remove most of getBlockLocation optimization
- [HDFS-1378] - Edit log replay should track and report file offsets in case of errors
- [HDFS-1387] - Update HDFS permissions guide for security
- [HDFS-1178] - The NameNode servlets should not use RPC to connect to the NameNode
- [HDFS-1012] - documentLocation attribute in LdapEntry for HDFSProxy isn't specific to a cluster
- [HDFS-1011] - Improve Logging in HDFSProxy to include cluster name associated with the request
- [HDFS-1081] - Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems
- [HDFS-1033] - In secure clusters, NN and SNN should verify the remote principal during image and edits transfer
- [HDFS-1023] - Allow http server to start as regular principal if https principal not defined.
- [HDFS-994] - Provide methods for obtaining delegation token from Namenode for hftp and other uses
- [HDFS-998] - The servlets should quote server generated strings sent in the response
- [HDFS-786] - Implement getContentSummary(..) in HftpFileSystem
- [HDFS-946] - NameNode should not return full path name when listing a directory or getting the status of a file
- [HDFS-737] - Improvement in metasave output
- [HDFS-764] - Moving Access Token implementation from Common to HDFS
- [HDFS-758] - Improve reporting of progress of decommissioning
- [HDFS-1209] - Add conf dfs.client.block.recovery.retries to configure number of block recovery attempts
- [HDFS-1210] - DFSClient should log exception when block recovery fails
- [HDFS-1205] - FSDatasetAsyncDiskService should name its threads
- [HDFS-1248] - Misc cleanup/logging improvements for branch-20-append
- [HDFS-1203] - DataNode should sleep before reentering service loop after an exception
- [HDFS-895] - Allow hflush/sync to occur in parallel with new writes to the file
- [HDFS-1211] - 0.20 append: Block receiver should not log "rewind" packets at INFO level
- [HDFS-1056] - Multi-node RPC deadlocks during block recovery
- [HDFS-1055] - Improve thread naming for DataXceivers
- [HDFS-1054] - Remove unnecessary sleep after failure in nextBlockOutputStream
- [HDFS-826] - Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
- [HDFS-1161] - Make DN minimum valid volumes configurable
- [HDFS-1160] - Improve some FSDataset warnings and comments
- [HDFS-457] - better handling of volume failure in Data Node storage
- [HDFS-1013] - Miscellaneous improvements to HTML markup for web UIs
- [HDFS-455] - Make NN and DN handle comma-separated configuration strings in an intuitive way
- [HDFS-412] - Hadoop JMX usage makes Nagios monitoring impossible
- [HDFS-630] - In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
- [HDFS-496] - Use PureJavaCrc32 in HDFS
New Feature
- [HDFS-1318] - HDFS Namenode and Datanode WebUI information needs to be accessible programmatically for scripts
- [HDFS-1330] - Make RPCs to DataNodes timeout
- [HDFS-461] - Analyzing file size distribution.
- [HDFS-1150] - Verify datanodes' identities to clients in secure clusters
- [HDFS-1096] - allow dfsadmin/mradmin refresh of superuser proxy group mappings
- [HDFS-999] - Secondary namenode should login using kerberos if security is configured
- [HDFS-985] - HDFS should issue multiple RPCs for listing a large directory
- [HDFS-992] - Re-factor block access token implementation to conform to the generic Token interface in Common
- [HDFS-814] - Add an api to get the visible length of a DFSDataInputStream.
- [HDFS-204] - Revive number of files listed metrics
- [HDFS-1005] - Fsck security
- [HDFS-991] - Allow browsing the filesystem over http using delegation tokens
- [HDFS-899] - Delegation Token Implementation
- [HDFS-595] - FsPermission tests need to be updated for new octal configuration parameter from HADOOP-6234
- [HDFS-200] - In HDFS, sync() does not yet guarantee that data is available to new readers
- [HDFS-528] - Add ability for safemode to wait for a minimum number of live datanodes
Task
- [HDFS-1266] - Missing license headers in branch-20-append
Test
- [HDFS-907] - Add tests for getBlockLocations and totalLoad metrics.
- [HDFS-409] - Add more access token tests
- [HDFS-1252] - TestDFSConcurrentFileOperations broken in 0.20-append
- [HDFS-1247] - Improvements to HDFS-1204 test
- [HDFS-1246] - Manual tool to test sync against a real cluster
- [HDFS-1243] - 0.20 append: Replication tests in TestFileAppend4 should not expect immediate replication
- [HDFS-1242] - 0.20 append: Add test for appendFile() race solved in HDFS-142
- [HDFS-1244] - Misc improvements to TestFileAppend2
- [HDFS-696] - Java assertion failures triggered by tests
MapReduce
Bug
- [MAPREDUCE-2321] - TT should fail to start on secure cluster when SecureIO isn't available
- [MAPREDUCE-2289] - Permissions race can make getStagingDir fail on local filesystem
- [MAPREDUCE-2178] - Race condition in LinuxTaskController permissions handling
- [MAPREDUCE-2023] - TestDFSIO read test may not read specified bytes.
- [MAPREDUCE-2005] - TestDelegationTokenRenewal fails
- [MAPREDUCE-1961] - [gridmix3] ConcurrentModificationException when shutting down Gridmix
- [MAPREDUCE-2328] - memory-related configurations missing from mapred-default.xml
- [MAPREDUCE-1118] - Capacity Scheduler scheduling information is hard to read / should be tabular format
- [MAPREDUCE-2256] - FairScheduler fairshare preemption from multiple pools may preempt all tasks from one pool causing that pool to go below fairshare.
- [MAPREDUCE-2242] - LinuxTaskController doesn't properly escape environment variables
- [MAPREDUCE-2253] - Servlets should specify content type
- [MAPREDUCE-2082] - Race condition in writing the jobtoken password file when launching pipes jobs
- [MAPREDUCE-1085] - For tasks, "ulimit -v -1" is being run when user doesn't specify mapred.child.ulimit
- [MAPREDUCE-2277] - TestCapacitySchedulerWithJobTracker fails sometimes
- [MAPREDUCE-2238] - Undeletable build directories
- [MAPREDUCE-787] - -files, -archives should honor user given symlink path
- [MAPREDUCE-572] - If #link is missing from the URI format of -cacheArchive, streaming does not throw an error.
- [MAPREDUCE-1178] - MultipleInputs fails with ClassCastException
- [MAPREDUCE-2234] - If Localizer can't create task log directory, it should fail on the spot
- [MAPREDUCE-2219] - JT should not try to remove mapred.system.dir during startup
- [MAPREDUCE-1699] - JobHistory shouldn't be disabled for any reason
- [MAPREDUCE-1853] - MultipleOutputs does not cache TaskAttemptContext
- [MAPREDUCE-1621] - Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
- [MAPREDUCE-1784] - IFile should check for null compressor
- [MAPREDUCE-1288] - DistributedCache localizes only once per cache URI
- [MAPREDUCE-2096] - Secure local filesystem IO from symlink vulnerabilities
- [MAPREDUCE-1280] - Eclipse Plugin does not work with Eclipse Ganymede (3.4)
- [MAPREDUCE-1682] - Tasks should not be scheduled after tip is killed/failed.
- [MAPREDUCE-1914] - TrackerDistributedCacheManager never cleans its input directories
- [MAPREDUCE-1538] - TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
- [MAPREDUCE-1900] - MapReduce daemons should close FileSystems that are not needed anymore
- [MAPREDUCE-1807] - TestQueueManager can take long enough to time out
- [MAPREDUCE-1716] - Truncate logs of finished tasks to prevent node thrash due to excessive logging
- [MAPREDUCE-1442] - StackOverflowError when JobHistory parses a really long line
- [MAPREDUCE-1744] - DistributedCache creates its own FileSystem instance when adding a file/archive to the path
- [MAPREDUCE-1759] - Exception message for unauthorized user doing killJob, killTask, setJobPriority needs to be improved
- [MAPREDUCE-1754] - Replace mapred.permissions.supergroup with an ACL: mapreduce.cluster.administrators
- [MAPREDUCE-1707] - TaskRunner can get NPE in getting ugi from TaskTracker
- [MAPREDUCE-1687] - Stress submission policy does not always stress the cluster.
- [MAPREDUCE-1641] - Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
- [MAPREDUCE-1664] - Job Acls affect Queue Acls
- [MAPREDUCE-1397] - NullPointerException observed during task failures
- [MAPREDUCE-1607] - Task controller may not set permissions for a task cleanup attempt's log directory
- [MAPREDUCE-1533] - Reduce or remove usage of String.format() in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
- [MAPREDUCE-1701] - AccessControlException while renewing a delegation token is not correctly handled in the JobTracker
- [MAPREDUCE-1657] - After task logs directory is deleted, tasklog servlet displays wrong error message about job ACLs
- [MAPREDUCE-1692] - Remove TestStreamedMerge from the streaming tests
- [MAPREDUCE-1617] - TestBadRecords failed once in our test runs
- [MAPREDUCE-1718] - job conf key for the service name of DelegationToken for HFTP URL is constructed incorrectly in HFTPFileSystem
- [MAPREDUCE-587] - Stream test TestStreamingExitStatus fails with Out of Memory
- [MAPREDUCE-1985] - java.lang.ArrayIndexOutOfBoundsException in analysejobhistory.jsp of jobs with 0 maps
- [MAPREDUCE-1683] - Remove JNI calls from ClusterStatus cstr
- [MAPREDUCE-1635] - ResourceEstimator does not work after MAPREDUCE-842
- [MAPREDUCE-1612] - job conf file is not accessible from job history web page
- [MAPREDUCE-1611] - Refresh nodes and refresh queues don't work with service authorization enabled
- [MAPREDUCE-1609] - TaskTracker.localizeJob should not set permissions on job log directory recursively
- [MAPREDUCE-1610] - Forrest documentation should be updated to reflect the changes in MAPREDUCE-856
- [MAPREDUCE-1417] - Forrest documentation should be updated to reflect the changes in MAPREDUCE-744
- [MAPREDUCE-1604] - Job acls should be documented in forrest.
- [MAPREDUCE-1543] - Log messages of JobACLsManager should use security logging of HADOOP-6586
- [MAPREDUCE-1606] - TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task
- [MAPREDUCE-927] - Cleanup of task-logs should happen in TaskTracker instead of the Child
- [MAPREDUCE-1599] - MRBench reuses jobConf and the credentials therein.
- [MAPREDUCE-1522] - FileInputFormat may change the file system of an input path
- [MAPREDUCE-1100] - User's task-logs filling up local disks on the TaskTrackers
- [MAPREDUCE-1422] - Changing permissions of files/dirs under job-work-dir may be needed so that cleaning up of job-dir in all mapred-local-directories always succeeds
- [MAPREDUCE-890] - After HADOOP-4491, the user who started the mapred system is not able to run jobs.
- [MAPREDUCE-1566] - Need to add a mechanism to import tokens and secrets into a submitted job.
- [MAPREDUCE-1421] - LinuxTaskController tests failing on trunk after the commit of MAPREDUCE-1385
- [MAPREDUCE-1559] - The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem
- [MAPREDUCE-1550] - UGI.doAs should not be used for getting the history file of jobs
- [MAPREDUCE-899] - When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.
- [MAPREDUCE-1528] - TokenStorage should not be static
- [MAPREDUCE-1532] - Delegation token is obtained as the superuser
- [MAPREDUCE-1520] - TestMiniMRLocalFS fails on trunk
- [MAPREDUCE-1505] - Cluster class should create the rpc client only when needed
- [MAPREDUCE-1398] - TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
- [MAPREDUCE-1476] - committer.needsTaskCommit should not be called for a task cleanup attempt
- [MAPREDUCE-1316] - JobTracker holds stale references to retired jobs via unreported tasks
- [MAPREDUCE-1399] - The archive command shows a null error message
- [MAPREDUCE-1435] - symlinks in cwd of the task are not handled properly after MAPREDUCE-896
- [MAPREDUCE-1186] - While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
- [MAPREDUCE-896] - Users can set non-writable permissions on temporary files for TT and can abuse disk usage.
- [MAPREDUCE-1140] - Per cache-file refcount can become negative when tasks release distributed-cache files
- [MAPREDUCE-1284] - TestLocalizationWithLinuxTaskController fails
- [MAPREDUCE-1098] - Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.
- [MAPREDUCE-408] - TestKillSubProcesses fails with assertion failure sometimes
- [MAPREDUCE-1342] - Potential JT deadlock in faulty TT tracking
- [MAPREDUCE-1124] - TestGridmixSubmission fails sometimes
- [MAPREDUCE-1143] - runningMapTasks counter is not properly decremented in case of failed Tasks.
- [MAPREDUCE-676] - Existing diagnostic rules fail for MAP ONLY jobs
- [MAPREDUCE-1171] - Lots of fetch failures
- [MAPREDUCE-754] - NPE in expiry thread when a TT is lost
- [MAPREDUCE-1219] - JobTracker Metrics causes undue load on JobTracker
- [MAPREDUCE-1196] - MAPREDUCE-947 incompatibly changed FileOutputCommitter
- [MAPREDUCE-1160] - Two log statements at INFO level fill up jobtracker logs
- [MAPREDUCE-1158] - running_maps is not decremented when the tasks of a job are killed/failed
- [MAPREDUCE-1062] - MRReliability test does not work with retired jobs
- [MAPREDUCE-1090] - Modify log statement in Tasktracker log related to memory monitoring to include attempt id.
- [MAPREDUCE-1105] - CapacityScheduler: It should be possible to set queue hard-limit beyond its actual capacity
- [MAPREDUCE-1086] - hadoop commands in streaming tasks are trying to write to tasktracker's log
- [MAPREDUCE-1088] - JobHistory files should have narrower 0600 perms
- [MAPREDUCE-732] - node health check script should not log "UNHEALTHY" status for every heartbeat in INFO mode
- [MAPREDUCE-144] - TaskMemoryManager should log process-tree's status while killing tasks.
- [MAPREDUCE-1030] - Reduce tasks are getting starved in capacity scheduler
- [MAPREDUCE-1028] - Cleanup tasks are scheduled using high memory configuration, leaving tasks in unassigned state.
- [MAPREDUCE-964] - Inaccurate values in jobSummary logs
- [MAPREDUCE-945] - Test programs support only default queue.
- [MAPREDUCE-682] - Reserved tasktrackers should be removed when a node is globally blacklisted
- [MAPREDUCE-809] - Job summary logs show status of completed jobs as RUNNING
- [MAPREDUCE-771] - Setup and cleanup tasks remain in UNASSIGNED state for a long time on tasktrackers with long running high RAM tasks
- [MAPREDUCE-733] - When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker heartbeat exception occurs.
- [MAPREDUCE-734] - java.util.ConcurrentModificationException observed in unreserving slots for HiRam Jobs
- [MAPREDUCE-693] - Conf files not moved to "done" subdirectory after JT restart
- [MAPREDUCE-722] - More slots are getting reserved for HiRAM job tasks than required
- [MAPREDUCE-709] - node health check script does not display the correct message on timeout
- [MAPREDUCE-522] - Rewrite TestQueueCapacities to make it simpler and avoid timeout errors
- [MAPREDUCE-516] - Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs
- [MAPREDUCE-118] - Job.getJobID() will always return null
- [MAPREDUCE-1887] - MRAsyncDiskService does not properly absolutize volume root paths
- [MAPREDUCE-1372] - ConcurrentModificationException in JobInProgress
- [MAPREDUCE-1378] - Args in job details links on jobhistory.jsp are not URL encoded
- [MAPREDUCE-1213] - TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
- [MAPREDUCE-1443] - DBInputFormat can leak connections
- [MAPREDUCE-1728] - Oracle timezone strings do not match Java
- [MAPREDUCE-1375] - TestFileArgs fails intermittently
- [MAPREDUCE-1536] - DataDrivenDBInputFormat does not split date columns correctly.
- [MAPREDUCE-1480] - CombineFileRecordReader does not properly initialize child RecordReader
- [MAPREDUCE-1436] - Deadlock in preemption code in fair scheduler
- [MAPREDUCE-1469] - Sqoop should disable speculative execution in export
- [MAPREDUCE-1395] - Sqoop does not check return value of Job.waitForCompletion()
- [MAPREDUCE-1327] - Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
- [MAPREDUCE-1394] - Sqoop generates incorrect URIs in paths sent to Hive
- [MAPREDUCE-1313] - NPE in FieldFormatter if escape character is set and field is null
- [MAPREDUCE-1155] - Streaming tests swallow exceptions
- [MAPREDUCE-1258] - Fair scheduler event log not logging job info
- [MAPREDUCE-1212] - Mapreduce contrib project ivy dependencies are not included in binary target
- [MAPREDUCE-1310] - CREATE TABLE statements for Hive do not correctly specify delimiters
- [MAPREDUCE-1235] - java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
- [MAPREDUCE-1174] - Sqoop improperly handles table/column names which are reserved sql words
- [MAPREDUCE-1146] - Sqoop dependencies break Eclipse build on Linux
- [MAPREDUCE-1148] - SQL identifiers are a superset of Java identifiers
- [MAPREDUCE-1285] - DistCp cannot handle -delete if destination is local filesystem
- [MAPREDUCE-764] - TypedBytesInput's readRaw() does not preserve custom type codes
- [MAPREDUCE-1293] - AutoInputFormat doesn't work with non-default FileSystems
- [MAPREDUCE-1131] - Using profilers other than hprof can cause JobClient to report job failure
- [MAPREDUCE-1059] - distcp can generate uneven map task assignments
- [MAPREDUCE-1128] - MRUnit Allows Iteration Twice
- [MAPREDUCE-112] - Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API
- [MAPREDUCE-1089] - Fair Scheduler preemption triggers NPE when tasks are scheduled but not running
- [MAPREDUCE-968] - NPE in distcp encountered when placing _logs directory on S3FileSystem
- [MAPREDUCE-683] - TestJobTrackerRestart fails with Map task completion events ordering mismatch
- [MAPREDUCE-416] - Move the completed jobs' history files to a DONE subdirectory inside the configured history directory
- [MAPREDUCE-971] - distcp does not always remove distcp.tmp.dir
- [MAPREDUCE-923] - Sqoop's ORM uses URLDecoder on a file, which replaces plus signs in a jar file name with spaces
- [MAPREDUCE-840] - DBInputFormat leaves open transaction
- [MAPREDUCE-825] - JobClient completion poll interval of 5s causes slow tests in local mode
- [MAPREDUCE-792] - javac warnings in DBInputFormat
- [MAPREDUCE-716] - org.apache.hadoop.mapred.lib.db.DBInputFormat not working with Oracle
- [MAPREDUCE-799] - Some of MRUnit's self-tests were not being run
- [MAPREDUCE-685] - Sqoop will fail with OutOfMemory on large tables using mysql
- [MAPREDUCE-703] - Sqoop requires dependency on hsqldb in ivy
- [MAPREDUCE-415] - JobControl Job always has an unassigned name
- [MAPREDUCE-680] - Reuse of Writable objects is improperly handled by MRUnit
- [MAPREDUCE-714] - JobConf.findContainingJar unescapes unnecessarily on Linux
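As the title of MAPREDUCE-1148 says, not every SQL table or column name is a legal Java identifier: depending on the database and on quoting, a name may contain spaces or hyphens, start with a digit, or be a Java keyword such as class. That matters because Sqoop generates a Java class with a field per column. One common way to handle it is sketched below purely as an illustration; it is not Sqoop's actual code, and the class and method names are hypothetical. Disallowed characters become underscores, and names that start with a non-identifier character or collide with a Java keyword get an underscore prefix.

```java
import javax.lang.model.SourceVersion;

/** Hypothetical helper: maps a SQL table or column name to a legal Java identifier. */
final class IdentifierSanitizer {
    static String toJavaIdentifier(String sqlName) {
        if (sqlName == null || sqlName.isEmpty()) {
            throw new IllegalArgumentException("empty identifier");
        }
        StringBuilder sb = new StringBuilder(sqlName.length() + 1);
        for (int i = 0; i < sqlName.length(); i++) {
            char c = sqlName.charAt(i);
            // Keep characters Java permits inside an identifier; replace everything else.
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        // A leading digit, or a keyword/literal such as class, true, or null, gets an underscore prefix.
        if (!Character.isJavaIdentifierStart(sb.charAt(0)) || SourceVersion.isKeyword(sb)) {
            sb.insert(0, '_');
        }
        return sb.toString();
    }
}
```

A real generator would also need to keep the mapping unique (for example, "order date" and "order-date" both become order_date here) and remember the original SQL name for use in queries.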
Improvement
- [MAPREDUCE-2332] - Improve error messages when MR dirs on local FS have bad ownership
- [MAPREDUCE-1545] - Add 'first-task-launched' to job-summary
- [MAPREDUCE-339] - JobTracker should give preference to failed tasks over virgin tasks so as to terminate the job ASAP if it is eventually going to fail.
- [MAPREDUCE-1936] - [gridmix3] Make Gridmix3 more customizable.
- [MAPREDUCE-1778] - CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
- [MAPREDUCE-1868] - Add read timeout on userlog pull
- [MAPREDUCE-1850] - Include job submit host information (name and ip) in jobconf and jobdetails display
- [MAPREDUCE-1521] - Protection against incorrectly configured reduces
- [MAPREDUCE-1960] - Limit the size of jobconf.
- [MAPREDUCE-1872] - Re-think (user|queue) limits on (tasks|jobs) in the CapacityScheduler
- [MAPREDUCE-1382] - MRAsyncDiskService should tolerate missing local.dir
- [MAPREDUCE-655] - Change KeyValueLineRecordReader and KeyValueTextInputFormat to use new api.
- [MAPREDUCE-369] - Change org.apache.hadoop.mapred.lib.MultipleInputs to use new api.
- [MAPREDUCE-1734] - Un-deprecate the old MapReduce API in the 0.20 branch
- [MAPREDUCE-1906] - Lower the minimum heartbeat interval from TaskTracker to JobTracker
- [MAPREDUCE-2103] - task-controller shouldn't require o-r permissions
- [MAPREDUCE-2035] - Enable -Wall and fix warnings in task-controller build
- [MAPREDUCE-1711] - Gridmix should provide an option to submit jobs to the same queues as specified in the trace.
- [MAPREDUCE-1656] - JobStory should provide queue info.
- [MAPREDUCE-1317] - Reducing memory consumption of rumen objects
- [MAPREDUCE-1526] - Cache the job-related information while submitting the job; this would avoid many RPC calls to the JobTracker.
- [MAPREDUCE-1624] - Document the job credentials and associated details to do with delegation tokens (on the client side)
- [MAPREDUCE-1354] - Incremental enhancements to the JobTracker for better scalability
- [MAPREDUCE-1466] - FileInputFormat should save #input-files in JobConf
- [MAPREDUCE-1403] - Save file-sizes of each of the artifacts in DistributedCache in the JobConf
- [MAPREDUCE-1425] - archive throws OutOfMemoryError
- [MAPREDUCE-1440] - MapReduce should use the short form of the user names
- [MAPREDUCE-1376] - Support for varied user submission in Gridmix
- [MAPREDUCE-476] - extend DistributedCache to work locally (LocalJobRunner)
- [MAPREDUCE-711] - Move Distributed Cache from Common to Map/Reduce
- [MAPREDUCE-478] - separate jvm param for mapper and reducer
- [MAPREDUCE-1250] - Refactor job token to use a common token interface
- [MAPREDUCE-353] - Allow shuffle read and connection timeouts to be configurable
- [MAPREDUCE-1185] - URL to JT webconsole for running job and job history should be the same
- [MAPREDUCE-1231] - Distcp is very slow
- [MAPREDUCE-1048] - Show total slot usage in cluster summary on jobtracker webui
- [MAPREDUCE-1103] - Additional JobTracker metrics
- [MAPREDUCE-947] - OutputCommitter should have an abortJob method
- [MAPREDUCE-277] - Job history counters should be available on the UI.
- [MAPREDUCE-270] - TaskTracker could send an out-of-band heartbeat when the last running map/reduce completes
- [MAPREDUCE-817] - Add a cache for retired jobs with minimal job info and provide a way to access history file url
- [MAPREDUCE-1570] - Shuffle stage - Key and Group Comparators
- [MAPREDUCE-739] - Allow relative paths to be created inside archives.
- [MAPREDUCE-1302] - TrackerDistributedCacheManager can delete file asynchronously
- [MAPREDUCE-1489] - DataDrivenDBInputFormat should not query the database when generating only one split
- [MAPREDUCE-1785] - Add streaming config option for not emitting the key
- [MAPREDUCE-1460] - Oracle support in DataDrivenDBInputFormat
- [MAPREDUCE-1569] - Mock Contexts & Configurations
- [MAPREDUCE-1423] - Improve performance of CombineFileInputFormat when multiple pools are configured
- [MAPREDUCE-364] - Change org.apache.hadoop.examples.MultiFileWordCount to use new mapreduce api.
- [MAPREDUCE-1467] - Add a --verbose flag to Sqoop
- [MAPREDUCE-967] - TaskTracker does not need to fully unjar job jars
- [MAPREDUCE-1356] - Allow user-specified hive table name in sqoop
- [MAPREDUCE-1198] - Alternatively schedule different types of tasks in fair share scheduler
- [MAPREDUCE-1169] - Improvements to mysqldump use in Sqoop
- [MAPREDUCE-1224] - Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
- [MAPREDUCE-370] - Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
- [MAPREDUCE-999] - Improve Sqoop test speed and refactor tests
- [MAPREDUCE-814] - Move completed Job history files to HDFS
- [MAPREDUCE-906] - Updated Sqoop documentation
- [MAPREDUCE-907] - Sqoop should use more intelligent splits
- [MAPREDUCE-885] - More efficient SQL queries for DBInputFormat
- [MAPREDUCE-876] - Sqoop import of large tables can time out
- [MAPREDUCE-918] - Test hsqldb server should be memory-only.
- [MAPREDUCE-875] - Make DBRecordReader execute queries lazily
- [MAPREDUCE-750] - Extensible ConnManager factory API
- [MAPREDUCE-749] - Make Sqoop unit tests more Hudson-friendly
- [MAPREDUCE-910] - MRUnit should support counters
- [MAPREDUCE-797] - MRUnit MapReduceDriver should support combiners
- [MAPREDUCE-782] - Use PureJavaCrc32 in mapreduce spills
- [MAPREDUCE-789] - Oracle support for Sqoop
- [MAPREDUCE-816] - Rename "local" mysql import to "direct"
- [MAPREDUCE-710] - Sqoop should read and transmit passwords in a more secure manner
- [MAPREDUCE-713] - Sqoop has some superfluous imports
- [MAPREDUCE-674] - Sqoop should allow a "where" clause to avoid having to export entire tables
- [MAPREDUCE-675] - Sqoop should allow user-defined class and package names
- [MAPREDUCE-692] - Make Hudson run Sqoop unit tests
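MAPREDUCE-907 and the DataDrivenDBInputFormat entries in this list (for example MAPREDUCE-1489 and MAPREDUCE-1460) concern range-based splits: find the minimum and maximum of a split column, then divide that range into one WHERE-clause interval per map task. The sketch below shows only the arithmetic for an integer column, under stated assumptions: the class name and clause format are hypothetical, the range is assumed not to overflow a long, and the real splitters also handle other column types and NULL values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter for an integer column whose values lie in [min, max]. */
final class IntegerRangeSplitter {
    /**
     * Returns at most numSplits contiguous, non-overlapping predicates covering [min, max].
     * All intervals are half-open [lo, hi) except the last, which is closed so max is included.
     */
    static List<String> split(String column, long min, long max, int numSplits) {
        if (numSplits < 1 || min > max) {
            throw new IllegalArgumentException("need numSplits >= 1 and min <= max");
        }
        long span = max - min + 1;                      // distinct values in the range
        long step = Math.max(1, (span + numSplits - 1) / numSplits); // ceiling division
        List<String> clauses = new ArrayList<>();
        for (long lo = min; lo <= max; lo += step) {
            long hi = lo + step;
            if (hi > max) {
                // Final interval: close it at max so the largest value is not dropped.
                clauses.add(column + " >= " + lo + " AND " + column + " <= " + max);
                break;
            }
            clauses.add(column + " >= " + lo + " AND " + column + " < " + hi);
        }
        return clauses;
    }
}
```

For example, split("id", 1, 10, 4) produces four predicates with boundaries at 4, 7, and 10, each of which a map task could append to its SELECT. Evenly sized value ranges only give evenly sized splits when values are spread uniformly, which is one reason the choice of split column matters.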
New Feature
- [MAPREDUCE-2323] - Add metrics to the fair scheduler
- [MAPREDUCE-1774] - Large-scale Automated Framework
- [MAPREDUCE-1938] - Ability for having user's classes take precedence over the system classes for tasks' classpath
- [MAPREDUCE-1733] - Authentication between pipes processes and java counterparts.
- [MAPREDUCE-1680] - Add a metric to track the number of heartbeats processed
- [MAPREDUCE-1594] - Support for Sleep Jobs in gridmix
- [MAPREDUCE-1493] - Authorization for job-history pages
- [MAPREDUCE-1455] - Authorization for servlets
- [MAPREDUCE-1307] - Introduce the concept of Job Permissions
- [MAPREDUCE-1454] - The servlets should quote server generated strings sent in the response
- [MAPREDUCE-1430] - JobTracker should be able to renew delegation tokens for the jobs
- [MAPREDUCE-1433] - Create a Delegation token for MapReduce
- [MAPREDUCE-1457] - For secure job execution, a couple more UserGroupInformation.doAs calls need to be added
- [MAPREDUCE-1432] - Add the hooks in JobTracker and TaskTracker to load tokens from the token cache into the user's UGI
- [MAPREDUCE-1383] - Allow storage and caching of delegation token.
- [MAPREDUCE-744] - Support in DistributedCache to share cache files with other users after HADOOP-4493
- [MAPREDUCE-1338] - need security keys storage solution
- [MAPREDUCE-856] - Localized files from DistributedCache should have right access-control
- [MAPREDUCE-871] - Job/Task local files have incorrect group ownership set by LinuxTaskController binary
- [MAPREDUCE-842] - Per-job local data on the TaskTracker node should have right access-control
- [MAPREDUCE-181] - Secure job submission
- [MAPREDUCE-1026] - Shuffle should be secure
- [MAPREDUCE-467] - Collect information about number of tasks succeeded / total per time unit for a tasktracker.
- [MAPREDUCE-740] - Provide summary information per job once a job is finished.
- [MAPREDUCE-532] - Allow admins of the Capacity Scheduler to set a hard-limit on the capacity of a queue
- [MAPREDUCE-211] - Provide a node health check script and run it periodically to check the node health status
- [MAPREDUCE-679] - XML-based metrics as JSP servlet for JobTracker
- [MAPREDUCE-1341] - Sqoop should have an option to create hive tables and skip the table import step
- [MAPREDUCE-707] - Provide a jobconf property for explicitly assigning a job to a pool
- [MAPREDUCE-698] - Per-pool task limits for the fair scheduler
- [MAPREDUCE-1168] - Export data to databases via Sqoop
- [MAPREDUCE-706] - Support for FIFO pools in the fair scheduler
- [MAPREDUCE-1017] - Compression and output splitting for Sqoop
- [MAPREDUCE-768] - Configuration information should generate dump in a standard format.
- [MAPREDUCE-551] - Add preemption to the fair scheduler
- [MAPREDUCE-987] - Exposing MiniDFS and MiniMR clusters as a single process command-line
- [MAPREDUCE-461] - Enable ServicePlugins for the JobTracker
- [MAPREDUCE-938] - Postgresql support for Sqoop
- [MAPREDUCE-798] - MRUnit should be able to test a succession of MapReduce passes
- [MAPREDUCE-800] - MRUnit should support the new API
- [MAPREDUCE-705] - User-configurable quote and delimiter characters for Sqoop records and record reparsing
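MAPREDUCE-211 has the TaskTracker run an administrator-supplied script periodically and mark the node unhealthy based on the script's output. According to the Hadoop documentation for this feature, the node is reported unhealthy when the script prints a line beginning with the string ERROR. The sketch below shows only that output check; the class name is hypothetical, and it leaves out launching the script, the timeout handling (compare MAPREDUCE-709), and reporting the status to the JobTracker.

```java
import java.util.List;

/** Hypothetical check of a health script's standard output. */
final class HealthScriptOutput {
    /** Returns the first line that begins with "ERROR", or null if the node looks healthy. */
    static String findError(List<String> stdoutLines) {
        for (String line : stdoutLines) {
            if (line.startsWith("ERROR")) {
                return line;
            }
        }
        return null;
    }

    /** A node counts as healthy when no output line begins with "ERROR". */
    static boolean isHealthy(List<String> stdoutLines) {
        return findError(stdoutLines) == null;
    }
}
```

Returning the offending line, not just a boolean, lets the caller surface the script's own explanation in the node's health report.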
Test
- [MAPREDUCE-2331] - Add coverage of task graph servlet to fair scheduler system test
- [MAPREDUCE-2180] - Add coverage of fair scheduler servlet to system test
- [MAPREDUCE-2073] - TestTrackerDistributedCacheManager should be up-front about requirements on build environment
- [MAPREDUCE-2051] - Contribute a fair scheduler preemption system test
- [MAPREDUCE-2034] - TestSubmitJob triggers NPE instead of permissions error
- [MAPREDUCE-670] - Create target for 10 minute patch test build for mapreduce
- [MAPREDUCE-686] - Move TestSpeculativeExecution.Fake* into a separate class so that it can be used by other tests also
- [MAPREDUCE-1093] - Java assertion failures triggered by tests
- [MAPREDUCE-1092] - Enable asserts for tests by default
Build
There are 96 Cloudera build patches in this release. These patches can be found in the cloudera/patches directory in the release tarball.