CDH 3 Release Notes
The following lists all Apache Hadoop JIRA issues included in CDH 3
that are not included in the Apache Hadoop base version 0.20.2. The
hadoop-0.20.2+923.418.CHANGES.txt
file lists all changes included in CDH 3. The patch for each
change can be found in the cloudera/patches directory in the release tarball.
Changes Not In Apache Hadoop 0.20.2
CDH
Bug
- [DISTRO-53] - bin/hadoop leaks pids when running a non-detached datanode via jsvc
- [DISTRO-224] - CDH packages should depend on JRE, not JDK
- [DISTRO-185] - hadoop-config.sh needs to look for the RHEL6 sun JDK path
- [DISTRO-90] - FUSE can pick up the wrong libjvm.so
- [DISTRO-44] - Hadoop core POM missing jackson dependency
- [DISTRO-73] - FileSystems leaked when user logs on different FS URI than submit dir
- [DISTRO-38] - Autotools cannot find libssl on Fedora
- [DISTRO-27] - CombineFileInputFormat incompatible
Improvement
- [DISTRO-373] - UserGroupInformation "JAAS Configuration already set" log should be a debug
- [DISTRO-29] - Figure out whether HBase Thrift server or HUE gets to own port 9090
- [DISTRO-32] - Make the default example conf support Hue
- [DISTRO-1] - Support default-jre in order to be able to install the hadoop packages using the default jvm in Ubuntu
Common
Bug
- [HADOOP-8612] - Backport HADOOP-8599 to branch-1 (Non empty response when read beyond eof)
- [HADOOP-8552] - Conflict: Same security.log.file for multiple users.
- [HADOOP-6975] - integer overflow in S3InputStream for blocks > 2GB
- [HADOOP-7836] - TestSaslRPC#testDigestAuthMethodHostBasedToken fails with hostname localhost.localdomain
- [HADOOP-8587] - HarFileSystem access of harMetaCache isn't threadsafe
- [HADOOP-8586] - Fixup a bunch of SPNEGO misspellings
- [HADOOP-8355] - SPNEGO filter throws/logs exception when authentication fails
- [HADOOP-7988] - Upper case in hostname part of the principals doesn't work with kerberos.
- [HADOOP-8460] - Document proper setting of HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR
- [HADOOP-8445] - Token should not print the password in toString
- [HADOOP-8338] - Can't renew or cancel HDFS delegation tokens over secure RPC
- [HADOOP-6963] - Fix FileUtil.getDU. It should not include the size of the directory or follow symbolic links
- [HADOOP-7964] - Deadlock in class init.
- [HADOOP-7908] - Fix three javadoc warnings on branch-1
- [HADOOP-8269] - Fix some javadoc warnings on branch-1
- [HADOOP-7898] - Fix javadoc warnings in AuthenticationToken.java
- [HADOOP-7854] - UGI getCurrentUser is not synchronized
- [HADOOP-7721] - dfs.web.authentication.kerberos.principal expects the full hostname and does not replace _HOST with the hostname
- [HADOOP-7215] - RPC clients must connect over a network interface corresponding to the host name in the client's kerberos principal key
- [HADOOP-7649] - TestMapredGroupMappingServiceRefresh and TestRefreshUserMappings fail after HADOOP-7625
- [HADOOP-7661] - FileSystem.getCanonicalServiceName throws NPE for any file system uri that doesn't have an authority.
- [HADOOP-7602] - wordcount, sort etc on har files fails with NPE
- [HADOOP-7539] - merge hadoop archive goodness from trunk to .20
- [HADOOP-7644] - Fix the delegation token tests to use the new style renewers
- [HADOOP-7625] - TestDelegationToken is failing in 205
- [HADOOP-8151] - Error handling in snappy decompressor throws invalid exceptions
- [HADOOP-6546] - BloomMapFile can return false negatives
- [HADOOP-8512] - AuthenticatedURL should reset the Token when the server returns other than OK on authentication
- [HADOOP-8329] - Build fails with Java 7
- [HADOOP-8314] - HttpServer#hasAdminAccess should return false if authorization is enabled but user is not authenticated
- [HADOOP-8249] - invalid hadoop-auth cookies should trigger authentication if info is avail before returning HTTP 401
- [HADOOP-7982] - UserGroupInformation fails to login if thread's context classloader can't load HadoopLoginModule
- [HADOOP-8154] - DNS#getIPs shouldn't silently return the local host IP for bogus interface names
- [HADOOP-5226] - Add license headers to html and jsp files
- [HADOOP-7879] - DistributedFileSystem#createNonRecursive should also incrementWriteOps statistics.
- [HADOOP-7870] - fix SequenceFile#createWriter with boolean createParent arg to respect createParent.
- [HADOOP-7902] - skipping name rules setting (if already set) should be done on UGI initialization only
- [HADOOP-7887] - KerberosAuthenticatorHandler is not setting KerberosName name rules from configuration
- [HADOOP-7853] - multiple javax security configurations cause conflicts
- [HADOOP-7753] - Support fadvise and sync_data_range in NativeIO, add ReadaheadPool class
- [HADOOP-7629] - regression with MAPREDUCE-2289 - setPermission passed immutable FsPermission (rpc failure)
- [HADOOP-7674] - TestKerberosName fails in 20 branch.
- [HADOOP-7645] - HTTP auth tests requiring Kerberos infrastructure are not disabled on branch-0.20-security
- [HADOOP-7621] - alfredo config should be in a file not readable by users
- [HADOOP-7666] - branch-0.20-security doesn't include o.a.h.security.TestAuthenticationFilter
- [HADOOP-7665] - branch-0.20-security doesn't include SPNEGO settings in core-default.xml
- [HADOOP-7653] - tarball doesn't include .eclipse.templates
- [HADOOP-7507] - jvm metrics all use the same namespace
- [HADOOP-7053] - wrong FSNamesystem Audit logging setting in conf/log4j.properties
- [HADOOP-5464] - DFSClient does not treat write timeout of 0 properly
- [HADOOP-7428] - IPC connection is orphaned with null 'out' member
- [HADOOP-7440] - HttpServer.getParameterValues throws NPE for missing parameters
- [HADOOP-7402] - TestConfiguration doesn't clean up after itself
- [HADOOP-7040] - DiskChecker:mkdirsWithExistsCheck swallows FileNotFoundException.
- [HADOOP-7433] - Native libs are not in platform-specific dirs in the tarball
- [HADOOP-7290] - Unit test failure in TestUserGroupInformation.testGetServerSideGroups
- [HADOOP-7121] - Exceptions while serializing IPC call response are not handled well
- [HADOOP-7145] - Configuration.getLocalPath should trim whitespace from the provided directories
- [HADOOP-6947] - Kerberos relogin should set refreshKrb5Config to true
- [HADOOP-7229] - Absolute path to kinit in auto-renewal thread
- [HADOOP-7045] - TestDU fails on systems with local file systems with extended attributes
- [HADOOP-7104] - Remove unnecessary DNS reverse lookups from RPC layer
- [HADOOP-7183] - WritableComparator.get should not cache comparator objects
- [HADOOP-7156] - getpwuid_r is not thread-safe on RHEL6
- [HADOOP-7172] - SecureIO should not check owner on non-secure clusters that have no native support
- [HADOOP-7115] - Add a cache for getpwuid_r and getpwgid_r calls
- [HADOOP-7011] - KerberosName.main(...) throws NPE
- [HADOOP-7140] - IPC Reader threads do not stop when server stops
- [HADOOP-6899] - RawLocalFileSystem#setWorkingDir() does not work for relative names
- [HADOOP-6669] - zlib.compress.level ignored for DefaultCodec initialization
- [HADOOP-5489] - hadoop-env.sh still refers to java1.5
- [HADOOP-5050] - TestDFSShell fails intermittently
- [HADOOP-7122] - Timed out shell commands leak Timer threads
- [HADOOP-7118] - NPE in Configuration.writeXml
- [HADOOP-5836] - Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail
- [HADOOP-7093] - Servlets should default to text/plain
- [HADOOP-7101] - UserGroupInformation.getCurrentUser() fails when called from non-Hadoop JAAS context
- [HADOOP-7089] - Fix link resolution logic in hadoop-config.sh
- [HADOOP-5476] - calling new SequenceFile.Reader(...) leaves an InputStream open, if the given sequence file is broken
- [HADOOP-7070] - JAAS configuration should delegate unknown application names to pre-existing configuration
- [HADOOP-7082] - Configuration.writeXML should not hold lock while outputting
- [HADOOP-6663] - BlockDecompressorStream get EOF exception when decompressing the file compressed from empty file
- [HADOOP-6496] - HttpServer sends wrong content-type for CSS files (and others)
- [HADOOP-6907] - Rpc client doesn't use the per-connection conf to figure out server's Kerberos principal
- [HADOOP-6815] - refreshSuperUserGroupsConfiguration should use server side configuration for the refresh
- [HADOOP-6951] - Distinct minicluster services (e.g. NN and JT) overwrite each other's service policies
- [HADOOP-6946] - SecurityUtils' TGT fetching does not fall back to "login" user
- [HADOOP-6881] - The efficient comparators aren't always used except for BytesWritable and Text
- [HADOOP-6939] - Inconsistent lock ordering in AbstractDelegationTokenSecretManager
- [HADOOP-5861] - s3n files are not getting split by default
- [HADOOP-6925] - BZip2Codec incorrectly implements read()
- [HADOOP-6928] - Fix BooleanWritable comparator in 0.20
- [HADOOP-6833] - IPC leaks call parameters when exceptions thrown
- [HADOOP-6781] - security audit log shouldn't have exception in it.
- [HADOOP-6776] - UserGroupInformation.createProxyUser's javadoc is broken
- [HADOOP-6760] - WebServer shouldn't increase port number in case of negative port setting caused by Jetty's race
- [HADOOP-6756] - Clean up and add documentation for configuration keys in CommonConfigurationKeys.java
- [HADOOP-6715] - AccessControlList.toString() returns empty string when we set acl to "*"
- [HADOOP-6757] - NullPointerException for hadoop clients launched from streaming tasks
- [HADOOP-6631] - FileUtil.fullyDelete() should continue to delete other files despite failure at any level.
- [HADOOP-6701] - Incorrect exit codes for "dfs -chown", "dfs -chgrp"
- [HADOOP-6640] - FileSystem.get() does RPC retries within a static synchronized block
- [HADOOP-6710] - Symbolic umask for file creation is not consistent with posix
- [HADOOP-6670] - UserGroupInformation doesn't support use in hash tables
- [HADOOP-6716] - System won't start in non-secure mode when krb5.conf (edu.mit.kerberos on Mac) is not present
- [HADOOP-6706] - Relogin behavior for RPC clients could be improved
- [HADOOP-6718] - Client does not close connection when an exception happens during SASL negotiation
- [HADOOP-6545] - Cached FileSystem objects can lead to wrong token being used in setting up connections
- [HADOOP-6687] - user object in the subject in UGI should be reused in case of a relogin.
- [HADOOP-5958] - Use JDK 1.6 File APIs in DF.java wherever possible
- [HADOOP-6682] - NetUtils:normalizeHostName does not process hostnames starting with [a-f] correctly
- [HADOOP-6656] - Security framework needs to renew Kerberos tickets while the process is running
- [HADOOP-6653] - NullPointerException in setupSaslConnection when browsing directories
- [HADOOP-6652] - ShellBasedUnixGroupsMapping shouldn't have a cache
- [HADOOP-6649] - login object in UGI should be inside the subject
- [HADOOP-6648] - Credentials should ignore null tokens
- [HADOOP-6647] - balancer fails with "is not authorized for protocol interface NamenodeProtocol" in secure environment
- [HADOOP-6644] - util.Shell getGROUPS_FOR_USER_COMMAND method name - should use common naming convention
- [HADOOP-6634] - AccessControlList uses full-principal names to verify acls causing queue-acls to fail
- [HADOOP-6642] - Fix javac, javadoc, findbugs warnings
- [HADOOP-6638] - try to relogin in a case of failed RPC connection (expired tgt) only in case the subject is loginUser or proxyUgi.realUser.
- [HADOOP-6613] - RPC server should check for version mismatch first
- [HADOOP-5592] - Hadoop Streaming - GzipCodec
- [HADOOP-6627] - "Bad Connection to FS" message in FSShell should print message from the exception
- [HADOOP-6598] - Remove verbose logging from the Groups class
- [HADOOP-6620] - NPE if renewer is passed as null in getDelegationToken
- [HADOOP-6612] - Protocols RefreshUserToGroupMappingsProtocol and RefreshAuthorizationPolicyProtocol will fail with security enabled
- [HADOOP-6603] - Provide workaround for issue with Kerberos not resolving cross-realm principal
- [HADOOP-6609] - Deadlock in DFSClient#getBlockLocations even with the security disabled
- [HADOOP-5561] - Javadoc-dev ant target runs out of heap space
- [HADOOP-6549] - TestDoAsEffectiveUser should use ip address of the host for superuser ip check
- [HADOOP-6558] - archive does not work with distcp -update
- [HADOOP-6577] - IPC server response buffer reset threshold should be configurable
- [HADOOP-6551] - Delegation tokens when renewed or cancelled should throw an exception that explains what went wrong
- [HADOOP-6572] - RPC responses may be out-of-order with respect to SASL
- [HADOOP-6560] - HarFileSystem throws NPE for har://hdfs-/foo
- [HADOOP-6552] - KEYTAB_KERBEROS_OPTIONS in UserGroupInformation should have options for automatic renewal of keytab based tickets
- [HADOOP-6521] - FsPermission:SetUMask not updated to use new-style umask setting.
- [HADOOP-6544] - fix ivy settings to include JSON jackson.codehaus.org libs for .20
- [HADOOP-6520] - UGI should load tokens from the environment
- [HADOOP-6495] - Identifier should be serialized after the password is created in Token constructor
- [HADOOP-4041] - IsolationRunner does not work as documented
- [HADOOP-5737] - UGI checks in testcases are broken
- [HADOOP-6132] - RPC client opens an extra connection for VersionedProtocol
- [HADOOP-5824] - remove OP_READ_METADATA functionality from Datanode
- [HADOOP-6441] - Prevent remote CSS attacks in Hostname and UTF-7.
- [HADOOP-5582] - Hadoop Vaidya throws number format exception due to changes in the job history counters string format (escaped compact representation).
- [HADOOP-4933] - ConcurrentModificationException in JobHistory.java
- [HADOOP-6234] - Permission configuration files should use octal and symbolic
- [HADOOP-6344] - rm and rmr fail to correctly move the user's files to the trash prior to deleting when they are over quota.
- [HADOOP-6227] - Configuration does not lock parameters marked final if they have no value.
- [HADOOP-5780] - Fix slightly confusing log from "-metaSave" on NameNode
- [HADOOP-5420] - Support killing of process groups in LinuxTaskController binary
- [HADOOP-5488] - HADOOP-2721 doesn't clean up descendant processes of a jvm that exits cleanly after running a task successfully
- [HADOOP-5980] - LD_LIBRARY_PATH not passed to tasks spawned off by LinuxTaskController
- [HADOOP-5801] - JobTracker should refresh the hosts list upon recovery
- [HADOOP-5818] - Revert the renaming from checkSuperuserPrivilege to checkAccess by HADOOP-5643
- [HADOOP-5739] - After JobTracker restart Capacity Scheduler does not schedule pending tasks from already running tasks.
- [HADOOP-5203] - TT's version build is too restrictive
- [HADOOP-6762] - exception while doing RPC I/O closes channel
- [HADOOP-6722] - NetUtils.connect should check that it hasn't connected a socket to itself
- [HADOOP-6724] - IPC doesn't properly handle IOEs thrown by socket factory
- [HADOOP-6723] - unchecked exceptions thrown in IPC Connection orphan clients
- [HADOOP-6254] - s3n fails with SocketTimeoutException
- [HADOOP-6522] - TestUTF8 fails
- [HADOOP-6643] - Set executable bit for python cloud scripts in the distribution
- [HADOOP-2366] - Space in the value for dfs.data.dir can cause great problems
- [HADOOP-6453] - Hadoop wrapper script shouldn't ignore an existing JAVA_LIBRARY_PATH
- [HADOOP-6460] - Namenode runs out of memory due to memory leak in ipc Server
- [HADOOP-6505] - sed in build.xml fails
- [HADOOP-6503] - contrib projects should pull in the ivy-fetched libs from the root project
- [HADOOP-5647] - TestJobHistory fails if /tmp/_logs is not writable. Testcase should not depend on /tmp
- [HADOOP-6462] - contrib/cloud failing, target "compile" does not exist
- [HADOOP-6184] - Provide a configuration dump in json format.
- [HADOOP-6269] - Missing synchronization for defaultResources in Configuration.addResource
- [HADOOP-5891] - If dfs.http.address is default, SecondaryNameNode can't find NameNode
- [HADOOP-4655] - FileSystem.CACHE should be ref-counted
- [HADOOP-5981] - HADOOP-2838 doesn't work as expected
- [HADOOP-5738] - Split waiting tasks field in JobTracker metrics to individual tasks
- [HADOOP-5442] - The job history display needs to be paged
- [HADOOP-5650] - Namenode log that indicates why it is not leaving safemode may be confusing
- [HADOOP-5805] - problem using top level s3 buckets as input/output directories
- [HADOOP-5656] - Counter for S3N Read Bytes does not work
- [HADOOP-3327] - Shuffling fetchers waited too long between map output fetch re-tries
Improvement
- [HADOOP-7301] - FSDataInputStream should expose a getWrappedStream method
- [HADOOP-7509] - Improve message when Authentication is required
- [HADOOP-7510] - Tokens should use original hostname provided instead of ip
- [HADOOP-8430] - Backport new FileSystem methods introduced by HADOOP-8014 to branch-1
- [HADOOP-4885] - Try to restore failed replicas of Name Node storage (at checkpoint time)
- [HADOOP-8350] - Improve NetUtils.getInputStream to return a stream which has a tunable timeout
- [HADOOP-8230] - Enable sync by default and disable append
- [HADOOP-8209] - Add option to relax build-version check for branch-1
- [HADOOP-7987] - Support setting the run-as user in unsecure mode
- [HADOOP-6056] - Use java.net.preferIPv4Stack to force IPv4
- [HADOOP-8098] - KerberosAuthenticatorHandler should use _HOST replacement to resolve principal name
- [HADOOP-8027] - Visiting /jmx on the daemon web interfaces may print unnecessary error in logs
- [HADOOP-6886] - LocalFileSystem Needs createNonRecursive API
- [HADOOP-6840] - Support non-recursive create() in FileSystem & SequenceFile.Writer
- [HADOOP-7457] - Remove out-of-date Chinese language documentation
- [HADOOP-6614] - RunJar should provide more diags when it can't create a temp file
- [HADOOP-7761] - Improve performance of raw comparisons
- [HADOOP-7491] - hadoop command should respect HADOOP_OPTS when given a class name
- [HADOOP-7272] - Remove unnecessary security related info logs
- [HADOOP-7325] - hadoop command - do not accept class names starting with a hyphen
- [HADOOP-7247] - Fix documentation to reflect new jar names
- [HADOOP-4794] - separate branch for HadoopVersionAnnotation
- [HADOOP-7323] - Add capability to resolve compression codec based on codec name
- [HADOOP-7189] - Add ability to enable 'debug' property in JAAS configuration
- [HADOOP-7159] - RPC server should log the client hostname when read exception happened
- [HADOOP-7154] - Should set MALLOC_ARENA_MAX in hadoop-config.sh
- [HADOOP-7173] - Remove unused fstat() call from NativeIO
- [HADOOP-7167] - Allow using a file to exclude certain tests from build
- [HADOOP-6943] - The GroupMappingServiceProvider interface should be public
- [HADOOP-6879] - Provide SSH based (Jsch) remote execution API for system tests
- [HADOOP-7114] - FsShell should dump all exceptions at DEBUG level
- [HADOOP-6713] - The RPC server Listener thread is a scalability bottleneck
- [HADOOP-6859] - Introduce additional statistics to FileSystem
- [HADOOP-6864] - Provide a JNI-based implementation of ShellBasedUnixGroupsNetgroupMapping (implementation of GroupMappingServiceProvider)
- [HADOOP-6818] - Provide a JNI-based implementation of GroupMappingServiceProvider
- [HADOOP-6882] - Update the patch level of Jetty
- [HADOOP-3953] - Sticky bit for directories
- [HADOOP-7110] - Implement chmod with JNI
- [HADOOP-7072] - Remove java5 dependencies from build
- [HADOOP-6578] - Configuration should trim whitespace around a lot of value types
- [HADOOP-6813] - Add a new newInstance method in FileSystem that takes a "user" as argument
- [HADOOP-6985] - Suggest that HADOOP_OPTS be preserved in hadoop-env.sh.template
- [HADOOP-6995] - Allow wildcards to be used in ProxyUsers configurations
- [HADOOP-6988] - Add support for reading multiple hadoop delegation token files
- [HADOOP-6950] - Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template
- [HADOOP-6745] - adding some java doc to Server.RpcMetrics, UGI
- [HADOOP-6693] - Add metrics to track kerberos login activity
- [HADOOP-6674] - Performance Improvement in Secure RPC
- [HADOOP-6661] - User document for UserGroupInformation.doAs
- [HADOOP-6632] - Support for using different Kerberos keys for different instances of Hadoop services
- [HADOOP-6526] - Need mapping from long principal names to local OS user names
- [HADOOP-6633] - normalize property names for JT/NN kerberos principal names in configuration
- [HADOOP-6569] - FsShell#cat should avoid calling unnecessary getFileStatus before opening a file to read
- [HADOOP-6584] - Provide Kerberized SSL encryption for webservices
- [HADOOP-6589] - Better error messages for RPC clients when authentication fails
- [HADOOP-6599] - Split RPC metrics into summary and detailed metrics
- [HADOOP-6596] - Should add version to the serialization of DelegationToken
- [HADOOP-6579] - A utility for reading and writing tokens into a URL safe string.
- [HADOOP-6543] - Allow authentication-enabled RPC clients to connect to authentication-disabled RPC servers
- [HADOOP-6467] - Performance improvement for liststatus on directories in hadoop archives.
- [HADOOP-6583] - Capture metrics for authentication/authorization at the RPC layer
- [HADOOP-6559] - The RPC client should try to re-login when it detects that the TGT expired
- [HADOOP-2141] - speculative execution start up condition based on completion time
- [HADOOP-5879] - GzipCodec should read compression level etc from configuration
- [HADOOP-6161] - Add get/setEnum to Configuration
- [HADOOP-6204] - Implementing aspects development and fault injection framework for Hadoop
- [HADOOP-6299] - Use JAAS LoginContext for our login
- [HADOOP-5771] - Create unit test for LinuxTaskController
- [HADOOP-4656] - Add a user to groups mapping service
- [HADOOP-6203] - Improve error message when moving to trash fails due to quota issue
- [HADOOP-5675] - DistCp should not launch a job if it is not necessary
- [HADOOP-6343] - Stack trace of any runtime exceptions should be recorded in the server logs.
- [HADOOP-6304] - Use java.io.File.set{Readable|Writable|Executable} where possible in RawLocalFileSystem
- [HADOOP-6284] - Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full
- [HADOOP-5976] - create script to provide classpath for external tools
- [HADOOP-5784] - The length of the heartbeat cycle should be configurable.
- [HADOOP-5419] - Provide a way for users to find out what operations they can do on which M/R queues
- [HADOOP-5396] - Queue ACLs should be refreshed without requiring a restart of the job tracker
- [HADOOP-6714] - FsShell 'hadoop fs -text' does not support compression codecs
- [HADOOP-1849] - IPC server max queue size should be configurable
- [HADOOP-3659] - Patch to allow hadoop native to compile on Mac OS X
- [HADOOP-6667] - RPC.waitForProxy should retry through NoRouteToHostException
- [HADOOP-5687] - Hadoop NameNode throws NPE if fs.default.name is the default value
- [HADOOP-6454] - Create setup.py for EC2 cloud scripts
- [HADOOP-6444] - Support additional security group option in hadoop-ec2 script
- [HADOOP-6426] - Create ant build for running EC2 unit tests
- [HADOOP-5625] - Add I/O duration time in client trace
- [HADOOP-5222] - Add offset in client trace
- [HADOOP-6400] - Log errors getting Unix UGI
- [HADOOP-5640] - Allow ServicePlugins to hook callbacks into key service events
- [HADOOP-6312] - Configuration sends too much data to log4j
- [HADOOP-6279] - Add JVM memory usage to JvmMetrics
- [HADOOP-6133] - ReflectionUtils performance regression
- [HADOOP-2838] - Add HADOOP_LIBRARY_PATH config setting so Hadoop will include external directories for jni
- [HADOOP-5733] - Add map/reduce slot capacity and lost map/reduce slot capacity to JobTracker metrics
- [HADOOP-4842] - Streaming combiner should allow command, not just JavaClass
- [HADOOP-6267] - build-contrib.xml unnecessarily enforces that contrib projects be located in contrib/ dir
- [HADOOP-4936] - Improvements to TestSafeMode
- [HADOOP-4675] - Current Ganglia metrics implementation is incompatible with Ganglia 3.1
- [HADOOP-5450] - Add support for application-specific typecodes to typed bytes
- [HADOOP-1722] - Make streaming to handle non-utf8 byte array
- [HADOOP-6166] - Improve PureJavaCrc32
- [HADOOP-6148] - Implement a pure Java CRC32 calculator
- [HADOOP-5968] - Sqoop should only print a warning about mysql import speed once
- [HADOOP-5967] - Sqoop should only use a single map task
- [HADOOP-5613] - change S3Exception to checked exception
- [HADOOP-5240] - 'ant javadoc' does not check whether outputs are up to date and always rebuilds
New Feature
- [HADOOP-7594] - Support HTTP REST in HttpServer
- [HADOOP-8343] - Allow configuration of authorization for JmxJsonServlet and MetricsServlet
- [HADOOP-7030] - Add TableMapping topology implementation to read host to rack mapping from a file
- [HADOOP-7806] - Support binding to sub-interfaces
- [HADOOP-6255] - Create an rpm integration project
- [HADOOP-7119] - add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles
- [HADOOP-7144] - Expose JMX with something like JMXProxyServlet
- [HADOOP-7206] - Integrate Snappy compression
- [HADOOP-3741] - SecondaryNameNode has http server on dfs.secondary.http.address but without any contents
- [HADOOP-6996] - Allow CodecFactory to return a codec object given a codec's class name
- [HADOOP-5913] - Allow administrators to be able to start and stop queues
- [HADOOP-6889] - Make RPC to have an option to timeout
- [HADOOP-5170] - Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
- [HADOOP-6408] - Add a /conf servlet to dump running configuration
- [HADOOP-5752] - Provide examples of using offline image viewer (oiv) to analyze hadoop file systems
- [HADOOP-5467] - Create an offline fsimage image viewer
- [HADOOP-6832] - Provide a web server plugin that uses a static user for the web UI
- [HADOOP-6568] - Authorization for default servlets
- [HADOOP-6600] - mechanism for authorization check for inter-server protocols
- [HADOOP-6580] - UGI should contain authentication method.
- [HADOOP-6573] - Delegation Tokens should be persisted.
- [HADOOP-6586] - Log authentication and authorization failures and successes
- [HADOOP-6566] - Hadoop daemons should not start up if the ownership/permissions on the directories used at runtime are misconfigured
- [HADOOP-6332] - Large-scale Automated Test Framework
- [HADOOP-6547] - Move the Delegation Token feature to common since both HDFS and MapReduce needs it
- [HADOOP-6510] - doAs for proxy user
- [HADOOP-6419] - Change RPC layer to support SASL based mutual authentication
- [HADOOP-6538] - Set hadoop.security.authentication to "simple" by default
- [HADOOP-6337] - Update FilterInitializer class to be more visible and take a conf for further development
- [HADOOP-6517] - Ability to add/get tokens from UserGroupInformation
- [HADOOP-4268] - Permission checking in fsck
- [HADOOP-6415] - Adding a common token interface for both job token and delegation token
- [HADOOP-4359] - Access Token: Support for data access authorization checking on DataNodes
- [HADOOP-5643] - Ability to blacklist tasktracker
- [HADOOP-4490] - Map and Reduce tasks should run as the user who submitted the job
- [HADOOP-4930] - Implement setuid executable for Linux to assist in launching tasks as job owners
- [HADOOP-6433] - Add AsyncDiskService that is used in both hdfs and mapreduce
- [HADOOP-6382] - publish hadoop jars to apache mvn repo.
- [HADOOP-4012] - Providing splitting support for bzip2 compressed files
- [HADOOP-4368] - Superuser privileges required to do "df"
- [HADOOP-6466] - Add a ZooKeeper service to the cloud scripts
- [HADOOP-6392] - Run namenode and jobtracker on separate EC2 instances
- [HADOOP-6108] - Add support for EBS storage on EC2
- [HADOOP-5257] - Export namenode/datanode functionality through a pluggable RPC layer
- [HADOOP-5469] - Exposing Hadoop metrics via HTTP
- [HADOOP-5745] - Allow setting the default value of maxRunningJobs for all pools
- [HADOOP-5887] - Sqoop should create tables in Hive metastore after importing to HDFS
- [HADOOP-5528] - Binary partitioner
- [HADOOP-5175] - Option to prohibit jars unpacking
- [HADOOP-4829] - Allow FileSystem shutdown hook to be disabled
- [HADOOP-5518] - MRUnit unit test library
- [HADOOP-5844] - Use mysqldump when connecting to local mysql instance in Sqoop
- [HADOOP-5815] - Sqoop: A database import tool for Hadoop
Test
- [HADOOP-6637] - Benchmark overhead of RPC session establishment
- [HADOOP-5457] - Failing contrib tests should not stop the build
- [HADOOP-6176] - Adding a couple private methods to AccessTokenHandler for testing purposes
HDFS
Bug
- [HDFS-3808] - fuse_dfs: postpone libhdfs initialization until after fork
- [HDFS-3758] - TestFuseDFS test failing
- [HDFS-3754] - BlockSender doesn't shutdown ReadaheadPool threads
- [HDFS-3444] - hdfs groups command doesn't work with security enabled
- [HDFS-3732] - fuse_dfs: incorrect configuration value checked for connection expiry timer period
- [HDFS-3698] - TestHftpFileSystem is failing in branch-1 due to changed default secure port
- [HDFS-3334] - ByteRangeInputStream leaks streams
- [HDFS-3609] - libhdfs: don't force the URI to look like hdfs://hostname:port
- [HDFS-3539] - libhdfs code cleanups
- [HDFS-3633] - libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE
- [HDFS-711] - hdfsUtime does not handle atime = 0 or mtime = 0 correctly
- [HDFS-470] - libhdfs should handle 0-length reads from FSInputStream correctly
- [HDFS-3628] - The dfsadmin -setBalancerBandwidth command on branch-1 does not check for superuser privileges
- [HDFS-3652] - 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
- [HDFS-1728] - SecondaryNameNode.checkpointSize is in byte but not MB.
- [HDFS-96] - HDFS does not support blocks greater than 2GB
- [HDFS-2827] - Cannot save namespace after renaming a directory above a file with an open lease
- [HDFS-2368] - defaults created for web keytab and principal, these properties should not have defaults
- [HDFS-3581] - FSPermissionChecker#checkPermission sticky bit check missing range check
- [HDFS-3551] - WebHDFS CREATE does not use client location for redirection
- [HDFS-3522] - If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations
- [HDFS-3374] - hdfs' TestDelegationToken fails intermittently with a race condition
- [HDFS-3176] - JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
- [HDFS-3101] - cannot read empty file using webhdfs
- [HDFS-3006] - Webhdfs "SETOWNER" call returns incorrect content-type
- [HDFS-2869] - Error in Webhdfs documentation for mkdir
- [HDFS-2590] - Some links in WebHDFS forrest doc do not work
- [HDFS-2065] - Fix NPE in DFSClient.getFileChecksum
- [HDFS-2450] - Only complete hostname is supported to access data via hdfs://
- [HDFS-2411] - with webhdfs enabled in secure mode the auth to local mappings are not being respected.
- [HDFS-2589] - unnecessary hftp token fetch and renewal thread
- [HDFS-2392] - Dist with hftp is failing again
- [HDFS-1377] - Quota bug for partial blocks allows quotas to be violated
- [HDFS-2361] - hftp is broken
- [HDFS-2333] - HDFS-2284 introduced 2 findbugs warnings on trunk
- [HDFS-2331] - Hdfs compilation fails
- [HDFS-2328] - hftp throws NPE if security is not enabled on remote cluster
- [HDFS-1487] - FSDirectory.removeBlock() should update diskspace count of the block owner node
- [HDFS-3485] - DataTransferThrottler will over-throttle when currentTimeMillis jumps
- [HDFS-3330] - If GetImageServlet throws an Error or RTE, response has HTTP "OK" status
- [HDFS-3376] - DFSClient fails to make connection to DN if there are many unusable cached sockets
- [HDFS-3357] - DataXceiver reads from client socket with incorrect/no timeout
- [HDFS-3359] - DFSClient.close should close cached sockets
- [HDFS-3078] - 2NN https port setting is broken
- [HDFS-2132] - Potential resource leak in EditLogFileOutputStream.close
- [HDFS-1910] - when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice every time
- [HDFS-2877] - If locking of a storage dir fails, it will remove the other NN's lock file on exit
- [HDFS-3008] - Negative caching of local addrs doesn't work
- [HDFS-2751] - Datanode drops OS cache behind reads even for short reads
- [HDFS-2702] - A single failed name dir can cause the NN to exit
- [HDFS-2703] - removedStorageDirs is not updated everywhere we remove a storage dir
- [HDFS-2637] - The rpc timeout for block recovery is too low
- [HDFS-2541] - For a sufficiently large value of blocks, the DN Scanner may request a random number with a negative seed value.
- [HDFS-2379] - 0.20: Allow block reports to proceed without holding FSDataset lock
- [HDFS-2267] - DataXceiver thread name incorrect while waiting on op during keepalive
- [HDFS-1001] - DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
- [HDFS-94] - The "Heap Size" in HDFS web ui may not be accurate
- [HDFS-2422] - The NN should tolerate the same number of low-resource volumes as failed volumes
- [HDFS-1779] - After NameNode restart, clients cannot read partial files even after client invokes Sync.
- [HDFS-2186] - DN volume failures on startup are not counted
- [HDFS-2305] - Running multiple 2NNs can result in corrupt file system
- [HDFS-1480] - All replicas of a block can end up on the same rack when some datanodes are decommissioning.
- [HDFS-970] - FSImage writing should always fsync before close
- [HDFS-2259] - DN web-UI doesn't work with paths that contain html
- [HDFS-2235] - Encode servlet paths
- [HDFS-1317] - HDFSProxy needs additional changes to work after changes to streamFile servlet in HDFS-1109
- [HDFS-1109] - HFTP and URL Encoding
- [HDFS-1340] - A null delegation token is appended to the url if security is disabled when browsing filesystem.
- [HDFS-2023] - Backport of NPE for File.list and File.listFiles
- [HDFS-2190] - NN fails to start if it encounters an empty or malformed fstime file
- [HDFS-1758] - Web UI JSP pages thread safety issue
- [HDFS-2011] - Removal and restoration of storage directories on checkpointing failure doesn't work properly
- [HDFS-1836] - Thousands of CLOSE_WAIT sockets
- [HDFS-1897] - Documentation refers to removed option dfs.network.script
- [HDFS-1753] - Resource Leak in org.apache.hadoop.hdfs.server.namenode.StreamFile
- [HDFS-1592] - Datanode startup doesn't honor volumes.tolerated
- [HDFS-1692] - In secure mode, Datanode process doesn't exit when disks fail.
- [HDFS-2117] - DiskChecker#mkdirsWithExistsAndPermissionCheck may return true even when the dir is not created
- [HDFS-1850] - DN should transmit absolute failed volume count rather than increments to the NN
- [HDFS-1602] - NameNode storage failed replica restoration is broken
- [HDFS-1978] - All but first option in LIBHDFS_OPTS is ignored
- [HDFS-2082] - SecondaryNameNode web interface doesn't show the right info
- [HDFS-1594] - When the disk becomes full Namenode is getting shutdown and not able to recover
- [HDFS-1189] - Quota counts missed between clear quota and set quota
- [HDFS-1258] - Clearing namespace quota on "/" corrupts FS image
- [HDFS-1625] - TestDataNodeMXBean fails if disk space usage changes during test run
- [HDFS-1597] - Batched edit log syncs can reset synctxid and throw assertions
- [HDFS-1085] - hftp read failing silently
- [HDFS-1364] - HFTP client should support relogin from keytab
- [HDFS-1153] - dfsnodelist.jsp should handle invalid input parameters
- [HDFS-1101] - TestDiskError.testLocalDirs() fails
- [HDFS-1589] - In secure mode, Datanodes should shutdown if they come up on non-privileged ports
- [HDFS-1560] - dfs.data.dir permissions should default to 700
- [HDFS-1542] - Deadlock in Configuration.writeXml when serialized form is larger than one DFS block
- [HDFS-1250] - Namenode accepts block report from dead datanodes
- [HDFS-1464] - Fix reporting of 2NN address when dfs.secondary.http.address is default (wildcard)
- [HDFS-1301] - TestHDFSProxy need to use server side conf for ProxyUser stuff.
- [HDFS-1404] - TestNodeCount logic incorrect in branch-0.20
- [HDFS-1267] - fuse-dfs does not compile
- [HDFS-1000] - libhdfs needs to be updated to use the new UGI
- [HDFS-446] - Offline Image Viewer Ls visitor incorrectly says 'output file' instead of 'input file'
- [HDFS-1164] - TestHdfsProxy is failing
- [HDFS-1313] - HdfsProxy changes from HDFS-481 missed in y20.1xx
- [HDFS-1007] - HFTP needs to be updated to use delegation tokens
- [HDFS-1157] - Modifications introduced by HDFS-1150 are breaking aspect's bindings
- [HDFS-1130] - Pass Administrator acl to HTTPServer for common servlet access.
- [HDFS-1146] - Javadoc for getDelegationTokenSecretManager in FSNamesystem
- [HDFS-1136] - FileChecksumServlets.RedirectServlet doesn't carry forward the delegation token
- [HDFS-1006] - getImage/putImage http requests should be https for the case of security enabled.
- [HDFS-1104] - Fsck triggers full GC on NameNode
- [HDFS-1010] - HDFSProxy: Retrieve group information from UnixUserGroupInformation instead of LdapEntry
- [HDFS-481] - Bug Fixes + HdfsProxy to use proxy user to impersonate the real user
- [HDFS-955] - FSImage.saveFSImage can lose edits
- [HDFS-1080] - SecondaryNameNode image transfer should use the defined http address rather than local ip address
- [HDFS-1044] - Cannot submit mapreduce job from secure client to unsecure server
- [HDFS-1045] - In secure clusters, re-login is necessary for https clients before opening connections
- [HDFS-1039] - Service should be set in the token in JspHelper.getUGI
- [HDFS-1036] - in DelegationTokenFetch dfs.getURI returns no port
- [HDFS-1038] - In nn_browsedfscontent.jsp fetch delegation token only if security is enabled.
- [HDFS-1015] - Intermittent failure in TestSecurityTokenEditLog
- [HDFS-1020] - The canceller and renewer for delegation tokens should be long names.
- [HDFS-1019] - Incorrect default values for delegation tokens in hdfs-default.xml
- [HDFS-1017] - browsedfs jsp should call JspHelper.getUGI rather than using createRemoteUser()
- [HDFS-1014] - Error in reading delegation tokens from edit logs.
- [HDFS-965] - TestDelegationToken fails in trunk
- [HDFS-111] - UnderReplicationBlocks should use generic types
- [HDFS-938] - Replace calls to UGI.getUserName() with UGI.getShortUserName()
- [HDFS-195] - Need to handle access token expiration when re-establishing the pipeline for dfs write
- [HDFS-781] - Metrics PendingDeletionBlocks is not decremented
- [HDFS-625] - ListPathsServlet throws NullPointerException
- [HDFS-587] - Test programs support only default queue.
- [HDFS-1260] - 0.20: Block lost when multiple DNs trying to recover it to different genstamps
- [HDFS-1254] - 0.20: mark dfs.support.append to be true by default for the 0.20-append branch
- [HDFS-1240] - TestDFSShell failing in branch-20
- [HDFS-1207] - 0.20-append: stallReplicationWork should be volatile
- [HDFS-1197] - Blocks are considered "complete" prematurely after commitBlockSynchronization or DN restart
- [HDFS-1118] - DFSOutputStream socket leak when cannot connect to DataNode
- [HDFS-1186] - 0.20: DNs should interrupt writers at start of recovery
- [HDFS-915] - Hung DN stalls write pipeline for far longer than its timeout
- [HDFS-1218] - 20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
- [HDFS-445] - pread() fails when cached block locations are no longer valid
- [HDFS-1204] - 0.20: Lease expiration should recover single files, not entire lease holder
- [HDFS-1202] - DataBlockScanner throws NPE when updated before initialized
- [HDFS-606] - ConcurrentModificationException in invalidateCorruptReplicas()
- [HDFS-1141] - completeFile does not check lease ownership
- [HDFS-1215] - TestNodeCount infinite loops on branch-20-append
- [HDFS-1122] - client block verification may result in blocks in DataBlockScanner prematurely
- [HDFS-1057] - Concurrent readers hit ChecksumExceptions if following a writer to very end of file
- [HDFS-561] - Fix write pipeline READ_TIMEOUT
- [HDFS-611] - Heartbeat times from Datanodes increase when there are plenty of blocks to delete
- [HDFS-894] - DatanodeID.ipcPort is not updated when existing node re-registers
- [HDFS-142] - In 0.20, move blocks being written into a blocksBeingWritten directory
- [HDFS-988] - saveNamespace race can corrupt the edits log
- [HDFS-101] - DFS write pipeline : DFSClient sometimes does not detect second datanode failure
- [HDFS-909] - Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log
- [HDFS-612] - FSDataset should not use org.mortbay.log.Log
- [HDFS-1024] - SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException
- [HDFS-961] - dfs_readdir incorrectly parses paths
- [HDFS-908] - TestDistributedFileSystem fails with Wrong FS on weird hosts
- [HDFS-877] - Client-driven block verification not functioning
- [HDFS-464] - Memory leaks in libhdfs
- [HDFS-861] - fuse-dfs does not support O_RDWR
- [HDFS-859] - fuse-dfs utime behavior causes issues with tar
- [HDFS-858] - Incorrect return codes for fuse-dfs
- [HDFS-857] - Incorrect type for fuse-dfs capacity can cause "df" to return negative values on 32-bit machines
- [HDFS-856] - Hardcoded replication level for new files in fuse-dfs
- [HDFS-423] - Unbreak FUSE build and fuse_dfs_wrapper.sh
- [HDFS-727] - bug setting block size hdfsOpenFile
- [HDFS-686] - NullPointerException is thrown while merging edit log and image
- [HDFS-127] - DFSClient block read failures cause open DFSInputStream to become unusable
Improvement
- [HDFS-3697] - Enable fadvise readahead by default
- [HDFS-3680] - Allows customized audit logging in HDFS FSNamesystem
- [HDFS-799] - libhdfs must call DetachCurrentThread when a thread is destroyed
- [HDFS-2617] - Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
- [HDFS-2391] - Newly set BalancerBandwidth value is not displayed anywhere
- [HDFS-1997] - Image transfer process misreports client side exceptions
- [HDFS-3596] - Improve FSEditLog pre-allocation in branch-1
- [HDFS-2868] - Add number of active transfer threads to the DataNode status
- [HDFS-1957] - Documentation for HFTP
- [HDFS-3568] - fuse_dfs: add support for security
- [HDFS-3604] - Add dfs.webhdfs.enabled to hdfs-default.xml
- [HDFS-2604] - Add a log message to show if WebHDFS is enabled
- [HDFS-2116] - Cleanup TestStreamFile and TestByteRangeInputStream
- [HDFS-3516] - Check content-type in WebHdfsFileSystem
- [HDFS-3475] - Make the replication monitor multipliers configurable
- [HDFS-3479] - backport HDFS-3335 (check for edit log corruption at the end of the log) to branch-1
- [HDFS-3520] - Add transfer rate logging to TransferFsImage
- [HDFS-1457] - Limit transmission rate when transferring image between primary and secondary NNs
- [HDFS-3044] - fsck move should be non-destructive by default
- [HDFS-1773] - Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists
- [HDFS-2872] - Add sanity checks during edits loading that generation stamps are non-decreasing
- [HDFS-2502] - hdfs-default.xml should include dfs.name.dir.restore
- [HDFS-930] - o.a.h.hdfs.server.datanode.DataXceiver - run() - Version mismatch exception - more context to help debugging
- [HDFS-2788] - HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code
- [HDFS-2701] - Cleanup FS* processIOError methods
- [HDFS-2654] - Make BlockReaderLocal not extend RemoteBlockReader2
- [HDFS-2653] - DFSClient should cache whether addrs are non-local when short-circuiting is enabled
- [HDFS-2246] - Shortcut a local client reads to a Datanodes files directly
- [HDFS-617] - Support for non-recursive create() in HDFS
- [HDFS-2638] - Improve a block recovery log
- [HDFS-854] - Datanode should scan devices in parallel to generate block report
- [HDFS-941] - Datanode xceiver protocol should allow reuse of a connection
- [HDFS-2465] - Add HDFS support for fadvise readahead and drop-behind
- [HDFS-1959] - Better error message for missing namenode directory
- [HDFS-1628] - AccessControlException should display the full path
- [HDFS-556] - Provide info on failed volumes in the web ui
- [HDFS-420] - Fuse-dfs should cache fs handles
- [HDFS-1846] - Don't fill preallocated portion of edits log with 0x00
- [HDFS-1759] - Improve error message when starting secure DN without jsvc
- [HDFS-1601] - Pipeline ACKs are sent as lots of tiny TCP packets
- [HDFS-1114] - Reducing NameNode memory usage by an alternate hash table
- [HDFS-1119] - Refactor BlocksMap with GettableSet
- [HDFS-599] - Improve Namenode robustness by prioritizing datanode heartbeats over client requests
- [HDFS-1298] - Add support in HDFS to update statistics that tracks number of file system operations in FileSystem
- [HDFS-1315] - Add fsck event to audit log and remove other audit log events corresponding to FSCK listStatus and open calls
- [HDFS-1383] - Better error messages on hftp
- [HDFS-1061] - Memory footprint optimization for INodeFile object.
- [HDFS-1307] - Add start time, end time and total time taken for FSCK to FSCK report
- [HDFS-1626] - Make BLOCK_INVALIDATE_LIMIT configurable
- [HDFS-1353] - Remove most of getBlockLocation optimization
- [HDFS-1378] - Edit log replay should track and report file offsets in case of errors
- [HDFS-1387] - Update HDFS permissions guide for security
- [HDFS-1178] - The NameNode servlets should not use RPC to connect to the NameNode
- [HDFS-1012] - documentLocation attribute in LdapEntry for HDFSProxy isn't specific to a cluster
- [HDFS-1011] - Improve Logging in HDFSProxy to include cluster name associated with the request
- [HDFS-1081] - Performance regression in DistributedFileSystem::getFileBlockLocations in secure systems
- [HDFS-1033] - In secure clusters, NN and SNN should verify the remote principal during image and edits transfer
- [HDFS-1023] - Allow http server to start as regular principal if https principal not defined.
- [HDFS-994] - Provide methods for obtaining delegation token from Namenode for hftp and other uses
- [HDFS-998] - The servlets should quote server generated strings sent in the response
- [HDFS-786] - Implement getContentSummary(..) in HftpFileSystem
- [HDFS-946] - NameNode should not return full path name when listing a directory or getting the status of a file
- [HDFS-737] - Improvement in metasave output
- [HDFS-764] - Moving Access Token implementation from Common to HDFS
- [HDFS-758] - Improve reporting of progress of decommissioning
- [HDFS-1209] - Add conf dfs.client.block.recovery.retries to configure number of block recovery attempts
- [HDFS-1210] - DFSClient should log exception when block recovery fails
- [HDFS-1205] - FSDatasetAsyncDiskService should name its threads
- [HDFS-1248] - Misc cleanup/logging improvements for branch-20-append
- [HDFS-1203] - DataNode should sleep before reentering service loop after an exception
- [HDFS-895] - Allow hflush/sync to occur in parallel with new writes to the file
- [HDFS-1211] - 0.20 append: Block receiver should not log "rewind" packets at INFO level
- [HDFS-1056] - Multi-node RPC deadlocks during block recovery
- [HDFS-1055] - Improve thread naming for DataXceivers
- [HDFS-1054] - Remove unnecessary sleep after failure in nextBlockOutputStream
- [HDFS-826] - Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
- [HDFS-1161] - Make DN minimum valid volumes configurable
- [HDFS-1160] - Improve some FSDataset warnings and comments
- [HDFS-457] - better handling of volume failure in Data Node storage
- [HDFS-1013] - Miscellaneous improvements to HTML markup for web UIs
- [HDFS-455] - Make NN and DN handle comma-separated configuration strings in an intuitive way
- [HDFS-412] - Hadoop JMX usage makes Nagios monitoring impossible
- [HDFS-630] - In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
- [HDFS-496] - Use PureJavaCrc32 in HDFS
New Feature
- [HDFS-2202] - Changes to balancer bandwidth should not require datanode restart.
- [HDFS-2539] - Support doAs and GETHOMEDIRECTORY in webhdfs
- [HDFS-2540] - Change WebHdfsFileSystem to two-step create/append
- [HDFS-2528] - webhdfs rest call to a secure dn fails when a token is sent
- [HDFS-2527] - Remove the use of Range header from webhdfs
- [HDFS-2432] - webhdfs setreplication api should return a 403 when called on a directory
- [HDFS-2494] - [webhdfs] When Getting the file using OP=OPEN with DN http address, ESTABLISHED sockets are growing.
- [HDFS-2501] - add version prefix and root methods to webhdfs
- [HDFS-2416] - distcp with a webhdfs uri on a secure cluster fails
- [HDFS-2427] - webhdfs mkdirs api call creates path with 777 permission, we should default it to 755
- [HDFS-2453] - tail using a webhdfs uri throws an error
- [HDFS-2439] - webhdfs open an invalid path leads to a 500 which states a npe, we should return a 404 with appropriate error message
- [HDFS-2424] - webhdfs liststatus json does not convert to a valid xml document
- [HDFS-2428] - webhdfs api parameter validation should be better
- [HDFS-2441] - webhdfs returns two content-type headers
- [HDFS-2403] - The renewer in NamenodeWebHdfsMethods.generateDelegationToken(..) is not used
- [HDFS-2404] - webhdfs liststatus json response is not correct
- [HDFS-2395] - webhdfs api's should return a root element in the json response
- [HDFS-2385] - Support delegation token renewal in webhdfs
- [HDFS-2348] - Support getContentSummary and getFileChecksum in webhdfs
- [HDFS-2366] - webhdfs throws a npe when ugi is null from getDelegationToken
- [HDFS-2356] - webhdfs: support case insensitive query parameter names
- [HDFS-2340] - Support getFileBlockLocations and getDelegationToken in webhdfs
- [HDFS-2318] - Provide authentication to webhdfs using SPNEGO
- [HDFS-2338] - Configuration option to enable/disable webhdfs.
- [HDFS-2317] - Read access to HDFS using HTTP REST
- [HDFS-2284] - Write Http access to HDFS
- [HDFS-3148] - The client should be able to use multiple local interfaces for data transfer
- [HDFS-3150] - Add option for clients to contact DNs via hostname
- [HDFS-2978] - The NameNode should expose name dir statuses via JMX
- [HDFS-235] - Add support for byte-ranges to hftp
- [HDFS-811] - Add metrics, failure reporting and additional tests for HDFS-457
- [HDFS-2055] - Add hflush support to libhdfs
- [HDFS-1520] - HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
- [HDFS-1318] - HDFS Namenode and Datanode WebUI information needs to be accessible programmatically for scripts
- [HDFS-1330] - Make RPCs to DataNodes timeout
- [HDFS-461] - Analyzing file size distribution.
- [HDFS-1150] - Verify datanodes' identities to clients in secure clusters
- [HDFS-1096] - allow dfsadmin/mradmin refresh of superuser proxy group mappings
- [HDFS-999] - Secondary namenode should login using kerberos if security is configured
- [HDFS-985] - HDFS should issue multiple RPCs for listing a large directory
- [HDFS-992] - Re-factor block access token implementation to conform to the generic Token interface in Common
- [HDFS-814] - Add an api to get the visible length of a DFSDataInputStream.
- [HDFS-204] - Revive number of files listed metrics
- [HDFS-1005] - Fsck security
- [HDFS-991] - Allow browsing the filesystem over http using delegation tokens
- [HDFS-899] - Delegation Token Implementation
- [HDFS-595] - FsPermission tests need to be updated for new octal configuration parameter from HADOOP-6234
- [HDFS-200] - In HDFS, sync() does not yet guarantee data is available to new readers
- [HDFS-528] - Add ability for safemode to wait for a minimum number of live datanodes
Task
- [HDFS-2552] - Add WebHdfs Forrest doc
- [HDFS-1266] - Missing license headers in branch-20-append
Test
- [HDFS-3606] - libhdfs: create self-contained unit test
- [HDFS-3129] - NetworkTopology: add test that getLeaf should check for invalid topologies
- [HDFS-2332] - Add test for HADOOP-7629: using an immutable FsPermission as an IPC parameter
- [HDFS-2100] - Improve TestStorageRestore
- [HDFS-1862] - Improve test reliability of HDFS-1594
- [HDFS-1762] - Allow TestHDFSCLI to be run against a cluster
- [HDFS-780] - Revive TestFuseDFS
- [HDFS-907] - Add tests for getBlockLocations and totalLoad metrics.
- [HDFS-409] - Add more access token tests
- [HDFS-1252] - TestDFSConcurrentFileOperations broken in 0.20-append
- [HDFS-1247] - Improvements to HDFS-1204 test
- [HDFS-1246] - Manual tool to test sync against a real cluster
- [HDFS-1243] - 0.20 append: Replication tests in TestFileAppend4 should not expect immediate replication
- [HDFS-1242] - 0.20 append: Add test for appendFile() race solved in HDFS-142
- [HDFS-1244] - Misc improvements to TestFileAppend2
- [HDFS-696] - Java assertion failures triggered by tests
Wish
- [HDFS-860] - fuse-dfs truncate behavior causes issues with scp
MapReduce
Bug
- [MAPREDUCE-4036] - Streaming TestUlimit fails on CentOS 6
- [MAPREDUCE-4463] - JobTracker recovery fails with HDFS permission issue
- [MAPREDUCE-4399] - Fix performance regression in shuffle
- [MAPREDUCE-4154] - streaming MR job succeeds even if the streaming command fails
- [MAPREDUCE-323] - Improve the way job history files are managed
- [MAPREDUCE-2779] - JobSplitWriter.java can't handle large job.split file
- [MAPREDUCE-4385] - FairScheduler.maxTasksToAssign() should check for fairscheduler.assignmultiple.maps < TaskTracker.availableSlots
- [MAPREDUCE-3993] - Graceful handling of codec errors during decompression
- [MAPREDUCE-3475] - JT can't renew its own tokens
- [MAPREDUCE-2764] - Fix renewal of dfs delegation tokens
- [MAPREDUCE-2452] - Delegation token cancellation shouldn't hold global JobTracker lock
- [MAPREDUCE-2420] - JobTracker should be able to renew delegation token over HTTP
- [MAPREDUCE-2780] - Standardize the value of token service
- [MAPREDUCE-4359] - Potential deadlock in Counters
- [MAPREDUCE-4195] - With invalid queueName request param, jobqueue_details.jsp shows NPE
- [MAPREDUCE-3674] - If invoked with no queueName request param, jobqueue_details.jsp injects a null queue name into schedulers.
- [MAPREDUCE-4012] - Hadoop Job setup error leaves no useful info to users (when LinuxTaskController is used)
- [MAPREDUCE-1740] - NPE in getMatchingLevelForNodes when node locations are variable depth
- [MAPREDUCE-1109] - ConcurrentModificationException in jobtracker.jsp
- [MAPREDUCE-3997] - jobhistory.jsp cuts off the job name at the first underscore of the job name
- [MAPREDUCE-3789] - CapacityTaskScheduler may perform unnecessary reservations in heterogenous tracker environments
- [MAPREDUCE-3727] - jobtoken location property in jobconf refers to wrong jobtoken file
- [MAPREDUCE-2905] - CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)
- [MAPREDUCE-2555] - JvmInvalidate errors in the gridmix TT logs
- [MAPREDUCE-3343] - TaskTracker Out of Memory because of distributed cache
- [MAPREDUCE-2980] - Fetch failures and other related issues in Jetty 6.1.26
- [MAPREDUCE-2932] - Missing instrumentation plugin class shouldn't crash the TT startup per design
- [MAPREDUCE-2992] - TestLinuxTaskController is broken
- [MAPREDUCE-2760] - mapreduce.jobtracker.split.metainfo.maxsize typoed in mapred-default.xml
- [MAPREDUCE-2651] - Race condition in Linux Task Controller for job log directory creation
- [MAPREDUCE-1482] - Better handling of task diagnostic information stored in the TaskInProgress
- [MAPREDUCE-2529] - Recognize Jetty bug 1342 and handle it
- [MAPREDUCE-2670] - Fixing spelling mistake in FairSchedulerServlet.java
- [MAPREDUCE-2447] - Set JvmContext sooner for a task - MR2429
- [MAPREDUCE-2443] - Fix FI build - broken after MR-2429
- [MAPREDUCE-2429] - Check jvmid during task status report
- [MAPREDUCE-2472] - Extra whitespace in mapred.child.java.opts breaks JVM initialization
- [MAPREDUCE-2023] - TestDFSIO read test may not read specified bytes.
- [MAPREDUCE-2457] - job submission should inject group.name (on the JT side)
- [MAPREDUCE-1614] - TestDFSIO should allow to configure output directory
- [MAPREDUCE-1813] - NPE in PipeMapred.MRErrorThread
- [MAPREDUCE-2366] - TaskTracker can't retrieve stdout and stderr from web UI
- [MAPREDUCE-2364] - Shouldn't hold lock on rjob while localizing resources.
- [MAPREDUCE-1563] - Task diagnostic info would get missed sometimes.
- [MAPREDUCE-2356] - A task succeeded even though there were errors on all attempts.
- [MAPREDUCE-2377] - task-controller fails to parse configuration if it doesn't end in \n
- [MAPREDUCE-2379] - Distributed cache sizing configurations are missing from mapred-default.xml
- [MAPREDUCE-2376] - test-task-controller fails if run as a userid < 1000
- [MAPREDUCE-2374] - "Text File Busy" errors launching MR tasks
- [MAPREDUCE-1845] - FairScheduler.tasksToPreempt() can return negative number
- [MAPREDUCE-2321] - TT should fail to start on secure cluster when SecureIO isn't available
- [MAPREDUCE-2289] - Permissions race can make getStagingDir fail on local filesystem
- [MAPREDUCE-2178] - Race condition in LinuxTaskController permissions handling
- [MAPREDUCE-2005] - TestDelegationTokenRenewal fails
- [MAPREDUCE-1961] - [gridmix3] ConcurrentModificationException when shutting down Gridmix
- [MAPREDUCE-2328] - memory-related configurations missing from mapred-default.xml
- [MAPREDUCE-1118] - Capacity Scheduler scheduling information is hard to read / should be tabular format
- [MAPREDUCE-2256] - FairScheduler fairshare preemption from multiple pools may preempt all tasks from one pool causing that pool to go below fairshare.
- [MAPREDUCE-2242] - LinuxTaskController doesn't properly escape environment variables
- [MAPREDUCE-2253] - Servlets should specify content type
- [MAPREDUCE-2082] - Race condition in writing the jobtoken password file when launching pipes jobs
- [MAPREDUCE-1085] - For tasks, "ulimit -v -1" is being run when user doesn't specify mapred.child.ulimit
- [MAPREDUCE-2277] - TestCapacitySchedulerWithJobTracker fails sometimes
- [MAPREDUCE-2238] - Undeletable build directories
- [MAPREDUCE-787] - -files, -archives should honor user given symlink path
- [MAPREDUCE-572] - If #link is missing from uri format of -cacheArchive then streaming does not throw error.
- [MAPREDUCE-1178] - MultipleInputs fails with ClassCastException
- [MAPREDUCE-2219] - JT should not try to remove mapred.system.dir during startup
- [MAPREDUCE-1699] - JobHistory shouldn't be disabled for any reason
- [MAPREDUCE-1853] - MultipleOutputs does not cache TaskAttemptContext
- [MAPREDUCE-1621] - Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
- [MAPREDUCE-1784] - IFile should check for null compressor
- [MAPREDUCE-1288] - DistributedCache localizes only once per cache URI
- [MAPREDUCE-2096] - Secure local filesystem IO from symlink vulnerabilities
- [MAPREDUCE-1280] - Eclipse Plugin does not work with Eclipse Ganymede (3.4)
- [MAPREDUCE-1682] - Tasks should not be scheduled after tip is killed/failed.
- [MAPREDUCE-1914] - TrackerDistributedCacheManager never cleans its input directories
- [MAPREDUCE-1538] - TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
- [MAPREDUCE-1900] - MapReduce daemons should close FileSystems that are not needed anymore
- [MAPREDUCE-1807] - TestQueueManager can take long enough to time out
- [MAPREDUCE-1716] - Truncate logs of finished tasks to prevent node thrash due to excessive logging
- [MAPREDUCE-1442] - StackOverflowError when JobHistory parses a really long line
- [MAPREDUCE-1744] - DistributedCache creates its own FileSystem instance when adding a file/archive to the path
- [MAPREDUCE-1759] - Exception message for unauthorized user doing killJob, killTask, setJobPriority needs to be improved
- [MAPREDUCE-1754] - Replace mapred.permissions.supergroup with an acl : mapreduce.cluster.administrators
- [MAPREDUCE-1707] - TaskRunner can get NPE in getting ugi from TaskTracker
- [MAPREDUCE-1687] - Stress submission policy does not always stress the cluster.
- [MAPREDUCE-1641] - Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
- [MAPREDUCE-1664] - Job Acls affect Queue Acls
- [MAPREDUCE-1397] - NullPointerException observed during task failures
- [MAPREDUCE-1607] - Task controller may not set permissions for a task cleanup attempt's log directory
- [MAPREDUCE-1533] - Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
- [MAPREDUCE-1701] - AccessControlException while renewing a delegation token is not correctly handled in the JobTracker
- [MAPREDUCE-1657] - After task logs directory is deleted, tasklog servlet displays wrong error message about job ACLs
- [MAPREDUCE-1692] - Remove TestStreamedMerge from the streaming tests
- [MAPREDUCE-1617] - TestBadRecords failed once in our test runs
- [MAPREDUCE-1718] - job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem
- [MAPREDUCE-587] - Stream test TestStreamingExitStatus fails with Out of Memory
- [MAPREDUCE-1985] - java.lang.ArrayIndexOutOfBoundsException in analysejobhistory.jsp of jobs with 0 maps
- [MAPREDUCE-1683] - Remove JNI calls from ClusterStatus cstr
- [MAPREDUCE-1635] - ResourceEstimator does not work after MAPREDUCE-842
- [MAPREDUCE-1612] - job conf file is not accessible from job history web page
- [MAPREDUCE-1611] - Refresh nodes and refresh queues do not work with service authorization enabled
- [MAPREDUCE-1609] - TaskTracker.localizeJob should not set permissions on job log directory recursively
- [MAPREDUCE-1610] - Forrest documentation should be updated to reflect the changes in MAPREDUCE-856
- [MAPREDUCE-1417] - Forrest documentation should be updated to reflect the changes in MAPREDUCE-744
- [MAPREDUCE-1604] - Job acls should be documented in forrest.
- [MAPREDUCE-1543] - Log messages of JobACLsManager should use security logging of HADOOP-6586
- [MAPREDUCE-1606] - TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task
- [MAPREDUCE-927] - Cleanup of task-logs should happen in TaskTracker instead of the Child
- [MAPREDUCE-1599] - MRBench reuses jobConf and credentials therein.
- [MAPREDUCE-1522] - FileInputFormat may change the file system of an input path
- [MAPREDUCE-1100] - User's task-logs filling up local disks on the TaskTrackers
- [MAPREDUCE-1422] - Changing permissions of files/dirs under job-work-dir may be needed so that cleaning up of job-dir in all mapred-local-directories always succeeds
- [MAPREDUCE-890] - After HADOOP-4491, the user who started mapred system is not able to run job.
- [MAPREDUCE-1566] - Need to add a mechanism to import tokens and secrets into a submitted job.
- [MAPREDUCE-1421] - LinuxTaskController tests failing on trunk after the commit of MAPREDUCE-1385
- [MAPREDUCE-1559] - The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem
- [MAPREDUCE-1550] - UGI.doAs should not be used for getting the history file of jobs
- [MAPREDUCE-899] - When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.
- [MAPREDUCE-1528] - TokenStorage should not be static
- [MAPREDUCE-1532] - Delegation token is obtained as the superuser
- [MAPREDUCE-1520] - TestMiniMRLocalFS fails on trunk
- [MAPREDUCE-1505] - Cluster class should create the rpc client only when needed
- [MAPREDUCE-1398] - TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
- [MAPREDUCE-1476] - committer.needsTaskCommit should not be called for a task cleanup attempt
- [MAPREDUCE-1316] - JobTracker holds stale references to retired jobs via unreported tasks
- [MAPREDUCE-1399] - The archive command shows a null error message
- [MAPREDUCE-1435] - symlinks in cwd of the task are not handled properly after MAPREDUCE-896
- [MAPREDUCE-1186] - While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
- [MAPREDUCE-896] - Users can set non-writable permissions on temporary files for TT and can abuse disk usage.
- [MAPREDUCE-1140] - Per cache-file refcount can become negative when tasks release distributed-cache files
- [MAPREDUCE-1284] - TestLocalizationWithLinuxTaskController fails
- [MAPREDUCE-1098] - Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.
- [MAPREDUCE-408] - TestKillSubProcesses fails with assertion failure sometimes
- [MAPREDUCE-1342] - Potential JT deadlock in faulty TT tracking
- [MAPREDUCE-1124] - TestGridmixSubmission fails sometimes
- [MAPREDUCE-1143] - runningMapTasks counter is not properly decremented in case of failed Tasks.
- [MAPREDUCE-676] - Existing diagnostic rules fail for MAP ONLY jobs
- [MAPREDUCE-1171] - Lots of fetch failures
- [MAPREDUCE-754] - NPE in expiry thread when a TT is lost
- [MAPREDUCE-1219] - JobTracker Metrics causes undue load on JobTracker
- [MAPREDUCE-1196] - MAPREDUCE-947 incompatibly changed FileOutputCommitter
- [MAPREDUCE-1160] - Two log statements at INFO level fill up jobtracker logs
- [MAPREDUCE-1158] - running_maps is not decremented when the tasks of a job are killed/failed
- [MAPREDUCE-1062] - MRReliability test does not work with retired jobs
- [MAPREDUCE-1090] - Modify log statement in Tasktracker log related to memory monitoring to include attempt id.
- [MAPREDUCE-1105] - CapacityScheduler: It should be possible to set queue hard-limit beyond its actual capacity
- [MAPREDUCE-1086] - hadoop commands in streaming tasks are trying to write to tasktracker's log
- [MAPREDUCE-1088] - JobHistory files should have narrower 0600 perms
- [MAPREDUCE-732] - node health check script should not log "UNHEALTHY" status for every heartbeat in INFO mode
- [MAPREDUCE-144] - TaskMemoryManager should log process-tree's status while killing tasks.
- [MAPREDUCE-1030] - Reduce tasks are getting starved in capacity scheduler
- [MAPREDUCE-1028] - Cleanup tasks are scheduled using high memory configuration, leaving tasks in unassigned state.
- [MAPREDUCE-964] - Inaccurate values in jobSummary logs
- [MAPREDUCE-945] - Test programs support only default queue.
- [MAPREDUCE-682] - Reserved tasktrackers should be removed when a node is globally blacklisted
- [MAPREDUCE-809] - Job summary logs show status of completed jobs as RUNNING
- [MAPREDUCE-771] - Setup and cleanup tasks remain in UNASSIGNED state for a long time on tasktrackers with long running high RAM tasks
- [MAPREDUCE-733] - When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker heartbeat exception occurs.
- [MAPREDUCE-734] - java.util.ConcurrentModificationException observed in unreserving slots for HiRam Jobs
- [MAPREDUCE-693] - Conf files not moved to "done" subdirectory after JT restart
- [MAPREDUCE-722] - More slots are getting reserved for HiRAM job tasks than required
- [MAPREDUCE-709] - node health check script does not display the correct message on timeout
- [MAPREDUCE-522] - Rewrite TestQueueCapacities to make it simpler and avoid timeout errors
- [MAPREDUCE-516] - Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs
- [MAPREDUCE-118] - Job.getJobID() will always return null
- [MAPREDUCE-1887] - MRAsyncDiskService does not properly absolutize volume root paths
- [MAPREDUCE-1372] - ConcurrentModificationException in JobInProgress
- [MAPREDUCE-1378] - Args in job details links on jobhistory.jsp are not URL encoded
- [MAPREDUCE-1213] - TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
- [MAPREDUCE-1443] - DBInputFormat can leak connections
- [MAPREDUCE-1728] - Oracle timezone strings do not match Java
- [MAPREDUCE-1375] - TestFileArgs fails intermittently
- [MAPREDUCE-1536] - DataDrivenDBInputFormat does not split date columns correctly.
- [MAPREDUCE-1480] - CombineFileRecordReader does not properly initialize child RecordReader
- [MAPREDUCE-1436] - Deadlock in preemption code in fair scheduler
- [MAPREDUCE-1469] - Sqoop should disable speculative execution in export
- [MAPREDUCE-1395] - Sqoop does not check return value of Job.waitForCompletion()
- [MAPREDUCE-1327] - Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
- [MAPREDUCE-1394] - Sqoop generates incorrect URIs in paths sent to Hive
- [MAPREDUCE-1313] - NPE in FieldFormatter if escape character is set and field is null
- [MAPREDUCE-1155] - Streaming tests swallow exceptions
- [MAPREDUCE-1258] - Fair scheduler event log not logging job info
- [MAPREDUCE-1212] - Mapreduce contrib project ivy dependencies are not included in binary target
- [MAPREDUCE-1310] - CREATE TABLE statements for Hive do not correctly specify delimiters
- [MAPREDUCE-1235] - java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
- [MAPREDUCE-1174] - Sqoop improperly handles table/column names which are reserved sql words
- [MAPREDUCE-1146] - Sqoop dependencies break Eclipse build on Linux
- [MAPREDUCE-1148] - SQL identifiers are a superset of Java identifiers
- [MAPREDUCE-1285] - DistCp cannot handle -delete if destination is local filesystem
- [MAPREDUCE-764] - TypedBytesInput's readRaw() does not preserve custom type codes
- [MAPREDUCE-1293] - AutoInputFormat doesn't work with non-default FileSystems
- [MAPREDUCE-1131] - Using profilers other than hprof can cause JobClient to report job failure
- [MAPREDUCE-1059] - distcp can generate uneven map task assignments
- [MAPREDUCE-1128] - MRUnit Allows Iteration Twice
- [MAPREDUCE-112] - Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API
- [MAPREDUCE-1089] - Fair Scheduler preemption triggers NPE when tasks are scheduled but not running
- [MAPREDUCE-968] - NPE in distcp encountered when placing _logs directory on S3FileSystem
- [MAPREDUCE-683] - TestJobTrackerRestart fails with Map task completion events ordering mismatch
- [MAPREDUCE-416] - Move the completed jobs' history files to a DONE subdirectory inside the configured history directory
- [MAPREDUCE-971] - distcp does not always remove distcp.tmp.dir
- [MAPREDUCE-923] - Sqoop's ORM uses URLDecoder on a file, which replaces plus signs in a jar file name with spaces
- [MAPREDUCE-840] - DBInputFormat leaves open transaction
- [MAPREDUCE-825] - JobClient completion poll interval of 5s causes slow tests in local mode
- [MAPREDUCE-792] - javac warnings in DBInputFormat
- [MAPREDUCE-716] - org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle
- [MAPREDUCE-799] - Some of MRUnit's self-tests were not being run
- [MAPREDUCE-685] - Sqoop will fail with OutOfMemory on large tables using mysql
- [MAPREDUCE-703] - Sqoop requires dependency on hsqldb in ivy
- [MAPREDUCE-415] - JobControl Job always has an unassigned name
- [MAPREDUCE-680] - Reuse of Writable objects is improperly handled by MRUnit
- [MAPREDUCE-714] - JobConf.findContainingJar unescapes unnecessarily on Linux
Improvement
- [MAPREDUCE-4415] - Backport the Job.getInstance methods from MAPREDUCE-1505 to branch-1
- [MAPREDUCE-336] - The logging level of the tasks should be configurable by the job
- [MAPREDUCE-2456] - Show the reducer taskid and map/reduce tasktrackers for "Failed fetch notification #_ for task attempt..." log messages
- [MAPREDUCE-1221] - Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
- [MAPREDUCE-2835] - Make per-job counter limits configurable
- [MAPREDUCE-4001] - Improve MAPREDUCE-3789's fix logic by looking at job's slot demands instead
- [MAPREDUCE-157] - Job History log file format is not friendly for external tools.
- [MAPREDUCE-3607] - Port missing new API mapreduce lib classes to 1.x
- [MAPREDUCE-936] - Allow a load difference in fairshare scheduler
- [MAPREDUCE-3289] - Make use of fadvise in the NM's shuffle handler
- [MAPREDUCE-3184] - Improve handling of fetch failures when a tasktracker is not responding on HTTP
- [MAPREDUCE-3278] - 0.20: avoid a busy-loop in ReduceTask scheduling
- [MAPREDUCE-2836] - Provide option to fail jobs when submitted to non-existent pools.
- [MAPREDUCE-1943] - Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
- [MAPREDUCE-2524] - Backport trunk heuristics for failing maps when we get fetch failures retrieving map output during shuffle
- [MAPREDUCE-2254] - Allow setting of end-of-record delimiter for TextInputFormat
- [MAPREDUCE-2260] - Remove auto-generated native build files
- [MAPREDUCE-2505] - Explain how to use ACLs in the fair scheduler
- [MAPREDUCE-1832] - Support for file sizes less than 1MB in DFSIO benchmark.
- [MAPREDUCE-2372] - TaskLogAppender mechanism shouldn't be set in log4j.properties
- [MAPREDUCE-2371] - TaskLogsTruncater does not need to check log ownership when running as Child
- [MAPREDUCE-2373] - When tasks exit with a nonzero exit status, task runner should log the stderr as well as stdout
- [MAPREDUCE-2351] - mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI
- [MAPREDUCE-2332] - Improve error messages when MR dirs on local FS have bad ownership
- [MAPREDUCE-1545] - Add 'first-task-launched' to job-summary
- [MAPREDUCE-339] - JobTracker should give preference to failed tasks over virgin tasks so as to terminate the job ASAP if it is eventually going to fail.
- [MAPREDUCE-1936] - [gridmix3] Make Gridmix3 more customizable.
- [MAPREDUCE-1778] - CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
- [MAPREDUCE-1868] - Add read timeout on userlog pull
- [MAPREDUCE-1850] - Include job submit host information (name and ip) in jobconf and jobdetails display
- [MAPREDUCE-1521] - Protection against incorrectly configured reduces
- [MAPREDUCE-1960] - Limit the size of jobconf.
- [MAPREDUCE-1872] - Re-think (user|queue) limits on (tasks|jobs) in the CapacityScheduler
- [MAPREDUCE-1382] - MRAsyncDiskService should tolerate missing local.dir
- [MAPREDUCE-655] - Change KeyValueLineRecordReader and KeyValueTextInputFormat to use new api.
- [MAPREDUCE-369] - Change org.apache.hadoop.mapred.lib.MultipleInputs to use new api.
- [MAPREDUCE-1734] - Un-deprecate the old MapReduce API in the 0.20 branch
- [MAPREDUCE-1906] - Lower minimum heartbeat interval for tasktracker > Jobtracker
- [MAPREDUCE-2103] - task-controller shouldn't require o-r permissions
- [MAPREDUCE-2035] - Enable -Wall and fix warnings in task-controller build
- [MAPREDUCE-1711] - Gridmix should provide an option to submit jobs to the same queues as specified in the trace.
- [MAPREDUCE-1656] - JobStory should provide queue info.
- [MAPREDUCE-1317] - Reducing memory consumption of rumen objects
- [MAPREDUCE-1526] - Cache the job related information while submitting the job; this would avoid many RPC calls to JobTracker.
- [MAPREDUCE-1624] - Document the job credentials and associated details to do with delegation tokens (on the client side)
- [MAPREDUCE-1354] - Incremental enhancements to the JobTracker for better scalability
- [MAPREDUCE-1466] - FileInputFormat should save #input-files in JobConf
- [MAPREDUCE-1403] - Save file-sizes of each of the artifacts in DistributedCache in the JobConf
- [MAPREDUCE-1425] - archive throws OutOfMemoryError
- [MAPREDUCE-1440] - MapReduce should use the short form of the user names
- [MAPREDUCE-1376] - Support for varied user submission in Gridmix
- [MAPREDUCE-476] - extend DistributedCache to work locally (LocalJobRunner)
- [MAPREDUCE-711] - Move Distributed Cache from Common to Map/Reduce
- [MAPREDUCE-478] - separate jvm param for mapper and reducer
- [MAPREDUCE-1250] - Refactor job token to use a common token interface
- [MAPREDUCE-353] - Allow shuffle read and connection timeouts to be configurable
- [MAPREDUCE-1185] - URL to JT webconsole for running job and job history should be the same
- [MAPREDUCE-1231] - Distcp is very slow
- [MAPREDUCE-1048] - Show total slot usage in cluster summary on jobtracker webui
- [MAPREDUCE-1103] - Additional JobTracker metrics
- [MAPREDUCE-947] - OutputCommitter should have an abortJob method
- [MAPREDUCE-277] - Job history counters should be available on the UI.
- [MAPREDUCE-270] - TaskTracker could send an out-of-band heartbeat when the last running map/reduce completes
- [MAPREDUCE-817] - Add a cache for retired jobs with minimal job info and provide a way to access history file url
- [MAPREDUCE-1570] - Shuffle stage - Key and Group Comparators
- [MAPREDUCE-739] - Allow relative paths to be created inside archives.
- [MAPREDUCE-1302] - TrackerDistributedCacheManager can delete file asynchronously
- [MAPREDUCE-1489] - DataDrivenDBInputFormat should not query the database when generating only one split
- [MAPREDUCE-1785] - Add streaming config option for not emitting the key
- [MAPREDUCE-1460] - Oracle support in DataDrivenDBInputFormat
- [MAPREDUCE-1569] - Mock Contexts & Configurations
- [MAPREDUCE-1423] - Improve performance of CombineFileInputFormat when multiple pools are configured
- [MAPREDUCE-364] - Change org.apache.hadoop.examples.MultiFileWordCount to use new mapreduce api.
- [MAPREDUCE-1467] - Add a --verbose flag to Sqoop
- [MAPREDUCE-967] - TaskTracker does not need to fully unjar job jars
- [MAPREDUCE-1356] - Allow user-specified hive table name in sqoop
- [MAPREDUCE-1198] - Alternatively schedule different types of tasks in fair share scheduler
- [MAPREDUCE-1169] - Improvements to mysqldump use in Sqoop
- [MAPREDUCE-1224] - Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
- [MAPREDUCE-370] - Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
- [MAPREDUCE-999] - Improve Sqoop test speed and refactor tests
- [MAPREDUCE-814] - Move completed Job history files to HDFS
- [MAPREDUCE-906] - Updated Sqoop documentation
- [MAPREDUCE-907] - Sqoop should use more intelligent splits
- [MAPREDUCE-885] - More efficient SQL queries for DBInputFormat
- [MAPREDUCE-876] - Sqoop import of large tables can time out
- [MAPREDUCE-918] - Test hsqldb server should be memory-only.
- [MAPREDUCE-875] - Make DBRecordReader execute queries lazily
- [MAPREDUCE-750] - Extensible ConnManager factory API
- [MAPREDUCE-749] - Make Sqoop unit tests more Hudson-friendly
- [MAPREDUCE-910] - MRUnit should support counters
- [MAPREDUCE-797] - MRUnit MapReduceDriver should support combiners
- [MAPREDUCE-782] - Use PureJavaCrc32 in mapreduce spills
- [MAPREDUCE-789] - Oracle support for Sqoop
- [MAPREDUCE-816] - Rename "local" mysql import to "direct"
- [MAPREDUCE-710] - Sqoop should read and transmit passwords in a more secure manner
- [MAPREDUCE-713] - Sqoop has some superfluous imports
- [MAPREDUCE-674] - Sqoop should allow a "where" clause to avoid having to export entire tables
- [MAPREDUCE-675] - Sqoop should allow user-defined class and package names
- [MAPREDUCE-692] - Make Hudson run Sqoop unit tests
New Feature
- [MAPREDUCE-4355] - Add RunningJob.getJobStatus()
- [MAPREDUCE-3837] - Job tracker is not able to recover job in case of crash and after that no user can submit job.
- [MAPREDUCE-2413] - TaskTracker should handle disk failures at both startup and runtime
- [MAPREDUCE-2777] - Backport MAPREDUCE-220 to Hadoop 20 security branch
- [MAPREDUCE-2323] - Add metrics to the fair scheduler
- [MAPREDUCE-1774] - Large-scale Automated Framework
- [MAPREDUCE-2234] - If Localizer can't create task log directory, it should fail on the spot
- [MAPREDUCE-1938] - Ability for having user's classes take precedence over the system classes for tasks' classpath
- [MAPREDUCE-1733] - Authentication between pipes processes and java counterparts.
- [MAPREDUCE-1680] - Add a metric to track the number of heartbeats processed
- [MAPREDUCE-1594] - Support for Sleep Jobs in gridmix
- [MAPREDUCE-1493] - Authorization for job-history pages
- [MAPREDUCE-1455] - Authorization for servlets
- [MAPREDUCE-1307] - Introduce the concept of Job Permissions
- [MAPREDUCE-1454] - The servlets should quote server generated strings sent in the response
- [MAPREDUCE-1430] - JobTracker should be able to renew delegation tokens for the jobs
- [MAPREDUCE-1433] - Create a Delegation token for MapReduce
- [MAPREDUCE-1457] - For secure job execution, a couple more UserGroupInformation.doAs calls need to be added
- [MAPREDUCE-1432] - Add the hooks in JobTracker and TaskTracker to load tokens from the token cache into the user's UGI
- [MAPREDUCE-1383] - Allow storage and caching of delegation token.
- [MAPREDUCE-744] - Support in DistributedCache to share cache files with other users after HADOOP-4493
- [MAPREDUCE-1338] - need security keys storage solution
- [MAPREDUCE-856] - Localized files from DistributedCache should have right access-control
- [MAPREDUCE-871] - Job/Task local files have incorrect group ownership set by LinuxTaskController binary
- [MAPREDUCE-842] - Per-job local data on the TaskTracker node should have right access-control
- [MAPREDUCE-181] - Secure job submission
- [MAPREDUCE-1026] - Shuffle should be secure
- [MAPREDUCE-467] - Collect information about number of tasks succeeded / total per time unit for a tasktracker.
- [MAPREDUCE-740] - Provide summary information per job once a job is finished.
- [MAPREDUCE-532] - Allow admins of the Capacity Scheduler to set a hard-limit on the capacity of a queue
- [MAPREDUCE-211] - Provide a node health check script and run it periodically to check the node health status
- [MAPREDUCE-679] - XML-based metrics as JSP servlet for JobTracker
- [MAPREDUCE-1341] - Sqoop should have an option to create hive tables and skip the table import step
- [MAPREDUCE-707] - Provide a jobconf property for explicitly assigning a job to a pool
- [MAPREDUCE-698] - Per-pool task limits for the fair scheduler
- [MAPREDUCE-1168] - Export data to databases via Sqoop
- [MAPREDUCE-706] - Support for FIFO pools in the fair scheduler
- [MAPREDUCE-1017] - Compression and output splitting for Sqoop
- [MAPREDUCE-768] - Configuration information should generate dump in a standard format.
- [MAPREDUCE-551] - Add preemption to the fair scheduler
- [MAPREDUCE-987] - Exposing MiniDFS and MiniMR clusters as a single process command-line
- [MAPREDUCE-461] - Enable ServicePlugins for the JobTracker
- [MAPREDUCE-938] - Postgresql support for Sqoop
- [MAPREDUCE-798] - MRUnit should be able to test a succession of MapReduce passes
- [MAPREDUCE-800] - MRUnit should support the new API
- [MAPREDUCE-705] - User-configurable quote and delimiter characters for Sqoop records and record reparsing
Task
Test
- [MAPREDUCE-2638] - Create a simple stress test for the fair scheduler
- [MAPREDUCE-2331] - Add coverage of task graph servlet to fair scheduler system test
- [MAPREDUCE-2180] - Add coverage of fair scheduler servlet to system test
- [MAPREDUCE-2073] - TestTrackerDistributedCacheManager should be up-front about requirements on build environment
- [MAPREDUCE-2051] - Contribute a fair scheduler preemption system test
- [MAPREDUCE-2034] - TestSubmitJob triggers NPE instead of permissions error
- [MAPREDUCE-670] - Create target for 10 minute patch test build for mapreduce
- [MAPREDUCE-686] - Move TestSpeculativeExecution.Fake* into a separate class so that it can be used by other tests also
- [MAPREDUCE-1093] - Java assertion failures triggered by tests
- [MAPREDUCE-1092] - Enable asserts for tests by default