commit 4961783e96adace7bc1f8c0dcc4c9e98b21faee2
Author: Tom White
Date:   Thu Feb 14 11:12:37 2013 +0000

    CLOUDERA-BUILD. JT HA: job is not recovered when jt transitions to active then to standby and then to active (follow up)

    Reason: Bug
    Ref: CDH-10365
    Author: Tom White

commit c0d302737a86b5d2cb87383f5c3bf285ef2f0b48
Author: Alejandro Abdelnur
Date:   Tue Feb 12 16:14:27 2013 -0800

    CLOUDERA-BUILD. hadoop client must exclude servlet/jsp/jetty/tomcat JARs.

    Reason: bug, this creates conflicts in Oozie and HttpFS
    Author: Alejandro Abdelnur
    Ref: CDH-10421

commit 02f0e249f5e16ea28b959e240316a3b2ffeda93d
Author: Alejandro Abdelnur
Date:   Mon Feb 11 10:39:03 2013 -0800

    CLOUDERA-BUILD. JT HA: job is not recovered when jt transitions to active then to standby and then to active

    Reason: bug
    Author: Tom White
    Ref: CDH-10365

commit fbd9572d65ddf210d1ce2983133b6722d455d685
Author: Tom White
Date:   Fri Feb 1 14:14:13 2013 +0000

    CLOUDERA-BUILD. JT HA: if job is run with mapred.job.restart.recover=false, job client hangs on failover

    Reason: Bug
    Ref: CDH-10247
    Author: Tom White

commit 5a7127134209bf58ae41ba58c604e297052a8084
Author: Alejandro Abdelnur
Date:   Wed Feb 6 12:34:43 2013 -0800

    MAPREDUCE-4977. Documentation for pluggable sort

    Reason: backport, new feature
    Author: Alejandro Abdelnur
    Ref: CDH-8388

commit a3eb0b3d88dc389eddabd25e4a6617bada34d7b6
Author: Karthik Kambatla
Date:   Wed Jan 30 23:55:46 2013 -0800

    Revert "CLOUDERA-BUILD. Fix DataChecksum API usage to reflect HADOOP-8700 changes"

    This reverts commit be0ed1e32fe18f2b1ab327414fbbc8789bb1e0ce.
    (cherry picked from commit 6538b2036bf08b308ba6d4777f539f0c2d248ccd)

commit 6d593e1d85ef0c4ee56dcebf9a66d7c5ef2eeb13
Author: Tom White
Date:   Wed Jan 30 15:15:51 2013 +0000

    CLOUDERA-BUILD. Wait for RUNNING state when transitioning to active in JT HA.
    Reason: Bug
    Ref: CDH-10167
    Author: Tom White

commit b267414641734a769565e2e956962bcb6b216314
Author: Jenkins slave
Date:   Wed Jan 30 10:28:42 2013 -0800

    Preparing for CDH4.2.0 release

commit 66737601a0057b563e5b3754ee93a28a72d27881
Author: Tom White
Date:   Tue Jan 29 12:01:51 2013 +0000

    CLOUDERA-BUILD. Fair scheduler does not terminate its EagerTaskInitializationListener.

    Reason: Bug
    Ref: CDH-10167
    Author: Tom White

commit 71ac6646b0da7db997e841b6161049aa29bb2662
Author: Alejandro Abdelnur
Date:   Mon Jan 28 16:16:03 2013 -0800

    CLOUDERA-BUILD. JT HA: jobtrackerha daemon shuts down on secure cluster after a while

    Reason: bug
    Author: Alejandro Abdelnur
    Ref: CDH-10165

commit 851c6751938fa5f0ea0d8c01c19c449a4d923c66
Author: Alejandro Abdelnur
Date:   Mon Jan 28 09:58:58 2013 -0800

    CLOUDERA-BUILD. ReduceTask class requires public visibility for pluggable sort.

    Reason: bug
    Author: Alejandro Abdelnur
    Ref: CDH-10154

commit 00df40130c6d414374dcad6e58c5bd1a1bd6d463
Author: Alejandro Abdelnur
Date:   Sat Jan 26 20:33:41 2013 +0000

    MAPREDUCE-4962. jobdetails.jsp uses display name instead of real name to get counters. (sandyr via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1438956 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 71a4429f006b489cda51862274751f446af7fbb6)

    Reason: Bug
    Ref: CDH-9004
    Author: Sandy Ryza

commit 88ef1a6de8255ad72eb6641b420414f78053b543
Author: Thomas White
Date:   Fri Jan 25 10:57:52 2013 +0000

    MAPREDUCE-2931. LocalJobRunner should support parallel mapper execution. Contributed by Sandy Ryza.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1438447 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 14369a4ed7138a5eae46e95c5fe6d3ed61c72e34)

    Reason: New Feature
    Ref: CDH-8337
    Author: Sandy Ryza

commit fc39bca2c84e1c0941e0dd2cf184805ba67d90e0
Author: Alejandro Abdelnur
Date:   Sat Jan 26 04:29:54 2013 +0000

    MAPREDUCE-4963. StatisticsCollector improperly keeps track of Last Day and Last Hour statistics for new TaskTrackers. (rkanter via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1438843 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit dd62cd17b0f975d22504162316072180fa03b141)

commit 3418f3180a8c25b449db533c962a7aa4624ef3b1
Author: Alejandro Abdelnur
Date:   Fri Jan 25 01:05:12 2013 +0000

    MAPREDUCE-2264. Job status exceeds 100% in some cases. (devaraj.k and sandyr via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1438286 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 142e1b8b722c53be5264310f64c81e8afcda5457)

    Reason: Customer/support request
    Ref: CDH-7179
    Author: Sandy Ryza

commit 3a5f8bccba102fe85f1e04c53684c1b5ef7c4426
Author: Alejandro Abdelnur
Date:   Fri Jan 25 16:33:35 2013 -0800

    CLOUDERA-BUILD. JT HA mrhaadmin does not work when hadoop.security.authorization is set to true

    Reason: bug
    Author: Alejandro Abdelnur
    Ref: CDH-10099

commit 331f9b224fb307ea714964e681a650787c912c66
Author: Alejandro Abdelnur
Date:   Fri Jan 25 15:12:51 2013 -0800

    CLOUDERA-BUILD. oozie distcp fails on secure mr1 cluster

    Reason: bug, this is a backport of functionality present in distcp-v2 (Hadoop2)
    Author: Alejandro Abdelnur
    Ref: CDH-9905

commit 95c447a4ed50d2d420bec7eb015df868e399fd0b
Author: Sean Mackrory
Date:   Thu Jan 24 13:55:29 2013 -0800

    CDH-9274: Oro was removed from Ivy dependencies, removing it from Maven to prevent broken symlinks in packaging

commit 3237cd75d0065c50d0f228f4139d5fe3ed417587
Author: Thomas White
Date:   Wed Jan 23 11:06:39 2013 +0000

    MAPREDUCE-4929. mapreduce.task.timeout is ignored. Contributed by Sandy Ryza.
    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1437343 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit f92d8753bc4ab98f68f6556fa802bcd11e1cabfd)

    Reason: Support/customer request
    Author: Sandy Ryza
    Ref: CDH-9078

commit f90ee552f2f9fa059d26dd214fd02dc38b0c1307
Author: Alejandro Abdelnur
Date:   Thu Jan 24 10:27:23 2013 -0800

    CLOUDERA-BUILD. jt ha does not work in a secure cluster.

    Reason: bug
    Author: Alejandro Abdelnur
    Ref: CDH-9873

commit 9b6c8d72b00717d6451f4ad3dd1ec0a508db3928
Author: Alejandro Abdelnur
Date:   Wed Jan 23 10:33:14 2013 -0800

    MAPREDUCE-4808. Allow reduce-side merge to be pluggable. (masokan via tucu)

    Reason: enable pluggable sort
    Author: Alejandro Abdelnur
    Ref: CDH-6920

commit 859c5c54e651032e39abac871051b794405d1fea
Author: Alejandro Abdelnur
Date:   Wed Jan 23 10:13:11 2013 -0800

    MAPREDUCE-4807. Allow MapOutputBuffer to be pluggable. (masokan via tucu)

    Reason: enable pluggable sort
    Author: Alejandro Abdelnur
    Ref: CDH-6920

commit 85e6873f36ba9635ba7a98422dc6c274b33ba1e6
Author: Karthik Kambatla
Date:   Wed Jan 23 15:51:14 2013 -0800

    CLOUDERA-BUILD. Backport MAPREDUCE-4562 to MR1 for compatibility with MR2

    Reason: Compatibility with MR2
    Ref: CDH-9978
    Author: Jarcec

commit a8dca2b96be49df5c3048767f487d95a6bf71120
Author: Karthik Kambatla
Date:   Wed Jan 23 14:48:56 2013 -0800

    CDH-9701. Revert MAPREDUCE-2492 and other dependent JIRAs.

    The reverted commits are:
    e804373 "CDH-9220. Fix TestStreamingStatus to check log contains error"
    e809c60 "CDH-9220. TestStreamingStatus failing - regression due to CDH-8955"
    6381204 "Backport MAPREDUCE-4800. Remove Unnecessary code from MapTaskStatus."
    72a156f "Backport MAPREDUCE-2492. The new MapReduce API should make available task's progress to the task."

    Reason: Maintain binary compatibility for rolling upgrade
    Ref: CDH-9701

commit 88d67503bbaeed2576db4ef58ad3686881ea14c8
Author: Alejandro Abdelnur
Date:   Fri Jan 18 00:43:28 2013 +0000

    MAPREDUCE-4923. Add toString method to TaggedInputSplit. (sandyr via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1434995 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit bed22b230563228cdc5c6366b0af2e5ab48ae63a)

    Reason: For ZD-10823
    Author: Sandy Ryza
    Ref: CDH-9861

commit aa95fae8042e5ce967c59d722177cad5cad069f5
Author: Alejandro Abdelnur
Date:   Wed Jan 16 23:56:04 2013 +0000

    MAPREDUCE-4315. jobhistory.jsp throws 500 when a .txt file is found in /done. (sandyr via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1434506 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 61ee40b37429c41d9cbc4c619b6db4a12644dfa0)

commit 3c41fab664a71f4309c4f724e3711c425b24d940
Author: Tom White
Date:   Thu Jan 17 14:48:35 2013 +0000

    CLOUDERA-BUILD. Don't shut down jobtracker HA process if an error occurs while active jobtracker is being closed.

    Reason: Bug
    Ref: CDH-9687
    Author: Tom White

commit cd6efe8f444e1ecc5c6e3b031cca2b7e1d0c5315
Author: Harsh J
Date:   Tue Jan 15 14:13:16 2013 +0000

    MAPREDUCE-4930. Backport MAPREDUCE-4678 and MAPREDUCE-4925 to branch-1. Contributed by Karthik Kambatla and Chris McConnell. (harsh)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1433428 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 3164be4d5b0f574058844cde984f72202a0d5293)

    Reason: Customer request
    Ref: CDH-8172
    Author: Karthik Kambatla and Chris McConnell

commit 2bfe85699f070c903427b9f485533aca5f3e9dd9
Author: Alejandro Abdelnur
Date:   Wed Jan 16 01:07:35 2013 +0000

    MAPREDUCE-4924. flakey test: org.apache.hadoop.mapred.TestClusterMRNotification.testMR. (rkanter via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1433782 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit d0fb29450104fb814e2b492af5db8581f3a02109)

commit 4d7c7e8f4aab7d72513f61d8193869a8825e19be
Author: Matthew J. Foley
Date:   Thu Dec 15 08:51:13 2011 +0000

    MAPREDUCE-3475. JT can't renew its own tokens. Contributed by Daryn Sharp.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1214660 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 0279d7f6966f430413ed3e13d5be694200127948)

commit 17452b7cdea78dafc9cd234e1ffb03a68006b27f
Author: Tom White
Date:   Tue Jan 15 21:16:34 2013 +0000

    CLOUDERA-BUILD. Avoid port conflicts in JT HA tests.

    Reason: Test
    Ref: CDH-9819
    Author: Tom White

commit 9b8ba78621a5016cfd581652d3c88a09a385e317
Author: Alejandro Abdelnur
Date:   Tue Jan 15 10:39:42 2013 -0800

    CLOUDERA-BUILD. JT HA: Support delegation tokens with logical JT names under JT HA

    Reason: bug
    Author: Alejandro Abdelnur
    Ref: CDH-9178/CDH-9615

commit 322841ba1f773d264b66530b1764b3a7fb043005
Author: Robert Kanter
Date:   Mon Jan 14 11:55:19 2013 -0800

    CLOUDERA-BUILD. mapreduce.FileOutputCommitter#abortTask should throw IOException (CDH-9238)

commit 2e2f2de749dcd0f5f7987d693d38c1d790a254a9
Author: Robert Joseph Evans
Date:   Fri Jan 11 19:26:29 2013 +0000

    MAPREDUCE-4933. MR1 final merge asks for length of file it just wrote before flushing it. Contributed by Sandy Ryza

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1432243 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 82dafad81421ef839d578b6a6da00841bfae7f34)

    Reason: So that MAPREDUCE-2264 won't cause tests to fail.
    Author: Sandy Ryza
    Ref: CDH-7179

commit b6ee087eeff3aa9ebdf2dd7d91404330d1a98d18
Author: Tom White
Date:   Tue Dec 4 17:52:19 2012 +0000

    CLOUDERA-BUILD. Add a stress test for JT HA.

    Reason: Test
    Ref: CDH-9371
    Author: Tom White

commit b8be34fede48ad0e4a379a973090ab590cb4343c
Author: Robert Kanter
Date:   Thu Jan 10 22:06:01 2013 -0800

    CLOUDERA-BUILD. Simple fix for org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles failing

    This code change is already in upstream branch-1

    CDH-9766

commit a42d3944fd074c67553e0420e94f058f01e158b6
Author: Alejandro Abdelnur
Date:   Thu Jan 10 00:55:11 2013 +0000

    MAPREDUCE-4907. TrackerDistributedCacheManager issues too many getFileStatus calls. (sandyr via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1431168 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit e5f94cdecc0226cd95ef3deff738cb0cf199416b)

commit 8d69c35fe50303941a8d4b35314eecec31ba6d71
Author: Tom White
Date:   Wed Jan 9 17:59:27 2013 +0000

    HADOOP-9051 Fix ant clean/test with circular symlinks in build dir. (llu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1422178 13f79535-47bb-0310-9956-ffa450edef68

commit d9f50576a2d642520643703949dc206071a0a06c
Author: Tom White
Date:   Wed Dec 5 15:46:11 2012 +0000

    MAPREDUCE-4850. Job recovery may fail if staging directory has been deleted.

commit d92a27a53c521245d508e0300c87b0fc0458d2ae
Author: Thomas White
Date:   Tue Jan 8 16:37:41 2013 +0000

    MAPREDUCE-4278. Cannot run two local jobs in parallel from the same gateway. Contributed by Sandy Ryza.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1430371 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 9e1e587fa68aef4e6c2eaa7f72d15833790077b5)

commit d59778e3b68da9e816aab1df52a2a7f8b78e1ef6
Author: Harsh J
Date:   Thu Sep 27 16:33:51 2012 +0000

    MAPREDUCE-4464. Reduce tasks failing with NullPointerException in ConcurrentHashMap.get(). Contributed by Clint Heath. (harsh)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1391089 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit f7b6d7ab0173d4e863bb45399a2aea4551f48c69)

commit 41654f520f1f784004cf290dc9602d91cb4e7bb8
Author: Tom White
Date:   Mon Jan 7 14:48:53 2013 +0000

    CLOUDERA-BUILD. JT HA: Further fix to ignore port in logical names.

    Reason: Bug
    Ref: CDH-9608
    Author: Tom White

commit be0b5c09fce5fc4550f36915d01a2ef10a1ed870
Author: Alejandro Abdelnur
Date:   Thu Jan 3 11:49:55 2013 +0000

    MAPREDUCE-2217. The expire launching task should cover the UNASSIGNED task. (schen and kkambatl via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1428304 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 6a4e45250b89b01920e6d8ab86623c08304c2229)

    Reason: Tasks hang in the UNASSIGNED state on faulty TTs
    Ref: CDH-8889
    Author: Karthik Kambatla and Scott Chen

commit 7da9b6e4f96ec1438499c683abaa921d42d6c83c
Author: Alejandro Abdelnur
Date:   Fri Jan 4 10:40:29 2013 +0100

    CLOUDERA-BUILD. JT HA: if neither jt is active, web ui redirect leads to a redirect loop.

    Reason: bug
    Author: Alejandro Abdelnur
    Ref: CDH-9676

commit 951fd912d743d4558acddf813eed27bb5cac9f6a
Author: Robert Kanter
Date:   Thu Jan 3 12:59:34 2013 -0800

    MAPREDUCE-3607. Port missing new API mapreduce lib classes to 1.x.
    (Adding some missing files/changes from the original backport)

    Original backport at commit: c0075b2a0de23e41a3ae600f5fdd5e9c181c4c15

commit 8f2986bf1beca1b4720ad0285e31625477028ac3
Author: Tom White
Date:   Wed Jan 2 16:31:46 2013 +0000

    CLOUDERA-BUILD. JT HA: Ignore port in logical names.

    Reason: Bug
    Ref: CDH-9608
    Author: Tom White

commit 3a5dab56ae416886278c735c69be5a1bbb81c217
Author: Alejandro Abdelnur
Date:   Thu Jan 3 15:45:54 2013 +0100

    CLOUDERA-BUILD. Create service file for MapReduce DelegationTokenIdentifier.

    Reason: bug
    Author: Alejandro Abdelnur
    Ref: CDH-9373

commit 43bf6f1784c375b830c6ba44b9009cebe76ad539
Author: Robert Joseph Evans
Date:   Tue Apr 10 20:14:23 2012 +0000

    MAPREDUCE-1238. mapred metrics shows negative count of waiting maps and reduces (tgraves via bobby)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1311966 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 15c91ff34b4bd0a53e53d3c9dfb02472d0012539)

    Reason: Metrics report negative number of waiting maps and reduces
    Ref: CDH-8336
    Author: Sandy Ryza

commit aa525b552322295645a3a5671cd60fbd8bb9268e
Author: Sandy Ryza
Date:   Sun Dec 23 10:43:59 2012 -0800

    CLOUDERA_BUILD. Fix TokenCache to compile after HADOOP-7967

    Reason: Fix compilation
    Ref: CDH-9617
    Author: Sandy Ryza

commit 7059324b1d9bad81a26292ea317f5d726d81e9a3
Author: Mark Grover
Date:   Wed Dec 19 17:56:04 2012 -0800

    CDH-8545: Package Job Tracker High Availability

commit 71ddc5e70fe68163cc8fa656c36fffcef42aa7e5
Author: Karthik Kambatla
Date:   Wed Dec 19 13:22:02 2012 -0800

    CLOUDERA_BUILD. Ignore TestRecoveryManager#testJobTrackerRestartsWithMissingJobFile

    Reason: Temporarily ignore failing test
    Ref: CDH-9566
    Author: Karthik Kambatla

commit 17179c503fce97b17ffdbbb3ca01437ad954026f
Author: Tom White
Date:   Wed Dec 19 12:46:45 2012 +0000

    CLOUDERA-BUILD. JT HA: should not need to specify mapred.ha.jobtracker.id

    Reason: Bug
    Ref: CDH-9490
    Author: Tom White

commit a1f7887c18e28d7c3be87eaa1695b3990e6dbf20
Author: Alejandro Abdelnur
Date:   Thu Dec 13 22:32:22 2012 +0000

    MAPREDUCE-4860. DelegationTokenRenewal attempts to renew token even after a job is removed. (kkambatl via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1421582 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit f89fdb59ea132ba9e74bfb53cce788584d2e9639)

    Reason: Fix failing tests
    Ref: CDH-9041
    Author: Karthik Kambatla

commit a4150b7686f4da2823aba62c928c7b116f98242f
Author: Karthik Kambatla
Date:   Tue Dec 18 17:13:56 2012 -0800

    CLOUDERA_BUILD. Fix TestQueueManagerForJobKill* tests that fail post HDFS-2264

    Reason: Fix failing tests
    Ref: CDH-9469
    Author: Karthik Kambatla

commit 39f383c581ea75b0d99f8b01a6846494ecf7a4b7
Author: Thomas White
Date:   Tue Dec 18 15:26:29 2012 +0000

    MAPREDUCE-4845. ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2. Contributed by Sandy Ryza.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1423472 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit ccee91c60f729843d4309142d24f8acf0b002eac)

commit 4461c8ac74aaa0632ae0fd2f0992ed36879a817c
Author: Tom White
Date:   Tue Dec 18 12:32:51 2012 +0000

    MAPREDUCE-4824. Provide a mechanism for jobs to indicate they should not be recovered on restart.

    Reason: New feature
    Author: Tom White
    Ref: CDH-8915

commit b318388384c12e05feb0c8960ba6bb7196031c5b
Author: Tom White
Date:   Mon Dec 17 17:43:57 2012 +0000

    MAPREDUCE-4859. TestRecoveryManager fails on branch-1.

    Reason: Fix tests
    Author: Tom White
    Ref: CDH-9543

commit 6dad0fd099998140db51068ad15f984b163e39dd
Author: Tom White
Date:   Mon Dec 10 12:04:51 2012 +0000

    CLOUDERA_BUILD. Change ZooKeeper version to version in rest of stack.

    Reason: Consistency
    Ref: CDH-9257
    Author: Tom White

commit be0ed1e32fe18f2b1ab327414fbbc8789bb1e0ce
Author: Karthik Kambatla
Date:   Tue Dec 11 14:00:04 2012 -0800

    CLOUDERA-BUILD. Fix DataChecksum API usage to reflect HADOOP-8700 changes

    Reason: Fix the build
    Ref: CDH-9432
    Author: Karthik Kambatla

commit f37b49a2fc04c1da5b9fcf6e667afaa4ae7de77f
Author: Tom White
Date:   Mon Dec 10 11:28:34 2012 +0000

    CLOUDERA_BUILD. Revert ZooKeeper version to 3.4.2.

    Reason: Regression
    Ref: CDH-8916
    Author: Tom White

commit 20d35b1b8e19168ff7dac971bc093bc96a392dd8
Author: Tom White
Date:   Tue Dec 4 14:38:57 2012 +0000

    CLOUDERA_BUILD. Implement software fencing for JT HA.
    Reason: New feature
    Ref: CDH-8916
    Author: Tom White

commit e804373478c42d572403f5a5c5590c70f11da477
Author: Karthik Kambatla
Date:   Thu Dec 6 16:48:33 2012 -0800

    CDH-9220. Fix TestStreamingStatus to check log contains error

    Reason: Test failure
    Ref: CDH-9220
    Author: Karthik Kambatla

commit e809c60dec51c933df5dccb639f4a8535ea35346
Author: Karthik Kambatla
Date:   Tue Dec 4 18:50:02 2012 -0800

    CDH-9220. TestStreamingStatus failing - regression due to CDH-8955

    Fix TestStreamingStatus#validateTaskStatus to set finalPhase of map task to be sort

    Reason: Fix regression test failure
    Ref: CDH-9220
    Author: Karthik Kambatla

commit 7a44a5a3aeceb8d1703ad3693d5e6c19e90483dc
Author: Karthik Kambatla
Date:   Tue Dec 4 15:01:51 2012 -0800

    CDH-9273. Fix the bug caused by CDH-8955 (ContextFactory doesn't reflect other changes)

    Reason: Bug fix (unit tests fix)
    Ref: CDH-9273
    Author: Karthik Kambatla

commit 93050991a6e1755bbfddf7086bfeded090e0fe46
Author: Thomas White
Date:   Wed Nov 28 14:43:28 2012 +0000

    MAPREDUCE-4778. Fair scheduler event log is only written if directory exists on HDFS. Contributed by Sandy Ryza.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1414731 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit ef32c3f8fcafe90eeb63a1cfc42969c058abf190)

commit 368c078d039959bbdb93f1d48aaafbdb87fee7f4
Author: Tom White
Date:   Thu Oct 4 11:30:36 2012 +0100

    CLOUDERA_BUILD. JobTracker HA.

    Reason: New feature
    Ref: CDH-8379
    Author: Tom White

commit 2a4ac2507bf07c961c7a0f9ecd0136511438e5d7
Author: Harsh J
Date:   Mon Nov 26 17:05:02 2012 +0530

    MAPREDUCE-3678. The Map tasks logs should carry the input split it processed.

    Description: MR map tasks don't currently log the splits they use, so it's harder to find the file behind a specific failed task when debugging.

    Reason: Customer Request
    Author: Harsh J
    Ref: CDH-6951

commit 3ef0a0bea1a5cf57c046c6ad3f1e24e1689c0539
Author: Roman Shaposhnik
Date:   Sat Nov 24 20:17:13 2012 -0800

    CDH-7625. buildscript from mr1-2.0.0-mr1-cdh4.0.1.tar.gz is misleading and the needed one is missing

commit 63812047fa4b06afc4705e01af31648b49922132
Author: Karthik Kambatla
Date:   Mon Nov 19 16:54:42 2012 -0800

    Backport MAPREDUCE-4800. Remove Unnecessary code from MapTaskStatus.

    Reason: Part of the fix for CDH-8955 (Mapper progress is not reported)
    Ref: CDH-8955
    Author: Karthik Kambatla

commit 72a156ff368300090950ad846a28d94af602047f
Author: Karthik Kambatla
Date:   Fri Nov 9 16:10:56 2012 -0800

    Backport MAPREDUCE-2492. The new MapReduce API should make available task's progress to the task.

    Also, backport relevant/related parts of MAPREDUCE-318 and HADOOP-4687

    Reason: Bug fix - New API not updating the progress correctly
    Ref: CDH-8955
    Author: Amar Kamat

commit 2c63c99b7fd60df207ce371522c0d4783b2d5211
Author: Karthik Kambatla
Date:   Fri Nov 9 14:11:36 2012 -0800

    Backport MAPREDUCE-1905. Fix Context.setStatus() and Context.progress APIs

    Reason: Bug fix - Status and progress not being reported correctly
    Ref: CDH-8955
    Author: Amareshwari Sriramadasu

commit a5e9f6cf2fa7ed7a3bd3046ae0853f6915812ca5
Author: Roman Shaposhnik
Date:   Tue Nov 20 15:59:31 2012 -0800

    CDH-7625. buildscript from mr1-2.0.0-mr1-cdh4.0.1.tar.gz is misleading and the needed one is missing

commit 95ab3127d87d9aa2b168fa0bb5481ebf68e016a0
Author: Robert Joseph Evans
Date:   Fri Sep 7 16:47:42 2012 +0000

    MAPREDUCE-4629. Remove JobHistory.DEBUG_MODE (Karthik Kambatla via bobby)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1382090 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit a66cfeb30adc3bf68c6457ef5419e96f4243a40b)

    Reason: Fix customer issue in disabled debug mode
    Ref: CDH-8600
    Author: Karthik Kambatla
    (cherry picked from commit c0dc6e2af12f0f15160cde320d664b738bd2cfca)

commit 9303593313ab41958557b0f1682545a807b7da8f
Author: Alejandro Abdelnur
Date:   Wed Nov 7 21:13:11 2012 +0000

    MAPREDUCE-4765. Restarting the JobTracker programmatically can cause DelegationTokenRenewal to throw an exception. (rkanter via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1406810 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit f825dbb8c23bd5c8f63bf1e0224ffff6bcf258a6)

commit ed778edb936f5e1eaa0fa531354e22a81626ce69
Author: Alejandro Abdelnur
Date:   Tue Oct 30 05:49:04 2012 +0000

    MAPREDUCE-1806. CombineFileInputFormat does not work with paths not on default FS. (Gera Shegalov via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1403617 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 07244962b26e92e47f261f7db308278105c526ee)

commit 10a0fd217f5b78ae3999d8e56fa0ff91731f14f6
Author: Alejandro Abdelnur
Date:   Thu Oct 25 16:01:47 2012 -0700

    HADOOP-8968. add flag to disable completely version check in the TaskTracker and DataNode. (tucu)

    Reason: to enable rolling upgrades
    Author: Alejandro Abdelnur
    Ref: CDH-8562

commit d3bd3b84e2096b857fdc3566d3dadc6a18016b04
Author: Alejandro Abdelnur
Date:   Thu Oct 11 21:53:25 2012 +0000

    MAPREDUCE-4451. fairscheduler fail to init job with kerberos authentication configured. (erik.fang via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1397330 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 82fae0f9071121bdcde24f04336a05901b120b85)

    Reason: FairScheduler job initialization issues on secure cluster
    Ref: CDH-7461
    Author: Erik Fang
    (cherry picked from commit 09f12262a74649171e086c2fff7a3d0ed6dd63a1)

commit 221a6ea2c6fec211963304e5d3540ee5ae49a731
Author: Roman Shaposhnik
Date:   Mon Oct 8 14:23:52 2012 -0700

    CLOUDERA-BUILD. mr1 tarball build requires native (CDH-7918)

commit f04f1925961611e05a2ff401c0f2f0aede36ab0a
Author: Alejandro Abdelnur
Date:   Thu Oct 4 23:55:26 2012 +0000

    MAPREDUCE-4556. FairScheduler: PoolSchedulable#updateDemand() has potential redundant computation (kkambatl via tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1394331 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit ceb0b88f68b1da45a3ab854edecd0e040f5937bb)

    Reason: FS Optimization
    Ref: CDH-8454
    Author: Karthik Kambatla
    (cherry picked from commit f231e7eaa380a68c54f8ae287fbaa89767dd067b)

commit 401c4be5b3104f085416f19fa41b0d0f88c962dc
Author: Thomas White
Date:   Mon Oct 8 11:44:42 2012 +0000

    MAPREDUCE-4706. FairScheduler#dump(): Computing of # running maps and reduces is commented out. Contributed by Karthik Kambatla.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1395520 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 31c02cb4b0be9f298f11f7fac584ee603afb4973)

    Reason: Bug fix (Better debugging of FS)
    Ref: CDH-8210
    Author: Karthik Kambatla
    (cherry picked from commit df1cad6a6ec42818840282816ddad9fb137bbe32)

commit 4e70412422a9d9da4258603988af849ed0249b65
Author: Robert Kanter
Date:   Fri Oct 5 16:37:04 2012 -0700

    MAPREDUCE-2786. TestDFSIO should also test compression reading/writing from command-line.

commit de58daa99978480c14af4bbb808a221c8d3457bc
Author: Robert Kanter
Date:   Mon Oct 8 09:37:15 2012 -0700

    CLOUDERA-BUILD. fix to allow hadoop-test jar to run in MR1 (CDH-8097)

commit 45d4d6fd9d01938620b90530e4719dfd1c7c500d
Author: Thomas White
Date:   Tue Sep 25 11:51:19 2012 +0000

    MAPREDUCE-4652. ValueAggregatorJob sets the wrong job jar. Contributed by Ahmed Radwan.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1389821 13f79535-47bb-0310-9956-ffa450edef68

commit 8daabea86be3801ccdec550be61bdf92746d8602
Author: Eli Collins
Date:   Fri Sep 21 13:18:15 2012 -0700

    CLOUDERA-BUILD. Point *-default doc links to the right place.

commit 461f101523e1ebf7cb4723743b240f035732c7f2
Author: Tom White
Date:   Wed Sep 19 11:06:54 2012 +0100

    MAPREDUCE-2185. Infinite loop at creating splits using CombineFileInputFormat

    Reason: Regression
    Ref: CDH-8046
    Author: Ramkumar Vadali
    (cherry picked from commit 8292191d239edf2e19042e0388edb7254cbdf292)

commit e29f2b1ac1ed291603e2b48b9db986b969c3c972
Author: Tom White
Date:   Wed Sep 19 10:56:27 2012 +0100

    MAPREDUCE-4470. Fix TestCombineFileInputFormat.testForEmptyFile

    Reason: Regression
    Ref: CDH-8046
    Author: Ilya Katsov
    (cherry picked from commit 7ee3100b5814f304b34f936aa442ca6af47020d8)

commit 82fb388c7604c92b8788702d264b36a13b17de6d
Author: Tom White
Date:   Tue Sep 18 17:03:13 2012 +0100

    CLOUDERA-BUILD. MR1 returns 0 splits when the input directory is empty

    Reason: Regression
    Ref: CDH-8046
    Author: Tom White
    (cherry picked from commit 8cd53b02af406640f30ad76a6ce80fc0b8660db4)

commit bd4100e30c377d61616e1fb00c1d219240c3262f
Author: Andrew Bayer
Date:   Thu Sep 13 09:40:24 2012 -0700

    CLOUDERA-BUILD. Updating build.xml's version to be consistent.

commit cb3cecccc92af3fb1a99bf81072eec2ff0c1ff03
Author: Tom White
Date:   Wed Sep 12 13:21:25 2012 +0100

    CLOUDERA-BUILD. JobClient constructors should call setConf(conf).

    Reason: Bug
    Ref: CDH-7900
    Author: Tom White
    (cherry picked from commit c166f79e00d08d182fc69010f1ef2a4b41f7f9cb)

commit 36235703070afff05f4f071a0a75b0ef0daaadeb
Author: Tom White
Date:   Wed Sep 12 14:40:17 2012 +0100

    HDFS-3910 (DFSTestUtil#waitReplication should timeout).

    Reason: Improvement
    Ref: CDH-7935
    Author: Eli Collins and Tom White
    (cherry picked from commit 7405ca726f8ccf360f1206d14c393be19c32fb18)

commit 9e61675f1cd8f567adc6ee6be6010ebf629517e8
Author: Alejandro Abdelnur
Date:   Mon Sep 10 22:49:52 2012 +0000

    HADOOP-8781. hadoop-config.sh should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH. (tucu)

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1383145 13f79535-47bb-0310-9956-ffa450edef68
    (cherry picked from commit 60c49a8e938219585c42ad167d040a0431fb9fa5)

    Conflicts:
        CHANGES.txt

commit 26f5ea2ed12fe5bf3ddb3cc2ba226b5cf25de643
Author: Sean Mackrory
Date:   Fri Sep 7 15:26:42 2012 -0700

    CDH-6973: Mockito needs to be distributed because hadoop-test.jar needs it at runtime

commit e7b29d40e29587026ed5bc4f96ef72788a27ef57
Author: Jenkins slave
Date:   Thu Sep 6 15:39:50 2012 -0700

    Preparing for CDH4.2.0 development

commit 9a653c2bde2bccf41322b6adc9c9b2c85d9148a3
Author: Todd Lipcon
Date:   Thu Sep 6 12:09:40 2012 -0700

    CLOUDERA-BUILD. embedded jetty may fail to start about 1/5000 times

    Upgrades Jetty dependency to 6.1.26.cloudera.2. This addresses a race condition (JETTY-1316) which caused the Acceptor thread to sometimes not start, which caused reducer fetch timeouts, etc.

    The source for this Jetty release can be found at https://github.com/cloudera/jetty-hadoop-fix and corresponds to commit 51042f4a6cd36fe90be7c0f0d208dca5397b527a in that repository.

    Author: Todd Lipcon
    Ref: CDH-7767, MAPREDUCE-3851

commit 3bb6950bcc12d063bb35c2709df9bc471914084b
Author: Karthik Kambatla
Date:   Tue Sep 4 16:22:41 2012 -0700

    CLOUDERA-BUILD. Disable JobHistory.DEBUG_MODE by default.

    Reason: Fix build (TestLostTracker)
    Ref: CDH-7104
    Author: Arun Murthy/Karthik Kambatla

commit f39562fc9549307413ee8db87e95d7a098869d88
Author: Tom White
Date:   Fri Aug 31 10:39:53 2012 +0100

    MAPREDUCE-4610. Support deprecated mapreduce.job.counters.limit property in MR2.

    Reason: Compatibility
    Ref: CDH-7678
    Author: Tom White

commit 5fd893ccf5f10194699b9d127bda4dc2d89304aa
Author: Owen O'Malley
Date:   Wed Sep 21 20:30:55 2011 +0000

    HADOOP-7644. Add forgotten service provider file.
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security@1173833 13f79535-47bb-0310-9956-ffa450edef68 (cherry picked from commit 240940f4cb9b657b1ed92cf7e913bb140f885964) commit 66a6edbaf22f4504e386d93dd1914a9bc487feac Author: Owen O'Malley Date: Thu Sep 15 22:22:14 2011 +0000 HADOOP-7644. Fix TestDelegationTokenRenewal and TestDelegationTokenFetcher to use and test the new style renewers. (omalley) MapReduce part of: git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security@1171298 13f79535-47bb-0310-9956-ffa450edef68 (cherry picked from commit 7142897eee9f7db0d7aa3785d6cb5e1701af7dcb) Reason: Bug Ref: CDH-6189 Author: Owen O'Malley commit 27903f7aaa755c8006c441f41f47d42b0783b342 Author: Owen O'Malley Date: Fri Sep 9 22:07:35 2011 +0000 MAPREDUCE-2764. Allow JobTracker to renew and cancel arbitrary token types, including delegation tokens obtained via hftp. (omalley) MapReduce part of: git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security@1167374 13f79535-47bb-0310-9956-ffa450edef68 (cherry picked from commit ef21eca97f1d10489003e30435acab24010a2209) Reason: Bug Ref: CDH-6189 Author: Owen O'Malley commit 31b566c69d5521f54c8f9dd63ffb8a4206a8392c Author: Devaraj Das Date: Thu Jun 2 05:03:36 2011 +0000 MAPREDUCE-2452. Moves the cancellation of delegation tokens to a separate thread. Contributed by Devaraj Das. MapReduce part of: git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security@1130409 13f79535-47bb-0310-9956-ffa450edef68 (cherry picked from commit 4d01aab294ad571e6e1915dc710f31d1aeedd3a3) Reason: Bug Ref: CDH-6189 Author: Devaraj Das commit a454aef93def8148817c3e6af38d7f290c7abeb7 Author: Tom White Date: Wed Aug 29 18:02:53 2012 +0100 CLOUDERA-BUILD. 
MR1 does not have "mradmin -refreshServiceAcl" implemented Reason: Bug Ref: CDH-7644 Author: Tom White commit a7e1c9f8fb4c675aad2a075f7aeca050df4663f6 Author: Alejandro Abdelnur Date: Fri Aug 24 23:42:55 2012 +0000 MAPREDUCE-4408. allow jobs to set a JAR that is in the distributed cached (rkanter via tucu) git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1377152 13f79535-47bb-0310-9956-ffa450edef68 (cherry picked from commit 7c0d407e828e2f70969097c4c0509bec45b73a7f) commit b25a5a4b5a8cd75fcf77735b71d478e909ef2722 Author: Alejandro Abdelnur Date: Mon Aug 27 22:38:13 2012 +0000 MAPREDUCE-4595. TestLostTracker failing - possibly due to a race in JobHistory.JobHistoryFilesManager#run() (kkambatl via tucu) git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1377895 13f79535-47bb-0310-9956-ffa450edef68 (cherry picked from commit 327f6378f2cdb9923b2b54a3ef39f8b207ff8173) Reason: TestLostTracker fails due to a race Ref: CDH-7104 Author: Karthik Kambatla commit c6e51a5102fb17e64624470a859d83ce0787dffc Author: Karthik Kambatla Date: Thu Aug 23 14:05:23 2012 -0700 CDH-7619. TT uses MRAsyncDiskService to delete only specific directories at init/close Reason: Mimic upstream behavior w.r.t creating/deleting directories at startup/shutdown Ref: CDH-7619 Author: Karthik Kambatla commit 5a59eb2ce95a12fd2b42515b8e442a7e073454c2 Author: Aaron Twining Myers Date: Thu Aug 23 18:30:43 2012 +0000 MAPREDUCE-2374. "Text File Busy" errors launching MR tasks. Contributed by Andy Isaacson. git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1376639 13f79535-47bb-0310-9956-ffa450edef68 (cherry picked from commit 5e3e5ec4597257ebb4903cfa8c745ac892d41755) commit 7c346a345db16d2ae9fb77c4f7cb98034cc79bc0 Author: Alejandro Abdelnur Date: Thu Jul 5 16:55:27 2012 +0000 MAPREDUCE-4385. 
FairScheduler.maxTasksToAssign() should check for fairscheduler.assignmultiple.maps < TaskTracker.availableSlots (kkambatl via tucu)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1357737 13f79535-47bb-0310-9956-ffa450edef68
(cherry picked from commit 7cd6014991d076dae0616c89885eaa19f338a9eb)
Reason: Optimization
Ref: CDH-6723
Author: Karthik Kambatla

commit 822ab8d0330cc2911d42d553d952859614a2edc9
Author: Karthik Kambatla
Date: Wed Aug 22 12:01:24 2012 -0700

CDH-7412. Speedup counter name resolution during ResourceBundle lookup
Backport of MR-2855(trunk) and MR-4565(branch-1)
Reason: Counter name resolution taking too long
Ref: CDH-7412
Author: Karthik Kambatla

commit ec73f40829ca668513da2145a7718687aa4a0ee3
Author: Alejandro Abdelnur
Date: Mon Jul 9 19:20:37 2012 +0000

MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1359348 13f79535-47bb-0310-9956-ffa450edef68
(cherry picked from commit e821891bfea977da78ad86b4ccb08582aa8f7ddd)
Reason: Codec errors affecting job execution
Ref: CDH-6721
Author: Karthik Kambatla

commit 789a260747e492fe3911e9b97f73ff157d758c94
Author: Alejandro Abdelnur
Date: Thu Jul 5 16:29:26 2012 +0000

MAPREDUCE-4355. Add RunningJob.getJobStatus() (kkambatl via tucu)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1357724 13f79535-47bb-0310-9956-ffa450edef68
(cherry picked from commit e976bd7fe67a3ef7f8d62bddc826a195674369dd)
Conflicts:
- src/test/org/apache/hadoop/mapred/TestNetworkedJob.java (from upstream change) not included as it required several other changes.
Reason: Customer (eBay) request
Ref: CDH-5730
Author: Karthik Kambatla
(cherry picked from commit 45d61ad391961f40f14c5483653d55293f1c551f)

commit 8faccb9703c5607d8b76ff17e23314f4618cf26b
Author: Karthik Kambatla
Date: Thu Aug 16 11:06:50 2012 -0700

CDH-7130.
Retain userlogs across JT/TT restarts
Reason: Customer/cdh-user request
Author: Karthik Kambatla
Ref: CDH-7130

commit dc193788dee7d54abba20f8119c917216f423567
Author: Alejandro Abdelnur
Date: Wed Aug 15 23:15:09 2012 +0000

MAPREDUCE-4511. Add IFile readahead (ahmed via tucu)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1373672 13f79535-47bb-0310-9956-ffa450edef68
(cherry picked from commit 1ca5374b8eec25feef38bd5f29308f8ef70f287a)

commit 77e46755d68d5bb8e024925437e331fea463c8ad
Author: Alejandro Abdelnur
Date: Mon Aug 13 21:56:31 2012 +0000

HADOOP-8581. add support for HTTPS to the web UIs. (tucu)
Reason: This is a functionality backport as MR1 is based on JT/TT services
Author: Alejandro Abdelnur
Ref: CDH-6709
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2@1372642 13f79535-47bb-0310-9956-ffa450edef68

commit cb373105b33538171fe92fbff21d275816834189
Author: Tom White
Date: Fri Aug 3 15:15:51 2012 -0400

MAPREDUCE-4487. Reduce job latency by removing hardcoded sleep statements.
Reason: Performance
Author: Tom White
Ref: CDH-6839

commit 0e6bf38e488f27e1848d2c580ea542b8b86f8c4d
Author: Harsh J
Date: Tue Jul 24 17:54:47 2012 +0000

MAPREDUCE-4415. Backport the Job.getInstance methods from MAPREDUCE-1505 to branch-1. Contributed by Harsh J. (harsh)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1365193 13f79535-47bb-0310-9956-ffa450edef68

commit e988d51dab1726781d5b3513821ef81ff4f58e4d
Author: Alejandro Abdelnur
Date: Mon Jul 30 08:44:08 2012 -0700

MAPREDUCE-4417. add support for encrypted shuffle (tucu)
Reason: Amendment to testcase to initialize MapOutputServlet outside of the TastTracker
Author: Alejandro Abdelnur
Ref: CDH-7046

commit 6a32b8bf697490d0ccf610fccc4754bbf75823a8
Author: Tom White
Date: Sat Jul 28 18:44:14 2012 -0400

CLOUDERA-BUILD. Fix TestNodeRefresh.
Reason: failing tests
Author: Tom White
Ref: CDH-6343

commit 35390d7be4fa7c3249d81f5fd8a1c056936036c1
Author: Alejandro Abdelnur
Date: Thu Jul 26 17:40:15 2012 -0700

MAPREDUCE-4417. add support for encrypted shuffle (tucu)
Reason: customer request
Author: Alejandro Abdelnur
Ref: CDH-6647

commit 59b4cac504232133617bc2e5846b64e9bd7abc45
Author: Tom White
Date: Fri Jul 27 10:49:08 2012 -0400

MAPREDUCE-4463. JobTracker recovery fails with HDFS permission issue
Reason: Bug fix
Author: Tom White
Ref: CDH-6870

commit 2885532bb95416199264cca59e8cecc71634a027
Author: Tom White
Date: Fri Jul 27 13:55:21 2012 -0400

CLOUDERA-BUILD. Fix failing tests due to incorrect @Ignore usage.
Reason: failing tests
Author: Tom White
Ref: CDH-6343

commit b65578b6d9fc738d029a7754f64c56deb6c3f2ab
Author: Andrew Bayer
Date: Tue Jul 24 18:07:09 2012 -0700

CDH-5800. Add reactorRepo to MR1 Ivy resolvers. With -DreactorRepo=$HOME/.m2/repository added to the ant call, the local Maven repo will be checked first.

commit 35380266854a6a20b37312ab7978464359e5afeb
Author: Todd Lipcon
Date: Mon Jul 23 08:54:54 2012 -0700

MAPREDUCE-4399. Performance regression in shuffle
Reason: performance bugfix
Ref: CDH-6700
Author: Luke Lu

commit 6a4c7a5ac49d001191e027ab390961defc977e16
Author: Ahmed Radwan
Date: Thu Jul 12 00:12:28 2012 -0700

MAPREDUCE-323. Re-factor layout of JobHistory files on HDFS to improve operability.
Description: Partial backport of MAPREDUCE-323 to change the JobTracker History view to include "Job Submit Time" rather than "Job Tracker Start Time".
Reason: Bug
Author: Dick King
Ref: CDH-5822

commit 596cd63725fec7e8d522d03daa3c189528af2eb4
Author: Tom White
Date: Mon Jul 2 11:38:19 2012 -0400

MAPREDUCE-3837. Job tracker is not able to recover job in case of crash and after that no user can submit job.
Reason: Bug fix
Author: Mayank Bansal
Ref: CDH-6386

commit 521ba5f2f5e083009d8a9eb39c0e4308e73fcfd7
Author: Alejandro Abdelnur
Date: Tue Jun 12 17:58:13 2012 -0700

MAPREDUCE-4195.
With invalid queueName request param, jobqueue_details.jsp shows NPE (jira.shegalov via tucu)
Reason: affects customer
Author: Gera Shegalov
Ref: CDH-4156

commit d92e97955bff414c9ac62dafc7fd53fa1db272a5
Author: Harsh J
Date: Fri Apr 20 10:02:21 2012 +0000

MAPREDUCE-3674. Invoked with no queueName request param, the jobqueue_details.jsp injects a null queue name into schedulers. (harsh)
Reason: affects customer
Author: Harsh J
Ref: CDH-4156
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1328292 13f79535-47bb-0310-9956-ffa450edef68

commit ed0f6abeee57ed24a8fd9f371fa3a997a37a2cde
Author: Alejandro Abdelnur
Date: Fri Jun 8 12:45:00 2012 -0700

MAPREDUCE-336. The logging level of the tasks should be configurable by the job
Reason: customers request
Author: Arun C Murthy
Ref: CDH-3931

commit 18e7a7c4ff950864d30fe649469f22d66dca2cb9
Author: Eli Collins
Date: Fri Jun 1 15:27:44 2012 -0700

HADOOP-8466. hadoop-client POM incorrectly excludes avro.
Reason: Bug
Author: Bruno Mahe
Ref: CDH-6141

commit e9071ca13eefcde46921dfcf355b7e965847701e
Author: Jenkins slave
Date: Fri Jun 1 08:35:29 2012 -0700

Preparing for 4.1.0 development.

commit 796d350f68b73ace33cb321c74d3620fdfe864c4
Author: Aaron T. Myers
Date: Tue May 22 16:18:29 2012 -0700

MAPREDUCE-2839. MR Jobs fail on a secure cluster with viewfs
Reason: Bug
Author: Siddharth Seth
Ref: CDH-5610

commit ba9bfc804b8e5d54f6ea82f777d337546e3daaa2
Author: Roman Shaposhnik
Date: Fri May 18 13:42:21 2012 -0700

CDH-5919. MR1 'hadoop mrgroups' command not working

commit baf74db4bf4ed1d7d53ab7af1c315837e12eb3fa
Author: Alejandro Abdelnur
Date: Tue May 15 14:41:25 2012 -0700

CLOUDERA-BUILD.
regression: unable to read har created with mr1
Reason: HAR files created with MR1 tools had wrong version
Author: Alejandro Abdelnur
Ref: CDH-5839

commit 618fd946f4e2c16147fd22d2564544b186b7d176
Author: Andrew Bayer
Date: Tue May 15 13:54:10 2012 -0700

Switching to 2.0.0-mr1-cdh4.0.0-SNAPSHOT

commit 26a74136377b9c7fe0052b7dae83e543414f846e
Author: Andrew Bayer
Date: Thu May 10 12:55:38 2012 -0700

CDH-5555. Add -lcrypto to pipes examples compilation.

commit ec93fab53cc79eed3bec31f1e4a094f17bb0e318
Author: Alejandro Abdelnur
Date: Wed May 9 12:27:19 2012 -0700

CLOUDERA-BUILD. TestWebUIAuthorization.testWebUIAuthorizationForCommonServlets is failing
Reason: testcases failures after HADOOP-8343 integration
Author: Alejandro Abdelnur
Ref: CDH-5691

commit 85eb52237e60cbc9379d96aa2e48f7b3dbb9eede
Author: Alejandro Abdelnur
Date: Wed May 9 11:53:52 2012 -0700

CLOUDERA-BUILD. Fix MR1 TestFileSystem to work with FS serviceloader
Reason: test is failing due to FS serviceloader changes
Author: Alejandro Abdelnur
Ref: CDH-5738

commit 7b97cd9b274080929468047c26b43a9e2d4a9b1b
Author: Andrew Bayer
Date: Tue May 8 08:51:48 2012 -0700

CLOUDERA-BUILD. Publish hadoop-streaming jar.

commit e4bf750d7952915c5c43c2c17157158389570ca0
Author: Alejandro Abdelnur
Date: Mon May 7 20:12:00 2012 -0700

CLOUDERA-BUILD. Remove HAR from MR1 (we should use the one from MR2 common)
Reason: HADOOP-7549 (FS serviceloader) breaks MR1
Author: Alejandro Abdelnur
Ref: CDH-5656

commit 56d3a38f3c385cc1cc35a8b323326a90047ec88f
Author: Tom White
Date: Mon May 7 14:16:45 2012 -0600

MAPREDUCE-4226. ConcurrentModificationException in FileSystemCounterGroup.
Reason: Bug
Author: Tom White
Ref: CDH-5658

commit 29960d8d3484ede1f559d5bf29f751ddb48066e9
Author: Ahmed Radwan
Date: Fri May 4 01:00:45 2012 -0700

MAPREDUCE-4129.
Lots of unneeded counters log messages (Ahmed Radwan via bobby)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1328106 13f79535-47bb-0310-9956-ffa450edef68

commit c2969401900ba5d968227e4d170fc9026479a01f
Author: Ahmed Radwan
Date: Thu May 3 12:19:03 2012 -0700

MAPREDUCE-3827. Changed Counters to use ConcurrentSkipListMap for performance. Contributed by Vinod K V.
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1241711 13f79535-47bb-0310-9956-ffa450edef68

commit f91e996bebaca5fcb7117fb47db61c98679ef614
Author: Tom White
Date: Thu May 3 11:30:02 2012 -0700

MAPREDUCE-3809. Tasks may take upto 3 seconds to exit after completion.
Reason: Performance
Author: Siddharth Seth
Ref: CDH-5628

commit 02383d9e01b42bf2d1ba1e3785d4c13475b85242
Author: Roman Shaposhnik
Date: Fri Apr 27 13:04:56 2012 -0700

CLOUDERA-BUILD. mr1 pipes binaries should be executable (CDH-5154)

commit 9e0eba7e156544ecdfc596f305f42dd977f117c2
Author: Tom White
Date: Tue Apr 24 15:09:59 2012 -0700

MAPREDUCE-2450. Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout.
Reason: Performance
Author: Rajesh Balamohan
Ref: CDH-5206

commit 49fec50c189e7b51db9b6085207deb2db150cdc8
Author: Todd Lipcon
Date: Sat Apr 21 17:05:41 2012 -0700

CDH-5434. Fix ability to submit MR1 job to secure HA HDFS
Reason: Bug
Author: Todd Lipcon

commit 245afbd7218619c271ec149655312b886199602b
Author: hudson
Date: Sun Apr 15 14:38:22 2012 -0700

Updating for 4.0.0 development.

commit ab09918e0e5b2460c8ed6295e370bb9fc834e81d
Author: Aaron T. Myers
Date: Thu Apr 12 16:13:40 2012 -0700

HADOOP-8261. Har file system doesn't deal with FS URIs with a host but no port.
Author: Aaron T. Myers
Reason: Bug
Ref: CDH-5166

commit 8952a31dffd44df3c4ee5382a0e4dec18ab21ae9
Author: Eli Collins
Date: Thu Apr 12 12:39:41 2012 -0700

HADOOP-8209. Add an option to relax the build version check.
Changes the behavior of tasktrackers to only check for a version match (eg "0.20.2-cdh4b2") but ignore the other build fields (revision, user, and source checksum) when checking for compatibility with jobtrackers. In previous releases tasktrackers refused to connect to jobtrackers if their build version (version, revision, user, and source checksum) did not match. This behavior can be restored by disabling hadoop.relaxed.worker.version.check in mapred-site.xml.
Author: Eli Collins
Reason: Enable rolling upgrades of tasktrackers within an update
Ref: CDH-5027

commit 0382938cd326445f66e87e8eb1796b5838b36291
Author: Aaron T. Myers
Date: Wed Mar 14 11:16:06 2012 -0700

The LTC should set supplementary groups in addition to euid and egid when switching users.
(cherry picked from commit 78ca997f549a89d60b39ae466f02a2797fa8003a)
(cherry picked from commit ca684ae7787052ca604204639e8338ad351f3587)

commit be18fe23c5602ed65c5780c97f9df46ae00c6d07
Author: Eli Collins
Date: Wed Apr 4 00:16:46 2012 -0700

MAPREDUCE-4095. TestJobInProgress#testLocality uses a bogus topology. Contributed by Colin Patrick McCabe
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1@1308520 13f79535-47bb-0310-9956-ffa450edef68

commit 788cfd94d9d40bba0fe9faaf8289c73bbfb62fe6
Author: Patrick Hunt
Date: Tue Apr 3 15:43:44 2012 +0000

MAPREDUCE-4012 Hadoop Job setup error leaves no useful info to users. (tgraves)
Author: Thomas Graves
Reason: Bug
Ref: CDH-4854

commit 56e7ed0abcca4f9fb5d3498a956559427d39c071
Author: Patrick Hunt
Date: Tue May 10 20:18:44 2011 +0000

MAPREDUCE-2456. Log the reduce taskID and associated TaskTrackers with failed fetch notifications in the JobTracker log.
Author: Jeffrey Naisbitt
Reason: Improvement
Ref: CDH-5019

commit f9e056f8b6c6384a5680e9c99e705585df854615
Author: Roman Shaposhnik
Date: Fri Mar 30 09:48:08 2012 -0700

CLOUDERA-BUILD.
add pipes (and other c++ libs) to the cdh4 build (CDH-5075)

commit e0cc1bf8b959452b3fc8ec5bbe0345cb8e016f63
Author: Roman Shaposhnik
Date: Tue Mar 27 12:03:39 2012 -0700

CLOUDERA-BUILD. Seed the classpath of MR1 with MR2 value (when available)

commit 48019d0377df69bd8468ba7d6cce94efdf753681
Author: Tom White
Date: Mon Mar 19 14:04:13 2012 -0700

CDH-4334. Backport MAPREDUCE-3583 (ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException)

commit c2a181bfcb6c2dde8c4a1d4ad60d39b69f814e61
Author: Todd Lipcon
Date: Mon Mar 26 12:39:43 2012 -0700

Amend MAPREDUCE-3289. Fix TaskTracker fadvise support to only readahead a chunk at a time
Reason: previous implementation could cause reduce fetch timeouts when some map output partitions are very large
Author: Todd Lipcon
Ref: CDH-5020

commit 9638ec1805dfe9c5521c609ffcb2262d88f5e220
Author: Tom White
Date: Tue Mar 13 13:57:13 2012 -0700

MAPREDUCE-1221. Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
Reason: Customer request
Author: Scott Chen, Tom White
Ref: CDH-3828

commit 90de974a275dc15d09828e626e49e2a33f23a473
Author: Ahmed Radwan
Date: Wed Mar 21 17:30:36 2012 -0700

MAPREDUCE-1740. NPE in getMatchingLevelForNodes when node locations are variable depth.
Author: Ahmed Radwan
Reason: Bug
Ref: CDH-4278

commit e9ae96fe627860aaa9fafa041e085df02ef833bf
Author: Andrew Bayer
Date: Tue Mar 20 16:13:41 2012 -0700

CLOUDERA-BUILD. Publish MR1 hadoop-tools jar.

commit 3ab97bdbaa08bd12656b545f8d48d7c80234b24d
Author: Alejandro Abdelnur
Date: Mon Mar 19 20:23:27 2012 -0700

MAPREDUCE-4036 Streaming TestUlimit fails on CentOS 6 (MR1) (tucu)
Reason: CentOS 6 JVM has higher minimum memory requirements
Author: Alejandro Abdelnur
Ref: CDH-4893

commit a41ead47e39176112e23996998476dca56bc8d6b
Author: Ahmed Radwan
Date: Mon Mar 19 17:49:28 2012 -0700

CLOUDERA BUILD.
hadoop-core-0.23.0-mr1 JAR has bogus o.a.h.mapreduce.Cluster
Reason: Bug
Author: Ahmed Radwan
Ref: CDH-4777

commit 5323038a0ad9d727c1c09e9883465d14af768088
Author: Alejandro Abdelnur
Date: Mon Feb 6 10:40:32 2012 -0800

MAPREDUCE-3727 jobtoken location property in jobconf refers to wrong jobtoken file. This backport is backporting also a portion of HADOOP-7001 (Configuration#unset method)
Reason: Oozie Hive actions are impacted by this bug
Author: Alejandro Abdelnur
Ref: CDH-4232

commit 960bd85e0d9ba2fb67e121755c9cec7ed989278a
Author: Alejandro Abdelnur
Date: Fri Mar 16 08:27:21 2012 -0700

MAPREDUCE-4010. TestWritableJobConf fails on trunk (tucu via bobby)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1301551 13f79535-47bb-0310-9956-ffa450edef68
Reason: fixing testcases failures after HADOOP-8167
Author: Alejandro Abdelnur
Ref: CDH-4872

commit 398a39ed3ce1ade01cfee361da3c7b892ff2adb8
Author: Alejandro Abdelnur
Date: Thu Mar 15 17:18:18 2012 -0700

CLOUDERA BUILD. mr1 test failures "unknown protocol" on golden after 23.3 refresh
Reason: GetUserMappingsProtocol from 0.23.3 is PB based
Author: Alejandro Abdelnur
Ref: CDH-4833

commit e8be44d49903e8e5d93901d512463e3095788142
Author: Roman Shaposhnik
Date: Wed Mar 14 17:21:44 2012 -0700

CLOUDERA-BUILD. Making hadoop-client assembly for MR1

commit f6b40f3697e7867e927fdbf7472787f2003c4612
Author: Andrew Bayer
Date: Tue Mar 13 08:41:13 2012 -0700

CDH-4642. Ensure MR1 uses the same Avro version as other components.

commit c0075b2a0de23e41a3ae600f5fdd5e9c181c4c15
Author: Tom White
Date: Tue Dec 27 14:46:52 2011 -0800

MAPREDUCE-3607. Port missing new API mapreduce lib classes to 1.x.
Reason: Compatibility
Author: Tom White
Ref: CDH-4006

commit 8344475eab5321d26ed51056551e9429e3c7a473
Author: Harsh J
Date: Wed Mar 14 03:04:43 2012 +0530

MAPREDUCE-4001. Improve MAPREDUCE-3789's fix logic by looking at job's slot demands instead.
Reason: Customer request
Author: Harsh J
Ref: CDH-4276

commit 2bd2aaf3b65bdd46994d076c432914cf2fba59ca
Author: Alejandro Abdelnur
Date: Mon Mar 12 12:54:18 2012 -0700

MAPREDUCE-3974 TestSubmitJob in MR1 tests doesn't compile after HDFS-1623 merge (atm)
Reason: backport required for HDFS HA
Author: Alejandro Abdelnur
Ref: CDH-4821

commit 63d82bf64c64693a46a9ab9c89967b4887ee438b
Author: Roman Shaposhnik
Date: Mon Mar 12 11:12:03 2012 -0700

CLOUDERA-BUILD. Updating versions to 0.23.1.

commit 0bf1a23137e5c55e6e7e7b5aa56e831aed252c5a
Author: Harsh J
Date: Sun Mar 11 22:44:46 2012 +0530

MAPREDUCE-1109. ConcurrentModificationException in jobtracker.jsp
Reason: Customer request
Author: Harsh J
Ref: CDH-4527

commit 2e2b07744a16f01dffb3df64c90537fd1d02fa61
Author: Tom White
Date: Mon Mar 5 17:06:13 2012 -0800

MAPREDUCE-157. Add configuration to control maximum age of job history files (mapreduce.jobhistory.max-age-ms).
Reason: Improvement
Author: Jothi Padmanabhan and Tom White
Ref: CDH-3641

commit 179f7e2a9f73b02eece97d1fc8be5271e54ab405
Author: Tom White
Date: Tue Mar 6 14:25:11 2012 -0800

MAPREDUCE-3997. jobhistory.jsp cuts off the job name at the first underscore of the job name
Reason: Bug (customer request)
Author: Tom White
Ref: CDH-4408

commit b1266bc5b349051737e3ff4b3fd7da747172232b
Author: Owen O'Malley
Date: Fri Mar 4 04:50:13 2011 +0000

commit 98d8043ef96dd9a67e23852b5e5caf8f0a4589db
Author: Krishna Ramachandran
Date: Thu Oct 21 12:21:50 2010 -0700

. Delete PrintWriter using iterator to fix java.util.ConcurrentModificationException (dking)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-patches@1077739 13f79535-47bb-0310-9956-ffa450edef68

commit f855c4c0ea0e2bf5077a6ce805356120b3208b81
Author: Tom White
Date: Fri Feb 24 13:02:00 2012 -0800

MAPREDUCE-3697.
Hadoop Counters API limits Oozie's working across different hadoop versions
Reason: Improve API compatibility
Author: Mahadev Konar
Ref: CDH-4592

commit daa4b265ad866e011e3ed5bff135ec50842b8e63
Author: Andrew Bayer
Date: Thu Feb 23 16:26:38 2012 -0800

CDH-4480 - removing redundant Ivy entry for hadoop-commons.

commit 9f31388ae87b9cd04937a07b5c7ce0bf46d7cfb8
Author: Alejandro Abdelnur
Date: Wed Feb 15 13:16:16 2012 -0800

MAPREDUCE-3789 CapacityTaskScheduler may perform unnecessary reservations in heterogenous tracker environments
Reason: Customer impacted by this bug
Author: Harsh Chouraria
Ref: CDH-4276

commit f67e78d825de0125a3a4b03b1e635d549bce23fc
Author: Alejandro Abdelnur
Date: Tue Feb 14 16:32:16 2012 -0800

CLOUDERA BUILD. hadoop-streaming has wrong version
Author: Alejandro Abdelnur
Ref: CDH-4503

commit ba8b335e19c0f3b2117d0b8ef8ce120b95ecb82c
Author: Alejandro Abdelnur
Date: Tue Feb 14 15:45:41 2012 -0800

CLOUDERA BUILD. Create hadoop-client and hadoop-minicluster artifacts for downstream projects
Reason: backport for easier build/test for downstream projects
Author: Alejandro Abdelnur
Ref: CDH-4502

commit 4ab03c692c0dbe469811b37a38c44f0242ad2d33
Author: Andrew Bayer
Date: Mon Feb 6 20:05:28 2012 -0800

CLOUDERA-BUILD. Fixing handling of test artifact dependencies.

commit f0bb853f85ec31a3e18af74470d519373e2ff14f
Author: Andrew Bayer
Date: Mon Feb 6 12:05:36 2012 -0800

Prepping for CDH4b2 development.

commit 6a8bab52212c72805a277985b1d4c98bd64bc69b
Author: Roman Shaposhnik
Date: Mon Feb 6 10:37:41 2012 -0800

CLOUDERA-BUILD. Should not be able to start HDFS from the MR1 tarball (CDH-4376)

commit 9d577fb0aee4fa2bc31208f2876cff12e8c1045b
Author: Tom White
Date: Thu Feb 2 13:48:35 2012 -0800

MAPREDUCE-3639. TokenCache likely broken for FileSystems which don't issue delegation tokens
Reason: Bug
Author: Tom White
Ref: CDH-4361

commit 4634ac9b23243ead8feda26cd32b10561081381d
Author: Tom White
Date: Tue Jan 31 15:27:14 2012 -0800

MAPREDUCE-3749.
ConcurrentModificationException in counter groups
Reason: Bug
Author: Tom White
Ref: CDH-4301

commit 3901e4f62ff610af80fd3aba71555b21fb7cadb9
Author: Tom White
Date: Fri Jan 27 10:01:40 2012 -0800

CLOUDERA-BUILD. Add missing constructors that take a TaskType in TaskAttemptID and TaskID.

commit 7462d7d84f2cc3f36c371227bead854bd5fdb7d5
Author: Tom White
Date: Wed Jan 25 11:53:26 2012 -0800

MAPREDUCE-3138. Allow for applications to deal with MAPREDUCE-954
Author: Owen O'Malley
Reason: Support 0.23 API in MR1
Ref: CDH-4264

commit ba023e7549e273b5f58065d371844f891d8982dd
Author: Tom White
Date: Wed Jan 25 11:00:42 2012 -0800

MAPREDUCE-3563. LocalJobRunner doesn't handle Jobs using o.a.h.mapreduce.OutputCommitter
Author: Arun C Murthy
Reason: Support 0.23 API in MR1
Ref: CDH-4263

commit dcbd59874fab8cf68bf058e1a4a063b644ae8c4c
Author: Tom White
Date: Tue Jan 24 12:05:51 2012 -0800

CLOUDERA-BUILD. Exclude Gridmix unit tests.

commit c3487c6a9f79b3e184d567a9ef60ccf05e95be64
Author: Tom White
Date: Mon Jan 23 11:28:52 2012 -0800

CLOUDERA-BUILD. Fix TestAuditLogger, TestSubmitJob and TestWebUIAuthorization.

commit 6aa615f878b621ae7ee42cb86adc455c2518b4a3
Author: Tom White
Date: Fri Jan 13 14:10:30 2012 -0800

CLOUDERA-BUILD. Fix TestMultipleCachefiles, TestSymLink, and TestStreamingStatus.
Author: Tom White
Ref: CDH-4128

commit 3fa92bbb45fee92517fc63fbbbed29861f38408c
Author: Alejandro Abdelnur
Date: Wed Jan 11 11:42:31 2012 -0800

CLOUDERA BUILD. MR1 contrib testcase are not compiling
Author: Alejandro Abdelnur
Ref: CDH-4082

commit 92372f57aaeb1fb4994d754d17356d675f5e91c7
Author: Alejandro Abdelnur
Date: Mon Jan 9 14:35:00 2012 -0800

CLOUDERA BUILD. fixing MR1 TestSubmitJob failure.
Author: Alejandro Abdelnur
Ref: CDH-4066

commit af7d5cd7eb3efb93316924b81d4251f4392267bb
Author: Alejandro Abdelnur
Date: Mon Jan 9 13:46:32 2012 -0800

CLOUDERA BUILD. Fixing MR1 testcases failures.
Reason: Due to changes in hadoop-common 0.23.
Author: Alejandro Abdelnur
Ref: CDH-3997

commit 3a0c8312fd650bdf09fc8186527c663357120193
Author: Tom White
Date: Fri Dec 30 13:28:40 2011 -0800

CLOUDERA-BUILD. Fix TestMapReduceLocal, TestJobTrackerXmlJsp, TestMRMultipleOutputs, and remove failing test TestFileSystem.testCommandFormat since the code has changed and the test is no longer appropriate.

commit 8d46f8b003414c4cb0d16ed8662f909501dcf8e7
Author: Roman Shaposhnik
Date: Thu Dec 29 14:44:19 2011 -0800

CLOUDERA-BUILD. changing MR1 version to 0.23.0-mr1-cdh4b1-SNAPSHOT

commit c168a0e4512762c79987ca99a35fd69b58235976
Author: Andrew Bayer
Date: Wed Dec 28 13:24:42 2011 -0800

Updating for cdh4b1.

commit d831588e9a02ed3693e943f0f428dcec31e3cd9a
Author: Tom White
Date: Wed Dec 28 10:50:26 2011 -0800

CLOUDERA-BUILD. Support TaskID.getTaskType().

commit b42570d00009d7cbffb05b4c6910d422aca2290e
Author: Andrew Bayer
Date: Thu Dec 22 11:49:46 2011 -0800

Excluding commons-daemon from dependencies due to bad POM.

commit 481bdd70c20df414e081508c184212fc760d03d8
Author: Andrew Bayer
Date: Thu Dec 22 11:27:05 2011 -0800

Handle commons-daemon POM inconsistency.

commit c898178749f6c748eb4d7f676f162b0e0de71ffa
Author: Andrew Bayer
Date: Thu Dec 22 11:13:58 2011 -0800

CLOUDERA-BUILD. Further tweaking Ivy resolver logic for MR1.

commit 373964c0ccdc6a119bb29306c22c14446a8990d9
Author: Andrew Bayer
Date: Thu Dec 22 10:58:29 2011 -0800

CLOUDERA-BUILD. Adding cloudera-snapshots to ivy resolvers.

commit 296be444d7bbf134c9d3dc64e432bcea9341e63e
Author: Roman Shaposhnik
Date: Wed Dec 21 21:35:39 2011 -0800

CLOUDERA-BUILD. Hooking up to the correct top-level pom and starting to use version properties for CDH4

commit cd71b603fc35ffc3543cbaf8430ac50fe220917f
Author: Roman Shaposhnik
Date: Wed Dec 21 21:06:28 2011 -0800

CLOUDERA-BUILD. Disabling a Counter test

commit eb28c2e4f16c7ed446f479a5bf87ef798e6c79fa
Author: Andrew Bayer
Date: Tue Dec 20 15:17:52 2011 -0800

CLOUDERA-BUILD. Switching functional POM versioning.

commit 3170c19205e8fa270e9794ea4e6b860aa50c80cb
Author: Roman Shaposhnik
Date: Fri Dec 16 15:07:15 2011 -0800

CLOUDERA-BUILD. Disabling missing artifacts in yet another profile

commit f04ce3e6db2bbe727e012fe700de8d3c6551c6cb
Author: Roman Shaposhnik
Date: Thu Dec 15 19:44:44 2011 -0800

CLOUDERA-BUILD. Making MR1 deployable to Maven repo

commit 84b257f002bebd66e7cb76ae828241ac67415171
Author: Roman Shaposhnik
Date: Wed Dec 14 14:43:23 2011 -0800

CLOUDERA-BUILD. Packaging work

commit ca9330e3eadb11cb534ee6a25484e966fa3b1718
Author: Tom White
Date: Tue Dec 20 13:49:53 2011 -0800

MAPREDUCE-2531. org.apache.hadoop.mapred.jobcontrol.getAssignedJobID throw class cast exception.
Author: Robert Joseph Evans
Reason: Support 0.23 API in MR1
Ref: CDH-3970

commit 42282403d05a26ed6bd5af8f327ca2e4409cc71a
Author: Tom White
Date: Tue Dec 20 12:32:22 2011 -0800

MAPREDUCE-368. Change org.apache.hadoop.mapred.jobcontrol to use new api
Author: Amareshwari Sriramadasu
Reason: Support 0.23 API in MR1
Ref: CDH-3970

commit 3710654cb51fc4009e6ba18d2d5f1931151eee74
Author: Tom White
Date: Mon Dec 19 10:07:46 2011 -0800

CLOUDERA-BUILD. Disable MRUnit build.

commit ce6b01705281f5072100b6812cce857befc45de8
Author: Tom White
Date: Fri Dec 16 11:25:15 2011 -0800

MAPREDUCE-3542. Support FileSystemCounter legacy counter group name for compatibility.
Author: Tom White
Reason: Support 0.23 API in MR1
Ref: CDH-3861

commit bc76ecd932df8c5c97689bb2585e270d3abcb61d
Author: Tom White
Date: Fri Dec 16 11:20:04 2011 -0800

MAPREDUCE-3433. Finding counters by legacy group name returns empty counters.
Author: Tom White
Reason: Support 0.23 API in MR1
Ref: CDH-3861

commit b5eaf8099d1ea3d3900fd9ae0c7ced0cd5e46b92
Author: Tom White
Date: Thu Dec 15 16:05:36 2011 -0800

CLOUDERA_BUILD. Revert "MAPREDUCE-1943. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes"
This reverts commit e3f8dc3926d119ce3b765325114e5b8ada01120f.

commit b2eddc8bb41f34042da9c6ebeb08a4d3bc7bb655
Author: Tom White
Date: Thu Dec 15 15:58:46 2011 -0800

MAPREDUCE-901. Move Framework Counters into a TaskMetric structure.
Author: Luke Lu
Reason: Support 0.23 API in MR1
Ref: CDH-3861

commit 64999d5aebf13ee81e917b4ccacc9047964dfaab
Author: Tom White
Date: Mon Dec 12 16:15:12 2011 -0800

MAPREDUCE-954. The new interface's Context objects should be interfaces.
Author: Arun C Murthy
Reason: Support 0.23 API in MR1
Ref: CDH-3861

commit 6ae7a107a7213b31b60ddde25cc5be071246da87
Author: Tom White
Date: Mon Dec 12 13:35:52 2011 -0800

MAPREDUCE-2455. Remove deprecated JobTracker.State in favour of JobTrackerStatus.
Author: Tom White
Reason: Support 0.23 API in MR1
Ref: CDH-3861

commit 9e291fd155461c6c303f1c2b47ff49a838d7efe5
Author: Tom White
Date: Fri Dec 9 14:59:00 2011 -0800

CLOUDERA-BUILD. Re-instate mvn-install to install files locally for testing.

commit c7b6a473d96cef7cb1faa577b937f9600f8a46fb
Author: Tom White
Date: Mon Dec 5 09:21:37 2011 -0800

CLOUDERA-BUILD. Remove usage of ChecksumDistributedFileSystem in tests.

commit df16ab6f2ca09cdfe2b5d28d3039a8d3201acb0e
Author: Tom White
Date: Fri Dec 2 17:46:52 2011 -0800

CLOUDERA-BUILD. Remove smoke tests.

commit 31674a77337b553348d3280ddc9ac1ae20276d85
Author: Tom White
Date: Fri Dec 2 17:19:15 2011 -0800

MAPREDUCE-895. FileSystem::ListStatus will now throw FileNotFoundException, MapRed needs updated.
Author: Jakob Homan.
Reason: Support 0.23 API in MR1
Ref: CDH-3861

commit de1a909897e0aa745b2074d0449bd6ca6747e0f6
Author: Tom White
Date: Fri Dec 2 16:48:03 2011 -0800

Fix to allow tar build.

commit 64bd9978a192c3154d16837ab1aed1cf338daeef
Author: Todd Lipcon
Date: Fri Dec 2 15:27:56 2011 -0800

Code changes necessary for compatibility with 0.23 common/HDFS APIs

commit 1ef41133675413f1839e4c5a4106e45c453b3919
Author: Todd Lipcon
Date: Fri Dec 2 15:25:44 2011 -0800

Add files into mapred tree that are part of Common/HDFS 0.20 but no longer in 0.23.

commit caba5845452274f934c0c586a1bba56ee7c411da
Author: Todd Lipcon
Date: Fri Dec 2 15:24:30 2011 -0800

Delete HDFS and Common, HDFSProxy contrib, and non-MR test cases

commit fcb7d3f01f3575acbd53e1583a2f8ccb1d7021e9
Author: Todd Lipcon
Date: Fri Dec 2 15:23:42 2011 -0800

Add ivy templates. Are these actually used? Maybe not in CDH build. TODO

commit bfcf4a55b3d47e7529a6cbdff77f036c3c5bb648
Author: Todd Lipcon
Date: Fri Dec 2 15:10:50 2011 -0800

Build changes

commit c9e37a4c57a325228ecb3d333dba302ac2098e2f
Author: Andrew Bayer
Date: Tue Dec 20 10:15:26 2011 -0800

Updating for CDH3u3 release.

commit a04c9e2a6bd20dcb50e7242b1c0fb35e5614d1cc
Author: Eli Collins
Date: Sun Dec 18 14:05:30 2011 -0800

HDFS-2702. A single failed name dir can cause the NN to exit. There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed.
Reason: Bug
Author: Eli Collins
Ref: CDH-3921

commit fcab1c7f36866fdc09cb9939ff8786d690502b81
Author: Eli Collins
Date: Sun Dec 18 14:00:58 2011 -0800

HDFS-2703. removedStorageDirs is not updated everywhere we remove a storage dir. There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) where we remove a storage directory but don't add it to the removedStorageDirs list. This means a storage dir may have been removed but we don't see it in the log or Web UI.
Reason: Bug
Author: Eli Collins
Ref: CDH-3921

commit 09a73c40213864289256e8b7e749eef3f7caa778
Author: Eli Collins
Date: Sun Dec 18 13:52:48 2011 -0800

HDFS-2701. Cleanup FS* processIOError methods. Let's rename the various "processIOError" methods to be more descriptive. The current code makes it difficult to identify and reason about bug fixes. While we're at it let's remove "Fatal" from the "Unable to sync the edit log" log since it's not actually a fatal error (this is confusing to users). And 2NN "Checkpoint done" should be info, not a warning (also confusing to users).
Reason: Improvement
Author: Eli Collins
Ref: CDH-3921

commit d0b1764daa4a3ca527d6ea553c009e46def2aeee
Author: Eli Collins
Date: Sun Dec 18 13:28:46 2011 -0800

Ammend HADOOP-4885. HADOOP-4885 erroneously removed the call to processIOError in rollEditLog. Without it the NN process will not exit if there are no valid edit streams. GetImageServlet will NPE but the NN keeps running, accepting modifications it will not be able to persist.
Reason: Bug
Author: Eli Collins
Ref: CDH-3921

commit 9817a9bd9bf215c4f66268e8d5e9f87cd8a417b4
Author: Eli Collins
Date: Sun Dec 18 13:28:21 2011 -0800

Ammend HADOOP-4885. Cleanup.
Author: Eli Collins
Ref: CDH-3921

commit 60a51e89b21e9577519b447d3779acc8b83ce990
Author: Harsh J
Date: Wed Nov 23 14:09:55 2011 +0530

CLOUDERA-BUILD. Log jsvc output to HADOOP_LOG_DIR instead of /tmp.
Description: JSVC output currently goes to /tmp and is not configurable.
Reason: Customer request
Author: Harsh J
Ref: CDH-3832

commit 3d21fccdb5d67426a05655377e5dc0b926479673
Author: Eli Collins
Date: Tue Dec 13 18:46:10 2011 -0800

CLOUDERA-BUILD. Update url for searching generated docs.

commit 164e84c499699ada743fdc5f39dce212f3e742fc
Author: Philip Zeyliger
Date: Sun Dec 4 21:31:52 2011 -0800

CLOUDERA-BUILD. Add deprecated alias to removed TransferFsImage method. The fix for HDFS-2305/CDH-2761 changed the method signature of a static method in TransferFsImage. This adds back the old method signature to allow for compatibility with plug-ins that wish to use it with CDH3.
Reason: Compatibility
Author: Philip Zeyliger
Ref: OPSAPS-5105, CDH-2761

commit 7a636fd4deeb1fbe8f73dd1cab15c180e996ef8a
Author: Eli Collins
Date: Fri Dec 9 16:55:46 2011 -0800

HDFS-2654. Make BlockReaderLocal not extend RemoteBlockReader2.
Reason: Performance
Author: Eli Collins
Ref: CDH-3850

commit 2b54843a4f2fb1d85c4a2cc4a4ad981f961e6a77
Author: Eli Collins
Date: Fri Dec 9 20:25:09 2011 -0800

HDFS-2653. DFSClient should cache whether addrs are non-local when short-circuiting is enabled.
Reason: Performance Author: Eli Collins Ref: CDH-3850 commit c955c99664e9902542ce4a0787ffac909a22a69c Author: Eli Collins Date: Thu Dec 8 17:07:29 2011 -0800 HDFS-2246. Shortcut a local client reads to a Datanodes files directly. Reason: Performance Author: Jitendra Nath Pandey, Eli Collins Ref: CDH-3850 commit ae79e854209b90fcf5e574a8c927cde3bdd3eb9c Author: Jonathan Hsieh Date: Tue Dec 6 08:08:25 2011 -0800 HADOOP-6886 LocalFileSystem Needs createNonRecursive API Reason: Bug (HBase data loss) Author: Jitendra Nath Pandey and Nicholas Spiegelberg Ref: CDH-3816 commit fa023cef12584d6f38f17b05ea95445eb187cb9e Author: Jonathan Hsieh Date: Fri Dec 2 17:15:06 2011 -0800 HADOOP-7879 DistributedFileSystem#createNonRecursive should also incrementWriteOps statistics. Reason: Bug (hbase data loss) Author: Jonathan Hsieh Ref: CDH-3798 commit dc1f32f60531bc146d6c42f473d5ba6a0b59fff3 Author: Jonathan Hsieh Date: Wed Nov 30 08:54:06 2011 -0800 HADOOP-7870 Fix recursive create. Reason: Bug (hbase data loss) Author: Jonathan Hsieh Ref: CDH-3798 commit 9934440df63c790387a54b212aacad4ee12a9dc9 Author: Jonathan Hsieh Date: Mon Nov 28 10:19:56 2011 -0800 HADOOP-6840 Support non-recursive create() in FileSystem and SequenceFile.Writer Reason: Bug (hbase data loss) Author: Nicolas Spiegelberg and Jitendra Nath Pandey Ref: CDH-3815 commit 2d52a4c5814814bf0a95c71906d2b4efcc1aa755 Author: Jonathan Hsieh Date: Mon Nov 28 10:50:31 2011 -0800 HDFS-617 Support for non-recursive create() in HDFS This backport adds CDH-specific backwards-compatibility handling of non-recursive file create() Reason: Bug (hbase data loss) Author: Kan Zhang Ref: CDH-3815 commit 356e443236ce15100943cfeabc9001b1d26bc77c Author: Alejandro Abdelnur Date: Fri Dec 9 13:46:10 2011 -0800 HADOOP-7902 skipping name rules setting (if already set) should be done on UGI initialization only Fixes regression introduced by HADOOP-7887 Author: Alejandro Abdelnur Ref: CDH-3898 commit b9bc59e6d6f024d69f7cbee65b50fc0d5f99ead4 Author: 
Alejandro Abdelnur Date: Wed Dec 7 15:38:31 2011 -0800 HADOOP-7887 KerberosAuthenticatorHandler is not setting KerberosName name rules from configuration Author: Alejandro Abdelnur Ref: CDH-3890 commit 25d2544f138b41918c7861880c4dc61988b808b0 Author: Eli Collins Date: Mon Dec 5 16:16:31 2011 -0800 HDFS-2638. Improve a block recovery log. It would be useful to know whether an attempt to recover a block is failing because the block was already recovered (has a new GS) or the block is missing. Reason: Debugging Author: Eli Collins Ref: CDH-3888 commit df2ba14900cd081fa87eec9aedf3f75eca9c2885 Author: Eli Collins Date: Tue Dec 6 10:54:51 2011 -0800 HDFS-2637. The rpc timeout for block recovery is too low. The RPC timeout for block recovery does not take into account that it issues multiple RPCs itself. This can cause recovery to fail if the network is congested or DNs are busy. Reason: Bug Author: Eli Collins Ref: CDH-3834 commit 8b29bcbea6d6886e84d6dfced7179439f219543e Author: Eli Collins Date: Sun Nov 27 21:57:11 2011 -0800 HDFS-854. Datanode should scan devices in parallel to generate block report. A Datanode should scan its disk devices in parallel so that the time to generate a block report is reduced. This will reduce the startup time of a cluster. Author: Dmytro Molkov, Eli Collins Ref: CDH-3853 commit caa7d7ddd79b0775edb780c6a9cbf71d23bc4a99 Author: Alejandro Abdelnur Date: Tue Nov 29 11:49:09 2011 -0800 HADOOP-7853 multiple javax security configurations cause conflicts. Reason: Hadoop-auth initialization and UGI initialization issues due to global config. Author: Daryn Sharp Ref: CDH-3865 commit ade43c9a6eee7eec2f237edcadb370f07c72a176 Author: Eli Collins Date: Mon Nov 28 10:47:17 2011 -0800 CLOUDERA-BUILD. Remove external guava dependency. Remove the guava-r09 external dependency by rebasing the r09 jar on o.a.h.thirdparty.guava and bundling the rebased jar in our lib dir. 
Author: Eli Collins Ref: CDH-3833 commit aaeecc5050d10ff5788cea98de3004e72a0c3a3c Author: Eli Collins Date: Sun Nov 27 18:20:29 2011 -0800 CLOUDERA-BUILD. Update eclipse classpath. commit 99b2072558bb79eea211c1965a6c896750c850ea Author: Eli Collins Date: Sun Nov 27 10:23:23 2011 -0800 Ammend MAPREDUCE-3015. Rename TaskTrackerStatus#getTaskFailures. Reason: JobTracker plugin compatibility Author: Eli Collins Ref: CDH-3307 commit 12c64dfa5a3548776a039c835c5e5c7aa844a2f5 Author: Eli Collins Date: Fri Nov 25 23:45:41 2011 -0800 MAPREDUCE-2413. TT should handle disk failures by reinitializing itself. MAPREDUCE-2928. MR-2413 improvements. MAPREDUCE-2957. The TT should not re-init if it has no good local dirs. MAPREDUCE-2850. Add test for MAPREDUCE-2413. MAPREDUCE-3395. Add mapred.disk.healthChecker.interval to mapred-default.xml. MAPREDUCE-2415. Distribute the user task logs on to multiple disks. MAPREDUCE-3424. MR-2415 cleanup. MAPREDUCE-3015. Add local dir failure info to metrics and the web UI. MAPREDUCE-3419. Don't mark exited TT threads as dead in MiniMRCluster. Author: Ravi Gummadi, Bharath Mundlapudi, Eli Collins Ref: CDH-3307 commit 84bbaaca1521811875b9926d98658259421be1f6 Author: Eli Collins Date: Sat Nov 19 18:59:15 2011 -0800 HDFS-2541. For a sufficiently large value of blocks, the DN Scanner may request a random number with a negative seed value. Author: Harsh J Ref: CDH-3803 commit 898431eb9641a2932a09ff1373f478b0f77075d5 Author: Todd Lipcon Date: Tue Nov 22 11:44:28 2011 -0800 MAPREDUCE-2905. Fix fair scheduler to prevent clumping of tasks when assignmultiple is enabled. Reason: spread load more evenly on clusters with many slots Author: Todd Lipcon and Jeff Bean Ref: CDH-3509 commit 1944559432d8af46f68feb041969bd26b2f950b8 Author: Todd Lipcon Date: Tue Nov 15 14:20:09 2011 -0800 MAPREDUCE-936. 
Allow a load difference in fairshare scheduler Improves throughput of task scheduling in the scheduler, by allowing some nodes to have more tasks scheduled than others while scheduling is happening. Reason: Backporting in advance of MAPREDUCE-2905, which depends on this patch. Author: Zheng Shao Ref: CDH-3509 commit 62916ff723c390df008abce1b0fd80cccdbc4105 Author: Eli Collins Date: Sun Nov 20 15:39:23 2011 -0800 HADOOP-7457. Remove out-of-date Chinese language documentation. Author: Jakob Homan Ref: CDH-3842 commit abe5bf1c3ef82f90c356695fc16eb615293a56df Author: Eli Collins Date: Sat Nov 19 23:15:06 2011 -0800 MAPREDUCE-2555. Avoid spurious logging from completed tasks. Author: Thomas Graves Ref: CDH-3855 commit a390e45180fbaaf4a3ea84d97ec0316c182ec8c9 Author: Eli Collins Date: Sun Nov 20 11:22:14 2011 -0800 HADOOP-6614. RunJar should provide more diags when it can't create a temp file. Author: Jonathan Hsieh Ref: CDH-3841 commit e732b3c97a9232182d3a28917ca3f9006f968ac9 Author: Eli Collins Date: Fri Nov 18 15:45:26 2011 -0800 MAPREDUCE-3343. TaskTracker Out of Memory because of distributed cache. This Out of Memory happens when you run large number of jobs (using the distributed cache) on a TaskTracker. Author: Zhao Yunjiong Ref: CDH-3798 commit 371e8d5b2a38f743adecbe146b7c7e77683568c8 Author: Todd Lipcon Date: Wed Nov 16 12:33:47 2011 -0800 HADOOP-7761. Improve performance of raw comparisons. Reason: low risk performance improvement Author: Todd Lipcon Ref: CDH-3822 commit 8a120e1edaf1a833708003af806d7929f84a08be Author: Todd Lipcon Date: Wed Nov 16 12:15:14 2011 -0800 HDFS-2379. Allow block reports to proceed without holding FSDataset lock Reason: fix timeouts talking to DNs on datanodes with lots of blocks Author: Todd Lipcon Ref: CDH-3823 commit 2818706e4a48344eefb3f840fc70d8f3f34a381e Author: Eli Collins Date: Thu Nov 17 08:44:19 2011 -0800 Ammend MAPREDUCE-2777. Remove TestTTMemoryReporting. 
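The HADOOP-7761 entry above ("Improve performance of raw comparisons") concerns comparing serialized keys lexicographically as unsigned bytes. The sketch below is illustrative only: the class `RawBytesCompare` and its methods are hypothetical, not Hadoop's `WritableComparator` or the backported patch. It assumes Java 8+ (for `Long.compareUnsigned`) and contrasts a byte-at-a-time baseline with one common speed-up, comparing eight bytes per step as big-endian `long` values.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration (not Hadoop's code): lexicographic comparison of
 * byte ranges where each byte is treated as unsigned (0..255).
 */
public final class RawBytesCompare {

    private RawBytesCompare() {}

    /** Baseline: compare one unsigned byte per step. */
    public static int compareSlow(byte[] a, int aOff, int aLen,
                                  byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;  // mask to read the byte as unsigned
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal common prefix: the shorter range sorts first.
        return aLen - bLen;
    }

    /** Word-at-a-time variant: compare 8 bytes per step as big-endian longs. */
    public static int compareFast(byte[] a, int aOff, int aLen,
                                  byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        // ByteBuffer.wrap defaults to big-endian order, so the first byte of
        // each 8-byte chunk becomes the most significant byte of the long.
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            long x = ba.getLong(aOff + i);
            long y = bb.getLong(bOff + i);
            if (x != y) {
                // An unsigned compare of big-endian words agrees with
                // byte-by-byte unsigned order.
                return Long.compareUnsigned(x, y);
            }
        }
        // Fewer than 8 bytes left: finish byte by byte.
        for (; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] k1 = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] k2 = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 0, 0};
        // 0x80 is 128 as an unsigned byte, so k2 sorts after k1; both print -1.
        System.out.println(Integer.signum(compareSlow(k1, 0, k1.length, k2, 0, k2.length)));
        System.out.println(Integer.signum(compareFast(k1, 0, k1.length, k2, 0, k2.length)));
    }
}
```

Only the sign of the result is meaningful: the two methods agree in sign but not in magnitude. A plain signed `long` comparison would get the `0x80` case backwards, which is why the unsigned compare matters.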
commit e691c9ae61baee4920ac8dd8535bb75503339bd8 Author: Eli Collins Date: Wed Nov 16 16:30:55 2011 -0800 MAPREDUCE-2777. Adds cumulative cpu usage and total heap usage to task counters. Backport MAPREDUCE-220 and MAPREDUCE-2469. MAPREDUCE-220. Collecting cpu and memory usage for MapReduce tasks. It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. MAPREDUCE-2469. Task counters should also report the total heap usage of the task. Currently, the task counters report VSS and RSS usage of the task. The task counters should also report the total heap usage of the task. The task might be configured with a max heap size of M but the task's total heap usage might only be H, where H < M. In such a case, knowing only M doesn't provide a complete picture of the task's memory usage. Author: Scott Chen, Amar Kamat Ref: CDH-1458
commit ea81df1ff5de3b12df08e22eeffca29295914eae Author: Roman Shaposhnik Date: Wed Nov 16 17:22:21 2011 -0800 BIGTOP-261. pseudo distributed config would benefit from dfs.safemode.extension set to 0 and dfs.safemode.min.datanodes set to 1. Reason: Improvement Author: Roman Shaposhnik Ref: DISTRO-330
commit b45a9f40b02b5d5859c389bfa7b17df94317614a Author: Todd Lipcon Date: Fri Nov 11 14:40:01 2011 -0800 MAPREDUCE-3289. Use fadvise in the TaskTracker's MapOutputServlet. The TaskTracker now uses the posix_fadvise syscall to page in map output before serving it to the reducers. After serving the output, it evicts it from the buffer cache since it will not be read again in the majority of cases. This new behavior can be disabled by setting mapred.tasktracker.shuffle.fadvise to false. This patch differs from the upstream version since the upstream version applies to the NodeManager in MR2. Reason: Low-risk performance improvement Author: Todd Lipcon Ref: CDH-3818
commit 139923b6c91849e59e4288f65c245f7a71cecc22 Author: Todd Lipcon Date: Mon Nov 14 15:43:23 2011 -0800 MAPREDUCE-3184. Add a thread to the TaskTracker which monitors for spinning Jetty selector threads, and shuts down the daemon when one is detected. Reason: detect common JVM/Jetty bug and cause the TT to suicide, minimizing impact on running jobs Author: Todd Lipcon Ref: CDH-2785
commit 45504c489dfa7255be40ecf2f2a7a8c60cece01b Author: Todd Lipcon Date: Wed Oct 26 17:57:22 2011 -0700 HDFS-2267. DataXceiver thread name incorrect while waiting on op during keepalive. Reason: trivial bug fix for thread names after HDFS-941 Author: Todd Lipcon Ref: CDH-3777
commit 7ed1f860044cef4a43b1eceadf2d0a5e8f11c174 Author: Todd Lipcon Date: Wed Oct 26 16:53:27 2011 -0700 HDFS-941. Reuse connections between client and DN. Also incorporates HDFS-2071. Use of isConnected() in DataXceiver is invalid. Reason: big performance improvement for random-read workloads Author: bc Wong and Todd Lipcon Ref: CDH-3777
commit e7c439b8842ae77d185c85d56ce294f8c6467507 Author: Todd Lipcon Date: Wed Apr 21 23:03:27 2010 -0700 HDFS-1001. DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK. Reason: This patch is necessary for backport of HDFS-941 (socket reuse). Author: bc Wong Ref: CDH-3777
commit 11a4341b9a7cddce9f86b1a47219c64a09cd0c8f Author: Todd Lipcon Date: Fri Nov 11 14:39:43 2011 -0800 HDFS-2465. Add HDFS support for fadvise readahead and drop-behind. The DataNode can now pass IO advice down to the operating system to improve performance. The new behavior defaults off and can be enabled with the following configs:
- dfs.datanode.readahead.bytes (number of bytes to read ahead)
- dfs.datanode.drop.cache.behind.writes (boolean)
- dfs.datanode.sync.behind.writes (boolean)
- dfs.datanode.drop.cache.behind.reads (boolean)
Reason: low-risk performance improvements Author: Todd Lipcon Ref: CDH-3818
commit d23d17a52da7e94deac6d720e7d62706f00ee6f8 Author: Todd Lipcon Date: Fri Nov 11 14:38:40 2011 -0800 HADOOP-7753. Support fadvise and sync_file_range in NativeIO. Add ReadaheadPool infrastructure for use in HDFS and MR. Reason: low-risk performance improvement Author: Todd Lipcon Ref: CDH-3818
commit a888190fdc5b6356dd5325073b3120c6dfc66288 Author: Todd Lipcon Date: Wed Oct 26 21:14:35 2011 -0700 MAPREDUCE-3278. Fix a busy loop in ReduceTask that would cause 100% cpu utilization during the fetch phase. Previously, if the number of fetch threads in the reducer exceeded the number of unique hosts on which map outputs were available, the reducer would spin in a tight loop waiting for fetches to complete. This adds a proper wait/notify to avoid wasting CPU. Author: Todd Lipcon Reason: low risk performance improvement Ref: CDH-3817
commit 6ef50ae8cebf1b974e3fd4da6c18e2b7ff12dc52 Author: Ahmed Radwan Date: Fri Nov 11 15:38:00 2011 -0800 HDFS-94. The "Heap Size" in HDFS web ui may not be accurate. Reason: Bug Author: Dmytro Molkov Ref: CDH-3681
commit f2d1ba670c2ede6b02f4ee86ca57fed7cedfc38f Author: Andrew Bayer Date: Wed Nov 9 09:47:23 2011 -0800 CLOUDERA-BUILD. Setting default for IVY_MIRROR_PROP.
commit fcfa442e85a7f3f107b2d4ea71b5f362b9fd3f99 Author: Andrew Bayer Date: Wed Nov 2 13:11:26 2011 -0700 CLOUDERA-BUILD. Adding IVY_MIRROR_PROPS to ant calls. This will allow overriding the URLs Ivy uses for Maven repositories, so that we can have internal builds take advantage of our internal Maven mirror.
commit 7b21fe4cdbd166feb49af7b0a5c266010b8cc1aa Author: Andrew Bayer Date: Thu Oct 20 09:23:21 2011 -0700 Updating for CDH3u3 development
commit 95a824e4005b2a94fe1c11f1ef9db4c672ba43cb Author: Roman Shaposhnik Date: Tue Oct 11 18:00:22 2011 -0700 CLOUDERA-BUILD. hadoop-0.20 package should not ship a cloudera folder
commit 5fc6261a4c399bcb75bcc7cadf6cdd74f9362bcd Author: Aaron T. Myers Date: Mon Oct 10 18:02:41 2011 -0700 HDFS-2422. The NN should tolerate the same number of low-resource volumes as failed volumes. Reason: Bug Author: Aaron T. Myers Ref: CDH-3684
commit 207f93e5bb26dab8f38ac0f4e0740f9ac1910791 Author: Andrew Bayer Date: Fri Sep 30 09:46:56 2011 -0700 Prep for CDH3u2 release
commit 4f2ec73b8231ee5c7f4b1423f5a5dd895386fd2c Author: Todd Lipcon Date: Wed Sep 28 03:51:39 2011 -0700 MAPREDUCE-2980. Use patched jetty to avoid fetch failures and other HTTP-related issues. This changes the jetty build to the following tag: https://github.com/toddlipcon/jetty-hadoop-fix/tree/6.1.26.cloudera.1 This tag was built by taking Jetty 6.1.26, then merging the NIO selector code from the Jetty "6.1.22z6" branch, provided by Greg Wilkins. In cluster testing, it resolves many HTTP-related issues. Reason: avoid high fetch failure rate, causing production issues Author: Todd Lipcon Ref: CDH-2785
commit 4c8f9e91fde59526b729c56276c5649e58fa10fb Author: Todd Lipcon Date: Tue Sep 27 20:41:15 2011 -0700 HDFS-2332. Add test for HADOOP-7629 (using an immutable FsPermission object as an RPC parameter fails). Author: Todd Lipcon Reason: unit test corresponding to other backport Ref: CDH-3568
commit 864b534240e5837629c1419893aa75346e66bf64 Author: Todd Lipcon Date: Tue Sep 27 20:40:18 2011 -0700 HADOOP-7629. Allow immutable FsPermission objects to be used as IPC parameters. Author: Todd Lipcon Reason: necessary for Mahout tests to pass Ref: CDH-3568
commit ba8f83f3ff5877e2f4039dc0c02cb26fc4f45852 Author: Harsh J Date: Fri Sep 23 17:19:58 2011 +0530 MAPREDUCE-2932. Missing instrumentation plugin class shouldn't crash the TT startup. Missing instrumentation plugin class shouldn't crash the TT startup per design, and should fall back to default instead. Reason: Improvement Author: Harsh J Ref: CDH-3533
commit 3b6931e2d9882c8b4aa83436fefa6d2bb36973c8 Author: Aaron T. Myers Date: Thu Sep 22 18:47:02 2011 -0700 HADOOP-7674. TestKerberosName fails in 20 branch. Reason: Bug Author: Jitendra Nath Pandey Ref: CDH-3632
commit a54b6aa06f2b7e22691455c043cf9535fb51d703 Author: Aaron T. Myers Date: Thu Sep 22 18:41:54 2011 -0700 Amend HADOOP-7119. Add in duplicate test TestKerberosName. Reason: Didn't commit this test originally since it would have always failed. Author: Alejandro Abdelnur Ref: CDH-3558
commit 15d9970fde48893b17320bf6596b39f4c512b0f2 Author: Aaron T. Myers Date: Thu Sep 22 18:27:40 2011 -0700 HADOOP-7645. HTTP auth tests requiring Kerberos infrastructure are not disabled on branch-0.20-security. Reason: Bug Author: Jitendra Nath Pandey Ref: CDH-3609
commit b85680d3efc1fbefa6e2237e5756562106b0c39b Author: Aaron T. Myers Date: Thu Sep 22 18:26:14 2011 -0700 Amend HADOOP-7119. Add in tests which require Kerberos infrastructure. Reason: Didn't commit these tests originally since they would have always failed. Author: Alejandro Abdelnur Ref: CDH-3558
commit e6697b11c4d14cfb0832eae2ac0c496684a69f22 Author: Aaron T. Myers Date: Wed Sep 21 16:02:24 2011 -0700 HADOOP-7621. alfredo config should be in a file not readable by users. Reason: Bug Author: Alejandro Abdelnur Ref: CDH-3608
commit f77da510bd7e4606477bf475731c8b303ffcae68 Author: Aaron T. Myers Date: Wed Sep 21 15:58:20 2011 -0700 HADOOP-7666. branch-0.20-security doesn't include o.a.h.security.TestAuthenticationFilter. Reason: Bug Author: Aaron T. Myers Ref: CDH-3626
commit d6c9b1b1a69e27a2d7ea66cc6bca427ebc0ed426 Author: Aaron T. Myers Date: Wed Sep 21 15:56:13 2011 -0700 HADOOP-7665. branch-0.20-security doesn't include SPNEGO settings in core-default.xml. Reason: Bug Author: Aaron T. Myers Ref: CDH-3625
commit 81caf3bf3a5099a7b15a6f55aae930d7beb96a0c Author: Todd Lipcon Date: Sun Sep 18 10:14:23 2011 -0700 HDFS-1779. Fix a bug regarding recovery of blocks being written while NN restarts. This patch adds a new RPC 'blocksBeingWrittenReport()' which the DN calls on startup and when it re-connects to a restarted NameNode. This reports all blocks currently under construction, so the NN can re-add them to the targets list for a block if necessary. Reason: avoid HBase data-loss scenario when NN crashes Author: Hairong Kuang Ref: CDH-3507
commit b5bf4322cc047c1f95b814b49bc872c1433dd235 Author: Eli Collins Date: Wed Sep 21 16:49:22 2011 -0700 HADOOP-7653. tarball doesn't include .eclipse.templates. The hadoop tarball doesn't include .eclipse.templates. This results in a failure to successfully run ant eclipse-files. Reason: Bug Author: Jonathan Natkins Ref: CDH-3266
commit 77e1a32e46942124c1dcaa8ba731da7e499bd547 Author: Aaron T. Myers Date: Mon Sep 19 00:10:39 2011 -0700 HADOOP-7119. add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles. Reason: New Feature Author: Alejandro Abdelnur Ref: CDH-3558
commit d6d4c8bbd31486ad7661331044d0d568f6b5eabb Author: Eli Collins Date: Fri Sep 16 16:37:32 2011 -0700 HDFS-2186. DN volume failures on startup are not counted. Volume failures detected on startup are not currently counted/reported as such. E.g., if you have configured 4 volumes, 2 tolerated failures, and you start a DN with two failed volumes, it will come up and report (to the NN) no failed volumes. The DN will still be able to tolerate 2 additional volume failures (i.e., it's OK with no valid volumes remaining). The intent of the volume failure toleration config value is that if more than this number of volumes of the total set of configured volumes have failed, the DN should shut down; therefore volume failures detected on startup should count against this quota. Reason: Bug Author: Eli Collins Ref: CDH-3371
commit 8f0f3f85374720b8daa89b69a053a85138804f94 Author: Ahmed Radwan Date: Fri Sep 16 01:55:10 2011 -0700 MAPREDUCE-2836. Provide option to fail jobs when submitted to non-existent pools. Reason: Improvement Author: Ahmed Radwan Ref: CDH-3464
commit ea52f4e327fea44c37ff6f1c99619b254bf1f25e Author: Aaron T. Myers Date: Tue Sep 13 13:48:57 2011 -0700 HDFS-2305. Running multiple 2NNs can result in corrupt file system. Reason: Bug Author: Aaron T. Myers Ref: CDH-2761
commit 4d71976d7346279ff6aa6063b8e9562a9d0df281 Author: Ahmed Radwan Date: Wed Aug 17 12:52:48 2011 -0700 MAPREDUCE-2992. TestLinuxTaskController is broken. Reason: Bug Author: Ahmed Radwan Ref: CDH-3477
commit f29635a416a79d2aad6fa7361438381d9c02b713 Author: Todd Lipcon Date: Mon Aug 15 18:38:54 2011 -0700 HDFS-1480. Fix some cases where rack policy could be violated. This patch fixes an issue in determining replication targets where decommissioning or corrupt replicas were considered the same as valid replicas when considering rack locality policy. This would cause all replicas of a block to end up on the same rack when many nodes were decommissioned. Reason: avoid replication policy violation Ref: CDH-3069 Author: Todd Lipcon
commit cac73dae4df7c536f36870083840a1a8f8c44303 Author: Eli Collins Date: Wed Sep 7 18:19:33 2011 -0700 MAPREDUCE-2760. mapreduce.jobtracker.split.metainfo.maxsize typoed in mapred-default.xml. The configuration mapreduce.jobtracker.split.metainfo.maxsize is incorrectly included in mapred-default.xml as mapreduce.job.split.metainfo.maxsize. It seems that jobtracker is correct, since this is a JT-wide property rather than a job property. Reason: Bug Author: Todd Lipcon Ref: CDH-3547
commit 9459e990a858f2452f04de02fce4cd011c1a8c6d Author: Alejandro Abdelnur Date: Thu Aug 25 08:46:06 2011 -0700 HADOOP-7507. jvm metrics all use the same namespace. Reason: Bug Author: Alejandro Abdelnur Ref: CDH-3297
commit 542c18a9d5d871d6363f93d99133e627688ef564 Author: Harsh J Date: Fri Aug 26 14:05:42 2011 +0530 HDFS-1959. Better error message for missing namenode directory. Better error message when NN starts with a missing name dir. Reason: Improvement Author: Eli Collins Ref: CDH-3502
commit 64eca816d6b6e35c27464065d300a05932165ac8 Author: Aaron T. Myers Date: Fri Aug 26 12:44:14 2011 -0700 HDFS-970. FSImage writing should always fsync before close. Reason: Bug Author: Todd Lipcon Ref: CDH-3474
commit dbf17dc0186ee1096f760751d2bb872798d91f26 Author: Alejandro Abdelnur Date: Thu Aug 25 17:20:08 2011 -0700 CLOUDERA BUILD. Add Snappy-Java config file to switch off in-JAR native library. Reason: Improvement Author: Alejandro Abdelnur Ref: CDH-3492
commit e87de0366ed98d24abb2f575509937ec25f38330 Author: Roman Shaposhnik Date: Mon Aug 22 14:50:45 2011 -0700 CLOUDERA-BUILD. 32-bit builds of jsvc embed 64-bit library paths
commit 76e6564a623218e81417b241a4f1fa71b1db5606 Author: Ahmed Radwan Date: Wed Aug 3 10:36:58 2011 -0700 MAPREDUCE-2651. Race condition in LTC for job log directory creation. Reason: Bug Author: Bharath Mundlapudi Ref: CDH-3385
commit 935bf0003568fa985ea241b5e68c1dc462395d13 Author: Roman Shaposhnik Date: Wed Aug 17 20:47:10 2011 -0700 CLOUDERA BUILD. Provide libsnappyjava.so
commit e3f8dc3926d119ce3b765325114e5b8ada01120f Author: Tom White Date: Wed Aug 10 17:12:27 2011 -0700 MAPREDUCE-1943. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes. Reason: Improvement Author: Mahadev Konar Ref: CDH-1794
commit aeebe0d1415a3dc7a70aa88771607ef1eaebb192 Author: Tom White Date: Wed Aug 10 16:59:00 2011 -0700 MAPREDUCE-1482. Better handling of task diagnostic information stored in the TaskInProgress. Reason: Improvement Author: Amar Kamat Ref: CDH-1794
commit 27e22060592cee0ce920b592a43e46f84d01857b Author: Eli Collins Date: Sat Aug 13 13:15:21 2011 -0700 HDFS-2259. DN web-UI doesn't work with paths that contain html. Reason: Bug Author: Eli Collins Ref: CDH-3304
commit af12a8df06d5a9a72f2f22b2c47e71808437ac81 Author: Eli Collins Date: Fri Aug 12 13:28:44 2011 -0700 HDFS-2235. Encode servlet paths. HADOOP-7531. Add servlet util methods for handling paths in requests. Reason: Bug Author: Eli Collins Ref: CDH-3304
commit d46071c27b19c7a691005882260967d62cda6dfd Author: Eli Collins Date: Wed Aug 3 13:48:23 2011 -0700 HDFS-1317. HDFSProxy needs additional changes to work after changes to streamFile servlet in HDFS-1109. Reason: Bug Author: Rohini Palaniswamy Ref: CDH-3304
commit bdd9f9900811d5032e0c29b013ef45b46f7ffea2 Author: Eli Collins Date: Wed Aug 3 13:43:49 2011 -0700 HDFS-1109. HFTP and URL Encoding. Reason: Bug Author: Dmytro Molkov Ref: CDH-3304
commit 349cd124819f31d29c0a6dad7f21ad595e7ab788 Author: Eli Collins Date: Wed Aug 3 13:32:41 2011 -0700 HDFS-1340. A null delegation token is appended to the url if security is disabled when browsing filesystem. Reason: Bug Author: Jitendra Pandey Ref: CDH-3304
commit 581dfd390459c3e3b9962724330b4b33967f4559 Author: Eli Collins Date: Sat Aug 13 19:00:43 2011 -0700 HDFS-2023. Backport of NPE for File.list and File.listFiles. Merged ports of HADOOP-7322, HDFS-1934, HADOOP-7342, and HDFS-2019. Reason: Bug Author: Bharath Mundlapudi Ref: CDH-3307
commit 95683ebe551b9e4ee2e0b0419696dc46e2710162 Author: Eli Collins Date: Sat Aug 13 19:17:59 2011 -0700 CLOUDERA-BUILD. Point *-default doc links to the right place.
commit 28582806dc186d5abcbdc0c442d72eff84aa2c34 Author: Ahmed Radwan Date: Tue Aug 9 19:24:53 2011 -0700 MAPREDUCE-2524. Backport trunk heuristics for failing maps when we get fetch failures retrieving map output during shuffle. Reason: Improvement Author: Thomas Graves Ref: CDH-3441
commit 33a9d3f31ae34a873890b2a8b16bfb49808dc537 Author: Aaron T. Myers Date: Mon Aug 8 15:05:11 2011 -0700 HDFS-2190. NN fails to start if it encounters an empty or malformed fstime file. Reason: Bug Author: Aaron T. Myers Ref: CDH-3331
commit 8d1c5e03eb7440901fadb1fc85509dc1e61bff86 Author: Eli Collins Date: Fri Jul 29 15:15:28 2011 -0700 HADOOP-7491. hadoop command should respect HADOOP_OPTS when given a class name. Reason: Improvement Author: Eli Collins Ref: CDH-3392
commit ad6ac50988232cc950bc69d0866e67f5565ec9fa Author: Andrew Bayer Date: Fri Jul 22 10:06:59 2011 -0700 CLOUDERA-BUILD. Updating for CDH3u2 SNAPSHOT.
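The HDFS-2023 entry above is about `java.io.File#list()` and `File#listFiles()` returning `null`, rather than throwing, when the path is not a directory or an I/O error occurs, so code that iterates the result directly can fail with a `NullPointerException`. A minimal sketch of the null-safe wrapper pattern follows. `SafeList` is a hypothetical name, and this is not the backported Hadoop code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/**
 * Hypothetical helper (not Hadoop's FileUtil): directory listings that fail
 * loudly with an IOException instead of returning null.
 */
public final class SafeList {

    private SafeList() {}

    /** Like File#listFiles(), but never returns null. */
    public static File[] listFiles(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            // listFiles() returns null if dir is not a directory or on I/O error.
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return files;
    }

    /** Like File#list(), but never returns null. */
    public static String[] list(File dir) throws IOException {
        String[] names = dir.list();
        if (names == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File tmp = Files.createTempDirectory("safelist").toFile();
        File blk = new File(tmp, "blk_1001");
        try {
            if (!blk.createNewFile()) {
                throw new IOException("could not create " + blk);
            }
            System.out.println(list(tmp).length);  // prints 1
            try {
                listFiles(new File(tmp, "missing"));
            } catch (IOException e) {
                System.out.println("missing dir reported as IOException");
            }
        } finally {
            // Clean up the temporary file and directory.
            blk.delete();
            tmp.delete();
        }
    }
}
```

Turning the `null` into a checked exception forces each caller (for example, code that scans a volume's block directories) to decide explicitly how to handle an unreadable directory, instead of crashing later on a dereference.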
commit 927c26b2cabbbe742026e5ba70855476dc38968e Author: Ahmed Radwan Date: Mon Jul 18 05:26:48 2011 -0700 MAPREDUCE-2529. Recognize Jetty bug 1342 and handle it. Reason: Bug Author: Thomas Graves Ref: CDH-3351 commit edb61378f7b6c80a7385e3e24997de429f39e0d8 Author: Tom White Date: Wed Jul 13 13:34:52 2011 -0700 MAPREDUCE-2638. Create a simple stress test for the fair scheduler Reason: Test Author: Tom White Ref: CDH-2847 commit bdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 Author: Eli Collins Date: Mon Jul 11 18:49:14 2011 -0700 MAPREDUCE-2670. Fixing spelling mistake in FairSchedulerServlet.java. Reason: Bug Author: Eli Collins Ref: DISTRO-273 commit be5652e5aa6a3888f4e52608c3a591b03ad48487 Author: Tom White Date: Wed Jul 6 17:43:59 2011 +0100 CLOUDERA-BUILD. Undeprecate backported MapReduce library classes using the old API. Ref: CDH-3203 commit 3e156a031877ce98c438376dcf7accb56b95dc65 Author: Andrew Bayer Date: Wed Jul 6 00:42:16 2011 -0700 CLOUDERA-BUILD. Updating versions for cdh3u1 release. commit 3bec5b05964cf8a1f705ad0cf3b10c3ac707f1d5 Author: Aaron T. Myers Date: Tue Jul 5 19:43:03 2011 -0700 HDFS-1758. Web UI JSP pages thread safety issue Reason: Bug Author: Tanping Wang Ref: CDH-2842 commit 8eff3591387814abe8e079f2689bf9a38aa498f2 Author: Aaron T. Myers Date: Tue Jul 5 18:15:17 2011 -0700 HDFS-2011. Removal and restoration of storage directories on checkpointing failure doesn't work properly Reason: Bug Author: Ravi Prakash Ref: CDH-3315 commit 12a5778288de7628dfa2a27fd344e83a8ce6cdc2 Author: Todd Lipcon Date: Tue Jul 5 19:44:20 2011 -0700 MAPREDUCE-2447. Fix Child.java to set Task.jvmContext sooner to avoid corner cases in error handling. Reason: Fix possible NPE if TaskLogs.syncLogs fails in child Author: Siddharth Seth Ref: CDH-3132 commit 0eab1fbea6a968c2514a16a2f96a36ebfa30c6b6 Author: Todd Lipcon Date: Tue Jul 5 18:13:12 2011 -0700 Amend MAPREDUCE-2373. Fix a possible NPE if setPermissions fails while launching task script. 
Reason: avoid NPE seen in production Author: Todd Lipcon Ref: CDH-3151 commit b5c5941d73cf037dc03ee5c8848708da6f6d5566 Author: Eli Collins Date: Tue Jul 5 17:52:35 2011 -0700 HDFS-1628. AccessControlException should display the full path. org.apache.hadoop.security.AccessControlException should display the full path for which the access is denied. Reason: Improvement Author: John George Ref: CDH-2765 commit 50cee77a34b3d7b7c8a7a710fb3f4e8e1448288c Author: Todd Lipcon Date: Tue Jul 5 16:55:57 2011 -0700 MAPREDUCE-2443. Fix TaskAspect for TaskUmbilicalProtocol.ping. Author: Siddharth Seth Reason: fix test-system compile after MR-2429 Ref: CDH-3132 commit 5829715e1d4bf739668d5b246bf00b3f136733f2 Author: Todd Lipcon Date: Tue Jul 5 16:33:13 2011 -0700 MAPREDUCE-2429. Validate JVM in TaskUmbilicalProtocol. Reason: Fix issue where TT gets into inconsistent state Author: Siddharth Seth Ref: CDH-3132 commit 708b259abe2a0e287b4370cc41da87254b4c46dd Author: Eli Collins Date: Tue Jul 5 16:02:22 2011 -0700 HDFS-1836. Thousand of CLOSE_WAIT socket. Reason: Bug Author: Bharath Mundlapudi Ref: CDH-3200 commit 3412dde10617df0cffa4d4744d6b1f2a0d59e23a Author: Eli Collins Date: Tue Jul 5 14:49:16 2011 -0700 HADOOP-7272. Remove unnecessary security related info logs. Two info logs are printed when connection to RPC server is established, is not necessary. On a production cluster, these log lines made up of close to 50% of lines in the namenode log. I propose changing them into debug logs. Reason: Improvement Author: Suresh Srinivas Ref: CDH-3174 commit 7d08d6a9f223f270e5f4728a85e0ed3934a347f7 Author: Eli Collins Date: Tue Jul 5 14:33:07 2011 -0700 HADOOP-7325. hadoop command - do not accept class names starting with a hyphen. Reason: Improvement Author: Brock Noland Ref: CDH-3244 commit ece7c80048db98aae5a81603ae426b8663afb975 Author: Eli Collins Date: Tue Jul 5 13:16:18 2011 -0700 HADOOP-7053. wrong FSNamesystem Audit logging setting in conf/log4j.properties. 
"log4j.logger.org.apache.hadoop.fs.FSNamesystem.audit=WARN" should be "log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=WARN". Reason: Bug Author: Jingguo Yao Ref: CDH-3293 commit 72cefcf1c9848ccb08a391a22830b403cd70a9a9 Author: Eli Collins Date: Tue Jul 5 12:18:57 2011 -0700 HDFS-1897. Documention refers to removed option dfs.network.script. The HDFS user guide tells users to use dfs.network.script for rack awareness. In fact, this option has been removed and using it will trigger a fatal error on DataNode startup. Documentation should describe the current rack awareness configuration system. Reason: Bug Author: Andrew Whang Ref: CDH-3153 commit c30f419af201daea7d7131d5c50fef6b09997513 Author: Eli Collins Date: Tue Jul 5 12:15:41 2011 -0700 MAPREDUCE-2472. Extra whitespace in mapred.child.java.opts breaks JVM initialization. When creating taskjvm.sh, we split mapred.child.java.opts on " " and then create a quoted argument for each of those results. So, if you have an extra space anywhere in this configuration, you get an argument '' in the child command line, which the JVM interprets as an empty class name. This results in a ClassNotFoundException and the task cannot run. Reason: Bug Author: Aaron T. Myers Ref: CDH-3152 commit 76ac08dad430d600c6cc69424c21d16a4ba42d42 Author: Eli Collins Date: Tue Jul 5 12:10:54 2011 -0700 HADOOP-7247. Fix documentation to reflect new jar names. In several places, we have the old jar naming style of hadoop - * - examples.jar. With Ivy and Maven, we had to rename the jars to hadoop - examples - *.jar. Therefore, we need to update the documentation. Reason: Improvement Author: Owen O'Malley Ref: CDH-3099 commit d0a46bc2c278a9f6c19365ac712c2945269f8ee1 Author: Eli Collins Date: Tue Jul 5 11:26:08 2011 -0700 HADOOP-5464. DFSClient does not treat write timeout of 0 properly. dfs.datanode.socket.write.timeout is used for sockets to and from datanodes. It is 8 minutes by default. 
Some users set this to 0, effectively disabling the write timeout (for some specific reasons). When this is set to 0, DFSClient sets the timeout to 5 seconds by mistake while writing to DataNodes. This is exactly the opposite of real intention of setting it to 0 since 5 seconds is too short. Reason: Bug Author: Raghu Angadi Ref: CDH-3101 commit 1476f32a4bb161a3ffc81231b99b472b0dbe3adb Author: Eli Collins Date: Tue Jul 5 10:30:44 2011 -0700 HDFS-1753. Resource Leak in StreamFile. Reason: Bug Author: Uma Maheswara Rao G Ref: CDH-3243 commit 8b1f6a660e604fb39284ef8cad7821a6ec27baf5 Author: Todd Lipcon Date: Mon Jul 4 14:53:15 2011 -0700 HADOOP-7428. IPC connection is orphaned with null 'out' member Reason: Can impact a user's ability to submit jobs, among other issues Author: Todd Lipcon Ref: CDH-3306 commit 2e9c10c247d5be1ea9b9b20983ada0e898d7e3ab Author: Todd Lipcon Date: Mon Jul 4 14:44:06 2011 -0700 HADOOP-7440. HttpServer.getParameterValues throws NPE for missing parameters Reason: fix user-visible NPE Author: Todd Lipcon Ref: CDH-3083 commit e08745678913d8a348815dcee69465f4a6a03540 Author: Todd Lipcon Date: Fri Jun 17 18:05:16 2011 -0700 HADOOP-7402. TestConfiguration doesn't clean up after itself Reason: test cleanliness Author: Aaron T. Myers Ref: CDH-3279 commit f0e8a989ff4be91c1428eea1d6e27cc3d13d5817 Author: Ahmed Radwan Date: Thu Jun 23 07:20:43 2011 -0700 MAPREDUCE-2254. Allow setting of end-of-record delimiter for TextInputFormat. MAPREDUCE-2602. Allow setting of end-of-record delimiter for TextInputFormat (for the old API). Reason: Improvement Author: Ahmed Radwan Ref: CDH-3268 commit 671085620586a21d4c4e3a35476e823d237045c9 Author: Eli Collins Date: Thu Jun 30 01:21:10 2011 -0700 HDFS-1592. Datanode startup doesn't honor volumes.tolerated. Reason: Bug Author: Bharath Mundlapudi Ref: CDH-3064 commit 0faf23ca0a9b6b8a90282a3b266db278a28394fa Author: Eli Collins Date: Sat Jun 25 16:10:55 2011 -0700 HDFS-1692. 
In secure mode, Datanode process doesn't exit when disks fail. Reason: Bug Author: Bharath Mundlapudi Ref: CDH-3064 commit 5bd0314bbe72ffab90c310d110986fc71165f121 Author: Eli Collins Date: Mon Jun 27 22:40:57 2011 -0700 HDFS-2117. DiskChecker#mkdirsWithExistsAndPermissionCheck may return true even when the dir is not created. Reason: Bug Author: Eli Collins Ref: CDH-3064 commit 645b176875769c8dcbf7d839926eb735bdfd5b14 Author: Eli Collins Date: Wed Jun 29 23:16:27 2011 -0700 HADOOP-7040. DiskChecker:mkdirsWithExistsCheck swallows FileNotFoundException. Reason: Bug Author: Boris Shkolnik Ref: CDH-3064 commit 8d991630bc1a04c70fbc31435b4fdb3f26033cf8 Author: Eli Collins Date: Sat Jun 25 10:35:11 2011 -0700 HDFS-235. Add support for byte-ranges to hftp. HDFS-2110. Cleanup StreamFile#sendPartialData. HADOOP-7429. Add another IOUtils#copyBytes method. HADOOP-7057. IOUtils.readFully and IOUtils.skipFully have typo in exception creation's message. Reason: Improvement Author: Eli Collins Ref: CDH-3243 commit 4fd9e18fb751824c140d5b67645fc925a92a7c1f Author: Alejandro Abdelnur Date: Tue Jun 28 17:35:48 2011 -0700 HADOOP-7433. Snappy SO file/links are copied to the wrong directory Reason: Bug, they must be copied to the $OS_ARCH directory Author: Alejandro Abdelnur Ref: CDH-3300 commit 3c9402dcc658e6415c59e4866ec3ee0227e819f1 Author: Eli Collins Date: Wed Jun 22 19:10:58 2011 -0700 HDFS-1850. DN should transmit absolute failed volume count rather than increments to the NN. Reason: Improvement Author: Eli Collins Ref: CDH-3065 commit 5a995e37d6430a6790f27476680da1555fbfc031 Author: Eli Collins Date: Mon May 9 18:06:45 2011 -0700 HDFS-556. Provide info on failed volumes in the web ui. HDFS-457 provided better handling of failed volumes but did not provide a corresponding view of this functionality on the web ui, such as a view of which datanodes have failed volumes. This would be a good feature to have.
Reason: Improvement Author: Eli Collins Ref: CDH-1099 commit 13eaedf77798820b92ac17caf717b8e3ea5f8562 Author: Eli Collins Date: Mon May 9 18:05:57 2011 -0700 HDFS-811. Add metrics, failure reporting and additional tests for HDFS-457. Reason: Improvement Author: Eli Collins Ref: CDH-1099 commit c216ea863bcca97efc8220bf1a7507bcd4b12ca5 Author: Eli Collins Date: Tue Jun 28 17:35:32 2011 -0700 HADOOP-7290. Unit test failure in TestUserGroupInformation. Reason: Bug Author: Eli Collins Ref: DISTRO-266 commit 87bcc8eb1940ed60ee1b9dc6489781dd1841e932 Author: Aaron T. Myers Date: Tue Jun 28 14:48:18 2011 -0700 HADOOP-7144. Implement capability of querying individual property of a mbean using JMXProxyServlet Reason: Improvement Author: Tanping Wang Ref: CDH-3229 commit 329522db408a8cbd943f7a8f62b646b8b238bfe3 Author: Aaron T. Myers Date: Tue Jun 28 14:20:35 2011 -0700 HADOOP-7144. Expose JMX with something like JMXProxyServlet Reason: Provide the ability to get JMX metrics and status info via HTTP Author: Robert Joseph Evans Ref: CDH-3229 commit a3dcabc6542035a0943648bdace5702192c7187c Author: Todd Lipcon Date: Tue Apr 5 22:47:40 2011 -0700 Amend HADOOP-6762 to fix potential deadlock. This fixes a deadlock that occurs if the writing of the call parameters throws an IOException after the timeout ping time has elapsed. Author: Todd Lipcon Ref: DISTRO-120 commit 0fb261a32591092f5b0601b294f0ea0ba8b72310 Author: Todd Lipcon Date: Fri Jun 17 17:35:35 2011 -0700 HADOOP-7121. Exceptions while serializing IPC call responses are not handled well. Contributed by Todd Lipcon. Reason: bug fixes for potential hangs of IPC layer, and additional test coverage for DISTRO-120 Author: Todd Lipcon Ref: DISTRO-120 commit e7ab332a8037e7117919c833c0ac0a999307d681 Author: Roman Shaposhnik Date: Tue Jun 28 09:20:13 2011 -0700 CLOUDERA BUILD. 
Making call succeed regardless of the file permissions commit ec7e2867282ab51ad353ff6ed4f268425036a6e3 Author: Alejandro Abdelnur Date: Fri Jun 24 14:30:59 2011 -0700 CLOUDERA BUILD. Pull Snappy source, build it and wire it to Hadoop build commit bd3ea0cbe457e49b5e1bbdf4d7dc57599feb3537 Author: Alejandro Abdelnur Date: Fri Jun 24 11:33:16 2011 -0700 HADOOP-7206 SNAPPY backport Adds Snappy compression support. Reason: New Functionality Author: Issei Yoshida, Alejandro Abdelnur Ref: CDH-3039 HADOOP-7206 SNAPPY backport commit 871f34b3bdb7d323cfc91f0538ccb848deebc7f3 Author: Aaron T. Myers Date: Mon Jun 20 17:13:22 2011 -0700 HDFS-1602. NameNode storage failed replica restoration is broken Reason: Bug Author: Boris Shkolnik Ref: CDH-3208 commit 032c764a0a933e004085442758083d4fea2cf876 Author: Aaron T. Myers Date: Tue Jun 21 15:49:06 2011 -0700 HDFS-2100. Improve TestStorageRestore Reason: Test Author: Aaron T. Myers Ref: CDH-3208 commit 699edb198e0518572957f7edb77570615850da59 Author: Eli Collins Date: Wed Jun 22 15:22:51 2011 -0700 CLOUDERA-BUILD. Update eclipse template classpath to remove dupes and update versions. commit b4d1557ba2d16f1fa3af7e4b7bb1265bc7cb6a30 Author: Eli Collins Date: Sun May 22 19:51:10 2011 -0700 HDFS-1978. All but first option in LIBHDFS_OPTS is ignored. Reason: Bug Author: Eli Collins Ref: CDH-3210 commit 1f3e7f44a9c6f56b4a2921faa82f0d81321dbd64 Author: Eli Collins Date: Wed Jun 22 10:54:00 2011 -0700 HDFS-2055. Add hflush support to libhdfs. Reason: New Feature Author: Travis Crawford Ref: DISTRO-257 commit 2348c8bfd82cbb3aa4685e7f8d85968c7cbe08b1 Author: Eli Collins Date: Wed Jun 22 10:30:47 2011 -0700 HDFS-420. Fuse-dfs should cache fs handles. Fuse-dfs should cache fs handles on a per-user basis. This significantly increases performance (and has the side effect of fixing the current code which leaks fs handles).
Reason: Improvement Author: Brian Bockelman, Eli Collins Ref: CDH-2786 commit 0eeb795d156edf6f4e7c5c4b722d85737cd49736 Author: Roman Shaposhnik Date: Fri Jun 17 10:07:31 2011 -0700 CLOUDERA-BUILD. Building RPMs from SRPMs in CDH needs to rebuild the projects commit 21868a0d245c73742c90d23a82a7536c198a5a3f Author: Todd Lipcon Date: Tue Feb 15 10:53:23 2011 -0800 HADOOP-7145. Configuration.getLocalPath should trim strings Reason: fix potential bug with local dirs Author: Todd Lipcon Ref: CDH-2662 commit 9d0bd80bedd72f3b366d5ceda970109a0d3e124a Author: Aaron T. Myers Date: Fri Jun 17 16:02:59 2011 -0700 HDFS-2082. SecondaryNameNode web interface doesn't show the right info Reason: Bug Author: Aaron T. Myers Ref: CDH-3277 commit 68250308093f335bac63e65171ae22db03412c13 Author: Aaron T. Myers Date: Fri Jun 17 12:30:43 2011 -0700 HADOOP-3741. SecondaryNameNode has http server on dfs.secondary.http.address but without any contents Reason: New Feature Author: Tsz Wo (Nicholas), SZE Ref: CDH-1695 commit d7915a354ade800a163788af7dd43f187f0442aa Author: Aaron T. Myers Date: Fri Jun 17 14:24:48 2011 -0700 HADOOP-4794. Add branch info to HadoopVersionAnnotation Reason: Improvement Author: Chris Douglas Ref: CDH-3274 commit a7f154b738ffdb129eb07be88abc925e447d6b00 Author: Todd Lipcon Date: Fri Jun 17 17:04:09 2011 -0700 Amend MAPREDUCE-2323. Fix bug causing NPE when the "set pool" option is used twice on the same job. Reason: important bug fix Ref: CDH-3036 Author: Todd Lipcon commit 97d8bb472f57c1abc73a5240675a37b8e4b5b31a Author: Tom White Date: Tue Jun 7 16:59:31 2011 -0700 HADOOP-7323. Add capability to resolve compression codec based on codec name Reason: Improvement Author: Alejandro Abdelnur Ref: CDH-3226 commit 829bc94b23b9ab447fc51919cecfe5d9bd0a0c2b Author: Tom White Date: Tue Jun 7 16:40:23 2011 -0700 HADOOP-6996.
Allow CodecFactory to return a codec object given a codec's class name Reason: Improvement Author: Hairong Kuang Ref: CDH-3226 commit 8b49cf2446f0a5ac5f750b2abc07787c40142878 Author: Roman Shaposhnik Date: Fri Jun 3 19:07:34 2011 -0700 MAPREDUCE-2260. Remove auto-generated native build files HADOOP-6436. Remove auto-generated native build files HDFS-1582. Remove auto-generated native build files HDFS-1619. Remove AC_TYPE* from the libhdfs Reason: Native build files generated on older version of Linux/autotools tend to break builds on newer OSes Author: Roman Shaposhnik Ref: CDH-894 commit d94813ecd0d4b3f63f4d30baa8a22a59dc76d5a8 Author: Aaron T. Myers Date: Tue May 24 14:26:47 2011 -0700 Revert "HADOOP-6988. Add support for reading multiple hadoop delegation token files" This reverts commit ce67cd87f21543348ca5c137dee3ff0dc7f338dd. commit 0dc60cdb69d7a52068629bcecf79d69dd1cb1132 Author: Eli Collins Date: Wed May 18 14:44:27 2011 -0700 MAPREDUCE-2505. Explain how to use ACLs in the fair scheduler. The fair scheduler already works with the ACL system introduced through the mapred.queue.* parameters, but the documentation doesn't explain how to use this. We should add a paragraph or two about it. Reason: Improvement Author: Matei Zaharia Ref: CDH-2050 commit 6a658661998bfec440c181e01cefb3dbee7f525a Author: Roman Shaposhnik Date: Fri May 13 18:23:00 2011 -0700 DISTRO-224. CDH packages should depend on JRE, not JDK commit 1db6dae127e0a93084ab4cebb840d3af91e429c0 Author: Aaron T. Myers Date: Mon May 16 19:17:18 2011 -0700 Back-port HADOOP-7124, HDFS-1814, MAPREDUCE-2473 - Hadoop /usr/bin/groups equivalent Reason: Allows users to query and display their group membership. Author: Aaron T. Myers Ref: CDH-2986 commit 86fe9a95f6356855038f9c605fe54e682304c88e Author: Aaron T. Myers Date: Thu May 12 15:52:25 2011 -0700 Amend HDFS-1378. Edit log replay should track and report file offsets in case of errors Reason: Original back-port had a bug.
This back-port includes the fix as committed to trunk. Author: Aaron T. Myers and Todd Lipcon Ref: CDH-3072 commit d30b8b83175fbf96644cffbda37e90cb4703c139 Author: Todd Lipcon Date: Thu May 12 15:07:51 2011 -0700 HADOOP-6947. Kerberos login should set the refreshKrb5Config option Reason: necessary for daemons that will use multiple keytab files for different principals Author: Todd Lipcon Ref: CDH-3184 commit b96019898f7c4cba702371a7ef238977ddc88b0e Author: Todd Lipcon Date: Thu May 12 15:05:11 2011 -0700 HADOOP-7189. Add ability to enable JAAS debug option with an environment variable. Adds the HADOOP_JAAS_DEBUG environment variable, which, when set to "true", dumps extra debugging information out of JAAS. Reason: aids debugging of security issues like "Failure to login" Author: Ted Yu Ref: CDH-3183 commit c2804fb55590f62018f4fc379275ae01af001adc Author: Konstantin Boudnik Date: Wed May 4 17:01:30 2011 -0700 MAPREDUCE-2023. TestDFSIO read test may not read specified bytes. Reason: Fixing a bug in the test Author: Hong Tang Ref: CDH-3148 commit b9c48d0bf87c3ca3cabd467c6b03360a022e5669 Author: Konstantin Boudnik Date: Wed May 4 14:10:34 2011 -0700 MAPREDUCE-1832. Support for file sizes less than 1MB in DFSIO benchmark. Reason: Reverting backport of MAPREDUCE-1614 and completing the merge of MAPREDUCE-1832. Author: Konstantin Boudnik Ref: CDH-3140 commit 763893247e8e94a6da8060d2335550b90cf0662e Author: Alejandro Abdelnur Date: Thu Apr 28 13:21:45 2011 -0700 MAPREDUCE-2457. job submission should inject group.name Description: Reason: functionality commonly used by the FairScheduler Author: Alejandro Abdelnur Ref: CDH-3088 commit a7c507d6d763fd6f8868198959f2759749841426 Author: Konstantin Boudnik Date: Mon May 2 14:43:35 2011 -0700 MAPREDUCE-1614.
TestDFSIO should allow to configure output directory Reason: Fixing bug in the test Author: Konstantin Boudnik Ref: CDH-3123 commit 858d5bb8ad49cf2b3f65af939be97bcbae6b25e5 Author: Konstantin Boudnik Date: Mon May 2 14:42:41 2011 -0700 MAPREDUCE-1832. Support for file sizes less than 1MB in DFSIO benchmark. Reason: Backport test improvements. Author: Konstantin Shvachko Ref: CDH-3117 commit 0efd27c636b1a2a23c64e019af50cebcc2c98d83 Author: Aaron T. Myers Date: Thu Apr 28 00:32:33 2011 -0700 Amend HADOOP-6995. Allow wildcards to be used in ProxyUsers configurations Reason: Forgot to backport documentation portion of the change Author: Todd Lipcon Ref: CDH-3100 commit e863f5bee5763ec354384645b9d62743a052fae9 Author: Aaron T. Myers Date: Thu Apr 28 00:20:22 2011 -0700 HDFS-1846. Don't fill preallocated portion of edits log with 0x00 Reason: Improvement Author: Aaron T. Myers Ref: CDH-3059 commit e78be89d287e49207547f82a68e92b0d9a6d5413 Author: Aaron T. Myers Date: Wed Apr 27 13:45:03 2011 -0700 HDFS-1862. Improve test reliability of HDFS-1594 Reason: Test Author: Aaron T. Myers Ref: CDH-3095 commit 08b60c48c049cc3e8965f6f8cf8bad55b2969e99 Author: Aaron T. Myers Date: Fri Apr 22 18:02:26 2011 -0700 HDFS-1594. When the disk becomes full Namenode is getting shutdown and not able to recover Reason: Bug Author: Aaron T. Myers Ref: CDH-2895 commit 71618e9eb918d8e27a57db8a40683bd8a3e0d7d1 Author: Aaron T. Myers Date: Thu Apr 21 17:35:44 2011 -0700 HADOOP-7229. Absolute path to kinit in auto-renewal thread Reason: Bug Author: Aaron T. Myers Ref: CDH-3024 commit 927d00941693e7774174c795b91bba1811d801bd Author: Tom White Date: Tue Apr 19 14:19:58 2011 -0700 MAPREDUCE-1813. NPE in PipeMapred.MRErrorThread Reason: Bug Author: Ravi Gummadi Ref: CDH-2154 commit 58815d50145c62a961f14a6f789491b3e4272fbe Author: Eli Collins Date: Mon Apr 18 14:52:10 2011 -0700 HADOOP-7045. TestDU fails on systems with local file systems with extended attributes. 
We should modify the test to allow for some extra on-disk slack. The on-disk usage could also be smaller if the file data is all zeros or compression is enabled. The test currently handles the former by writing random data, we're punting on the latter. Reason: Test Author: Eli Collins Ref: CDH-3033 commit d4375b1e0415d9c76885af1df6cd2ebc3db33237 Author: Eli Collins Date: Fri Apr 15 12:29:35 2011 -0700 HADOOP-7159. RPC server should log the client hostname when read exception happened. Reason: Improvement Author: Scott Chen Ref: CDH-2766 commit d6a988a1f38609634c8b5364a7caac03871d2c25 Author: Andrew Bayer Date: Thu Apr 7 12:15:29 2011 -0700 CLOUDERA-BUILD. Updating for CDH3u1 development. commit 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14 Author: Bruno Mahé Date: Thu Mar 24 11:47:04 2011 -0700 DISTRO-185. Needs to add sun jdk provided by RHEL to the list of jvm candidates Description: Services wouldn't start since they could not find the sun jdk on RHEL6 Reason: Bug Author: Bruno Mahé Ref: CDH-2858 commit 03ab6d72146cfd99028fec22f6d28994d515df12 Author: Andrew Bayer Date: Mon Mar 21 11:26:14 2011 -0700 CLOUDERA-BUILD. Changing fuse-dfs tests to use test.junit.output.format for Junit formatter, rather than hardcoding as plain. commit aa3b91aaeca5e5bcd5988ee0fe1d619167ed38fa Author: Tom White Date: Sun Mar 20 21:43:55 2011 -0700 HDFS-1762. Allow TestHDFSCLI to be run against a cluster Author: Tom White Ref: CDH-2797 commit b399ca4df2d7bafc27fc91361d451358ef8a394a Author: Todd Lipcon Date: Tue Mar 15 17:21:29 2011 -0700 MAPREDUCE-2366. TaskTracker can't retrieve stdout and stderr from web UI Reason: bug fix via 0.20-security-203 Author: Richard King Ref: CDH-2772 commit 53dd2c7f23291ee58a5d0d4ab8bab1b5bf47b2ba Author: Todd Lipcon Date: Fri Mar 11 10:31:04 2011 -0800 HDFS-1520, HDFS-1554, HDFS-1555. Add new lightweight recoverLease API for use by HBase This adds a limited-public API recoverLease() which is used by the HBase master when recovering the HBase write-ahead log. 
Author: Hairong Kuang, backport help from Andrew Purtell Ref: CDH-2812 commit fd14a491d0a1bcae807aa4d985b71c4170eb1136 Author: Todd Lipcon Date: Wed Mar 16 15:14:23 2011 -0700 HDFS-1759. Improve error message when starting secure DN without jsvc Author: Todd Lipcon Ref: CDH-2554 commit 978164b1b4e1ed07f236b21c6cc757b3a96f3ec0 Author: Todd Lipcon Date: Tue Mar 15 15:01:08 2011 -0700 MAPREDUCE-2364. Don't hold the rjob lock while localizing resources. Reason: TT deadlock, patch from branch 0.20-security-203 Author: Devaraj Das Ref: CDH-2772 commit e1d94a529d7adac4012854703ea4b10d21f8829b Author: Todd Lipcon Date: Tue Mar 15 15:00:27 2011 -0700 MAPREDUCE-1563. TaskDiagnosticInfo may be missed sometime Reason: Bug fix via 0.20-security-203 Author: Krishna Ramachandran Ref: CDH-2772 commit 244bc14fd518142b015ef1b539ec899daeb18e77 Author: Todd Lipcon Date: Tue Mar 15 14:55:38 2011 -0700 MAPREDUCE-2356. Fix a task state corrupting race Reason: can cause a task to succeed even though all attempts were errors Author: Luke Lu Ref: CDH-2772 commit 7c3266e8072d54c2d18755c4b0c4d3fb153f5dc0 Author: Todd Lipcon Date: Tue Mar 15 14:22:34 2011 -0700 CLOUDERA-BUILD. Add .gitignore for Cloudera maven target directories commit d8f01d688916739125f3d321be75ed741b7b3a6f Author: Todd Lipcon Date: Wed Mar 16 13:43:06 2011 -0700 CLOUDERA-BUILD. Update footer to indicate new product naming Author: Todd Lipcon Ref: CDH-2831 commit d52118b2e49f1ccd29286574ad017707cdd63d0c Author: Todd Lipcon Date: Wed Mar 16 13:20:17 2011 -0700 MAPREDUCE-2377. task-controller fails to parse configuration if it doesn't end in \n Reason: fix hard-to-diagnose bug Author: Todd Lipcon Ref: CDH-2578 commit ca46366798e704396bd2de8e3ef4bc1b074b88a9 Author: Todd Lipcon Date: Tue Mar 15 18:34:37 2011 -0700 CLOUDERA-BUILD. Default cloudera.hash to empty string This restores the proper behavior of inferring the git hash from the current repository, if it's not overridden on the command line. 
Author: Todd Lipcon Ref: CDH-2829 commit 6ca2af6321cbabf8029092ce6550ec8e78673fba Author: Todd Lipcon Date: Tue Mar 15 14:28:51 2011 -0700 HADOOP-7104. Remove unnecessary DNS reverse lookups from RPC layer Reason: Fixes potential performance issues when DNS blips occur Author: Kan Zhang Ref: DISTRO-108 commit bf19274fc15bb5b37089f3a50db7dbb053c92490 Author: Eli Collins Date: Tue Mar 15 21:36:53 2011 -0700 HDFS-780. Revive TestFuseDFS. Reason: Improvement Author: Eli Collins Ref: CDH-2778 commit f0cefd74f8727ebd331c6712ab5b4c004e46a629 Author: Eli Collins Date: Tue Mar 15 10:37:46 2011 -0700 CLOUDERA-BUILD. Nuke DFSConfigKeys.DFS_BLOCK_SIZE_KEY. Ref: CDH-2828 commit 04dffae7b2e160fedc5aa9fdb7daa0eb79e93b0f Author: Eli Collins Date: Tue Mar 15 10:29:45 2011 -0700 HDFS-1189. Quota counts missed between clear quota and set quota. Reason: Bug Author: John George Ref: CDH-2788 commit 6b6df88107a98b899aaac7dc20f061bc9f60735f Author: Eli Collins Date: Tue Mar 15 10:12:26 2011 -0700 HDFS-1258. Clearing namespace quota on "/" corrupts FS image. Reason: Bug Author: Aaron T. Myers Ref: CDH-2788 commit 0863f15d727e1ad6e96a0887a93a82f315c8f734 Author: Todd Lipcon Date: Fri Mar 11 16:33:05 2011 -0800 Amend HADOOP-7167. Allow list of tests to be excluded during build. No longer uses /dev/null as a canonical empty file, since it causes the build to fail on Cygwin. Author: Todd Lipcon Ref: CDH-2777 commit 16fa5d016e9bbe79896adf0c24dd8510b31c0325 Author: Todd Lipcon Date: Fri Mar 11 15:55:15 2011 -0800 MAPREDUCE-2379, HADOOP-7184. 
Distributed cache sizing configurations are missing from mapred-default.xml * Moves local.cache.size from core-default.xml into mapred-default.xml * Adds documentation for mapreduce.tasktracker.cache.local.numberdirectories * Fixes the configuration parameter mapreduce.tasktracker.cache.local.numberdirectories to be named the same as it is in trunk -- previous betas had the incorrect name mapreduce.tasktracker.local.cache.numberdirectories Reason: fix docs Author: Todd Lipcon Ref: CDH-2815 commit fd48669392567338109a981164083c781d5e7993 Author: Jenkins Date: Sat Mar 12 13:40:39 2011 -0800 CLOUDERA-BUILD. Updating versions for cdh3u0 release. commit 0929aa6f798e6e1b736bc8715ade29686bec08f3 Author: Tom White Date: Fri Mar 11 18:49:07 2011 -0800 HADOOP-7183. WritableComparator.get should not cache comparator objects Reason: regression in HADOOP-6881 Author: Tom White Ref: CDH-2810 commit 2a243d114e14d80a036c6a614672df0a88f6f8f7 Author: Todd Lipcon Date: Fri Mar 11 13:18:32 2011 -0800 Amend MAPREDUCE-2178. Fix compilation failure due to unchecked return code on gcc 4.4.4 Reason: Fix test on Ubuntu Maverick Author: Todd Lipcon Ref: CDH-2813 commit da757539930beecd990188d5b0e2796f8fbb3953 Author: Todd Lipcon Date: Fri Mar 11 13:10:28 2011 -0800 MAPREDUCE-2376. Allow test-task-controller to specify the user to test as Can now specify a username in the TC_TEST_USERNAME environment variable in order for this test to pass when running as a userid < 1000. Reason: fix build on Cloudera hudson where uid = 101 Author: Todd Lipcon Ref: CDH-2811 commit 69fc8b16f4f098ad215582fdfc3efea26e54464f Author: Todd Lipcon Date: Tue Mar 8 11:52:33 2011 -0800 HADOOP-7156. Workaround for unsafe implementations of getpwuid_r Adds a new configuration hadoop.work.around.non.threadsafe.getpwuid which can be used to enable a mutex around this call to work around the thread-unsafe behavior. Reason: RHEL 6.0 and some other systems have thread-unsafe implementations of this libc call.
This causes JVM crashes during the shuffle where this call is made frequently from many threads. Author: Todd Lipcon Ref: CDH-2725 commit 50194947583182a237e14c08a968de770cd3f969 Author: Todd Lipcon Date: Wed Mar 9 14:20:28 2011 -0800 MAPREDUCE-2372. TaskLogAppender mechanism shouldn't be set in log4j.properties Reason: fixes cleanup tasks to log to proper directory even if using a CDH2 log4j.properties Author: Todd Lipcon Ref: CDH-2793 commit 8f4bc5f77bb496928529d1d56fe5831d35c89d83 Author: Todd Lipcon Date: Tue Mar 8 13:24:22 2011 -0800 MAPREDUCE-2371. Fix TaskLogsTruncater to not need to call obtainLogsDirOwner Reason: fixes unnecessary fork in child tasks which causes higher ulimit requirements compared to CDH2 Author: Todd Lipcon Ref: CDH-2784 commit c48cec8fa73f8aaa3a565a4f57985b93157e6caf Author: Todd Lipcon Date: Wed Mar 9 14:20:51 2011 -0800 MAPREDUCE-2373. When tasks exit with a nonzero exit status, task runner should log the stderr as well as stdout Reason: assists debugging of task failures Author: Todd Lipcon Ref: CDH-2794 commit ec8790e50f212782f59ec904210e6cd07a62eb8e Author: Todd Lipcon Date: Wed Mar 9 14:38:33 2011 -0800 MAPREDUCE-2374. Don't use PrintWriter API for writing taskjvm.sh Reason: PrintWriter obscures errors. Also seems to fix a race condition which caused "Text file busy" errors launching taskjvm.sh on some QA clusters Author: Todd Lipcon Ref: CDH-2794 commit 6037aff7bb49c057b9661d83fb7c89dfd3694738 Author: Todd Lipcon Date: Tue Mar 8 15:55:11 2011 -0800 HADOOP-7154. Set MALLOC_ARENA_MAX in default config Reason: RHEL 6.0 support Author: Todd Lipcon Ref: CDH-2721 commit 462e80e19c2ab2e40aa6ca4b590580de9b9a4a1b Author: Todd Lipcon Date: Tue Mar 8 11:02:49 2011 -0800 HADOOP-7172. 
SecureIO should not check owner on non-secure clusters that have no native support Reason: Fix shuffle performance regression when native libraries are not installed Author: Todd Lipcon Ref: CDH-2779 commit 97c67eea39f2d15ecb7a479efda60204fc46e4c5 Author: Todd Lipcon Date: Mon Mar 7 12:25:32 2011 -0800 Amend MAPREDUCE-2234. Previous patch resulted in too many ls -l calls during heartbeats The previous commit under this JIRA changed the checkLocalDirs function to use the checkDir() function that takes a permission. This is fine at startup, but is expensive since it results in an `ls -l` fork for every local directory. This happens on every heartbeat and is not necessary. This patch amends the function to only use this form of checkDir() at start time, and otherwise just use the less expensive native java calls. Author: Todd Lipcon Ref: CDH-2780 commit b6f34f37281d49de97e7d41e55ffbed596036067 Author: Todd Lipcon Date: Mon Mar 7 23:06:10 2011 -0800 HADOOP-7173. Remove unused fstat() call from NativeIO Reason: Remove unused code after HADOOP-7115 Author: Todd Lipcon Ref: CDH-2779 commit df154b1e141761112f667455baab2b4620f1b465 Author: Todd Lipcon Date: Mon Mar 7 22:36:45 2011 -0800 HADOOP-7115. Reapply final patch to add a cache to username resolution Also fixes a bug where a user not found would trigger an assertion error and crash the JVM. Author: Devaraj Das Ref: CDH-2779 commit cc57649b0c17113dde2fc8b350206dbeb159c9e3 Author: Todd Lipcon Date: Mon Mar 7 22:23:14 2011 -0800 Revert "HADOOP-7115. Reduces the number of calls to getpwuid_r and getpwgid_r, by implementing a cache in NativeIO." This reverts commit 3ef31bcc86610d496976b4de9ada82e73f47f162. commit d9541f7113f0e678af0819f45876bbcd454b20d5 Author: Todd Lipcon Date: Thu Mar 3 15:19:25 2011 -0800 Amend MAPREDUCE-2323. Fix unregistering of fair scheduler metrics updater during fairsched termination Fixes an occasional test failure where different test cases in the same JVM were causing each other to fail. 
Author: Todd Lipcon Ref: CDH-2677 commit c7c90372c078febe77344c656b512c3d927c6a71 Author: Todd Lipcon Date: Tue Mar 8 10:16:20 2011 -0800 HDFS-1625. TestDataNodeMXBean fails if disk space usage changes during test run Reason: flaky test Author: Tsz Wo (Nicholas), SZE Ref: CDH-2783 commit 1eed2c5af334077a27d5007b006db345de9c4d0f Author: Andrew Bayer Date: Tue Mar 8 12:50:50 2011 -0800 CLOUDERA-BUILD. Changing releases repo to point to staging area. commit 63c98116795d0b7908d2de335e7fcd53449bc514 Author: Todd Lipcon Date: Mon Mar 7 17:36:15 2011 -0800 HADOOP-7167. Allow using a file to exclude certain tests from the build. Reason: ability to exclude known-flaky tests on golden Hudson Author: Todd Lipcon Ref: CDH-2777 commit c388c047ece60f560187add9446de412292a583c Author: Andrew Bayer Date: Mon Feb 28 11:00:42 2011 -0800 CLOUDERA-BUILD. Fixing KITCHEN-815. * Invoking mvn before anything else now for property generation, etc. * Adding ant-contrib jar to support that. commit bd69a6ea66f4aa6905a7347a94e6cc351bfb235a Author: Andrew Bayer Date: Mon Mar 7 15:57:15 2011 -0800 CLOUDERA-BUILD. Fixing contrib paths. commit f7a7a032b7f4300084951720331cf6732756b5b2 Author: Andrew Bayer Date: Mon Mar 7 13:51:21 2011 -0800 CLOUDERA-BUILD. Removing source:jar from do-release-build commit c3174e9c80710d30fa832394709174fc8d7f6e6b Author: Andrew Bayer Date: Mon Mar 7 12:44:36 2011 -0800 CLOUDERA-BUILD. Source jars weren't being generated due to change to use existing jars for artifacts. commit 01b42afec6b6e878ed7805aac2e9a95c779d9840 Author: Andrew Bayer Date: Mon Mar 7 10:50:47 2011 -0800 CLOUDERA-BUILD. Simplifying repository setup. commit 43f756d9569ac009dbae2c84064b29e8163aaa19 Author: Andrew Bayer Date: Thu Mar 3 11:41:36 2011 -0800 CLOUDERA-BUILD. DISTRO-109 - Use original jars as Maven artifacts rather than exploding/rebuilding. commit 74127ea6ddff6b107dc0e2a7a72365482e33a5c0 Author: Andrew Bayer Date: Sun Mar 6 18:40:26 2011 -0800 CLOUDERA-BUILD. 
Adding relativePath to parent POMs. commit 8d34833e7e62ebd73a6ce4868a150172e46b9701 Author: Andrew Bayer Date: Thu Mar 3 22:12:20 2011 -0800 CLOUDERA-BUILD. Cleanup. commit 0e6382feae06e358114932b0f5136862311cca6a Author: Todd Lipcon Date: Thu Mar 3 15:42:06 2011 -0800 HADOOP-6943. The GroupMappingServiceProvider interface should be public Reason: organizations may want to implement this interface for their needs Author: Aaron T. Myers Ref: CDH-2263 commit 85731af89c0e110d0219cdb4f6ea1cf09eb2e53a Author: Tom White Date: Wed Mar 2 15:48:21 2011 -0800 MAPREDUCE-2351. mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI Reason: Limitation Author: Tom White Ref: CDH-2714 commit 25ece8066682682f6fdd595845dbf71555aef5bb Author: Todd Lipcon Date: Tue Mar 1 13:17:58 2011 -0800 Amend MAPREDUCE-2178. Update tests for fixed configuration checking code Author: Todd Lipcon Ref: CDH-2755 commit d62a49fc3196810f096fdbbd0ca4f48af976d5df Author: Tom White Date: Tue Mar 1 09:50:14 2011 -0800 MAPREDUCE-1845. FairScheduler.tasksToPreempt() can return negative number Reason: Bug Author: Scott Chen Ref: CDH-1555 commit 46ff62f304a755dc4d47f595194cf2c6d01faab5 Author: Tom White Date: Mon Feb 28 13:32:52 2011 -0800 HADOOP-7011. KerberosName.main(...) throws NPE Reason: Useful for debugging Author: Aaron T. Myers Ref: CDH-2673 commit 3fd1dd275427012b93dab53d8e9b3c78aed1fc6f Author: Andrew Bayer Date: Fri Feb 25 15:31:51 2011 -0800 CLOUDERA-BUILD. Add source jars to Maven process, and add hadoop-mrunit to Mavenization. * This is for KITCHEN-866 - and I discovered that in CDH2, we'd been deploying hadoop-mrunit, but hadn't been in CDH3B4. So I've added that. commit bc52432d9832d82bc4f60166c7c718e65fe63359 Author: Andrew Bayer Date: Fri Feb 25 15:31:32 2011 -0800 CLOUDERA-BUILD. Rolling back attempt to speed up build by pre-caching artifacts in Maven repo - ended up breaking in non-Maven context.
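HADOOP-6943 makes GroupMappingServiceProvider public so that organizations can plug in their own user-to-group resolution. The following is a hypothetical in-memory provider, not code from this repository. To stay self-contained it declares a local stand-in interface; its three methods (getGroups, cacheGroupsRefresh, cacheGroupsAdd) are assumed to match the Hadoop 1.x-era org.apache.hadoop.security.GroupMappingServiceProvider and should be checked against the release you build on. In a real deployment the provider class would be selected through the hadoop.security.group.mapping property.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaticGroupMappingSketch {

  // Local stand-in for org.apache.hadoop.security.GroupMappingServiceProvider.
  // ASSUMPTION: method set taken from Hadoop 1.x-era sources; verify per release.
  interface GroupMappingServiceProvider {
    List<String> getGroups(String user) throws IOException;
    void cacheGroupsRefresh() throws IOException;
    void cacheGroupsAdd(List<String> groups) throws IOException;
  }

  // Hypothetical provider that answers from an in-memory user -> groups table
  // instead of shelling out to the operating system.
  static class StaticGroupMapping implements GroupMappingServiceProvider {
    private final Map<String, List<String>> table = new HashMap<>();

    StaticGroupMapping(Map<String, List<String>> initial) {
      // Defensive copy so later changes to the caller's map are not seen.
      for (Map.Entry<String, List<String>> e : initial.entrySet()) {
        table.put(e.getKey(), new ArrayList<>(e.getValue()));
      }
    }

    @Override
    public List<String> getGroups(String user) {
      List<String> groups = table.get(user);
      // Unknown users resolve to an empty list rather than null.
      if (groups == null) {
        return Collections.emptyList();
      }
      return Collections.unmodifiableList(groups);
    }

    @Override
    public void cacheGroupsRefresh() {
      // No-op: the table itself is the source of truth.
    }

    @Override
    public void cacheGroupsAdd(List<String> groups) {
      // No-op: there is no separate cache to extend.
    }
  }

  public static void main(String[] args) throws IOException {
    Map<String, List<String>> initial = new HashMap<>();
    initial.put("alice", Arrays.asList("analysts", "hadoop"));
    GroupMappingServiceProvider provider = new StaticGroupMapping(initial);
    System.out.println(provider.getGroups("alice")); // prints [analysts, hadoop]
    System.out.println(provider.getGroups("bob"));   // prints []
  }
}
```

Because the table is authoritative, the two cache hooks are no-ops here; a provider backed by LDAP or a shell command would presumably use them to refresh or extend its own cache.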
commit e4596923b767eb163e141e41d5058c983e95f885 Author: Andrew Bayer Date: Wed Feb 23 09:53:00 2011 -0800 CLOUDERA-BUILD. Fixing reactor repo specification. commit 80e7cd19dd23f552efd0bdf1f8b0509aa6b4b3d3 Author: Andrew Bayer Date: Mon Feb 21 10:32:23 2011 -0800 CLOUDERA-BUILD. Using local Maven repo as primary first in chain. Tweaks to pre-fetch dependencies into ~/.m2/repository before ant build is run, with Ivy configured to get from there before trying Maven Central. commit af111808a1edd957a56fe77d1ba2fdc4233cafda Author: Jenkins Date: Sat Feb 19 00:28:02 2011 -0800 CLOUDERA-BUILD. Preparing for cdh3u0 development. commit 3aa7c91592ea1c53f3a913a581dbfcdfebe98bfe Author: Jenkins Date: Sat Feb 19 00:27:52 2011 -0800 CLOUDERA-BUILD. Preparing for CDH3B4 release. commit dd51c56ab63cb12bc207647f314ab99e1e8da32b Author: Todd Lipcon Date: Thu Feb 17 16:37:05 2011 -0800 Amend HADOOP-7070. Fix spurious warning message when running on machine with no krb5.conf The issue is that UGI.initialize would call KerberosName.setConfiguration before setting its own flag to indicate it was initialized. Then, if there was no krb5.conf, the class initializer of KerberosName would call back into UGI.isSecurityEnabled, causing initialize() to be run a second time. This bug doesn't exist upstream. Reason: spurious warnings Author: Todd Lipcon Ref: CDH-2688 commit 1ff4b594c6f9926cf49842672740a229bf06491d Author: Todd Lipcon Date: Wed Feb 16 11:15:12 2011 -0800 Amend MAPREDUCE-2178. Add log message when task JVM fails to fork Author: Todd Lipcon Ref: CDH-2671 commit f57c22b8ec079abc5a051551bce9b1209fa3e6a3 Author: Todd Lipcon Date: Tue Feb 15 23:32:32 2011 -0800 MAPREDUCE-2332. Improve error message when userlogs dir has bad ownership Patch differs from trunk patch on account of MR-2178 Reason: common source of user error Author: Todd Lipcon Ref: CDH-2670 commit 211e7bb1ea1fecc5894d53815c70be8b68c46643 Author: Todd Lipcon Date: Tue Feb 15 19:03:04 2011 -0800 MAPREDUCE-2331.
Cover task graph servlet in fair scheduler system test Reason: improve jcarder coverage Author: Todd Lipcon Ref: CDH-2660 commit 4e93ef108e3ea798f22ef901f090999fe44a8888 Author: Todd Lipcon Date: Tue Feb 15 19:02:54 2011 -0800 MAPREDUCE-2180. Add coverage of Fair Scheduler servlet to system test Reason: improve jcarder coverage for possible deadlocks Author: Todd Lipcon Ref: CDH-2660 commit 279a018f693a5721d7228e7c801327dda0aecb81 Author: Bruno Mahé Date: Tue Feb 15 15:25:19 2011 -0800 CLOUDERA-BUILD. Installation script needs to be adapted for the new naming scheme. Reason: Our mavenization effort changes our artifacts names Author: Bruno Mahé Ref: KITCHEN-833 commit cbada181614e3a32c9bbc2bc5e274798aa94217e Author: Tom White Date: Mon Feb 14 14:57:21 2011 -0800 CLOUDERA-BUILD. TestLocalMRNotification times out in CDH3. commit 2ac40e32af497c4c0d69c5921bd1504356b11086 Author: Todd Lipcon Date: Tue Feb 15 10:47:43 2011 -0800 Amend MAPREDUCE-1441. Reapply trimming of whitespace in mapred.local.dir configurations Reason: User bug report - regression from b2 to b3 Author: Todd Lipcon Ref: CDH-2662 commit 061eb38e4b442cf3f97fcb45a3059384fd74d036 Author: Todd Lipcon Date: Tue Feb 15 10:17:37 2011 -0800 CLOUDERA-BUILD. Fix a bug where HADOOP_DAEMON_DETACHED leaked into the environment of children This fixes a problem reported on the cdh-user list where tasks that forked out to call bin/hadoop ended up only catching the first 10 lines of output. Tested by writing a streaming script that catted a large text file off HDFS - verified bug is fixed. Author: Todd Lipcon Ref: CDH-2661 commit 88e89c048d8f6f346667e64b782b7daf91d8a019 Author: Todd Lipcon Date: Sun Feb 13 21:06:17 2011 -0800 Amend HADOOP-7093. Revert incompatible change in semantics of HttpServer The original backport pulled in part of HADOOP-6461, which changed the way the "webapps" directory is located on the classpath. This broke HBase's ability to locate its UIs. 
In order to avoid having to patch HBase in CDH, this patch reverts that part of the change and works around the issue in the tests a different way. Reason: Should work with upstream HBase Author: Todd Lipcon Ref: CDH-2635 commit e0934a30d7f6a3adb2b9b2f534eefda9a4ece41d Author: Todd Lipcon Date: Mon Feb 14 23:31:13 2011 -0800 HADOOP-7140. IPC Reader threads should stop when server stops Reason: bug preventing TT from shutting down when build version is incompatible Author: Todd Lipcon Ref: CDH-2634 commit 3ad9f29cdcc14fbf41c6642b746ee04afaa92ff5 Author: Todd Lipcon Date: Sat Feb 12 19:24:07 2011 -0800 MAPREDUCE-2323. Add metrics to the fair scheduler Reason: Necessary for CMON, useful for monitoring Author: Todd Lipcon Ref: OPSAPS-2076 commit 3f5313383362c86a2df8be55d2c524d82f9fac85 Author: Todd Lipcon Date: Sun Feb 13 20:14:09 2011 -0800 Amend MAPREDUCE-2242. Reapply after MAPREDUCE-2178. Reason: fix environment escaping Author: Todd Lipcon Ref: CDH-2572 commit 541407b9f144228f2b7934decc114c59b769e481 Author: Todd Lipcon Date: Sat Feb 12 13:28:13 2011 -0800 Amend MAPREDUCE-2178. Revert incompatible API change to FileUtil.chmod Reverts a change which removed InterruptedException from FileUtil.chmod's signature. Though the function never throws InterruptedException, this removal causes compilation failures for any clients who try to catch this exception (incl Pig) Reason: fix Pig build failure Author: Todd Lipcon Ref: CDH-2633 commit a4778bbf1c461b56828f9810ea44c5893f929150 Author: Todd Lipcon Date: Fri Feb 11 17:58:47 2011 -0800 CLOUDERA-BUILD. Re-bootstrap native builds with maintainer mode Also includes bootstrap.sh where missing commit b329fa59501af3bef287aa0bdaa4c517cd41ad04 Author: Todd Lipcon Date: Fri Feb 11 21:17:06 2011 -0800 CLOUDERA-BUILD. Add AM_MAINTAINER_MODE to all configure.ac commit 50329213ff9f712bc07922a212c0931d20a31de6 Author: Konstantin Boudnik Date: Fri Feb 11 17:59:15 2011 -0800 HADOOP-6879. 
Provide SSH based (Jsch) remote execution API for system tests Reason: missing dependency breaks system tests build Author: Konstantin Boudnik Ref: CDH-2622 commit 4b44138e14ad63a6b54a962e16c8f1fd922b3a80 Author: Todd Lipcon Date: Fri Feb 11 00:38:48 2011 -0800 CLOUDERA-BUILD. task-controller configuration directory should be inferred from task-controller location Searches at ../../conf/ for task-controller.cfg Author: Todd Lipcon Ref: CDH-2623 commit a6f5e7109f538e2b9374c6518c80b0575e2dfa9f Author: Todd Lipcon Date: Fri Feb 11 15:06:42 2011 -0800 Amend HADOOP-5489. hdfsproxy-env.sh.template was updated but not hdfsproxy-env.sh Reason: avoid local modifications to src tree on build Author: Todd Lipcon Ref: CDH-2588 commit cc3ba6c2c33ea827e6a54cda2759d03e7e2da4c1 Author: Todd Lipcon Date: Wed Jan 19 15:11:58 2011 -0800 HADOOP-7114. FsShell should dump errors at debug level Reason: easier to debug exceptions thrown in FsShell Author: Todd Lipcon Ref: CDH-2624 commit 5a57891c772488d8b02bcf54f4247f8fffa81d1f Author: Todd Lipcon Date: Fri Feb 11 13:06:26 2011 -0800 Amend MAPREDUCE-2178. Remove AC_SYS_LARGEFILE from configure.ac This flag allows opening of files >2GB, but the task-controller doesn't need to do this. The removal is important because some RHEL5 systems have an fts.h which is incompatible with the resultant CFLAG when building 32-bit. Reason: RHEL5 32-bit build Author: Todd Lipcon Ref: CDH-2623 commit df540fdaa94d96a2a1bc2685774ae44b145bfa98 Author: Todd Lipcon Date: Fri Feb 11 10:56:52 2011 -0800 Amend MAPREDUCE-1493. Fix a typo in HTML markup on jobdetailshistory (typo made in original backport, not upstream) Reason: fix invalid HTML Author: Todd Lipcon Ref: CDH-2622 commit e70a9985960b7ac9e2f6bf3826e93f6f8c44e46a Author: Todd Lipcon Date: Wed Feb 9 14:58:03 2011 -0800 HADOOP-5913. Add support for starting/stopping queues. 
Author: Rahul K Singh Ref: CDH-2622 commit 1da8cc2964b488a55091e7b8b8f3a494ba4c1772 Author: Todd Lipcon Date: Fri Feb 11 12:11:24 2011 -0800 MAPREDUCE-2321. Check for NativeIO at TT startup Reason: Easier failure diagnosis for secure TT Author: Todd Lipcon Ref: CDH-2623 commit 4b1697b297d13990e17c3b3eaaf508686a2e78a5 Author: Todd Lipcon Date: Tue Feb 1 14:05:34 2011 -0800 MAPREDUCE-2289. Fix job staging directory to get automatically chmodded to correct permissions if incorrect Reason: fixes failures in TestFairSchedulerSystem Author: Todd Lipcon Ref: CDH-2626 commit 5d44075f3ac224bf9a259b0731035734d9c152a2 Author: Todd Lipcon Date: Thu Feb 3 09:23:55 2011 -0800 Amend MAPREDUCE-2234. TaskTracker should fail on startup if log dir isn't writable Reapply after MAPREDUCE-2178 backport. Reason: Easier diagnosis of misconfigured TT permissions Author: Todd Lipcon Ref: CDH-2500 commit a2b4149afd53d59fd9a279117c6917e4c83583a3 Author: Todd Lipcon Date: Thu Feb 10 19:40:24 2011 -0800 HDFS-1318, MAPREDUCE-2330. Add MXBeans for JT, TT, DN, NN Author: Tanping Wang, Luke Lu Ref: CDH-2622 commit ee5c73991b43fd49a1a4eed599d2d52065054209 Author: Todd Lipcon Date: Fri Feb 11 12:17:35 2011 -0800 Amend MAPREDUCE-2178. Check result of chdir Reason: necessary to pass -Werror on more recent gcc Author: Todd Lipcon Ref: CDH-2623 commit 717544d462bc56188d165008bc1d841bc1c03904 Author: Todd Lipcon Date: Fri Feb 11 12:16:08 2011 -0800 Amend MAPREDUCE-2178. Check argc *after* checks for perms, etc Reason: Fix error messages during taskcontroller setup Author: Todd Lipcon Ref: CDH-2623 commit b8f3851b9604b8c1156c4a9df5c6a8b532676104 Author: Todd Lipcon Date: Fri Feb 11 12:11:15 2011 -0800 Amend MAPREDUCE-2178. Fix racy check for config file perms Reason: Security fix Author: Todd Lipcon Ref: CDH-2623 commit fa6aca09466301c65d8d8e5d92c43e50f46683ad Author: Todd Lipcon Date: Mon Feb 7 10:05:49 2011 -0800 Amend MAPREDUCE-2103.
Reapply "task-controller permissions checks too stringent" after MAPREDUCE-2173 Reason: match documentation Author: Todd Lipcon Ref: CDH-2623 commit 7361ea92c13d6ca986e332acef70c2d8983c2f4c Author: Todd Lipcon Date: Thu Feb 10 21:15:48 2011 -0800 Amend MAPREDUCE-2265. Restore sbin location for task controller install Reason: reapply after YDH 0.20.100 merge Author: Todd Lipcon Ref: CDH-2623 commit 0ed0d5e311f4f0c57ab6bafa39d324e19dd15b53 Author: Todd Lipcon Date: Mon Feb 7 09:45:39 2011 -0800 Amend MAPREDUCE-967. Reapply behavior which was clobbered by MAPREDUCE-2178 (TT should not unpack job jars unnecessarily) Author: Todd Lipcon Ref: CDH-2623 commit c2b050e2e466b4e43a8458fd05c72e289ed2d563 Author: Todd Lipcon Date: Mon Feb 7 09:55:35 2011 -0800 CLOUDERA-BUILD. Integrate task-controller changes from MAPREDUCE-2178 into Cloudera build commit ac1fd5519d4cd7ddc5cf740d4b4459232523fb12 Author: Todd Lipcon Date: Wed Feb 9 16:54:58 2011 -0800 MAPREDUCE-2178. Write task initialization to avoid race conditions leading to privilege escalation and resource leakage by performing more actions as the user. Author: Owen O'Malley, Devaraj Das, Chris Douglas Ref: CDH-2622 commit bb004aae8abc4f7e772adda6a75f433cf7cb198d Author: Todd Lipcon Date: Wed Jan 19 18:02:52 2011 -0800 HDFS-1597. Fix assertion in TestEditLogRace Reason: Sporadic test failure Author: Todd Lipcon Ref: CDH-2559 commit dbaa8cd7a1a81de7700dcec4517dbf2012906641 Author: Todd Lipcon Date: Wed Jan 26 21:24:38 2011 -0800 HDFS-1601. Pipeline ACKs are sent as lots of tiny TCP packets Reason: HBase performance Author: Todd Lipcon Ref: CDH-2627 commit 0d086cde04450cbb5a5f6d39a345aafcdadaa511 Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1114. Reduce NameNode memory usage by an alternate hash table Author: Tsz Wo (Nicholas) Sze Reason: reduce memory usage in the NameNode Ref: CDH-2622 commit 921d337cfa66dcc22207f3fb42e385aff4e229d0 Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1119. 
Introduce a GSet interface to BlocksMap. commit 216d29555d3fb62aca7362a5611bbc5ec7846b6a Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-599. Allow NameNode to have a separate port for service requests from client requests. Reason: Allows port-based QoS to prioritize DN RPCs over client RPCs, also increases fairness Author: Dmytro Molkov Ref: CDH-2622 commit 53c9961d5b350c96200a3b85c2302a2b569e6fa8 Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1298. Add support in HDFS for new statistics added in FileSystem to track the file system operations. Author: Suresh Srinivas Ref: CDH-2622 commit f1625663dc6008b89af3ff80e19d64f4717f1a9b Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1315. Add fsck event to audit log and remove other audit log events corresponding to FSCK listStatus Author: Suresh Srinivas Ref: CDH-2622 commit 3d026b0a1706483a4860ad80fc17b103448ac1b0 Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1383. Better error messages in HFTP Author: Tsz Wo (Nicholas) Sze Ref: CDH-2622 commit f148732a5c983877fb62ecfe5815eb445a192573 Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1061. Memory footprint optimization for INodeFile object. Author: Bharath Mundlapudi Ref: CDH-2622 commit edda8a863002796aa282fa26d74f8843eac4b728 Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1307 Add start time, end time and total time taken for FSCK to FSCK report. Author: Suresh Srinivas Ref: CDH-2622 commit b2cfa8caaa27a75a4452d9e26d3f3a169e13730e Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HDFS-1085. HFTP read may fail silently on the client side if there is an exception on the server side. Author: Tsz Wo (Nicholas) Sze Ref: CDH-2622 commit 66beb0bfe053b9c0fa02f7ac82310081fa6da2cd Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HADOOP-6713. The RPC server Listener thread is a scalability bottleneck. 
Author: Dmytro Molkov Ref: CDH-2622 commit 807e918943e9f17d4ab7912bdb9cc90970c02ef6 Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HADOOP-6859. Introduce additional statistics to FileSystem to track file system operations. Author: Suresh Srinivas Ref: CDH-2622 commit 8dd45e436896108d8806e5a555621ea6b346912f Author: Todd Lipcon Date: Wed Feb 2 16:59:52 2011 -0800 HADOOP-6899. RawLocalFileSystem#setWorkingDir() does not work for relative names Author: Sanjay Radia Ref: CDH-2622 commit 7a31be4853d46090c7bd7798bdb7cd41915b421c Author: Todd Lipcon Date: Wed Feb 2 18:05:06 2011 -0800 HADOOP-6669. Respect compression configuration when creating DefaultCodec Author: Koji Noguchi Ref: CDH-2622 commit b2586915b911182f60e949de3dd340ae8e8099ca Author: Todd Lipcon Date: Thu Feb 10 20:59:06 2011 -0800 CLOUDERA-BUILD. Re-bootstrap native commit 1fb15b9ee1b9edd5961b5972da4062117b4709e5 Author: Todd Lipcon Date: Thu Feb 10 20:58:22 2011 -0800 CLOUDERA-BUILD. Native build for JNI group mapping code Original JNI patch is against Yahoo's distro which has divergent build files. commit bb55a89bf7a3decd9846989f31d93cb4ed8588b5 Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 HADOOP-6864. Provide a JNI-based implementation of ShellBasedUnixGroupsNetgroupMapping Author: Boris Shkolnik Ref: CDH-2622 commit 2780f0d352553b1a5c177fe20afdea223bd1e405 Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 HADOOP-6818. Provides a JNI implementation of group resolution. Author: Devaraj Das Ref: CDH-2622 commit 562d6a6d79943f4c132e9db773898db533b4dbfd Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-1545. Add 'first-task-launched' to job-summary Author: Luke Lu Ref: CDH-2622 commit 4595403c6e7b1e594ea5759784aaa65eb6d46786 Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-2023 TestDFSIO read test may not read specified bytes. 
Author: Hong Tang Ref: CDH-2622 commit 4cbfcd923d102ce6bcccb5dcddc1ed124f42bb8f Author: Todd Lipcon Date: Wed Feb 9 17:20:29 2011 -0800 MAPREDUCE-2005. Improvements to TestDelegationTokenRenewal Reason: improve test printouts Author: Boris Shkolnik Ref: CDH-2622 commit 6cceb85a5f8743aaef4a98957e31f6930a013cdd Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-1961. ConcurrentModificationException when shutting down Gridmix Author: Hong Tang Ref: CDH-2622 commit 16d9cf021a1989467b1372d3c2a050e6c4606230 Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-339. JobTracker should give preference to failed tasks over virgin tasks so as to terminate the job ASAP if it is Author: Devaraj Das Ref: CDH-2622 commit 81bd8d5735c682d69d59712a7333a70d081a4216 Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-1936. Make Gridmix3 more customizable. Author: Hong Tang Ref: CDH-2622 commit 1b9bc9af319325bd26e1530ce18527ca8f74dafd Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-1778 CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable Author: Krishna Ramachandran Ref: CDH-2622 commit e0104169bfac10c2760fcf133b01fd2b710208cb Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-1868 Add read timeout on userlog pull Author: Krishna Ramachandran Ref: CDH-2622 commit a6b1ad67cf903e1e56963dcc64d7d7599321d386 Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-1850. Include job submit host information (name and ip) in jobconf and jobdetails display Author: Krishna Ramachandran Ref: CDH-2622 commit 4098a214be838238a9879f7f978e34e89b736986 Author: Todd Lipcon Date: Wed Feb 2 18:01:15 2011 -0800 HDFS-1626. Make block invalidate limit configurable Author: Tsz Wo (Nicholas) Sze Ref: CDH-2622 commit f1b4799fad93b4f02ee29ce5ef5fc217ff72e377 Author: Todd Lipcon Date: Wed Feb 9 14:53:16 2011 -0800 MAPREDUCE-2328. 
Add configs for memory-related configurations to mapred-default.xml Author: Yahoo Eng Ref: CDH-2622 commit f879e570e2fe88776d15de04a8597898d06f3f77 Author: Todd Lipcon Date: Thu Feb 10 14:27:45 2011 -0800 HDFS-1364. Makes long running HFTP-based applications do relogins if necessary. Author: Jitendra Pandey Ref: CDH-2622 commit ce9aa5ef9dfc5c4fb8e85f9e9e47e67a4b724296 Author: Todd Lipcon Date: Wed Feb 9 17:10:49 2011 -0800 CLOUDERA-BUILD. Increase Xmx for compiling fault injection tests Ref: CDH-2622 commit f19b644c987305988a36e7d6038b16a9768cb084 Author: Todd Lipcon Date: Wed Feb 9 17:31:35 2011 -0800 Amend MAPREDUCE-2096. MapTask SpillRecord usage doesn't need username. Author: Yahoo Eng Ref: CDH-2622 commit c76f57dcd993063cf960fb42e65219edd5230432 Author: Todd Lipcon Date: Wed Feb 9 17:31:34 2011 -0800 Amend MAPREDUCE-1100. Change log messages in ReduceTask from info to debug level Reason: reduces log size for large reduce tasks Author: Yahoo Eng Ref: CDH-2622 commit 674795fc9a60d06dbea41bd9d5a133439822a62b Author: Todd Lipcon Date: Thu Feb 10 14:13:46 2011 -0800 Amend HADOOP-6706. Improve retry behavior for RPC clients Author: Kan Zhang Ref: CDH-2622 commit bfa0b28baad26de8315ec1f9282728913863c3e7 Author: Todd Lipcon Date: Thu Feb 10 14:27:45 2011 -0800 Partial HADOOP-6965. Refactor getTGT and getRefreshTime out of anonymous class, add synchronized block around relogin Author: Jitendra Pandey Ref: CDH-2622 commit e8b759460ab487e98093292be7fa90afa65f47ec Author: Todd Lipcon Date: Thu Feb 10 14:27:45 2011 -0800 Partial HADOOP-6471. Use StringBuilder in StringUtils.join Author: Yahoo Eng Ref: CDH-2622 commit 3ef31bcc86610d496976b4de9ada82e73f47f162 Author: Todd Lipcon Date: Wed Feb 2 20:03:53 2011 -0800 HADOOP-7115. Reduces the number of calls to getpwuid_r and getpwgid_r, by implementing a cache in NativeIO. Author: Devaraj Das Ref: CDH-2622 commit d2032071037eb33c562d97b16e0cd291f4e3f23b Author: Todd Lipcon Date: Wed Feb 2 16:59:51 2011 -0800 MAPREDUCE-1521. 
Protection against incorrectly configured reduces Author: Mahadev Konar Ref: CDH-2622 commit 6bc623041a1c0d511250bcfdcae85a7b084b0d5f Author: Todd Lipcon Date: Wed Feb 2 19:35:12 2011 -0800 HDFS-1153. Verify dfsnodelist input for correctness Author: Ravi Phulari Ref: CDH-2622 commit bf655f10661132486cb40ee098fdedbbb5937892 Author: Todd Lipcon Date: Wed Feb 2 19:30:24 2011 -0800 Partial MAPREDUCE-2055. Cache counters in retired job info Does not apply entirety of upstream JIRA as described. Simply caches Counters in the retired job info. Author: Krishna Ramachandran Ref: CDH-2622 commit 3ae2cde7b036603b8aa19e2ab31994dd3209eded Author: Todd Lipcon Date: Wed Feb 2 18:35:44 2011 -0800 MAPREDUCE-1960. Add ability to limit size of jobconf Author: Mahadev Konar Ref: CDH-2622 commit 7429b6597999c1a867926b61d5075bdf85a1be6d Author: Todd Lipcon Date: Wed Feb 2 18:04:02 2011 -0800 Amend HDFS-457. Include new test TestDataNodeVolumeFailure Ref: CDH-2622 Author: Boris Shkolnik commit af1598cf2f8ce26c43f74b0be684662287e34095 Author: Todd Lipcon Date: Wed Feb 2 18:03:25 2011 -0800 HDFS-1101. TestDiskError should check all nodes in cluster for test case Reason: Test failure Author: Chris Douglas Ref: CDH-2622 commit 2c5115af8426e44b9de804b80fcc9502d64efadd Author: Todd Lipcon Date: Thu Feb 10 15:16:56 2011 -0800 MAPREDUCE-1118. Enhance the JobTracker web-ui to ensure tabular columns are sortable, also added a /scheduler servlet to CapacityScheduler for enhanced UI for queue information. Author: Krishna Ramachandran Ref: CDH-2622 commit ba185a27aa4bb1bd965e6aa32a9b5bf3e8388f91 Author: Todd Lipcon Date: Wed Feb 2 17:38:39 2011 -0800 MAPREDUCE-1872, MAPREDUCE-517. 
Capacity scheduler improvements plus minor framework changes to support - JobInProgress changes to support locality decisions - JobQueueJobInProgressListener.JobSchedulingInfo now has equals() method for Author: Arun Murthy Ref: CDH-2622 commit 83a6619c2656f543e046521f515d97fd70d647bb Author: Todd Lipcon Date: Wed Feb 2 17:37:15 2011 -0800 MAPREDUCE-1774. Additions to Herriot Testing to test Gridmix, Streaming, Task Controllers Includes: MAPREDUCE-1758 Building blocks for the herriot test cases MAPREDUCE-1827 [Herriot] Task Killing/Failing tests for a streaming job. MAPREDUCE-2053 [Herriot] Test Gridmix file pool for different input file sizes based on pool minimum size MAPREDUCE-2033 [Herriot] Gridmix generate data tests with various submission policies and different user resolvers. ... and others from YDH Reason: QA / YDH merge Ref: CDH-2622 commit 5dcc0777f30ae030e20e5e1e3512a0ed6a90e7fc Author: Eli Collins Date: Sun Feb 6 13:22:31 2011 -0800 DISTRO-90. FUSE can pick up the wrong libjvm.so. Reason: Bug Author: Eli Collins Ref: DISTRO-90 commit f8e6600fbdc454600990e4f3732462e9b56e0b1b Author: Eli Collins Date: Sun Feb 6 13:37:24 2011 -0800 MAPREDUCE-2256. FairScheduler fairshare preemption from multiple pools may preempt all tasks from one pool causing that pool to go below fairshare. You have a cluster with 600 map slots and 3 pools. Fairshare for each pool is 200 to start with. Fairshare preemption timeout is 5 mins. 1) Pool1 schedules 300 map tasks first 2) Pool2 then schedules another 300 map tasks 3) Pool3 demands 300 map tasks but doesn't get any slots as all slots are taken. 4) After 5 mins pool3 should preempt 200 map-slots. Instead of preempting 100 slots each from pool1 and pool2, the bug would cause it to preempt all 200 slots from pool2 (last started) causing it to go below fairshare. This is happening because the preemptTask method is not reducing the tasks left from a pool while preempting the tasks.
The above scenario could be an extreme case but some amount of excess preemption would happen because of this bug. Reason: Bug Author: Priyo Mustafi Ref: CDH-2593 commit cce41bfecdffd8f37b5a9ae571a827e8042b39c4 Author: Eli Collins Date: Sun Feb 6 13:12:41 2011 -0800 CLOUDERA-BUILD. tar file has incorrect permissions for jsvc and task-controller. Reason: Bug Author: Eli Collins Ref: CDH-2553 commit fa3b91e008607ff69bd2796f025680aacc97bd11 Author: Eli Collins Date: Sat Feb 5 16:21:19 2011 -0800 DISTRO-44. Hadoop core POM missing jackson dependency. Reason: Bug Author: Eli Collins Ref: DISTRO-44 commit f40f6bef0808d34b0632bd759b7916946b6a500c Author: Eli Collins Date: Sun Feb 6 14:01:28 2011 -0800 HADOOP-5489. hadoop-env.sh still refers to java1.5. Reason: Bug Author: Steve Loughran Ref: CDH-2588 commit e3356ca6f8a2ee616f610da19fc141d7578a905d Author: Andrew Bayer Date: Thu Jan 27 15:55:01 2011 -0800 CLOUDERA-BUILD. Changes to support CDH Mavenization. commit 7e7c0e2d4fe19559a728d2c0860f406124c578e3 Author: Todd Lipcon Date: Mon Jan 31 17:55:47 2011 -0800 Amend MAPREDUCE-1716. Fix test case to wait for up to 20 seconds to verify truncation Reason: truncation is done in a separate thread at JVM finish time, which may come after the job is complete Author: Todd Lipcon Ref: CDH-2579 commit f6ffedb4441ec43ef7d81fe483807115e98aca41 Author: Todd Lipcon Date: Wed Jan 26 13:31:56 2011 -0800 HADOOP-6882. Update the patch level of Jetty to 6.1.26 Reason: Address XSS and many other upstream bugs Author: Owen O'Malley Ref: CDH-2564 commit 545bcc1060833f76eab19fa0425f890cb3f9d2cb Author: Todd Lipcon Date: Fri Jan 28 13:39:43 2011 -0800 MAPREDUCE-2242. Fix environment escaping in LinuxTaskController Reason: Support env variables with "s Author: Todd Lipcon Ref: CDH-2572 commit c5df4748c04337af74ca80a84a03e15ba2de2f0e Author: Todd Lipcon Date: Thu Jan 27 13:01:24 2011 -0800 HDFS-1353. 
Remove getBlockLocations optimization that blew out LocatedBlocks response size Reason: Address OOME found by QA Author: Jakob Homan Ref: CDH-2573 commit 8bc90cb06955b191c5d4370ca75b3b14aabc9657 Author: Todd Lipcon Date: Fri Jan 28 14:33:31 2011 -0800 HADOOP-5050. TestDFSShell.testFilePermissions should not assume umask setting. Reason: test failure on machines with different umask Author: Jakob Homan Ref: CDH-2574 commit 0d4eb1a867620813affdfd3291cb618d6fce63ca Author: Todd Lipcon Date: Fri Jan 28 10:33:02 2011 -0800 HADOOP-7122. Shell commands leak Timers when timeout expires Reason: Thread leak seen on JT Author: Todd Lipcon Ref: CDH-2568 commit 2ad8c54fecae73213da7c74da9f90ba953f9f9c5 Author: Todd Lipcon Date: Wed Jan 26 17:30:18 2011 -0800 MAPREDUCE-2253. Servlets should specify content type Reason: Fix display in browsers Author: Todd Lipcon Ref: DISTRO-72 commit 51399a0f149292ee18138646488a8070c8b7f34c Author: Todd Lipcon Date: Tue Jan 25 15:03:29 2011 -0800 HADOOP-7118. Fix NullPointerException in Configuration.writeXml Reason: Bug fix Author: Todd Lipcon Ref: CDH-2558 commit be89980babbc50eb7e1ccce9b583fff0ae24cf80 Author: Tom White Date: Wed Jan 26 10:06:06 2011 -0800 MAPREDUCE-2082. Race condition in writing the jobtoken password file when launching pipes jobs Reason: security Author: Jitendra Nath Pandey Ref: CDH-2562 commit c5645ced5c2b32c0657ba3ca60643165c28173ff Author: Todd Lipcon Date: Fri Jan 14 00:42:43 2011 -0800 MAPREDUCE-1085. For tasks, "ulimit -v -1" is being run when user doesn't specify mapred.child.ulimit Reason: spurious errors in logs Author: Todd Lipcon Ref: CDH-2560 commit b02ac3f86f9d929316edd10855721b67459192ba Author: Todd Lipcon Date: Thu Jan 20 13:12:06 2011 -0800 MAPREDUCE-2277. Fix TestCapacitySchedulerWithJobTracker intermittent failure Reason: test failure Author: Todd Lipcon Ref: CDH-2547 commit 6b63d73a1917a6c0529158c3bb78ec2ec16ad7ce Author: Todd Lipcon Date: Thu Jan 20 16:05:01 2011 -0800 HDFS-1589. 
Dont start secure cluster with insecure ports Reason: security Author: Todd Lipcon Ref: CDH-2557 commit 8b4374bfa12b1a1ed8cc8e0ab209ad763becf791 Author: Todd Lipcon Date: Wed Jan 19 14:22:55 2011 -0800 HADOOP-3953. Implement sticky bit for directories in HDFS. Reason: security on /tmp Author: Jakob Homan Ref: CDH-2091 commit 2bec46c2f46e42a35a69fdbd6f37f8979599e83d Author: Todd Lipcon Date: Wed Jan 19 14:48:57 2011 -0800 Amend HADOOP-5643. Remove PermissionChecker class accidentally left around This class was supposed to be removed by HADOOP-5643 but accidentally was left in the tree. Unreferenced except in one place - now updated to refer to the new implementation. Reason: clean up - noticed during sticky bit backport Author: Todd Lipcon Ref: CDH-2091 commit 562be1407b9e3c2d8907daaa9500ac96364c9fa2 Author: Todd Lipcon Date: Tue Jan 18 10:12:08 2011 -0800 MAPREDUCE-2238. Avoid racy permissions handling Reason: leaving undeletable dirs in userlogs directory Author: Todd Lipcon commit 2df0683fe8b9a6f1c7dc9f9ec49697960b473add Author: Todd Lipcon Date: Tue Jan 18 09:46:30 2011 -0800 HADOOP-7110. Use JNI to implement chmod for performance Reason: fork can be rather slow, chmod is common Author: Todd Lipcon commit efda213ca9682c9ee555b6c9582eb039cfefc122 Author: Todd Lipcon Date: Tue Jan 18 09:43:52 2011 -0800 Revert "HADOOP-6304. Use java.io.File.set{Readable|Writable|Executable} where possible in RawLocalFileSystem" This reverts commit 13e93cafe8d4b1e8b741c1873118cdba0313a564. commit b715fdffb59ad674e16d31db09b75884ddd2e0fa Author: Tom White Date: Mon Jan 24 17:41:41 2011 -0800 HADOOP-5836. Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail Reason: Bug fix Author: Ian Nowland Ref: DISTRO-76 commit 516adbfc45e739130bdbb047e45f068a38e72988 Author: Todd Lipcon Date: Wed Jan 19 00:27:37 2011 -0800 HDFS-1330. Make RPCs to DataNodes timeout. 
Reason: Customer request Author: Hairong Kuang Ref: CDH-2044 commit 1e3ffff9722ebd775b870a4c914f202930bb525e Author: Todd Lipcon Date: Wed Jan 19 00:18:37 2011 -0800 HADOOP-6889. Make RPC to have an option to timeout. Reason: Customer request Author: Hairong Kuang Ref: CDH-2044 commit fadd26e431dc879d9611f22f2974d4eab30d7efa Author: Tom White Date: Fri Jan 7 16:06:03 2011 -0800 MAPREDUCE-1382. MRAsyncDiscService should tolerate missing local.dir Reason: Makes it possible for jobtracker and tasktracker to share config file and have different volumes. Author: Zheng Shao Ref: CDH-2395, DISTRO-36 commit eb118d65f792dd3947b886ea7f2c971556d496cf Author: Tom White Date: Tue Jan 18 13:35:06 2011 -0800 MAPREDUCE-787. -files, -archives should honor user given symlink path Reason: bug fix Author: Amareshwari Sriramadasu Ref: CDH-2538 commit 2b0e1289ccbdb9c6837e4ab11fdf73fa8980571c Author: Tom White Date: Tue Jan 18 13:33:40 2011 -0800 MAPREDUCE-572. If #link is missing from uri format of -cacheArchive then streaming does not throw error. Reason: bug fix Author: Amareshwari Sriramadasu Ref: CDH-2538 commit a144f415c0e14d1b4d42c72ccf5c97dc8f8423e8 Author: Todd Lipcon Date: Tue Jan 18 14:17:56 2011 -0800 Amend HADOOP-6539. Roll back some doc changes that snuck in from trunk Reason: referenced features not backported into CDH3 Author: Todd Lipcon Ref: CDH-2541 commit 5ebec5b74ea0b6fe9270cc40f770bf4cf4f7d4a7 Author: Todd Lipcon Date: Thu Jan 13 17:33:24 2011 -0800 HADOOP-7093. Servlets should default to text/plain. Reason: fix /stacks and /metrics to be usable again Author: Todd Lipcon Ref: DISTRO-72 commit 185d654adfa40db3978a2f552feec95748589c89 Author: Todd Lipcon Date: Sat Dec 18 18:28:04 2010 -0800 HDFS-1560. DataNode should set permissions on its data dirs rather than failing to start. 
Also, should default to 700 Reason: Easier setup, better security Author: Todd Lipcon Ref: CDH-2530 commit 390cedb3ba0ec9bf7e4859f89c3e10dd40be2763 Author: Todd Lipcon Date: Fri Jan 14 17:39:21 2011 -0800 Amend MAPREDUCE-1092. Enable asserts for tests by default Reason: reapply patch accidentally reverted by Herriot merge Author: Todd Lipcon Ref: CDH-520 commit 61bf38c1c0b31ef18b93ed225c8367ab4d5d7f96 Author: Todd Lipcon Date: Tue Jan 11 15:46:43 2011 -0800 DISTRO-73. Fix filesystem leak when userlog location has different FS URI than JT No upstream JIRA since this was fixed upstream by MAPREDUCE-157 Reason: Thread leak reported on cdh-user Author: Todd Lipcon Ref: DISTRO-73 commit 5c54c0cae529a17fe30d17642b868f2609c0731b Author: Todd Lipcon Date: Thu Jan 13 11:54:25 2011 -0800 Amend MAPREDUCE-1784. Include TestIFile unit test Reason: missed in prior commit Author: Eli Collins Ref: CDH-862 commit b48ee52a2c451a673765c67141448fa9cdc7e37a Author: Todd Lipcon Date: Thu Jan 13 11:56:04 2011 -0800 HADOOP-7101. UserGroupInformation.getCurrentUser() fails when called from non-Hadoop JAAS context Reason: Hadoop access fails running from within JMX-created JAAS context Author: Todd Lipcon Ref: CDH-2525 commit 329ae61a7987d576c0d73a395f773fa820594ea4 Author: Eli Collins Date: Fri Jan 7 10:47:45 2011 -0800 HADOOP-7089. Fix link resolution logic in hadoop-config.sh. The link resolution logic in bin/hadoop-config.sh fails when executed via a symlink, from the root directory. We can replace this logic with cd -P and pwd -P, which should be portable across Linux, Solaris, BSD, and OSX. Reason: Bug Author: Eli Collins Ref: DISTRO-9 commit 0f0f7b996033179d70f3750b3d1d0ff4a1b1aef3 Author: Eli Collins Date: Wed Jan 5 11:32:26 2011 -0800 CLOUDERA-BUILD. Fix documentation urls that use "current". Reason: Bug Author: Eli Collins Ref: CDH-2405 commit bd69ffce6f04c6d4f3685f55403b5d57191057d9 Author: Todd Lipcon Date: Tue Jan 11 15:53:13 2011 -0800 MAPREDUCE-1178.
Fix ClassCastException in MultipleInputs by adding a DelegatingRecordReader. Reason: bug fix Author: Amareshwari Sriramadasu and Jay Booth. Ref: CDH-2513 commit 6ff69b095f390fba1e8ba3c315a93889a94de481 Author: Todd Lipcon Date: Tue Jan 11 15:56:18 2011 -0800 MAPREDUCE-655. Change KeyValueLineRecordReader and KeyValueTextInputFormat to use new mapreduce api. Reason: Required for MultipleInputs Author: Amareshwari Sriramadasu Ref: CDH-2513 commit c1ec4018591d3e2bbb6fa8f664f9355a76e94ad5 Author: Todd Lipcon Date: Tue Jan 11 15:48:01 2011 -0800 MAPREDUCE-369. Change org.apache.hadoop.mapred.lib.MultipleInputs to use new mapreduce API. Amended to not deprecate the old API. Reason: Customer request, low risk Author: Amareshwari Sriramadasu. Ref: CDH-2513 commit de6b20455e53435d6079b0ed9b0a005bc0c435ff Author: Konstantin Boudnik Date: Tue Jan 11 13:43:02 2011 -0800 HADOOP-7072 Remove java5 dependencies from build Description: Reason: test is affected. Author: cos Ref: CDH-2485 commit 5b2e26fd1cfa592931dc9606d6cb81aaf9a5712d Author: Tom White Date: Mon Jan 10 16:31:24 2011 -0800 HADOOP-5170. Reverted: "Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide" Reason: Patch not accepted upstream. See MAPREDUCE-698 and MAPREDUCE-704. Author: Tom White Ref: CDH-789 commit 51a15afdd3f2b33e9c6573bfa9d002034edaaaf7 Author: Tom White Date: Mon Jan 10 13:49:10 2011 -0800 HADOOP-5476. calling new SequenceFile.Reader(...) leaves an InputStream open, if the given sequence file is broken Reason: Fix file handle leak, as requested on Hive list. Author: Michael Tamm Ref: DISTRO-28 commit bda05051c5ad4c56d210427bbe6445c3db66573e Author: Todd Lipcon Date: Fri Jan 7 14:20:01 2011 -0800 MAPREDUCE-2234. If Localizer can't create task log directory, it should fail on the spot. 
Reason: Make common source of support tickets easier to diagnose Author: Todd Lipcon Ref: CDH-2500 commit b57d9d0a60f8d871511750465ad94dd18a103656 Author: Todd Lipcon Date: Sat Dec 18 15:44:25 2010 -0800 MAPREDUCE-2219. Fix JT startup to not require mapred.system.dir inside a dir that it owns Reason: Easier permissions Author: Todd Lipcon Ref: CDH-2499 commit b5f1e39c0561d262829ae4cce546773a418db96e Author: Todd Lipcon Date: Fri Jan 7 14:07:54 2011 -0800 HADOOP-7070. Delegate calls up to parent UserGroupInformation Reason: Fix login behavior underneath glassfish or other JAAS-using containers Author: Todd Lipcon Ref: DISTRO-66 commit a3421bf550672c6615541e1f73a5e0add9fcc158 Author: Todd Lipcon Date: Fri Jan 7 13:59:13 2011 -0800 HDFS-1542. Add test for HADOOP-7082, a deadlock writing Configuration to HDFS. Author: Todd Lipcon Ref: CDH-2498 commit d0fcd663498ab6af0ae550ea6ace527ac7f7eae3 Author: Todd Lipcon Date: Fri Jan 7 13:56:43 2011 -0800 HADOOP-7082. Configuration.writeXML should not hold lock while outputting. Reason: Avoid deadlock submitting jobs Author: Todd Lipcon Ref: CDH-2498 commit 5d85605d7f324d9bb5751bf9e1733170dd97a911 Author: Tom White Date: Thu Jan 6 12:21:40 2011 -0800 CLOUDERA-BUILD. Part of MAPREDUCE-157 to fix doubly-escaped job history links Reason: Bug fix Author: Tom White Ref: CDH-2283 commit 73c38ae4211b732cae575d7f52f233e2cf6f909e Author: Todd Lipcon Date: Wed Jan 5 14:39:10 2011 -0800 MAPREDUCE-1734. Un-deprecate the old MapReduce API in the 0.20 branch. Reason: Old APIs will remain through at least 0.23 Author: Todd Lipcon Ref: CDH-2494 commit 4882770efb2a9eb52ae51d5b35e6ba3a2737c44e Author: Todd Lipcon Date: Sun Dec 19 17:38:57 2010 -0800 MAPREDUCE-1906. Allow heartbeat interval minimum to be configured Author: Todd Lipcon Ref: CDH-2319 commit f9f9182ecf6d208fd28b23941b5e851e1efedec7 Author: Eli Collins Date: Mon Jan 3 21:35:29 2011 -0800 HADOOP-6578. Configuration should trim whitespace around a lot of value types. 
Reason: Improvement Author: Michele Catasta Ref: CDH-2266 commit 4214d3e60326a9b41e84f85895aca325d634c304 Author: Konstantin Boudnik Date: Mon Dec 20 12:16:32 2010 -0800 CDH-2381. org.apache.hadoop.cli.TestCLI.testAll (from TestCLI) failing in golden CDH3-Hadoop Hudson job Description: Reason: test is affected. Author: cos Ref: CDH-2381 commit 4ad53f3de801a1a670d658d4d933d9576b99445c Author: Eli Collins Date: Fri Dec 17 00:00:55 2010 -0800 MAPREDUCE-1938. Ability for having user's classes take precedence over the system classes for tasks' classpath. It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. Reason: New feature Author: Devaraj Das Ref: DISTRO-64 commit b0ed02a3d621bbf994f8fb5dc1d86a451afe490d Author: Tom White Date: Mon Dec 13 15:00:20 2010 -0800 MAPREDUCE-1699. JobHistory shouldn't be disabled for any reason Reason: Bug Author: Arun C Murthy Ref: CDH-1691 commit ba2c7a5b99915ca1431e3024dd80ac359c8005a1 Author: Tom White Date: Tue Dec 14 17:41:09 2010 -0800 MAPREDUCE-1853. MultipleOutputs does not cache TaskAttemptContext Reason: Bug Author: Torsten Curdt Ref: CDH-2010 commit 43fb37a6b9693003cc9ea1161bc080e5309b1973 Author: Eli Collins Date: Thu Dec 9 08:54:22 2010 -0800 MAPREDUCE-1621. Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output. If TextOutputReader.readKeyValue() has never successfully read a line, then its bytes member will be left null. Thus when logging a task failure, PipeMapRed.getContext() can trigger an NPE when it calls outReader_.getLastOutput(). Reason: Bug Author: Amareshwari Sriramadasu Ref: CDH-855 commit af9ef1fcde9aa7ed6d84481837dd5c3e6e4ecc14 Author: Todd Lipcon Date: Tue Dec 7 18:29:42 2010 -0800 MAPREDUCE-1784. IFile should check for null compressor. 
Reason: Avoid NPE Author: Eli Collins Ref: CDH-862 commit 50796a1b13f77ef5c2e098f6a651bf52c05cd2f7 Author: Alejandro Abdelnur Date: Tue Nov 30 14:03:00 2010 +0800 CDH-2234 adding Oozie needed config to Hadoop config example-confs/ Description: adding Oozie needed config to Hadoop config example-confs/ Reason: to enable zero config for Oozie out of the box Author: Alejandro Ref: CDH-2234 commit 39b1d616bd1d7cf88cb057d1fd70b0d9b17a9992 Author: Todd Lipcon Date: Wed Nov 24 14:19:20 2010 -0800 Amend HADOOP-6978. Add AC_SYS_LARGEFILE to native build to fix issue with large files Author: Owen O'Malley Ref: CDH-2009 commit 552ebe400b6d94b02d8a3ffebb61b433f7e13aa1 Author: Eli Collins Date: Fri Nov 12 23:18:51 2010 -0800 HDFS-1250. Namenode accepts block report from dead datanodes. Reason: Bug Author: Suresh Srinivas Ref: CDH-2277 commit fa4ca629131059ade47618d0ed201c4ddc3abe72 Author: Eli Collins Date: Fri Nov 12 19:46:24 2010 -0800 HADOOP-6813. Add a new newInstance method in FileSystem that takes a "user" as argument. Reason: Improvement Author: Devaraj Das Ref: CDH-648 commit f8b9f2f2e062b33c752c53e5aa3f871f08fa359c Author: Eli Collins Date: Wed Nov 10 14:58:26 2010 -0800 HADOOP-6985. Suggest that HADOOP_OPTS be preserved in hadoop-env.sh.template. Reason: Improvement Author: Ramkumar Vadali Ref: CDH-2271 commit 78b9e608a82c69e59950d4be585fc17e79c8eeca Author: Eli Collins Date: Wed Nov 3 16:32:40 2010 -0700 CLOUDERA-BUILD. Remove the MySQL Connector/J library. See SQOOP-97. commit 835e4b2f8d5f5b8de9eaaf6b2585a62224574323 Author: Todd Lipcon Date: Tue Oct 19 15:19:34 2010 -0700 HDFS-1464. Fix reporting of 2NN address when dfs.secondary.http.address is default Reason: regression due to HDFS-1080 Author: Todd Lipcon Ref: CDH-2226 commit 62a9a1327165a1a363639c2f21b79be61616f7b3 Author: Todd Lipcon Date: Thu Oct 14 16:21:34 2010 -0400 HADOOP-6663. 
Fix decompression of empty compressed files Author: Kang Xiao Ref: CDH-2215 commit 98c55c28258aa6f42250569bd7fa431ac657bdbd Author: Todd Lipcon Date: Fri Oct 8 17:07:56 2010 -0700 CLOUDERA-BUILD. Fix ownership of .out files when starting daemons as root Author: Todd Lipcon commit 16ba98db9791a1a24aff066ae884c64abb4b589a Author: Todd Lipcon Date: Fri Oct 8 14:10:50 2010 -0700 CLOUDERA-BUILD. Use su instead of sudo for dropping root privileges. This fixes an issue on EC2, where some AMIs don't properly support sudo. commit 9616bfbd1f2dd2686a29f47c62fff08d955a7ac8 Author: Todd Lipcon Date: Thu Oct 7 23:11:25 2010 -0700 HADOOP-6995. Allow wildcards to be used in ProxyUsers configurations Author: Todd Lipcon Ref: CDH-648 commit 374e10963329ec08d861774d056d1c5ee673f4c8 Author: Todd Lipcon Date: Fri Oct 8 12:43:11 2010 -0700 Amend MAPREDUCE-2096. Fix IndexOutOfBoundsException truncating logs when tasks produced no log output Author: Todd Lipcon Ref: CDH-648 commit 49e808c8751615fe154061d456f171f8bb582504 Author: Todd Lipcon Date: Thu Oct 7 18:10:47 2010 -0700 CLOUDERA-BUILD. Add symlinks to built HADOOP_HOME like hadoop-core.jar -> hadoop-core-0.20.2+NNN.jar This helps other projects create symlinks into the installed hadoop-home without having to declare a dependency on a particular patchlevel of the jar. commit 4904e0fbd60c5f043bb1451ca4e3be012be8cf59 Author: Todd Lipcon Date: Thu Oct 7 14:50:46 2010 -0700 Amend HDFS-1260. Add some sanity checking on FSDataset Reason: Help debug errors seen in the wild Author: Todd Lipcon Ref: CDH-913 commit b919f0a99b2ac3b48a32b0906d19c2b306f7a554 Author: Todd Lipcon Date: Wed Oct 6 15:38:19 2010 -0700 CLOUDERA-BUILD. Don't use HADOOP_IDENT_STRING to set user This was a misuse of this variable - it should only determine the name of the log/pid files commit eed3bc71002f4cbf3fd0aaeef7016cb80cf61a4a Author: Todd Lipcon Date: Wed Oct 6 18:05:30 2010 -0700 CLOUDERA-BUILD. 
Amend bin/hadoop changes to properly start tasktracker and jobtracker with sudo commit 8bb561e0dc46995cca059b5de334b3b790b8ae17 Author: Todd Lipcon Date: Wed Oct 6 17:29:17 2010 -0700 Amend MAPREDUCE-2096. fsError() when called from within MR should not do authorization Reason: Fix incorrect authorization exception Author: Todd Lipcon / Devaraj Das Ref: CDH-648 commit ce67cd87f21543348ca5c137dee3ff0dc7f338dd Author: Aaron T. Myers Date: Tue Oct 5 21:03:50 2010 -0700 HADOOP-6988. Add support for reading multiple hadoop delegation token files Author: Aaron T. Myers Reason: So Hue can submit jobs authenticated against both the JT and NN. Ref: CDH-648 commit ca36717c2b3bc9d610ba2a049b98f798b9d8c1c1 Author: Todd Lipcon Date: Tue Oct 5 15:33:11 2010 -0700 CLOUDERA-BUILD. No need to restrict jsvc usage to secure clusters Reason: It is simpler to always start the DN as root and let it drop privileges when jsvc is available. This is OK even if kerberos auth is off. Author: Todd Lipcon commit 60a6eece06bde26516649bdcbed4096dd734503e Author: Todd Lipcon Date: Mon Oct 4 18:01:39 2010 -0700 CLOUDERA-BUILD. Send SecurityAudit logs to the console unless running through hadoop-daemon.sh Reason: Fixes issue where clients would try to write SecurityAuth.audit logs Author: Todd Lipcon commit dccf120c3796312b1a67481daaa0366b13d471fe Author: Todd Lipcon Date: Mon Oct 4 16:32:24 2010 -0700 CLOUDERA-BUILD. Amend Task Controller for sbin-located task-controller Reason: earlier commit moved task-controller to an sbin directory, this updates the java side Author: Todd Lipcon commit c7f9a8ece8b63fa571420b0c1e40044177b8e42d Author: Todd Lipcon Date: Mon Oct 4 16:11:04 2010 -0700 CLOUDERA-BUILD. Redo hadoop and hadoop-daemon.sh scripts to be more compatible with packaging Author: Todd Lipcon commit d56be41bb9648f721ba6714827ccfbf503af7d84 Author: Todd Lipcon Date: Mon Oct 4 12:32:09 2010 -0700 Amend MAPREDUCE-2103. 
task-controller does not require setgid permissions commit 7689035d99d720f374c543697016ef23fec7f4f8 Author: Todd Lipcon Date: Mon Oct 4 11:49:36 2010 -0700 CLOUDERA-BUILD. Update example secure config commit eebf85c655d085b5cc49860d5ac59078a99e2349 Author: Eli Collins Date: Mon Oct 4 12:59:03 2010 -0700 DISTRO-29. Switch Hue thrift plugin port to 10090 to avoid conflicting with HBase. Reason: Improvement Author: Eli Collins Ref: CDH-1815. commit a93572183d61bcc9523206450a017c8908795009 Author: Todd Lipcon Date: Sun Oct 3 22:52:24 2010 -0700 Amend MAPREDUCE-2096. Fix issue where JVM authorization was incorrectly triggered Reason: TaskRunner calls TaskTracker.reportDiagnosticInfo directly at one point, so the current user is the MR user rather than the job's user. This patch changes the TaskRunner to call an unauthorized version of the function. Author: Todd Lipcon Ref: CDH-648 commit 7fb2c9a498db04a93aeee6fe7f2beb4abdf7489f Author: Todd Lipcon Date: Sun Oct 3 22:51:21 2010 -0700 MAPREDUCE-2103. task-controller shouldn't require o-r permissions Author: Todd Lipcon Ref: CDH-648 commit 9a17aaf708514474dff8be5706c798b4c1d5199f Author: Todd Lipcon Date: Sun Oct 3 18:00:24 2010 -0700 CLOUDERA-BUILD. jsvc and task-controller should install into a platform-specific dir commit 766c6c6e77514164afbd5f14ca171419106d93de Author: Todd Lipcon Date: Sun Oct 3 17:45:22 2010 -0700 CLOUDERA-BUILD. do-release-build should build task controller commit 81762d84ddc11fb5268c2ae92feb47d9e1197f1a Author: Eli Collins Date: Mon Sep 20 09:37:25 2010 -0700 HDFS-1377. Quota bug for partial blocks allows quotas to be violated. There may be a delta in FSDirectory#replaceNode even with identical blocks because INode#diskspaceConsumed rounds up the size of the last block if newnode is under construction. This causes us to incorrectly reduce the space consumed for quota accounting.
Looking at the uses of this function, oldnode and newnode should always have the same blocks; therefore we should not expect a delta here. Reason: Bug Author: Eli Collins Ref: CDH-2092 commit 78625c0dfb4e0f819f79ce29d215097e87790012 Author: Todd Lipcon Date: Wed Sep 29 22:44:25 2010 -0700 HADOOP-6408. Add a servlet at /conf to display running configuration Reason: Easier debugging and support Author: Todd Lipcon Ref: CDH-2175 commit 91fa1dfdd74ebac1e88da1d3adb644cf5fe84e7a Author: Todd Lipcon Date: Wed Sep 29 00:44:57 2010 -0700 HADOOP-6496. HttpServer sends wrong content-type for CSS files (and others) Author: Todd Lipcon Reason: Fixes styling on web UIs commit 9309cf6f1851cc1b379028235b79cc2cf9fe1774 Author: Bruno Mahé Date: Mon Sep 27 19:43:02 2010 -0700 DISTRO-38. Autotools cannot find libssl on Fedora Description: Some GNU/Linux distributions have changed the DSO-linking semantics of the gcc compiler. Previously ld would attempt to implicitly satisfy link requirements and would therefore add libcrypto automatically when linking to libssl. On these platforms the dependency on libcrypto must now be stated explicitly when linking to libssl. See https://fedoraproject.org/wiki/Features/ChangeInImplicitDSOLinking and https://fedoraproject.org/wiki/UnderstandingDSOLinkChange Reason: Bug Author: Bruno Mahé Ref: DISTRO-38 commit daa2fd5e76c63c9d9efa11225383fd5496442862 Author: Bruno Mahé Date: Mon Sep 27 16:53:47 2010 -0700 CDH-2137. Jsvc requires the architecture flag to be passed to the link command Reason: Bug Author: Bruno Mahé commit 66e1ba8787ef26f68cc3ec125efd85a776748c36 Author: Todd Lipcon Date: Wed Sep 29 13:13:30 2010 -0700 Amend MAPREDUCE-2096. Rebootstrap native Reason: Previous libtoolize wasn't run with --copy, so a broken link was in the repo Author: Todd Lipcon Ref: CDH-648 commit 90167ef041f15f351ac6357212c477c682373e05 Author: Todd Lipcon Date: Mon Sep 27 16:38:35 2010 -0700 HADOOP-6907, HADOOP-6938, HADOOP-6905. Fix RPC client behavior to use a per-connection configuration.
Author: Kan Zhang Ref: CDH-648 commit 3f2759c884c496ef71a75db9d436ebfe61e04111 Author: Todd Lipcon Date: Mon Sep 27 22:35:31 2010 -0700 MAPREDUCE-1288. DistributedCache may localize a private file for multiple users Reason: bug fix when multiple users add the same "private" file to their distributed caches Author: Devaraj Das Ref: CDH-648 commit c109efb9579e830587e5f7c1762c816d5d241b71 Author: Todd Lipcon Date: Mon Sep 27 16:10:43 2010 -0700 HDFS-1301. TestHDFSProxy needs to use the server side conf for ProxyUser settings. Reason: Fix failing unit test after HADOOP-6815 application Author: Boris Shkolnik Ref: CDH-648 commit a1cdd7b028bfd5aaf6bdbfe18122b9a0fb44ed12 Author: Todd Lipcon Date: Fri Sep 17 10:50:38 2010 -0700 CLOUDERA-BUILD. Upgrade Jackson to 1.5.2 to avoid conflicts with Avro and HBase Author: Todd Lipcon commit 11d842c61eb63c156e1c3f753d795868bbd2fa0a Author: Todd Lipcon Date: Thu Sep 23 13:03:53 2010 -0700 HADOOP-6815. refreshSuperUserGroupsConfiguration should use server side configuration for the refresh Author: Boris Shkolnik Ref: CDH-648 commit 6b3856a94ca4748cc8e891cdd11473c03f821ee4 Author: Todd Lipcon Date: Mon Sep 6 12:54:33 2010 -0700 MAPREDUCE-2096. Secure local filesystem IO from symlink vulnerabilities Reason: security vulnerability that could be exploited to gain access to other users' job credentials, task output, etc. Author: Todd Lipcon Ref: CDH-2009 commit 0b213def5dbb9dc7a90009a3446a913ea15f5ee7 Author: Aaron T. Myers Date: Fri Sep 24 11:47:16 2010 -0700 HADOOP-6951. Distinct minicluster services (e.g. NN and JT) overwrite each other's service policies Description: Make ServiceAuthorizationManager's map of service ACLs instance-specific, instead of static. Reason: To make HUE's tests work against CDH3. Author: Aaron T. Myers Ref: CDH-648 commit 85565602b4cebbd91829a0d434e86edd8990fcbc Author: Eli Collins Date: Mon Sep 20 22:14:32 2010 -0700 DISTRO-32. Make the default example configuration support Hue.
Reason: Improvement Author: Eli Collins Ref: CDH-1815 commit 0248b41179a0baf9dd7e4120137f0c24b7251e95 Author: Eli Collins Date: Tue Sep 21 13:54:40 2010 -0700 DISTRO-1. Add /usr/lib/jvm/default-java to HADOOP_HOME detection. Reason: Improvement Author: Eli Collins Ref: CDH-1979 commit 55958019974d56fb1b66e209b49c22efe4a4aa95 Author: Todd Lipcon Date: Thu Sep 16 16:52:39 2010 -0700 CLOUDERA-BUILD. Change pom templates to use com.cloudera.hadoop groupId commit 6931d93bec73254f13ba08cbe49589a747eb399d Author: Todd Lipcon Date: Wed Sep 8 21:51:47 2010 -0400 HADOOP-6946. SecurityUtil's ticket-fetching should call UGI.getCurrentUser rather than directly accessing JAAS This fixes a bug where a daemon could call login() and thus set the loginUser(), and then still have a null Subject, leading to an inability to fetch TGTs. This impacted, for example, the "-checkpoint force" start-up option of the 2NN. Reason: Fix 2NN startup with forced checkpoint Author: Todd Lipcon Ref: CDH-648 commit 5fe725b1a48326bf606dadfc636586904aa861c4 Author: Todd Lipcon Date: Mon Sep 6 11:41:10 2010 -0700 HDFS-1378. Track and report file offsets in cases of edit log replay failure. Author: Todd Lipcon commit c3b6e1fadf01e1955fff7361cb7872ff4fd997ab Author: Todd Lipcon Date: Wed Sep 15 14:47:45 2010 -0700 Amend HADOOP-6656. Renewal thread should shut down if it fails to renew Reason: fixes tight infinite loop that heavily loads KDC Author: Todd Lipcon commit 593f3831671202afef2555243f37ca8f7ac2c46c Author: Todd Lipcon Date: Wed Sep 8 14:30:44 2010 -0700 Amend HDFS-895. Fix races between close() and sync() commit a9adf89fd17aa3199c4c4f26d7a2d5f8ccffc84d Author: Todd Lipcon Date: Fri Sep 10 16:15:12 2010 -0700 Amend HADOOP-6539. Fix docs to remove mention of sticky bit feature not backported Author: Todd Lipcon Ref: CDH-648 commit 4da1b0da8e176f6d7cf5bdc13786f37e254b6eda Author: Todd Lipcon Date: Fri Sep 10 15:45:34 2010 -0700 HDFS-1387. 
Update HDFS permissions guide to reflect security Reason: documentation Ref: CDH-648 Author: Todd Lipcon commit 10db8fc860cd1c5de28d204b0efecb37476f0483 Author: Todd Lipcon Date: Thu Sep 16 15:04:18 2010 -0700 HDFS-1404. Incorrect logic in TestNodeCount causes test failures Reason: Fix occasional red build Author: Todd Lipcon commit 283d6b8d3d1c0ffece93bdf4046b09972b0f44a3 Author: Eli Collins Date: Thu Sep 16 00:05:09 2010 -0700 HADOOP-6950. Suggest that HADOOP_CLASSPATH should be preserved in hadoop-env.sh.template. Reason: Improvement Author: Philip Zeyliger Ref: CDH-2135 commit b7679d80577d1d3625f520fc01787b4f75faab1d Author: Todd Lipcon Date: Wed Sep 15 22:43:18 2010 -0700 MAPREDUCE-2073. TestTrackerDistributedCacheManager should be explicit about test environment requirements Reason: Assist testing Author: Todd Lipcon Ref: CDH-648 commit 892b49d1fd8725323dfbbb19269ec16debe05c57 Author: Eli Collins Date: Wed Sep 15 20:08:16 2010 -0700 HDFS-1267. fuse-dfs does not compile. Reason: Bug Author: Devaraj Das Ref: CDH-2134 commit 98f1914cc0c6f91f8c0e3aa8cea8e7609b49c901 Author: Eli Collins Date: Wed Sep 15 20:00:17 2010 -0700 HDFS-1000. Updates libhdfs to the new API for UGI. Reason: Bug Author: Devaraj Das Ref: CDH-648 commit 5966f146fdb0202c0ffd66d3ec3f0c7c4def6afe Author: Eli Collins Date: Wed Sep 15 19:55:36 2010 -0700 Revert "HDFS-1000. libhdfs needs to be updated to use the new UGI" Description: This is being reverted to apply a newer version of the patch. Author: Devaraj Das Ref: UNKNOWN commit d0b28bf2a7ebeff419c7226310aaff7de290af22 Author: Eli Collins Date: Fri Sep 10 09:41:32 2010 -0700 HADOOP-6881. The efficient comparators aren't always used except for BytesWritable and Text. Reason: Bug Author: Owen O'Malley Ref: CDH-2112 commit cdb501c28dcdeec73ccf92a886bf943f665a5693 Author: Todd Lipcon Date: Fri Sep 3 17:09:58 2010 -0700 HDFS-446. Improvements to Offline Image Viewer. 
Author: Jakob Homan Ref: CDH-2106 commit 83da6170d68e29c1ae7881c2606af59a2145a8aa Author: Todd Lipcon Date: Fri Sep 3 17:09:05 2010 -0700 HDFS-461. Tool to analyze file size distribution in HDFS. Author: Konstantin Shvachko Ref: CDH-2106 commit dc0e28f08c2df37a6b99614a5c764fc4037032a0 Author: Todd Lipcon Date: Fri Sep 3 17:03:28 2010 -0700 HADOOP-5752. Add a new hdfs image processor, Delimited, to oiv. Author: Jakob Homan Reason: Hue Headlamp app Ref: CDH-2106 commit 7362cada95bd07ff3b034f5c7fb15b42365c2d06 Author: Todd Lipcon Date: Fri Sep 3 16:55:59 2010 -0700 HADOOP-5467. Add offline image viewer tool for HDFS filesystem images Author: Jakob Homan Reason: Necessary for Hue Headlamp application Ref: CDH-2106 commit b94821f874983e64c78fc93d95539a4f262dca78 Author: Todd Lipcon Date: Fri Sep 3 15:56:19 2010 -0700 HADOOP-6939. Fix inconsistent lock ordering in AbstractDelegationTokenSecretManager Reason: Fix potential deadlock Author: Todd Lipcon Ref: CDH-648 commit ae58865f3d65faa78707c79536c16e5b7ce40c16 Author: Todd Lipcon Date: Thu Sep 2 16:44:47 2010 -0700 MAPREDUCE-2051. Add a fair share scheduler system test Reason: Helps identify deadlocks or races in fair scheduler Author: Todd Lipcon Ref: CDH-1823 commit b839ebbb2f517eb57930dbe8ed40afc5307dbe3a Author: Eli Collins Date: Thu Sep 2 16:07:47 2010 -0700 MAPREDUCE-1280. Eclipse Plugin does not work with Eclipse Ganymede (3.4). Reason: Bug Author: Alex Kozlov Ref: CDH-537 commit 62661841b0687c431f4a066323b8ebb959b90612 Author: Todd Lipcon Date: Wed Sep 1 09:51:41 2010 -0700 DISTRO-27. Fix CombineFileInputFormat incompatible API change - Revert CombineFileInputFormat to branch-0.20 r990003 - Reapply following patches to old-API CombineFileInputFormat: - MAPREDUCE-1480. Apply more correct progress indication to old-API CombineFileRecordReader - MAPREDUCE-1423. 
Improve performance of CombineFileInputFormat when multiple pools are configured - Resuscitate old-API test for CombineFileInputFormat Author: Todd Lipcon et al Reason: Fix hive integration issue Ref: DISTRO-27 commit e8d93d35b92d602d8095657bb08a949bfb5aeea8 Author: Eli Collins Date: Mon Aug 30 11:14:20 2010 -0700 HADOOP-5861. s3n files are not getting split by default. Reason: Bug Author: Tom White Ref: CDH-2011 commit 527b0ee624e8a02b357f2a2a1a31fa798f832d35 Author: Eli Collins Date: Fri Aug 27 16:26:17 2010 -0700 HADOOP-6925. BZip2Codec incorrectly implements read(). Description: HADOOP-4012 added an implementation of read() in BZip2InputStream that doesn't work correctly when reading bytes > 0x80. This causes EOFExceptions when working with BZip2 compressed data inside of sequence files in some datasets. Reason: Bug Author: Todd Lipcon Ref: CDH-2068 commit 8f374b1eff2a54fd05590b935c3179c9b686fc0b Author: Eli Collins Date: Fri Aug 27 09:30:42 2010 -0700 HADOOP-6928. Fix BooleanWritable comparator in 0.20. Description: The RawComparator for BooleanWritable was fixed as part of HADOOP-5699 in 0.21 and trunk. The fix should be pushed back into 0.20. Reason: Bug Author: Owen O'Malley Ref: CDH-2063 commit 0dee7a8262a12b12e448a4342b636842646c16d0 Author: Eli Collins Date: Thu Aug 26 20:15:54 2010 -0700 HADOOP-6833. IPC leaks call parameters when exceptions thrown. Reason: Bug Author: Todd Lipcon Ref: CDH-2063 commit e7c81789d095a30fb8abf93557d10b84ea66eaea Author: Todd Lipcon Date: Mon Jun 28 13:37:33 2010 -0700 CLOUDERA-BUILD. Add sample configuration for a secure cluster based on YDH's sample Ref: CDH-648 commit fc5270e00c648eb20737918eb689a0d4c4200e98 Author: Todd Lipcon Date: Wed Aug 25 14:14:13 2010 -0700 Amend HDFS-1260. Fix case where FSDataset's volume map could become inconsistent with disk storage commit 5e76abac366a112c5d221750332ba2c272f319d0 Author: Todd Lipcon Date: Wed Aug 25 12:14:02 2010 -0700 MAPREDUCE-2034. 
Fix TestSubmitJob to verify actual IOException text commit ab1f3a96c00e2eb53569c1ba682d73ed10bfb4b5 Author: Todd Lipcon Date: Wed Aug 25 11:33:35 2010 -0700 Amend HADOOP-6762. Fix gridmix test failures when JobMonitor RPC is interrupted Reason: HADOOP-6762 added a new exception cause when outbound RPCs are interrupted. This patch fixes gridmix to be aware of InterruptedExceptions. commit ecc1a3b745384b0f925cb6efc7b6775240ad9195 Author: Todd Lipcon Date: Mon Aug 2 17:47:05 2010 -0700 HDFS-1164. Fix problem in TestHdfsProxy when user running tests doesn't belong to 'users' group Author: Todd Lipcon Reason: fix broken unit test Ref: CDH-648 commit bef7c171a5fd2663b5d16bdbba4477ee54947df6 Author: Todd Lipcon Date: Mon Aug 2 17:42:36 2010 -0700 HDFS-1313. HdfsProxy changes from HDFS-481 missed in y20.1xx Author: Rohini Palaniswamy Reason: Changes accidentally omitted from HDFS-481 YDH backport, fixes hdfsproxy Ref: YDH commit 67048e890eff6c9cd548dcdc980f5ff3072234cc Author: Todd Lipcon Date: Fri Jul 2 22:53:21 2010 -0700 MAPREDUCE-1682. Fix speculative execution to ensure tasks are not scheduled after job failure. Author: Arun C Murthy Reason: Fixes potential wasted task slots Ref: YDH commit d9b7bd0ff1b74a579761d1bd8d9130c7adb9e80c Author: Todd Lipcon Date: Fri Jul 2 16:49:11 2010 -0700 MAPREDUCE-1914. Ensure unique sub-directories for artifacts in the DistributedCache are cleaned up. Author: Dick King Reason: Without patch, distributed cache accumulates directories until reaching dirent limit (32K) after which the TT fails. Ref: YDH commit eb44564b61a0467aa2891fd3a434eda20ac30d7b Author: Todd Lipcon Date: Fri Jul 2 16:34:35 2010 -0700 MAPREDUCE-1538. Add a limit on the number of artifacts in the DistributedCache to ensure we clean up aggressively. Author: Scott Chen Reason: Without patch, subdirectory count in cache grows without bound. Ref: YDH commit f9051921efb8d76b0dcd0eed27fd15600635caf0 Author: Todd Lipcon Date: Tue Jul 27 19:08:31 2010 -0700 MAPREDUCE-2035.
Fix task controller build to use -Wall, fix warnings commit 9d3d402301267201d771becef005864e48ea5b82 Author: Todd Lipcon Date: Mon Jun 28 12:05:46 2010 -0700 MAPREDUCE-1900. MapReduce daemons should close FileSystems that are not needed anymore Patch: https://issues.apache.org/jira/secure/attachment/12448230/mapred-fs-close.patch Patch: https://issues.apache.org/jira/secure/attachment/12448509/fs-close-delta.patch Author: Kan Zhang Reason: Secured MR daemons often open DFS instances on behalf of a given user, which then end up stored in the FS Cache data structure. This patch allows those cache entries to be collected, preventing possible OOME scenario. Ref: CDH-648 commit 945dc2bdacb99855b66ce70b3024a6f0f8b9f2d6 Author: Todd Lipcon Date: Wed Jun 23 14:37:10 2010 -0700 HADOOP-6832. Add a static user plugin for web auth for external users. Author: Owen O'Malley Reason: Security Ref: CDH-648 commit 00b53896f9de47f36fe8ea5a4ffaa13a85877a3c Author: Todd Lipcon Date: Thu Apr 29 12:51:22 2010 -0700 HDFS-1007. Update HFTPFileSystem to use delegation tokens to support security. Patch: https://issues.apache.org/jira/secure/attachment/12443223/1007-bugfix.patch Patch: https://issues.apache.org/jira/secure/attachment/12446280/hdfs-1007-long-running-hftp-client.patch Patch: https://issues.apache.org/jira/secure/attachment/12446362/hdfs-1007-securityutil-fix.patch Author: Devaraj Das Reason: Security Ref: CDH-648 commit 221b3e83ec620bb4903946574fe0b250db58fc8a Author: Todd Lipcon Date: Fri May 28 15:14:24 2010 -0700 HDFS-1178. The NameNode servlets should not use RPC to connect to the NameNode. Author: Owen O'Malley Reason: Cleanup Ref: YDH commit 8de996f7fac526c605e1931744e7937f90471e88 Author: Todd Lipcon Date: Thu May 20 23:04:35 2010 -0700 MAPREDUCE-1807. Re-factor TestQueueManager to not timeout Author: Dick King Reason: Fix failing unit test Ref: YDH commit f831304f9adfd7668283310e73ea66185674adc6 Author: Todd Lipcon Date: Thu May 20 11:51:43 2010 -0700 HADOOP-6781. 
Security audit log shouldn't have exceptions in it. Patch: https://issues.apache.org/jira/secure/attachment/12445092/HADOOP-6781-BP20.patch Author: Boris Shkolnik Ref: YDH commit 5459775249f78827a23863322002f5b0695a04d7 Author: Todd Lipcon Date: Wed May 19 14:08:04 2010 -0700 HADOOP-6776. UserGroupInformation.createProxyUser's javadoc is broken Patch: https://issues.apache.org/jira/secure/attachment/12444980/6776.patch Author: Devaraj Das Ref: YDH commit 07b56fcda093c41a142668171cf7bc953c9e4db8 Author: Todd Lipcon Date: Tue May 18 17:01:12 2010 +0530 Amend MAPREDUCE-1664. Bug fix to enable queue admins to view jobs. Patch: https://issues.apache.org/jira/secure/attachment/12444782/1664.qAdminsJobView.20S.v1.6.patch. Author: Ravi Gummadi Reason: bug fix to prior patch commit e81e1a34349a1d6a35faddeed4d7ff2087a6a48c Author: Todd Lipcon Date: Mon May 17 10:56:25 2010 -0700 HDFS-1157. Modifications introduced by HDFS-1150 are breaking aspect's bindings Patch: https://issues.apache.org/jira/secure/attachment/12444716/hdfs-1157.patch Author: Konstantin Boudnik Ref: YDH commit a8b230d68070e829a2717805c8d3f7c995bf0ae0 Author: Todd Lipcon Date: Fri May 14 22:05:38 2010 -0700 HDFS-1130. Authorize access to default HDFS servlets with a DFS administrator ACL Patch: https://issues.apache.org/jira/secure/attachment/12444565/hdfs-1130.3.patch Author: Devaraj Das Ref: CDH-648 commit f87ec798d3f48c701cdb24372ad15e7d269e580f Author: Todd Lipcon Date: Tue Jul 27 16:59:02 2010 -0700 Amend HDFS-1150. Allow the requirement of DNs on low ports to be relaxed by a config Reason: simplifies testing, and allows running with other security methods Author: Todd Lipcon Ref: CDH-648 commit 8ad787821eaf906170fc913c5030b147b0bd6e80 Author: Todd Lipcon Date: Fri May 14 16:37:07 2010 -0700 HDFS-1150. 
Let DataNodes bind to privileged ports in order to verify their identity better to clients Patch: https://issues.apache.org/jira/secure/attachment/12444541/HDFS-1150-Y20S-ready-8.patch Patch: https://issues.apache.org/jira/secure/attachment/12444811/hdfs-1150-bugfix-1.patch Patch: https://issues.apache.org/jira/secure/attachment/12444864/hdfs-1150-bugfix-1.2.patch Patch: https://issues.apache.org/jira/secure/attachment/12445111/HDFS-1150-BF-Y20-LOG-DIRS-2.patch Author: Jakob Homan Reason: The DataXceiverProtocol does not provide mutual authentication. Binding the DNs to a low port number makes it harder for an attacker to impersonate a DN. Ref: CDH-648 commit 20f55449358f96fd20f74f3f92c24dce763158e1 Author: Todd Lipcon Date: Fri May 14 12:58:38 2010 +0530 MAPREDUCE-1716. Truncate logs of finished tasks to prevent node thrash due to excessive logging Patch: https://issues.apache.org/jira/secure/attachment/12444476/patch-log-truncation-bugs-20100514.txt Author: Vinod K V Ref: YDH commit 055c06d1fba4cead09fe70e1cc56873f51927dfb Author: Todd Lipcon Date: Thu May 13 23:54:54 2010 -0700 MAPREDUCE-1442. Fixed regex in job-history related to parsing Counter values. Patch: https://issues.apache.org/jira/secure/attachment/12444349/mr-1442-y20s-v1.patch Author: Luke Lu Reason: Avoid StackOverflowError when JobHistory parses a really long line Ref: YDH commit 24a779aa3f4ee27c596f26c0a524433422c91689 Author: Todd Lipcon Date: Thu May 13 19:08:04 2010 -0700 HADOOP-6760. WebServer shouldn't increase port number in case of negative port setting caused by Jetty's race Patch: https://issues.apache.org/jira/secure/attachment/12444455/HADOOP-6760.0.20.patch Author: Konstantin Boudnik Ref: YDH commit a1bd71986a43df303166c7a6a3bf6d3e38d2f908 Author: Todd Lipcon Date: Thu May 13 14:04:39 2010 -0700 HDFS-1146. 
Add Javadoc for getDelegationTokenSecretManager in FSNamesystem Patch: https://issues.apache.org/jira/secure/attachment/12444261/HDFS-1146-y20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 7a38053228f4b8368c183a6658d93df2a216d4ab Author: Todd Lipcon Date: Sun May 9 09:19:41 2010 -0700 MAPREDUCE-1744. Fixed DistributedCache apis to take a user-supplied FileSystem to allow for better proxy behaviour for Oozie. Amended to not deprecate any methods, since their future in the next major release has not been decided yet. Patch: https://issues.apache.org/jira/secure/attachment/12444060/MAPREDUCE-1744.patch Author: Dick King commit a696ed00e06ec9cac5b0f0e53a7fc6bcafb6c69f Author: Todd Lipcon Date: Sun May 9 01:30:36 2010 -0700 MAPREDUCE-1733. Authentication between pipes processes and java counterparts. Patch: https://issues.apache.org/jira/secure/attachment/12444054/MR-1733-y20.3.patch Author: Jitendra Nath Pandey Reason: Security Ref: CDH-648 commit 37bbc27c772ba033670be8d6323ca1a9191d34a7 Author: Todd Lipcon Date: Fri May 7 16:23:32 2010 -0700 HADOOP-6756. Clean up and document configuration keys in CommonConfigurationKeys.java Patch: https://issues.apache.org/jira/secure/attachment/12444008/jira.HADOOP-6756-0.20-1.patch Patch: https://issues.apache.org/jira/secure/attachment/12444017/jira.HADOOP-6756-0.20-1-FS_DEFAULT_NAME_KEY.patch Author: Erik Steffl Ref: YDH commit 5127aafd818ec2c57d02481f221bff3534e12d14 Author: Todd Lipcon Date: Fri May 7 15:08:14 2010 -0700 HDFS-1136. FileChecksumServlets.RedirectServlet doesn't carry forward the delegation token Patch: https://issues.apache.org/jira/secure/attachment/12443986/HDFS-1136-BP20-2.patch Author: Boris Shkolnik Reason: Security Ref: CDH-648 commit 6951f1e5bbac61b1a1cc3587b712598ed993c729 Author: Todd Lipcon Date: Fri May 7 23:50:08 2010 +0530 MAPREDUCE-1759. 
Improve exception message for unauthorized user doing killJob, killTask, setJobPriority Patch: https://issues.apache.org/jira/secure/attachment/12443983/1759.20S.1.patch. Author: Ravi Gummadi Reason: Security Ref: CDH-648 commit 8a02b7518c90634ec256fa836757919f600fa0e9 Author: Todd Lipcon Date: Fri May 7 23:13:24 2010 +0530 HADOOP-6715. Fix AccessControlList.toString behavior when ACL is set to "*" Patch: https://issues.apache.org/jira/secure/attachment/12443982/6715.20S.6.patch Author: Ravi Gummadi Reason: Security Ref: CDH-648 commit 057e7fc4942d0aa9a36f5b0ae2dd73d516fcf8ba Author: Todd Lipcon Date: Fri May 7 13:15:22 2010 +0530 HADOOP-6757. Fix NPE when streaming jobs launch further Hadoop clients Patch: https://issues.apache.org/jira/secure/attachment/12443934/BZ-3620565-v1.0.patch Author: Amar Kamat Ref: YDH commit b24898d75d8239697d215475dfe1b305ada90e5f Author: Todd Lipcon Date: Fri May 7 12:37:30 2010 +0530 HADOOP-6631. FileUtil.fullyDelete should continue deleting after partial failure Patch: https://issues.apache.org/jira/secure/attachment/12443931/HADOOP-6631-20100506-ydist.final.txt Author: Ravi Gummadi Ref: YDH commit 2b0d6bb28d3733a2c42ebbbb4bfc294044c8e619 Author: Todd Lipcon Date: Fri May 7 11:52:42 2010 +0530 MAPREDUCE-1754, HADOOP-6748. Replace mapred.permissions.supergroup with an ACL instead of single group Patch: https://issues.apache.org/jira/secure/attachment/12443928/patch-1754-ydist.txt. Author: Amareshwari Sriramadasu Reason: Security Ref: CDH-648 commit cf50e5a91e7159f7f06518162a7400cc681c7f08 Author: Todd Lipcon Date: Thu May 6 11:27:20 2010 -0700 HADOOP-6701. Incorrect exit codes for "dfs -chown", "dfs -chgrp" Patch: https://issues.apache.org/jira/secure/attachment/12442987/HADOOP-6701-v20.patch Author: Ravi Phulari Ref: YDH commit 463557b922ac3579aa130d01f780ab3c2e32b70f Author: Todd Lipcon Date: Wed May 5 23:38:06 2010 +0000 HADOOP-6640. 
FileSystem.get() does RPC retries within a static synchronized block Patch: https://issues.apache.org/jira/secure/attachment/12443759/getFS_yahoo20s.patch Author: Hairong Kuang Reason: Fixes potential performance issue in multithreaded environment Ref: YDH commit 781ae842245a6fb948de72d39658b28eab7c2cfe Author: Todd Lipcon Date: Wed May 5 14:09:32 2010 -0700 HDFS-1006. Secure image transfer between NN and 2NN Patch: https://issues.apache.org/jira/secure/attachment/12443766/hdfs-1006-bugfix-1.patch Author: Boris Shkolnik Reason: Security Ref: CDH-648 commit 497fefb3b9132b912a569577bc643c20a273707e Author: Todd Lipcon Date: Wed May 5 09:15:13 2010 -0700 HADOOP-6745. Add JavaDoc to Server.RpcMetrics, UGI Patch: https://issues.apache.org/jira/secure/attachment/12443726/HADOOP-6745-BP20-2.patch Author: Boris Shkolnik Reason: Security Ref: CDH-648 commit 51d7be14f892334ea1ad399a34d91ea1bd10e804 Author: Todd Lipcon Date: Wed May 5 11:00:56 2010 +0530 MAPREDUCE-1707. Fix potential NPE in TaskRunner Patch: https://issues.apache.org/jira/secure/attachment/12443680/MAPREDUCE-1707-20100504-ydist.txt Author: Vinod K V Ref: YDH commit 2647aae4985ca29e222567d8bfc77e2569fde81c Author: Todd Lipcon Date: Tue May 4 19:14:10 2010 +0000 HDFS-1104. Change fsck to not update block access times Patch: https://issues.apache.org/jira/secure/attachment/12443523/fsckATime_Yahoo0.20.patch Author: Hairong Kuang Reason: prevents a possible NN OOME during fsck Ref: YDH commit 392bd0e68be1621406d3556303c30f00d6dfd019 Author: Todd Lipcon Date: Mon May 3 18:58:34 2010 -0700 Amend HADOOP-6332. Add large-scale test framework "Herriot" that runs against real clusters. 
Reason: Phase two of Herriot framework Patch: https://issues.apache.org/jira/secure/attachment/12443539/6332-phase2.patch Patch: https://issues.apache.org/jira/secure/attachment/12443668/6332-phase2.fix1.patch Patch: https://issues.apache.org/jira/secure/attachment/12443788/6332-phase2.fix2.patch Author: Konstantin Boudnik Ref: YDH commit c04645dd48af89b8191e090efd15f8520946a606 Author: Todd Lipcon Date: Fri Apr 30 14:34:53 2010 -0700 HADOOP-6693. Add metrics to track kerberos login activity Patch: https://issues.apache.org/jira/secure/attachment/12443326/HADOOP-6693.rel20.1.patch Author: Suresh Srinivas Reason: Security Ref: YDH commit 4bce823b6ae54b527cfa25e7802e9a1c83f5b5d2 Author: Todd Lipcon Date: Thu Apr 29 14:19:05 2010 -0700 HADOOP-6710. Symbolic umask for file creation is not consistent with posix Patch: https://issues.apache.org/jira/secure/attachment/12443134/hadoop-6710.rel20.patch Author: Suresh Srinivas Ref: YDH commit e34d5e43768d0f91c0eb847071de7f9b8ec5b323 Author: Todd Lipcon Date: Mon Feb 22 14:24:21 2010 +0530 MAPREDUCE-670 and HDFS-1022. Add a fast "commit test" target Patch: https://issues.apache.org/jira/secure/attachment/12436553/mapreduce-670-y20.patch Author: Jothi Padmanabhan Ref: YDH commit 60b22e764e8d44570f900632b30b37df22855d0d Author: Todd Lipcon Date: Tue Apr 27 23:46:55 2010 -0700 MAPREDUCE-1711. Gridmix should provide an option to submit jobs to the same queues as specified in the trace. Patch: https://issues.apache.org/jira/secure/attachment/12443040/MR-1711-yhadoop-20-1xx-7.patch. Author: rahul k singh Ref: YDH commit 24093ee575601dc3cfbb67ac63d742fab1f40f2f Author: Todd Lipcon Date: Tue Apr 27 14:52:55 2010 -0700 MAPREDUCE-1687. Stress submission policy does not always stress the cluster. (htang) Patch: https://issues.apache.org/jira/secure/attachment/12442692/mr-1687-yhadoop-20.1xx-20100423-2.patch. 
Author: rahul k singh Ref: YDH commit 41079a8480984ef8ac84dfe8c97930f0631afc07 Author: Todd Lipcon Date: Sun Apr 25 12:06:08 2010 -0700 MAPREDUCE-1641. Fix DistributedCache to ensure same files cannot be put in both the archives and files sections. Author: Dick King Ref: YDH commit 13c09b6981b1f5bd8e58b3586d67e4652ab716c8 Author: Todd Lipcon Date: Sat Apr 24 00:22:59 2010 +0530 MAPREDUCE-1664. Fix job and queue ACLs to interact in a more useful manner. Patch: https://issues.apache.org/jira/secure/attachment/12442697/1664.20S.3.4.patch Patch: https://issues.apache.org/jira/secure/attachment/12443139/M1664y20s-testfix.patch Patch: https://issues.apache.org/jira/secure/attachment/12444043/mr-1664-20-bugfix.patch Author: Ravi Gummadi Reason: Security Ref: CDH-648 commit 2d09f15358560a61169e64c488f5e6fa7aff1d7f Author: Todd Lipcon Date: Fri Apr 23 15:46:55 2010 +0530 MAPREDUCE-1397. Fix a possible NPE when a killTask command races with a JVM exit Patch: https://issues.apache.org/jira/secure/attachment/12442657/patch-1397-ydist.txt Author: Amareshwari Sriramadasu Ref: YDH commit 7c0d1d3221c5558979942f8aaba3f14f29c5f7b4 Author: Todd Lipcon Date: Fri Apr 9 15:38:50 2010 -0700 HADOOP-6670. Use the UserGroupInformation's Subject as the criteria for equals and hashCode. Author: Owen O'Malley Reason: Security bug fix Ref: CDH-648 commit 82b157a5b805c10485712ddb108aa8248ad0df0c Author: Todd Lipcon Date: Thu Apr 22 10:13:04 2010 -0700 HADOOP-6716. System won't start in non-secure mode when kerb5.conf (edu.mit.kerberos on Mac) is not present Patch: https://issues.apache.org/jira/secure/attachment/12442487/HADOOP-6716-BP20-3.patch Author: Boris Shkolnik Ref: CDH-648 commit d8b98cd21187c2d012f022f5fbd3511281f4adbf Author: Todd Lipcon Date: Thu Apr 22 16:47:53 2010 +0530 MAPREDUCE-1607. 
Fix possible cleanup task failure in LinuxTaskController Patch: https://issues.apache.org/jira/secure/attachment/12442538/patch-1607-ydist.txt Author: Amareshwari Sriramadasu Ref: CDH-648 commit d8fe4d58aa93d151e55cd3e427a4b90ce470e7c9 Author: Todd Lipcon Date: Thu Apr 22 10:23:26 2010 +0530 MAPREDUCE-1533. JobTracker performance improvements. Author: Dick King Reason: Simple CPU usage optimizations in JT and Capacity Scheduler Ref: YDH commit 1b1896055e04da90194893c90ac7e2a3787e5077 Author: Todd Lipcon Date: Tue Apr 20 10:55:56 2010 -0700 MAPREDUCE-1701. Fix a problem with exception handling in delegation token renewals. Patch: https://issues.apache.org/jira/secure/attachment/12442239/MAPREDUCE-1701-BP20-1.patch Author: Boris Shkolnik Ref: CDH-648 commit ea5f8c7922fc5a66e26deb4da55c6088b006e1f0 Author: Todd Lipcon Date: Mon Apr 19 16:38:48 2010 -0700 HDFS-1096. Allow refresh of superuser proxy group mappings Patch: https://issues.apache.org/jira/secure/attachment/12442244/HDFS-1096-BP20-7.patch Author: Boris Shkolnik Ref: CDH-648 commit 877288c5ab55849a08a89ce342a8d5984f18a6df Author: Todd Lipcon Date: Tue Apr 20 01:21:33 2010 +0530 HDFS-1012. HDFSProxy support for fully qualified HDFS path in addition to simple unqualified path Patch: https://issues.apache.org/jira/secure/attachment/12441034/HDFS-1012-bp-y20s.patch Author: Srikanth Sundarrajan Ref: YDH commit 4aa8da2788a0794be5469227140509cc87d29c47 Author: Todd Lipcon Date: Tue Apr 20 01:13:14 2010 +0530 HDFS-1011. Improve Logging in HDFSProxy to include cluster name associated with the request Patch: https://issues.apache.org/jira/secure/attachment/12441031/HDFS-1011-bp-y20s.patch Author: Ramesh Sekaran Ref: YDH commit 689eb75cdd88b4b7a080ab3883f2a317cfb2c664 Author: Todd Lipcon Date: Tue Apr 20 01:08:10 2010 +0530 HDFS-1010. 
HDFSProxy: Retrieve group information from UnixUserGroupInformation instead of LdapEntry Patch: https://issues.apache.org/jira/secure/attachment/12439437/HDFS-1010-bp-y20s.patch Author: Srikanth Sundarrajan Ref: YDH commit cb39a8280712bb507e159151de5e62895d79268d Author: Todd Lipcon Date: Tue Apr 20 01:01:07 2010 +0530 HDFS-481. Allow HdfsProxy to securely impersonate the real user Patch: https://issues.apache.org/jira/secure/attachment/12442210/HDFS-481-NEW.patch Patch: https://issues.apache.org/jira/secure/attachment/12442280/HDFS-481-bp-y20s.patch Author: Srikanth Sundarrajan Ref: CDH-648 commit 0bb2bf837469bdafa5df1b14ed1fb2991070c9d0 Author: Todd Lipcon Date: Mon Apr 19 13:15:50 2010 +0530 MAPREDUCE-1657. Fix incorrect error message when trying to view already-deleted logs of a task. Patch: https://issues.apache.org/jira/secure/attachment/12442135/MR1657.20S.1.patch Author: Ravi Gummadi Ref: CDH-648 commit 1beeed7552f5e83ef24f8926450bb81d7be02b8d Author: Todd Lipcon Date: Mon Apr 19 11:00:05 2010 +0530 MAPREDUCE-1692. Remove TestStreamedMerge from the streaming tests Patch: https://issues.apache.org/jira/secure/attachment/12442134/patch-1692-ydist.txt. Author: Amareshwari Sriramadasu Reason: Test no longer applicable Ref: YDH commit eb456ec5bc5a3afca9a854e9f4708892c8a6e2f5 Author: Todd Lipcon Date: Fri Apr 16 17:35:14 2010 -0700 HDFS-1081. Improve performance of block access token implementation Patch: https://issues.apache.org/jira/secure/attachment/12442023/HADOOP-1081-Y20-2.patch Reason: Reduce number of calls to expensive HMAC functions to reduce NN CPU usage Author: Jakob Homan Ref: CDH-648 commit 457cf14f7168bacd185c03c9afc8527210e06410 Author: Todd Lipcon Date: Fri Apr 16 14:35:38 2010 -0700 MAPREDUCE-1656. JobStory should provide queue info. Patch: https://issues.apache.org/jira/secure/attachment/12441905/mr-1656-yhadoop-20.1xx.patch. 
Author: Hong Tang Ref: YDH commit c2b68855b127c7dc532ce836fa60dc5c1836f6ec Author: Todd Lipcon Date: Fri Apr 16 14:10:31 2010 -0700 MAPREDUCE-1317. Reducing memory consumption of rumen objects. Contributed by Hong Tang. Patch: https://issues.apache.org/jira/secure/attachment/12442004/mapreduce-1317-yhadoo-20.1xx.patch. Patch: https://issues.apache.org/jira/secure/attachment/12443927/3623945-yahoo-20-1xx.patch Author: Hong Tang Ref: YDH commit cbc852fdeccf007260c6e81a15280272a6f90def Author: Todd Lipcon Date: Fri Apr 16 12:45:26 2010 -0700 HADOOP-6706. Improve relogin behavior for RPC clients Patch: https://issues.apache.org/jira/secure/attachment/12441782/6706.bp20.patch Patch: https://issues.apache.org/jira/secure/attachment/12442253/6706.bp20.1.patch Author: Devaraj Das Reason: Security Ref: CDH-648 commit 1444a9469340822ed0af92ee9ac780c7b9835c26 Author: Todd Lipcon Date: Fri Apr 23 09:36:33 2010 -0700 HADOOP-6718. Client does not close connection when an exception happens during SASL negotiation Patch: https://issues.apache.org/jira/secure/attachment/12442614/6718-bp20.patch Author: Devaraj Das Ref: CDH-648 commit 12bcbba89226fdce99733f366e8eaacd09d95ab7 Author: Todd Lipcon Date: Fri Apr 16 15:36:16 2010 +0530 MAPREDUCE-1617. Fix unit test failures due to IPv6-related issues Patch: https://issues.apache.org/jira/secure/attachment/12441951/mr-1617-v1.3.patch. Author: Luke Lu Reason: fix unit test Ref: YDH commit d60df2ded7690aed311726e6493ca88d578b882b Author: Todd Lipcon Date: Sat Feb 20 12:17:18 2010 -0800 HADOOP-6545. Cached FileSystem objects can lead to wrong token being used in setting up connections Patch: https://issues.apache.org/jira/secure/attachment/12436456/6545-bp20.patch Author: Devaraj Das Ref: CDH-648 commit 110cd5235ba960e003ac94824a29a8b0ac36a031 Author: Todd Lipcon Date: Sat Apr 24 09:23:55 2010 -0700 MAPREDUCE-1718. 
Fix a bug in HFTPFileSystem so that delegation tokens function correctly Patch: https://issues.apache.org/jira/secure/attachment/12442726/MAPREDUCE-1718-BP20-2.patch Author: Boris Shkolnik Reason: Security Ref: CDH-648 commit f37d58671d0e9d601cfd446e6966b3a906d95029 Author: Todd Lipcon Date: Fri Apr 16 14:43:28 2010 +0530 MAPREDUCE-587. Fix TestStreamingExitStatus failure case on OSX Patch: https://issues.apache.org/jira/secure/attachment/12414990/MAPREDUCE-587-v1.0.patch. Author: Amar Kamat Ref: YDH commit eb3c35987b4434c85fb0203c866a7f8fd56674aa Author: Todd Lipcon Date: Wed Apr 14 13:20:06 2010 +0530 MAPREDUCE-1985. Fix for java.lang.ArrayIndexOutOfBoundsException in analysejobhistory.jsp of jobs with 0 maps Author: Vinod Kumar Ref: YDH commit 5bebf947cd534ee350844c3626d11dac315372ed Author: Todd Lipcon Date: Tue Apr 13 10:39:44 2010 -0700 MAPREDUCE-1680. Add a metric to track number of heartbeats processed by the JobTracker Patch: https://issues.apache.org/jira/secure/attachment/12441621/mapreduce-1680--2010-04-08.patch. Author: Dick King Ref: YDH commit 87c3e693adb3912b5c2755cf7e31f9c1b9973273 Author: Todd Lipcon Date: Mon Apr 12 16:48:37 2010 -0700 MAPREDUCE-1683. Removes JNI calls to get jvm current/max heap usage in ClusterStatus by default. Patch: https://issues.apache.org/jira/secure/attachment/12441563/MAPREDUCE-1683_yhadoop_20_S.patch Patch: https://issues.apache.org/jira/secure/attachment/12441978/MAPREDUCE-1683_part2_yhadoop_20_10.patch Reason: Performance improvement Ref: YDH commit 6ddae27ba50b6895509839bb89a7a8e2a0550284 Author: Todd Lipcon Date: Mon Apr 12 13:03:29 2010 -0700 HADOOP-6687. User object in the subject in UGI should be reused in case of a relogin. Patch: https://issues.apache.org/jira/secure/attachment/12440979/HADOOP-6687-y20.2.patch Author: Jitendra Nath Pandey Ref: YDH commit f0f38e93276cd5ca8a12c26a0cf138c65fef1951 Author: Todd Lipcon Date: Mon Apr 12 11:10:09 2010 +0530 MAPREDUCE-1635. 
Fix ResourceEstimator after MAPREDUCE-842 Patch: https://issues.apache.org/jira/secure/attachment/12441448/patch-1635-ydist.txt Author: Amareshwari Sriramadasu Ref: YDH commit 56acf64d1453e7de0c87d58bb4565dc1748de5a8 Author: Todd Lipcon Date: Sat Apr 10 17:19:45 2010 +0530 MAPREDUCE-1526. Gridmix: Cache the job related information while submitting the job to avoid many RPC calls to JobTracker. Patch: https://issues.apache.org/jira/secure/attachment/12440983/1594-yhadoop-20-1xx-1-5.patch Patch: https://issues.apache.org/jira/secure/attachment/12441333/1526-yhadoop-20-101-4.patch Author: rahul k singh Ref: YDH commit 14c524cb4f78e52ee1d916fa92dfd2665a3a2527 Author: Todd Lipcon Date: Wed Apr 7 09:31:55 2010 -0700 HADOOP-6674. Turn off SASL checksums for RPCs. (jitendra via omalley) Patch: https://issues.apache.org/jira/secure/attachment/12442640/HADOOP-6674-y20.1.bugfix.patch Author: Jitendra Nath Pandey Reason: Performance Improvement in Secure RPC Ref: CDH-648 commit c145357a86dcabe072a3f93775c8f45161452841 Author: Todd Lipcon Date: Fri Apr 2 17:30:00 2010 -0700 HADOOP-5958. Replace fork of DF with library call. Author: Aaron Kimball Ref: YDH commit 2de11fbbf173eef5b35a3ae10777c87728492355 Author: Todd Lipcon Date: Fri Apr 9 15:31:59 2010 -0700 HDFS-999. Secondary namenode should login using kerberos if security is configured. Author: Boris Shkolnik Ref: CDH-648 commit 9d2cc8f9f81c8a39f4c8e1d7e00765ee29808145 Author: Todd Lipcon Date: Wed Apr 7 10:56:55 2010 +0530 MAPREDUCE-1594. Support sleep jobs in gridmix Patch: https://issues.apache.org/jira/secure/attachment/12440983/1594-yhadoop-20-1xx-1-5.patch Author: rahul k singh Ref: YDH commit af4ddb7f8866cf5fdcee45a91a7f10cb2e70c51a Author: Todd Lipcon Date: Tue Apr 6 17:26:10 2010 -0700 HDFS-955.
Fix bug where FSImage.saveFSImage could lose edits Patch: https://issues.apache.org/jira/secure/attachment/12440925/saveNamespace-0.20.patch Author: Konstantin Shvachko Ref: YDH commit d229a5fc83e4722162912a60eae40121f2e504e6 Author: Todd Lipcon Date: Tue Apr 6 17:01:51 2010 -0700 HDFS-1007. Update HFTP to use delegation tokens Patch: https://issues.apache.org/jira/secure/attachment/12440931/HDFS-1007-BP20-fix-3.patch Author: Devaraj Das Ref: CDH-648 commit 46bb5af895747384e2698de9a628b8bea86093d0 Author: Todd Lipcon Date: Mon Apr 5 16:28:56 2010 -0700 HDFS-1080. SecondaryNameNode image transfer should use the defined http address rather than local ip address Patch: https://issues.apache.org/jira/secure/attachment/12440810/HDFS-1080-Y20.patch Author: Jakob Homan Ref: YDH commit 46683a5902d3c698e08dde2b1e2464da7636b809 Author: Todd Lipcon Date: Fri Apr 2 17:48:05 2010 -0700 HADOOP-6539. Update various pieces of documentation Patch: https://issues.apache.org/jira/secure/attachment/12440665/C6539-2-y20s.patch Author: Corinne Chandel Ref: YDH commit 9637fcdf4f49a8a6f8674f9ca047b33ab34dba0d Author: Todd Lipcon Date: Thu Apr 1 16:35:15 2010 -0700 HADOOP-6682. Fix incorrect hostname normalization for hostnames starting with [a-f] Author: Jakob Homan Ref: YDH commit 976302c14eeebc784e85d4af4746d45d39803a70 Author: Todd Lipcon Date: Fri Mar 26 11:24:22 2010 -0700 HADOOP-6661. Add documentation on how to securely impersonate other users Patch: https://issues.apache.org/jira/secure/attachment/12439897/HADOOP-6661-y20.2.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 25c7e968255a8fdacaaaca0358ce8894d7d925a3 Author: Todd Lipcon Date: Wed Mar 24 17:27:04 2010 -0700 MAPREDUCE-1624. Document job credentials and delegation tokens Patch: https://issues.apache.org/jira/secure/attachment/12439738/job-creds.2.patch Author: Devaraj Das Ref: CDH-648 commit c0dbf7361b383dd2e179eb33a5beeb7d6e52fc10 Author: Todd Lipcon Date: Tue Mar 23 00:20:20 2010 -0700 HADOOP-6656. 
Renew Kerberos TGT when 80% of the renew lifetime has been used up. (omalley) Author: Devaraj Das Reason: Security framework needs to renew Kerberos tickets while the process is running Ref: CDH-648 commit 9ecece0b50dbc523a1b9a7c48bb6330ae6e8c93e Author: Todd Lipcon Date: Sun Mar 21 14:10:53 2010 -0700 HADOOP-6653. Protect against NPE in setupSaslConnection when the real user is NULL. Author: Owen O'Malley Ref: CDH-648 commit a1871a06d05bcaa41c68673b68339a10c5f5e183 Author: Todd Lipcon Date: Sat Mar 20 17:06:36 2010 -0700 HADOOP-6652. Remove redundant cache in ShellBasedUnixGroupsMapping Patch: https://issues.apache.org/jira/secure/attachment/12439372/groups.patch Author: Devaraj Das Ref: CDH-648 commit ba1b824bfccac3386041eab7dbf29f5c7d4b8662 Author: Todd Lipcon Date: Sat Mar 20 15:22:59 2010 -0700 HADOOP-6649. The login object should be moved to the subject in the UGI. Patch: https://issues.apache.org/jira/secure/attachment/12439344/HADOOP-6649-y20.1.patch Patch: https://issues.apache.org/jira/secure/attachment/12439391/HADOOP-6649-y20.1.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 68b5ce77348d26ed4c694a505e7ad00f4ebfc650 Author: Todd Lipcon Date: Fri Mar 19 19:22:15 2010 -0700 HADOOP-6637. Add benchmark for overhead of RPC session establishment Patch: https://issues.apache.org/jira/secure/attachment/12439348/miniRPCBenchmark-20-100.patch Author: Konstantin Shvachko Ref: CDH-648 commit 5f84248c0faa67eddf505c5afa7b9f320b23e35c Author: Todd Lipcon Date: Fri Mar 19 17:12:53 2010 -0700 HADOOP-6648. Credentials must ignore null tokens that can be generated when using HFTP to talk to insecure clusters. Author: Devaraj Das Ref: CDH-648 commit 635b88303ce6ef229308b9787469cc5b04b4fe3c Author: Todd Lipcon Date: Fri Mar 19 14:05:16 2010 -0700 HADOOP-6647. Service authorization should compare short names, not full names. 
Patch: https://issues.apache.org/jira/secure/attachment/12439325/HADOOP-6647-BP20.patch Author: Boris Shkolnik Ref: CDH-648 commit ea78340db1b2eb3333a5f79fad076285f82917e7 Author: Todd Lipcon Date: Sat Mar 20 00:01:23 2010 +0530 MAPREDUCE-1612. job conf file is not accessible from job history web page Patch: https://issues.apache.org/jira/secure/attachment/12439310/jobconf_history_jsp.fix.20S.patch Author: Ravi Gummadi Ref: YDH commit 94abbf7bee9701230cff703aac7d740ff0333176 Author: Todd Lipcon Date: Fri Mar 19 22:58:21 2010 +0530 MAPREDUCE-1611. Add service authorization to the AdminOperationsProtocol Patch: https://issues.apache.org/jira/secure/attachment/12439295/MAPREDUCE-1611-20100319-ydist.txt. Author: Amar Kamat Ref: CDH-648 commit 51c966ff9e07ff017eb1960cac87d8035e4549c6 Author: Todd Lipcon Date: Fri Mar 19 10:10:35 2010 -0700 HADOOP-6644. Fix incorrect code style in util.Shell Patch: https://issues.apache.org/jira/secure/attachment/12439243/HADOOP-6644-BP20.patch Author: Boris Shkolnik Ref: YDH commit ef89b9ba5cacecaf5fd292b8cf2e6ebe347e85b7 Author: Todd Lipcon Date: Fri Mar 19 19:49:22 2010 +0530 MAPREDUCE-1609. TaskTracker.localizeJob should not set permissions on job log directory recursively Patch: https://issues.apache.org/jira/secure/attachment/12439278/MAPREDUCE-1609-20-1.patch Author: Amareshwari Sriramadasu Ref: CDH-648 commit 9d3a600899712a637109816b5f50b99257ceb79c Author: Todd Lipcon Date: Fri Mar 19 14:27:30 2010 +0530 MAPREDUCE-1610. Forrest documentation should be updated to reflect the changes in MAPREDUCE-856 Patch: https://issues.apache.org/jira/secure/attachment/12439252/MAPREDUCE-1610-20.patch Author: Ravi Gummadi Ref: CDH-648 commit aa3c9671cd7cfa6818c913f293d2605b87431e60 Author: Todd Lipcon Date: Fri Mar 19 01:10:57 2010 -0700 Amend MAPREDUCE-1532. 
Add more informative logging messages for configuration-authentication mismatch Patch: https://issues.apache.org/jira/secure/attachment/12439248/1532-bp20.4.2.patch Author: Devaraj Das Ref: CDH-648 commit 8f3777a09e9cc3e260e68c1a177908204b0dad8c Author: Todd Lipcon Date: Fri Mar 19 13:26:36 2010 +0530 MAPREDUCE-1417. Forrest documentation should be updated to reflect the changes in MAPREDUCE-744 Patch: https://issues.apache.org/jira/secure/attachment/12439247/MAPREDUCE-1417-20.patch Author: Ravi Gummadi Ref: CDH-648 commit adbf650d942aff9d372289110672916fcb4574a7 Author: Todd Lipcon Date: Fri Mar 19 10:03:46 2010 +0530 HADOOP-6634. AccessControlList uses full-principal names to verify acls causing queue-acls to fail Patch: https://issues.apache.org/jira/secure/attachment/12439238/HADOOP-6634-20100317-ydist.1.txt Author: Vinod K V Ref: CDH-648 commit 89e10647b3c733c16124b648a6bfe187670390da Author: Todd Lipcon Date: Thu Mar 18 17:02:47 2010 -0700 HADOOP-6642. Fix javac, javadoc, findbugs warnings Patch: https://issues.apache.org/jira/secure/attachment/12439225/C6642-1y20.patch Author: Chris Douglas Ref: CDH-648 commit 5abf4f644d7fc869858f896708b49f211ddf17d4 Author: Todd Lipcon Date: Thu Mar 18 15:42:18 2010 -0700 HDFS-1044. Cannot submit mapreduce job from secure client to unsecure server Patch: https://issues.apache.org/jira/secure/attachment/12439220/HDFS-1044-BP20-6.patch Author: Boris Shkolnik Ref: CDH-648 commit 0f77b062de438f4d138da08d4028a7afe7a233bd Author: Todd Lipcon Date: Thu Mar 18 15:23:21 2010 -0700 HADOOP-6638. Try to relogin in a case of failed RPC connection (expired tgt) only in case the subject is loginUser or proxyUgi.realUser. Patch: https://issues.apache.org/jira/secure/attachment/12439080/HADOOP-6638-BP20.patch Author: Boris Shkolnik Ref: CDH-648 commit cfbc26a6211204e5e87d9986019fc97110662127 Author: Todd Lipcon Date: Thu Mar 18 03:58:08 2010 -0700 HADOOP-6632.
Support for using different Kerberos keys for different instances of Hadoop services Patch: https://issues.apache.org/jira/secure/attachment/12439144/HADOOP-6632-Y20S-22.patch Patch: https://issues.apache.org/jira/secure/attachment/12439307/6632.mr.patch Author: Kan Zhang Ref: CDH-648 commit 5b6834d2f255fec50d44ec50bec38b0004055c7d Author: Todd Lipcon Date: Thu Mar 18 02:45:31 2010 -0700 HADOOP-6526. Need mapping from long principal names to local OS user names Patch: https://issues.apache.org/jira/secure/attachment/12439139/HADOOP-6526-y20.4.patch Patch: https://issues.apache.org/jira/secure/attachment/12442917/3595485.patch Author: Owen O'Malley Reason: Security Ref: YDH commit 87c00a2594f42c8d96479ba339800e5224136902 Author: Todd Lipcon Date: Thu Mar 18 12:05:01 2010 +0530 MAPREDUCE-1604. Document Job ACLs in forrest Patch: https://issues.apache.org/jira/secure/attachment/12439114/patch-1604-ydist.txt Author: Amareshwari Sriramadasu Ref: CDH-648 commit c69f8f7977591ac297cf5501374c4f40acfea7ee Author: Todd Lipcon Date: Wed Mar 17 20:32:34 2010 -0700 HDFS-1045. In secure clusters, re-login is necessary for https clients before opening connections Patch: https://issues.apache.org/jira/secure/attachment/12439110/HDFS-1045-Y20.patch Author: Jakob Homan Ref: CDH-648 commit 0f37bc72df436c30fbb3a1c826b2040f8118570b Author: Todd Lipcon Date: Wed Mar 17 14:13:39 2010 -0700 HDFS-6603. Clarify a comment in SecurityUtil Patch: https://issues.apache.org/jira/secure/attachment/12439078/fix_comment_y20.patch Author: Jakob Homan Ref: CDH-648 commit b10afd0f9b75e7deb5045e19a9a954769c0925e6 Author: Todd Lipcon Date: Fri Feb 26 18:40:08 2010 +0000 HDFS-985. 
HDFS should issue multiple RPCs for listing a large directory Patch: http://issues.apache.org/jira/secure/attachment/12437088/iterativeLS_yahoo1.patc Patch: http://issues.apache.org/jira/secure/attachment/12437499/testFileStatus.patch Patch: https://issues.apache.org/jira/secure/attachment/12439066/directoryBrowse_0.20yahoo_2.patch. Reason: Performance of large directory access Author: Hairong Kuang Ref: YDH commit 353f15176ed481f268f37e33d4fc9f1745f44afe Author: Todd Lipcon Date: Thu Mar 18 00:06:22 2010 +0530 MAPREDUCE-1543. Log messages of JobACLsManager should use security logging of HADOOP-6586 Patch: https://issues.apache.org/jira/secure/attachment/12439057/mapreduce-1543-y20s-3.patch Author: Luke Lu Ref: CDH-648 commit 02d4404d39e60322ea13a2ce0160635df4ef3155 Author: Todd Lipcon Date: Wed Mar 17 10:30:20 2010 -0700 MAPREDUCE-1606. TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task Patch: https://issues.apache.org/jira/secure/attachment/12439054/MR1606.20S.1.patch Author: Ravi Gummadi Ref: CDH-648 commit aa56ca096e5bba16a873600c57d29586896432ce Author: Todd Lipcon Date: Wed Mar 17 09:12:13 2010 -0700 HADOOP-6633. Normalize property names for JT/NN kerberos principal names in configuration Patch: https://issues.apache.org/jira/secure/attachment/12438949/HADOOP-6633-BP20-2.patch Author: Boris Shkolnik Ref: CDH-648 commit ec983446e7a1600a95fcf60bd92205f5b9318d99 Author: Todd Lipcon Date: Wed Mar 17 00:21:35 2010 -0700 HADOOP-6613. RPC server should check for version mismatch before authentication method Patch: https://issues.apache.org/jira/secure/attachment/12437831/HADOOP-6613-Y20S-1.patch Author: Kan Zhang Ref: CDH-648 commit d8de7f9dbe8b94d380849f2f5f4ff357ce50e1a6 Author: Todd Lipcon Date: Wed Mar 17 11:57:13 2010 +0530 HADOOP-5592. 
Fix typo in streaming documentation Patch: https://issues.apache.org/jira/secure/attachment/12436671/patch-5592-ydist.txt Author: Corinne Chandel Ref: YDH commit 3a91217a9802a4abe89a50dc0dcb8425cd42c060 Author: Todd Lipcon Date: Wed Mar 17 11:50:39 2010 +0530 MAPREDUCE-813. Address errors in streaming docs. Patch: https://issues.apache.org/jira/secure/attachment/12436672/patch-813-ydist.txt Author: Corinne Chandel Ref: YDH commit d170cf2c73554beec3dbc43d79cd3d15f7b4d99c Author: Todd Lipcon Date: Wed Mar 17 11:37:43 2010 +0530 MAPREDUCE-927. Cleanup of task-logs should happen in TaskTracker instead of the Child Patch: https://issues.apache.org/jira/secure/attachment/12439009/patch-927-5-dist.txt Author: Amareshwari Sriramadasu Ref: YDH commit 2acd7103481363aeadb4593738cc8e6f8f11f483 Author: Todd Lipcon Date: Tue Mar 16 15:46:02 2010 -0700 HDFS-1039. Service should be set in the token in JspHelper.getUGI Patch: https://issues.apache.org/jira/secure/attachment/12438896/HDFS-1039-y20.2.patch Patch: https://issues.apache.org/jira/secure/attachment/12439603/HDFS-1039-y20.2.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit d31d5e1b2eef3d92c60781087aba2857900d2273 Author: Todd Lipcon Date: Tue Mar 16 12:01:28 2010 -0700 MAPREDUCE-1599. MRBench reuses jobConf and credentials therein. Patch: https://issues.apache.org/jira/secure/attachment/12438844/MR-1599-y20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit f4ad3c7d410bfb5d86053fcbc91b4275fe0fd74c Author: Todd Lipcon Date: Thu Mar 11 17:28:05 2010 -0800 HDFS-1036. 
in DelegationTokenFetch dfs.getURI returns no port Patch: https://issues.apache.org/jira/secure/attachment/12438549/HDFS-1036-BP20.patch Patch: https://issues.apache.org/jira/secure/attachment/12438585/HDFS-1036-BP20-1.patch Patch: https://issues.apache.org/jira/secure/attachment/12438856/fetchdt_doc.patch Author: Boris Shkolnik Ref: CDH-648 commit e4974cce6a221e1e4b8206145746e26c83fd9253 Author: Todd Lipcon Date: Thu Mar 11 20:17:52 2010 -0800 HDFS-1038. Fix bug causing NPE in nn_browsedfscontent.jsp when security is disabled Patch: https://issues.apache.org/jira/secure/attachment/12438570/HDFS-1038-y20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 596d28594c9bd32116c6510e6607308bc1a762e8 Author: Todd Lipcon Date: Thu Mar 11 16:58:55 2010 -0800 HADOOP-6627. "Bad Connection to FS" message in FSShell should print message from the exception Patch: https://issues.apache.org/jira/secure/attachment/12438455/HADOOP-6627-BP20.patch Author: Boris Shkolnik Ref: CDH-648 commit 6d83d8407d13f0e4b8b26f74bfbbd592b4c906ee Author: Todd Lipcon Date: Thu Mar 11 14:38:56 2010 -0800 HDFS-1033. In secure clusters, NN and SNN should verify that the remote principal during image and edits transfer Patch: https://issues.apache.org/jira/secure/attachment/12438477/HDFS-1033-Y20.patch Author: Jakob Homan Ref: CDH-648 commit 3d43a9035ebaf0c449647c5f35fbd37444708d9f Author: Todd Lipcon Date: Thu Mar 11 10:48:27 2010 -0800 MAPREDUCE-1522. FileInputFormat may change the file system of an input path Patch: https://issues.apache.org/jira/secure/attachment/12437994/M1522-1v20.patch Author: Tsz Wo (Nicholas), SZE Ref: CDH-648 commit 897cd8d3d578c3cee70f039050f3cdd800daafb1 Author: Todd Lipcon Date: Wed Mar 10 15:45:12 2010 +0530 MAPREDUCE-1100. 
User's task-logs filling up local disks on the TaskTrackers Patch: https://issues.apache.org/jira/secure/attachment/12438394/patch-1100-fix-ydist.2.txt Author: Vinod K V Ref: YDH commit 6cbba23fe597ada4f109fc92ecbffc3d01dcc8ac Author: Todd Lipcon Date: Wed Mar 10 15:17:17 2010 +0530 MAPREDUCE-1422. Changing permissions of files/dirs under job-work-dir may be needed so that cleaning up of job-dir in all mapred-local-directories always succeeds Patch: https://issues.apache.org/jira/secure/attachment/12438393/mapreduce-1422-y20s.patch Author: Amar Kamat Ref: CDH-648 commit 1cb57487f49bd7fd14c7575a1e8f5842b3f24e35 Author: Todd Lipcon Date: Tue Mar 9 23:39:13 2010 -0800 HDFS-992. Re-factor block access token implementation to conform to the generic Token interface in Common Patch: https://issues.apache.org/jira/secure/attachment/12438371/h992-BK-0.20-07.1.patch Author: Kan Zhang Ref: CDH-648 commit 543dcb4cdef6028146e5f8104123c8ac84e11e6b Author: Todd Lipcon Date: Wed Mar 10 11:20:33 2010 +0530 MAPREDUCE-890. After HADOOP-4491, the user who started mapred system is not able to run job. Patch: https://issues.apache.org/jira/secure/attachment/12438369/MR890.20S.patch Author: Ravi Gummadi Ref: CDH-648 commit 7b32dc80001de8b93c52041e15ab29bc52d5a68d Author: Todd Lipcon Date: Tue Mar 9 17:07:36 2010 -0800 HADOOP-6598. Remove verbose logging from the Groups class Patch: https://issues.apache.org/jira/secure/attachment/12438059/HADOOP-6598-BP20.patch Patch: https://issues.apache.org/jira/secure/attachment/12438562/HADOOP-6598-BP20-Fix.patch Author: Boris Shkolnik Ref: CDH-648 commit 8c19066cdc5f6605e65d07127649a77309f2943a Author: Todd Lipcon Date: Tue Mar 9 12:54:42 2010 -0800 HADOOP-6620.
NPE if renewer is passed as null in getDelegationToken Patch: https://issues.apache.org/jira/secure/attachment/12438072/HADOOP-6620-y20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 447441ec4a592225dca7d3bf42a6c6c2f977b2dc Author: Todd Lipcon Date: Mon Mar 8 22:58:43 2010 +0530 Amend MAPREDUCE-1435. symlinks in cwd of the task are not handled properly after MAPREDUCE-896 Reason: fixes chmod during cleanup to not make private files group-readable, adds tests Patch: https://issues.apache.org/jira/secure/attachment/12438172/MR-1435-y20s-1.txt Author: Ravi Gummadi Ref: CDH-648 commit eb5a68ab5a10bd85527cd4495a17687a826af698 Author: Todd Lipcon Date: Sun Mar 7 23:14:19 2010 -0800 HADOOP-6612. Protocols RefreshUserToGroupMappingsProtocol and RefreshAuthorizationPolicyProtocol will fail with security enabled Patch: https://issues.apache.org/jira/secure/attachment/12437809/HADOOP-6612-BP20.patch Author: Boris Shkolnik Ref: CDH-648 commit c62d6f0ed1c6d9e48db909acade2683830ebf37c Author: Todd Lipcon Date: Fri Mar 5 15:18:03 2010 -0800 MAPREDUCE-1566. Mechanism to import tokens and secrets from a file into the submitted job. Patch: https://issues.apache.org/jira/secure/attachment/12438122/mr-1566-1.patch (bugfixes for testcases on top of the patch committed earlier) Patch: https://issues.apache.org/jira/secure/attachment/12438376/mr-1566-1.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 1c42a6fc1ed560fd31c0622ecb272e48a56d70a1 Author: Todd Lipcon Date: Fri Mar 5 18:42:28 2010 -0800 HADOOP-6603. Provide workaround for issue with Kerberos not resolving cross-realm principal Patch: https://issues.apache.org/jira/secure/attachment/12437826/HADOOP-6603-Y20S-4.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 072d68d2af1235eaacb49975f78d3457cec60938 Author: Todd Lipcon Date: Fri Mar 5 15:41:57 2010 +0530 MAPREDUCE-1421.
LinuxTaskController tests failing on trunk after the commit of MAPREDUCE-1385 Patch: https://issues.apache.org/jira/secure/attachment/12437985/patch-1421-1-ydist.txt Author: Amareshwari Sriramadasu Ref: CDH-648 commit 69059680d9e0df4004ac7199c2c54ea658c35173 Author: Todd Lipcon Date: Fri Mar 5 05:58:25 2010 +0000 HDFS-814. Add an api to get the visible length of a DFSDataInputStream. Patch: http://issues.apache.org/jira/secure/attachment/12437934/getLength-yahoo-0.20.patch Patch: http://issues.apache.org/jira/secure/attachment/12438026/privateInputStream.patch Author: Tsz Wo (Nicholas), SZE Ref: YDH commit e5a03085722a84a8ee0419ba8d91cd022023c0a0 Author: Todd Lipcon Date: Thu Mar 4 19:05:52 2010 -0800 HDFS-1023. Allow http server to start as regular principal if https principal not defined. Patch: https://issues.apache.org/jira/secure/attachment/12437962/HADOOP-1023-Y20-1.patch Patch: https://issues.apache.org/jira/secure/attachment/12438241/HDFS-1023-Y20-Update-2.patch Author: Jakob Homan Ref: CDH-648 commit 5dcd47a711f530fb989350569dd5553b93e62490 Author: Todd Lipcon Date: Thu Mar 4 00:55:53 2010 -0800 HDFS-1015. Intermittent failure in TestSecurityTokenEditLog Patch: https://issues.apache.org/jira/secure/attachment/12437830/HDFS-1015-y20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 23cdcc2b187566333ae2201cd9706655f55ebf15 Author: Todd Lipcon Date: Wed Mar 3 19:14:35 2010 -0800 HDFS-1020. The canceller and renewer for delegation tokens should be long names. Patch: https://issues.apache.org/jira/secure/attachment/12437838/HDFS-1020-y20.2.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 522fd421225fbab7258a2977e4d40c0c13179376 Author: Todd Lipcon Date: Wed Mar 3 19:09:49 2010 -0800 HDFS-1019.
Incorrect default values for delegation tokens in hdfs-default.xml Patch: https://issues.apache.org/jira/secure/attachment/12437832/HDFS-1019-y20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 7187e0d7a8367e5e072b5e589db5becae0e6eb1d Author: Todd Lipcon Date: Wed Mar 3 18:55:35 2010 -0800 Amend MAPREDUCE-1430. JobTracker should be able to renew delegation tokens for the jobs Patch: https://issues.apache.org/jira/secure/attachment/12437822/1430-bp20-bugfix.patch Author: Devaraj Das Ref: YDH commit 52f7ba19fcd24172f1576b7c19db1c45427fe85d Author: Todd Lipcon Date: Wed Mar 3 18:53:33 2010 -0800 MAPREDUCE-1559. The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem Patch: https://issues.apache.org/jira/secure/attachment/12437821/mr-1559.patch Author: Devaraj Das Ref: CDH-648 commit cab178d48ab27c74a44009b9d37e686580172794 Author: Todd Lipcon Date: Wed Mar 3 18:50:32 2010 -0800 MAPREDUCE-1550. UGI.doAs should not be used for getting the history file of jobs Patch: https://issues.apache.org/jira/secure/attachment/12437835/1550-2.patch Patch: https://issues.apache.org/jira/secure/attachment/12437870/1550-2.1.patch Author: Devaraj Das Ref: CDH-648 commit 4fd9491b5aeaea1f598f4314203aa5047576f137 Author: Todd Lipcon Date: Wed Mar 3 16:48:17 2010 -0800 HADOOP-6609. Fix UTF8 to use a thread local DataOutputBuffer instead of a static that was causing a deadlock in RPC. (omalley) Author: Owen O'Malley Ref: YDH commit a4c9f34046770120a47ba864eaaaab0ecc952d86 Author: Todd Lipcon Date: Wed Mar 3 10:50:34 2010 -0800 HDFS-1017. browsedfs jsp should call JspHelper.getUGI rather than using createRemoteUser() Patch: https://issues.apache.org/jira/secure/attachment/12437683/HDFS-1017-Y20-2.patch Author: Jakob Homan Ref: CDH-648 commit 5802343b91fc15ee719c2cffabf6a3f9f01f4007 Author: Todd Lipcon Date: Wed Mar 3 09:45:14 2010 +0530 MAPREDUCE-899. 
When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured. Patch: https://issues.apache.org/jira/secure/attachment/12437670/mr-899-20.patch Author: Amareshwari Sriramadasu Ref: CDH-648 commit d763323e21122018c746515655c9fdff00547635 Author: Todd Lipcon Date: Wed Mar 3 00:36:08 2010 +0000 HDFS-204. Revive number of files listed metrics Patch: http://issues.apache.org/jira/secure/attachment/12437576/getFileNum-yahoo20.patch Author: Jitendra Nath Pandey Ref: YDH commit ba0fc48b0d49ba1c03ef50ddb289580e61e0d689 Author: Todd Lipcon Date: Tue Mar 2 23:04:42 2010 +0000 HADOOP-6569. FsShell#cat should avoid calling unnecessary getFileStatus before opening a file to read Patch: http://issues.apache.org/jira/secure/attachment/12437633/optimizeCat-yahoo2.patch Author: Hairong Kuang Ref: YDH commit ebc7ac47eb93eeab3f8b8987ac74a56b0e40a982 Author: Todd Lipcon Date: Mon Mar 1 17:25:45 2010 -0800 HDFS-1014. Error in reading delegation tokens from edit logs. Patch: https://issues.apache.org/jira/secure/attachment/12437547/HDFS-1014-y20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit e5ec38e7ece3c2e9d7c64ae8bad4fe60e2a75088 Author: Todd Lipcon Date: Mon Mar 1 01:05:29 2010 -0800 HDFS-1006. getImage/putImage http requests should be https for the case of security enabled. Patch: https://issues.apache.org/jira/secure/attachment/12437467/HDFS-1006-Y20.1.patch Author: Boris Shkolnik Ref: CDH-648 commit 0c9d8c2dd9693f0a7317a139b5911a32c85aab61 Author: Todd Lipcon Date: Mon Mar 1 00:27:00 2010 -0800 HDFS-1005. Fsck security Patch: https://issues.apache.org/jira/secure/attachment/12437435/HDFS-1005-BP20.patch Patch: https://issues.apache.org/jira/secure/attachment/12438474/HDFS-1005-BP20-1.patch Author: Boris Shkolnik Ref: CDH-648 commit 5a1ea2b0a3050a0a8fdae64902067504d6f8c8eb Author: Todd Lipcon Date: Mon Mar 1 00:09:00 2010 -0800 HDFS-1007.
HFTP needs to be updated to use delegation tokens Patch: https://issues.apache.org/jira/secure/attachment/12437458/distcp-hftp.2.patch Patch: https://issues.apache.org/jira/secure/attachment/12437464/distcp-hftp.2.1.patch Patch: https://issues.apache.org/jira/secure/attachment/12438384/distcp-hftp-2.1.1.patch Author: Devaraj Das Ref: CDH-648 commit 4bcd449f60c72fb3058a6b3ff1316b7bc10514e8 Author: Todd Lipcon Date: Sat Feb 27 04:04:26 2010 -0800 HDFS-992. Re-factor block access token implementation to conform to the generic Token interface in Common Patch: https://issues.apache.org/jira/secure/attachment/12437340/h992-BK-0.20-07.patch Author: Kan Zhang Ref: CDH-648 commit 039917c679539092585e96f590fae59052a3cae6 Author: Todd Lipcon Date: Sat Feb 27 03:26:42 2010 -0800 MAPREDUCE-1528. TokenStorage should not be static Patch: https://issues.apache.org/jira/secure/attachment/12437339/MAPREDUCE-1528_yhadoop20.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 40c1a9177db24c0fd589ad8312a31a715f4cf8ad Author: Todd Lipcon Date: Sat Feb 27 03:19:02 2010 -0800 HADOOP-6584. Provide Kerberized SSL encryption for webservices Patch: https://issues.apache.org/jira/secure/attachment/12437337/HADOOP-6584-Y20-4.patch Patch: https://issues.apache.org/jira/secure/attachment/12437768/HADOOP-6584-FixJavadoc-Y20.patch Author: Jakob Homan Ref: CDH-648 commit ee57d93dde68a072985a44e025594bbb8c9340b0 Author: Todd Lipcon Date: Sat Feb 27 16:41:17 2010 +0530 MAPREDUCE-1493. Authorization for job-history pages Patch: https://issues.apache.org/jira/secure/attachment/12437336/MAPREDUCE-1493-20100227.3-ydist.txt Author: Vinod K V Ref: CDH-648 commit e6e5a1bd6943f228f5271627a200350bff525cba Author: Todd Lipcon Date: Sat Feb 27 15:44:10 2010 +0530 MAPREDUCE-1455.
Authorization for servlets Patch: https://issues.apache.org/jira/secure/attachment/12437322/1455.20S.2.patch Patch: https://issues.apache.org/jira/secure/attachment/12437379/1455.20S.2.fix.patch Author: Ravi Gummadi Ref: CDH-648 commit 3ad925fb44f4e98e986b5648e2c593d1feabafd4 Author: Todd Lipcon Date: Sat Feb 27 15:33:40 2010 +0530 MAPREDUCE-1307. Introduce the concept of Job Permissions Patch: https://issues.apache.org/jira/secure/attachment/12437331/MAPREDUCE-1307-20100227-ydist.txt Author: Vinod K V Ref: CDH-648 commit 06b43c680556c94120774e2c22279086654f50ee Author: Todd Lipcon Date: Sat Feb 27 15:03:15 2010 +0530 HADOOP-6568. Authorization for default servlets Patch: https://issues.apache.org/jira/secure/attachment/12437323/HADOOP-6568-20100226.1-ydist.patch Author: Vinod K V Ref: CDH-648 commit cf43306ef9d6a7b1a3d3287a898c05cbd51361b6 Author: Todd Lipcon Date: Sat Feb 27 00:04:02 2010 -0800 HADOOP-6589. A framework to enable better error messages when rpc connections fail to authenticate. (Kan Zhang via omalley) Author: Kan Zhang Ref: CDH-648 commit ea1922be01c027410bdfdb3776b79cbb2328162d Author: Todd Lipcon Date: Fri Feb 26 22:45:11 2010 -0800 HADOOP-6600. mechanism for authorization check for inter-server protocols Patch: https://issues.apache.org/jira/secure/attachment/12437320/HADOOP-6600-4-BP20.patch Patch: https://issues.apache.org/jira/secure/attachment/12437534/HADOOP-6600-BP20-fix.patch Author: Boris Shkolnik Ref: CDH-648 commit 877787c7557f3c2bb824f526d5b036c734bfc5e1 Author: Todd Lipcon Date: Fri Feb 26 21:43:27 2010 -0800 HADOOP-6580,HDFS-993,MR-1516. UGI should contain authentication method. Patch: https://issues.apache.org/jira/secure/attachment/12437317/HADOOP-6580-0_20.5.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 2a717fe40d0a2514647f25457e6a8f344dff7941 Author: Todd Lipcon Date: Fri Feb 26 21:22:54 2010 -0800 HADOOP-6573, HDFS-984, MR-1537. Delegation Tokens should be persisted. 
Patch: https://issues.apache.org/jira/secure/attachment/12437292/HDFS-984-0_20.4.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 5ba4559365dec19e119d041ddf7f6f65b7fc0c56 Author: Todd Lipcon Date: Fri Feb 26 19:26:55 2010 -0800 HDFS-994, HADOOP-6594. Provide methods for obtaining delegation token from Namenode for hftp and other uses Patch: https://issues.apache.org/jira/secure/attachment/12436748/HADOOP-6594.patch Author: Jakob Homan Ref: CDH-648 commit c697b870f3d87e17438115a037e9fd831e757e3f Author: Todd Lipcon Date: Fri Feb 26 18:27:28 2010 -0800 HADOOP-6586. Log authentication and authorization failures and successes Patch: https://issues.apache.org/jira/secure/attachment/12437302/HADOOP-6586-8-BP20-1.patch Author: Boris Shkolnik Ref: CDH-648 commit ac2d2869ecccd91fdacab7544eca00767f8d64a1 Author: Todd Lipcon Date: Fri Feb 26 11:28:36 2010 -0800 HDFS-991. Use delegation token to authenticate to the hdfs servlets. Author: Owen O'Malley Ref: CDH-648 commit 8b1175a8139fc28041c67f095212d7224ac44b2e Author: Todd Lipcon Date: Fri Feb 26 16:50:29 2010 -0800 HADOOP-6599. Split RPC metrics into summary and detailed metrics Patch: https://issues.apache.org/jira/secure/attachment/12437251/hadoop-6599.rel20.patch Author: Suresh Srinivas Ref: YDH commit 12f51ccde0651d7ed7e2458dfd498f0b8856d70a Author: Todd Lipcon Date: Fri Feb 26 13:14:06 2010 -0800 HDFS-998. The servlets should quote server generated strings sent in the response Patch: http://issues.apache.org/jira/secure/attachment/12436835/H998-0y20.patch Author: Chris Douglas Ref: CDH-648 commit 0d1bfe678e356b154d237e34d4e4379a84614158 Author: Todd Lipcon Date: Fri Feb 26 13:12:41 2010 -0800 MAPREDUCE-1454. 
The servlets should quote server generated strings sent in the response Patch: http://issues.apache.org/jira/secure/attachment/12436834/M1454-0y20.patch Patch: https://issues.apache.org/jira/secure/attachment/12437591/M1454-1y20.patch Author: Chris Douglas Ref: CDH-648 commit 36710ad2bc76df6df01ef540df21fedb9b702160 Author: Todd Lipcon Date: Thu Feb 25 18:28:07 2010 -0800 HDFS-1000. libhdfs needs to be updated to use the new UGI Patch: https://issues.apache.org/jira/secure/attachment/12437071/hdfs-1000-bp20.4.patch Author: Devaraj Das Ref: CDH-648 commit a9847d0ea9f695dca2d3dca6e81a8cd1f29665fc Author: Todd Lipcon Date: Thu Feb 25 18:25:29 2010 -0800 MAPREDUCE-1532. Delegation token is obtained as the superuser Patch: https://issues.apache.org/jira/secure/attachment/12437096/1532-bp20.4.patch Author: Devaraj Das Ref: CDH-648 commit c44edf05330198b89ec994b2543e1fe459dd30bd Author: Todd Lipcon Date: Sun Feb 21 22:10:38 2010 -0800 MAPREDUCE-1430. JobTracker should be able to renew delegation tokens for the jobs Patch: https://issues.apache.org/jira/secure/attachment/12436542/1430-dd4-BP20.patch Author: Boris Shkolnik Ref: CDH-648 commit 6e62f0d9c2d57096e4dfa937ebeab8c76b354e63 Author: Todd Lipcon Date: Thu Feb 25 10:37:41 2010 -0800 HADOOP-6596. Add a version field to the serialization of the AbstractDelegationTokenIdentifier. Author: Owen O'Malley Ref: CDH-648 commit 74cc8e6e9836a9daab24553b41efca110f50411a Author: Todd Lipcon Date: Thu Feb 25 10:29:08 2010 -0800 HADOOP-5561. Add javadoc.maxmemory to build.xml to allow larger memory. Author: Jakob Homan Ref: YDH commit 03e35485989da9f5d60dacdcfc67e8566b8590f8 Author: Todd Lipcon Date: Thu Feb 25 10:14:22 2010 -0800 HADOOP-6579. Add a mechanism for encoding and decoding Tokens in to url-safe strings. Also change commons-codec library to 1.4. Author: Owen O'Malley Ref: CDH-648 commit a04e3abfd58010fd99877910c0f15bbbebb1b45a Author: Todd Lipcon Date: Thu Feb 25 20:48:43 2010 +0530 MAPREDUCE-1354. 
Incremental enhancements to the JobTracker for better scalability Patch: https://issues.apache.org/jira/secure/attachment/12437010/mr-1354-y20.patch Author: Dick King Ref: YDH commit 69ec78a2c5d4faa8700cf45f7560fe1d10517ddf Author: Todd Lipcon Date: Wed Feb 24 17:16:50 2010 -0800 HDFS-999. Secondary namenode should login using kerberos if security is configured Patch: https://issues.apache.org/jira/secure/attachment/12436938/HDFS-999-BP20.patch Author: Boris Shkolnik Ref: CDH-648 commit ac14a11d60fcbd9cc660fc8b48a894cd6774b102 Author: Todd Lipcon Date: Thu Feb 25 00:16:28 2010 +0530 MAPREDUCE-1466. FileInputFormat should save #input-files in JobConf Patch: https://issues.apache.org/jira/secure/attachment/12436886/MAPREDUCE-1466_yhadoop20-3.patch Author: Luke Lu Ref: YDH commit 50ddfaab96c196a2342e7807423d31f5bd10a18e Author: Todd Lipcon Date: Wed Feb 24 17:04:46 2010 +0530 MAPREDUCE-1403. Save file-sizes of each of the artifacts in DistributedCache in the JobConf Patch: https://issues.apache.org/jira/secure/attachment/12436842/MAPREDUCE-1403_yhadoop20-2.patch Author: Arun C Murthy Ref: YDH commit 645b54a053bf565ef2a0f36be8c5a02f80a2775a Author: Todd Lipcon Date: Tue Feb 23 23:35:49 2010 -0800 HADOOP-6566. Hadoop daemons should not start up if the ownership/permissions on the directories used at runtime are misconfigured Patch: https://issues.apache.org/jira/secure/attachment/12436814/HADOOP-6566_yhadoop20.patch Author: Arun C Murthy Ref: CDH-648 commit 2943513d74b4a8c1763eccab854b96a57caec7f4 Author: Todd Lipcon Date: Tue Feb 23 18:04:41 2010 -0800 MAPREDUCE-1520. TestMiniMRLocalFS fails on trunk Patch: https://issues.apache.org/jira/secure/attachment/12436695/patch-1520-20S.txt Author: Amareshwari Sriramadasu Ref: CDH-648 commit e82c781785f284d926591a91df837021f5c68fcc Author: Todd Lipcon Date: Tue Feb 23 17:45:16 2010 -0800 HADOOP-6543. 
Allow authentication-enabled RPC clients to connect to authentication-disabled RPC servers Patch: https://issues.apache.org/jira/secure/attachment/12436797/6543-bp20.0.patch Patch: https://issues.apache.org/jira/secure/attachment/12436807/6543-bp20.1.patch Author: Kan Zhang Ref: CDH-648 commit 48788d6ab1852ee1f47b56ee6f097022c30ec409 Author: Todd Lipcon Date: Tue Feb 23 17:29:15 2010 -0800 MAPREDUCE-1505. Cluster class should create the rpc client only when needed Patch: https://issues.apache.org/jira/secure/attachment/12436628/MAPREDUCE-1505_yhadoop20.patch Author: Dick King Ref: YDH commit 8a4f25cf0d3631d190ecfa81e17a80bfbb019c69 Author: Todd Lipcon Date: Tue Feb 23 16:58:04 2010 -0800 HADOOP-6549. TestDoAsEffectiveUser should use ip address of the host for superuser ip check Patch: https://issues.apache.org/jira/secure/attachment/12436794/HADOOP-6549-0_20.1.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit 7bc77f6677512242d19b75c643f21a457087d29a Author: Todd Lipcon Date: Tue Feb 23 16:18:31 2010 -0800 HDFS-786. Implement getContentSummary(..) in HftpFileSystem Patch: https://issues.apache.org/jira/secure/attachment/12436792/h786_20100223_0.20.patch Author: Tsz Wo (Nicholas), SZE Ref: YDH commit ff8f5ea311eb73564805f083a379147fd5aa6d47 Author: Todd Lipcon Date: Tue Feb 23 20:28:10 2010 +0000 HDFS-946. NameNode should not return full path name when listing a directory or getting the status of a file Patch: http://issues.apache.org/jira/secure/attachment/12436753/HdfsFileStatus-yahoo20.patch Patch: http://issues.apache.org/jira/secure/attachment/12436769/HdfsFileStatusProxy-Yahoo20.patch Ref: YDH commit a3fe5640d9cdab14cc7080093b2dbb7933d640d9 Author: Todd Lipcon Date: Tue Feb 23 23:04:04 2010 +0530 MAPREDUCE-1398. TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
Patch: https://issues.apache.org/jira/secure/attachment/12436724/mr-1398-y20.patch Author: Amareshwari Sriramadasu Ref: YDH commit ab177895085d9b5fcfcb1fc695bbd94b753b9160 Author: Todd Lipcon Date: Tue Feb 23 22:38:02 2010 +0530 MAPREDUCE-1476. committer.needsTaskCommit should not be called for a task cleanup attempt Patch: https://issues.apache.org/jira/secure/attachment/12436722/mr-1476-y20.patch Author: Amareshwari Sriramadasu Ref: YDH commit eb3c54506da655c486c06560877a2e30dd3aec3f Author: Todd Lipcon Date: Tue Feb 23 06:57:55 2010 +0000 HADOOP-6467. Performance improvement for liststatus on directories in hadoop archives. Patch: http://issues.apache.org/jira/secure/attachment/12436653/HADOOP-6467-y.0.20-branch-v2.patch Author: Mahadev konar Ref: YDH commit 1ee8f38881d450a818754c889038ef2ea8de865d Author: Todd Lipcon Date: Mon Feb 22 23:47:58 2010 +0000 HADOOP-6558. archive does not work with distcp -update Patch: http://issues.apache.org/jira/secure/attachment/12436264/c6558_20100216b_y0.20.patch Author: Tsz Wo (Nicholas), SZE Ref: YDH commit 5792ec1c6229c000b34fd2cb1f3447ae71ad9949 Author: Todd Lipcon Date: Mon Feb 22 15:21:20 2010 -0800 HADOOP-6583. Capture metrics for authentication/authorization at the RPC layer Patch: https://issues.apache.org/jira/secure/attachment/12436643/6583-bp20.patch Author: Devaraj Das Ref: CDH-648 commit f370a6f8f27d6bc813872a89d9dac122dd357b53 Author: Todd Lipcon Date: Mon Feb 22 14:35:53 2010 -0800 HADOOP-6577. IPC server response buffer reset threshold should be configurable Patch: https://issues.apache.org/jira/secure/attachment/12436399/hadoop-6577.2.rel20.patch from yahoo-hadoop-0.20 into yahoo-hadoop-0.20.1xx Author: Suresh Srinivas Ref: YDH commit 364fd3118df3fb08ec239306fcc3b1762cb803d0 Author: Todd Lipcon Date: Mon Feb 22 17:11:21 2010 +0530 MAPREDUCE-1316.
JobTracker holds stale references to retired jobs via unreported tasks Patch: https://issues.apache.org/jira/secure/attachment/12436563/mapreduce-1316-y20s.patch Author: Amar Kamat Ref: YDH commit 2471700c34dece87f611cceaf5b961647ab58700 Author: Todd Lipcon Date: Fri Feb 19 15:58:46 2010 -0800 HADOOP-6551, HDFS-986, MAPREDUCE-1503. Change API for tokens to throw exceptions instead of returning booleans. Author: Owen O'Malley Ref: CDH-648 commit 5838b15e3232608c1358887b8910638b1497043f Author: Todd Lipcon Date: Fri Feb 19 23:55:25 2010 -0800 HADOOP-6572. RPC responses may be out-of-order with respect to SASL Patch: https://issues.apache.org/jira/secure/attachment/12436421/6572-bp20.patch Author: Kan Zhang Ref: CDH-648 commit 4719ab45ca9a8ae6d8289dddc024854683726b19 Author: Todd Lipcon Date: Fri Feb 19 15:11:03 2010 -0800 HDFS-965. Split the HDFS TestDelegationToken into two tests, one for proxy users and the other for normal users. (jitendra via omalley) Author: Jitendra Nath Pandey Ref: CDH-648 commit a0380f3cae542769bd6861311d0fc709201e9dc3 Author: Todd Lipcon Date: Fri Feb 19 14:35:02 2010 -0800 HADOOP-6332, HDFS-1134, MAPREDUCE-1774. Herriot (system test framework) Author: Konstantin Boudnik Ref: YDH commit 205a6b6697f7a8934f684e08c187682d0f1d3b2d Author: Todd Lipcon Date: Thu Feb 18 22:12:29 2010 +0000 HADOOP-6560. HarFileSystem throws NPE for har://hdfs-/foo Patch: http://issues.apache.org/jira/secure/attachment/12436045/c6560_20100212_y0.20.patch Author: Tsz Wo (Nicholas), SZE Ref: YDH commit f33ae6567528673de4c4b0de8f40cf2c9ff5741c Author: Todd Lipcon Date: Thu Feb 18 12:15:13 2010 +0530 MAPREDUCE-686. Move TestSpeculativeExecution.Fake* into a separate class so that it can be used by other tests also Patch: https://issues.apache.org/jira/secure/attachment/12436181/MAPREDUCE-686-y20.patch Author: Jothi Padmanabhan Ref: YDH commit dd2ce99bb706ba8e7771b3382de7af687ae8467f Author: Todd Lipcon Date: Tue Feb 16 12:51:39 2010 -0800 HDFS-111.
UnderReplicationBlocks should use generic types Patch: https://issues.apache.org/jira/secure/attachment/12436027/1026-bp20-bugfix.patch Author: Devaraj Das Ref: YDH commit 67e25d271e93cc035ed664f9bfa16705f6a45958 Author: Todd Lipcon Date: Sun Feb 14 23:34:54 2010 -0800 HADOOP-6559. The RPC client should try to re-login when it detects that the TGT expired Patch: https://issues.apache.org/jira/secure/attachment/12435851/h-6559.6.bp20.patch Author: Devaraj Das Ref: CDH-648 commit 176816d52d875d33877baba51294de5b1868d3aa Author: Todd Lipcon Date: Sun Feb 14 14:50:27 2010 +0530 HADOOP-2141. speculative execution start up condition based on completion time Patch: https://issues.apache.org/jira/secure/attachment/12435253/hadoop-2141-yahoo-v1.4.8.patch (only test related changes) Author: Andy Konwinski Ref: YDH commit cd035a28f73f373b695b3704243d013508036346 Author: Todd Lipcon Date: Thu Feb 11 19:51:52 2010 +0000 MAPREDUCE-1425. archive throws OutOfMemoryError Patch: http://issues.apache.org/jira/secure/attachment/12435030/MAPREDUCE-1425_y_0.20.patch Author: Mahadev konar Ref: YDH commit 7260de34b087c442e5054410e038f7bc2214e077 Author: Todd Lipcon Date: Thu Feb 11 19:35:40 2010 +0000 MAPREDUCE-1399. The archive command shows a null error message Patch: http://issues.apache.org/jira/secure/attachment/12435380/m1399_20100205trunk2_y0.20.patch Author: Tsz Wo (Nicholas), SZE Ref: YDH commit a60877ba994c36b0d81f7d1c47a81b1111906bd2 Author: Todd Lipcon Date: Tue Feb 9 21:08:18 2010 -0800 HADOOP-6552. KEYTAB_KERBEROS_OPTIONS in UserGroupInformation should have options for automatic renewal of keytab based tickets Patch: https://issues.apache.org/jira/secure/attachment/12435369/6552.patch Author: Devaraj Das Ref: CDH-648 commit b96008a997c4cf52f01a32daa103244a27190639 Author: Todd Lipcon Date: Tue Feb 9 21:06:07 2010 -0800 MAPREDUCE-1433. 
Create a Delegation token for MapReduce Patch: https://issues.apache.org/jira/secure/attachment/12435412/1433.bp20.patch Author: Owen O'Malley Ref: CDH-648 commit 29b6749dfdd9038d9f54ef9f0669c5b1fc553463 Author: Todd Lipcon Date: Tue Feb 9 01:34:27 2010 -0800 HADOOP-6547, HDFS-949, MAPREDUCE-1470. Move the Delegation Token feature to common since both HDFS and MapReduce need it Patch: https://issues.apache.org/jira/secure/attachment/12435271/6547-949-1470-0_20.1.patch Author: Devaraj Das Ref: CDH-648 commit 542f37d8c93b1cae42aec29789068d27bc2330fb Author: Todd Lipcon Date: Tue Feb 9 12:07:45 2010 +0530 HADOOP-5879. GzipCodec should read compression level etc from configuration Patch: http://issues.apache.org/jira/secure/attachment/12435254/hadoop-5879-yahoo-0.20-v1.0.patch Author: He Yongqiang Ref: YDH commit 7d2d3c129e0c1358e490d09744ce1b448e430beb Author: Todd Lipcon Date: Tue Feb 9 11:10:54 2010 +0530 HADOOP-6161. Add get/setEnum to Configuration Patch: http://issues.apache.org/jira/secure/attachment/12434928/hadoop-6161-yahoo-20-v1.patch Author: Chris Douglas Ref: YDH commit ed87d9db29b9375d2c10047bbc3e7d136c76a581 Author: Todd Lipcon Date: Mon Feb 8 19:44:20 2010 -0800 HADOOP-6510, HDFS-935, MAPREDUCE-1464. Add support for a superuser authenticating on behalf of a proxy user. Patch: https://issues.apache.org/jira/secure/attachment/12435223/HADOOP-6510-0_20.4.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit df2da5d463d6a2b09f11c44d1eebbf85fa73ce81 Author: Todd Lipcon Date: Mon Feb 8 20:15:15 2010 +0530 MAPREDUCE-1435. symlinks in cwd of the task are not handled properly after MAPREDUCE-896 Patch: https://issues.apache.org/jira/secure/attachment/12435154/MR-1435-y20s.patch Author: Ravi Gummadi Ref: CDH-648 commit 4fe06f63ca10d7cc5949615d2d41261782156653 Author: Todd Lipcon Date: Sun Feb 7 00:22:29 2010 -0800 MAPREDUCE-1457.
For secure job execution, a couple more UserGroupInformation.doAs calls need to be added Patch: https://issues.apache.org/jira/secure/attachment/12435115/MAPREDUCE-1457-BPY20.patch.1 Author: Jakob Homan Ref: CDH-648 commit 6b874f7d11e11f14450e882b670180daef48a76e Author: Todd Lipcon Date: Sat Feb 6 12:45:49 2010 -0800 MAPREDUCE-1440. MapReduce should use the short form of the user names Patch: https://issues.apache.org/jira/secure/attachment/12435087/1440.y20.patch Author: Owen O'Malley Ref: CDH-648 commit b14b570878b9262f1a77c668b5123345406f8374 Author: Todd Lipcon Date: Fri Feb 5 17:29:01 2010 -0800 HDFS-737. Improvement in metasave output Patch: https://issues.apache.org/jira/secure/attachment/12435041/HDFS-737.3.rel20.patch Author: Jitendra Nath Pandey Ref: YDH commit 5141b5979120da19551385ff1ce13b545266b204 Author: Todd Lipcon Date: Fri Feb 5 15:35:16 2010 -0800 HADOOP-6419. Change RPC layer to support SASL based mutual authentication Patch: https://issues.apache.org/jira/secure/attachment/12434998/HADOOP-6419-0.20-15.patch Patch: https://issues.apache.org/jira/secure/attachment/12435135/6419-bp20-jobsubmitprotocol.patch Author: Kan Zhang Ref: CDH-648 commit d1f946ae7bfd5e619e8167ebad228be72668b0a9 Author: Todd Lipcon Date: Fri Feb 5 15:29:26 2010 -0800 HADOOP-6538. Set hadoop.security.authentication to "simple" by default Patch: https://issues.apache.org/jira/secure/attachment/12435031/6538-bp20.patch Author: Devaraj Das Ref: CDH-648 commit d012efa36429328941a04742bd6febd35d3875ef Author: Todd Lipcon Date: Fri Feb 5 23:06:29 2010 +0000 HDFS-938. Replace calls to UGI.getUserName() with UGI.getShortUserName() Patch: https://issues.apache.org/jira/secure/attachment/12435015/HDFS-938-BP20-2.patch Author: Jakob Homan Ref: CDH-648 commit 4f96064bf4bb838ce0c6d4e99152a02ad9737032 Author: Todd Lipcon Date: Fri Feb 5 15:03:57 2010 -0800 HADOOP-6521. FsPermission:SetUMask not updated to use new-style umask setting.
Patch: https://issues.apache.org/jira/secure/attachment/12434469/hadoop-6521.rel20.1.patch Author: Suresh Srinivas Ref: YDH commit ee18c74b284b015975b0df13c750e135ff938fbe Author: Todd Lipcon Date: Fri Feb 5 20:23:24 2010 +0000 HADOOP-6544. fix ivy settings to include JSON jackson.codehaus.org libs for .20 Patch: https://issues.apache.org/jira/secure/attachment/12435002/contrib.ivy.jackson.patch-3 Author: Boris Shkolnik Reason: contrib build breaks because ivy is not configured to include jackson libs. Ref: YDH commit 18b89be19183117cbe0a567ecb16e8012bc83c48 Author: Todd Lipcon Date: Thu Feb 4 18:47:54 2010 -0800 HDFS-907. Add tests for getBlockLocations and totalLoad metrics. Patch: https://issues.apache.org/jira/secure/attachment/12434919/HDFS907s.patch Author: Ravi Phulari Ref: YDH commit 77510480a8b45b2f9e605b720671d609d7bf4687 Author: Todd Lipcon Date: Tue Feb 2 16:25:41 2010 -0800 HADOOP-6204. Implementing aspects development and fault injection framework for Hadoop Patch: https://issues.apache.org/jira/secure/attachment/12434616/HADOOP-6204-ydist.patch Author: Konstantin Boudnik Ref: YDH commit 993bc455b265d185f74c23ec7ccb272203190298 Author: Todd Lipcon Date: Tue Feb 2 10:31:13 2010 -0800 MAPREDUCE-1432. Add the hooks in JobTracker and TaskTracker to load tokens from the token cache into the user's UGI Patch: https://issues.apache.org/jira/secure/attachment/12434550/MAPREDUCE-1432-BP20-2.patch Author: Devaraj Das Ref: CDH-648 commit 32957687aa59ef0d515682d87e521d8aba244e3b Author: Todd Lipcon Date: Mon Feb 1 22:52:13 2010 -0800 MAPREDUCE-1383. Allow storage and caching of delegation token. Patch: https://issues.apache.org/jira/secure/attachment/12434455/MAPREDUCE-1383-BP20-7.patch Author: Boris Shkolnik Ref: CDH-648 commit ce7c3dc280b3d3ba0b176f0b7a9dc09d5ca163f5 Author: Todd Lipcon Date: Mon Feb 1 21:26:52 2010 -0800 HADOOP-6337.
Update FilterInitializer class to be more visible and take a conf for further development Patch: https://issues.apache.org/jira/secure/attachment/12434503/HADOOP-6337-Y.patch Patch: https://issues.apache.org/jira/secure/attachment/12434547/HADOOP-6337-Y.patch Author: Jakob Homan Ref: CDH-648 commit 6fa12363baff6e2e11650c0f395b52e9f47d6266 Author: Todd Lipcon Date: Mon Feb 1 10:28:11 2010 -0800 HADOOP-6520. UGI should load tokens from the environment Patch: https://issues.apache.org/jira/secure/attachment/12434423/HADOOP-6520-0_20.2.patch Author: Devaraj Das Ref: CDH-648 commit 256f62f67d848260f25fca2e052fd878b93bfe17 Author: Todd Lipcon Date: Sun Jan 31 22:53:34 2010 -0800 HADOOP-6517, HADOOP-6518. Ability to add/get tokens from UserGroupInformation Patch: https://issues.apache.org/jira/secure/attachment/12434368/HADOOP-6518-0_20.1.patch Author: Owen O'Malley Ref: CDH-648 commit 34c6a146be045127f5e43828fd3c578ebb3c113c Author: Todd Lipcon Date: Fri Jan 22 16:03:23 2010 -0800 MAPREDUCE-1376. Support for varied user submission in Gridmix Patch: https://issues.apache.org/jira/secure/attachment/12431174/M1376-4.patch Patch: https://issues.apache.org/jira/secure/attachment/12440324/1376-5-yhadoop20-100.patch Author: Chris Douglas Ref: YDH commit e02e4b0320d45fe401470e72d62b73d18a3dd579 Author: Todd Lipcon Date: Sun Jan 31 20:00:10 2010 -0800 HADOOP-6299. Use JAAS LoginContext for our login Patch: https://issues.apache.org/jira/secure/attachment/12434362/HADOOP-6299-Y20.patch Author: Owen O'Malley Ref: CDH-648 commit e1ab72fadf1fc48bb21a44c62edd39b3883a392a Author: Todd Lipcon Date: Fri Jan 29 00:00:30 2010 +0530 Amend MAPREDUCE-842. 
Per-job local data on the TaskTracker node should have right access-control Reason: follow-up patch to fix a backport bug Patch: https://issues.apache.org/jira/secure/attachment/12431690/MR-842-follow-up.patch Author: Vinod K V Ref: CDH-648 commit 662c95fb0e8b2aaefb64362119aef66a04268eb1 Author: Todd Lipcon Date: Wed Jan 27 21:58:11 2010 +0530 MAPREDUCE-1186. While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir Patch: https://issues.apache.org/jira/secure/attachment/12431573/1186.20S-6.patch Author: Amareshwari Sriramadasu Reason: performance Ref: YDH commit 137e608a13a26b47883cea6d03a48974b2f16c68 Author: Todd Lipcon Date: Tue Jan 26 23:40:06 2010 -0800 HDFS-899. Delegation Token Implementation Patch: https://issues.apache.org/jira/secure/attachment/12431529/HDFS-899-0_20.2.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit db6344ec8a93fb65830f7902e6566a96619ff7dd Author: Todd Lipcon Date: Tue Jan 26 15:10:41 2010 +0530 MAPREDUCE-896. Users can set non-writable permissions on temporary files for TT and can abuse disk usage. Patch: https://issues.apache.org/jira/secure/attachment/12431413/MR-896.v8-y20.patch Author: Ravi Gummadi Ref: CDH-648 commit 2aff67e0291b9641d2e17a7288faa694efe16976 Author: Todd Lipcon Date: Mon Jan 25 20:36:44 2010 +0530 MAPREDUCE-744. Support in DistributedCache to share cache files with other users after HADOOP-4493 Patch: https://issues.apache.org/jira/secure/attachment/12431313/744-6-y20.patch Author: Devaraj Das Ref: CDH-648 commit d1b26621983f80167bf3af5b38ae48467c739f14 Author: Todd Lipcon Date: Sat Jan 23 20:27:58 2010 +0530 MAPREDUCE-1140. Per cache-file refcount can become negative when tasks release distributed-cache files Patch: https://issues.apache.org/jira/secure/attachment/12431213/patch-1140-3-y20.txt Author: Amareshwari Sriramadasu Ref: CDH-648 commit 6804e20bd4d9ee5e0005b61d202ce7dd928b5b22 Author: Todd Lipcon Date: Sat Jan 23 20:01:51 2010 +0530 MAPREDUCE-1284. 
TestLocalizationWithLinuxTaskController fails Patch: https://issues.apache.org/jira/secure/attachment/12427577/MR-1284.patch Author: Ravi Gummadi Ref: CDH-648 commit 9e3f0d458c0ac31bad77cc336b6fdf0206fbe0d6 Author: Todd Lipcon Date: Sat Jan 23 14:00:14 2010 +0530 MAPREDUCE-1098. Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks. Patch: https://issues.apache.org/jira/secure/attachment/12431207/patch-1098-7-y20.txt Author: Amareshwari Sriramadasu Ref: CDH-648 commit d3131417c36e68cb59ad0833d271d10bd869b27c Author: Todd Lipcon Date: Fri Jan 22 17:45:33 2010 -0800 MAPREDUCE-1338. Add ability to store and load security keys Patch: https://issues.apache.org/jira/secure/attachment/12431172/MAPREDUCE-1338-BP20-3.patch Author: Boris Shkolnik Ref: CDH-648 commit 478ebff927c0a45f72c531952bcaf7632e990a12 Author: Todd Lipcon Date: Fri Jan 22 17:17:08 2010 -0800 HADOOP-6495. Identifier should be serialized after the password is created In Token constructor Patch: https://issues.apache.org/jira/secure/attachment/12431145/HADOOP-6495-0_20.2.patch Author: Jitendra Nath Pandey Ref: CDH-648 commit ef9e572a545e56b790000f16bf6d416b63083520 Author: Todd Lipcon Date: Fri Jan 22 16:23:34 2010 +0530 HADOOP-5457. Failing contrib tests should not stop the rest of the contrib tests Patch: https://issues.apache.org/jira/secure/attachment/12431103/Hadoop-5457-y20.patch Author: Giridharan Kesavan Ref: YDH commit 6b6fdbe4b79d6e623fdbcc60f052749cf99b0c32 Author: Todd Lipcon Date: Thu Jan 21 12:30:20 2010 -0800 Amend HADOOP-4181. Add support for git revision in saveVersion.sh Author: Owen O'Malley Reason: Support git revisions without explicitly passing them in Ref: YDH commit 71d9e9a5b3577937fe06b40cebc6656419324323 Author: Todd Lipcon Date: Thu Jan 21 22:10:58 2010 +0530 MAPREDUCE-856. 
Localized files from DistributedCache should have right access-control Patch: https://issues.apache.org/jira/secure/attachment/12431040/MAPREDUCE-856-20090908-y20.txt Author: Vinod K V Ref: CDH-648 commit c0826b2e0c43581aa90afff465ddd7401e12b1ee Author: Todd Lipcon Date: Wed Jan 20 16:12:33 2010 +0530 MAPREDUCE-871. Job/Task local files have incorrect group ownership set by LinuxTaskController binary Patch: https://issues.apache.org/jira/secure/attachment/12430867/871.20S.patch Author: Vinod K V Ref: CDH-648 commit 3ca7d8529bcb3cea9640dffaa296c508a07f89a4 Author: Todd Lipcon Date: Wed Jan 20 15:28:40 2010 +0530 MAPREDUCE-476. Extend DistributedCache to work locally (LocalJobRunner) Patch: https://issues.apache.org/jira/secure/attachment/12430866/476.20S-2.patch Author: Philip Zeyliger Ref: YDH commit 5d79da536f9811f47af0e073aa68ca776c24b0da Author: Todd Lipcon Date: Tue Jan 19 20:06:10 2010 +0530 MAPREDUCE-711. Move Distributed Cache from Common to Map/Reduce Patch: https://issues.apache.org/jira/secure/attachment/12430713/711.20S.patch Author: Vinod K V Ref: YDH commit be2df477d6dc267f4a4b7c6602c8108ece1cb783 Author: Todd Lipcon Date: Tue Jan 19 13:30:55 2010 +0530 MAPREDUCE-478. separate jvm param for mapper and reducer Patch: https://issues.apache.org/jira/secure/attachment/12430705/478.20S-1.patch Author: Arun C Murthy Ref: YDH commit 44df01f8009c02e7346b69389ee8a26ef824bba2 Author: Todd Lipcon Date: Tue Jan 19 12:11:07 2010 +0530 MAPREDUCE-842. Per-job local data on the TaskTracker node should have right access-control Patch: https://issues.apache.org/jira/secure/attachment/12430697/842.20S-4.patch Author: Vinod K V Ref: CDH-648 commit 3b9ed4395593a6f67897126126e7c4c74a35c42c Author: Todd Lipcon Date: Fri Jan 15 19:47:22 2010 +0530 MAPREDUCE-408. 
TestKillSubProcesses fails with assertion failure sometimes Patch: https://issues.apache.org/jira/secure/attachment/12430404/MR-408.v1.1.y20.patch Author: Ravi Gummadi Ref: CDH-648 commit fc472723ed6f5ca78abd4bfca56584489c485ee1 Author: Todd Lipcon Date: Fri Jan 15 19:16:38 2010 +0530 HADOOP-4041. IsolationRunner does not work as documented Patch: https://issues.apache.org/jira/secure/attachment/12430398/HADOOP-4041-v4-y20.patch Author: Philip Zeyliger Ref: YDH commit c759d3e421565c13c79d6091d1917ce57cbb6636 Author: Todd Lipcon Date: Wed Jan 13 22:27:07 2010 -0800 MAPREDUCE-1316. Fix jobs' retirement from the JobTracker to prevent memory leaks via stale references. Patch: https://issues.apache.org/jira/secure/attachment/12430197/mapreduce-1316-v1.15-branch20-yahoo.patch Author: Amar Kamat Ref: YDH commit 10024643cacf3d40faa870505c83dd344f8ff366 Author: Todd Lipcon Date: Wed Jan 13 15:28:34 2010 -0800 MAPREDUCE-1342. Fixed deadlock in global blacklisting of tasktrackers. Patch: https://issues.apache.org/jira/secure/attachment/12430116/patch-1342-3-ydist.txt Author: Amareshwari Sriramadasu Ref: YDH commit 675d02da77a6db7b98e8a30afda2926fe768fe3e Author: Todd Lipcon Date: Tue Jan 12 15:43:25 2010 -0800 MAPREDUCE-181. Secure job submission Patch: https://issues.apache.org/jira/secure/attachment/12430064/181.20.s.3.patch Patch: https://issues.apache.org/jira/secure/attachment/12436083/jobclient.patch Patch: https://issues.apache.org/jira/secure/attachment/12440358/181.20.s.3.fix.patch Author: Devaraj Das Ref: CDH-648 commit fda013b025b050107afd17120270ec6e5cb99138 Author: Todd Lipcon Date: Tue Jan 12 22:14:37 2010 +0530 HADOOP-5737. UGI checks in testcases are broken Patch: https://issues.apache.org/jira/secure/attachment/12430029/HADOOP-5737-y20.patch Author: Amar Kamat Ref: CDH-648 commit 829bb385ecbde947eeacff39ea2f3f3af703fdbe Author: Todd Lipcon Date: Tue Jan 12 12:34:52 2010 +0530 HADOOP-5771. 
Create unit test for LinuxTaskController Patch: https://issues.apache.org/jira/secure/attachment/12429998/5771.20S.patch Author: Sreekanth Ramakrishnan Ref: CDH-648 commit 0045482b56026f2858bdd983d09ab2a38e06bfa8 Author: Todd Lipcon Date: Fri Jan 8 18:49:32 2010 -0800 HADOOP-4656, HDFS-685, MAPREDUCE-1083. Add a user to groups mapping service Patch: https://issues.apache.org/jira/secure/attachment/12429805/MR-1083-0_20.2.patch Author: Boris Shkolnik Ref: CDH-648 commit bff776eae3e576e29b3f48546d620469c7a65a6f Author: Todd Lipcon Date: Thu Jan 7 11:15:53 2010 -0800 MAPREDUCE-1250. Refactor job token to use a common token interface Patch: https://issues.apache.org/jira/secure/attachment/12429629/MR-1250-0_20.2.patch Author: Kan Zhang Ref: CDH-648 commit f9bf7f1aa9a663f09e3377671722e4bce0fa5f20 Author: Todd Lipcon Date: Wed Jan 6 13:53:05 2010 -0800 MAPREDUCE-1026. Shuffle should be secure Patch: https://issues.apache.org/jira/secure/attachment/12429584/MR-1026-0_20.2.patch Author: Boris Shkolnik Ref: CDH-648 commit 8761586736722c1c6f2eb3c7ad7d1842383431b6 Author: Todd Lipcon Date: Mon Jan 4 17:54:04 2010 -0800 HADOOP-4268. Permission checking in fsck Patch: https://issues.apache.org/jira/secure/attachment/12428975/HADOOP-4268-0_20.2.patch Author: Tsz Wo (Nicholas), SZE Ref: CDH-648 commit ac18b1312c05f9d85b23686807c9b0120f99eac0 Author: Todd Lipcon Date: Mon Jan 4 14:50:55 2010 -0800 HADOOP-6415. Adding a common token interface for both job token and delegation token Patch: https://issues.apache.org/jira/secure/attachment/12429399/HADOOP-6415-0_20.2.patch Author: Kan Zhang Ref: CDH-648 commit dca4c3a73ff34fb37ff2b92c7d6ba2331cd1405d Author: Todd Lipcon Date: Fri Dec 25 13:56:07 2009 -0800 HDFS-764 and HADOOP-6367. 
Moving Access Token implementation from Common to HDFS Patch: https://issues.apache.org/jira/secure/attachment/12428959/HADOOP-6367_HDFS-764-0_20.1.patch Author: Kan Zhang Ref: CDH-648 commit 6205b10e6bbd702cab014af83061791b35a2248a Author: Todd Lipcon Date: Thu Dec 24 11:50:12 2009 -0800 HDFS-409. Add more access token tests Patch: https://issues.apache.org/jira/secure/attachment/12428924/HDFS-409-0_20.4.patch Author: Kan Zhang Ref: CDH-648 commit c1c67ca1ab0d958c2258c8c1571adb89996f684a Author: Todd Lipcon Date: Thu Dec 24 11:46:03 2009 -0800 HADOOP-6132. RPC client opens an extra connection for VersionedProtocol Patch: https://issues.apache.org/jira/secure/attachment/12428925/HADOOP-6132-0_20.1.patch Author: Kan Zhang Ref: YDH commit 07ba75bcabd5beeecd41c0c9f54b850304ad9225 Author: Todd Lipcon Date: Wed Dec 23 16:39:18 2009 -0800 Amend HDFS-445. Bring DFSClient block caching code more up to date with trunk Patch: https://issues.apache.org/jira/secure/attachment/12428885/HDFS-445-0_20.2.patch Author: Kan Zhang Ref: YDH commit ad1f19c5727c6457a4e37c26cc1ede0dae3b76ec Author: Todd Lipcon Date: Tue Dec 22 18:05:31 2009 -0800 HDFS-195. Need to handle access token expiration when re-establishing the pipeline for dfs write Patch: https://issues.apache.org/jira/secure/attachment/12428788/HDFS-195-0_20.1.patch Author: Kan Zhang Ref: CDH-648 commit 53782d128507af30429e8c697788aa14fa9849c8 Author: Todd Lipcon Date: Tue Dec 22 14:52:12 2009 -0800 HADOOP-6176. Adding a couple private methods to AccessTokenHandler for testing purposes Patch: https://issues.apache.org/jira/secure/attachment/12428771/HADOOP-6176-0_20.2.patch. Author: Kan Zhang Ref: CDH-648 commit 97b3aed79705201bfe0bea392ca19e3fc96cd81e Author: Todd Lipcon Date: Tue Dec 22 12:26:46 2009 -0800 HADOOP-5824. 
Remove unused OP_READ_METADATA functionality from Datanode Patch: https://issues.apache.org/jira/secure/attachment/12428759/HADOOP-5824-0_20.1.patch Author: Kan Zhang Ref: YDH commit 459d8d98b1ad0675b0e1525dfa23e445e1f82453 Author: Todd Lipcon Date: Tue Dec 22 00:16:39 2009 -0800 HADOOP-4359. Access Token: Support for data access authorization checking on DataNodes Patch: https://issues.apache.org/jira/secure/attachment/12428711/HADOOP-4359-0_20.2.patch Patch: https://issues.apache.org/jira/secure/attachment/12435352/4359.patch Author: Kan Zhang Ref: CDH-648 commit b76311584ce48bdc06c6c1103e45cbc2e2cc9112 Author: Todd Lipcon Date: Wed Dec 16 23:46:27 2009 +0530 MAPREDUCE-1100. Truncate user logs to prevent TaskTrackers' disks from filling up. Patch: https://issues.apache.org/jira/secure/attachment/12428200/MAPREDUCE-1100-20091216.2.txt Author: Vinod K V Ref: YDH commit 444beac7f8610cd3ec9433c8fe5e006462a2d07c Author: Todd Lipcon Date: Tue Dec 15 21:49:36 2009 -0800 HADOOP-6441. Prevent remote XSS attacks in Hostname and UTF-7. Patch: https://issues.apache.org/jira/secure/attachment/12428133/h-6441.20.patch Author: Owen O'Malley Ref: CDH-648 commit 18ec4a074b50ad5a7d8b3148da40d58ed0baf768 Author: Todd Lipcon Date: Tue Dec 15 20:37:18 2009 -0800 MAPREDUCE-1063. Document Gridmix benchmark Patch: https://issues.apache.org/jira/secure/attachment/12427976/M1063-y20-0.patch Author: Chris Douglas Ref: YDH commit 2b04facf5a226b592874137356929aca62320648 Author: Todd Lipcon Date: Tue Dec 15 20:19:19 2009 -0800 MAPREDUCE-1124. TestGridmixSubmission fails sometimes Patch: https://issues.apache.org/jira/secure/attachment/12427971/M1124-y20-1.patch Author: Chris Douglas Ref: YDH commit 4e30197fdddc64b48c0a3fb4575cdea4e5eaaf9b Author: Todd Lipcon Date: Tue Dec 15 18:52:33 2009 +0530 MAPREDUCE-1143. runningMapTasks counter is not properly decremented in case of failed Tasks. 
Patch: https://issues.apache.org/jira/secure/attachment/12427898/MAPRED-1143-ydist-9.patch Author: rahul k singh Ref: YDH commit e3a25294b2faf7d57fbf69b75060611362a35463 Author: Todd Lipcon Date: Tue Dec 15 15:41:45 2009 +0530 MAPREDUCE-676. Fix Hadoop Vaidya to ensure it works for map-only jobs. Patch: https://issues.apache.org/jira/secure/attachment/12410257/vaidya-patch-06092009.patch Author: Suhas Gogate Ref: YDH commit 600a35b4e22088859c7b12ece7fa67dbfe489c2b Author: Todd Lipcon Date: Tue Dec 15 15:38:41 2009 +0530 HADOOP-5582. Fix Hadoop Vaidya to use new Counters in org.apache.hadoop.mapreduce package. Contributed by Suhas Gogate. Patch: https://issues.apache.org/jira/secure/attachment/12407120/vaidya-0.21.0-5582-5764.patch Author: Suhas Gogate Ref: YDH commit 4ba755db0d9eae199355905193c424d1a8a78dae Author: Todd Lipcon Date: Mon Dec 14 19:45:50 2009 -0800 HDFS-595. FsPermission tests need to be updated for new octal configuration parameter from HADOOP-6234 Patch: https://issues.apache.org/jira/secure/attachment/12427977/HDFS-595-Y20.patch Author: Jakob Homan Ref: YDH commit 0dbd09e5e1a6f3eaa76ad7a54815d03707da5a27 Author: Todd Lipcon Date: Fri Dec 11 13:31:45 2009 +0530 MAPREDUCE-1171. Allow the read-error notification in shuffle to be configurable. Patch: https://issues.apache.org/jira/secure/attachment/12427571/patch-1171-1-ydist.txt Author: Amareshwari Sriramadasu Ref: YDH commit 29334f33eee06ef864f1fed490da870050e5c7ff Author: Todd Lipcon Date: Fri Dec 11 09:13:53 2009 +0530 MAPREDUCE-353. Allow shuffle read and connection timeouts to be configurable. Patch: https://issues.apache.org/jira/secure/attachment/12427566/patch-353-ydist.txt Author: Ravi Gummadi Ref: YDH commit a529046a05bfd89965d08b1ec6d80c1a777a8136 Author: Todd Lipcon Date: Wed Dec 9 10:23:53 2009 +0530 MAPREDUCE-754. 
NPE in expiry thread when a TT is lost Patch: https://issues.apache.org/jira/secure/attachment/12427347/mapreduce-754-v2.2.1-yahoo.patch Author: Amar Kamat Ref: YDH commit 96ee0a0a723e65ca3dbcbbdaece44e3a752256f0 Author: Todd Lipcon Date: Tue Dec 8 11:27:55 2009 +0530 MAPREDUCE-1185. URL to JT webconsole for running job and job history should be the same Patch: https://issues.apache.org/jira/secure/attachment/12426630/patch-1185-3-ydist.txt Author: Amareshwari Sriramadasu Ref: YDH commit 578be5dcfdece1f48aae8809648ae00f646bb040 Author: Todd Lipcon Date: Fri Dec 4 15:30:39 2009 -0800 HDFS-781. Metrics PendingDeletionBlocks is not decremented Patch: https://issues.apache.org/jira/secure/attachment/12426993/hdfs-781.rel20.1.patch. Author: Suresh Srinivas Ref: YDH commit b350799d72e4a4bec1f76527eb8fe02590295785 Author: Todd Lipcon Date: Mon Nov 30 16:10:53 2009 +0530 HADOOP-4933. ConcurrentModificationException in JobHistory.java Patch: http://issues.apache.org/jira/secure/attachment/12397116/HADOOP-4933-v1.1.patch Author: Amar Kamat Ref: YDH commit 9fa324d4ff152e41e2afada5990d26b0ba296e17 Author: Todd Lipcon Date: Fri Nov 27 11:38:52 2009 +0530 MAPREDUCE-1231. Allow distcp checksumming to be skipped for faster startup time Patch: https://issues.apache.org/jira/secure/attachment/12426265/mapred-1231-y20-v4.patch Author: Jothi Padmanabhan Ref: YDH commit becc6bade8b0d4ef4248cd82da7e7d337bc10cbc Author: Todd Lipcon Date: Tue Nov 24 11:42:17 2009 -0800 HDFS-758. Changes to report decommissioning status on namenode web UI. Patch: https://issues.apache.org/jira/secure/attachment/12426000/HDFS-758.5.0-20.patch Author: Jitendra Nath Pandey Ref: YDH commit 9d9e86678faa54e23c3ac41c1b8fdb6b379e9b5d Author: Todd Lipcon Date: Fri Nov 20 17:01:21 2009 -0800 HADOOP-6234. 
Permission configuration files should use octal and symbolic Patch: https://issues.apache.org/jira/secure/attachment/12425635/COMMON-6234.rel20.1.patch Author: Jakob Homan Ref: YDH commit c446d2df912c744705ecc72bf98f424973eb0817 Author: Todd Lipcon Date: Thu Nov 19 11:52:38 2009 -0800 MAPREDUCE-1219. Fixed JobTracker to not collect per-job metrics, thus easing load on it. Patch: https://issues.apache.org/jira/secure/attachment/12425302/patch-1219-ydist.txt Author: Amareshwari Sriramadasu Ref: YDH commit f6b78b61fda941b83973de1dceebf0549c9eaca9 Author: Todd Lipcon Date: Tue Nov 17 23:22:41 2009 +0000 HADOOP-6203. Improve error message when moving to trash fails due to quota issue Patch: https://issues.apache.org/jira/secure/attachment/12425243/c6203_20091116_0.20.patch Author: Boris Shkolnik Ref: YDH commit ec1f09b887a729a7682047248c624a42584d7233 Author: Todd Lipcon Date: Mon Nov 16 19:35:16 2009 +0000 HADOOP-5675. DistCp should not launch a job if it is not necessary Patch: https://issues.apache.org/jira/secure/attachment/12406687/5675_20090428.patch Author: Tsz Wo (Nicholas), SZE Ref: YDH commit f7e4f728e818e137066caaa1f0a277a5485a5080 Author: Todd Lipcon Date: Mon Nov 9 16:24:12 2009 -0800 MAPREDUCE-1196. MAPREDUCE-947 incompatibly changed FileOutputCommitter Patch: https://issues.apache.org/jira/secure/attachment/12424351/MAPREDUCE-1196_yhadoop20.patch Author: Arun C Murthy Ref: YDH commit 03a613af4dafb3212cf52833898f00c2b1f6195d Author: Todd Lipcon Date: Thu Nov 5 17:31:56 2009 -0800 HDFS-625. ListPathsServlet throws NullPointerException Patch: https://issues.apache.org/jira/secure/attachment/12424176/hdfs-625.0-20.patch Author: Suresh Srinivas Ref: YDH commit 6c2dc76b06cb9967d89e6ab94465e0668b921dfa Author: Todd Lipcon Date: Thu Nov 5 14:46:23 2009 -0800 HADOOP-6343. Stack trace of any runtime exceptions should be recorded in the server logs. 
Patch: https://issues.apache.org/jira/secure/attachment/12424150/HADOOP-6343.0-20.patch Author: Jitendra Nath Pandey Ref: YDH commit 3bd620eff0f312008d11e29dffea2fa62457a630 Author: Todd Lipcon Date: Thu Oct 29 19:11:25 2009 -0700 HADOOP-6344. rm and rmr fail to correctly move the user's files to the trash prior to deleting when they are over quota. Patch: https://issues.apache.org/jira/secure/attachment/12423634/HDFS-740-for-Y20.patch Author: Jakob Homan Ref: YDH commit ef18bb354fc9cd2b0f99bc141e094d328f4f1f14 Author: Todd Lipcon Date: Thu Oct 29 10:19:27 2009 +0530 MAPREDUCE-1160. Two log statements at INFO level fill up jobtracker logs Patch: https://issues.apache.org/jira/secure/attachment/12423534/MAPREDUCE-1160-20.patch Author: Ravi Gummadi Ref: YDH commit 93ea3b6b97dd23cc0a69eacd2438f11e8a64be54 Author: Todd Lipcon Date: Wed Oct 28 21:12:36 2009 +0530 MAPREDUCE-1158. running_maps metric is not decremented when the tasks of a job are killed/failed Patch: https://issues.apache.org/jira/secure/attachment/12423451/1158_yahoo.patch Author: Sharad Agarwal Ref: YDH commit 5165b65008a62a834903f46db18b991dfe2aeacf Author: Todd Lipcon Date: Mon Oct 26 09:21:56 2009 +0530 MAPREDUCE-1062. MRReliability test does not work with retired jobs Patch: https://issues.apache.org/jira/secure/attachment/12422201/mapreduce-1062-3-ydist.patch Author: Sreekanth Ramakrishnan Ref: YDH commit 6ae254e4aef44f833859bb060797bd3177085d4e Author: Todd Lipcon Date: Sun Oct 25 17:41:49 2009 +0530 MAPREDUCE-1090. Modify log statement in Tasktracker log related to memory monitoring to include attempt id. Patch: https://issues.apache.org/jira/secure/attachment/12423142/MAPREDUCE-1090-20.patch Author: Hemanth Yamijala Ref: YDH commit 07d06691d3d22dc7055568b8ca574ff264faf6ac Author: Todd Lipcon Date: Sun Oct 25 16:05:47 2009 +0530 MAPREDUCE-1048.
Show total slot usage in cluster summary on jobtracker webui Patch: http://issues.apache.org/jira/secure/attachment/12423136/MAPREDUCE-1048-20.patch Author: Amareshwari Sriramadasu Ref: YDH commit 2041bfbf1352a83f450b8fb6680e3899a3582f5f Author: Todd Lipcon Date: Sat Oct 24 17:42:59 2009 +0530 MAPREDUCE-1103. Additional JobTracker metrics for slot usage Patch: https://issues.apache.org/jira/secure/attachment/12423030/1103_v5_yahoo_1.patch Author: Sharad Agarwal Ref: YDH commit b68a6a3c45b48d7681f5e8dc51571b161a90daec Author: Todd Lipcon Date: Thu Oct 22 20:31:26 2009 +0530 MAPREDUCE-947. OutputCommitter should have an abortJob method Patch: https://issues.apache.org/jira/secure/attachment/12422899/mr-947-y20.patch Patch: https://issues.apache.org/jira/secure/attachment/12423191/yhadoop20-bug-fix-947.patch Author: Amar Kamat Ref: YDH commit a268e40988a356cb7d6912906b88c8752c226656 Author: Todd Lipcon Date: Wed Oct 21 23:06:57 2009 +0530 MAPREDUCE-1105. CapacityScheduler: It should be possible to set queue hard-limit beyond its actual capacity Patch: https://issues.apache.org/jira/secure/attachment/12422823/MAPREDUCE-1105-yahoo-version20-5.patch Author: rahul k singh Ref: YDH commit e529fcd5080e89bc0e759164d7b7cc6fc19d8f69 Author: Todd Lipcon Date: Wed Oct 21 11:32:59 2009 +0530 MAPREDUCE-1086. hadoop commands in streaming tasks are trying to write to tasktracker's log Patch: https://issues.apache.org/jira/secure/attachment/12422677/MR-1086-yhadoop20.patch Author: Ravi Gummadi Ref: YDH commit 4d2f9fdf63f30f0149f60142796838e245e7d564 Author: Todd Lipcon Date: Sun Oct 18 23:21:37 2009 -0700 MAPREDUCE-1088. JobHistory files should have narrower 0600 perms Patch: https://issues.apache.org/jira/secure/attachment/12422526/MAPREDUCE-1088_yhadoop20.patch Author: Arun C Murthy Ref: CDH-648 commit 13e93cafe8d4b1e8b741c1873118cdba0313a564 Author: Todd Lipcon Date: Sun Oct 18 23:19:27 2009 -0700 HADOOP-6304. 
Use java.io.File.set{Readable|Writable|Executable} where possible in RawLocalFileSystem Patch: https://issues.apache.org/jira/secure/attachment/12422525/HADOOP-6304_yhadoop20.patch Author: Arun C Murthy Ref: YDH commit e5b918e037e5a01a4098b43a20e3437b34022328 Author: Todd Lipcon Date: Thu Oct 15 16:26:26 2009 +0530 HADOOP-6284. Add new HADOOP_JAVA_PLATFORM_OPTS passed to the java PlatformName command Patch: http://issues.apache.org/jira/secure/attachment/12421342/HADOOP-6284-y0.20.1.patch Author: Koji Noguchi Ref: YDH commit 5b18d7b8a13873ca3b0cb3f5da074f3ee846e63c Author: Todd Lipcon Date: Thu Oct 15 16:05:41 2009 +0530 MAPREDUCE-732. Node health check script should not log "UNHEALTHY" status for every heartbeat in INFO mode Patch: http://issues.apache.org/jira/secure/attachment/12413001/MAPRED-732-ydist.patch Author: Sreekanth Ramakrishnan Ref: YDH commit f2f02dce3f12d9fe445f62c6a28a7e89c1f33efa Author: Todd Lipcon Date: Thu Oct 15 15:47:51 2009 +0530 MAPREDUCE-144. TaskMemoryManager should log process-tree's status while killing tasks. Patch: http://issues.apache.org/jira/secure/attachment/12418917/MAPREDUCE-144-20090907.internal.txt Author: Vinod K V Reason: This helps a lot in debugging why a particular task has gone beyond memory limits. Ref: YDH commit a77ecd569495efc6bee0059eeafeebbfe6c797c4 Author: Todd Lipcon Date: Thu Oct 15 11:37:22 2009 +0530 MAPREDUCE-277. Job history counters should be available on the UI. Patch: https://issues.apache.org/jira/secure/attachment/12421419/patch-277-0.20.txt Author: Jothi Padmanabhan Ref: YDH commit ce8f674e1c1dc6ff24b33210f726ef4b006552b2 Author: Todd Lipcon Date: Thu Oct 8 17:00:59 2009 -0700 HDFS-587. Test programs support only default queue. Patch: http://issues.apache.org/jira/secure/attachment/12422760/jira.HDFS-587.branch-0.20-internal.1.patch Author: Erik Steffl Ref: YDH commit 5ef23c2c9af26024567584fb6308645c58db8088 Author: Todd Lipcon Date: Mon Sep 28 14:28:46 2009 -0700 MAPREDUCE-270. 
Fix the tasktracker to optionally send an out-of-band heartbeat on task-completion for better job-latency. Configuration changes: add mapreduce.tasktracker.outofband.heartbeat Patch: https://issues.apache.org/jira/secure/attachment/12420718/MAPREDUCE-270_yhadoop20.patch Author: Arun C Murthy Reason: increase scheduling throughput for short tasks Ref: YDH commit bfa424ff3808e6dc20199ecc7d52f2592afdbd3a Author: Todd Lipcon Date: Mon Sep 28 13:59:32 2009 -0700 MAPREDUCE-1030. Fix capacity-scheduler to assign a map and a reduce task per-heartbeat. Patch: http://issues.apache.org/jira/secure/attachment/12420549/MAPREDUCE-1030-2.patch.txt Author: rahul k singh Ref: YDH commit 48bfdd9b2a6eac72ac42b0defe5e86501001a7ab Author: Todd Lipcon Date: Mon Sep 28 13:54:07 2009 -0700 MAPREDUCE-1028. Fixed number of slots occupied by cleanup tasks to one irrespective of slot size for the job. Patch: http://issues.apache.org/jira/secure/attachment/12420581/yhadoop-0.20-MR1028.patch Author: Ravi Gummadi Ref: YDH commit 3ce342baafd3774e4d920a7fcb49a7e091a0cad1 Author: Todd Lipcon Date: Mon Sep 28 13:36:31 2009 -0700 MAPREDUCE-964. Fixed start and finish times of TaskStatus to be consistent, thereby fixing inconsistencies in metering tasks. Patch: http://issues.apache.org/jira/secure/attachment/12420539/mapreduce-964-ydist.patch Patch: http://issues.apache.org/jira/secure/attachment/12420893/mapreduce-964-ydist-1.patch Author: Sreekanth Ramakrishnan Ref: YDH commit 2219e76392d0bf29d8c40bf2b60d23d7b188ac3d Author: Todd Lipcon Date: Thu Sep 24 16:08:33 2009 -0700 HADOOP-5976. Add a new command, classpath, to the hadoop script. Contributed by Owen O'Malley Patch: http://issues.apache.org/jira/secure/attachment/12420325/script.patch Author: Owen O'Malley and Gary Murry Ref: YDH commit 1e8994a568a45a994ef7c2af354ac1ddc2c1586b Author: Todd Lipcon Date: Thu Sep 24 16:07:40 2009 -0700 HADOOP-5784. Makes the number of heartbeats that should arrive a second at the JobTracker configurable. 
Patch: http://issues.apache.org/jira/secure/attachment/12420257/HADOOP-5784_yhadoop20.patch Author: Amareshwari Sriramadasu Reason: Improve job latency on small clusters Ref: YDH commit eac5f2a5d51414c8fea6ea9792e21cf85433d017 Author: Todd Lipcon Date: Thu Sep 24 16:06:26 2009 -0700 MAPREDUCE-945. Modifies MRBench and TestMapRed to use ToolRunner so that options such as queue name can be passed via command line. Patch: http://issues.apache.org/jira/secure/attachment/12418910/mapreduce-945-internal-3.8.patch.txt Author: Sreekanth Ramakrishnan Ref: YDH commit 073f548e560fd8de055d8d075ac7c5db0239f6cf Author: Todd Lipcon Date: Thu Sep 3 11:25:54 2009 -0700 HADOOP-6227. Configuration does not lock parameters marked final if they have no value. Patch: http://issues.apache.org/jira/secure/attachment/12418242/patch-6227-ydist.txt Author: Amareshwari Sriramadasu Ref: YDH commit fe36ce2d38b60a1fe1541555e172cc05473debec Author: Todd Lipcon Date: Thu Sep 3 10:54:21 2009 -0700 Amend HADOOP-5363. Removed pickOneAddress function. Author: zhiyong zhang Ref: YDH commit e37a57a386abb6f03336097f2d7b0d54d4ec6a82 Author: Todd Lipcon Date: Mon Aug 31 10:23:19 2009 -0700 HADOOP-5780. Fix slightly confusing log from "-metaSave" on NameNode. Patch: https://issues.apache.org/jira/secure/attachment/12417831/HADOOP-5780.hadoop-0.20.patch Author: Raghu Angadi Ref: YDH commit b131d77cefee39b7296530b018d59ca4d1516b01 Author: Todd Lipcon Date: Tue Aug 25 09:33:35 2009 -0700 Amend MAPREDUCE-768. Improved version of JobTracker configuration dump that also dumps job queues Author: V.V.Chaitanya Krishna Ref: YDH commit c602e3c58dab89470526d912f32ca05260a18e8c Author: Todd Lipcon Date: Tue Aug 18 09:16:07 2009 -0700 MAPREDUCE-682.
Reserved tasktrackers should be removed when a node is globally blacklisted Patch: http://issues.apache.org/jira/secure/attachment/12414313/mapreduce-682-ydist.patch Author: Sreekanth Ramakrishnan Ref: YDH commit 953a6498484ee51bf09691568ecb5e56cdb31034 Author: Todd Lipcon Date: Tue Aug 18 09:14:36 2009 -0700 HADOOP-5420. Support killing of process groups in LinuxTaskController binary Author: Sreekanth Ramakrishnan Ref: YDH commit e2a79393fa3a9f88029f289e89831c5dcbd7274c Author: Todd Lipcon Date: Tue Aug 18 09:12:41 2009 -0700 HADOOP-5488. HADOOP-2721 doesn't clean up descendant processes of a jvm that exits cleanly after running a task successfully Author: Ravi Gummadi Reason: Avoid zombie processes Ref: YDH commit 000ab92d9544f483b3c59c0c100154badf8fd8a6 Author: Todd Lipcon Date: Tue Aug 18 09:07:19 2009 -0700 MAPREDUCE-467. Collect information about number of tasks succeeded / total per time unit for a tasktracker. Author: Sharad Agarwal Reason: Useful operational feature Ref: YDH commit ef2406bed6475cd6665f3601e9d78972beed739f Author: Todd Lipcon Date: Thu Aug 13 09:35:35 2009 -0700 MAPREDUCE-817. Add a cache for retired jobs with minimal job info and provide a way to access history file url Author: Sharad Agarwal Reason: Reduces memory usage of JT for completed jobs Ref: YDH commit ff22ad890d9228399e36846f308dd42f96c49fde Author: Todd Lipcon Date: Thu Jul 30 17:40:49 2009 -0700 MAPREDUCE-809. Job summary logs from MAPREDUCE-740 show status of completed jobs as RUNNING Author: Arun C Murthy Reason: Bug fix for MAPREDUCE-740 Ref: YDH commit 9ed072be95517e09cbc78333abbc3d5129e2db7d Author: Todd Lipcon Date: Thu Jul 30 17:40:48 2009 -0700 MAPREDUCE-740. Log a job-summary at the end of a job, while allowing it to be configured to use a custom appender if desired. Author: Arun C Murthy Ref: YDH commit cdd93ee3bca2b400f7b193c5b6527705262c4769 Author: Todd Lipcon Date: Thu Jul 30 17:40:47 2009 -0700 MAPREDUCE-771. 
Setup and cleanup tasks remain in UNASSIGNED state for a long time on tasktrackers with long running high RAM tasks. Author: Hemanth Yamijala Reason: Bug fix Ref: YDH commit e53741132f4e458382899f5181e4c3a45a199113 Author: Todd Lipcon Date: Thu Jul 30 17:40:47 2009 -0700 MAPREDUCE-733. When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker heartbeat. Author: Arun C Murthy Reason: Bug fix Ref: YDH commit ce660087bdc95831ee5d2d18621bbdafb2c7e3fb Author: Todd Lipcon Date: Thu Jul 30 17:40:46 2009 -0700 MAPREDUCE-734. ConcurrentModificationException observed in unreserving slots for HiRam Jobs Author: Arun Murthy Ref: YDH commit a44f3f66cbc30bf5493aa6a3d21c3b6ca42fbac6 Author: Todd Lipcon Date: Thu Jul 30 17:40:44 2009 -0700 MAPREDUCE-693. Conf files not moved to "done" subdirectory after JT restart Author: Amar Kamat Reason: Improves stability of JobTracker job recovery Ref: YDH commit 9e729a1e4afd7f691dfd86f38cb89788e8eeee00 Author: Todd Lipcon Date: Thu Jul 30 17:40:44 2009 -0700 MAPREDUCE-722. More slots are getting reserved for HiRAM job tasks than required Author: Vinod K V Reason: More slots were getting reserved for HiRAM job tasks than required Ref: YDH commit 45605c6b29c206b9ed3ec2324f4f709c914ca1e3 Author: Todd Lipcon Date: Thu Jul 30 17:40:42 2009 -0700 MAPREDUCE-709. Node health check script does not display the correct message on timeout Author: Sreekanth Ramakrishnan Reason: Improve usefulness of health check feature Ref: YDH commit 5c24b7d50ba0960f694bce33332e61fe7c5abe68 Author: Todd Lipcon Date: Thu Jul 30 17:40:41 2009 -0700 MAPREDUCE-732. Removed spurious log statements in the node blacklisting logic. Author: Sreekanth Ramakrishnan Ref: YDH commit 6b1a17e13ddaf20b519eba0b49d4b0e8717bd5b9 Author: Todd Lipcon Date: Thu Jul 30 17:40:40 2009 -0700 MAPREDUCE-522.
Rewrite TestQueueCapacities to make it simpler and avoid timeout errors Author: Sreekanth Ramakrishnan Reason: Fix unit test failures Ref: YDH commit 73597dcbf6f791bd6e01c3096d41fe65ddc2034c Author: Todd Lipcon Date: Thu Jul 30 17:40:40 2009 -0700 MAPREDUCE-532. Allow admins of the Capacity Scheduler to set a hard-limit on the capacity of a queue Reason: There should be a mechanism to cap the capacity available for a queue/job. Author: Rahul K Singh Ref: YDH commit aea5743326793c6f5aa6dc7f7fc5baf5752528d9 Author: Todd Lipcon Date: Thu Jul 30 17:40:38 2009 -0700 MAPREDUCE-211. Provide a node health check script and run it periodically to check the node health status Reason: Adds ability to preemptively blacklist task-trackers when node health is bad Author: Sreekanth Ramakrishnan Ref: YDH commit a89847e2c69619eff9ced8b86c81bfab321a9918 Author: Todd Lipcon Date: Thu Jul 30 17:40:34 2009 -0700 MAPREDUCE-516. Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs Reason: When a HighRAMJob turns up at the head of the queue, the current implementation of support for HighRAMJobs in the Capacity Scheduler has a problem in that the scheduler stops assigning tasks to all TaskTrackers in the cluster until the HighRAMJob finds suitable TaskTrackers for all its tasks. Author: Arun C Murthy Ref: YDH commit 7a6862110776544476ac1066e3dbade4d1456567 Author: Todd Lipcon Date: Thu Jul 30 17:40:31 2009 -0700 HADOOP-5980. LD_LIBRARY_PATH not passed to tasks spawned off by LinuxTaskController Reason: Security Author: Sreekanth Ramakrishnan Ref: CDH-648 commit d37609510f33ad26bbe6bf3c3d235b34b804f93a Author: Todd Lipcon Date: Thu Jul 30 17:40:28 2009 -0700 HADOOP-5420. Support killing of process groups in LinuxTaskController binary Reason: Security - prevent orphaning forked child processes Author: Sreekanth Ramakrishnan Ref: CDH-648 commit 4c3c667f54a058d0f2e746ceb2e744f56dd9515a Author: Todd Lipcon Date: Thu Jul 30 17:40:24 2009 -0700 HADOOP-5801.
JobTracker should refresh the hosts list upon recovery Reason: YDH Author: Amar Kamat Ref: YDH commit 91d28f32f9db514661cc9bd755c8e85756c09cfc Author: Todd Lipcon Date: Thu Jul 30 17:40:26 2009 -0700 HADOOP-5818. Revert the renaming from checkSuperuserPrivilege to checkAccess by HADOOP-5643 Author: Amar Kamat Ref: YDH commit b46f960ff5488b6d6ace47e127257eb1b0fbc330 Author: Todd Lipcon Date: Thu Jul 30 17:40:23 2009 -0700 HADOOP-5643. Add ability to blacklist a TaskTracker Author: Amar Kamat Ref: YDH commit ebb508c5a286dc3939d960fbf44ca18b34f1c12f Author: Todd Lipcon Date: Thu Jul 30 17:40:22 2009 -0700 HADOOP-5419. Provide a way for users to find out what operations they can do on which M/R queues Reason: Security Author: Rahul K Singh Ref: CDH-648 commit feb0e489f3e9757db541ea1694fe49f902e93f8c Author: Todd Lipcon Date: Thu Jul 30 17:40:17 2009 -0700 HADOOP-5739 / MAPREDUCE-521. After JobTracker restart Capacity Scheduler does not schedule pending tasks from already running tasks. Reason: YDH Author: Rahul K Singh Ref: YDH commit 32bac3250a29cc47985fc88edadf0844d2519045 Author: Todd Lipcon Date: Thu Jul 30 17:40:17 2009 -0700 HADOOP-5396. Queue ACLs should be refreshed without requiring a restart of the Job Tracker Reason: Security Author: Vinod K V Ref: CDH-648 commit cd043f04714cf1a9940fe4351d0919011f8e9f86 Author: Todd Lipcon Date: Thu Jul 30 17:40:15 2009 -0700 HADOOP-4490. Tasks should run as the user who submitted the job Reason: Security Author: Hemanth Yamijala Ref: CDH-648 commit c64b6a0deb1311e410f01e5d94b9498795cbbaef Author: Todd Lipcon Date: Thu Jul 30 17:40:11 2009 -0700 HADOOP-4930. Implement setuid executable for Linux to launch tasks as job owners Reason: Security Author: Sreekanth Ramakrishnan Ref: CDH-648 commit 5b5972174da804fb6dcb4d0723208bfa42366a31 Author: Eli Collins Date: Tue Aug 24 16:18:50 2010 -0700 CLOUDERA-BUILD. Revert scribe log4j.
Ref: CDH-742 commit 2a37b553b8f446f03cb3610b2f7a84f54064f812 Author: Eli Collins Date: Tue Aug 24 14:01:25 2010 -0700 CLOUDERA-BUILD. Revert scribe log4j. Revert "CLOUDERA-BUILD. Apply Scribe patches to Hadoop" This reverts commit cb7a3677942c1d2f9e0d2a75dbffa09fa6125e61. Conflicts: src/contrib/scribe-log4j/ivy.xml Ref: CDH-742 commit ea2a876095da80eccebca35890f437307843eb2c Author: Eli Collins Date: Tue Aug 24 13:55:39 2010 -0700 CLOUDERA-BUILD. Revert scribe log4j. Revert "CLOUDERA-BUILD. Add dependency libraries for Scribe/log4j" This reverts commit aaeb69f8dda72a2e7aecacd622e99c00bc961efa. Ref: CDH-742 commit e463bba27fcae3ea83a8d33a64a8c1c38c2a7578 Author: Eli Collins Date: Tue Aug 24 13:41:13 2010 -0700 CLOUDERA-BUILD. Revert scribe log4j. Revert "CLOUDERA-BUILD. Fix scribe-log4j's ivy.xml to properly get log4j on the compile classpath" This reverts commit 349281bfa0243f5adbbd459266f4a9ac7ac8c1cc. Ref: CDH-742 commit c912024353450a0fa2c53a95500b4ed653f76129 Author: Eli Collins Date: Tue Aug 24 22:53:23 2010 -0700 MAPREDUCE-118. Job.getJobID() will always return null. Reason: Bug Author: Amareshwari Sriramadasu Ref: DISTRO-20 commit be7cd3b5cec66c22b58caa8053de4258826e7c08 Author: Eli Collins Date: Wed Aug 11 15:07:00 2010 -0700 CLOUDERA-BUILD. Update the default build version. commit 506dc096fcc4a288fc853dfb527d7fa8888dd6f6 Author: Bruno Mahé Date: Fri Jul 16 19:51:45 2010 -0700 CDH-1085. $SYSTEM_LIB_DIR default value shouldn't contain $PREFIX. Description: $SYSTEM_LIB_DIR default value shouldn't contain $PREFIX. $PREFIX will be prepended later on Reason: Bug Author: Bruno Mahe Ref: CDH-1085 commit b7cba5f7ab2cb9f2240b45dd90c34f4974c5757a Author: Bruno Mahé Date: Mon Jul 12 20:17:48 2010 -0700 CDH-1085. Native libraries should be installed in /usr/lib64/ on 64bit redhat Description: On 64bit redhat, native libraries should be installed in /usr/lib64/ instead of /usr/lib/. 
This patch makes it possible to override the destination of native libraries and will default to /usr/lib/. Reason: Bug Author: Bruno Mahe Ref: CDH-1085 commit 9b72d268a0b590b4fd7d13aca17c1c453f8bc957 Author: Eli Collins Date: Sun Jun 27 18:42:45 2010 -0700 CLOUDERA-BUILD. Make symlinks so old hadoop jar names are preserved (CDH-1543). commit 4c50269dda2038d202ddb890ffde38dc3fb2ead2 Author: Aaron Kimball Date: Thu Jun 24 18:25:09 2010 -0700 MAPREDUCE-1887. MRAsyncDiskService does not properly absolutize volume root paths. Description: In MRAsyncDiskService, volume names are sometimes specified as relative paths, which are not converted to absolute paths. This can cause errors of the form "cannot delete </full/path/to/foo> since it is outside of <relative/volume/root>" even though the actual path is inside the root. Reason: Bug Author: Aaron Kimball Ref: CDH-1509 commit 43ccf90369692c4d8b7d13a7f04b0864c55f615a Author: Todd Lipcon Date: Wed Jun 23 17:35:08 2010 -0700 HDFS-1266. Add Apache License Notice to several places where it was missing Description: Adds license headers to source code Reason: Apache policy Author: Todd Lipcon Ref: CDH-1495 commit bf08bde983501e3ce8ebf6197049262518580611 Author: Todd Lipcon Date: Wed Jun 23 16:14:50 2010 -0700 HDFS-1260. tryUpdateBlock should do validation before renaming meta file Description: Solves bug where block became inaccessible in certain failure conditions (particularly network partitions). Observed under HBase workload at user site. Reason: Potential loss of synced data when write pipeline fails Author: Todd Lipcon Ref: CDH-659 commit 7243001d5511922f293f0641cb8dbc0af4850dae Author: Todd Lipcon Date: Fri Jun 18 16:13:45 2010 -0700 HDFS-1254. Enable append feature by default Description: Changes dfs.support.append to "true" in hdfs-default.xml Reason: Append/sync have been tested in CDH3b2 and are safe to use.
Author: Dhruba Borthakur Ref: CDH-659 commit 0e1d71c08923bb4c4172ef043b0b2d82f95b92fa Author: Todd Lipcon Date: Sat Jun 19 16:26:39 2010 -0700 HDFS-1252. Updates to TestDFSConcurrentFileOperations (test was previously broken) Description: Fixes TestDFSConcurrentFileOperations to test the correct semantics for sync feature Reason: Test was previously flaky Author: Todd Lipcon Ref: CDH-659 commit 829497f4867a0e92da712faf02f83c7087df07ce Author: Eli Collins Date: Fri Jun 18 19:31:58 2010 -0700 CLOUDERA-BUILD. Remove Sqoop from the build. commit 298fda37c4c25434a15886ee9c261e566d595dff Author: Aaron Kimball Date: Fri Jun 18 18:42:37 2010 -0700 HADOOP-5203. TT's version build is too restrictive. Description: Use the md5sum checksum of the source for determining version compatibility. Reason: Improvement Author: Rick Cox (0.20 backport by Bill Au) Ref: CDH-1139 commit f07b2df591b91c7de50e8dbb526cf11b27a32a6f Author: Aaron Kimball Date: Fri Jun 18 17:58:53 2010 -0700 MAPREDUCE-679. XML-based metrics as JSP servlet for JobTracker Description: A simple XML translation of the existing JobTracker status page which provides the same metrics (including the tables of running/completed/failed jobs) as the human-readable page. This is a relatively lightweight addition to provide some machine-understandable metrics reporting. Reason: Improvement Author: Aaron Kimball Ref: CDH-651 commit d8dc8dad821a02619afdbfc3d1cb978b86cb071b Author: Aaron Kimball Date: Fri Jun 18 17:24:07 2010 -0700 MAPREDUCE-1372. ConcurrentModificationException in JobInProgress Description: Fixes a ConcurrentModificationException in JobInProgress Reason: Bug Author: Dick King Ref: CDH-546 commit e212ca0b0abbd78cdea4596fe9f3c6dbbaa57258 Author: Aaron Kimball Date: Fri Jun 18 16:20:01 2010 -0700 MAPREDUCE-1378. 
Args in job details links on jobhistory.jsp are not URL encoded Description: The logFile argument in the job links on the JT jobhistory.jsp page is not properly URL encoded, leading to links that result in 500 errors. Reason: Bug Author: Eric Sammer Ref: CDH-645 commit 23e68e669a118d34e265af5e8ffda3615c2666f9 Author: Aaron Kimball Date: Fri Jun 18 15:52:15 2010 -0700 MAPREDUCE-1570. Shuffle stage - Key and Group Comparators Description: Shuffle method in org.apache.hadoop.mrunit.MapReduceDriverBase doesn't currently allow the use of custom GroupingComparator and SortComparator. This patch adds these features. Reason: Improvement Author: Chris White Ref: CDH-958 commit 4601521a9793255e8b5881d64ff1a921451bc951 Author: Aaron Kimball Date: Fri Jun 18 15:48:41 2010 -0700 MAPREDUCE-739. Allow relative paths to be created inside archives. Description: Allow creating archives with relative paths with a -p option on the command line. Archives currently store the full path from the input sources – since it allows multiple sources and regular expressions as inputs. So the created archives have the full path of the input sources. This is unintuitive and a user hassle. We should get rid of it and allow users to say that the created archive should be relative to some absolute path and throw an exception if the input does not conform to the relative absolute path. Reason: Improvement Author: Mahadev konar Ref: CDH-501 commit 1d4e15f0f8b749981d62bfca9849e0d0493afdad Author: Todd Lipcon Date: Thu Jun 17 20:02:51 2010 -0700 HDFS-1247. Improvements to HDFS-1204 test Reason: Fixes compile warnings Author: Todd Lipcon Ref: CDH-659 commit 1fab52d87c29bc7117eb7324d1a152d8d889f62b Author: Todd Lipcon Date: Wed Jun 2 18:25:11 2010 -0700 HDFS-1246.
Manual tool to test sync on a cluster Description: Tool for automated testing that sync maintains every edit after kill -9 Reason: Cluster Testing of Sync support for CDH3 Author: Todd Lipcon Ref: CDH-659 commit b9259a145f516a01ba37a33b3803c88824fd55e5 Author: Todd Lipcon Date: Thu Jun 17 09:55:31 2010 -0700 HDFS-1240. Fix failing TestDFSShell due to HDFS-909 backport on branch-20 Reason: Fix red build Author: Todd Lipcon Ref: CDH-659 commit 7276208c2789f2c3961c6dc9fa1d2757774971b1 Author: Todd Lipcon Date: Wed Jun 16 12:16:25 2010 -0700 HDFS-1243. Replication tests in TestFileAppend4 should wait for a second for replication to occur Reason: Test error - fix sporadic failure of TestFileAppend4 Author: Todd Lipcon Ref: CDH-659 commit dc1797ec8380b07117bbc6d662e2f1f56b25e6bd Author: Todd Lipcon Date: Tue Jun 15 17:56:43 2010 -0700 HDFS-1207. stallReplicationWork should be marked volatile in FSNamesystem Description: Small bug fix for code used by tests only Reason: Fix sporadic failure of TestFileAppend4 Author: Todd Lipcon Ref: CDH-659 commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a Author: Todd Lipcon Date: Sun Jun 13 23:02:38 2010 -0700 HDFS-1197. Received blocks should not be added to block map prematurely for under construction files Description: Fixes a possible dataloss scenario when using append() on real-life clusters. Also augments unit tests to uncover similar bugs in the future by simulating latency when reporting blocks received by datanodes. Reason: Append support dataloss bug Author: Todd Lipcon Ref: CDH-659 commit 3cc1405289ac4ec6616a5ba9da18ff421a93678e Author: Todd Lipcon Date: Mon Jun 14 01:43:18 2010 -0700 HDFS-1209. Add parameter dfs.client.block.recovery.retries to determine how many times to try to recover block Reason: Used by append tests Author: Todd Lipcon Ref: CDH-659 commit 128395ae4d317204fe8fb118333270826adf96d5 Author: Todd Lipcon Date: Sun Jun 6 16:38:21 2010 -0400 HDFS-1118. 
DFSOutputStream socket leak when can't connect to DN Reason: Fixes DFS Client socket leaks in an error condition Author: Zheng Shao Ref: CDH-659 commit 4ba384d2b9f92f7300ce06b35a967e4edc3ba671 Author: Todd Lipcon Date: Fri Jun 4 15:10:00 2010 -0700 HADOOP-6762. Interrupting a thread performing an RPC should not hang that thread. Description: Moves the sending of parameters for RPC calls to a separate thread, such that interrupting a thread that is making an RPC call does not negatively affect the shared RPC channel. Reason: Fixes occasional hangs of HBase under heavy load during failure testing. Author: Sam Rash Ref: CDH-659, CDH-1084 commit 6e99c7e2a12eea782629337f5fb5734e8e5e5865 Author: Todd Lipcon Date: Wed Jun 2 22:32:45 2010 -0700 HDFS-1210. DFSClient should print IOE that caused recovery failure Description: Adds an extra WARN message during DFS client error recovery Reason: Makes it easier to debug/diagnose recovery issues Author: Todd Lipcon Ref: CDH-659 commit 1b8d8c3de261c8334d6eac4f5d3fd42cad894e81 Author: Todd Lipcon Date: Wed Jun 2 21:53:01 2010 -0700 HDFS-1186. Writers should be interrupted when recovery is started, not when it's completed. Description: When the write pipeline recovery process is initiated, this interrupts any concurrent writers to the block under recovery. This prevents a case where some edits may be lost if the writer has lost its lease but continues to write (eg due to a garbage collection pause) Reason: Fixes a potential dataloss bug Author: Todd Lipcon Ref: CDH-659 commit 2ec4301341b249acd0c0cac1792aaa6a6dabab8e Author: Todd Lipcon Date: Thu May 20 00:23:20 2010 -0700 HDFS-915. Write pipeline hangs for too long when ResponseProcessor hits timeout Description: Previously, the write pipeline would hang for the entire write timeout when it encountered a read timeout (eg due to a network connectivity issue). This patch interrupts the writing thread when a read error occurs. 
Reason: Faster recovery from pipeline failure for HBase and other interactive applications. Author: Todd Lipcon Ref: CDH-659 commit 641090318603c47bfd55e1eea2b039f37e5b723a Author: Todd Lipcon Date: Fri May 14 19:20:10 2010 -0700 HDFS-1218. Replicas that are recovered during DN startup should not be allowed to truncate better replicas. Description: If a datanode loses power and then recovers, its replicas may be truncated due to the recovery of the local FS journal. This patch ensures that a replica truncated by a power loss does not truncate the block on HDFS. Reason: Potential dataloss bug uncovered by power failure simulation Author: Todd Lipcon Ref: CDH-659 commit 46f2b3ad578ea1d2ee2cca4e6467ba2daa57df0e Author: Todd Lipcon Date: Fri May 14 19:34:09 2010 -0700 HDFS-445. pread should refetch block locations when necessary Description: The positional read API in DFSInputStream was previously missing any retry logic. This patch adds this logic. Reason: HBase and other applications depend on the pread API. Author: Kan Zhang Ref: CDH-659 commit aea067a20e16345f307de7efe80935dd7addbe6b Author: Todd Lipcon Date: Fri May 14 19:19:56 2010 -0700 HDFS-1204. LeaseManager expiring leases should only expire the single file, not entire lease Reason: Logic bug in lease recovery could cause incorrectly interrupted writers Author: Sam Rash Ref: CDH-659 commit 10e5944da20d851a847cb2ef422383507d070085 Author: Todd Lipcon Date: Thu May 13 16:33:15 2010 -0700 HDFS-1242. Add unit test for the appendFile race condition / synchronization bug fixed in HDFS-142 Reason: Test coverage for previously applied patch. Author: Todd Lipcon Ref: CDH-659 commit 18174a2abc5a91105ae1adc2bda026d90c41a60b Author: Todd Lipcon Date: Wed May 12 20:06:33 2010 -0700 HDFS-1202. 
Don't try to update block scan status if block scanner is not initialized yet Reason: Fixes NPE seen at DataNode startup Author: Todd Lipcon Ref: CDH-659 commit ca9e1b3c59b05de9dc4fafa19f24dca80110bcc0 Author: Todd Lipcon Date: Wed May 12 19:28:56 2010 -0700 HDFS-1205. Make async disk service threads nameable Description: HDFS-611 moved some datanode operations to a separate thread pool. This patch ensures that these worker threads have clear names. Reason: Aids debugging/diagnosing of issues Author: Todd Lipcon Ref: CDH-659 commit 1b8316d403ac542772c0745159a7397c798a5698 Author: Todd Lipcon Date: Tue May 11 16:47:47 2010 -0700 HDFS-606. Avoid ConcurrentModification in replica invalidation Description: Replica invalidation iterated over a collection that it also modified, causing a CME. This patch makes a copy before iteration. Performance should be unaffected as this is a rare code path. Reason: Avoid runtime exception in namenode Author: Konstantin Shvachko Ref: CDH-659 commit b7f908bc77d9344c36dcc409bbfe92709b98cf88 Author: Todd Lipcon Date: Thu May 6 08:52:18 2010 -0700 HDFS-1244. Misc improvements to TestFileAppend2 Description: Improvements made to a test case to enable it to be run from the command line, with the various test parameters available in arguments. Reason: Enable long-running stress tests of append functionality. Author: Todd Lipcon Ref: CDH-659 commit 370c9a1e75cc5d5e93cec066006ada0485139fb8 Author: Todd Lipcon Date: Tue Jun 15 18:48:58 2010 -0700 HDFS-1141. completeFile should check lease holder Description: Fixes a bug where a writer could finalize an in-progress file after it had already lost its lease. This could occur for example if the writer entered a GC pause after finishing the last block but before finalizing the file. Reason: Potential dataloss bug with append/sync Author: Todd Lipcon Ref: CDH-659 commit 7f0d67fa52b9c58360b06e851bf77bc2f909f65f Author: Todd Lipcon Date: Wed May 5 14:43:40 2010 -0700 HDFS-1215. 
Fix TestNodeCount to not infinite loop after HDFS-409 MiniCluster changes Description: Fixes a test to work properly after some test infrastructure was changed by HDFS-142 in branch-0.20-append. Reason: Fixes failing test. Author: Todd Lipcon Ref: CDH-659 commit 77ac4f46fb5c011b5ac7c5fedb4c51b31580c9ba Author: Todd Lipcon Date: Tue Jun 15 18:33:58 2010 -0700 HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch Description: Miscellaneous code cleanup and logging changes, including: - Slight cleanup to recoverFile() function in TestFileAppend4 - Improve error messages on OP_READ_BLOCK - Some comment cleanup in FSNamesystem - Remove toInodeUnderConstruction (was not used) - Add some checks for null blocks in FSNamesystem to avoid a possible NPE - Only log "inconsistent size" warnings at WARN level for non-under-construction blocks. - Redundant addStoredBlock calls are also not worthy of WARN level - Add some extra information to a warning in ReplicationTargetChooser Reason: Improves diagnosis of error cases and clarity of code Author: Todd Lipcon Ref: CDH-659 commit 46e6199d8819538d96c3f4c5dbbfba163382b2a9 Author: Todd Lipcon Date: Mon May 3 15:02:32 2010 -0700 HDFS-1122. Don't allow client verification to prematurely add inprogress blocks to DataBlockScanner Description: When a client reads a block that is also open for writing, it should not add it to the datanode block scanner. If it does, the block scanner can incorrectly mark the block as corrupt, causing data loss. Reason: Potential dataloss with concurrent writer-reader case. Author: Sam Rash Ref: CDH-659 commit 07711a4ea3edd1a504eb9bbb13c93d5573620d34 Author: Todd Lipcon Date: Mon May 3 12:04:49 2010 -0700 HDFS-1057. Fixes for concurrent readers behind an appended file Description: Allows a client to read a file while it is still being written by a writer, so long as the writer has called sync(). Reason: Used by HBase replication, and useful for other "tail"-like applications. 
Author: Sam Rash Ref: CDH-659 commit 587de668e43486f7109a885f617b9b757d7a649e Author: Todd Lipcon Date: Sat Apr 24 17:33:34 2010 -0700 HADOOP-6722. Workaround a TCP spec quirk by not allowing NetUtils.connect to connect to itself Description: TCP's ephemeral port assignment results in the possibility that a client can connect back to its own outgoing socket, resulting in failed RPCs or datanode transfers. Reason: Fixes intermittent errors in cluster testing with ephemeral IPC/transceiver ports on datanodes. Author: Todd Lipcon Ref: CDH-659 commit 7a93fcc8c22b7cff87221ec0a8bf8f6689f12b82 Author: Todd Lipcon Date: Thu Apr 22 10:24:59 2010 -0700 HDFS-1203. Add small sleep to prevent DN flooding NN in error cases Description: If the datanode experiences an error in sending its block reports to the name node, it previously would loop retrying with no delay between attempts. In the case that the DN is sending an invalid report, this will flood the NN with RPCs. This patch adds a short sleep before the retry. Reason: Prevents possible flood of RPCs to the NameNode in DN error conditions. Author: Todd Lipcon Ref: CDH-659 commit a30c033c1eed744948ddfddb82b81b06e12bba46 Author: Todd Lipcon Date: Fri Apr 16 15:19:08 2010 -0700 HDFS-561. Fix read timeouts in write pipeline to stage correctly Description: Previously, the read timeout on the write pipeline was incorrectly calculated. This caused the client to detect the wrong failed datanode when a datanode's network failed or froze for another reason. Reason: Fix recovery behavior for frozen datanodes Author: Kan Zhang Ref: CDH-659 commit 02ab12541a004d67a96428055a58a3b726c1c4b6 Author: Todd Lipcon Date: Thu Apr 15 01:04:43 2010 -0700 HDFS-895. Allow hflush/sync to operate in parallel with other writers Description: Modifies synchronization of the DFSOutputStream sync feature such that multiple threads can sync the same stream concurrently and each will wait only the minimal amount of time. 
Also allows further writes to continue past the sync point while the sync waits. Reason: Substantial performance improvement for durable HBase Author: Todd Lipcon Ref: CDH-659 commit d1c4359e1abc3f3e5e4fa16ee1c83a3d7f015da3 Author: Todd Lipcon Date: Wed Apr 14 14:59:39 2010 -0700 HDFS-1211. BlockReceiver logs too much at INFO level when using sync() Description: Reduces the log level from INFO to DEBUG for a common message in the datanode log when using the sync feature. Reason: Substantially reduces DN log chattiness for syncing clients. Author: Todd Lipcon Ref: CDH-659 commit 23cfa9e8263ad1d92814b5829e2f50bb37d57857 Author: todd Date: Sun Mar 21 16:25:48 2010 -0700 HDFS-1056. Fix possible multinode deadlocks during block recovery when using ephemeral dataxceiver ports Description: Fixes the logic by which datanodes identify local RPC targets during block recovery for the case when the datanode is configured with an ephemeral data transceiver port. Reason: Potential internode deadlock for clusters using ephemeral ports Author: Todd Lipcon Ref: CDH-659 commit 08cbce1e413e98d0aaeceeaca26a60c3d9a50b29 Author: todd Date: Sun Mar 21 14:56:56 2010 -0700 HDFS-611. Move block deletions to an async thread. Applying this to make the HDFS-142 patch apply cleanly Description: Moves the deletion of blocks in the datanode into a thread pool. Substantially improves datanode heartbeat consistency for workloads with heavy deletes and/or lots of disks. Reason: Substantially reduces frequency of "could not complete block" errors and needless re-replication on clusters with lots of disks or heavy deletes. Author: Zheng Shao Ref: CDH-659 commit 57783d0683f0d675423369e0a0f9f5dd520c17f2 Author: todd Date: Sun Mar 21 03:36:45 2010 -0700 HDFS-1055. Improve thread naming in DN Xceiver Description: Names the threads created by the DataNode based on the action they are performing. Reason: Eases diagnosis of datanode performance/lock contention issues. 
Author: Todd Lipcon Ref: CDH-659 commit fddb2bd057e88506a1bb94232426053d1640a34b Author: todd Date: Sun Mar 21 03:36:29 2010 -0700 HDFS-894. Fix ipcPort tracking in Datanode registration. TODO: add the test case from JIRA Description: Fixes the NameNode to properly reregister datanodes when they crash and restart with a different IPC port (eg when IPC port is configured to be ephemeral) Reason: Fixes errors on clusters with ephemeral ports. Author: Todd Lipcon Ref: CDH-659 commit bc5217543eccc2cfd8a182cdbb051b39d2abf3e7 Author: Dhruba Borthakur Date: Fri Jun 11 23:37:38 2010 +0000 HDFS-1054. remove sleep before retry for allocating a block. Description: When the write pipeline fails to allocate a new block, it previously slept for hard-coded 6 seconds before retrying. This sleep has little reasoning behind it, so is removed. Reason: Improve failure recovery performance for interactive applications like HBase. Author: Todd Lipcon Ref: CDH-931 commit 870c7526a3e6a632eb23cf14f9011f279181a759 Author: Dhruba Borthakur Date: Thu Jun 10 22:25:39 2010 +0000 HDFS-142. Blocks that are being written by a client are stored in the blocksBeingWritten directory. git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append@953482 13f79535-47bb-0310-9956-ffa450edef68 Description: Moves blocks being written by clients into a different directory in dfs.data.dir. Also fixes several other bugs in the datanode and namenode to support various error conditions related to append and sync. Reason: Necessary for proper recovery of synced data in several error conditions. Author: Dhruba Borthakur, Nicolas Spiegelberg, Todd Lipcon Ref: CDH-659 commit 8e888717294496caae825d7f3f609d0661e7997a Author: Dhruba Borthakur Date: Thu Jun 10 18:46:03 2010 +0000 HDFS-826. Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline. (dhruba) Description: Adds an API in DFSOutputStream to determine the current length of the write pipeline. 
Reason: Necessary for better reliability of HBase write-ahead logs. Author: Dhruba Borthakur Ref: CDH-931 commit 8fcb419648160efaed6fdd467875c3b1743d2bee Author: Dhruba Borthakur Date: Wed Jun 9 23:12:21 2010 +0000 HDFS-988. Fix bug where savenameSpace can corrupt edits log. Description: Fixes several synchronization errors in the NameNode and ensures that all edits have been synced to the edits log before the namespace is saved. Reason: Fixes potential data corruption bug. Author: Todd Lipcon Ref: CDH-1436 commit f5ace5f920bc16fd202a6e4a53fe0ffe0cb5045e Author: Todd Lipcon Date: Thu May 20 01:23:15 2010 -0700 HDFS-101. Datanodes should continue to forward acks until client stops pipeline. Description: When one node in the pipeline dies, the datanodes in between the client and the dead node should stay alive and continue to forward acks until the client stops the pipeline. This fixes an issue where the client would incorrectly determine that the local DN had failed when in fact another DN in the pipeline was at fault. Reason: Common source of failed pipeline recovery in cluster fault testing Author: Hairong Kuang, Todd Lipcon Ref: CDH-693 commit 132ef7c852847e9d2c1e7879f2fca26652bb77ef Author: Dhruba Borthakur Date: Fri Jun 4 07:20:10 2010 +0000 HDFS-200. Support append and sync for hadoop 0.20 branch. Description: Provides basic support for append and sync on 0.20 Reason: Append and sync required for durable HBase and many other applications. Author: Dhruba Borthakur Ref: CDH-659 commit 092bcd174dbf609f5002078490c357462e0ce8b1 Author: Konstantin Shvachko Date: Wed Apr 21 03:05:45 2010 +0000 HDFS-909. Fix race in edit log rolling Description: Fixes a race condition when rolling edit logs that can corrupt the logs. Reason: Potential namenode metadata corruption bug. Author: Todd Lipcon Ref: CDH-1174 commit e2a78f767d26b838bf67354a4b85235ddd731038 Author: Eli Collins Date: Fri Jun 18 14:41:14 2010 -0700 CLOUDERA-BUILD. 
Update hadoop-config.sh to reflect new jar version. commit 1756e97a35451bbc01a493e843f1ec0885c99792 Author: Aaron Kimball Date: Fri Jun 18 11:37:22 2010 -0700 MAPREDUCE-1644. Remove Sqoop from Apache Hadoop (moving to github) Description: Sqoop is moving to github! All code for sqoop is already live at http://github.com/cloudera/sqoop - this issue removes the duplicate code from the Apache Hadoop repository. CDH users should install the separate 'sqoop' package for this functionality. Reason: Moving to a separate package Author: Aaron Kimball Ref: CDH-1404 commit e0afb34b89a013419fca4bdcda5f2cf0401f93ca Author: Aaron Kimball Date: Thu Jun 17 19:06:50 2010 -0700 MAPREDUCE-1302. TrackerDistributedCacheManager can delete file asynchronously Description: With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from distributed cache asynchronously. That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow down the task initialization speed. Reason: Performance improvement Author: Zheng Shao Ref: CDH-495 commit 456821d6934fd769ab317c2290a4ff53b075269e Author: Aaron Kimball Date: Thu Jun 17 19:04:31 2010 -0700 HADOOP-6433. Add AsyncDiskService that is used in both hdfs and mapreduce Description: Create a thread pool per disk volume, and use that for scheduling async disk operations. Reason: Improvement Author: Zheng Shao Ref: CDH-495 commit 6e467c42d62aafd00fd2f38269806680427631c8 Author: Aaron Kimball Date: Thu Jun 17 18:50:47 2010 -0700 MAPREDUCE-1213. TaskTrackers restart is very slow because it deletes distributed cache directory synchronously Description: We are seeing that when we restart a tasktracker, it tries to recursively delete all the files in the distributed cache. It invokes FileUtil.fullyDelete() which is very very slow.
This means that the TaskTracker cannot join the cluster for an extended period of time (up to 2 hours for us). The problem is acute if the number of files in a distributed cache is a few thousand. Reason: Performance Author: Zheng Zhao Ref: CDH-495 commit 5626a0e301557dbc93ad5084aa9ef4527316db7b Author: Aaron Kimball Date: Thu Jun 17 18:45:58 2010 -0700 MAPREDUCE-1443. DBInputFormat can leak connections Description: The DBInputFormat creates a Connection to use when enumerating splits, but never closes it. This can leak connections to the database which are not cleaned up for a long time. Reason: bug Author: Aaron Kimball Ref: CDH-1435 commit 912eed1c5d50066e68700d2143b775914d7f8e54 Author: Aaron Kimball Date: Thu Jun 17 16:00:49 2010 -0700 MAPREDUCE-1489. DataDrivenDBInputFormat should not query the database when generating only one split Description: DataDrivenDBInputFormat runs a query to establish bounding values for each split it generates; but if it's going to generate only one split (mapreduce.job.maps == 1), then there's no reason to do this. This will remove overhead associated with a single-threaded import of a non-indexed table since it avoids a full table scan. Reason: Improvement Author: Aaron Kimball Ref: CDH-1431 commit 1c3fc82063212196fd2fac7f55df8eb323e8f601 Author: Aaron Kimball Date: Tue Apr 27 11:44:29 2010 -0700 MAPREDUCE-1728. Oracle timezone strings do not match Java Description: OracleDBRecordReader sets the session timezone based on the toString representation of the current java.util.TimeZone. This is incorrect; Oracle manages a separate database of acceptable timezone strings, whose string representations are different from the timezone representations recognized by Java. Reason: Bug Author: Aaron Kimball Ref: CDH-961 commit 11bc9be1ff2fd994046acd660afa7631f9203cfb Author: Eli Collins Date: Thu May 27 17:44:00 2010 -0700 HADOOP-6714. FsShell 'hadoop fs -text' does not support compression codecs.
Currently, 'hadoop fs -text myfile' looks at the first few magic bytes of a file to determine whether it is gzip compressed or a sequence file. This means 'fs -text' cannot properly decode .deflate or .bz2 files (or other codecs specified via configuration). Reason: Improvement Author: Eli Collins Ref: CDH-1136 commit e95781032b5d886aa6583cab1306025fe372babf Author: Eli Collins Date: Tue May 25 13:20:00 2010 -0700 HADOOP-1849. IPC server max queue size should be configurable. Description: Currently max queue size for IPC server is set to (100 * handlers). Usually when RPC failures are observed (e.g. HADOOP-1763), we increase the number of handlers and the problem goes away. I think a big part of such a fix is the increase in max queue size. I think we should make maxQsize per handler configurable (with a bigger default than 100). There are other improvements also (HADOOP-1841). Server keeps reading RPC requests from clients. When the number of in-flight RPCs is larger than maxQsize, the earliest RPCs are deleted. This is the main feedback Server has for the client. I have often heard from users that Hadoop doesn't handle bursty traffic. Say handler count is 10 (default) and Server can handle 1000 RPCs a sec (quite conservative/low for a typical server), it implies that an RPC can wait for only 1 sec before it is dropped. If there are 3000 clients and all of them send RPCs around the same time (not very rare, with heartbeats etc), 2000 will be dropped. Instead of dropping the earliest RPCs, if the server delays reading new RPCs, the feedback to clients would be much smoother. I will file another jira regarding queue management. For this jira I propose to make queue size per handler configurable, with a larger default (maybe 500). Reason: Improvement Author: Eli Collins Ref: CDH-1133 commit 776a20d37142534751178b060285d2813cc66c1c Author: Eli Collins Date: Tue May 25 13:09:30 2010 -0700 HADOOP-6724. IPC doesn't properly handle IOEs thrown by socket factory.
Description: If the socket factory throws an IOE inside setupIOStreams, then handleConnectionFailure will be called with socket still null, and thus generate an NPE on socket.close(). This ends up orphaning clients, etc. Reason: Bug fix Author: Eli Collins Ref: CDH-1132 commit 1864359f4ef32974ed41a1278e640e1ee246ef9b Author: Eli Collins Date: Tue May 25 13:05:38 2010 -0700 HADOOP-6723. Unchecked exceptions thrown in IPC connection should not orphan clients. Description: If the server sends back some malformed data, for example, receiveResponse() can end up with an incorrect call ID. Then, when it tries to find it in the calls map, it will end up with null and throw NPE in receiveResponse. This isn't caught anywhere, so the original IPC client ends up hanging forever instead of catching an exception. Another example is if the writable implementation itself throws an unchecked exception or OOME. We should catch Throwable in Connection.run() and shut down the connection if we catch one. Reason: Bug fix Author: Eli Collins Ref: CDH-1131 commit 95d64157f05d467dad3e1190a5cba2a3f89b0925 Author: Eli Collins Date: Thu May 20 17:15:13 2010 -0700 CLOUDERA-BUILD. Rename the fuse_dfs wrapper. Description: Rename the fuse_dfs wrapper to hadoop-fuse-dfs. Reason: Improvement Author: Alex Newman Ref: CDH-1103 commit d8c973d9c6f650032c88915d9fef6f4a568d37a5 Author: Chad Metcalf Date: Wed May 19 15:38:14 2010 -0700 CLOUDERA-BUILD. Fixes for the fuse_dfs wrapper. Description: The wrapper uses bash syntax (i.e., +=) so we should use bash. We need to modprobe fuse explicitly on Ubuntu. Since this is installed by install_hadoop.sh we know HADOOP_HOME and should use it directly. Lastly, there is more robust JAVA_HOME checking in hadoop-config.sh so we should use that. Reason: Fuse currently broken on Ubuntu Author: Chad Metcalf Ref: CDH-1089 commit e810911445859693ee0b868c2a5d8bc18360cdb9 Author: Eli Collins Date: Tue May 18 14:30:04 2010 -0700 HDFS-1161. 
Make DN minimum valid volumes configurable Description: This change adds a dfs.datanode.failed.volumes.tolerated parameter so that users can configure the number of volumes that are allowed to fail before a datanode stops offering service. By default, any volume failure will cause a datanode to shut down. Reason: Improvement Author: Eli Collins Ref: CDH-1081 commit baa77bdde4fd971877418391a4fe491c2d4c2501 Author: Eli Collins Date: Mon May 17 19:49:44 2010 -0700 HDFS-1160. Improve some FSDataset warnings and comments. Description: Cleans up HDFS-547 warnings. Reason: Improvement Author: Eli Collins Ref: CDH-1080 commit 90f5a4bf77d17adcabb834a3cc2e02becb9f012d Author: Eli Collins Date: Mon May 17 18:53:50 2010 -0700 HDFS-612. FSDataset should not use org.mortbay.log.Log. Description: Cleans up HDFS-547 logging. Reason: Improvement Author: Eli Collins Ref: CDH-1079 commit 4a925fe53a2015e504cd8c8796e0e590d22019c4 Author: Eli Collins Date: Thu Apr 22 14:41:08 2010 -0700 HDFS-457. Better handling of volume failure in Data Node storage. Description: The current implementation shuts the DataNode down completely when one of the configured volumes of the storage fails. This is rather wasteful behavior because it decreases utilization (good storage becomes unavailable) and imposes extra load on the system (replication of the blocks from the good volumes). These problems will become even more prominent when we move to mixed (heterogeneous) clusters with many more volumes per Data Node. Reason: Improvement Author: Eli Collins Ref: CDH-472 commit 3af9533ee6f260373f302ff4a16dd04eb75e0616 Author: Chad Metcalf Date: Mon Mar 1 15:28:19 2010 -0800 CLOUDERA-BUILD. hadoop-config runs before hadoop-env.sh conf/hadoop-env.sh says you can update JAVA_HOME there, but it gets sourced after hadoop-config.sh, which errors out if JAVA_HOME is not set. This patch changes the flow so hadoop-env is always sourced by hadoop-config after the --config flag is processed.
This will allow JAVA_HOME to be set in hadoop-env and still allow for trying to find a valid JAVA_HOME. commit c9295d4ac2848403362e5dbaa78aa7be4ce4254e Author: Eli Collins Date: Sat May 15 13:39:08 2010 -0700 HADOOP-3659. Fix hadoop native to compile on Mac OS X. Description: This patch makes the autoconf script work on Mac OS X. LZO needs to be installed (including the optional shared libraries) for the compile to succeed. You'll want to regenerate the configure script using autoconf after applying this patch. Reason: Bug fix Author: Eli Collins Ref: CDH-825 commit cc035175e1cf1ddef878cba6aa93725f832d0327 Author: Eli Collins Date: Sat May 15 12:55:06 2010 -0700 MAPREDUCE-1785. Add streaming config option for not emitting the key. Description: PipeMapper currently does not emit the key when using TextInputFormat. If you switch to another input format (e.g., LzoTextInputFormat), the key will be emitted. We should add an option so users can explicitly make streaming not emit the key, so they can change input formats without breaking or having to modify their existing programs. Reason: Improvement Author: Eli Collins Ref: CDH-856 commit 590a82c257842be51170619deafd15cc2988541e Author: Eli Collins Date: Thu May 13 21:25:53 2010 -0700 HADOOP-4885. Try to restore failed replicas of Name Node storage (at checkpoint time). Description: If one of the replicas of the NameNode storage fails for whatever reason (for example, a temporary failure of NFS), this Storage object is removed from the list of storage objects forever. It can be added back only on restart of the NameNode. We propose to check the status of a failed storage on every checkpoint and, if it has become valid, try to restore the edits and fsimage. Reason: Improvement Author: Eli Collins Ref: CDH-473 commit 0f2f19e1bd5725f6163998ae86d9103c0d552de3 Author: Eli Collins Date: Thu May 13 20:07:02 2010 -0700 HDFS-1024. SecondaryNamenode fails to checkpoint because namenode fails with CancelledKeyException.
Description: The secondary namenode fails to retrieve the entire fsimage from the Namenode. It fetches a part of the fsimage but believes that it has fetched the entire fsimage file and proceeds ahead with the checkpointing. Reason: Bug fix Author: Eli Collins Ref: CDH-891 commit 0ec1d6ed85a30327c657c2418932728d0e4e98df Author: Todd Lipcon Date: Wed May 12 21:33:45 2010 -0700 HADOOP-6254. Slow reads cause s3n to fail with SocketTimeoutException Reason: Bug fix for users of s3n:// file system Author: Andrew Hitchcock Ref: CDH-1035 commit d64943401780c3dd1dc498419f33ded8222c3210 Author: Eli Collins Date: Wed May 12 12:05:26 2010 -0700 HADOOP-6667. RPC.waitForProxy should retry through NoRouteToHostException. Description: RPC.waitForProxy already loops through ConnectExceptions, but NoRouteToHostException is not a subclass of ConnectException. In the case that the NN is on a VIP, the No Route To Host error is reasonably common during a failover, so we should retry through it just the same as the other connection errors. Reason: Improvement Author: Eli Collins Ref: CDH-907 commit a5fb4a8c8bf9d6a3a96c3a06eb3a46febaf21a0f Author: Todd Lipcon Date: Fri May 7 15:36:14 2010 -0700 MAPREDUCE-1375. TestFileArgs fails intermittently Description: Fixes an error in a test case without modifying code. This is an amendment to the prior fix which did not address the issue properly. Reason: Should fix flaky tests. Author: Todd Lipcon Ref: CDH-657 commit 148d291aa14a4481dc206d2fc9a8527eb6761488 Author: newalex Date: Fri Apr 16 15:48:14 2010 -0700 CLOUDERA-BUILD. Add a fuse manpage Description: Adding a fuse_dfs manpage and adding a manpage to the build. Reason: New Feature Author: Alex Newman Ref: CDH-927 commit 9acfd39492f85c92bc45d47d6dcfb309e3826c64 Author: newalex Date: Thu Apr 8 10:35:19 2010 -0700 CLOUDERA-BUILD. 
Build script changes to build DEB packages Description: The required changes to the cloudera hadoop building scripts for pulling the fuse files out and cleaning up its mess v.v. DEBs. Reason: Building packages Author: Alex Newman Ref: CDH-929 commit d144085817496eecc57c510022d66d0540b4511d Author: newalex Date: Tue Apr 6 14:05:29 2010 -0700 CLOUDERA-BUILD. Added an RPM for fuse Description: The required changes to the cloudera hadoop building scripts for pulling the fuse files out and cleaning up its mess. Reason: Building packages Author: Alex Newman Ref: CDH-928 commit 56648efe291503249fec22a242917ec4dddc6214 Author: Eli Collins Date: Tue Mar 30 15:17:50 2010 -0700 HADOOP-6522. Fix decoding of codepoint zero in UTF8. Description: TestUTF8 is actually flaky. It generates 10 random strings to run the test on. If you change this number to 100000 it fails every time. The problem is that the null character (codepoint zero) was correctly encoded but incorrectly decoded. I've attached a patch that fixes this and increases the size of the tests so that problems like this will likely be discovered sooner. Reason: Bugfix to UTF8 Author: Eli Collins Ref: CDH-718 commit 936a67ba3b34dc8c8efd3df92d9e50309fafb8f6 Author: Aaron Kimball Date: Mon Mar 29 23:50:14 2010 -0700 MAPREDUCE-1460. Oracle support in DataDrivenDBInputFormat Description: DataDrivenDBInputFormat does not work with Oracle due to various SQL syntax issues. Reason: Required for Sqoop/Oracle integration Author: Aaron Kimball Ref: CDH-888 commit c08f94a6927f9c8b0dfaeb674835afdd3fdd1d08 Author: Aaron Kimball Date: Mon Mar 29 17:15:53 2010 -0700 MAPREDUCE-1569. 
Mock Contexts & Configurations Description: Currently the library creates a new Configuration object in the MockMapContext and MockReduceContext constructors, rather than allowing the developer to configure and pass their own. Reason: Feature improvement for MRUnit Author: Chris White Ref: CDH-838 commit 27cfda1de80048bf2b46d74d78b61275ecc79be1 Author: Aaron Kimball Date: Mon Mar 29 16:43:49 2010 -0700 MAPREDUCE-1536. DataDrivenDBInputFormat does not split date columns correctly. Description: The DateSplitter does not properly split a range of (min, max) dates. Reason: Bugfix to DateSplitter Author: Aaron Kimball Ref: CDH-813 commit 7fc6e48e296c30f0afa8ae8da668bddbc9f422bf Author: Aaron Kimball Date: Mon Mar 29 16:11:22 2010 -0700 MAPREDUCE-1480. CombineFileRecordReader does not properly initialize child RecordReader Description: CombineFileRecordReader instantiates child RecordReader instances but never calls their initialize() method to give them the proper TaskAttemptContext. Reason: Bug in CombineFileInputFormat prevents proper use. Author: Aaron Kimball Ref: CDH-811 commit 32330fbadb4aed16627397979b90d52f2474ef38 Author: Aaron Kimball Date: Mon Mar 29 15:50:20 2010 -0700 MAPREDUCE-1423. Improve performance of CombineFileInputFormat when multiple pools are configured Description: I have a map-reduce job that is using CombineFileInputFormat. It has configured 10000 pools and 30000 files. Creating the splits takes more than an hour. The reason is that CombineFileInputFormat.getSplits() converts the same path from String to Path object multiple times, once for each pool instance. Similarly, it calls Path.toUri() multiple times. This code can be optimized. Reason: Improves CombineFileInputFormat performance (used by Sqoop); needed to apply MAPREDUCE-1480 cleanly Author: Dhruba Borthakur Ref: CDH-811 commit 6906389e07244931a108f2930544b9feada3a487 Author: Aaron Kimball Date: Mon Mar 29 15:41:38 2010 -0700 MAPREDUCE-364.
Change org.apache.hadoop.examples.MultiFileWordCount to use new mapreduce api. Description: Updates MultiFileWordCount example to use the new API in org.apache.hadoop.mapreduce instead of the deprecated API of org.apache.hadoop.mapred. This incorporates MAPREDUCE-367: Change org.apache.hadoop.mapred.lib.CombineFileInputFormat to use the new api. This solves duplicate issue MAPREDUCE-1112: Fix CombineFileInputFormat for hadoop 0.20 Reason: CombineFileInputFormat required for many clients of the new API, including Sqoop. Author: Amareshwari Sriramadasu Ref: CDH-811 commit 4b592cf8cb44c018f86abe529d71434d5106ce1e Author: Aaron Kimball Date: Mon Mar 29 13:07:15 2010 -0700 HADOOP-6382. Publish hadoop jars to apache mvn repo. Description: This provides an 'ant mvn-install' command that will install Hadoop core, streaming, examples, etc. jars in a maven repository. Uses the maven ant task to publish hadoop 20 jars to the apache maven repo. Reason: Required for cross-distribution dependency management in downstream projects (e.g., sqoop) Author: Giridharan Kesavan Ref: CDH-402 commit 8424e32eb866d677f40a9446f9c4cf74972b751e Author: Chad Metcalf Date: Thu Mar 18 17:05:47 2010 -0700 HADOOP-6643. Set executable bit for python cloud scripts in the distribution Description: This needs to be set in the tar target. Reason: Required for the EC2 scripts. Author: Tom White Ref: CDH-821 commit cfc3233ece0769b11af9add328261295aaf4d1ad Author: Aaron Kimball Date: Fri Mar 12 17:56:30 2010 -0800 CLOUDERA-BUILD. Fix ivy xml after rebase. Removed a redundant closing tag. Author: Matt Massie commit 54e1aefdd7a25a539831cac2c9b1bc3597f119ea Author: Aaron Kimball Date: Fri Mar 12 17:56:07 2010 -0800 CLOUDERA-BUILD. 
Small tweaks and fixes to Cloudera styling: Description: - Fixes trivial CSS bug for missing table cell borders in Chrome - Fixes footer to read "Distribution for Hadoop" instead of "Distribution of Hadoop" Author: Todd Lipcon commit ea83036b3838fa97c673e73145d52867b8ace6ac Author: Aaron Kimball Date: Fri Mar 12 17:55:30 2010 -0800 HDFS-1013. Miscellaneous improvements to HTML markup for web UIs Description: The Web UIs have various bits of bad markup (eg missing <head> sections, some pages missing CSS links, inconsistent td vs th for table headings). We should fix this up.
Improve markup and add Cloudera styling to Web UIs This adds a favicon and a number of HTML/CSS improvements to make the pages more space-efficient and easy on the eyes. This may be an incompatible change for users who are scraping the HTML output of the web UIs. Those users are encouraged to access the data programmatically rather than through scraping. The non-Cloudera-specific improvements will be contributed upstream as HDFS-1013 and MAPREDUCE-1544. Reason: User experience improvement Author: Todd Lipcon Ref: UNKNOWN commit 90ba5543e4c3176343e23943131a34d666c23d89 Author: Aaron Kimball Date: Fri Mar 12 17:54:58 2010 -0800 MAPREDUCE-1436. Deadlock in preemption code in fair scheduler Description: In testing the fair scheduler with preemption, I found a deadlock between updatePreemptionVariables and some code in the JobTracker. This was found while testing a backport of the fair scheduler to Hadoop 0.20, but it looks like it could also happen in trunk and 0.21. Details are in a comment below.
The fair scheduler introduces a potential jobtracker deadlock which was fixed on trunk by MAPREDUCE-870. This patch adjusts the locking in 0.20-based MapReduce to prevent this condition. Reason: bugfix (deadlock) Author: Matei Zaharia Ref: UNKNOWN commit 6f04e94feee3f40a73449cc6fbe7b4e3c48f1fc4 Author: Aaron Kimball Date: Fri Mar 12 17:54:13 2010 -0800 HDFS-696. Java assertion failures triggered by tests Description: Re-purposing as catch-all ticket for assertion failures when running tests with java asserts enabled. Running with the attached patch on trunk@823732 the following tests all trigger assertion failures:

TestAccessTokenWithDFS
TestInterDatanodeProtocol
TestBackupNode
TestBlockUnderConstruction
TestCheckpoint
TestNameEditsConfigs
TestStartup
TestStorageRestore


Disable failing asserts (see HDFS-696). Disabled asserts in HDFS that cause unit tests to fail. These will be re-enabled at a later date when the underlying cause is fixed upstream. In the meantime, these are disabled to keep our CI server returning only new failures. Issue HDFS-696 lists the failing tests and tracks their progress. Reason: Test harness improvement Author: Eli Collins Ref: UNKNOWN commit 74b80b9c9490bba1a1120f3a9376d2f21f3763b6 Author: Aaron Kimball Date: Fri Mar 12 17:53:38 2010 -0800 MAPREDUCE-1093. Java assertion failures triggered by tests Description: Removes failing asserts from the CDH build until they are fixed in trunk. Tracking MAPREDUCE-1506 to include a fix for this assertion failure. Reason: Test harness improvement Author: Aaron Kimball Ref: UNKNOWN commit b4be440cd928976544bcbeb7e10566fc523dbd0c Author: Aaron Kimball Date: Fri Mar 12 17:53:13 2010 -0800 MAPREDUCE-1092. Enable asserts for tests by default Description: See HADOOP-6309. Let's make the tests run with java asserts by default. Reason: Test coverage improvement Author: Eli Collins Ref: UNKNOWN commit 5e7fb9843f99f5e1023f2723210f26ac0c33323b Author: Aaron Kimball Date: Fri Mar 12 17:52:45 2010 -0800 MAPREDUCE-1375. TestFileArgs fails intermittently Description: TestFileArgs failed once for me with the following error
expected:<[job.jar
    sidefile
    tmp
    ]> but was:<[]>
            at org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
            at org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
This test was flaky because it tried to write some data into /bin/ls. Depending on the speed of the test run, this sometimes resulted in a Broken Pipe on flush(), which caused the test to fail. Reason: Bugfix (race condition in test) Author: Todd Lipcon Ref: UNKNOWN commit ae699cda01c093097ae723224553773247577aa2 Author: Aaron Kimball Date: Fri Mar 12 17:52:32 2010 -0800 HDFS-961. dfs_readdir incorrectly parses paths Description: fuse-dfs dfs_readdir assumes that DistributedFileSystem#listStatus returns Paths with the same scheme/authority as the dfs.name.dir used to connect. If NameNode.DEFAULT_PORT is used, listStatus returns Paths that have authorities without the port (see HDFS-960), which breaks the following code.
// hack city: todo fix the below to something nicer and more maintainable but
// with good performance
// strip off the path but be careful if the path is solely '/'
// NOTE - this API started returning filenames as full dfs uris
const char *const str = info[i].mName + dfs->dfs_uri_len + path_len + ((path_len == 1 && *path == '/') ? 0 : 1);

Let's make the path parsing here more robust. listStatus returns normalized paths so we can find the start of the path by searching for the 3rd slash. A more long term solution is to have hdfsFileInfo maintain a path object or at least pointers to the relevant URI components.
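A minimal sketch of the "third slash" rule, written in Java for illustration only (fuse-dfs itself is C, and the class and method names here are hypothetical). It assumes its input is a normalized URI of the form scheme://authority/path, as listStatus is said to return, so the path is located the same way whether or not the authority carries a port:

```java
/**
 * Hypothetical illustration of the HDFS-961 parsing rule: in a normalized
 * URI such as "hdfs://nn:8020/user/foo", the path starts at the third '/'
 * (two come from "//", the third begins the path), whether or not the
 * authority includes a port.
 */
final class PathFromUri {

    static String pathOf(String normalizedUri) {
        int slashes = 0;
        for (int i = 0; i < normalizedUri.length(); i++) {
            if (normalizedUri.charAt(i) == '/' && ++slashes == 3) {
                return normalizedUri.substring(i);
            }
        }
        // Assumption for this sketch: a URI with no path component
        // (e.g. "hdfs://nn") is treated as the root directory.
        return "/";
    }

    public static void main(String[] args) {
        // The same path is recovered with or without an explicit port.
        System.out.println(pathOf("hdfs://nn:8020/user/foo/part-00000")); // /user/foo/part-00000
        System.out.println(pathOf("hdfs://nn/user/foo/part-00000"));      // /user/foo/part-00000
        System.out.println(pathOf("hdfs://nn:8020/"));                    // /
    }
}
```

Because the scan never relies on the length of the URI the client connected with, a missing or extra ":port" in the returned authority no longer shifts the result.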

Reason: bugfix Author: Eli Collins Ref: UNKNOWN commit 7f9f42b27b109eff6fafc6ee24526fcadaf68d69 Author: Aaron Kimball Date: Fri Mar 12 17:52:23 2010 -0800 MAPREDUCE-1467. Add a --verbose flag to Sqoop Description: Need a --verbose flag that sets the log4j level to DEBUG. Reason: Logging improvement Author: Aaron Kimball Ref: UNKNOWN commit db680058f5796fc41d61242d60bc86b1b25facf9 Author: Aaron Kimball Date: Fri Mar 12 17:52:07 2010 -0800 MAPREDUCE-1469. Sqoop should disable speculative execution in export Description: Concurrent writers of the same output shard may cause the database to try to insert duplicate primary keys concurrently. Not a good situation. Speculative execution should be forced off for this operation. Reason: Bugfix (race condition) Author: Aaron Kimball Ref: UNKNOWN commit a5ccc56a79fc53de5ff16c6cb996f41a4216c28d Author: Aaron Kimball Date: Fri Mar 12 17:51:29 2010 -0800 MAPREDUCE-1341. Sqoop should have an option to create hive tables and skip the table import step Description: In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter:

--hive-create-only

which would omit the time-consuming table import step, generate Hive CREATE TABLE statements, and run them.

Also adds a --hive-overwrite flag, which allows overwriting an existing table definition. Reason: New feature Author: Leonid Furman Ref: UNKNOWN commit bdf576aa69eeb56a954416f7c2fcbe0136f421bd Author: Aaron Kimball Date: Fri Mar 12 17:51:16 2010 -0800 HADOOP-4012. Providing splitting support for bzip2 compressed files Description: Hadoop assumes that compressed input data cannot be split (mainly because many codecs need the whole input stream to decompress successfully). In such a case, Hadoop prepares only one split per compressed file, with the lower split limit at 0 and the upper limit at the end of the file. As a consequence, each compressed file goes to a single mapper. This circumvents the codec limitation mentioned above, but it substantially reduces the parallelism that splitting would otherwise allow.

BZip2 is a compression/decompression algorithm that compresses data in blocks, and these compressed blocks can later be decompressed independently of each other. This creates an opportunity: instead of one BZip2-compressed file going to one mapper, chunks of the file can be processed in parallel. The correctness criterion for such processing is that, for a bzip2-compressed file, each compressed block should be processed by exactly one mapper and all the blocks of the file should ultimately be processed. (By processing we mean the actual use, in a mapper, of the uncompressed data coming out of the codec.)

We are writing the code to implement this functionality. Although we have used bzip2 as an example, we have tried to extend Hadoop's compression interfaces so that any other codec with the same capability as bzip2 could easily use the splitting support. The details of these changes will be posted when we submit the code.
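The correctness criterion above can be sketched as an ownership rule: a split [start, end) processes exactly the compressed blocks that begin inside it. This is a hypothetical illustration, not the HADOOP-4012 code; it assumes the block start offsets are already known, whereas a real reader has to find them by scanning the stream for the bzip2 block marker.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the splitting criterion for a block-compressed file:
 * a split [start, end) owns exactly those compressed blocks whose start
 * offset falls inside it.
 */
final class BlockOwnership {

    static List<Long> blocksOwnedBy(long[] blockStarts, long start, long end) {
        List<Long> owned = new ArrayList<>();
        for (long offset : blockStarts) {
            if (offset >= start && offset < end) {
                owned.add(offset);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Made-up block start offsets and split size, for illustration only.
        long[] blockStarts = {0L, 900L, 1800L, 2700L};
        long fileLength = 3000L;
        long splitSize = 1000L;
        for (long s = 0; s < fileLength; s += splitSize) {
            long e = Math.min(s + splitSize, fileLength);
            // Prints [0, 1000) -> [0, 900], [1000, 2000) -> [1800], [2000, 3000) -> [2700]
            System.out.println("[" + s + ", " + e + ") -> " + blocksOwnedBy(blockStarts, s, e));
        }
    }
}
```

Because the splits partition the file, every block start falls into exactly one split. A split's reader may have to read past its end to finish a block it owns (the block starting at 900 above), and it must skip any block that began before its start.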

Reason: New feature Author: Abdul Qadeer Ref: UNKNOWN commit 8e47288583fcdbdf649ddf3486bf201788e79202 Author: Aaron Kimball Date: Fri Mar 12 17:50:51 2010 -0800 MAPREDUCE-707. Provide a jobconf property for explicitly assigning a job to a pool Description: A common use case of the fair scheduler is to have one pool per user, but then to define some special pools for various production jobs, import jobs, etc. Therefore, it would be nice if jobs went by default to the pool of the user who submitted them, but there was a setting to explicitly place a job in another pool. Today, this can be achieved through a sort of trick in the JobConf:
<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>pool.name</value>
</property>

<property>
<name>pool.name</name>
<value>${user.name}</value>
</property>

This JIRA proposes to add a property called mapred.fairscheduler.pool that allows a job to be placed directly into a pool, avoiding the need for this trick.
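Once the proposed property exists, resolving a job's pool reduces to a short precedence check. The sketch below is hypothetical: a plain Map stands in for the JobConf (so no ${user.name} expansion happens), and the defaults "user.name" for poolnameproperty and "default" for the pool name are assumptions, not taken from this log.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of pool resolution once an explicit
 * "mapred.fairscheduler.pool" property is available. A plain Map stands in
 * for the JobConf; unlike Configuration, it does no ${...} expansion.
 */
final class PoolResolver {
    static final String EXPLICIT_POOL = "mapred.fairscheduler.pool";
    static final String POOL_NAME_PROPERTY = "mapred.fairscheduler.poolnameproperty";
    // Assumed defaults for this sketch.
    static final String DEFAULT_NAME_PROPERTY = "user.name";
    static final String DEFAULT_POOL = "default";

    static String poolFor(Map<String, String> jobConf) {
        // 1. An explicitly assigned pool wins.
        String explicit = jobConf.get(EXPLICIT_POOL);
        if (explicit != null && !explicit.isEmpty()) {
            return explicit;
        }
        // 2. Otherwise use the value of whichever property names the pool
        //    (by default, the submitting user's name).
        String nameProperty = jobConf.getOrDefault(POOL_NAME_PROPERTY, DEFAULT_NAME_PROPERTY);
        return jobConf.getOrDefault(nameProperty, DEFAULT_POOL);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("user.name", "alice");
        System.out.println(poolFor(conf)); // alice

        conf.put(EXPLICIT_POOL, "production");
        System.out.println(poolFor(conf)); // production
    }
}
```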

Reason: Configuration improvement Author: Alan Heirich Ref: UNKNOWN commit 96e17e1e593b818a888c8dfc177b8fb36e514e8f Author: Aaron Kimball Date: Fri Mar 12 17:50:18 2010 -0800 MAPREDUCE-967. (version 2) TaskTracker does not need to fully unjar job jars Description: This is a performance improvement for jobs that contain a large number of classes. The unpacking of these jars consumes a large amount of time, as does the resulting cleanup. This patch changes the classpath to simply include the jar itself, and only unpacks the lib/ directory out of the jar in order to add those dependencies to the classpath. Users who previously depended on this functionality for shipping non-code dependencies can use the undocumented configuration parameter "mapreduce.job.jar.unpack.pattern" to cause specific jar contents to be unpacked. This new patch version fixes a streaming regression where the "-file" argument no longer worked. It includes a new unit test, TestFileArgs, to protect against this regression. Author: Todd Lipcon Ref: UNKNOWN commit cf08a128b87bbfae90babd61795599b3645d37a3 Author: Aaron Kimball Date: Fri Mar 12 17:48:40 2010 -0800 HDFS-455, MAPREDUCE-1441, HADOOP-6534. Allow spaces in between comma-separated elements in directory list configurations. Description: Make NN and DN handle comma-separated configuration strings in an intuitive way. The following configuration causes problems:
<property>
<name>dfs.data.dir</name>
<value>/mnt/hstore2/hdfs, /home/foo/dfs</value>
</property>

The problem is that the space after the comma causes the second storage directory to be " /home/foo/dfs", which resolves to a directory named <SPACE> containing a sub-directory named "home" inside the Hadoop datanode's default directory. This will typically cause the user's home partition to fill, and the cause will be very hard for the user to track down, since a directory whose name is whitespace is hard to spot.

(ripped from HADOOP-2366)


This fixes any configuration consisting of a comma-separated list of directories (e.g., dfs.data.dir, dfs.name.dir, fs.checkpoint.dir, mapred.local.dir, etc.) so that the elements may also be separated by whitespace. Without this patch, setting mapred.local.dir to "/disk1, /disk2" would create a directory named " " in the user's home directory, or fail outright. The patch trims the directory names as they are fetched from the configuration. Reason: Configuration improvement Author: Todd Lipcon Ref: UNKNOWN commit 65a04ab8197a8db21a97d279ca881b5cd45a5365 Author: Aaron Kimball Date: Fri Mar 12 17:48:03 2010 -0800 HADOOP-2366. Space in the value for dfs.data.dir can cause great problems Description: The following configuration causes problems:

<property>
<name>dfs.data.dir</name>
<value>/mnt/hstore2/hdfs, /home/foo/dfs</value>
<description>
Determines where on the local filesystem a DFS data node should store its
blocks. If this is a comma-delimited list of directories, then data will be
stored in all named directories, typically on different devices. Directories
that do not exist are ignored.
</description>
</property>

The problem is that the space after the comma causes the second storage directory to be " /home/foo/dfs", which resolves to a directory named <SPACE> containing a sub-directory named "home" inside the Hadoop datanode's default directory. This will typically cause the user's home partition to fill, and the cause will be very hard for the user to track down, since a directory whose name is whitespace is hard to spot.

My proposed solution would be to trimLeft all path names from this and similar properties after splitting on commas. This still allows spaces in file and directory names but avoids this problem.
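A minimal sketch of the trimmed split described here (a hypothetical helper, not the actual Configuration or StringUtils code). It uses String.trim(), which removes trailing as well as leading whitespace, so it is slightly stricter than the trimLeft proposal: interior spaces survive, but a directory name can no longer begin or end with a space.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: split a comma-separated directory list and strip the
 * whitespace around each element, so "/disk1, /disk2" yields two absolute
 * paths instead of "/disk1" and " /disk2".
 */
final class TrimmedDirList {

    static List<String> trimmedStrings(String value) {
        List<String> dirs = new ArrayList<>();
        if (value == null) {
            return dirs;
        }
        for (String part : value.split(",")) {
            String dir = part.trim();
            // Choice made for this sketch: silently drop empty elements.
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Prints [/mnt/hstore2/hdfs, /home/foo/dfs]
        System.out.println(trimmedStrings("/mnt/hstore2/hdfs, /home/foo/dfs"));
    }
}
```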


This provides support in Configuration to get comma-separated string lists in such a way that whitespace in between elements is ignored. This patch is required for later patches which fix mapred.local.dir, dfs.data.dir, etc. to support spaces in between elements. Test plan: unit tested in TestStringUtils. Reason: Configuration improvement Author: Michele (@pirroh) Catasta Ref: UNKNOWN commit 8d4807322a42509726b376b37a89739acd6cbd7d Author: Aaron Kimball Date: Fri Mar 12 17:47:55 2010 -0800 MAPREDUCE-1356. Allow user-specified hive table name in sqoop Description: The table name used in a hive-destination import is currently pegged to the input table name. This should be user-configurable. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit 8bf3439ff69762a33967dca4abb15c0cd2bb8417 Author: Aaron Kimball Date: Fri Mar 12 17:47:45 2010 -0800 MAPREDUCE-1395. Sqoop does not check return value of Job.waitForCompletion() Description: Old code depended on JobClient.runJob() throwing IOException on failure. Job.waitForCompletion can fail in that manner, or it can fail by returning false. Sqoop needs to check for this condition. Reason: bugfix Author: Aaron Kimball Ref: UNKNOWN commit bd4e81234dd12fa9534577f0caa0db5c3d0a99fc Author: Aaron Kimball Date: Fri Mar 12 17:47:30 2010 -0800 CLOUDERA-BUILD. Set HADOOP_PID_DIR to something smarter than /tmp Author: Chad Metcalf commit 2466310d0e2a426e848860e9a8411b8ea14e1bb1 Author: Aaron Kimball Date: Fri Mar 12 17:47:07 2010 -0800 HADOOP-6453. Hadoop wrapper script shouldn't ignore an existing JAVA_LIBRARY_PATH Description: Currently the hadoop wrapper script assumes it's the only place that uses JAVA_LIBRARY_PATH and initializes it to an empty string.

JAVA_LIBRARY_PATH=''

This prevents anyone from setting this outside of the hadoop wrapper (say hadoop-config.sh) for their own native libraries.

The fix is pretty simple. Don't initialize it to '' and append the native libs like normal.

Reason: Bugfix (environment) Author: Chad Metcalf Ref: UNKNOWN commit a67b4b1c361c26e002da64953a7f8bc068d29b98 Author: Aaron Kimball Date: Fri Mar 12 17:46:42 2010 -0800 MAPREDUCE-1327. Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE Description: When an Oracle table contains columns of type "TIMESTAMP(6) WITH LOCAL TIME ZONE" or "TIMESTAMP(6) WITH TIME ZONE", Sqoop fails to map values for those columns to valid Java data types, resulting in the following exception:

ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.sqoop.orm.ClassWriter.generateFields(ClassWriter.java:253)
at org.apache.hadoop.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:701)
at org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:597)
at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:75)
at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:87)
at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:201)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

Reason: Compatibility improvement Author: Leonid Furman Ref: UNKNOWN commit a937ba2b9b6132883d727f856911ae31d22ad619 Author: Aaron Kimball Date: Fri Mar 12 17:46:26 2010 -0800 MAPREDUCE-1394. Sqoop generates incorrect URIs in paths sent to Hive Description: Hive used to require a ':8020' in HDFS URIs used with LOAD DATA statements, even though the normalized form of such a URI does not contain an explicit port number (since 8020 is the default port). Sqoop matched this by hacking the URI strings it forwarded to Hive.

Hive fixed this bug a while ago – Sqoop should catch up.
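For illustration, the normalized form Hive now accepts can be produced with java.net.URI alone. This is a hypothetical helper, not Sqoop's code (the fix there is simply to stop rewriting the URI strings); it assumes 8020 is the default NameNode port, as stated above, and only touches hdfs:// URIs.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical sketch: rewrite an hdfs:// URI into its normalized form by
 * dropping an explicit port equal to the default NameNode port (8020),
 * rather than forcing ":8020" into the string.
 */
final class HdfsUriNormalizer {
    static final int DEFAULT_NAMENODE_PORT = 8020;

    static URI normalize(URI uri) {
        if (!"hdfs".equals(uri.getScheme()) || uri.getPort() != DEFAULT_NAMENODE_PORT) {
            return uri; // nothing to normalize
        }
        try {
            // Rebuild the same URI with port -1, i.e. no explicit port.
            return new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(), -1,
                    uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot normalize " + uri, e);
        }
    }

    public static void main(String[] args) {
        // hdfs://nn.example.com/user/foo/table
        System.out.println(normalize(URI.create("hdfs://nn.example.com:8020/user/foo/table")));
        // hdfs://nn.example.com:9000/user/foo/table (non-default port kept)
        System.out.println(normalize(URI.create("hdfs://nn.example.com:9000/user/foo/table")));
    }
}
```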

Reason: bugfix (compatibility) Author: Aaron Kimball Ref: UNKNOWN commit c5c9b8bf0bf83637589a809b3c376cf74a2fb464 Author: Aaron Kimball Date: Fri Mar 12 17:45:54 2010 -0800 MAPREDUCE-1313. NPE in FieldFormatter if escape character is set and field is null Description: Performing an import with the --escaped-by character set on a table with a null field will cause a NullPointerException in FieldFormatter. Reason: bugfix Author: Aaron Kimball Ref: UNKNOWN commit 1c6dd471832946929928801dd9c9e4b79259ad9d Author: Aaron Kimball Date: Fri Mar 12 17:45:38 2010 -0800 HADOOP-6460. Namenode runs out of memory due to memory leak in ipc Server Description: Namenode heap usage grows disproportionately to the number of objects it supports (files, directories and blocks). Based on heap dump analysis, this is due to large growth in ByteArrayOutputStream allocated in o.a.h.ipc.Server.Handler.run(). Reason: Bugfix (Scalability) Author: Suresh Srinivas Ref: UNKNOWN commit d190a8067827ce09cdcb7741d588cce0e0e7aa02 Author: Aaron Kimball Date: Fri Mar 12 17:45:23 2010 -0800 HADOOP-5687. Hadoop NameNode throws NPE if fs.default.name is the default value Description: Throwing an NPE is confusing; an exception with a useful description should be thrown instead. Reason: Logging improvement Author: Philip Zeyliger Ref: UNKNOWN commit 7604c6f69076effbb0c9793e114946d679f5912d Author: Aaron Kimball Date: Fri Mar 12 17:45:02 2010 -0800 HADOOP-6505. sed in build.xml fails Description: I'm not sure whether this is a Solaris thing or an ant 1.7.1 thing, but it definitely doesn't do what it is supposed to. Instead of getting SunOS-x86-32 (or whatever) I get -x86-32.

This patch replaces the sed call with tr.

Reason: OS compatibility improvement Author: Allen Wittenauer Ref: UNKNOWN commit ca662cbba6044be216b586e7359d9fc2f1dd4e4f Author: Aaron Kimball Date: Fri Mar 12 17:44:00 2010 -0800 HDFS-908. (version 2) TestDistributedFileSystem fails with Wrong FS on weird hosts Description: On the same host where I experienced HDFS-874, I also experience this failure for TestDistributedFileSystem:

Testcase: testFileChecksum took 0.492 sec
Caused an ERROR
Wrong FS: hftp://localhost.localdomain:59782/filechecksum/foo0, expected: hftp://127.0.0.1:59782
java.lang.IllegalArgumentException: Wrong FS: hftp://localhost.localdomain:59782/filechecksum/foo0, expected: hftp://127.0.0.1:59782
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)
at org.apache.hadoop.hdfs.HftpFileSystem.getFileChecksum(HftpFileSystem.java:318)
at org.apache.hadoop.hdfs.TestDistributedFileSystem.testFileChecksum(TestDistributedFileSystem.java:166)

Doesn't appear to occur on trunk or branch-0.21.

This is version two of this patch. The previous patch fixed some systems but broke others. Reason: Bugfix Author: Todd Lipcon Ref: UNKNOWN commit 7fafe032223921ad194c69b16ab451b4aade87fa Author: Aaron Kimball Date: Fri Mar 12 17:43:41 2010 -0800 HADOOP-4368. Superuser privileges required to do "df" Description: Superuser privileges are required in DFS in order to get the file system statistics (FSNamesystem.java, getStats method). This means that when HDFS is mounted via fuse-dfs as a non-root user, "df" is going to return 16 exabytes total and 0 free instead of the correct amount.

As far as I can tell, there's no need to require super user privileges to see the file system size (and historically in Unix, this is not required).

To fix this, simply comment out the privilege check in the getStats method.

Reason: Usability improvement Author: Craig Macdonald Ref: UNKNOWN commit 6129c87f5dd1fdb7375c80285534b8b91fbcd392 Author: Aaron Kimball Date: Fri Mar 12 17:43:25 2010 -0800 HDFS-412. Hadoop JMX usage makes Nagios monitoring impossible Description: When Hadoop reports Datanode information to JMX, the bean uses the name "DataNode-" + storageid. The storage ID incorporates a random number and is unpredictable.

This prevents me from monitoring DFS datanodes through Hadoop using the JMX interface; in order to do that, you must be able to specify the bean name on the command line.

The fix is simple, patch will be coming momentarily. However, there was probably a reason for making the datanodes all unique names which I'm unaware of, so it'd be nice to hear from the metrics maintainer.

Reason: Monitoring improvement Author: Brian Bockelman Ref: UNKNOWN commit 5dfcc6d2d7806636c6237996e1b28a00ba075b4b Author: Aaron Kimball Date: Fri Mar 12 17:43:05 2010 -0800 HADOOP-6503. contrib projects should pull in the ivy-fetched libs from the root project Description: On branch-20 currently, I get an error just running "ant contrib -Dtestcase=TestHdfsProxy". In a full "ant test" build this sometimes doesn't appear to be an issue. The problem is that the contrib projects don't automatically pull in the dependencies of the "Hadoop" ivy project. Thus, they each have to declare all of the common dependencies like commons-cli, etc. Some are missing and this causes test failures. Reason: Build system improvement Author: Todd Lipcon Ref: UNKNOWN commit be70b10f11445f4a71807405718bfeebd38ad924 Author: Aaron Kimball Date: Fri Mar 12 17:42:51 2010 -0800 MAPREDUCE-1155. Streaming tests swallow exceptions Description: Many of the streaming tests (including TestMultipleArchiveFiles) catch exceptions and print their stack trace rather than failing the test. This means that tests do not fail even when the job fails. Reason: Test coverage improvement Author: Todd Lipcon Ref: UNKNOWN commit f84830ae5e6c862cd0e2b8ebea57880e54c8a082 Author: Aaron Kimball Date: Fri Mar 12 17:42:33 2010 -0800 HADOOP-5647. TestJobHistory fails if /tmp/_logs is not writable to. Testcase should not depend on /tmp Description: TestJobHistory sets /tmp as hadoop.job.history.user.location to check whether the history file is created in that directory. If /tmp/_logs has already been created by some other user, this test will fail because it lacks write permission. Reason: Bugfix in test harness Author: Ravi Gummadi Ref: UNKNOWN commit 669b65f14d78ffd1cf0304cf459d1abbae3412ae Author: Aaron Kimball Date: Fri Mar 12 17:42:15 2010 -0800 CLOUDERA-BUILD. Fix javadoc warnings shown by test-patch, and update eclipse classpath to match current CDH.
Author: Todd Lipcon commit 51804fd45d3a527a130a373c591a17c185102a0c Author: Aaron Kimball Date: Fri Mar 12 17:41:40 2010 -0800 Revert "HDFS-127: DFSClient block read failures cause open DFSInputStream to become unusable" Description: This is being reverted as it causes infinite retries when there are no valid replicas. Reason: bugfix Author: Todd Lipcon Ref: UNKNOWN commit 623bfc0c18087274315dfbd41d025a8a775abe80 Author: Aaron Kimball Date: Fri Mar 12 17:40:30 2010 -0800 HDFS-877. Client-driven block verification not functioning Description: This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing out). The issue is that DFSInputStream relies on readChunk being called one last time at the end of the file in order to receive the lastPacketInBlock=true packet from the DN. However, DFSInputStream.read checks pos < getFileLength() before issuing the read. Thus gotEOS never becomes true and checksumOk() is never called. This is a simpler patch than the one on 0.21/0.22, because those branches also fix a further regression introduced after 0.20. Reason: bugfix Author: Todd Lipcon Ref: UNKNOWN commit b332fe77255047409da701dfb97df1bddb5b10cb Author: Aaron Kimball Date: Fri Mar 12 17:40:05 2010 -0800 CLOUDERA-BUILD. Add mockito to 0.20 branch for easier unit testing of HDFS stability patches. Reason: Test coverage improvement Author: Todd Lipcon commit 44a6c559de056b35c6eb2e2d53798c88d8c779e6 Author: Aaron Kimball Date: Fri Mar 12 17:39:09 2010 -0800 HDFS-630. In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block. Description: Created from HDFS-200.

If, during a write, the DFSClient sees that a block replica location for a newly allocated block is not connectable, it re-requests the NN to get a fresh set of replica locations for the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds between retries (see DFSClient.nextBlockOutputStream).

This setting works well when you have a reasonably sized cluster; if you have few datanodes in the cluster, every retry may pick the dead datanode and the above logic bails out.

Our solution: when requesting block locations from the namenode, the client passes it the datanodes to exclude. The list of dead datanodes applies only to a single block allocation.
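A minimal Java sketch of the retry-with-exclusion idea described above. NameNodeStub, Connector, pickTarget, nextBlockTarget and the node names are hypothetical stand-ins, not the actual DFSClient or ClientProtocol API, and the real client's sleep between attempts is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of block allocation with a per-allocation exclusion list.
 * NameNodeStub and Connector are hypothetical stand-ins, not DFSClient APIs.
 */
public class ExcludedNodesSketch {

    /** Hypothetical namenode: returns a datanode not in {@code excluded}, or null if none is left. */
    interface NameNodeStub {
        String pickTarget(Set<String> excluded);
    }

    /** Hypothetical connectivity check against a datanode. */
    interface Connector {
        boolean connect(String datanode);
    }

    /**
     * Requests a target up to {@code retries} times, excluding every datanode that
     * failed to connect. The exclusion set is created per call, so it only affects
     * this one block allocation. The real client's sleep between attempts is omitted.
     */
    static String nextBlockTarget(NameNodeStub nn, Connector conn, int retries) {
        Set<String> excluded = new HashSet<>();
        for (int attempt = 0; attempt < retries; attempt++) {
            String target = nn.pickTarget(excluded);
            if (target == null) {
                return null; // no eligible datanode remains
            }
            if (conn.connect(target)) {
                return target;
            }
            excluded.add(target); // do not let the namenode hand this node back
        }
        return null;
    }

    /** Two-node cluster whose namenode always offers the first non-excluded node. */
    static String demo(int retries) {
        List<String> nodes = Arrays.asList("dn-dead", "dn-alive");
        NameNodeStub nn = excluded -> nodes.stream()
                .filter(n -> !excluded.contains(n))
                .findFirst()
                .orElse(null);
        Connector conn = dn -> !dn.equals("dn-dead");
        return nextBlockTarget(nn, conn, retries);
    }

    public static void main(String[] args) {
        System.out.println(demo(3)); // dn-alive
    }
}
```

With three attempts, the dead node is tried once, excluded, and the live node is returned. With a single attempt, only the dead node is tried and the sketch returns null. Keeping the exclusion set local to the call matches the point above that the list covers only one block allocation.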

Reason: bugfix (Fault tolerance improvement) Author: Cosmin Lehene (modified by Cloudera to not break compatibility) Ref: UNKNOWN commit 47c404e0cf10ceb31336d2a77d53e0a971348102 Author: Aaron Kimball Date: Fri Mar 12 17:37:37 2010 -0800 HDFS-908. TestDistributedFileSystem fails with Wrong FS on weird hosts Description: On the same host where I experienced HDFS-874, I also experience this failure for TestDistributedFileSystem:

Testcase: testFileChecksum took 0.492 sec
Caused an ERROR
Wrong FS: hftp://localhost.localdomain:59782/filechecksum/foo0, expected: hftp://127.0.0.1:59782
java.lang.IllegalArgumentException: Wrong FS: hftp://localhost.localdomain:59782/filechecksum/foo0, expected: hftp://127.0.0.1:59782
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:222)
at org.apache.hadoop.hdfs.HftpFileSystem.getFileChecksum(HftpFileSystem.java:318)
at org.apache.hadoop.hdfs.TestDistributedFileSystem.testFileChecksum(TestDistributedFileSystem.java:166)

Doesn't appear to occur on trunk or branch-0.21.
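A simplified Java model of the comparison behind the "Wrong FS" error. It assumes the check compares the path's scheme and authority against the filesystem URI as plain strings. WrongFsSketch and sameFileSystem are hypothetical names, not the FileSystem.checkPath source. The model shows how two names that presumably refer to the same local server (localhost.localdomain and 127.0.0.1) fail to match.

```java
import java.net.URI;

/**
 * Simplified model of a scheme/authority check; not the actual
 * org.apache.hadoop.fs.FileSystem.checkPath implementation.
 */
public class WrongFsSketch {

    /** Returns true if {@code path} is accepted as belonging to the filesystem at {@code fsUri}. */
    static boolean sameFileSystem(URI fsUri, URI path) {
        String pathScheme = path.getScheme();
        if (pathScheme == null) {
            return true; // scheme-less paths would be qualified against fsUri
        }
        if (!pathScheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false;
        }
        String pathAuthority = path.getAuthority();
        // Authorities are compared textually, so two names for one host
        // (e.g. localhost.localdomain vs 127.0.0.1) do not match.
        return pathAuthority == null || pathAuthority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        URI fs = URI.create("hftp://127.0.0.1:59782");
        URI path = URI.create("hftp://localhost.localdomain:59782/filechecksum/foo0");
        System.out.println(sameFileSystem(fs, path)); // false
    }
}
```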

Reason: bugfix Author: Todd Lipcon Ref: UNKNOWN commit 7c2a791f0a397d924a623e45bf823c238374c42c Author: Aaron Kimball Date: Fri Mar 12 17:37:19 2010 -0800 MAPREDUCE-1258. Fair scheduler event log not logging job info Description: The MAPREDUCE-706 patch seems to have left an unfinished TODO in the Fair Scheduler: in the dump() function, which periodically dumps scheduler state to the event log, the part that dumps information about jobs is commented out. This makes the event log less useful than it was before.

It should be fairly easy to update this part to use the new scheduler data structures (Schedulable, etc.) and print the data.

Reason: Logging improvement Author: Matei Zaharia Ref: UNKNOWN commit 353f7813bf7dfb0bca1362f9370f6a080256a345 Author: Aaron Kimball Date: Fri Mar 12 17:36:58 2010 -0800 MAPREDUCE-1198. Alternatively schedule different types of tasks in fair share scheduler Description: Matei has mentioned in MAPREDUCE-961 that the current scheduler will first try to launch map tasks until canLaunchTask() returns false, and only then look for reduce tasks. This might starve reduce tasks. He also mentioned that alternately scheduling different types of tasks can solve this problem. Reason: bugfix Author: Scott Chen Ref: UNKNOWN commit ef449fb7832055951e2364cf12a73717b2add3ce Author: Aaron Kimball Date: Fri Mar 12 17:36:50 2010 -0800 MAPREDUCE-698. Per-pool task limits for the fair scheduler Description: The fair scheduler could use a way to cap the share of a given pool, similar to MAPREDUCE-532. Reason: New feature Author: Kevin Peterson Ref: UNKNOWN commit a1e25ec70e677db322b2cce43c6381f865eb3f79 Author: Aaron Kimball Date: Fri Mar 12 17:36:42 2010 -0800 HDFS-464. Memory leaks in libhdfs Description: hdfsExists never calls destroyLocalReference for jPath,
hdfsDelete does not call it when it fails, and
hdfsRename does not call it for jOldPath and jNewPath when it fails Reason: bugfix Author: Christian Kunz Ref: UNKNOWN commit d93dad715d3c702d15c2a32c85d586c708e70857 Author: Aaron Kimball Date: Fri Mar 12 17:36:23 2010 -0800 CLOUDERA-BUILD. Add test ivy configurations to additional projects. Author: Aaron Kimball Reason: Build system improvement commit 5d0c8f82b87e7cbb541ace9e4f22abfad2799e56 Author: Aaron Kimball Date: Fri Mar 12 17:35:08 2010 -0800 CLOUDERA-BUILD. Sqoop bin script now includes jars from contrib/sqoop/lib/ on classpath. Author: Aaron Kimball commit 7e009a29c0806537cd50972df90ec87b617eb78f Author: Aaron Kimball Date: Fri Mar 12 17:34:54 2010 -0800 MAPREDUCE-1212. Mapreduce contrib project ivy dependencies are not included in binary target Description: As in HADOOP-6370, only Hadoop's own library dependencies are promoted to ${build.dir}/lib; any libraries required by contribs are not redistributed. Reason: Build system (packaging) improvement Author: Aaron Kimball Ref: UNKNOWN commit 8d289f97d6b66cd435f755a4acae9f138de934d6 Author: Aaron Kimball Date: Fri Mar 12 17:34:43 2010 -0800 CLOUDERA-BUILD. Update cloud script version to cdh-0.20.1 Author: Tom White commit ac7eacd44af059d7a859b8d6773a82cd84ba4c9b Author: Aaron Kimball Date: Fri Mar 12 17:34:35 2010 -0800 HADOOP-6466. Add a ZooKeeper service to the cloud scripts Description: It would be good to add other Hadoop services to the cloud scripts. Reason: New feature Author: Tom White Ref: UNKNOWN commit 06ceb079693292a41085af795c5b2bbc3fd10af2 Author: Aaron Kimball Date: Fri Mar 12 17:34:24 2010 -0800 HADOOP-6454. Create setup.py for EC2 cloud scripts Description: This would make it easier to install the scripts. Reason: Installation improvement Author: Tom White Ref: UNKNOWN commit 23c45791bbc3a23d69c77f3518b5d1a1a4702ccc Author: Aaron Kimball Date: Fri Mar 12 17:34:11 2010 -0800 HADOOP-6462. 
contrib/cloud failing, target "compile" does not exist Description: I'm not seeing this mentioned in Hudson or other bug reports, which confuses me. With the addition of a src/contrib/cloud/build.xml from HADOOP-6426, contrib/build.xml no longer builds:
hadoop-common/src/contrib/build.xml:30: The following error occurred while executing this line:
Target "compile" does not exist in the project "hadoop-cloud".

What is odd is this: the final patch of HADOOP-6426 does include the needed stub <target> entries, yet they aren't in SVN HEAD, which implies that a different version may have been committed than intended.

Reason: Build system bugfix Author: Tom White Ref: UNKNOWN commit 083a6a1cfb2a5198243aa82a020681ad62da5938 Author: Aaron Kimball Date: Fri Mar 12 17:33:58 2010 -0800 HADOOP-6444. Support additional security group option in hadoop-ec2 script Description: When deploying a Hadoop cluster on EC2 alongside other services, it is very useful to be able to specify additional (pre-existing) security groups to facilitate access control. For example, one could use this feature to add a cluster to a generic "hadoop" group, which authorizes HDFS access from instances outside the cluster. Without such an option, the access control for the security groups created by the script needs to be updated manually after cluster launch. Reason: Security improvement Author: Paul Egan Ref: UNKNOWN commit 63152ce4ba3c0cf2006016cc825fc72b0bd23d2d Author: Aaron Kimball Date: Fri Mar 12 17:33:49 2010 -0800 HADOOP-6426. Create ant build for running EC2 unit tests Description: There is currently no easy way to run the Python unit tests for the cloud contrib. Reason: Test coverage improvement Author: Tom White Ref: UNKNOWN commit a20069b2adfafa59e0001fe5e5685d36d9eb7fee Author: Aaron Kimball Date: Fri Mar 12 17:33:15 2010 -0800 HADOOP-6392. Run namenode and jobtracker on separate EC2 instances Description: Replace concept of "master" with that of "namenode" and "jobtracker". Still need to be able to run both on one node, of course. Reason: Scalability improvement Author: Tom White Ref: UNKNOWN commit 361221a2a082d0ab7a87ba0226dbe05938440738 Author: Aaron Kimball Date: Fri Mar 12 17:33:07 2010 -0800 HADOOP-6108. Add support for EBS storage on EC2 Description: By using EBS for namenode and datanode storage, we can have persistent, restartable Hadoop clusters running on EC2. Reason: New feature Author: Tom White Ref: UNKNOWN commit 4ca1c78e1b257eefa10b5ed94479df8a6473d3e9 Author: Aaron Kimball Date: Fri Mar 12 17:32:50 2010 -0800 HDFS-861.
fuse-dfs does not support O_RDWR Description: Some applications (for us, the big one is rsync) will open a file in read-write mode when they really only intend to read xor write (not both). fuse-dfs should avoid failing until the application actually tries to write to a pre-existing file or read from a newly created file. Reason: bugfix Author: Brian Bockelman Ref: UNKNOWN commit 00f6976093cc20ea825a35f6831f645dc5f61637 Author: Aaron Kimball Date: Fri Mar 12 17:32:17 2010 -0800 HDFS-860. fuse-dfs truncate behavior causes issues with scp Description: For whatever reason, scp issues a "truncate" once it has written a file, truncating the file to the number of bytes it has written (i.e., if a file is X bytes, it calls truncate(X)).

This fails on the current fuse-dfs.

Reason: bugfix (tool compatibility) Author: Brian Bockelman Ref: UNKNOWN commit 46d2b6d6b27887375c44d691d776f70e89e4b81b Author: Aaron Kimball Date: Fri Mar 12 17:31:58 2010 -0800 HDFS-859. fuse-dfs utime behavior causes issues with tar Description: When trying to untar files onto fuse-dfs, tar will try to set the utime on all the files and directories. However, setting the utime on a directory in libhdfs causes an error.

We should silently ignore the failure of setting a utime on a directory; this will allow tar to complete successfully.

Reason: bugfix (tool compatibility) Author: Brian Bockelman Ref: UNKNOWN commit 9a38b9c423aca358307aa6455977432f34aef990 Author: Aaron Kimball Date: Fri Mar 12 17:31:45 2010 -0800 HDFS-858. Incorrect return codes for fuse-dfs Description: fuse-dfs doesn't pass proper error codes from libhdfs; places I'd like to correct are hdfsFileOpen (which can result in permission denied or quota violations) and hdfsWrite (which can result in quota violations).

By returning the correct error codes, command-line utilities print much better error messages, especially for quota violations, which can be very hard to debug.
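A hedged Java sketch of passing specific error codes through instead of one generic code. fuse-dfs itself is written in C. The exception classes, toFuseReturnCode and the constants below are hypothetical, and the errno values are the Linux ones. FUSE operations report failure by returning a negative errno value.

```java
/**
 * Sketch: map a failure to a FUSE-style negative errno. The exception
 * types are hypothetical stand-ins; errno values are Linux-specific.
 */
public class ErrnoMappingSketch {
    // Linux errno values; other platforms use different numbers.
    static final int EIO = 5;
    static final int EACCES = 13;
    static final int EDQUOT = 122;

    /** Hypothetical stand-in for a permission-denied failure. */
    static class PermissionDeniedException extends Exception {}

    /** Hypothetical stand-in for a quota violation. */
    static class QuotaExceededException extends Exception {}

    /** Returns the negative errno a FUSE callback would report for {@code e}. */
    static int toFuseReturnCode(Exception e) {
        if (e instanceof PermissionDeniedException) {
            return -EACCES;
        }
        if (e instanceof QuotaExceededException) {
            return -EDQUOT;
        }
        return -EIO; // fallback for failures with no more specific code
    }

    public static void main(String[] args) {
        System.out.println(toFuseReturnCode(new QuotaExceededException())); // -122
    }
}
```

With a specific errno, a tool such as cp can print "Disk quota exceeded" or "Permission denied" instead of a generic I/O error.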

Reason: bugfix Author: Brian Bockelman Ref: UNKNOWN commit 84afb26bb0e42eda1e26b07e3aac016695f5ad87 Author: Aaron Kimball Date: Fri Mar 12 17:31:37 2010 -0800 HDFS-857. Incorrect type for fuse-dfs capacity can cause "df" to return negative values on 32-bit machines Description: On sufficiently large HDFS installs, casting the result of hdfsGetCapacity to a long may cause "df" to return negative values. tOffset should be used instead. Reason: bugfix Author: Brian Bockelman Ref: UNKNOWN commit a4cf3e8e86cbd42bef25eb3aab7e464ac86e3068 Author: Aaron Kimball Date: Fri Mar 12 17:31:19 2010 -0800 HDFS-856. Hardcoded replication level for new files in fuse-dfs Description: In fuse-dfs, the number of replicas is always hardcoded to 3 in the arguments to hdfsOpenFile. We should use the setting in the Hadoop configuration instead. Reason: Configuration improvement Author: Brian Bockelman Ref: UNKNOWN commit e9f3ec90e57b383faf49e6a6eb8cc91e5182d31e Author: Aaron Kimball Date: Fri Mar 12 17:31:08 2010 -0800 HADOOP-5625. Add I/O duration time in client trace Description: Add I/O duration information into client trace log for analyzing performance. Reason: Logging improvement Author: Lei Xu Ref: UNKNOWN commit 42eeb4540850278563e76841f0c6b369933d5b70 Author: Aaron Kimball Date: Fri Mar 12 17:30:43 2010 -0800 HADOOP-5222. Add offset in client trace Description: By adding the offset to the client trace, the client trace information can provide more accurate information about I/O.
This is useful for performance analysis.

Since random writes are not currently supported, the write offset is always zero.

Reason: Logging improvement Author: Lei Xu Ref: UNKNOWN commit 5880960fb32ae0fc2c16bac1f333dbb237c3448f Author: Aaron Kimball Date: Fri Mar 12 17:30:27 2010 -0800 CLOUDERA-BUILD. Solaris do-release-build fix Author: Eli Collins Ref: CDH-531 commit 35f87aef6d7cd4030644a1d454da2e0a6e2969c0 Author: Aaron Kimball Date: Fri Mar 12 17:30:18 2010 -0800 MAPREDUCE-1310. CREATE TABLE statements for Hive do not correctly specify delimiters Description: Imports to HDFS via Sqoop that also inject metadata into Hive do not correctly specify delimiters; using Hive to access the data results in rows being parsed as NULL characters. See http://getsatisfaction.com/cloudera/topics/sqoop_hive_import_giving_null_query_values for an example bug report. Reason: Bugfix Author: Aaron Kimball Ref: UNKNOWN commit 60784d712cdd5781ceff262bb67e2d484fde428b Author: Aaron Kimball Date: Fri Mar 12 17:29:56 2010 -0800 MAPREDUCE-1235. java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP. Description: java.io.IOException is thrown when trying to import a table to HDFS using Sqoop. The table has a "0" value in a field of type datetime.
Full Exception: java.io.IOException: Cannot convert value '0000-00-00 00:00:00' from column 6 to TIMESTAMP.
Original question: http://getsatisfaction.com/cloudera/topics/cant_import_table?utm_content=reply_link&utm_medium=email&utm_source=reply_notification Reason: Bugfix (compatibility) Author: Aaron Kimball Ref: UNKNOWN commit 23c116b6ab5615bdb846e22b61a41e92ca287bdf Author: Aaron Kimball Date: Fri Mar 12 17:29:47 2010 -0800 MAPREDUCE-1174. Sqoop improperly handles table/column names which are reserved sql words Description: In some databases it is legal to name tables and columns with terms that overlap SQL reserved keywords (e.g., CREATE, table, etc.). In such cases, the database allows you to escape the table and column names. We should always escape table and column names when possible. Reason: Bugfix Author: Aaron Kimball Ref: UNKNOWN commit d4b3b7592c94aa1f4608245829b5de202ed1b148 Author: Aaron Kimball Date: Fri Mar 12 17:29:39 2010 -0800 MAPREDUCE-1168. Export data to databases via Sqoop Description: Sqoop can import from a database into HDFS. It's high time it works in reverse too. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit b29023803d1136bf7d4de45853a2d4481fb36d3c Author: Aaron Kimball Date: Fri Mar 12 17:29:24 2010 -0800 MAPREDUCE-1169. Improvements to mysqldump use in Sqoop Description: Improve Sqoop's integration with mysqldump Reason: Feature/performance improvements Author: Aaron Kimball Ref: UNKNOWN commit c6b956630e327ddabf674f8e06de02408e603155 Author: Aaron Kimball Date: Wed Jan 6 16:05:05 2010 -0800 MAPREDUCE-1169. Improvements to mysqldump use in Sqoop commit 26ba4fd749755a3df79eaa27792662e5b7e3da80 Author: Aaron Kimball Date: Fri Mar 12 17:29:15 2010 -0800 MAPREDUCE-1036. An API Specification for Sqoop Description: Over the last several months, Sqoop has evolved to a state that is functional and has room for extensions. Developing extensions requires a stable API and documentation. I am attaching to this ticket a description of Sqoop's design and internal APIs, which include some open questions. 
I would like to solicit input on the design regarding these open questions and standardize the API. Reason: Documentation Author: Aaron Kimball Ref: UNKNOWN commit e8c47124bb2ada5de0cfdf49150dd7296a41df71 Author: Aaron Kimball Date: Fri Mar 12 17:29:04 2010 -0800 MAPREDUCE-1069. Implement Sqoop API refactoring Description: Implement refactoring decisions outlined in MAPREDUCE-1036. Reason: API compatibility Author: Aaron Kimball Ref: UNKNOWN commit b73cab8083c1594c0328a565eef05951a17f998a Author: Aaron Kimball Date: Fri Mar 12 17:28:46 2010 -0800 MAPREDUCE-1146. Sqoop dependencies break Eclipse build on Linux Description: Under Linux, the following error appears in the Eclipse "Problems" view:
- "com.sun.tools cannot be resolved" at line 166 of org.apache.hadoop.sqoop.orm.CompilationManager

The problem doesn't appear on Mac OS, though.

Reason: bugfix Author: Aaron Kimball Ref: UNKNOWN commit 0629ac30abb5e58fb80be56a385867ac7360de22 Author: Aaron Kimball Date: Fri Mar 12 17:28:37 2010 -0800 MAPREDUCE-1148. SQL identifiers are a superset of Java identifiers Description: SQL identifiers can contain arbitrary characters, can start with numbers, can be words like "class" which are reserved in Java, etc. If Sqoop uses these names literally for class and field names, then compilation errors can occur in auto-generated classes. SQL identifiers need to be cleansed to map onto Java identifiers. Reason: bugfix Author: Aaron Kimball Ref: UNKNOWN commit dec4c616921b547e5a332a254254d77efc3a7d5e Author: Aaron Kimball Date: Fri Mar 12 17:28:25 2010 -0800 MAPREDUCE-1224. Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables Description: The SqlManager uses the query "SELECT t.* from <table> AS t" to get the table spec, which is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this cost is too high for Sqoop to be useful. Reason: Performance improvement Author: Spencer Ho Ref: UNKNOWN commit 1198ef1375387ba107d46f0ab5e9a7c6a7645931 Author: Aaron Kimball Date: Fri Mar 12 17:28:15 2010 -0800 MAPREDUCE-706. Support for FIFO pools in the fair scheduler Description: The fair scheduler should support making the internal scheduling algorithm for some pools be FIFO instead of fair sharing in order to work better for batch workloads. FIFO pools will behave exactly like the current default scheduler, sorting jobs by priority and then submission time. Pools will have their scheduling algorithm set through the pools config file, and it will be changeable at runtime.

To support this feature, I'm also changing the internal logic of the fair scheduler to no longer use deficits. Instead, for fair sharing, we will assign tasks to the job farthest below its share as a ratio of its share. This is easier to combine with other scheduling algorithms and leads to a more stable sharing situation, avoiding unfairness issues brought up in MAPREDUCE-543 and MAPREDUCE-544 that happen when some jobs have long tasks. The new preemption (MAPREDUCE-551) will ensure that critical jobs can gain their fair share within a bounded amount of time.
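A small Java sketch of the ordering rule in the paragraph above. It assumes each job tracks its running task count and an already computed fair share. JobInfo, shareRatio and schedulingOrder are hypothetical names, not the Fair Scheduler's classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch: offer the next slot to the job with the lowest
 * runningTasks / fairShare ratio. JobInfo is a hypothetical stand-in.
 */
public class FairShareOrderSketch {

    static final class JobInfo {
        final String name;
        final int runningTasks;
        final double fairShare;

        JobInfo(String name, int runningTasks, double fairShare) {
            this.name = name;
            this.runningTasks = runningTasks;
            this.fairShare = fairShare;
        }

        /** Lower values mean the job is further below its share; zero-share jobs sort last. */
        double shareRatio() {
            return fairShare > 0 ? runningTasks / fairShare : Double.POSITIVE_INFINITY;
        }
    }

    /** Returns job names in the order they should be offered the next free slot. */
    static List<String> schedulingOrder(List<JobInfo> jobs) {
        List<JobInfo> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingDouble(JobInfo::shareRatio));
        List<String> names = new ArrayList<>();
        for (JobInfo job : sorted) {
            names.add(job.name);
        }
        return names;
    }
}
```

For jobs A (2 running, share 4), B (1 running, share 1) and C (0 running, share 2), the order is C, A, B. A plain deficit (share minus running tasks) would tie A and C at 2; dividing by the share puts C, which has nothing running, first.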

Reason: New feature Author: Matei Zaharia Ref: UNKNOWN commit 5699f5483e2a9ee9debd0f0154c6506ee5dc87e2 Author: Aaron Kimball Date: Fri Mar 12 17:28:03 2010 -0800 MAPREDUCE-1285. DistCp cannot handle -delete if destination is local filesystem Description: The following exception is thrown:
Copy failed: java.io.IOException: wrong value class: org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus is not class org.apache.hadoop.fs.FileStatus
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:988)
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
at org.apache.hadoop.tools.DistCp.deleteNonexisting(DistCp.java:1226)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1134)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
Reason: bugfix Author: Peter Romianowski Ref: UNKNOWN commit 34bb813a5884aeb05909c2ce2cc541882ca3eda1 Author: Aaron Kimball Date: Fri Mar 12 17:27:53 2010 -0800 MAPREDUCE-764. TypedBytesInput's readRaw() does not preserve custom type codes Description: The typed bytes format supports byte sequences of the form <custom type code> <length> <bytes>. When reading such a sequence via TypedBytesInput's readRaw() method, however, the returned sequence currently is 0 <length> <bytes> (0 is the type code for a bytes array), which leads to bugs such as the one described here. Reason: bugfix Author: Klaas Bosteels Ref: UNKNOWN commit 7fd2cb371354219abd108fda35087f08dc481b35 Author: Aaron Kimball Date: Fri Mar 12 17:27:31 2010 -0800 HADOOP-6400. Log errors getting Unix UGI Description: For various reasons, the calls out to `whoami` and `id` can fail when trying to get the unix UGI information. Currently it silently ignores failures and uses the default DrWho/Tardis ugi. This is extremely confusing for users - we should log the exception at warn level when the shell execs fail. Reason: Debug logging improvement Author: Todd Lipcon Ref: UNKNOWN commit d6dc22fecc058e12695a481fa354078d9b012089 Author: Aaron Kimball Date: Fri Mar 12 17:27:21 2010 -0800 MAPREDUCE-1293. AutoInputFormat doesn't work with non-default FileSystems Description: AutoInputFormat uses the wrong FileSystem.get() method when getting a reference to a FileSystem object. AutoInputFormat gets the default FileSystem, so this method breaks if the InputSplit's path is pointing to a different FileSystem. Reason: bugfix Author: Andrew Hitchcock Ref: UNKNOWN commit 25a4ea86b0b085e3afd6f2f040201594155b3de1 Author: Aaron Kimball Date: Fri Mar 12 17:27:09 2010 -0800 MAPREDUCE-1131. Using profilers other than hprof can cause JobClient to report job failure Description: If task profiling is enabled, the JobClient will download the profile.out file created by the tasks under profile. 
If this causes an IOException, the job is reported as a failure to the client, even though all the tasks themselves may complete successfully. The expected result files are assumed to be generated by hprof, so using the profiling system with other profilers will cause job failure. Reason: compatibility bugfix Author: Aaron Kimball Ref: UNKNOWN commit ab98123c7114752945452af0b96c8de04af9ba93 Author: Aaron Kimball Date: Fri Mar 12 17:26:02 2010 -0800 MAPREDUCE-370. Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api. Description: Ports the MultipleOutputs OutputFormat to the new context-based API. Reason: API compatibility improvement. Author: Amareshwari Sriramadasu Ref: UNKNOWN commit 50726d13750f3f71d2fc5d3a012ce81aa2adb26d Author: Aaron Kimball Date: Fri Mar 12 17:24:46 2010 -0800 CLOUDERA-BUILD. Backport MapReduceTestUtil to Hadoop 0.20 Description: MapReduceTestUtil is required for unit tests in subsequent patches, but this class itself was not created in one clean JIRA. Therefore it was backported as-is from trunk rather than patch by patch. This class is only used in the JUnit tests for Hadoop. Author: Aaron Kimball Reason: Testing improvement Ref: UNKNOWN commit d713dc1063afc4967381b6583ec424d2850bac63 Author: Aaron Kimball Date: Fri Mar 12 17:24:30 2010 -0800 MAPREDUCE-1059. distcp can generate uneven map task assignments Description: distcp writes out a SequenceFile containing the source files to transfer and their sizes. Map tasks are created over spans of this file, representing files which each mapper should transfer. In practice, some transfer loads yield many empty map tasks while a few tasks perform the bulk of the work. Reason: Improvement for load balancing Author: Aaron Kimball Ref: UNKNOWN commit 855b0bf3718f2c397ef79967475468e4153f120a Author: Aaron Kimball Date: Fri Mar 12 17:24:20 2010 -0800 MAPREDUCE-1128. MRUnit Allows Iteration Twice Description: MRUnit allows one to iterate over a collection of values twice, i.e.:

reduce(Key key, Iterable<Value> values, Context context) {
    for (Value v : values) { /* iterate once */ }
    for (Value v : values) { /* iterate again */ }
}

Hadoop will allow this as well; however, the second loop will see no values. MRUnit should either match Hadoop's behavior or warn the user that their code is likely flawed.
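A Java sketch of the behavior described above. It assumes the reducer's Iterable returns the same underlying iterator on every call. SinglePassIterable and countTwice are hypothetical helpers, not MRUnit or Hadoop classes.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Sketch of an Iterable that can be consumed only once: a second
 * for-each loop sees no elements. Not MRUnit's or Hadoop's actual class.
 */
public class SinglePassIterable<T> implements Iterable<T> {
    private final Iterator<T> source;

    public SinglePassIterable(List<T> values) {
        this.source = values.iterator();
    }

    @Override
    public Iterator<T> iterator() {
        // Always return the same iterator, so once it has been drained
        // every later loop finds it already exhausted.
        return source;
    }

    /** Counts the elements seen by two consecutive for-each loops, as in the reducer above. */
    static int[] countTwice(Iterable<?> values) {
        int first = 0;
        for (Object ignored : values) {
            first++;
        }
        int second = 0;
        for (Object ignored : values) {
            second++;
        }
        return new int[] {first, second};
    }
}
```

countTwice on a SinglePassIterable of three values returns {3, 0}, matching the Hadoop behavior described above. On an ordinary List it returns {3, 3}, which is presumably what a collection-backed test driver allowed.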

Reason: bugfix (API compatibility) Author: Aaron Kimball Ref: UNKNOWN commit c9d77f6e1fdbb24b45675e363e3bd5111533893a Author: Aaron Kimball Date: Fri Mar 12 17:24:10 2010 -0800 HDFS-464. Memory leaks in libhdfs Description: hdfsExists never calls destroyLocalReference for jPath,
hdfsDelete does not call it when it fails, and
hdfsRename does not call it for jOldPath and jNewPath when it fails Reason: bugfix Author: Christian Kunz Ref: UNKNOWN commit c7996c5e2fbb9260740fec369550551d6320762a Author: Aaron Kimball Date: Fri Mar 12 17:23:51 2010 -0800 HDFS-423. Unbreak FUSE build and fuse_dfs_wrapper.sh Description: fuse-dfs depends on libhdfs, and the fuse-dfs build.xml still points to the libhdfs/libhdfs.so location, but libhdfs is now built in a different location.
Please see this bug for details of the new location:

https://issues.apache.org/jira/browse/HADOOP-3344

Thanks,
Giri

Reason: Build system bugfix Author: Eli Collins Ref: UNKNOWN commit 72b0b791cd347e760807a44f5197599f57afde03 Author: Aaron Kimball Date: Fri Mar 12 17:23:39 2010 -0800 CLOUDERA-BUILD. Make bin/hadoop-config.sh work with dev builds Author: Eli Collins commit a9466041ccfcdb07f4f0dd34a57c9e9bdd6a3e70 Author: Aaron Kimball Date: Fri Mar 12 17:23:06 2010 -0800 HDFS-727. bug setting block size hdfsOpenFile Description: In hdfsOpenFile in libhdfs, invokeMethod needs to cast the block size argument to a jlong so that a full 8 bytes are passed (rather than 4 bytes plus some garbage, which causes writes to fail due to a bogus block size). Reason: Bugfix Author: Eli Collins Ref: UNKNOWN commit 4e7d205daa86d904614252101bb422664ab6d203 Author: Aaron Kimball Date: Fri Mar 12 17:22:47 2010 -0800 Revert MAPREDUCE-967. TaskTracker does not need to fully unjar job jars Author: Todd Lipcon Ref: UNKNOWN commit d5f0c77a6c81e9e56da81976645614280247f7a2 Author: Aaron Kimball Date: Fri Mar 12 17:22:18 2010 -0800 HADOOP-5640. Allow ServicePlugins to hook callbacks into key service events Description: HADOOP-5257 added the ability for NameNode and DataNode to start and stop ServicePlugin implementations at NN/DN start/stop. However, this is insufficient integration for some common use cases.

We should add some functionality for Plugins to subscribe to events generated by the service they're plugging into. Some potential hook points are:

NameNode:

  • new datanode registered
  • datanode has died
  • exception caught
  • etc?

DataNode:

  • startup
  • initial registration with NN complete (this is important for HADOOP-4707 to sync up datanode.dnRegistration.name with the NN-side registration)
  • namenode reconnect
  • some block transfer hooks?
  • exception caught

I see two potential routes for implementation:

1) We make an enum for the types of hookpoints and have a general function in the ServicePlugin interface. Something like:

enum HookPoint {
    DN_STARTUP,
    DN_RECEIVED_NEW_BLOCK,
    DN_CAUGHT_EXCEPTION,
    ...
}

void runHook(HookPoint hp, Object value);

2) We make classes specific to each "pluggable" as was originally suggested in HADOOP-5257. Something like:

class DataNodePlugin {
    void datanodeStarted() {}
    void receivedNewBlock(block info, etc) {}
    void caughtException(Exception e) {}
    ...
}

I personally prefer option (2) since we can ensure plugin API compatibility at compile-time, and we avoid an ugly switch statement in a runHook() function.

Interested to hear what people's thoughts are here.
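A Java sketch of option (2), assuming a String block id in place of the elided "block info, etc" parameters. The hook names follow the proposal. PluginDispatcher and BlockRecorder are hypothetical, and making the base class abstract with no-op methods is one possible design choice, not the committed API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of option (2): one no-op method per hook, called directly by the
 * service. Dispatcher, recorder and the String block id are hypothetical.
 */
public class PluginHooksSketch {

    /** Plugins override only the hooks they care about. */
    public abstract static class DataNodePlugin {
        public void datanodeStarted() {}
        public void receivedNewBlock(String blockId) {}
        public void caughtException(Exception e) {}
    }

    /** Datanode side: calls each hook directly, with no switch on a hook enum. */
    static final class PluginDispatcher {
        private final List<DataNodePlugin> plugins = new ArrayList<>();

        void register(DataNodePlugin plugin) {
            plugins.add(plugin);
        }

        void fireStarted() {
            for (DataNodePlugin p : plugins) {
                p.datanodeStarted();
            }
        }

        void fireReceivedNewBlock(String blockId) {
            for (DataNodePlugin p : plugins) {
                p.receivedNewBlock(blockId);
            }
        }
    }

    /** Example plugin that records the blocks it is told about. */
    static final class BlockRecorder extends DataNodePlugin {
        final List<String> seen = new ArrayList<>();

        @Override
        public void receivedNewBlock(String blockId) {
            seen.add(blockId);
        }
    }
}
```

Because each hook is an ordinary method, a plugin that marks its override with @Override stops compiling if a hook is renamed or its signature changes. This is the compile-time check the proposal favors, and no runHook() switch is needed.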

HADOOP-5640 puts this in the new test dir. It needs to be in the old one. Reason: Improvement Author: Todd Lipcon Ref: UNKNOWN commit e9b04609d88ed5d1af442ee950aa5dcd6646e830 Author: Aaron Kimball Date: Fri Mar 12 17:22:08 2010 -0800 MAPREDUCE-1017. Compression and output splitting for Sqoop Description: Sqoop "direct mode" writing will generate a single large text file in HDFS. It is important to be able to compress this data before it reaches HDFS. Due to the difficulty in splitting compressed files in HDFS for use by MapReduce jobs, data should also be split at compression time. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit 8c9b473e1af036a3e2cc9036a945a4567277db8a Author: Aaron Kimball Date: Fri Mar 12 17:21:14 2010 -0800 HADOOP-6312. Configuration sends too much data to log4j Description: Configuration objects send a DEBUG-level log message every time they're instantiated, and each message includes a full stack trace. This is more appropriate for TRACE-level logging, as it renders other debug logs very hard to read. Reason: Logging improvement Author: Aaron Kimball Ref: UNKNOWN commit 698fe169f31e54111d30e4420cd1c1c5eaeecdec Author: Aaron Kimball Date: Fri Mar 12 17:21:03 2010 -0800 HDFS-686. NullPointerException is thrown while merging edit log and image Description: Our secondary namenode fails to start because of a NullPointerException:
ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1232)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1221)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:590)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
at java.lang.Thread.run(Thread.java:619)

This was caused by setting access time on a non-existent file.
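A minimal Java sketch of guarding the replay step against a missing file, assuming a map-based namespace. SetTimesReplaySketch, INode and its fields are hypothetical. unprotectedSetTimes is named after the frame in the trace above, not copied from FSDirectory. As in FileSystem.setTimes, a value of -1 means "leave this time unchanged".

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: apply a logged set-times operation only if the target exists.
 * The map-based namespace and INode class are hypothetical stand-ins.
 */
public class SetTimesReplaySketch {

    static final class INode {
        long mtime;
        long atime;
    }

    private final Map<String, INode> namespace = new HashMap<>();

    void create(String path) {
        namespace.put(path, new INode());
    }

    /**
     * Returns false instead of dereferencing null when the file does not
     * exist; -1 for either time leaves that field unchanged.
     */
    boolean unprotectedSetTimes(String path, long mtime, long atime) {
        INode inode = namespace.get(path);
        if (inode == null) {
            return false; // file is absent: skip rather than throw an NPE
        }
        if (mtime != -1) {
            inode.mtime = mtime;
        }
        if (atime != -1) {
            inode.atime = atime;
        }
        return true;
    }

    long atimeOf(String path) {
        return namespace.get(path).atime;
    }
}
```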

Reason: bugfix Author: Hairong Kuang Ref: UNKNOWN commit b2cc8e02f37a1604bb076acefff0ebf016c249d5 Author: Aaron Kimball Date: Fri Mar 12 17:20:40 2010 -0800 MAPREDUCE-112. Reduce Input Records and Reduce Output Records counters are not being set when using the new Mapreduce reducer API Description: After running the examples/wordcount (which uses the new API), the reduce input and output record counters always show 0. This is because these counters are not getting updated in the new API. This adds counters for reduce input and output records to the new API. Reason: Bugfix Author: Jothi Padmanabhan Ref: UNKNOWN commit 3e62477434542dc3de89fd43fd9b19abaf76f0de Author: Aaron Kimball Date: Fri Mar 12 17:20:00 2010 -0800 MAPREDUCE-768. Configuration information should generate dump in a standard format. Description: We need to generate the configuration dump in a standard format. This adds the 'hadoop jobtracker -dumpConfiguration' command. This is modified from the original patch in that it does not dump QueueManager configuration, because we have not backported HADOOP-5396. Reason: New feature Author: V.V.Chaitanya Krishna Ref: UNKNOWN commit 4d9333b00772455a1ca7a365fa5b5b2f6872abd7 Author: Aaron Kimball Date: Fri Mar 12 17:19:46 2010 -0800 HADOOP-6184. Provide a configuration dump in json format. Description: Configuration dump in json format. Reason: New feature Author: V.V.Chaitanya Krishna Ref: UNKNOWN commit 96244c3e7d6735f450b618fdcbdbbf9a81436ba3 Author: Aaron Kimball Date: Fri Mar 12 17:19:27 2010 -0800 CLOUDERA-BUILD. Duplicated effort. FULL_VERSION already set in package.mk Description: Revert "Need to pass in FULL_VERSION" Author: Chad Metcalf commit 604d3a71334b9340a6219e3b88bf563b79f5d083 Author: Aaron Kimball Date: Fri Mar 12 17:19:11 2010 -0800 CLOUDERA-BUILD.
Copy the sqoop manpage to the expected version number Author: Chad Metcalf commit 6d428f70591a92a90dca5256968c62a510659240 Author: Aaron Kimball Date: Fri Mar 12 17:18:58 2010 -0800 CLOUDERA-BUILD. Bump jdiff stable to 0.20.1 Author: Chad Metcalf commit 46ffc9aa9260a96bdf67fbaee9a2acd76cfcf675 Author: Aaron Kimball Date: Fri Mar 12 17:18:44 2010 -0800 CLOUDERA-BUILD. Need to pass in FULL_VERSION Author: Chad Metcalf commit aa7ae9d9826866f94ecfe5629d087ef68e4b5c54 Author: Aaron Kimball Date: Fri Mar 12 17:18:29 2010 -0800 MAPREDUCE-999. Improve Sqoop test speed and refactor tests Description: Sqoop's tests take a long time to run, but this can be improved (by a factor of 2 or more) by taking advantage of jobclient.completion.poll.interval. Reason: Testing performance improvement Author: Aaron Kimball Ref: UNKNOWN commit 084c390ed5fcb03c456121c8497759b40a74f809 Author: Aaron Kimball Date: Fri Mar 12 17:18:13 2010 -0800 MAPREDUCE-1089. Fair Scheduler preemption triggers NPE when tasks are scheduled but not running Description: We see exceptions like the following when preemption runs while a task has been scheduled on a TT but has not yet started running:

2009-10-09 14:30:53,989 INFO org.apache.hadoop.mapred.FairScheduler: Should preempt 2 MAP tasks for job_200910091420_0006: tasksDueToMinShare = 2, tasksDueToFairShare = 0
2009-10-09 14:30:54,036 ERROR org.apache.hadoop.mapred.FairScheduler: Exception in fair scheduler UpdateThread
java.lang.NullPointerException
at org.apache.hadoop.mapred.FairScheduler$2.compare(FairScheduler.java:1015)
at org.apache.hadoop.mapred.FairScheduler$2.compare(FairScheduler.java:1013)
at java.util.Arrays.mergeSort(Arrays.java:1270)
at java.util.Arrays.sort(Arrays.java:1210)
at java.util.Collections.sort(Collections.java:159)
at org.apache.hadoop.mapred.FairScheduler.preemptTasks(FairScheduler.java:1013)
at org.apache.hadoop.mapred.FairScheduler.preemptTasksIfNecessary(FairScheduler.java:911)
at org.apache.hadoop.mapred.FairScheduler$UpdateThread.run(FairScheduler.java:286)
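
A minimal, hypothetical sketch of making this kind of preemption ordering null-safe. It assumes a stand-in Task class whose startTime is null while the task is scheduled but not yet running; it is not the FairScheduler comparator. Following the MAPREDUCE-551 rule of killing the tasks that have run for the least time, unstarted tasks sort first instead of throwing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the actual FairScheduler code: order preemption
// candidates without dereferencing a missing (null) start time.
final class PreemptionOrder {
    // Stand-in for a task attempt; startTime is null until the task is running.
    static final class Task {
        final String id;
        final Long startTime; // epoch millis, or null if scheduled but not started

        Task(String id, Long startTime) {
            this.id = id;
            this.startTime = startTime;
        }
    }

    // Latest start time first (least work lost if killed). Unstarted tasks have
    // done no work at all, so nullsFirst puts them ahead of everything else.
    static final Comparator<Task> LEAST_RUNTIME_FIRST = Comparator.comparing(
            (Task t) -> t.startTime,
            Comparator.nullsFirst(Comparator.<Long>reverseOrder()));

    static List<String> killOrder(List<Task> tasks) {
        List<Task> sorted = new ArrayList<>(tasks);
        sorted.sort(LEAST_RUNTIME_FIRST);
        List<String> ids = new ArrayList<>();
        for (Task t : sorted) {
            ids.add(t.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("a", 1000L), new Task("b", null), new Task("c", 5000L));
        System.out.println(killOrder(tasks)); // prints [b, c, a]
    }
}
```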

Reason: Bugfix Author: Todd Lipcon Ref: UNKNOWN commit 34ca2a5547398f9435a5d3d22603d0f7da420226 Author: Aaron Kimball Date: Fri Mar 12 17:17:48 2010 -0800 MAPREDUCE-551. Add preemption to the fair scheduler Description: Task preemption is necessary in a multi-user Hadoop cluster for two reasons: users might submit long-running tasks by mistake (e.g. an infinite loop in a map program), or tasks may be long due to having to process large amounts of data. The Fair Scheduler (HADOOP-3746) has a concept of guaranteed capacity for certain queues, as well as a goal of providing good performance for interactive jobs on average through fair sharing. Therefore, it will support preempting under two conditions:
1) A job isn't getting its guaranteed share of the cluster for at least T1 seconds.
2) A job is getting significantly less than its fair share for T2 seconds (e.g. less than half its share).

T1 will be chosen smaller than T2 (and will be configurable per queue) to meet guarantees quickly. T2 is meant as a last resort in case non-critical jobs in queues with no guaranteed capacity are being starved.

When deciding which tasks to kill to make room for the job, we will use the following heuristics:

  • Look for tasks to kill only in jobs that have more than their fair share, ordering these by deficit (most overscheduled jobs first).
  • For maps: kill tasks that have run for the least amount of time (limiting wasted time).
  • For reduces: similar to maps, but give extra preference for reduces in the copy phase where there is not much map output per task (at Facebook, we have observed this to be the main case where we need preemption: when a job has a long map phase and its reducers are mostly sitting idle and filling up slots).
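
The job-selection part of these heuristics can be sketched as follows. This is a hypothetical illustration, not the Fair Scheduler implementation: the Job class is a stand-in, and the rule that a job is never pushed below its own fair share is an assumption added for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the job-selection heuristic, not the FairScheduler
// implementation: only jobs running more tasks than their fair share are
// victims, and the most overscheduled job is drained first.
final class PreemptionPlan {
    static final class Job {
        final String name;
        final int runningTasks;
        final double fairShare;

        Job(String name, int runningTasks, double fairShare) {
            this.name = name;
            this.runningTasks = runningTasks;
            this.fairShare = fairShare;
        }

        double overShare() {
            return runningTasks - fairShare;
        }
    }

    // Returns job name -> number of tasks to kill, in the order jobs are visited.
    // Assumption: a job is never pushed below its own fair share.
    static Map<String, Integer> plan(List<Job> jobs, int slotsNeeded) {
        List<Job> victims = new ArrayList<>();
        for (Job j : jobs) {
            if (j.overShare() > 0) {
                victims.add(j);
            }
        }
        victims.sort((x, y) -> Double.compare(y.overShare(), x.overShare()));

        Map<String, Integer> kills = new LinkedHashMap<>();
        int remaining = slotsNeeded;
        for (Job j : victims) {
            if (remaining <= 0) {
                break;
            }
            int take = Math.min((int) Math.floor(j.overShare()), remaining);
            if (take > 0) {
                kills.put(j.name, take);
                remaining -= take;
            }
        }
        return kills;
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("A", 10, 4.0), new Job("B", 5, 5.0), new Job("C", 8, 6.5));
        System.out.println(plan(jobs, 7)); // prints {A=6, C=1}
    }
}
```

Within each chosen job, the individual map or reduce tasks would then be picked by the per-type rules listed above.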
This fixes an error in the previous backport where the EagerTaskInitializationListener wasn't properly passed the TaskTrackerManager before starting. Reason: New feature Author: Matei Zaharia Ref: UNKNOWN commit a3e29eff0b9337a1007ec1b90ccb832dca5c1d20 Author: Aaron Kimball Date: Fri Mar 12 17:17:33 2010 -0800 CLOUDERA-BUILD. Fix hadoop wrapper to properly pass through multiword quoted arguments Author: Todd Lipcon commit 975647b6c3a6644cabbd48bf14e074a0efda2cb9 Author: Aaron Kimball Date: Fri Mar 12 17:17:15 2010 -0800 CLOUDERA-BUILD. Sqoop documentation is now part of the generated tarball. Updated the install script to reflect that change. Author: Matt Massie commit 19c038a6af07e3999e83a2178d2328535e00dedb Author: Aaron Kimball Date: Fri Mar 12 17:16:55 2010 -0800 CLOUDERA-BUILD. Generate the sqoop documentation and ensure that it's in the release tarball Author: Matt Massie commit 6957626991875302f33bb73630f4f376412f9711 Author: Aaron Kimball Date: Fri Mar 12 17:16:43 2010 -0800 CLOUDERA-BUILD. More changes to get debs building correctly Author: Chad Metcalf commit 67d1c732cea0eebf59de512301ae8f2a1cb2f349 Author: Aaron Kimball Date: Fri Mar 12 17:16:30 2010 -0800 CLOUDERA-BUILD. Reformatted Sqoop manpage asciidoc for CDH build process Author: Aaron Kimball commit af158d6aa7ffe72d931bc4763ace7d4a299d077b Author: Aaron Kimball Date: Fri Mar 12 17:16:14 2010 -0800 CLOUDERA-BUILD. Only rerun libtoolize if version 2.2 is installed Author: Todd Lipcon commit 586992381042e1b4ec8c9ece069561ad2e4dfcc0 Author: Aaron Kimball Date: Fri Mar 12 17:15:42 2010 -0800 HADOOP-6279. Add JVM memory usage to JvmMetrics Description: The JvmMetrics currently publish memory usage from the MemoryMXBean. This is useful, but doesn't include the total heap size (e.g. as displayed in the JT Web UI).

It would be nice to expose Runtime.getRuntime().maxMemory() as part of JvmMetrics.

It seems that Runtime.getRuntime().totalMemory() (used by the JT for "memory used") is the same as the 'memHeapCommittedM' which already exists.
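
For reference, a short sketch of the Runtime figures discussed above, converted to megabytes to match the "M" suffix of the existing memHeapCommittedM metric. The class and method names are illustrative, and the name under which JvmMetrics actually publishes maxMemory() is not given here.

```java
import java.lang.management.ManagementFactory;

// Sketch of the two Runtime figures discussed above, reported in megabytes to
// match the "M" suffix of the existing memHeapCommittedM metric.
final class HeapFigures {
    private static final float MB = 1024f * 1024f;

    // Maximum heap the JVM will attempt to use (typically close to -Xmx).
    static float maxMemoryM() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently committed; what the JT uses for "memory used".
    static float totalMemoryM() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    public static void main(String[] args) {
        // The MemoryMXBean view that memHeapCommittedM is derived from.
        float committedM = ManagementFactory.getMemoryMXBean()
                .getHeapMemoryUsage().getCommitted() / MB;
        System.out.printf("max=%.1fM total=%.1fM heapCommitted=%.1fM%n",
                maxMemoryM(), totalMemoryM(), committedM);
    }
}
```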

Reason: Metrics improvement Author: Todd Lipcon Ref: UNKNOWN commit 7c168a8a2613d93e19508a91e7c4db3b3cfb503b Author: Aaron Kimball Date: Fri Mar 12 17:15:26 2010 -0800 HADOOP-6269. Missing synchronization for defaultResources in Configuration.addResource Description: Configuration.defaultResources is a simple ArrayList. In two places in Configuration it is accessed without appropriate synchronization, which we've seen to occasionally result in ConcurrentModificationExceptions. Reason: bugfix (race condition) Author: Sreekanth Ramakrishnan Ref: UNKNOWN commit 8bf845170decdcb12254bc1dc98ccbf0fda7d233 Author: Aaron Kimball Date: Fri Mar 12 17:15:01 2010 -0800 CLOUDERA-BUILD. Recreate c++ configure files during build if we have the right build dependencies Author: Todd Lipcon commit e7e9812fa7a6a256652f2f6bbb269334f883c53b Author: Aaron Kimball Date: Fri Mar 12 17:14:43 2010 -0800 CLOUDERA-BUILD. Package sqoop docs w/o requiring asciidoc Author: Chad Metcalf Ref: UNKNOWN commit 7171eabfad501d635b1da9e0287f50e025b4a83f Author: Aaron Kimball Date: Fri Mar 12 17:13:39 2010 -0800 CLOUDERA-BUILD. Revert "Package sqoop docs." Description: This reverts packaging of sqoop documentation in preparation for including MAPREDUCE-906 properly after it has been committed to Apache. Author: Chad Metcalf Ref: UNKNOWN commit 4bd437c9d70f2c0d68047e0376a7af21cc4a70e0 Author: Aaron Kimball Date: Fri Mar 12 17:13:17 2010 -0800 HADOOP-5891. If dfs.http.address is default, SecondaryNameNode can't find NameNode Description: As detailed in this blog post:
http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
If dfs.http.address is not configured and the 2NN runs on a different machine from the NN, the 2NN fails to connect.

In SecondaryNameNode.getInfoServer, the 2NN should notice a "0.0.0.0" dfs.http.address and, in that case, pull the hostname out of fs.default.name. This would fix the default configuration to work properly for most users.
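
A hypothetical sketch of the proposed fallback. The class and method names are made up for illustration, and the hostnames and ports in the example are placeholders; this is not the SecondaryNameNode code.

```java
import java.net.URI;

// Hypothetical helper illustrating the proposed fallback; the class and method
// names are made up, and this is not the SecondaryNameNode code.
final class InfoServerFallback {
    // httpAddress is a "host:port" value such as dfs.http.address;
    // defaultFs is a URI value such as fs.default.name.
    static String resolve(String httpAddress, String defaultFs) {
        int colon = httpAddress.lastIndexOf(':');
        if (colon < 0) {
            return httpAddress; // not host:port, leave untouched
        }
        String host = httpAddress.substring(0, colon);
        String port = httpAddress.substring(colon + 1);
        if (!"0.0.0.0".equals(host)) {
            return httpAddress; // explicitly configured, nothing to do
        }
        // Wildcard address: borrow the NameNode host from the filesystem URI.
        String nnHost = URI.create(defaultFs).getHost();
        return nnHost == null ? httpAddress : nnHost + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(resolve("0.0.0.0:50070", "hdfs://nn.example.com:8020"));
        // prints nn.example.com:50070
    }
}
```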

Reason: Configuration improvement Author: Todd Lipcon Ref: UNKNOWN commit 74e10e4a137b2aa60ab39186115350b5e82464fc Author: Aaron Kimball Date: Fri Mar 12 17:11:50 2010 -0800 HDFS-127. DFSClient block read failures cause open DFSInputStream to become unusable Description: We are using some Lucene indexes directly from HDFS, and for quite a long time we were using Hadoop version 0.15.3.

When we tried to upgrade to Hadoop 0.19, index searches started to fail with exceptions like:
2008-11-13 16:50:20,314 WARN [Listener-4] [] DFSClient : DFS Read: java.io.IOException: Could not obtain block: blk_5604690829708125511_15489 file=/usr/collarity/data/urls-new/part-00000/20081110-163426/_0.tis
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1708)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1536)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1663)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.nutch.indexer.FsDirectory$DfsIndexInput.readInternal(FsDirectory.java:174)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:152)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:63)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:223)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:217)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
...

The investigation showed that the root cause of this issue was that we exceeded the maximum number of xcievers on the datanodes, which was fixed by raising that configuration setting to 2k.
However, one thing that bothered me was that even after the datanodes recovered from the overload and most of the client servers had been shut down, we still observed errors in the logs of the running servers.
Further investigation showed that the fix for HADOOP-1911 introduced another problem: a DFSInputStream instance can become unusable once the number of failures over its lifetime exceeds the configured threshold.

The fix for this specific issue seems to be trivial: just reset the failure counter before reading the next block (patch will be attached shortly).

This also seems to be related to HADOOP-3185, but I'm not sure I really understand the necessity of keeping track of failed block accesses in the DFS client.
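
The difference between a lifetime counter and a per-block counter can be sketched as below. The names are illustrative and do not match the DFSClient fields; the sketch only shows why resetting before each new block keeps a long-lived stream usable after a transient overload.

```java
// Hypothetical sketch of the proposed fix (names are illustrative, not the
// DFSClient fields): failures are counted per block instead of over the whole
// lifetime of the stream.
final class PerBlockFailureCounter {
    private final int maxFailures;
    private int failures;

    PerBlockFailureCounter(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // Called before seeking to the next block; this reset is the fix.
    void startNewBlock() {
        failures = 0;
    }

    // Records one failed attempt on the current block. Returns false once the
    // threshold is exceeded, i.e. when the read should give up.
    boolean recordFailure() {
        failures++;
        return failures <= maxFailures;
    }

    public static void main(String[] args) {
        PerBlockFailureCounter c = new PerBlockFailureCounter(3);
        c.startNewBlock();
        c.recordFailure();
        c.recordFailure();          // two transient failures on block 1
        c.startNewBlock();
        boolean ok = c.recordFailure() && c.recordFailure();
        System.out.println(ok);     // prints true: block 2 starts with a clean count
    }
}
```

Without the startNewBlock() reset, the same four failures would exceed a threshold of 3 and the stream would stay unusable.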

HADOOP-4681: Also referenced. This as-yet-uncommitted patch is recommended by the HBase developers. Applied patch "4681.patch" attached to the JIRA on 2008-11-18. Reason: Bugfix Author: Igor Bolotin Ref: UNKNOWN commit ca547d89042fff3a38c0c93b6e0ece78e74ae064 Author: Aaron Kimball Date: Fri Mar 12 17:11:10 2010 -0800 HADOOP-4655. FileSystem.CACHE should be ref-counted Description: FileSystem.CACHE is not ref-counted, which could lead to resource leakage. Adds new method FileSystem.newInstance() that always returns a newly allocated FileSystem object. Reason: Bugfix Author: dhruba borthakur Ref: UNKNOWN commit 15660507606b32c3c6c2878f8ed69fe106119bc9 Author: Aaron Kimball Date: Fri Mar 12 17:10:51 2010 -0800 MAPREDUCE-967. TaskTracker does not need to fully unjar job jars Description: In practice we have seen some users submitting job jars that consist of 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning up after them has a significant cost (both in wall clock time and in unnecessarily heavy disk utilization). This cost can be easily avoided. Reason: Performance improvement Author: Todd Lipcon Ref: UNKNOWN commit 648e30e074a16de837fb4c604a198bc780c2e6c5 Author: Aaron Kimball Date: Fri Mar 12 17:10:34 2010 -0800 MAPREDUCE-968. NPE in distcp encountered when placing _logs directory on S3FileSystem Description: If distcp is pointed to an empty S3 bucket as the destination for an s3:// filesystem transfer, it will fail with the following exception:

Copy failed: java.lang.NullPointerException
at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:121)
at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:332)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:633)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1005)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:884)

Reason: Bugfix Author: Aaron Kimball Ref: UNKNOWN commit a61718b87c36dbeddcc6f9917438f81ebdda0214 Author: Aaron Kimball Date: Fri Mar 12 17:10:22 2010 -0800 HADOOP-6133. ReflectionUtils performance regression Description: HADOOP-4187 introduced extra calls to Class.forName in ReflectionUtils.setConf. This caused a fairly large performance regression. Attached is a microbenchmark that shows the following timings (ms) for 100M constructions of new instances:

Explicit construction (new Test): ~1.6 sec
Using Test.class.newInstance: ~2.6 sec
ReflectionUtils on 0.18.3: ~8.0 sec
ReflectionUtils on 0.20.0: ~200 sec

This illustrates the slowdown caused by HADOOP-4187: about 25x compared with ReflectionUtils on 0.18.3, and roughly 80x compared with a plain Test.class.newInstance call.
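
One common way to keep reflective construction off the slow path is to look up each class's constructor once and cache it. The sketch below shows that general technique under that assumption; it is an illustration, not the HADOOP-6133 patch, which the description above ties to the extra Class.forName calls in setConf.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the general technique: look up each class's no-arg constructor once
// and reuse it, so repeated construction avoids fresh reflective lookups.
// Illustrative only; not the HADOOP-6133 patch.
final class CachedFactory {
    private static final Map<Class<?>, Constructor<?>> CONSTRUCTORS =
            new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    static <T> T newInstance(Class<T> cls) {
        try {
            Constructor<T> ctor = (Constructor<T>) CONSTRUCTORS.get(cls);
            if (ctor == null) {
                ctor = cls.getDeclaredConstructor();
                ctor.setAccessible(true);
                // Benign race: concurrent callers may each store an equivalent constructor.
                CONSTRUCTORS.put(cls, ctor);
            }
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot instantiate " + cls.getName(), e);
        }
    }

    static int cachedClasses() {
        return CONSTRUCTORS.size();
    }

    public static void main(String[] args) {
        StringBuilder a = newInstance(StringBuilder.class);
        StringBuilder b = newInstance(StringBuilder.class);
        System.out.println(a != b);          // prints true: a new object each call
        System.out.println(cachedClasses()); // prints 1: only one lookup cached
    }
}
```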

Reason: Performance improvement Author: Todd Lipcon Ref: UNKNOWN commit 5e299f831420ed52569eefc5ba815359a0ebc64e Author: Chad Metcalf Date: Tue Sep 15 22:21:42 2009 -0700 HADOOP-6133: ReflectionUtils performance regression commit b6f790774d34ed34bb7c649142dc770c25121ac3 Author: Aaron Kimball Date: Fri Mar 12 17:10:13 2010 -0800 HADOOP-5981. HADOOP-2838 doesnt work as expected Description: The substitution feature, i.e. X=$X:/tmp, doesn't work as expected.

This issue completes the feature introduced in HADOOP-2838, which provided a way to set environment variables in the child process. This issue provides a way to inherit the TaskTracker's environment variables and either append to or reset them. So now
X=$X:y will inherit X (if it is set) and append y to it.

Reason: Bugfix Author: Amar Kamat Ref: UNKNOWN commit eb635e4de3a8b2b5bd9f34225770f24be42dcd83 Author: Chad Metcalf Date: Tue Sep 15 22:29:50 2009 -0700 HADOOP-5981: HADOOP-2838 doesnt work as expected commit 5d4e93d8e0df3c445f56c5eb51965eef92bebd78 Author: Aaron Kimball Date: Fri Mar 12 17:09:46 2010 -0800 HADOOP-2838. Add HADOOP_LIBRARY_PATH config setting so Hadoop will include external directories for jni Description: Currently there is no way to configure Hadoop to use external JNI directories. I propose we add a new variable like HADOOP_CLASS_PATH that is added to the JAVA_LIBRARY_PATH before the process is run.

Now users can set environment variables using mapred.child.env, as follows:
X=Y : set X to Y
X=$X:Y : Append Y to X (which should be taken from the tasktracker)
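
A hypothetical sketch of how a value using this syntax could be expanded. It assumes comma-separated NAME=VALUE pairs and substitutes only a "$NAME" reference to the variable being defined, using "" when the parent does not define it; it is not the TaskRunner code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a mapred.child.env-style value such as
// "A=foo,X=$X:/tmp" (assumed to be comma-separated NAME=VALUE pairs). A "$NAME"
// reference to the variable being defined is replaced by the parent
// (TaskTracker) value, or by "" when the parent does not define it.
final class ChildEnv {
    static Map<String, String> build(String spec, Map<String, String> parentEnv) {
        Map<String, String> env = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed entries without a NAME=
            }
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            String inherited = parentEnv.getOrDefault(name, "");
            env.put(name, value.replace("$" + name, inherited));
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> ttEnv = new HashMap<>();
        ttEnv.put("X", "/usr/lib");
        System.out.println(build("A=foo,X=$X:/tmp", ttEnv));
        // prints {A=foo, X=/usr/lib:/tmp}
    }
}
```

In a real TaskTracker the parent map would be the process environment (System.getenv()); passing it in keeps the sketch testable.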

Reason: Improves job launch flexibility Author: Amar Kamat Ref: UNKNOWN commit 9b3fc32fa793b338dc700a7f6c437402f80d6b7f Author: Chad Metcalf Date: Tue Sep 15 22:09:57 2009 -0700 HADOOP-2838: Add HADOOP_LIBRARY_PATH config setting so Hadoop will include external directories for jni commit 877429c3f94a1e937fbe29b4cbe8da573831d802 Author: Aaron Kimball Date: Fri Mar 12 17:09:31 2010 -0800 MAPREDUCE-814. Move completed Job history files to HDFS Description: Currently, completed job history files remain on the jobtracker node. Having the files available on HDFS will enable clients to access these files more easily. Reason: New feature Author: Sharad Agarwal Ref: UNKNOWN commit c0575c0908fee4ec01f5bc0abbd7f4b2254dd38e Author: Chad Metcalf Date: Tue Sep 15 18:15:17 2009 -0700 MAPREDUCE-814: Move completed Job history files to HDFS commit a8bf06eac5312ede0982118801e4495285a442fe Author: Aaron Kimball Date: Fri Mar 12 17:08:12 2010 -0800 MAPREDUCE-693. Conf files not moved to "done" subdirectory after JT restart Description: After MAPREDUCE-516, when a job is submitted and the JT is restarted (before job files have been written) and the job is killed after recovery, the conf files fail to be moved to the "done" subdirectory.
The exact scenario to reproduce this issue is:
  • Submit a job
  • Restart JT before anything is written to the job files
  • Kill the job
  • The old conf files remain in the history folder and fail to be moved to "done" subdirectory
Reason: bugfix Author: Amar Kamat Ref: UNKNOWN commit cc22e9f92db6470d244fb17f57601b93bab6db80 Author: Aaron Kimball Date: Fri Mar 12 17:07:55 2010 -0800 MAPREDUCE-683. TestJobTrackerRestart fails with Map task completion events ordering mismatch Description: TestJobTrackerRestart fails consistently with a map task completion events ordering mismatch error. Reason: bugfix Author: Amar Kamat Ref: UNKNOWN commit 57a67dff5d15e3833c7968254df076e440de2765 Author: Aaron Kimball Date: Fri Mar 12 17:07:39 2010 -0800 MAPREDUCE-416. Move the completed jobs' history files to a DONE subdirectory inside the configured history directory Description: Whenever a job completes, the history file can be moved to a directory called DONE. That would make the management of job history files easier (for example, administrators can move the history files from that directory to some other place, delete them, archive them, etc.). Reason: System management improvement Author: Amar Kamat Ref: UNKNOWN commit 99dfdb9a98e1ebd643f47877be3541962c32dcd0 Author: Aaron Kimball Date: Fri Mar 12 17:07:18 2010 -0800 HADOOP-5733. Add map/reduce slot capacity and lost map/reduce slot capacity to JobTracker metrics Description: It would be nice to have the actual map/reduce slot capacity and the lost map/reduce slot capacity (# of blacklisted nodes * map-slot-per-node or reduce-slot-per-node). This information can be used to calculate a JT view of slot utilization. Reason: Metrics improvement Author: Sreekanth Ramakrishnan Ref: UNKNOWN commit 955fe9433b13f21079f92e4035393b683486ad07 Author: Aaron Kimball Date: Fri Mar 12 17:05:59 2010 -0800 HADOOP-5738. Split waiting tasks field in JobTracker metrics to individual tasks Description: Currently, JobTracker metrics report waiting tasks as a single field. It would be better to split waiting tasks into maps and reduces.
Reason: User experience improvement Author: Sreekanth Ramakrishnan Ref: UNKNOWN commit 3b8f77cd452c1098c6af5907b787bf9167df806b Author: Aaron Kimball Date: Fri Mar 12 17:05:48 2010 -0800 HADOOP-5442. The job history display needs to be paged Description: Currently, the job history page tries to render the entire list of jobs that have run. That doesn't scale as more and more jobs run on a JobTracker. Reason: Scalability improvement Author: Amar Kamat Ref: UNKNOWN commit dfac0482267aaf0fabac97c163e0015306ec5b16 Author: Aaron Kimball Date: Fri Mar 12 17:05:16 2010 -0800 HADOOP-4842. Streaming combiner should allow command, not just JavaClass Description: Streaming jobs are much slower than Java jobs for many reasons, but preventing the shell-only programmer from using the combiner feature certainly won't help. Right now, the streaming usage says:

-mapper <cmd|JavaClassName> The streaming command to run
-combiner <JavaClassName> Combiner has to be a Java class
-reducer <cmd|JavaClassName> The streaming command to run

Reason: Usability improvement Author: Amareshwari Sriramadasu Ref: UNKNOWN commit 33e4f0a87effa466914e292488c47977245edc96 Author: Aaron Kimball Date: Fri Mar 12 17:04:06 2010 -0800 MAPREDUCE-987. Exposing MiniDFS and MiniMR clusters as a single process command-line Description: It's hard to test non-Java programs that rely on significant mapreduce functionality. The patch I'm proposing shortly will let you just type "bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster" to start a cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number of daemons, etc. A test that checks how some external process interacts with Hadoop might start minicluster as a subprocess, run through its thing, and then simply kill the java subprocess.

I've been using just such a system for a couple of weeks, and I like it. It's significantly easier than developing a lot of scripts to start a pseudo-distributed cluster, and then clean up after it. I figure others might find it useful as well.

I'm at a bit of a loss as to where to put it in 0.21. hdfs-with-mr tests have all the required libraries, so I've put it there. I could conceivably split this into "minimr" and "minihdfs", but it's specifically the fact that they're configured to talk to each other that I like about having them together. And one JVM is better than two for my test programs.

Reason: Testing feature Author: Philip Zeyliger Ref: UNKNOWN commit 39ff7e5ee285df97c765a73271066df718be0e30 Author: Aaron Kimball Date: Fri Mar 12 17:03:23 2010 -0800 HADOOP-6267. build-contrib.xml unnecessarily enforces that contrib projects be located in contrib/ dir Description: build-contrib.xml currently sets hadoop.root to ${basedir}/../../../. This path is relative to the contrib project, which is assumed to be inside src/contrib/. We occasionally work on contrib projects in other repositories until they're ready to contribute. We can use the <dirname> ant task to do this more correctly. Reason: Build system improvement Author: Todd Lipcon Ref: UNKNOWN commit 139bea6660193cc73852832e03fe570437343e96 Author: Aaron Kimball Date: Fri Mar 12 15:02:55 2010 -0800 HDFS-528. Add ability for safemode to wait for a minimum number of live datanodes Description: When starting up a fresh cluster programmatically, users often want to wait until DFS is "writable" before continuing in a script. "dfsadmin -safemode wait" doesn't quite work for this on a completely fresh cluster, since when there are 0 blocks on the system, 100% of them are accounted for before any DNs have reported.

This JIRA is to add a command which waits until a certain number of DNs have reported as alive to the NN.
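
A hypothetical sketch of the exit condition this implies. Parameter names are illustrative and this is not the FSNamesystem logic; it only shows why the block ratio alone passes vacuously on an empty cluster and how a live-datanode minimum closes that gap.

```java
// Hypothetical sketch of the exit condition (names are illustrative): the block
// ratio test alone passes vacuously on an empty cluster, so a minimum number of
// live datanodes is required as well.
final class SafeModeExit {
    static boolean canLeave(long safeBlocks, long totalBlocks, double threshold,
                            int liveDatanodes, int minDatanodes) {
        // With zero blocks, 100% of them are trivially "reported".
        boolean blocksOk = totalBlocks == 0
                || (double) safeBlocks / totalBlocks >= threshold;
        boolean datanodesOk = liveDatanodes >= minDatanodes;
        return blocksOk && datanodesOk;
    }

    public static void main(String[] args) {
        System.out.println(canLeave(0, 0, 0.999, 0, 0)); // true: ratio-only check on a fresh cluster
        System.out.println(canLeave(0, 0, 0.999, 0, 3)); // false: waits for datanodes
        System.out.println(canLeave(0, 0, 0.999, 3, 3)); // true
    }
}
```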

Reason: New feature Author: Todd Lipcon Ref: UNKNOWN commit b301746d45bde2759535549f87c6485f4ee577b2 Author: Aaron Kimball Date: Fri Mar 12 15:02:38 2010 -0800 HADOOP-4936. Improvements to TestSafeMode Description: TestSafeMode
  • needs a detailed description of the test case
  • should not call the name-node directly, but should call DistributedFileSystem methods instead.
Reason: Test coverage improvement Author: Konstantin Shvachko Ref: UNKNOWN commit f04a321596a513e71354f2a6829b44e474077507 Author: Aaron Kimball Date: Fri Mar 12 15:02:22 2010 -0800 HADOOP-5650. Namenode log that indicates why it is not leaving safemode may be confusing Description: A namenode with a large number of datablocks is set up with dfs.safemode.threshold.pct set to 1.0. With a small number of unreported blocks, the namenode prints the following as the reason for not leaving safe mode:
The ratio of reported blocks 1.0000 has not reached the threshold 1.0000

With a large number of blocks, the precision used when printing the log message may hide the difference between the actual ratio of safe blocks to total blocks and the configured threshold. Printing the number of blocks instead of the ratio would improve clarity.
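
The rounding problem is easy to reproduce. The ratio message below mirrors the log line quoted above; the count-based wording in countMessage is a hypothetical alternative, not the text that was committed.

```java
import java.util.Locale;

// Illustration of the problem above: with four decimal places, 999,999 of
// 1,000,000 blocks prints exactly like the 1.0 threshold. The count-based
// wording in countMessage is a hypothetical alternative, not the committed text.
final class SafeModeMessages {
    static String ratioMessage(long reported, long total, double threshold) {
        return String.format(Locale.ROOT,
                "The ratio of reported blocks %.4f has not reached the threshold %.4f",
                (double) reported / total, threshold);
    }

    static String countMessage(long reported, long total, double threshold) {
        long required = (long) Math.ceil(total * threshold);
        long missing = Math.max(0, required - reported);
        return String.format(Locale.ROOT,
                "Reported blocks %d of %d; %d more needed to reach threshold %.4f",
                reported, total, missing, threshold);
    }

    public static void main(String[] args) {
        System.out.println(ratioMessage(999_999, 1_000_000, 1.0));
        System.out.println(countMessage(999_999, 1_000_000, 1.0));
    }
}
```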

Reason: User experience improvement Author: Suresh Srinivas Ref: UNKNOWN commit 13e35e654c51a5b1cfe809ef1e2c4d2ca46ed612 Author: Aaron Kimball Date: Fri Mar 12 15:01:52 2010 -0800 HADOOP-4675. Current Ganglia metrics implementation is incompatible with Ganglia 3.1 Description: Ganglia changed its wire protocol in the 3.1.x series; the current implementation only works for 3.0.x. Patched using https://issues.apache.org/jira/secure/attachment/12407207/HADOOP-4675-v7.patch Reason: Compatibility improvement Author: Brian Bockelman Ref: UNKNOWN commit dcf76896b1c8a7b891995b1546eef6ea3018e7ca Author: Philip Zeyliger Date: Tue Jul 28 15:28:18 2009 -0700 HADOOP-4675. Current Ganglia metrics implementation is incompatible with Ganglia 3.1 Patched using https://issues.apache.org/jira/secure/attachment/12407207/HADOOP-4675-v7.patch commit 4305750d026b895b3afbd0d4a4ee4b3b42596016 Author: Aaron Kimball Date: Fri Mar 12 15:01:29 2010 -0800 HADOOP-6269. Missing synchronization for defaultResources in Configuration.addResource Description: Configuration.defaultResources is a simple ArrayList. In two places in Configuration it is accessed without appropriate synchronization, which we've seen to occasionally result in ConcurrentModificationExceptions. Reason: Bugfix (race condition) Author: Sreekanth Ramakrishnan Ref: UNKNOWN commit 90f9c40df18fe464383de52e3d3952638a393e34 Author: Aaron Kimball Date: Fri Mar 12 15:01:08 2010 -0800 CLOUDERA-BUILD. Make some JT methods and classes public for use from within contrib plugins Author: Henry Robinson commit f8e0599a434e1ce94158384f575e912e9f988229 Author: Aaron Kimball Date: Fri Mar 12 14:59:40 2010 -0800 MAPREDUCE-461. Enable ServicePlugins for the JobTracker Description: Allow ServicePlugins (see HADOOP-5257) for the JobTracker. (Relies on HADOOP-5640) Reason: API Improvement Author: Todd Lipcon Ref: UNKNOWN commit c58318cfa6e26b7dbacd4093d646fc8b66f9eda6 Author: Aaron Kimball Date: Fri Mar 12 14:58:23 2010 -0800 HADOOP-5640. 
Allow ServicePlugins to hook callbacks into key service events Description: HADOOP-5257 added the ability for NameNode and DataNode to start and stop ServicePlugin implementations at NN/DN start/stop. However, this is insufficient integration for some common use cases.

We should add some functionality for Plugins to subscribe to events generated by the service they're plugging into. Some potential hook points are:

NameNode:

  • new datanode registered
  • datanode has died
  • exception caught
  • etc?

DataNode:

  • startup
  • initial registration with NN complete (this is important for HADOOP-4707 to sync up datanode.dnRegistration.name with the NN-side registration)
  • namenode reconnect
  • some block transfer hooks?
  • exception caught

I see two potential routes for implementation:

1) We make an enum for the types of hookpoints and have a general function in the ServicePlugin interface. Something like:

enum HookPoint {
    DN_STARTUP,
    DN_RECEIVED_NEW_BLOCK,
    DN_CAUGHT_EXCEPTION,
    // ...
}

void runHook(HookPoint hp, Object value);

2) We make classes specific to each "pluggable" as was originally suggested in HADOOP-5257. Something like:

class DataNodePlugin {
    void datanodeStarted() {}
    void receivedNewBlock(/* block info, etc. */) {}
    void caughtException(Exception e) {}
    // ...
}

I personally prefer option (2) since we can ensure plugin API compatibility at compile-time, and we avoid an ugly switch statement in a runHook() function.

Interested to hear what people's thoughts are here.

Reason: API Improvement Author: Todd Lipcon Ref: UNKNOWN commit 137999a0b48a81bed10a5f30868dbfe6d176956b Author: Aaron Kimball Date: Fri Mar 12 14:58:09 2010 -0800 HADOOP-5257. Export namenode/datanode functionality through a pluggable RPC layer Description: Adding support for pluggable components would allow exporting DFS functionality using arbitrary protocols, like Thrift or Protocol Buffers. I'm opening this issue on Dhruba's suggestion in HADOOP-4707.

Plug-in implementations would extend this base class:

abstract class Plugin {

    public abstract void datanodeStarted(DataNode datanode);

    public abstract void datanodeStopping();

    public abstract void namenodeStarted(NameNode namenode);

    public abstract void namenodeStopping();
}

Name node instances would then start the plug-ins according to a configuration object, and would also shut them down when the node goes down:

public class NameNode {

    // [..]

    private List<Plugin> plugins;

    private void initialize(Configuration conf) {
        // [...]
        plugins = PluginManager.loadPlugins(conf);
        for (Plugin p : plugins)
            p.namenodeStarted(this);
    }

    // [..]

    public void stop() {
        if (stopRequested)
            return;
        stopRequested = true;
        for (Plugin p : plugins)
            p.namenodeStopping();
        // [..]
    }

    // [..]
}

Data nodes would do a similar thing in DataNode.startDatanode() and DataNode.shutdown().

Reason: MISSING: Reason for inclusion Author: Carlos Valiente Ref: UNKNOWN commit 155394ca5eed2e2a6151a5c9d9452e9cfbb30a11 Author: Aaron Kimball Date: Fri Mar 12 14:57:58 2010 -0800 MAPREDUCE-971. distcp does not always remove distcp.tmp.dir Description: Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n. Reason: Bugfix Author: Aaron Kimball Ref: UNKNOWN commit 7575b83ba0cab30394bad0943ff906ab0609dc40 Author: Aaron Kimball Date: Fri Mar 12 14:57:49 2010 -0800 CLOUDERA-BUILD. Package sqoop docs. commit 9321b18352e55d4d37c25335b578151b18f938f2 Author: Aaron Kimball Date: Fri Mar 12 14:57:32 2010 -0800 MAPREDUCE-923. Sqoop's ORM uses URLDecoder on a file, which replaces plus signs in a jar file name with spaces Description: In findThisJar, sqoop runs URLDecoder.decode on the resulting jar, which has the effect of replacing any + signs in the path with a space. This obviously breaks the classpath variable that it's trying to set, and the sqoop-generated code fails to compile. Ironically, Cloudera's hadoop distro is the one that puts + characters in jar files, and so exhibits the bug. Here is an example from running sqoop with log4j at debug level. Note the space in the very last term, which should read hadoop-0.20.0+61-sqoop.jar rather than hadoop-0.20.0 61-sqoop.jar.

09/08/27 18:00:07 DEBUG orm.CompilationManager: Invoking javac with args: -sourcepath ./ -d /tmp/sqoop/compile/ -classpath /usr/lib/hadoop-0.20/conf:/usr/java/jdk1.6.0_06/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-0.20.0+61-core.jar:/usr/lib/hadoop-0.20/lib/commons-cli-2.0-SNAPSHOT.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.3.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-0.20.0+61-fairscheduler.jar:/usr/lib/hadoop-0.20/lib/hadoop-0.20.0+61-scribe-log4j.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/junit-3.8.1.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/libfb303.jar:/usr/lib/hadoop-0.20/lib/libthrift.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/local/hadoop/lib/hadoop-gpl-compression.jar:/usr/lib/hadoop-0.20/hadoop-0.20.0+61-core.jar:/usr/lib/hadoop-0.20/contrib/sqoop/hadoop-0.20.0 61-sqoop.jar
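
The root cause can be reproduced with URLDecoder alone, since it implements form decoding, where '+' means a space. The decodeKeepingPlus helper is one possible workaround and an assumption on our part, not necessarily the committed Sqoop fix.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustration of the bug above: URLDecoder implements form decoding, which
// turns '+' into ' '. decodeKeepingPlus is one possible workaround, an
// assumption rather than the committed Sqoop fix.
final class JarPathDecoding {
    static String formDecode(String path) {
        try {
            return URLDecoder.decode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Escape literal '+' first so that only genuine %XX sequences are decoded.
    static String decodeKeepingPlus(String path) {
        return formDecode(path.replace("+", "%2B"));
    }

    public static void main(String[] args) {
        String jar = "/usr/lib/hadoop-0.20/contrib/sqoop/hadoop-0.20.0+61-sqoop.jar";
        System.out.println(formDecode(jar));        // ends with hadoop-0.20.0 61-sqoop.jar
        System.out.println(decodeKeepingPlus(jar)); // ends with hadoop-0.20.0+61-sqoop.jar
    }
}
```

Another option for a file: URL is to convert it with new File(url.toURI()), since URI decoding does not treat '+' as a space.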

Reason: Bugfix Author: Aaron Kimball Ref: UNKNOWN commit e97883c5b9c389f82a6447e4cb1678c0a0ed83ba Author: Aaron Kimball Date: Fri Mar 12 14:57:19 2010 -0800 CLOUDERA-BUILD. Sqoop asciidoc syntax error Author: Aaron Kimball commit 520bda2edcb90dfe9461e16b96aa4a048d33ed7b Author: Aaron Kimball Date: Fri Mar 12 14:57:11 2010 -0800 HADOOP-5450. Add support for application-specific typecodes to typed bytes Description: For serializing objects of types that are not supported by typed bytes serialization, applications might want to use a custom serialization format. Right now, typecode 0 has to be used for the bytes resulting from this custom serialization, which could lead to problems when deserializing the objects because the application cannot know whether a byte sequence following typecode 0 is a custom-serialized object or just a raw sequence of bytes. Therefore, a range of typecodes that are treated as aliases for 0 should be added, so that different typecodes can be used for application-specific purposes. Reason: New feature Author: Klaas Bosteels Ref: UNKNOWN commit b30fc99332c4a444d275731dac4b4245115d65b2 Author: Aaron Kimball Date: Fri Mar 12 14:56:59 2010 -0800 HADOOP-1722. Make streaming to handle non-utf8 byte array Description: Right now, the streaming framework expects the outputs of the stream process (mapper or reducer) to be line-oriented UTF-8 text.
This limitation makes it impossible to use programs whose outputs may be non-UTF-8 (international encodings, or even binary data). Streaming can overcome this limitation by introducing a simple encoding protocol.
For example, the mapper/reducer could hex-encode its keys/values, and the framework would decode them on the Java side.
This way, as long as the mapper/reducer executables follow this encoding protocol, they can output arbitrary byte arrays and the streaming framework can handle them. Reason: New feature Author: Klaas Bosteels Ref: UNKNOWN commit 921c135653736bcc279700435358058762bc8f78 Author: Aaron Kimball Date: Fri Mar 12 14:56:43 2010 -0800 CLOUDERA-BUILD. More Sqoop documentation updates Author: Aaron Kimball commit be7f1dc031e17dc4f53ebe76d27c1b9242105785 Author: Aaron Kimball Date: Fri Mar 12 14:56:26 2010 -0800 MAPREDUCE-840. DBInputFormat leaves open transaction Description: (Reapplied after HADOOP-4687) Reason: MISSING: Reason for inclusion Author: Aaron Kimball Ref: UNKNOWN commit 89a96d8fff80ac809dbda9582044a7c6b3986d16 Author: Aaron Kimball Date: Fri Mar 12 14:56:07 2010 -0800 MAPREDUCE-906. Updated Sqoop documentation Description: Provides the latest documentation for Sqoop, in both user-guide and manpage form. Built with asciidoc. Reason: Documentation Author: Aaron Kimball Ref: UNKNOWN commit 51f867aea0667d0191b730ea3abf114e75cafa4b Author: Aaron Kimball Date: Fri Mar 12 14:55:54 2010 -0800 MAPREDUCE-907. Sqoop should use more intelligent splits Description: Sqoop should use the new split generation / InputFormat in MAPREDUCE-885 Reason: Performance / scalability improvement Author: Aaron Kimball Ref: UNKNOWN commit 239df04415dba8d12c7d3fbf33c580d473202e94 Author: Aaron Kimball Date: Fri Mar 12 14:55:28 2010 -0800 MAPREDUCE-885. More efficient SQL queries for DBInputFormat Description: DBInputFormat generates InputSplits by counting the available rows in a table, and selecting subsections of the table via the "LIMIT" and "OFFSET" SQL keywords. These are only meaningful in an ordered context, so the query also includes an "ORDER BY" clause on an index column. The resulting queries are often inefficient and require full table scans. Actually using multiple mappers with these queries can lead to O(n^2) behavior in the database, where n is the number of splits. Attempting to use parallelism with these queries is counter-productive.

A better mechanism is to organize splits based on data values themselves, which can be performed in the WHERE clause, allowing for index range scans of tables, and can better exploit parallelism in the database.
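The value-based splitting described above can be sketched as follows. This is a minimal illustration, not the code from the MAPREDUCE-885 patch: it assumes an integer split column whose MIN and MAX have already been queried, and the names RangeSplitter and splitConditions are hypothetical. Each returned condition bounds one split to a contiguous value range, so the database can answer it with an index range scan instead of LIMIT/OFFSET paging.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: divides [min, max] of an integer column into WHERE-clause ranges. */
final class RangeSplitter {
    private RangeSplitter() {}

    /** Returns one WHERE condition per split, each covering a contiguous value range. */
    static List<String> splitConditions(String column, long min, long max, int numSplits) {
        if (numSplits < 1) throw new IllegalArgumentException("numSplits must be >= 1");
        if (min > max) throw new IllegalArgumentException("min must be <= max");
        List<String> conditions = new ArrayList<>();
        long span = max - min + 1;                      // distinct values in range (assumes no overflow)
        int splits = (int) Math.min(numSplits, span);   // never create empty splits
        long lower = min;
        for (int i = 0; i < splits; i++) {
            // Spread any remainder over the first splits so sizes differ by at most one.
            long size = span / splits + (i < span % splits ? 1 : 0);
            long upper = lower + size;                  // exclusive upper bound
            if (i == splits - 1) {
                // The last split is closed on the right so MAX itself is included.
                conditions.add(column + " >= " + lower + " AND " + column + " <= " + max);
            } else {
                conditions.add(column + " >= " + lower + " AND " + column + " < " + upper);
            }
            lower = upper;
        }
        return conditions;
    }
}
```

For example, splitConditions("id", 1, 100, 4) yields four ranges starting with "id >= 1 AND id < 26" and ending with "id >= 76 AND id <= 100". Each condition would be ANDed into the import query for one map task; non-integer split columns (dates, strings) would need their own range arithmetic.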

Reason: Performance and scalability improvement Author: Aaron Kimball Ref: UNKNOWN commit 23a0d1882c797160cc7b6fae99fc5e686aa30191 Author: Aaron Kimball Date: Fri Mar 12 14:55:16 2010 -0800 MAPREDUCE-938. Postgresql support for Sqoop Description: Sqoop should be able to import from postgresql databases. Reason: Compatibility improvement Author: Aaron Kimball Ref: UNKNOWN commit 7b89feb34fafd2365f75ab744db9cb07a5443046 Author: Aaron Kimball Date: Fri Mar 12 14:55:05 2010 -0800 MAPREDUCE-876. Sqoop import of large tables can time out Description: Related to MAPREDUCE-875, Sqoop should use a background thread to ensure that progress is being reported while a database does external work for the MapReduce task. Reason: Scalability improvement Author: Aaron Kimball Ref: UNKNOWN commit 61d4ef5175dca1859a1320f9e7cad1caeab5d982 Author: Aaron Kimball Date: Fri Mar 12 14:54:49 2010 -0800 MAPREDUCE-918. Test hsqldb server should be memory-only. Description: Sqoop launches a standalone hsqldb server for unit tests, but it currently writes its database to disk and uses a connect string of //localhost. If multiple test instances are running concurrently, one test server may end up serving the other instance's unit tests, causing race conditions. Reason: Bugfix in test harness Author: Aaron Kimball Ref: UNKNOWN commit 1fc17ad34e8288b54503eeb15f788eb4e6a070dc Author: Aaron Kimball Date: Fri Mar 12 14:54:37 2010 -0800 MAPREDUCE-875. Make DBRecordReader execute queries lazily Description: DBInputFormat's DBRecordReader executes the user's SQL query in the constructor. If the query is long-running, this can cause the task to time out. The user is unable to spawn a background thread (e.g., in a MapRunnable) to inform Hadoop of ongoing progress. Reason: Scalability improvement Author: Aaron Kimball Ref: UNKNOWN commit 21fdb7a7fd501fd63e1a540c2b55cf410d057301 Author: Aaron Kimball Date: Fri Mar 12 14:54:27 2010 -0800 MAPREDUCE-825.
JobClient completion poll interval of 5s causes slow tests in local mode Description: The JobClient.NetworkedJob.waitForCompletion() method polls for job completion every 5 seconds. When running a set of short tests in pseudo-distributed mode, this is unnecessarily slow and wastes a lot of time. When bandwidth is not scarce, setting the poll interval to 100 ms results in a 4x speedup in some tests. This interval should be parameterized to allow users to control the interval for testing purposes. Reason: Test performance improvement Author: Aaron Kimball Ref: UNKNOWN commit f996b8a019bffefff183d7d688ccf95b8cb73de5 Author: Aaron Kimball Date: Fri Mar 12 14:54:15 2010 -0800 MAPREDUCE-750. Extensible ConnManager factory API Description: Sqoop uses the ConnFactory class to instantiate a ConnManager implementation based on the connect string and other arguments supplied by the user. This allows per-database logic to be encapsulated in different ConnManager instances, and dynamically chosen based on which database the user is actually importing from. But adding new ConnManager implementations requires modifying the source of a common ConnFactory class. An indirection layer should be used to delegate instantiation to a number of factory implementations which can be specified in the static configuration or at runtime. Reason: API flexibility improvement Author: Aaron Kimball Ref: UNKNOWN commit 39bdff7bd3b83359884c90ae857d3f3144a94803 Author: Aaron Kimball Date: Fri Mar 12 14:54:04 2010 -0800 MAPREDUCE-749. Make Sqoop unit tests more Hudson-friendly Description: Hudson servers (other than Apache's) need to be able to run the Sqoop unit tests that depend on third-party JDBC drivers / database implementations. The build.xml needs some refactoring to make this happen. Reason: Test coverage improvement Author: Aaron Kimball Ref: UNKNOWN commit 0ca54f2722206685d9e36fcbb2656d0ac1957311 Author: Aaron Kimball Date: Fri Mar 12 14:53:47 2010 -0800 MAPREDUCE-792.
javac warnings in DBInputFormat Description: MAPREDUCE-716 introduces javac warnings Reason: Technical debt Author: Aaron Kimball Ref: UNKNOWN commit e39ae9d017e89e4df193b1f8075184320230499b Author: Aaron Kimball Date: Fri Mar 12 14:52:45 2010 -0800 MAPREDUCE-716. org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle Description: Applied the "trunk" version of the patch after incorporating HADOOP-4687's move of DBInputFormat-related files. (The prior patch was 0.20-branch specific.) Reason: Branch compatibility improvement Author: Aaron Kimball Ref: UNKNOWN commit 074e824f5d3d2f6ab862083e6eb4b0df8c881bfc Author: Aaron Kimball Date: Fri Mar 12 14:52:27 2010 -0800 MAPREDUCE-910. MRUnit should support counters Description: incrCounter() is currently a dummy stub method in MRUnit that does nothing. It would be good for the mock reporter/context implementations to support counters. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit b4b7c5d9b4cba84bc47f4a48074fd295d060ab35 Author: Aaron Kimball Date: Fri Mar 12 14:52:17 2010 -0800 MAPREDUCE-798. MRUnit should be able to test a succession of MapReduce passes Description: MRUnit can currently test that the inputs to a given (mapper, reducer) "job" produce certain outputs at the end of the reducer. It would be good to support more end-to-end tests of a series of MapReduce jobs that form a longer pipeline surrounding some data. Reason: New Feature Author: Aaron Kimball Ref: UNKNOWN commit 59677d22261974560117fa82e74d9a7f80f804d5 Author: Aaron Kimball Date: Fri Mar 12 14:52:06 2010 -0800 MAPREDUCE-800. MRUnit should support the new API Description: MRUnit's TestDriver implementations use the old org.apache.hadoop.mapred-based classes. TestDrivers and associated mock object implementations are required for org.apache.hadoop.mapreduce-based code.
Reason: New feature (API Compatibility) Author: Aaron Kimball Ref: UNKNOWN commit 7fda23b419b1c98e84eea43a0f35191d41032e18 Author: Aaron Kimball Date: Fri Mar 12 14:51:53 2010 -0800 MAPREDUCE-799. Some of MRUnit's self-tests were not being run Description: Due to method naming issues, some test cases were not being executed. Reason: Bugfix; test coverage Author: Aaron Kimball Ref: UNKNOWN commit 20d5bf205e9f2864f3da53d30408ba97763a46e9 Author: Aaron Kimball Date: Fri Mar 12 14:51:40 2010 -0800 MAPREDUCE-797. MRUnit MapReduceDriver should support combiners Description: The MapReduceDriver allows you to specify a mapper and a reducer class with a simple sort/"shuffle" between the passes. It would be nice to also support another Reducer implementation being used as a combiner in the middle. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit 5c873336b3380e6c8f07ca28230ede9d41e4e840 Author: Aaron Kimball Date: Fri Mar 12 14:50:05 2010 -0800 Integrate with 0.21-branch versions of DBInputFormat Description: In 0.21 there is now a DBInputFormat in the mapred/lib/ package as well as mapreduce/lib/db. This patch backports the new API edition of DBInputFormat to CDH Reason: Cross-branch compatibility improvement Author: Aaron Kimball Ref: UNKNOWN commit 51b650554e3bc8054e8ca966f5f552c522f7483d Author: Aaron Kimball Date: Fri Mar 12 14:49:52 2010 -0800 HADOOP-5170. Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide Description: There are a number of use cases for being able to do this. The focus of this jira should be on finding what would be the simplest to implement that would satisfy the most use cases.

This could be implemented as either a per-node maximum or a cluster-wide maximum. It seems that for most uses the former is preferable; however, either would fulfill the requirements of this jira.

Some of the reasons for allowing this feature (mine and from others on the list):

  • I have some very large CPU-bound jobs. I am forced to keep the max map/node limit at 2 or 3 (on a 4-core node) so that I do not starve the DataNode and RegionServer. I have other jobs that are network-latency bound and would like to be able to run high numbers of them concurrently on each node. Though I can thread some jobs, there are some use cases that are difficult to thread (scanning from HBase), and threading adds significant complexity to the job compared with letting Hadoop handle the concurrency.
  • Poor assignment of tasks to nodes creates some situations where you have multiple reducers on a single node but other nodes that received none. A limit of 1 reducer per node for that job would prevent that from happening. (Only works with a per-node limit.)
  • Poor man's MR job virtualization. Since we can limit a job's resources, this gives much more control in allocating and dividing up the resources of a large cluster. (Makes most sense with a cluster-wide limit.)
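Both variants of the cap reduce to a single scheduler-side check, sketched below under stated assumptions: JobTaskLimits and canLaunch are hypothetical names, a non-positive limit means "unlimited", and the configuration property names actually introduced by HADOOP-5170 are not reproduced here. The second bullet's "one reducer per node" case corresponds to new JobTaskLimits(1, 0).

```java
/** Hypothetical per-job task caps; a value <= 0 means "no limit". */
final class JobTaskLimits {
    private final int maxPerNode;      // per-node cap for this job
    private final int maxClusterWide;  // cluster-wide cap for this job

    JobTaskLimits(int maxPerNode, int maxClusterWide) {
        this.maxPerNode = maxPerNode;
        this.maxClusterWide = maxClusterWide;
    }

    /**
     * Returns true if the scheduler may launch one more task of this job on a node
     * that already runs runningOnNode of its tasks, given runningInCluster tasks
     * of the job across the whole cluster.
     */
    boolean canLaunch(int runningOnNode, int runningInCluster) {
        boolean nodeOk = maxPerNode <= 0 || runningOnNode < maxPerNode;
        boolean clusterOk = maxClusterWide <= 0 || runningInCluster < maxClusterWide;
        return nodeOk && clusterOk;
    }
}
```

A scheduler would call this before handing a task to a TaskTracker and skip the job (not the node) when it returns false, so other jobs can still use the free slot.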
Reason: Configuration improvement Author: Matei Zaharia Ref: UNKNOWN commit 99e25a93542251debd248ed71cb380858ca8c9bd Author: Aaron Kimball Date: Fri Mar 12 14:49:40 2010 -0800 HADOOP-6166. Improve PureJavaCrc32 Description: Got some ideas to improve CRC32 calculation. Reason: Performance Improvement Author: Tsz Wo (Nicholas), SZE Ref: UNKNOWN commit 2d0a97cefa559ab9059d976bda66f9dbcf051e79 Author: Aaron Kimball Date: Fri Mar 12 14:49:28 2010 -0800 MAPREDUCE-782. Use PureJavaCrc32 in mapreduce spills Description: HADOOP-6148 implemented a Pure Java implementation of CRC32 which performs better than the built-in one. This issue is to make use of it in the mapred package Reason: Performance improvement Author: Todd Lipcon Ref: UNKNOWN commit bb65cb649c2924b5a20f06deb9ecd66fc219eeeb Author: Aaron Kimball Date: Fri Mar 12 14:49:12 2010 -0800 HDFS-496. Use PureJavaCrc32 in HDFS Description: Common now has a pure java CRC32 implementation which is more efficient than java.util.zip.CRC32. This issue is to make use of it. Reason: Performance improvement Author: Todd Lipcon Ref: UNKNOWN commit ac73e6d51d5ad1df993097349602e5f3199b952a Author: Aaron Kimball Date: Fri Mar 12 14:48:40 2010 -0800 HADOOP-6148. Implement a pure Java CRC32 calculator Description: We've seen a reducer writing 200MB to HDFS with replication = 1 spending a long time in crc calculation. In particular, it was spending 5 seconds in crc calculation out of a total of 6 for the write. I suspect that it is the java-jni border that is causing us grief. This outperforms java.util.zip.CRC32. Reason: Performance improvement Author: Scott Carey and Todd Lipcon Ref: UNKNOWN commit e7430c8cbd2d182716ac7efb08cb2187c1edab95 Author: Aaron Kimball Date: Fri Mar 12 14:48:08 2010 -0800 Updated Sqoop documentation for MAPREDUCE-816, MAPREDUCE-789. 
Reason: Documentation improvement Author: Aaron Kimball Ref: UNKNOWN commit aa75ab7f749604c354dcdb0b806aca9cd140f504 Author: Aaron Kimball Date: Fri Mar 12 14:47:58 2010 -0800 MAPREDUCE-789. Oracle support for Sqoop Description: A separate ConnManager is needed for Oracle to support its slightly different syntax and configuration Reason: Compatibility improvement Author: Aaron Kimball Ref: UNKNOWN commit 6f017db468a82e336a28f451c7d90bc225130094 Author: Aaron Kimball Date: Fri Mar 12 14:47:33 2010 -0800 MAPREDUCE-840. DBInputFormat leaves open transaction Description: DBInputFormat.getSplits() does not call connection.commit() after the COUNT query. This can leave an open transaction against the database which interferes with other connections to the same table. Reason: bugfix Author: Aaron Kimball Ref: UNKNOWN commit 84b622a5f6f5bd145f19f4c08b6263759ac51756 Author: Aaron Kimball Date: Fri Mar 12 14:47:15 2010 -0800 MAPREDUCE-816. Rename "local" mysql import to "direct" Description: A mysqldump-based fast path known as "local mode" is used in sqoop when users pass the argument -local. The restriction that this only import from localhost was based on an implementation technique that was later abandoned in favor of a more general one, which can support remote hosts as well. Thus, local is a poor name for the flag. -direct is more general and more descriptive. This should be used instead. Reason: Interface clarification Author: Aaron Kimball Ref: UNKNOWN commit ce75318a484615dc7b161a41710884f34db50c86 Author: Aaron Kimball Date: Fri Mar 12 14:46:34 2010 -0800 MAPREDUCE-716. org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle Description:

The out-of-the-box implementation in Hadoop works properly with MySQL/HSQLDB, but NOT with Oracle.
The reason is that DBInputFormat is implemented with MySQL/HSQLDB-specific query constructs such as "LIMIT" and "OFFSET".

FIX:
Build database-provider-specific logic based on the database product name (which can be obtained from the connection).
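Choosing query syntax from the database product name might look roughly like the sketch below. It is hedged: PagedQueryBuilder is a hypothetical name, the product name is assumed to come from JDBC's DatabaseMetaData.getDatabaseProductName(), and the Oracle branch uses the common ROWNUM-wrapping idiom rather than whatever SQL the patch itself generates. The wrapped query is assumed to carry its own ORDER BY so that pages are stable.

```java
import java.util.Locale;

/** Hypothetical helper that picks paging syntax from the JDBC product name. */
final class PagedQueryBuilder {
    private PagedQueryBuilder() {}

    /**
     * Wraps query so it skips the first start rows and returns the next length rows.
     * productName would typically come from
     * connection.getMetaData().getDatabaseProductName().
     */
    static String page(String productName, String query, long start, long length) {
        String name = productName.toLowerCase(Locale.ROOT);
        if (name.contains("oracle")) {
            // Oracle (before 12c) has no LIMIT/OFFSET; emulate it with ROWNUM.
            return "SELECT * FROM (SELECT a.*, ROWNUM rnum FROM (" + query + ") a"
                + " WHERE ROWNUM <= " + (start + length) + ") WHERE rnum > " + start;
        }
        // MySQL and HSQLDB accept LIMIT ... OFFSET ..., as the description notes.
        return query + " LIMIT " + length + " OFFSET " + start;
    }
}
```

Usage: page("MySQL", "SELECT id FROM t ORDER BY id", 20, 10) appends " LIMIT 10 OFFSET 20", while an "Oracle" product name produces a nested query bounded by ROWNUM <= 30 and rnum > 20.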

Reason: Compatibility improvement Author: Aaron Kimball Ref: UNKNOWN commit 338de775796c2102ce680eaa983b719b50e9f3ee Author: Aaron Kimball Date: Fri Mar 12 14:46:18 2010 -0800 HADOOP-5469. Exposing Hadoop metrics via HTTP Description: Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON. Reason: New feature Author: Philip Zeyliger Ref: UNKNOWN commit cad421ec1c51382f81714ccafb96a6bb8bcc8aec Author: Aaron Kimball Date: Fri Mar 12 14:46:11 2010 -0800 HADOOP-5469. Exposing Hadoop metrics via HTTP Description: Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose metrics data to users via their web browsers, in plain-text and JSON. Reason: MISSING: Reason for inclusion Author: Philip Zeyliger Ref: UNKNOWN commit 8b09839047997a4b5461703650b5779ec86c1844 Author: Aaron Kimball Date: Fri Mar 12 14:45:49 2010 -0800 CLOUDERA-BUILD. Added Sqoop documentation to installation script Author: Todd Lipcon commit 7e77c6b13f06dec9c742bf76c81e2ec02d81c7cb Author: Aaron Kimball Date: Fri Mar 12 14:45:35 2010 -0800 CLOUDERA-BUILD. Fix the hadoop/sqoop wrapper scripts Author: Matt Massie commit 0caaf80f3a569b91f482de0dcb87f826967f5c7c Author: Aaron Kimball Date: Fri Mar 12 14:45:16 2010 -0800 CLOUDERA-BUILD. Fix a bug in the hadoop/sqoop wrapper generation Author: Matt Massie Ref: UNKNOWN commit bd8ddae402a876fe78cbb1482362935780b57d84 Author: Aaron Kimball Date: Fri Mar 12 14:44:59 2010 -0800 CLOUDERA-BUILD. Update the install hadoop script Author: Matt Massie Ref: UNKNOWN commit 80cf01124877a5aebd742142b10fda45910f0328 Author: Aaron Kimball Date: Fri Mar 12 14:44:42 2010 -0800 CLOUDERA-BUILD. Rename the hadoop man page to be hadoop-0.20 Author: Matt Massie Ref: UNKNOWN commit 78cb9f21a3ddf04f8cef9e37a94f657448d0d111 Author: Aaron Kimball Date: Fri Mar 12 14:43:51 2010 -0800 HADOOP-5745. 
Allow setting the default value of maxRunningJobs for all pools Description: The <pool> element allows setting the maxRunningJobs for that pool. It would be nice to be able to set a default value for all pools.

In our configuration, pools are auto-created: every new user gets their own pool. We would like to allow each user to run a maximum of 5 jobs at a time. For the etl pool, this limit will be set to a greater value.
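The requested behavior amounts to a lookup with a fallback, sketched here with hypothetical names (PoolJobLimits, setPoolMax, maxRunningJobs); the allocation-file syntax that HADOOP-5745 actually added is not reproduced. The example values follow the description: a default of 5 jobs for every auto-created user pool, and a larger value (an arbitrary 20 here) for the etl pool.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool-limit lookup: explicit per-pool values win, otherwise a default applies. */
final class PoolJobLimits {
    private final Map<String, Integer> perPool = new HashMap<>();
    private final int defaultMaxRunningJobs;

    PoolJobLimits(int defaultMaxRunningJobs) {
        this.defaultMaxRunningJobs = defaultMaxRunningJobs;
    }

    /** Records an explicit limit, as a <pool> element would. */
    void setPoolMax(String pool, int maxRunningJobs) {
        perPool.put(pool, maxRunningJobs);
    }

    /** Auto-created pools (one per user) have no entry and fall back to the default. */
    int maxRunningJobs(String pool) {
        Integer explicit = perPool.get(pool);
        return explicit != null ? explicit : defaultMaxRunningJobs;
    }
}
```

With new PoolJobLimits(5) and setPoolMax("etl", 20), a new user's pool reports 5 while etl reports 20, without any per-user configuration.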

Reason: Improved configuration flexibility Author: dhruba borthakur Ref: UNKNOWN commit 3c39e1fa8c3c89fc8f11f1faff46397fa82d5116 Author: Aaron Kimball Date: Fri Mar 12 14:43:13 2010 -0800 MAPREDUCE-906. Updated Sqoop documentation. Description: Update Sqoop documentation with user guide and manpage. Reason: Documentation improvement Author: Aaron Kimball Ref: UNKNOWN commit 79a2645bc81894331721ef94c255992075ccf195 Author: Aaron Kimball Date: Fri Mar 12 14:42:14 2010 -0800 CLOUDERA-BUILD. Added MySQL Connector/J library for Sqoop. Description: We can ship MySQL Connector/J with CDH because the licenses are compatible. However, the public Apache project will not include this library in their source repository due to stricter licensing concerns. Reason: Simplifies deployment of Sqoop for mysql users Author: Aaron Kimball Ref: UNKNOWN commit 4a097b35bf1264a0606f2ebe410c45f16f900f03 Author: Aaron Kimball Date: Fri Mar 12 14:42:05 2010 -0800 MAPREDUCE-705. User-configurable quote and delimiter characters for Sqoop records and record reparsing Description: Sqoop needs a mechanism for users to govern how fields are quoted and what delimiter characters separate fields and records. With delimiters providing an unambiguous format, a parse method can reconstitute the generated record data object from a text-based representation of the same record. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit 58e23056af0e99ef611ac258719207cc9459a849 Author: Aaron Kimball Date: Fri Mar 12 14:41:47 2010 -0800 MAPREDUCE-710. Sqoop should read and transmit passwords in a more secure manner Description: Sqoop's current support for passwords involves reading passwords from the command line "--password foo", which makes the password visible to other users via 'ps'. An invisible-console approach should be taken.

Relatedly, Sqoop transmits passwords to mysqldump in the same fashion, which is also insecure.
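For the command-line half of this issue, Java 6's java.io.Console can read a password with echo disabled. The sketch below is an illustration only (PasswordPrompt is a hypothetical class, not Sqoop's implementation). System.console() returns null when no terminal is attached, so the caller must handle that case rather than fall back to reading the password from argv. How the password is then passed to mysqldump is not covered.

```java
import java.io.Console;
import java.util.Arrays;

/** Sketch: read a password without echoing it and without putting it on the command line. */
final class PasswordPrompt {
    private PasswordPrompt() {}

    /** Prompts with echo disabled; returns null when no console is attached. */
    static char[] readPassword(Console console, String prompt) {
        if (console == null) {
            return null;
        }
        return console.readPassword("%s", prompt);
    }

    public static void main(String[] args) {
        char[] password = readPassword(System.console(), "Enter password: ");
        if (password == null) {
            System.err.println("No console available; refusing to read the password from argv.");
            return;
        }
        try {
            // ... hand the password to the JDBC connection here ...
        } finally {
            Arrays.fill(password, '\0'); // clear the secret from memory when done
        }
    }
}
```

Returning a char[] rather than a String lets the caller overwrite the secret once it is no longer needed, which a String does not allow.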

Reason: Security improvement Author: Aaron Kimball Ref: UNKNOWN commit a67a0f77729fb9005b0c47872d6ba677f6434b41 Author: Aaron Kimball Date: Fri Mar 12 14:41:34 2010 -0800 MAPREDUCE-713. Sqoop has some superfluous imports Description: Some classes have vestigial imports that should be removed Reason: Code cleanup Author: Aaron Kimball Ref: UNKNOWN commit 0a4dab2eac0ba8b6da5190bc53a9ce8e4344a336 Author: Aaron Kimball Date: Fri Mar 12 14:41:01 2010 -0800 MAPREDUCE-685. Sqoop will fail with OutOfMemory on large tables using mysql Description: The default MySQL JDBC client behavior is to buffer the entire ResultSet in the client before allowing the user to use the ResultSet object. On large SELECTs, this can cause OutOfMemory exceptions, even when the client intends to close the ResultSet after reading only a few rows. The MySQL ConnManager should configure its connection to use row-at-a-time delivery of results to the client. Reason: bugfix / scalability improvement Author: Aaron Kimball Ref: UNKNOWN commit 499aa76b500136a0e8996898f468b088ca5d7ed3 Author: Aaron Kimball Date: Fri Mar 12 14:40:50 2010 -0800 MAPREDUCE-674. Sqoop should allow a "where" clause to avoid having to export entire tables Description: Sqoop currently only exports at the granularity of a table. This doesn't work well on systems with large tables, where the overhead of performing a full dump each time is significant. Allowing the user to specify a where clause is a relatively simple task which will give Sqoop a lot more flexibility. Reason: New feature Author: Kevin Weil Ref: UNKNOWN commit ed4ba254d7708f363f5f1b4708e9e35061ad936c Author: Aaron Kimball Date: Fri Mar 12 14:40:37 2010 -0800 MAPREDUCE-675. Sqoop should allow user-defined class and package names Description: Currently Sqoop generates a class for each table to be imported; the class names are equal to the table names and they are not part of any package.

This adds --class-name and --package-name parameters to Sqoop, allowing these aspects of code generation to be controlled.

Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit 16e0ca8119b99b244c9eeafd78bb9eb43e4ba639 Author: Aaron Kimball Date: Fri Mar 12 14:40:20 2010 -0800 MAPREDUCE-703. Sqoop requires dependency on hsqldb in ivy Description: Sqoop builds crash without explicit dependency on hsqldb. Reason: build system bugfix Author: Aaron Kimball Ref: UNKNOWN commit b8e54791e990328db983f070e9a04952301eda35 Author: Aaron Kimball Date: Fri Mar 12 14:40:04 2010 -0800 MAPREDUCE-692. Make Hudson run Sqoop unit tests Description: Running 'ant test-contrib' didn't test Sqoop because it wasn't explicitly listed in the build.xml file in src/contrib/ Reason: Test coverage Author: Aaron Kimball Ref: UNKNOWN commit 8a3b6472ae00542dadf7f7d60991ec0f21b38177 Author: Aaron Kimball Date: Fri Mar 12 14:39:40 2010 -0800 HADOOP-5968. Sqoop should only print a warning about mysql import speed once Description: After HADOOP-5844, Sqoop can use mysqldump as an alternative to JDBC for importing from MySQL. If you use the JDBC mechanism, it prints a warning if you could have enabled the mysqldump path instead. But the warning is printed multiple times (every time the LocalMySQLManager is instantiated), and also when the MySQL manager is used for informational queries (e.g., listing tables) rather than true imports.

It should only emit the warning once per session, and only then if it's actually doing an import.
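Once-per-session warning logic can be sketched with an AtomicBoolean. This is an illustration under assumptions: DirectModeHint and maybeWarn are hypothetical names, the warning text is a placeholder rather than Sqoop's message, and "session" is taken to mean one JVM. Callers doing informational work, such as listing tables, pass isImport = false and never trigger the warning.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Emits a performance hint at most once per JVM, and only for real imports. */
final class DirectModeHint {
    private static final AtomicBoolean WARNED = new AtomicBoolean(false);

    private DirectModeHint() {}

    /** Returns true only for the single call that actually printed the warning. */
    static boolean maybeWarn(boolean isImport) {
        if (!isImport) {
            return false; // informational queries (e.g. listing tables) stay quiet
        }
        // compareAndSet guarantees a single winner even if several managers race.
        if (WARNED.compareAndSet(false, true)) {
            System.err.println("WARNING: a faster mysqldump-based import path may be available.");
            return true;
        }
        return false;
    }
}
```

Because the flag is static, creating many manager instances no longer multiplies the warning, which addresses the first complaint in the description.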

Reason: User experience improvement Author: Aaron Kimball Ref: UNKNOWN commit 86211e3714dc5b1dbcb7a3c328336277f6657de7 Author: Aaron Kimball Date: Fri Mar 12 14:38:44 2010 -0800 HADOOP-5967. Sqoop should only use a single map task Description: The current DBInputFormat implementation uses SELECT ... LIMIT ... OFFSET statements to read from a database table. This actually results in several queries all accessing the same table at the same time. Most database implementations will actually use a full table scan for each such query, starting at row 1 and scanning down until the OFFSET is reached before emitting data to the client. The upshot of this is that we see O(n^2) performance in the size of the table when using a large number of mappers, when a single mapper would read through the table in O(n) time in the number of rows.
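The quadratic cost can be checked with a small model. Assuming, as the description states, that each LIMIT/OFFSET query scans from row 1 to the end of its window, mapper i (0-based) of k touches (i + 1) * n / k rows, so the total is n(k + 1)/2. The hypothetical helper below simply sums that series: one mapper over 1,000 rows touches 1,000 rows, ten mappers touch 5,500, and one mapper per row touches 500,500, roughly n^2/2.

```java
/** Back-of-the-envelope cost model for LIMIT/OFFSET splits (assumes every query scans from row 1). */
final class OffsetScanCost {
    private OffsetScanCost() {}

    /**
     * Total rows the database touches when n rows are read by k mappers,
     * mapper i (0-based) issuing LIMIT n/k OFFSET i*n/k. Assumes k divides n.
     */
    static long rowsScanned(long n, long k) {
        long splitSize = n / k;
        long total = 0;
        for (long i = 0; i < k; i++) {
            // Skip i * splitSize rows, then read splitSize rows.
            total += i * splitSize + splitSize;
        }
        return total; // equals n * (k + 1) / 2
    }
}
```

This is why the patch falls back to a single map task: with k = 1 the database touches each row once.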

This patch sets the number of map tasks to 1 in the MapReduce job sqoop launches.

Reason: Performance improvement Author: Aaron Kimball Ref: UNKNOWN commit 410db7130a8e85ceed46850f73e74f480d45994e Author: Aaron Kimball Date: Thu Jul 23 16:10:21 2009 -0700 HADOOP-5967: Sqoop should only use a single map task commit b8f5d1d3a30a7461936f3f92bd9f007ed2db43e8 Author: Aaron Kimball Date: Fri Mar 12 14:38:23 2010 -0800 HADOOP-5887. Sqoop should create tables in Hive metastore after importing to HDFS Description: Sqoop (HADOOP-5815) imports tables into HDFS; it is a straightforward enhancement to then generate a Hive DDL statement to recreate the table definition in the Hive metastore and move the imported table into the Hive warehouse directory from its upload target.

This feature enhancement makes this process automatic. An import is performed with sqoop in the usual way; providing the argument "--hive-import" will cause it to then issue a CREATE TABLE .. LOAD DATA INTO statement to a Hive shell. It generates a script file and then attempts to run "$HIVE_HOME/bin/hive" on it, or failing that, any "hive" on the $PATH; $HIVE_HOME can be overridden with --hive-home. As a result, no direct linking against Hive is necessary.
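The kind of script handed to bin/hive can be sketched as below. HiveScriptSketch is a hypothetical class, not Sqoop's generator. It assumes delimited text output and emits standard HiveQL: CREATE TABLE ... ROW FORMAT DELIMITED, then LOAD DATA INPATH ... INTO TABLE, which moves the imported files into the Hive warehouse directory. The field delimiter is written as a three-digit octal escape so that control characters survive in the script. Column order follows the map's iteration order, so callers should pass a LinkedHashMap.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical generator for the HiveQL script fed to bin/hive after an import. */
final class HiveScriptSketch {
    private HiveScriptSketch() {}

    static String script(String table, Map<String, String> columns, char fieldDelim, String hdfsDir) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> col : columns.entrySet()) {
            cols.add(col.getKey() + " " + col.getValue()); // e.g. "id INT"
        }
        // Octal escape, e.g. ',' (44) becomes \054.
        String delim = String.format("\\%03o", (int) fieldDelim);
        return "CREATE TABLE " + table + " (" + cols + ")\n"
            + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '" + delim + "'\n"
            + "STORED AS TEXTFILE;\n"
            + "LOAD DATA INPATH '" + hdfsDir + "' INTO TABLE " + table + ";\n";
    }
}
```

The resulting text would be written to a temporary file and passed to the hive launcher, which is why no direct linking against Hive is needed.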

The unit tests provided with this enhancement use a mock implementation of 'bin/hive' that compares the script it's fed with one from a directory full of "expected" scripts. The exact script file referenced is controlled via an environment variable. It doesn't actually load into a proper Hive metastore, but manual testing has shown that this process works in practice, so the mock implementation is a reasonable unit testing tool.

Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit 50993494fdc7b2284837562b500e2840106bb3bb Author: Aaron Kimball Date: Fri Mar 12 14:37:48 2010 -0800 CLOUDERA-BUILD. Address issue where docs were not properly copied through to release tarball Description: This was caused by some cleanup in build.xml early on in the CDH 0.20 branch Reason: bugfix Author: Todd Lipcon Ref: UNKNOWN commit 3ecb9c07279302d18f7367d49bcd98c4391cbb68 Author: Aaron Kimball Date: Fri Mar 12 14:37:27 2010 -0800 CLOUDERA-BUILD. Decrease build time by only rebuilding the native code for each platform Reason: build system improvement Author: Todd Lipcon Ref: UNKNOWN commit f0c6a810ba7237ec7cc570ecad8a8665768b3d06 Author: Aaron Kimball Date: Fri Mar 12 14:37:07 2010 -0800 CLOUDERA-BUILD. Run jdiff against vanilla Hadoop during Cloudera release build Author: Todd Lipcon Ref: UNKNOWN commit 9cf8f0cb6ed744439d8e90e3ba376edb5d9521f3 Author: Aaron Kimball Date: Fri Mar 12 14:36:22 2010 -0800 MAPREDUCE-415. JobControl Job does always has an unassigned name Description: When creating and adding org.apache.hadoop.mapred.jobcontrol.Job(s) they don't use the names specified in their respective JobConf files. Instead it's just hardcoded to "unassigned". Reason: bugfix Author: Xavier Stevens Ref: UNKNOWN commit 330f009bae260ac990426a988fc56913897a50ca Author: Aaron Kimball Date: Fri Mar 12 14:35:03 2010 -0800 HADOOP-5805. problem using top level s3 buckets as input/output directories Description: When I specify top level s3 buckets as input or output directories, I get the following exception.

hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output

java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

The workaround is to specify input/output buckets with sub-directories:

hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir s3n://infocloud-output/output-subdir
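One plausible shape of a fix is to treat an empty URI path as the bucket root instead of rejecting it as relative. The sketch below assumes that approach; BucketPathSketch is hypothetical and is not the NativeS3FileSystem change. java.net.URI reports an empty path for s3n://infocloud-output, which is consistent with the "Path must be absolute" error above.

```java
import java.net.URI;

/** Sketch: map a filesystem URI to an object-store key, treating a bare bucket as its root. */
final class BucketPathSketch {
    private BucketPathSketch() {}

    /** "s3n://bucket" has an empty path; normalize it to "/" instead of rejecting it. */
    static String absolutePath(URI uri) {
        String path = uri.getPath();
        return (path == null || path.isEmpty()) ? "/" : path;
    }

    /** Object keys drop the leading slash, so the bucket root maps to the empty key. */
    static String toKey(URI uri) {
        return absolutePath(uri).substring(1);
    }
}
```

Under this approach s3n://infocloud-output maps to the empty key (the bucket root), and s3n://infocloud-output/output-subdir maps to "output-subdir", so both forms of the command line would work.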

Reason: bugfix Author: Ian Nowland Ref: UNKNOWN commit 35fa82b5c743e34d62449e0f4abffd885e0dfe4c Author: Aaron Kimball Date: Fri Mar 12 14:34:42 2010 -0800 HADOOP-5656. Counter for S3N Read Bytes does not work Description: Counter for S3N Read Bytes does not work on trunk. On the 0.18 branch, neither the read nor the write byte counters work. Reason: Bugfix Author: Ian Nowland Ref: UNKNOWN commit a6670de0a1c4b03c293ae47d1595e8c33764aaa5 Author: Aaron Kimball Date: Fri Mar 12 14:33:43 2010 -0800 HADOOP-5613. change S3Exception to checked exception Description: Currently the S3 filesystems can throw unchecked exceptions (S3Exception) which are not declared in the interface of FileSystem. These aren't caught by the various callers and can cause unpredictable behavior. IOExceptions are caught by most users of FileSystem since it is declared in the interface and hence is handled better. S3Exception now extends IOException. Reason: Improved error-checking at compile time for user applications. Author: Andrew Hitchcock Ref: UNKNOWN commit 1f11b63a42ae441eb8d0693ed0e4e01aca553e42 Author: Aaron Kimball Date: Fri Mar 12 14:33:09 2010 -0800 HADOOP-5528. Binary partitioner Description: It would be useful to have a BinaryPartitioner that partitions BinaryComparable keys by hashing a configurable part of the bytes array corresponding to each key. Reason: New feature Author: Klaas Bosteels Ref: UNKNOWN commit 716d3598e5a4a18cdfcfcf0dc800e263ef7c7685 Author: Aaron Kimball Date: Fri Mar 12 14:32:47 2010 -0800 HADOOP-5240. 'ant javadoc' does not check whether outputs are up to date and always rebuilds Description: Running 'ant javadoc' twice in a row calls the javadoc program both times; it doesn't check to see whether this is redundant work. Reason: Build system improvement Author: Aaron Kimball Ref: UNKNOWN commit 2bb607d29d9080a7ca3bce72739ccef654d5392d Author: Aaron Kimball Date: Fri Mar 12 14:30:46 2010 -0800 HADOOP-5175.
Option to prohibit jars unpacking Description: The task tracker moves all unpacked jars into ${hadoop.tmp.dir}/mapred/local/taskTracker. When using a lot of external libraries via -libjars, this results in several thousand unpacked files. The amount of time needed to `du` these directories can increase to the point where tasks time out before starting. This patch provides an option to suppress jar unpacking. Reason: Scalability improvement Author: Todd Lipcon Ref: UNKNOWN commit 349281bfa0243f5adbbd459266f4a9ac7ac8c1cc Author: Aaron Kimball Date: Fri Mar 12 14:30:16 2010 -0800 CLOUDERA-BUILD. Fix scribe-log4j's ivy.xml to properly get log4j on the compile classpath Author: Todd Lipcon Reason: bugfix to build system Ref: UNKNOWN commit b07aec5129e618bfeda8ba753fb5138e612b1a8b Author: Aaron Kimball Date: Fri Mar 12 14:29:33 2010 -0800 HADOOP-4829. Allow FileSystem shutdown hook to be disabled Description: FileSystem sets a JVM shutdown hook so that it can clean up the FileSystem cache. This is great behavior when you are writing a client application, but when you're writing a server application, like the Collector or an HBase RegionServer, you need to control the shutdown of the application and HDFS much more closely. If you set your own shutdown hook, there's no guarantee that your hook will run before the HDFS one, preventing you from taking some shutdown actions. Reason: Integration improvement. Author: Todd Lipcon Ref: UNKNOWN commit 154c6a6474b02e68c3418fddf9a8ee5d476a8b7d Author: Aaron Kimball Date: Fri Mar 12 14:28:14 2010 -0800 HADOOP-3327. Shuffling fetchers waited too long between map output fetch re-tries Description: Improves handling of READ_TIMEOUT during map output copying. Author: Amareshwari Sriramadasu Reason: bugfix Ref: UNKNOWN commit 8a6293fc5c3733035dde8e4d3a68c414a1f800f8 Author: Devaraj Das Date: Thu Feb 5 05:35:09 2009 +0000 HADOOP-3327. Improves handling of READ_TIMEOUT during map output copying. Contributed by Amareshwari Sriramadasu. 
git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@741009 13f79535-47bb-0310-9956-ffa450edef68 commit 4ee0ecf4760d7adb3e1a81e018a3b5cd6d2e9775 Author: Aaron Kimball Date: Fri Mar 12 14:27:44 2010 -0800 MAPREDUCE-680. Reuse of Writable objects is improperly handled by MRUnit Description: As written, MRUnit's MockOutputCollector simply stores references to the objects passed in to its collect() method. Thus if the same Text (or other Writable) object is reused as an output containiner multiple times with different values, these separate values will not all be collected. MockOutputCollector needs to properly use io.serializations to deep copy the objects sent in. Reason: Bugfix; see description. Author: Aaron Kimball Ref: UNKNOWN commit 51bdfdcf947bc8447aa36d68ae802f154516b0b6 Author: Aaron Kimball Date: Wed Jul 15 10:40:47 2009 -0700 MAPREDUCE-680. Reuse of Writable objects is improperly handled by MRUnit. commit c2026460d4cf7049c67da65d3a2db2e9bcd9c848 Author: Aaron Kimball Date: Fri Mar 12 14:27:14 2010 -0800 HADOOP-5518. MRUnit unit test library Description: MRUnit is a tool to help authors of MapReduce programs write unit tests. Testing map() and reduce() methods requires some repeated work to mock the inputs and outputs of a Mapper or Reducer class, and ensure that the correct values are emitted to the OutputCollector based on inputs. Also, testing a mapper and reducer together requires running them with the sorted ordering guarantees made by the shuffle process. This library provides the above functionality to authors of maps and reduces; it allows you to test maps, reduces, and map-reduce pairs without needing to perform all the setup and teardown work associated with running a job. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit 6991a0eb635953bf3729bce330c426ed7d8b996a Author: Aaron Kimball Date: Fri Mar 12 14:26:29 2010 -0800 CLOUDERA-BUILD. 
Add sqoop wrapper to bin Description: Adds a '/usr/bin/sqoop' wrapper script for users Reason: User-experience improvement Author: Aaron Kimball Ref: UNKNOWN commit c365162d7db1ee70c8607ad84a11e4aa594224e7 Author: Aaron Kimball Date: Fri Mar 12 14:25:56 2010 -0800 HADOOP-5844. Use mysqldump when connecting to local mysql instance in Sqoop Description: Sqoop uses MapReduce + DBInputFormat to read the contents of a table into HDFS. On many databases, this implementation is O(N^2) in the number of rows. Also, the use of multiple mappers has low value in terms of throughput, because the database itself is inherently singlethreaded. While DBInputFormat/JDBC provides a useful fallback mechanism for importing from databases, db-specific dump utilities will nearly always provide faster throughput, and should be selected when available. This patch allows users to use mysqldump to read from local mysql instances instead of the MapReduce-based input. Reason: Performance improvement Author: Aaron Kimball Ref: UNKNOWN commit eddbfbca420bfb81a3a565e4324f6189bfd97e41 Author: Aaron Kimball Date: Fri Mar 12 14:24:58 2010 -0800 HADOOP-5815. Sqoop: A database import tool for Hadoop Description: Sqoop is a tool designed to help users import existing relational databases into their Hadoop clusters. Sqoop uses JDBC to connect to a database, examine the schema for tables, and auto-generate the necessary classes to import data into HDFS. It then instantiates a MapReduce job to read the table from the database via the DBInputFormat (JDBC-based InputFormat). The table is read into a set of files loaded into HDFS. Both SequenceFile and text-based targets are supported. Reason: New feature Author: Aaron Kimball Ref: UNKNOWN commit b33265ff77c71af61899a4b3add1e82cc195fdb7 Author: Aaron Kimball Date: Fri Mar 12 14:23:53 2010 -0800 MAPREDUCE-714. 
JobConf.findContainingJar unescapes unnecessarily on Linux Description: In JobConf.findContainingJar, the path name is decoded using URLDecoder.decode(...). This was done by Doug in r381794 (commit msg "Un-escape containing jar's path, which is URL-encoded. This fixes things primarily on Windows, where paths are likely to contain spaces.") Unfortunately, jar paths do not appear to be URL encoded on Linux. If you try to use "hadoop jar" on a jar with a "+" in it, this function decodes it to a space and then the job cannot be submitted. Reason: Cloudera-based packages include a '+' in the filename; Hadoop's URL escaper will not properly handle jar filenames with a '+' without this patch. Author: Todd Lipcon Ref: UNKNOWN commit d9767d2cefab288e581732f71779f3ce8e3267e4 Author: Todd Lipcon Date: Mon Jul 6 19:36:11 2009 -0700 MAPREDUCE-714: Fix JobConf.findContainingJars to work with jars with + in the name commit aaeb69f8dda72a2e7aecacd622e99c00bc961efa Author: Aaron Kimball Date: Fri Mar 12 14:23:23 2010 -0800 CLOUDERA-BUILD. Add dependency libraries for Scribe/log4j Author: Todd Lipcon commit cb7a3677942c1d2f9e0d2a75dbffa09fa6125e61 Author: Aaron Kimball Date: Fri Mar 12 14:22:41 2010 -0800 CLOUDERA-BUILD. Apply Scribe patches to Hadoop Description: scribe_hadoop_trunk.patch Also, add empty ivy infrastructure for scribe-log4j Author: Todd Lipcon commit d5ead434b221076fb830308d2d112d53aa6dc59f Author: Aaron Kimball Date: Fri Mar 12 14:22:26 2010 -0800 CLOUDERA-BUILD. Use cloudera's versioning info from cloudera.hash in saveVersion.sh Description: This should make the "hadoop version" output far more useful for determing exactly what code is running. The cloudera.hash property is set by cloudera/build.properties which is generated during the build process. commit bf10e46e425395145dcc4b85db66d45cbf9797b0 Author: Aaron Kimball Date: Fri Mar 12 14:21:45 2010 -0800 CLOUDERA-BUILD. 
Move saveVersion.sh in build.xml to ensure build Description: This error is due to ant 1.7.1 not compiling package-info.java if the timestamp of the output class directory is newer than the package-info file itself. Since other compiles were happening after package-info.java was generated, the build dir was newer and compilation was being skipped. Move cloudera hooks inside the package task of build.xml Fixes an issue where the fair scheduler jar was not built before the hooks were run, and therefore was not included in the target lib/ directory. Ref: CLOUDERA-436 commit 5359a3bbd2b09644825be99fdd354ff3276a5d59 Author: Aaron Kimball Date: Fri Mar 12 14:21:36 2010 -0800 CLOUDERA-BUILD. New versions of cloudera packaging scripts commit ee255f3909b9938b1023be6a2c59a8429227c766 Author: Aaron Kimball Date: Fri Mar 12 14:21:27 2010 -0800 CLOUDERA-BUILD. Change paths to point to hadoop-0.20 where necessary commit a2d051bcf456fde45c0a0c3aa512872ce6059a97 Author: Aaron Kimball Date: Fri Mar 12 14:21:08 2010 -0800 CLOUDERA-BUILD. Add Hadoop manpage to Hadoop 0.20 repository commit 9600765ec5d6c3cef9ab34ecb573cbb876acf7ee Author: Aaron Kimball Date: Fri Mar 12 14:21:01 2010 -0800 CLOUDERA-BUILD. Move install_hadoop.sh into hadoop repo commit 77ac6923ad6e63874a429e7dd13c4a084b6a9556 Author: Aaron Kimball Date: Fri Mar 12 14:20:52 2010 -0800 CLOUDERA-BUILD. Add example-confs directory for storing configuration of conf.pseudo commit 14256386d4cb155fea0f5745dd6c49fba74ff40f Author: Aaron Kimball Date: Fri Mar 12 14:20:43 2010 -0800 CLOUDERA-BUILD. Replace hadoop-config.sh with Cloudera version commit f7d0a20e0d74f1aac1fb96f3c08ce31e9b9ca5d9 Author: Aaron Kimball Date: Fri Mar 12 14:20:25 2010 -0800 CLOUDERA-BUILD. Remove redundant code in build.xml between package and bin-package commit 0fa65091ecd9dd150d6afb93845d3fb10d80e115 Author: Aaron Kimball Date: Fri Mar 12 14:16:59 2010 -0800 CLOUDERA-BUILD. Hook build.xml to enable contrib modules