CDH 5.13.0 Release Notes
This section lists all Apache Crunch JIRA issues included in CDH 5.13.0
that are not included in the Apache Crunch base version 0.11.0. The
crunch-0.11.0-cdh5.13.0.CHANGES.txt file lists all changes included in
CDH 5.13.0. The patch for each change can be found in the
cloudera/patches directory in the release tarball.
Changes Not In Apache Crunch 0.11.0
Crunch
Bug
- [CRUNCH-616] - Test file maugham.txt still copyrighted outside the US?
- [CRUNCH-592] - Job fails for null ByteBuffer value in Avro tables
- [CRUNCH-583] - Scrunch classloader failure in distcache
- [CRUNCH-577] - NumberFormatException when parsing dfs.block.size
- [CRUNCH-568] - Aggregators fail on SparkPipeline
- [CRUNCH-567] - close() triggers NPE if initialize() fails in 2 classes
- [CRUNCH-475] - Compilation problem caused by KeyValue -> Cell conversion
- [CRUNCH-524] - Tests leave hbase, test.h2.db files around; pipeline leaves temp crunch-xxx dirs around.
- [CRUNCH-571] - Scrunch functions fail serialization check in the REPL
- [CRUNCH-561] - Scrunch case classes fail in the REPL
- [CRUNCH-553] - From.formattedFile may cause records to be dropped.
- [CRUNCH-552] - Enable AvroParquet to work with Crunch-on-Spark
- [CRUNCH-551] - Make the use of Configuration objects inside of CrunchRecordReader/CrunchInputSplit consistent
- [CRUNCH-548] - getDetachedValue calls to AvroReflectDeepCopier throw InstantiationException on non-concrete types
- [CRUNCH-547] - AvroType operators can create nested unions
- [CRUNCH-540] - AvroReflectDeepCopier not serializable (but crunch is trying!)
- [CRUNCH-539] - Use of TupleWritable.setConf fails in mapper/reducer
- [CRUNCH-536] - crunch jobs fail to use hbase api of secured hbase
- [CRUNCH-535] - crunch-hbase doesn't work with secured hbase (kerberos)
- [CRUNCH-531] - Fix small bug in split graph writer
- [CRUNCH-528] - Pair: Integer overflow during comparison can cause inconsistent sort.
- [CRUNCH-527] - Improve distribution of keys when using default (hash-based) partitioning
- [CRUNCH-525] - The ExtractKeyFn has an incorrect scale factor
- [CRUNCH-530] - Fix object reuse bug in GenericRecordToTuple
- [CRUNCH-509] - Crunch with Spark doesn't name all outputs
- [CRUNCH-517] - Make FileSourceImpl implement ReadableSource
- [CRUNCH-518] - Can't build crunch-spark due to protobuf dependency
- [CRUNCH-513] - HFileSource not calculating size correctly for nested paths
- [CRUNCH-516] - Scrunch needs some additional null checks
- [CRUNCH-514] - AvroDerivedDeepCopier should initialize delegate MapFns
- [CRUNCH-511] - Scrunch product type support should use derived() instead of derivedImmutable()
- [CRUNCH-503] - Behavior of MAX_N Aggregator for duplicate values is counter-intuitive
- [CRUNCH-502] - OutputFormat has inconsistent context state in interface functions
- [CRUNCH-494] - Unable to union large number of PCollections
- [CRUNCH-501] - Object reuse issue in combineValues(Aggregator)
- [CRUNCH-481] - Support independent output committers for multiple outputs
- [CRUNCH-499] - DoFns.detach(...) does not propagate context to wrapped DoFn
- [CRUNCH-498] - Remove the deprecated RedundantThrows from checkstyle.xml
- [CRUNCH-495] - Fix case class/SpecificRecord interactions in Scrunch
- [CRUNCH-493] - Throw RuntimeException when reading MaterializableIterable after pipeline failure
- [CRUNCH-492] - Add create methods to Scrunch Pipeline APIs
- [CRUNCH-489] - Add methods to create PCollections from Java Iterable to Pipeline interface
- [CRUNCH-490] - Prefer mapreduce.framework.name for determining which version of MR is running
- [CRUNCH-486] - Join with custom Writable PType registered using Writables.registerComparable NPEs during shuffle
- [CRUNCH-485] - groupByKey on Spark incorrect if key is Avro record with defined sort order
- [CRUNCH-483] - Scrunch .map does not allow mapping to a PCollection[(A,B)]
- [CRUNCH-227] - Write to sequence file ignores destination path.
- [CRUNCH-480] - AvroParquetFileSource doesn't properly configure user-supplied read schema
- [CRUNCH-479] - Writing to target with WriteMode.APPEND merges values into PCollection
- [CRUNCH-477] - Fix HFileTargetIT failures on hadoop1 under Java 1.7/1.8
- [CRUNCH-473] - Use specific class type for case class serialization
- [CRUNCH-471] - Add synchronization checks to DoFnIterator
- [CRUNCH-469] - ClassCastException in crunch-spark when reading InputTables
Improvement
- [CRUNCH-515] - Decrease probability of collision on Crunch temp directories
- [CRUNCH-562] - Support one output file per key for Parquet
- [CRUNCH-558] - Add name to Spark Accumulators
- [CRUNCH-578] - Extend Scala Serialization to Support Mutable Variants of List and Set
- [CRUNCH-574] - Update commons-lang from both 2.4/2.5 to 2.6
- [CRUNCH-546] - Avoid CellUtil.cloneXXX in HFileUtils and HFileInputFormat
- [CRUNCH-544] - PTable.materializeToMap returns non-Serializable object
- [CRUNCH-543] - AvroPathPerKeyTarget copy nested subdirectories
- [CRUNCH-542] - Wider tolerance for flaky scrunch PCollectionTest
- [CRUNCH-507] - Potential NPE in SparkPipeline constructor and additional constructor
- [CRUNCH-508] - Improve performance of Scala Enumeration counters in Scrunch
- [CRUNCH-438] - Visualizations of some important internal/intermediate pipeline planning states
- [CRUNCH-484] - Add library features from spotify/crunch-lib into crunch-core
- [CRUNCH-472] - Add Scrunch serialization support for Java Enums
New Feature
- [CRUNCH-636] - Make replication factor for temporary files configurable
- [CRUNCH-497] - Add union methods to Scrunch's PipelineLike