CDH 5.3.3 Release Notes
The following list covers all Kite Software Development Kit Jira issues included in
CDH 5.3.3 that are not included in the Kite Software Development Kit base version
0.15.0. The kite-0.15.0-cdh5.3.3.CHANGES.txt file lists all changes included in
CDH 5.3.3. The patch for each change can be found in the cloudera/patches directory
in the release tarball.
Changes Not In Kite Software Development Kit 0.15.0
Parquet
Bug
- [PARQUET-145] - InternalParquetRecordReader.close() should not throw an exception if initialization has failed
Kite
Bug
- [CDK-874] - Kite CLI csv-import HDFS temp file path not multiuser safe
- [CDK-750] - Fix unit tests when using JDK8
- [CDK-676] - TestURIBuilder fails on java8
- [CDK-742] - kite-dataset csv-import|copy|transform don't work on CDH5.2
- [CDK-694] - Log4jAppender doesn't handle a null namespace correctly
- [CDK-693] - DatasetDescriptor incorrectly qualifies file:/ URIs used to open schema files
- [CDK-652] - DatasetKeyOutputFormat: getLocation() will be override unexpectedly
- [CDK-686] - Include fix for AVRO-1589
- [CDK-664] - View URI is invalid if constraint value contains reserved/encoded characters
- [CDK-684] - kite-dataset help doesn't print the actual command used
- [CDK-640] - Log deprecation warning if dataset name contains "."
- [CDK-633] - Allow external Hive URIs to use any dataset name or namespace
- [CDK-666] - Hive tests sometimes fail with ConnectionExceptions
- [CDK-600] - Sporadic DatasetNotFoundException
- [CDK-556] - Reader schemas derived from reflect types should allow nulls
- [CDK-667] - HDFS configuration is not initialized
- [CDK-657] - Schema URI call to makeQualified incorrectly copies authority
- [CDK-670] - Rename dataset CLI tool to "kite-dataset"
- [CDK-665] - Moving URIBuilder breaks downstream Flume
- [CDK-663] - Old serde Parquet tables are no longer recognized by Kite
- [CDK-655] - Support webhdfs dataset URIs
- [CDK-659] - Logic for allow local metastore is wrong
- [CDK-639] - Impala requires fully-qualified FS URIs for Avro tables
- [CDK-651] - Hive implementation should fail by default when it connects to a local metastore
- [CDK-518] - Consolidate and replace use of PropertyDescriptor
- [CDK-583] - Refining a view that includes an identity-based partition incorrectly filters all results
- [CDK-599] - Fix missing @since tags
- [CDK-591] - parquet.hive.serde.ParquetHiveSerDe is deprecated
- [CDK-579] - Update maven plugin to use Dataset URIs
- [CDK-577] - Optionally use Hive's native Parquet support
- [CDK-576] - Crunch HCatalog tests fail when run against Hive 0.13
- [CDK-572] - MapReduce should not write to a non-empty dataset or view
- [CDK-571] - Check URI#relativize() for empty URIs before constructing Path objects
- [CDK-560] - Default javaVersion should be 1.7
- [CDK-557] - Kite tools has the wrong variable names for some plugin versions
- [CDK-473] - csv-schema: Illegal character in header caught by Avro with unhelpful message before it reaches Kite
- [CDK-550] - Kite does not compile against Hive 0.13.0
Epic
- [CDK-435] - Add support for partitioning by sub-field
- [CDK-598] - Remove deprecated methods from FieldPartitioner
- [CDK-539] - Convert demo example to use views
- [CDK-549] - Create a Kite-based ETL tool
- [CDK-347] - Implement DatasetTarget#handleExisting and WriteMode support
- [CDK-515] - Expose Kite datasets as Spark RDDs
Improvement
- [CDK-770] - Improve resource cleanup for clients of SolrLocator
- [CDK-741] - Enable Morphline job drivers to send a commit to Solr on Job success
- [CDK-734] - Upgrade maxmind-db from version 0.3.3 to 1.0.0
- [CDK-644] - Move URIBuilder to the API
- [CDK-575] - Update examples to use CDH5
- [CDK-637] - Add a descriptor property for HBase replication scope
- [CDK-636] - Enable setting descriptor properties in CLI create command
- [CDK-140] - Add namespaces
- [CDK-526] - Include CLI version information in --help
- [CDK-566] - Add dataset info command
- [CDK-501] - View URIs
- [CDK-489] - Move DatasetRepository to SPI
New Feature
- [CDK-803] - Add ability to register custom extension functions with xquery and xslt morphline commands
- [CDK-733] - Add morphline support for deleting documents stored in Solr by unique id and by query
- [CDK-578] - Add a generic settings Map<String,Object> to the MorphlineContext
- [CDK-672] - Add morphline command that removes all record field values for which the field name and value matches a blacklist but not a whitelist
- [CDK-299] - Add ability to set compression codec
- [CDK-203] - Support sync in DatasetWriter
Task
- [CDK-713] - Don't use DurableParquetAppender by default
- [CDK-602] - Remove deprecated methods for 0.17.0
- [CDK-531] - Remove classes and methods deprecated in 0.15.0
- [CDK-452] - Rename kite-data-hcatalog to kite-data-hive
- [CDK-653] - Add DefaultConfiguration to avoid new Configuration calls
- [CDK-580] - Expose partition methods on CrunchDatasets
- [CDK-479] - Add a CDH 5 kite-app-parent POM
- [CDK-565] - Expose parquet writer options as properties
- [CDK-551] - Use Hive's public API for metastore operations
- [CDK-552] - Move the contributor guide to the wiki and expand it
- [CDK-519] - Switch Travis builds to Oracle JDK7