commit 63a854a670e7657b40da4b4ec9cef522cae23224
Author: Jenkins slave
Date:   Mon Sep 15 08:21:09 2014 -0700

    Preparing for CDH5.1.3 release

commit 42de2426e25899a198f9c3dd66ee148e5437944f
Author: Jenkins slave
Date:   Tue Aug 26 16:49:19 2014 -0700

    Preparing for CDH5.1.3 development

commit 59c80b392ac9c4e8dc32714698a8734fc59e4a57
Author: Jenkins slave
Date:   Mon Aug 25 10:05:26 2014 -0700

    Prepare for CDH5.1.2 release

commit fab00423c736bb84bf5c23d8e1fd39e8e2d8d2b8
Author: Sandy Ryza
Date:   Wed Jul 23 14:29:35 2014 +0100

    PARQUET-25. Pushdown predicates only work with hardcoded arguments.

    Pull request for Sandy Ryza's fix for PARQUET-25.

    Author: Sandy Ryza

    Closes #22 from tomwhite/PARQUET-25-unbound-record-filter-configurable and squashes the following commits:

    a9d3fdc [Sandy Ryza] PARQUET-25. Pushdown predicates only work with hardcoded arguments.

commit 7d0153d746472c8858505bd5f894254bc02ad255
Author: Ryan Blue
Date:   Fri Aug 1 20:56:20 2014 -0700

    PARQUET-62: Fix binary dictionary write bug.

    The binary dictionary writers keep track of written values in memory to deduplicate and write dictionary pages periodically. If the written values are changed by the caller, then this corrupts the dictionary without an error message. This adds a defensive copy to fix the problem.

commit 3955a6b2e7472a99a03d4727c09559fadce3eb84
Author: Ryan Blue
Date:   Fri Jul 18 16:19:25 2014 -0700

    PARQUET-18: Fix all-null value pages with dict encoding.

    TestDictionary#testZeroValues demonstrates the problem, where a page of all null values is decoded using the DictionaryValuesReader. Because there are no non-null values, the page values section is 0 bytes, but the DictionaryValuesReader assumes there is at least one encoded value and attempts to read a bit width. The test passes a byte array to initFromPage with the offset equal to the array's length. The fix is to detect that there are no input bytes to read.
    To avoid adding validity checks to the read path, this sets the internal decoder to one that will throw an exception if any reads are attempted.

    Author: Ryan Blue

    Closes #18 from rdblue/PARQUET-18-fix-nulls-with-dictionary and squashes the following commits:

    0711766 [Ryan Blue] PARQUET-18: Fix all-null value pages with dict encoding.

    Conflicts:
        parquet-column/src/main/java/parquet/column/values/dictionary/DictionaryValuesReader.java

commit 1262243faa77b57bb48d0c72c87a4c99814bfb3c
Author: Matthieu Martin
Date:   Fri Jul 18 16:02:09 2014 -0700

    PARQUET-4: Use LRU caching for footers in ParquetInputFormat.

    Reopening https://github.com/Parquet/parquet-mr/pull/403 against the new Apache repository.

    Author: Matthieu Martin

    Closes #2 from matt-martin/master and squashes the following commits:

    99bb5a3 [Matthieu Martin] Minor javadoc and whitespace changes. Also added the FileStatusWrapper class to ParquetInputFormat to make sure that the debugging log statements print out meaningful paths.
    250a398 [Matthieu Martin] Be less aggressive about checking whether the underlying file has been appended to/overwritten/deleted in order to minimize the number of namenode interactions.
    d946445 [Matthieu Martin] Add javadocs to parquet.hadoop.LruCache. Rename cache "entries" as cache "values" to avoid confusion with java.util.Map.Entry (which contains key value pairs whereas our old "entries" really only refer to the values).
    a363622 [Matthieu Martin] Use LRU caching for footers in ParquetInputFormat.
    Conflicts:
        parquet-hadoop/src/test/java/parquet/hadoop/TestInputFormat.java

commit 7119cb838e67ec49d37954c69b90a19fc0aa583c
Author: Jenkins slave
Date:   Wed Jul 16 14:30:15 2014 -0700

    Preparing for CDH5.1.1 development

commit de40122c1ef1dae1ef80b813b188843214da8244
Author: Jenkins slave
Date:   Fri Jul 11 10:51:27 2014 -0700

    Preparing for CDH5.1.0 release

commit 6a2fadb0fa44690ae67090e1d0691fe17deccb89
Author: Tom White
Date:   Thu Jun 19 13:02:42 2014 +0100

    Minimal fix

commit f4d94a6f00d8cf9a7fc2fdbde8f60f269f1c1cb8
Author: Steven Willis
Date:   Wed Jun 18 13:58:02 2014 -0400

    Test for filtering records across multiple blocks

commit a1cdcc943b7494f33959bc1c8ad1b6716f9723dd
Author: Ryan Blue
Date:   Mon Jun 9 15:59:55 2014 -0700

    Enable dictionary encoding for FIXED.

    This uses the existing dictionary support introduced for int96. Encoding and ParquetProperties have been updated to use the dictionary supporting classes, when requested for write or present during read. This also fixes a bug in the fixed dictionary values writer, where the length was hard-coded for int96, 12 bytes.

    Conflicts:
        parquet-column/src/main/java/parquet/column/ParquetProperties.java

    Changes applied to ColumnWriterImpl instead of ParquetProperties. The change to separate out ParquetProperties was not backported.

commit 6f403eff1b5a96df4d949b40c676f1026c57e9db
Author: Daniel Weeks
Date:   Tue May 20 17:18:29 2014 -0700

    Updated test and remove shortcut return statement in loader

commit a74a6be52ef4d3b85dcd8a488250697b8734dbea
Author: Daniel Weeks
Date:   Tue May 20 16:15:27 2014 -0700

    Fixed issue with column pruning when using requested schema

commit 85e70844eae01cd3a382661715cfe5e311b20024
Author: Daniel Weeks
Date:   Mon May 19 09:53:02 2014 -0700

    Added test for null padding

    Conflicts:
        parquet-pig/src/test/java/parquet/pig/TestParquetLoader.java

commit c4ac10be3b2ef6ae4ee433e0973a83da411ef6e9
Author: Daniel Weeks
Date:   Tue May 13 11:57:34 2014 -0700

    Added padding for columns not found in file schema

commit 164d00eb07d7152a5e2597e13016f8ed752dd42c
Author: Ryan Blue
Date:   Wed Apr 30 15:36:23 2014 -0700

    Use parameterized to test with and without dictionary.

    Conflicts:
        parquet-column/src/test/java/parquet/io/TestColumnIO.java

commit 3f4a4d33aa1e72135a7511ef21d40e06aa4e1831
Author: Ryan Blue
Date:   Mon Apr 7 12:29:25 2014 -0700

    Fix bug #350, fixed length argument out of order.

    ParquetProperties not backported, so the fix should be in ColumnWriterImpl where PlainFixedLenArrayDictionaryValuesWriter is instantiated.

    Conflicts:
        parquet-column/src/main/java/parquet/column/ParquetProperties.java

commit 9ec0d8601b74d263a457c9a787cd9bc099019407
Author: Ryan Blue
Date:   Fri Mar 21 18:34:59 2014 -0700

    CLOUDERA-BUILD. Add source jar to normal build.

    The public project's pom only builds source jars when deploying to sonatype-oss. This adds the maven-source-plugin configuration from that profile to the regular build.

commit f2bf2c94e3f9c817530bd8490a0b85ea3052aab8
Author: julien
Date:   Tue Apr 29 15:52:45 2014 -0700

    fix metadata concurrency problem

commit 362c1171df01dc0eb9dcc52848c92321a1613857
Author: Ryan Blue
Date:   Fri Apr 18 15:44:46 2014 -0700

    Fix more code review finds.

commit 8cf686bcf057eb8203b7f6e46c54bc82e07d15bb
Author: Ryan Blue
Date:   Fri Apr 18 09:22:58 2014 -0700

    Remove unchecked casts from Types.Builder.

    This simplifies the logic so that either a return object is supplied when a Builder is constructed, or the expected type is supplied so that the code can check the return type is valid.

commit 573c3f6ff99d8158d10a8a9afcabed61c8927079
Author: Ryan Blue
Date:   Tue Apr 15 14:21:55 2014 -0700

    Implement code review changes.

commit 597be6300c889010f4c3f75e512e818ece026917
Author: Ryan Blue
Date:   Tue Apr 15 09:45:09 2014 -0700

    Add INT32 and INT64 as supported types for DECIMAL.

commit 4547ec443abffe62226cb0d15bbfc882b22cc713
Author: Ryan Blue
Date:   Mon Apr 14 10:57:48 2014 -0700

    Fix maximum precision calculation, account for sign bit.

commit 690d98f8bf04b36de76c22cf573340ce30600deb
Author: Ryan Blue
Date:   Sat Apr 12 12:39:04 2014 -0700

    Update documentation and formatting.

commit bc3dada3233f249418218a018554f9c929814685
Author: Ryan Blue
Date:   Sat Apr 12 12:35:17 2014 -0700

    Simplify Types API by moving repetition.

commit 716faa046205ec83c048ae7c7193f1427897a671
Author: Ryan Blue
Date:   Sat Apr 12 12:05:20 2014 -0700

    Add Types builder API documentation.

    Also add check that scale <= precision and test.

commit aaee4ee1d2ac0b9ea5b9f64bf8395726f61f5701
Author: Ryan Blue
Date:   Fri Apr 11 15:17:20 2014 -0700

    Add test for decimal with unsupported primitive types.

commit 433a20f11b44d40e4967c204e9363b646e4e7671
Author: Ryan Blue
Date:   Fri Apr 11 15:04:48 2014 -0700

    Add more tests for type builders.

commit 61b906af88da23ee1b9d2923bf536d123d0f0978
Author: Ryan Blue
Date:   Fri Apr 11 14:32:34 2014 -0700

    Fix primitive type equality for fixed with different lengths.

commit 965c65f7ed2aac347243dfe4ba626c58a106f12e
Author: Ryan Blue
Date:   Fri Apr 11 14:08:46 2014 -0700

    Add support for DECIMAL type annotation.
    Changes:
    * Add Types builder API to consolidate type building, consistency checks
    * Update schema parser to support precision and scale on DECIMAL:
      required binary aDecimal (DECIMAL(9,2));
    * Update writeToStringBuilder methods to add precision and scale
    * Add DECIMAL conversion in ParquetMetadataConverter
    * Add precision, scale conversion in ParquetMetadataConverter
    * Add OriginalTypeMeta to hold type annotation metadata (e.g., scale)
    * Add more testing to ensure compatibility

commit 21368938fca5f66d821c4587d2c039fafe671c73
Author: Tom White
Date:   Tue May 6 12:06:49 2014 +0100

    CLOUDERA-BUILD. Don't remove shading in parquet-hadoop (mistakenly removed in 7b5a93cb).

commit 07be81489008825450f0af85b99e840852c3669f
Author: julien
Date:   Tue Apr 8 14:31:06 2014 -0700

    adding comments

commit 8ddefccb89ecbe4909537155862bc33c9ccb8190
Author: julien
Date:   Fri Apr 4 16:26:37 2014 -0700

    fix header bug

commit 0a7e6862a5aeac50ac8f178bc0866d8012b4df12
Author: julien
Date:   Fri Dec 20 16:05:34 2013 -0800

    refactor dictionary page handling

commit a163a23592102dfd3cc675ab41264d2a856ad605
Author: julien
Date:   Fri Dec 20 11:04:43 2013 -0800

    optimize consecutive row groups scans

commit 1a654d46f6f65954e8b309849fceb7f42e96f43d
Author: Tom White
Date:   Thu May 1 11:37:21 2014 +0100

    CLOUDERA-BUILD. Revert change to commons-codec version change from 'compress schemas in input splits' commit.

commit 7b5a93cb0a1fadcfb6bdd61aeab331e2970cc48a
Author: Dmitriy Ryaboy
Date:   Wed Apr 2 22:10:46 2014 -0700

    stop using strings and b64 for compressed input splits

commit 67c1db16eb64c333133ea3d36e8f8397e5e9f075
Author: Dmitriy Ryaboy
Date:   Tue Apr 1 18:51:00 2014 -0700

    compress kv pairs in ParquetInputSplits

commit ad9912f2c190c019e7267e6ccd52f2f010bcf0ab
Author: Dmitriy Ryaboy
Date:   Sun Mar 23 22:39:11 2014 -0700

    close gzip stream in finally

commit f60396e90fa721756c1e5499b72c3235da0d3ae8
Author: Dmitriy Ryaboy
Date:   Sun Mar 23 20:08:11 2014 -0700

    a bit of jar size optimization

commit 0f38ea60fd47a31974e16c5d4026bb2e35bce443
Author: Dmitriy Ryaboy
Date:   Sun Mar 23 19:43:48 2014 -0700

    compress schemas in input splits

commit baacf52767d8da4e6632c9b261f1ad00d5d75ed5
Author: ledbit
Date:   Wed Jan 8 12:36:31 2014 -0800

    prettify a few lines

commit beeaa93e42f4e0c5fd1ab6ff30b7bbdc33c503fa
Author: ledbit
Date:   Mon Jan 6 15:42:58 2014 -0800

    Make ParquetInputSplit extend FileSplit

commit b5399fa9ec86ccce66b8dbc3ba3f83b1a15a9b75
Author: julien
Date:   Thu Mar 20 10:41:40 2014 -0700

    fix filesystem resolution

commit d823d2a935d1ae6e703a0a367477574efdf331a9
Author: Ryan Blue
Date:   Tue Feb 25 11:21:41 2014 -0800

    Fix avro schema conv for arrays of optional type for #312.

commit aebf25af702ce8f5c009171ff7fb7e2671359aa2
Author: Jenkins slave
Date:   Mon Mar 3 11:02:38 2014 -0800

    Preparing for CDH5.1.0 development

commit 820436b3cf71b525c51c230c681a55928a26dce4
Author: Ryan Blue
Date:   Fri Feb 14 13:16:24 2014 -0800

    Add NanoTime to example.

    This adds NanoTime to the example objects, stored as an int96, for testing.

commit c82cc8997b7e0d369ecb4e877432254e448f0732
Author: Ryan Blue
Date:   Fri Feb 14 09:51:05 2014 -0800

    Use toStringUsingUTF8 to fix tests.

    Binary values will not necessarily decode with UTF8, but the ExpectationValidatingRecordConsumer can decode because its inputs are controlled for testing.

commit 0e68450feca37597f5d24d27f581536828fa74ca
Author: Ryan Blue
Date:   Fri Feb 14 09:10:13 2014 -0800

    Factoring out common Binary impl in dictionary writer.

commit c96c916010da05da26e55d940dcacf845734bca7
Author: Ryan Blue
Date:   Fri Feb 14 09:08:45 2014 -0800

    Merge Fixed dictionary with Binary dictionary.

commit 0bb917880ec7cff11f96b6c79117fa83aa2ea7d3
Author: Ryan Blue
Date:   Fri Feb 14 09:07:03 2014 -0800

    Delegate fixed and int96 types to convertBINARY.

commit ec331cdf17c945c40c736b21129a55c2aece2005
Author: Ryan Blue
Date:   Tue Feb 11 14:03:43 2014 -0800

    Remove int96 references from RecordConsumer and Converters.

    This commit removes int96-specific code from the RecordConsumer and the Converters. Implementations are responsible for checking the Type of columns. Because Binary is used for int96 values, it is no longer assumed that a Binary is printable as a UTF8 string in methods like Binary#toString.

commit f81264cc200fd6559e820a4d2c3ea50e442f34ca
Author: Ryan Blue
Date:   Mon Feb 10 16:41:43 2014 -0800

    Removing Int96 class, using Binary instead.

    This removes all references to the Int96 class and uses Binary instead. Int96 calls are still used at the RecordConsumer and Converter level, specifically used by PrimitiveType.PrimitiveTypeName.INT96.

    Conflicts:
        parquet-column/src/main/java/parquet/column/ParquetProperties.java

commit 0c68846d6700042475e5a7568490e54959e65b0f
Author: Ryan Blue
Date:   Mon Feb 3 18:17:27 2014 -0800

    Extending example and group classes for int96.

    This commit gets TestColumnIO#testOneOfEach passing with an int96 column.

commit 0fbbcf3beaa6dd2358a49fbf56d869a49989eb8c
Author: Ryan Blue
Date:   Mon Feb 3 17:40:58 2014 -0800

    Initial int96 implementation.

    This primarily adds int96 calls throughout the read and write paths. Int96 is mostly a place-holder class that wraps a ByteBuffer. This adds int96 support to the PLAIN and PLAIN_DICTIONARY encodings. Existing tests are passing.
    Conflicts:
        parquet-column/src/main/java/parquet/column/ParquetProperties.java

commit b6db1fffc1dd0ec60a5a2b19fcdd391f2c922f6b
Author: Ryan Blue
Date:   Mon Feb 24 11:50:31 2014 -0800

    Add Configuration constructor in thrift writer for #295.

    Conflicts:
        parquet-thrift/src/main/java/parquet/thrift/ThriftParquetWriter.java

commit 9391731dc8a1c60891b34d761d7fc189a94a0302
Author: Ryan Blue
Date:   Mon Feb 24 11:37:38 2014 -0800

    Add avro constructors with Configuration for #295.

    To avoid doubling the number of constructors in ParquetWriter, this creates more defaults that subclasses can use. The new AvroParquetWriter constructors call the most specific constructor directly and use the default constants from ParquetWriter to match the default behavior of its constructors.

    Also fixed a few doc mistakes.

    Conflicts:
        parquet-hadoop/src/main/java/parquet/hadoop/ParquetWriter.java

commit 0ad7315cc68cdbd030a371d8fb683112a9c3a231
Author: Tom White
Date:   Wed Feb 19 10:40:23 2014 -0800

    Don't fail if no default value specified for a new value in the read schema.

commit 82736f9ded65787863b9d89aa58b9ab2c580188e
Author: Tom White
Date:   Wed Feb 12 12:07:04 2014 +0000

    Don't deep copy immutable primitive types.

commit bcff1a25c6819e72077a8f09a874a2cbb62bdf7a
Author: Tom White
Date:   Wed Jan 15 13:58:51 2014 +0000

    Fill in default values for new fields in the read schema that were not in the write schema.

    Some of the implementation was inspired by https://issues.apache.org/jira/browse/AVRO-1228.

commit 819907160078c248b7dbf05ca00fb2443b5f8395
Author: allanyan
Date:   Thu Jan 30 18:46:04 2014 -0800

    use utility method from Configuration class to load class to avoid ClassNotFoundException

commit 16e19ad0be6e018506c4a5cd2baa9ef881cdd0e0
Author: allanyan
Date:   Thu Jan 30 09:49:17 2014 -0800

    first use current thread's classloader to load a class, if current thread does not have a classloader, use the class's current classloader to load a class.
    This makes sure that a class not packaged in Parquet but present on the classpath is loaded properly. Otherwise, for example, if you set your own ReadSupport class on the Configuration object and expect it to be loaded by ParquetInputFormat, loading fails with a ClassNotFoundException.

commit 9a461d7f240e2892f92b998d050ac1f371d85f68
Author: E. Sammer
Date:   Sun Feb 2 22:08:09 2014 -0800

    Added ParquetWriter() that takes an instance of Hadoop's Configuration.

    Conflicts:
        parquet-hadoop/src/main/java/parquet/hadoop/ParquetWriter.java

commit b61729352de93bc0a2fa6186a645e77a5703e5e5
Author: Tom White
Date:   Thu Feb 20 15:28:19 2014 -0800

    Don't shade Jackson since Avro exposes Jackson classes in its public API for representing default values for fields.

commit f26f829b29beb98ff94e65a56d0f826b4acd612b
Author: Tom White
Date:   Wed Feb 12 12:47:31 2014 +0000

    Add explicit blank namespaces to account for change in AVRO-1295 in Avro 1.7.5.

commit 398cc5d806ce65cd0db25e0c3ee08300a5d0383c
Author: Tom White
Date:   Wed Jan 22 11:36:14 2014 +0000

    Support field renaming for Avro read schemas, by means of field aliases.

    Avro 1.7.6 is required since it fixes https://issues.apache.org/jira/browse/AVRO-1433
    But note that this is only to allow the test to run correctly.

    Conflicts:
        parquet-avro/pom.xml

commit 97131656221fcac3af880eabde2bf74e0925d4ef
Author: Tom White
Date:   Wed Feb 26 15:04:59 2014 +0000

    CLOUDERA-BUILD. Remove references to Parquet ENUM which is not supported until Parquet 2

commit 5b916a6266d10dbcbd4d330adae5d695566b727d
Author: David Z. Chen
Date:   Thu Oct 3 11:57:59 2013 -0700

    Plumb OriginalType through to ConvertedType in file in ParquetMetadataConverter.

    Conflicts:
        parquet-hadoop/src/main/java/parquet/format/converter/ParquetMetadataConverter.java

    Removed case for ENUM since it is not a ConvertedType in Parquet 1.0 format.

commit 8535c797eab871c3bdb43fb3568b6106d6751ec2
Author: Tom White
Date:   Tue Feb 4 16:28:18 2014 +0000

    Revert change making field final that failed compatibility test.

commit b7a6e3f540da50e08a5f00c7e59f31c323570699
Author: Tom White
Date:   Tue Feb 4 16:19:01 2014 +0000

    Minor changes following Julien's review

commit 62484cfb0d3e2d9e401e4c9fe679dbc20adb4ffa
Author: Tom White
Date:   Tue Jan 21 10:24:32 2014 +0000

    Add tests for reading Parquet files using the default Avro schema.

commit fa2173759d22ef83b598dfc17d1e6c8626be97fe
Author: Tom White
Date:   Thu Jan 16 15:14:27 2014 +0000

    Use a default Avro read schema when none specified in Parquet-Avro.

commit 56fe95ead6f8746bc679887a45084a25330efff9
Author: julien
Date:   Fri Dec 20 15:37:55 2013 -0800

    add unit test

commit 6d31efaf5cb887bf72d9c0274ef5559ee565780a
Author: julien
Date:   Wed Dec 18 15:38:55 2013 -0800

    address comments

commit 8cbc0211812807ea8c01f86681dd47b9ca0e7d2d
Author: julien
Date:   Tue Dec 17 13:23:10 2013 -0800

    make summary files read in parallel; improve memory footprint of metadata

commit b13eadd70797121973e7eb1212f63f0d3ff77764
Author: Tom White
Date:   Tue Dec 17 10:38:12 2013 +0000

    Use ContextUtil in tests to avoid dependency on parts of new MR API that are incompatible between MR1 and MR2.
    Conflicts:
        parquet-cascading/src/test/java/parquet/cascading/TestParquetTupleScheme.java

commit 0216f17ca87d478f96b80d72ec1f90423ae82553
Author: Tianshuo Deng
Date:   Thu Dec 5 11:51:16 2013 -0800

    restore getCompression methods in ParquetOutputFormat for compatibility

commit 9e33f2b517a84fd20e5b5f59eb9e04baab3011b6
Author: Tianshuo Deng
Date:   Thu Dec 5 11:36:53 2013 -0800

    make CodecConfig a factory

commit 0c2e6bb30ce0792a9581bc3a76b4c761bfda1032
Author: Tianshuo Deng
Date:   Wed Dec 4 22:11:03 2013 -0800

    license header

commit 87407b1b172afdd50806c34fdcfd2f3af1e6e70e
Author: Tianshuo Deng
Date:   Wed Dec 4 16:42:54 2013 -0800

    remove lzo test and lzo dependency

commit 834ec11a530b6720fafd86991f505f76b1d9aeca
Author: Tianshuo Deng
Date:   Wed Dec 4 15:00:34 2013 -0800

    fix missing codec

commit 3a6ac5a9c4fbe3f6cd37388148a59dec45911a44
Author: Tianshuo Deng
Date:   Tue Dec 3 22:50:46 2013 -0800

    refactor get codec logic to remove duplication in DeprecatedParquetOutputFormat

commit 665eb681088abc5df2dd54d29e37897e57b8b965
Author: julien
Date:   Fri Dec 6 12:45:55 2013 -0800

    make the cache use a SoftReference

commit 064ed4395a80a48bc84cd6112d839b35ec590e9e
Author: julien
Date:   Fri Dec 6 11:38:07 2013 -0800

    fix loader cache

commit 2e16ef5a3248f9fb291d636e077fccdc10b3799a
Author: julien
Date:   Tue Dec 3 11:54:55 2013 -0800

    optimize chunk scan; fix compressed size

commit 3fb970604754a38d1578dbfd5b218cf1f5d3ed5a
Author: Tianshuo Deng
Date:   Tue Dec 3 16:28:56 2013 -0800

    format

commit 88d73f80edba1cb9830707af6826bdd1f6271021
Author: Tianshuo Deng
Date:   Tue Dec 3 14:35:54 2013 -0800

    check if pig is loaded when writing pig metadata

commit d6f816a971969031c3a4e2da2e75b3ab80d57d78
Author: dave2718
Date:   Fri Nov 22 02:31:12 2013 -0800

    Changing read and write methods in ParquetInputSplit so that they can deal with large schemas (avoiding use of writeUTF and readUTF which are limited to 65536 characters).

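Note on commit d6f816a: the limit it mentions comes from java.io.DataOutput.writeUTF, which stores the encoded length in two bytes and throws UTFDataFormatException once the modified UTF-8 form exceeds 65535 bytes. The sketch below shows one common workaround, a 4-byte length prefix followed by raw UTF-8 bytes. It is an illustration only, not the code from that commit; the class LargeStringIo and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Parquet.
public class LargeStringIo {

  // Writes a 4-byte length followed by the raw UTF-8 bytes of the string.
  public static void writeLargeString(DataOutputStream out, String s) throws IOException {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Reads a string written by writeLargeString.
  public static String readLargeString(DataInputStream in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Serializes and deserializes a string in memory.
  public static String roundTrip(String s) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      writeLargeString(new DataOutputStream(buffer), s);
      return readLargeString(
          new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns true if DataOutputStream.writeUTF refuses the string as too long.
  public static boolean writeUtfRejects(String s) {
    try {
      new DataOutputStream(new ByteArrayOutputStream()).writeUTF(s);
      return false;
    } catch (UTFDataFormatException e) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // A 70000-character ASCII string stands in for a large schema.
    String schema = new String(new char[70000]).replace('\0', 'x');
    System.out.println("writeUTF rejects it: " + writeUtfRejects(schema));
    System.out.println("round trip intact: " + roundTrip(schema).equals(schema));
  }
}
```

The int prefix raises the ceiling to the maximum Java array size, at a cost of two extra bytes per string compared with writeUTF.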
commit 2e6f225c91e61c22ef7d8ca9a3a0772f25266fe1
Author: Aniket Mokashi
Date:   Wed Nov 20 14:50:10 2013 -0800

    refactor encoded values changes and test that resetDictionary works

commit 1a34af90a4d29d796bbc2f1702cbc80bbdc1294c
Author: Tianshuo Deng
Date:   Wed Nov 20 13:30:36 2013 -0800

    fix bug: set raw data size to 0 after reset

commit f2b478306043296ee114a6864531883496661bc4
Author: Aniket Mokashi
Date:   Mon Nov 11 17:51:01 2013 -0800

    group parquet-format version in one property

    Conflicts:
        parquet-avro/pom.xml
        parquet-hadoop/pom.xml
        parquet-pig/pom.xml
        pom.xml

commit 50a17001396abe5b599e120d88058ff75c4a1901
Author: Wesley Peck
Date:   Wed Nov 6 14:19:34 2013 -0500

    One of the constructors in ParquetWriter ignores the enable dictionary and validating flags.

commit dd2c66f22f051f4173b7d56169489454366ea5b2
Author: Tianshuo Deng
Date:   Fri Nov 1 15:32:44 2013 -0700

    revert revert.. use rawDataByteSize as buffered size in DictionaryValuesWriter

commit 51a88fe77399cbe5fd48078d8e7cc87cff5661dc
Author: Tianshuo Deng
Date:   Fri Nov 1 15:10:17 2013 -0700

    revert fixing page cutting, fix bug, raw data size should be long

commit ce8a97d88f194b6fded092197f6dce9965d12d5e
Author: Tianshuo Deng
Date:   Fri Nov 1 14:07:27 2013 -0700

    more comment

commit 582cbeca2f07df7eda42909cdfa377f730f2a577
Author: Tianshuo Deng
Date:   Fri Nov 1 14:04:57 2013 -0700

    return raw data size as bufferSize in dictionaryValuesWriter

commit f8225530f31deb2006c59c3198e9b1f6f31814ab
Author: Tianshuo Deng
Date:   Fri Nov 1 13:49:09 2013 -0700

    remove hash lookup and unused comments

commit 0db80c22bb1eb4315ff029de3990e1576e6dea93
Author: Tianshuo Deng
Date:   Thu Oct 31 14:16:09 2013 -0700

    remove unused import

commit 8f082cde7ce7cb2326c26a9cfed75ef79de3f800
Author: Tianshuo Deng
Date:   Thu Oct 31 14:13:23 2013 -0700

    bug fix: separate fallBackDictionaryEncodedData to a method, will always be called when falling back to plainEncoding

commit f30ec3c05752e473a8dcac4314cf951f06067707
Author: Tianshuo Deng
Date:   Thu Oct 31 13:49:39 2013 -0700

    improve binary fallback

commit f753bb41a1ef8c4e711e622691b07884ef0d8b38
Author: Tianshuo Deng
Date:   Thu Oct 31 12:05:22 2013 -0700

    improve long fallback

commit 41923b3890ccd6fa0facdcd754bbb4383435248a
Author: Tianshuo Deng
Date:   Thu Oct 31 11:43:09 2013 -0700

    use primitive array for int, float, double, get rid of auto boxing, unboxing

commit 6797f10f0e0b6aaefd468ce7fc3f2fc9b6efbcce
Author: Tianshuo Deng
Date:   Thu Oct 31 09:50:38 2013 -0700

    improve fallback for double

commit e46a5279d4737656b80fa5830033c8e2ee2f4174
Author: Tianshuo Deng
Date:   Tue Oct 29 15:11:05 2013 -0700

    improve fallback for float

commit cf3d30b560e1a5ad9243debb5d24617619d656d5
Author: Tianshuo Deng
Date:   Tue Oct 29 14:56:51 2013 -0700

    fix bug: reverse dictionary lookup for falling back to plain encoding

commit 66fa979b9fc8e7ab917dc51ca7fcadfad993b9fc
Author: Tianshuo Deng
Date:   Tue Oct 29 13:58:52 2013 -0700

    fix bug, add rawDataByteSize for dictionaryValuesWriter to decide if fall back to Plain encoding or not

commit afe251efc1a1b4d951284e3fbfcb0d0b9ed03c01
Author: Tianshuo Deng
Date:   Tue Oct 29 13:18:36 2013 -0700

    improve fallback for IntDictionaryWriter

commit 3d7a1241e8f492ec7d6b0a33a485b9284c608b94
Author: Tianshuo Deng
Date:   Wed Oct 30 16:07:36 2013 -0700

    format

commit ec52e33d7049c5dc76293d025583d0ccb840fb3b
Author: Tianshuo Deng
Date:   Wed Oct 30 16:06:25 2013 -0700

    minor fix, the length used in RLEValuesReader

commit 5affe0b0fa38ae23be857259512044b2a310aebb
Author: Tom White
Date:   Wed Jan 15 14:10:51 2014 +0000

    Support promotion of int, long and float to wider types.

    This is specified in http://avro.apache.org/docs/current/spec.html#Schema+Resolution

commit 26e3bce3a6fd52fc86d77fdaa1e028d6ef2a7607
Author: Tom White
Date:   Wed Dec 18 15:38:40 2013 +0000

    Make setting requested projection and avro schema more independent, so that you only need to set the Avro schema if it is different to the writer's schema.

commit 76f58a12349a2eea59dd8da975ddc6d1f33abc2f
Author: Alex Kozlov
Date:   Mon Dec 16 10:04:41 2013 -0800

    Fix to read a new avro schema...

commit cd1c3eb5016d2e081a532274c4f12f0d02486fe1
Author: Tom White
Date:   Thu Jan 16 11:30:52 2014 +0000

    Fix dependencies for mr1 profile.

commit 109b03a1dc83b5588e6ff3da0cb66081007384d3
Author: Ryan Blue
Date:   Tue Dec 17 16:13:10 2013 -0800

    CLOUDERA-BUILD. Add mr1 profile for tests.

    This also fixes parquet-pig when using the mr1 API. Tests fail with CNF for LocalClientProtocolProvider in hadoop-mapreduce-client-common, the version of which needs to be the non-MR1 hadoop version.

commit 01cc5b3b01e6e2b8a4c630b66d461796c009c479
Author: Brock Noland
Date:   Fri Dec 20 12:55:43 2013 -0600

    CLOUDERA-BUILD: CDH-16396 - Comment out parquet-hive* from parquet pom

commit a98a777e9cc2e8e0a8299cd54472d22618f3cdf1
Author: Brock Noland
Date:   Thu Dec 19 14:37:22 2013 -0600

    In HIVE-5783 we will need a bundle jar to depend on that does not include the Hive Serde since Hive trunk will contain the Hive Serde. This change introduces such a bundle which would be generally useful for anyone writing Parquet within Hadoop.

commit b786ab25921b5aa1ab66226bb57b81cb4c2c402d
Author: Ryan Blue
Date:   Tue Dec 17 10:27:55 2013 -0800

    CLOUDERA-BUILD. Use commons-lang 2.5 to match CDH.

commit 8cb9786bda14203358f0c8a037a4659c0f9248a1
Author: Tom White
Date:   Tue Dec 17 10:38:12 2013 +0000

    CLOUDERA-BUILD. Use ContextUtil in tests to avoid dependency on incompatible Hadoop 1 MR API.
    Conflicts:
        parquet-cascading/src/test/java/parquet/cascading/TestParquetTBaseScheme.java
        parquet-cascading/src/test/java/parquet/cascading/TestParquetTupleScheme.java
        parquet-hadoop/src/test/java/parquet/hadoop/codec/CodecConfigTest.java
        parquet-hive/parquet-hive-storage-handler/src/main/java/parquet/hive/MapredParquetOutputFormat.java
        parquet-pig/src/test/java/parquet/pig/PerfTest2.java
        parquet-scrooge/src/test/java/parquet/scrooge/ParquetScroogeSchemeTest.java
        parquet-thrift/src/test/java/parquet/hadoop/thrift/TestParquetToThriftReadProjection.java
        parquet-thrift/src/test/java/parquet/hadoop/thrift/TestThriftToParquetFileWriter.java

commit fdee5c138c13d85f40384c01df42b4a447475f04
Author: Ryan Blue
Date:   Mon Dec 16 17:29:37 2013 -0800

    CLOUDERA-BUILD. Update parquet-test-hadoop2 for CDH.

commit 85f93d16290000340bb720b3c2b7759cf897fef3
Author: Ryan Blue
Date:   Mon Dec 16 17:57:39 2013 -0800

    Revert faac67: causing build to fail.

commit 367cd36eeed71f79b8e8ff1bfb1fcffef03bed0b
Author: Ryan Blue
Date:   Mon Dec 16 17:11:46 2013 -0800

    CLOUDERA-BUILD. Update parquet-thrift for CDH dependencies.

commit 32ad0baaa2a30b76a0fb8263ad28552a417a6626
Author: Ryan Blue
Date:   Mon Dec 16 17:08:39 2013 -0800

    CLOUDERA-BUILD. Update pig for CDH dependencies.

    TestSummary needed to be modified because null is no longer allowed in a Bag. Three nulls were removed and the validation method updated to reflect the new structure of the test data.

    Conflicts:
        pom.xml

commit d75885ea8babe9be90fd03495a64810fe4ef23fa
Author: Ryan Blue
Date:   Mon Dec 16 17:04:25 2013 -0800

    CLOUDERA-BUILD. Update parquet-hadoop for hadoop-2.

    Conflicts:
        parquet-hadoop/src/test/java/parquet/hadoop/codec/CodecConfigTest.java

commit 04375bf3b58942325364221ca3588537ec837a38
Author: Tom White
Date:   Mon Dec 16 14:17:36 2013 +0000

    CLOUDERA-BUILD. Use common CDH version of Avro.

commit 6a1a89aef88e82b580cd165172dcea4054ca4cd5
Author: Ryan Blue
Date:   Thu Dec 12 18:49:04 2013 -0800

    Update parquet-format to C5 version.

commit 6f99097c8a6e9c656a104cb41059d26e0a2c4d23
Author: Ryan Blue
Date:   Thu Dec 12 18:23:23 2013 -0800

    Enable parquet-hive-bundle.

commit cdd6b75ab22a9bead9b1202ef811fa1d7774ea22
Author: Ryan Blue
Date:   Wed Dec 11 14:30:52 2013 -0800

    CDH-16101: Fix and backport parquet-mr-230.

    Updated parquet-hive-storage-handler hive version to cdh.hive.version. This required updating primitive type references from PrimitiveObjectInspectorUtils to TypeInfoFactory (HIVE-5372). I think this is the only required change because tests are passing.

commit 0d66b43efd94855952ca1a71fadb8aec5046905f
Author: Brock Noland
Date:   Tue Nov 26 13:30:13 2013 -0600

    CDH-16048: Backport parquet-mr-225 to fix 0.12 compatibility.

    Cherry-picking d2ccc72 and 60c6512 from parquet-mr-255.
    Cherry-picking 0d47734 for smartCheckSchema test validation.

    d2ccc72:
    Breaks parquet-hive up into several submodules, creating infrastructure to handle various versions of Hive going forward.

    * parquet-hive-storage-handler - this is almost all the previous code
    * parquet-hive-binding - contains the various binding modules for specific hive versions
    * parquet-hive-binding-interface - the interface the storage handler compiles to
    * parquet-hive-binding-factory - factory which can depend on interface, 0.10, and 0.12
    * parquet-hive-0.10-binding - binding layer for 0.10 (and 0.11)
    * parquet-hive-0.12-binding - binding layer for 0.12 (and 0.13)

    Conflicts:
    * parquet-hive-bundle/pom.xml
    * parquet-hive/parquet-hive-binding/parquet-hive-0.10-binding/src/main/java/parquet/hive/internal/Hive010Binding.java
    * parquet-hive/pom.xml

    60c6512:
    Updates Hive 0.12 compatibility patch by addressing all comments from Julien's review plus a few additional cleanups, specifically:

    * If hive is version 0.12 or newer return 0.12 binding
    * Adds javadoc and inheritDoc statements where appropriate
    * Adds link to Hive*Binding implementations describing where code came from
    * Renames Deprecated{Input,Output}Format to Mapred{Input,Output}Format
    * Creates shell classes Deprecated{Input,Output}Format inheriting from Mapred{Input,Output}Format
    * Moves TestMapred{Input,Output}Format to the JUnit 4 API
    * Replaces Apache licenses in files touched with the version capped at a shorter line length
    * Adds debug log statements to the binding layers to log items of interest

    Conflicts:
    * parquet-hive/parquet-hive-storage-handler/src/main/java/parquet/hive/DeprecatedParquetInputFormat.java
    * parquet-hive/parquet-hive-storage-handler/src/main/java/parquet/hive/DeprecatedParquetOutputFormat.java
    * parquet-hive/parquet-hive-storage-handler/src/test/java/parquet/hive/TestMapredParquetInputFormat.java

    0d47734:
    Add test on DeprecatedParquetInputFormat.getSplit()

    Conflicts:
    * parquet-hive/parquet-hive-storage-handler/src/main/java/parquet/hive/MapredParquetInputFormat.java
    * parquet-hive/parquet-hive-storage-handler/src/test/java/parquet/hive/TestMapredParquetInputFormat.java

commit c4deb11a31f12828e9a382ed01d1d708d953ed76
Author: Sean Mackrory
Date:   Tue Nov 19 09:29:47 2013 -0800

    CLOUDERA-BUILD. Update to use Hive 0.11.0 changes

commit debe6dc6f522b5533e09b88cc09b7b6105904472
Author: Jenkins slave
Date:   Mon Nov 18 17:09:05 2013 -0800

    CLOUDERA-BUILD. Hardcoding Hive to older 0.10.0 version for now

commit f80acaf2582b33c74eebfa84f8e8ea7ba4f5c455
Author: Jenkins slave
Date:   Mon Nov 18 13:24:14 2013 -0800

    CLOUDERA-BUILD. Use CDH5 parent POM.

commit 2d84303a04b517404ea6fee796b6c8cf7ef8dd49
Author: Nong Li
Date:   Wed Nov 13 17:32:55 2013 -0800

    Fix Binary.equals().

commit 677d3ca908a6c742abff778b85a1d00771af2dd0
Author: Sean Mackrory
Date:   Mon Nov 4 10:51:09 2013 -0800

    Another update to use Hadoop 23 breaking changes

commit faac6760d946e162a5c035b77ceec27569a6ee50
Author: Sean Mackrory
Date:   Thu Aug 1 13:55:17 2013 -0700

    Overriding version of Hadoop-LZO that is pulled in as a dependency during the build

commit 701e5b38c4e6c5541eaf3dc42053f610cce02029
Author: Sean Mackrory
Date:   Thu Aug 1 07:37:28 2013 -0700

    Update to use Hadoop 23 breaking changes

commit 29966b37d5c79ce584671f2965c6ddb03de1a516
Author: Sean Mackrory
Date:   Thu Aug 1 07:37:28 2013 -0700

    CLOUDERA-BUILD. Modifying POM