CDH 5.13.1 Release Notes
The following lists all Apache Spark Jiras included in CDH 5.13.1
that are not included in the Apache Spark base version 1.6.0. The
spark-1.6.0-cdh5.13.1.CHANGES.txt
file lists all changes included in CDH 5.13.1. The patch for each
change can be found in the cloudera/patches directory in the release tarball.
Changes Not In Apache Spark 1.6.0
Spark
Bug
- [SPARK-17460] - Dataset.joinWith broadcasts gigabyte sized table, causes OOM Exception
- [SPARK-17511] - Dynamic allocation race condition: Containers getting marked failed while releasing
- [SPARK-17531] - Don't initialize Hive Listeners for the Execution Client
- [SPARK-20898] - spark.blacklist.killBlacklistedExecutors doesn't work in YARN
- [SPARK-21522] - Flaky test: LauncherServerSuite.testStreamFiltering
- [SPARK-15067] - YARN executors are launched with fixed perm gen size
- [SPARK-13278] - Launcher fails to start with JDK 9 EA
- [SPARK-20904] - Task failures during shutdown cause problems with preempted executors
- [SPARK-20393] - Strengthen Spark to prevent XSS vulnerabilities
- [SPARK-19688] - Spark on Yarn Credentials File set to different application directory
- [SPARK-20922] - Unsafe deserialization in Spark LauncherConnection
- [SPARK-16138] - YarnAllocator tries to cancel executor requests when we have none
- [SPARK-20435] - More thorough redaction of sensitive information from logs/UI, more unit tests
- [SPARK-19263] - DAGScheduler should avoid sending conflicting task set.
- [SPARK-19178] - convert string of large numbers to int should return null
- [SPARK-19720] - Redact sensitive information from SparkSubmit console output
- [SPARK-4105] - FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
- [SPARK-19652] - REST API does not perform user auth for individual apps
- [SPARK-19220] - SSL redirect handler only redirects the server's root
- [SPARK-18372] - .Hive-staging folders created from Spark hiveContext are not getting cleaned up
- [SPARK-18750] - spark should be able to control the number of executors and should not throw stack overflow
- [SPARK-19033] - HistoryServer still uses old ACLs even if ACLs are updated
- [SPARK-18546] - UnsafeShuffleWriter corrupts encrypted shuffle files when merging
- [SPARK-18535] - Redact sensitive information from Spark logs and UI
- [SPARK-17304] - TaskSetManager.abortIfCompletelyBlacklisted is a perf. hotspot in scheduler benchmark
- [SPARK-16078] - from_utc_timestamp/to_utc_timestamp may give different result in different timezone
- [SPARK-17884] - In the cast expression, casting from empty string to interval type throws NullPointerException
- [SPARK-17850] - HadoopRDD should not swallow EOFException
- [SPARK-15062] - Show on DataFrame causes OutOfMemoryError, NegativeArraySizeException or segfault
- [SPARK-17721] - Erroneous computation in multiplication of transposed SparseMatrix with SparseVector
- [SPARK-17618] - Dataframe except returns incorrect results when combined with coalesce
- [SPARK-17418] - Spark release must NOT distribute Kinesis related assembly artifact
- [SPARK-17617] - Remainder(%) expression.eval returns incorrect result
- [SPARK-17547] - Temporary shuffle data files may be leaked following exception in write
- [SPARK-17465] - Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
- [SPARK-17245] - NPE thrown by ClientWrapper.conf
- [SPARK-17356] - A large Metadata field in Alias can cause OOM when calling TreeNode.toJSON
- [SPARK-11301] - filter on partitioned column is case sensitive even the context is case insensitive
- [SPARK-17038] - StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
- [SPARK-16656] - CreateTableAsSelectSuite is flaky
- [SPARK-17027] - PolynomialExpansion.choose is prone to integer overflow
- [SPARK-17003] - release-build.sh is missing hive-thriftserver for scala 2.11
- [SPARK-16831] - PySpark CrossValidator reports incorrect avgMetrics
- [SPARK-16939] - Fix build error by using `Tuple1` explicitly in StringFunctionSuite
- [SPARK-16409] - regexp_extract with optional groups causes NPE
- [SPARK-16925] - Spark tasks which cause JVM to exit with a zero exit code may cause app to hang in Standalone mode
- [SPARK-16873] - force spill NPE
- [SPARK-15541] - SparkContext.stop throws error
- [SPARK-16664] - Spark 1.6.2 - Persist call on Data frames with more than 200 columns is wiping out the data.
- [SPARK-16751] - Upgrade derby to 10.12.1.1 from 10.11.1.1
- [SPARK-16440] - Undeleted broadcast variables in Word2Vec causing OoM for long runs
- [SPARK-16313] - Spark should not silently drop exceptions in file listing
- [SPARK-16375] - [Spark web UI]:The wrong value(numCompletedTasks) has been assigned to the variable numSkippedTasks
- [SPARK-16489] - Test harness to prevent expression code generation from reusing variable names
- [SPARK-16488] - Codegen variable namespace collision for pmod and partitionBy
- [SPARK-16514] - RegexExtract and RegexReplace crash on non-nullable input
- [SPARK-16385] - NoSuchMethodException thrown by Utils.waitForProcess
- [SPARK-16372] - Retag RDD to tallSkinnyQR of RowMatrix
- [SPARK-16353] - Intended javadoc options are not honored for Java unidoc
- [SPARK-16329] - select * from temp_table_no_cols fails
- [SPARK-16182] - Utils.scala -- terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails
- [SPARK-16257] - spark-ec2 script not updated for 1.6.2 release
- [SPARK-16044] - input_file_name() returns empty strings in data sources based on NewHadoopRDD.
- [SPARK-16148] - TaskLocation does not allow for Executor ID's with underscores
- [SPARK-13023] - Check for presence of 'root' module after computing test_modules, not changed_modules
- [SPARK-16214] - fix the denominator of SparkPi
- [SPARK-16193] - Address flaky ExternalAppendOnlyMapSuite spilling tests
- [SPARK-16173] - Can't join describe() of DataFrame in Scala 2.10
- [SPARK-16077] - Python UDF may fail because of six
- [SPARK-6005] - Flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery
- [SPARK-15606] - Driver hang in o.a.s.DistributedSuite on 2 core machine
- [SPARK-16086] - Python UDF failed when there is no arguments
- [SPARK-16035] - The SparseVector parser fails checking for valid end parenthesis
- [SPARK-15892] - Incorrectly merged AFTAggregator with zero total count
- [SPARK-15395] - Use getHostString to create RpcAddress
- [SPARK-15975] - Improper Popen.wait() return code handling in dev/run-tests
- [SPARK-15915] - CacheManager should use canonicalized plan for planToCache.
- [SPARK-12712] - test-dependencies.sh script fails when run against empty .m2 cache
- [SPARK-12655] - GraphX does not unpersist RDDs
- [SPARK-15736] - Gracefully handle loss of DiskStore files
- [SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
- [SPARK-15601] - CircularBuffer's toString() to print only the contents written if buffer isn't full
- [SPARK-15528] - conv function returns inconsistent result for the same data
- [SPARK-14261] - Memory leak in Spark Thrift Server
- [SPARK-15260] - UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks
- [SPARK-15262] - race condition in killing an executor and reregistering an executor
- [SPARK-13522] - Executor should kill itself when it's unable to heartbeat to the driver more than N times
- [SPARK-13519] - Driver should tell Executor to stop itself when cleaning executor's state
- [SPARK-14495] - Distinct aggregation cannot be used in the having clause
- [SPARK-15209] - Web UI's timeline visualizations fails to render if descriptions contain single quotes
- [SPARK-13566] - Deadlock between MemoryStore and BlockManager
- [SPARK-14757] - Incorrect behavior of Join operation in Spark SQL JOIN : "false" in the left table is joined to "null" on the right table
- [SPARK-14965] - StructType throws exception for missing field
- [SPARK-14671] - Pipeline.setStages needs to handle Array non-covariance
- [SPARK-14159] - StringIndexerModel sets output column metadata incorrectly
- [SPARK-14665] - PySpark StopWordsRemover default stopwords are Java object
- [SPARK-14544] - Spark UI is very slow in recent Chrome
- [SPARK-14563] - SQLTransformer.transformSchema is not implemented correctly
- [SPARK-14298] - LDA should support disable checkpoint
- [SPARK-14454] - Better exception handling while marking tasks as failed
- [SPARK-14357] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
- [SPARK-14468] - Always enable OutputCommitCoordinator
- [SPARK-14322] - Use treeAggregate instead of reduce in OnlineLDAOptimizer
- [SPARK-14243] - updatedBlockStatuses does not update correctly when removing blocks
- [SPARK-14368] - Support python.spark.worker.memory with upper-case unit
- [SPARK-11327] - spark-dispatcher doesn't pass along some spark properties
- [SPARK-14138] - Generated SpecificColumnarIterator code can exceed JVM size limit for cached DataFrames
- [SPARK-11507] - Error thrown when using BlockMatrix.add
- [SPARK-14232] - Event timeline on job page doesn't show if an executor is removed with multiple line reason
- [SPARK-13845] - BlockStatus and StreamBlockId keep on growing result driver OOM
- [SPARK-14219] - Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex
- [SPARK-14187] - Incorrect use of binarysearch in SparseMatrix
- [SPARK-14074] - Use fixed version of install_github in SparkR build
- [SPARK-13642] - Properly handle signal kill of ApplicationMaster
- [SPARK-13806] - SQL round() produces incorrect results for negative values
- [SPARK-14006] - Builds of 1.6 branch fail R style check
- [SPARK-13772] - DataType mismatch about decimal
- [SPARK-13958] - Executor OOM due to unbounded growth of pointer array in Sorter
- [SPARK-13901] - We get wrong logdebug information when jump to the next locality level.
- [SPARK-13207] - _SUCCESS should not break partition discovery
- [SPARK-13327] - colnames()<- allows invalid column names
- [SPARK-13242] - Moderately complex `when` expression causes code generation failure
- [SPARK-13631] - getPreferredLocations race condition in spark 1.6.0?
- [SPARK-13755] - Escape quotes in SQL plan visualization node labels
- [SPARK-13711] - Apache Spark driver stopping JVM when master not available
- [SPARK-13648] - org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK
- [SPARK-13705] - UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount
- [SPARK-13697] - TransformFunctionSerializer.loads doesn't restore the function's module name if it's '__main__'
- [SPARK-13444] - QuantileDiscretizer chooses bad splits on large DataFrames
- [SPARK-13454] - Cannot drop table whose name starts with underscore
- [SPARK-12874] - ML StringIndexer does not protect itself from column name duplication
- [SPARK-12316] - Stack overflow with endless call of `Delegation token thread` when application end.
- [SPARK-13473] - Predicate can't be pushed through project with nondeterministic field
- [SPARK-13482] - `spark.storage.memoryMapThreshold` has two kind of the value.
- [SPARK-13475] - HiveCompatibilitySuite should still run in PR builder even if a PR only changes sql/core
- [SPARK-13410] - unionAll AnalysisException with DataFrames containing UDT columns.
- [SPARK-11972] - [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session
- [SPARK-11823] - HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
- [SPARK-12006] - GaussianMixture.train crashes if an initial model is not None
- [SPARK-12672] - Streaming batch ui can't be opened in jobs page in yarn mode.
- [SPARK-12966] - Postgres JDBC ArrayType(DecimalType) 'Unable to find server array type'
- [SPARK-16230] - Executors self-killing after being assigned tasks while still in init
- [SPARK-13112] - CoarsedExecutorBackend register to driver should wait Executor was ready
- [SPARK-16930] - ApplicationMaster's code that waits for SparkContext is race-prone
- [SPARK-15891] - Make YARN logs less noisy
- [SPARK-16533] - Spark application not handling preemption messages
- [SPARK-12447] - Only update AM's internal state when executor is successfully launched by NM
- [SPARK-17549] - InMemoryRelation doesn't scale to large tables
- [SPARK-16414] - Can not get user config when calling SparkHadoopUtil.get.conf in other places, such as DataSourceStrategy
- [SPARK-10722] - Uncaught exception: RDDBlockId not found in driver-heartbeater
- [SPARK-13328] - Possible poor read performance for broadcast variables with dynamic resource allocation
- [SPARK-16625] - Oracle JDBC table creation fails with ORA-00902: invalid datatype
- [SPARK-12941] - Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
- [SPARK-17644] - The failed stage never resubmitted due to abort stage in another thread
- [SPARK-12330] - Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk
- [SPARK-12009] - Avoid re-allocate yarn container while driver want to stop all Executors
- [SPARK-17611] - YarnShuffleServiceSuite swallows exceptions, doesn't really test a few things
- [SPARK-17433] - YarnShuffleService doesn't handle moving credentials levelDb
- [SPARK-16711] - YarnShuffleService doesn't re-init properly on YARN rolling upgrade
- [SPARK-13850] - TimSort Comparison method violates its general contract
- [SPARK-15865] - Blacklist should not result in job hanging with less than 4 executors
- [SPARK-14881] - pyspark and sparkR shell default log level should match spark-shell/Scala
- [SPARK-16505] - YARN shuffle service should throw errors when it fails to start
- [SPARK-16106] - TaskSchedulerImpl does not correctly handle new executors on existing hosts
- [SPARK-15754] - org.apache.spark.deploy.yarn.Client changes the credential of current user
- [SPARK-4452] - Shuffle data structures can starve others on the same thread for memory
- [SPARK-14363] - Executor OOM due to a memory leak in Sorter
- [SPARK-14739] - Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices
- [SPARK-14679] - UI DAG visualization causes OOM generating data
- [SPARK-13352] - BlockFetch does not scale well on large block
- [SPARK-13780] - SQL "incremental" build in maven is broken
- [SPARK-14477] - Allow custom mirrors for downloading artifacts in build/mvn
- [SPARK-13622] - Issue creating level db file for YARN shuffle service if URI is used in yarn.nodemanager.local-dirs
- [SPARK-12614] - Don't throw non fatal exception from RpcEndpointRef.send/ask
- [SPARK-13652] - TransportClient.sendRpcSync returns wrong results
- [SPARK-13478] - Fetching delegation tokens for Hive fails when using proxy users
- [SPARK-13390] - Java Spark createDataFrame with List parameter bug
- [SPARK-13355] - Replace GraphImpl.fromExistingRDDs by Graph
- [SPARK-12746] - ArrayType(_, true) should also accept ArrayType(_, false)
- [SPARK-13298] - DAG visualization does not render correctly for jobs
- [SPARK-13371] - TaskSetManager.dequeueSpeculativeTask compares Option[String] and String directly.
- [SPARK-13312] - ML Model Selection via Train Validation Split example uses incorrect data
- [SPARK-13300] - Spark examples page gives errors : Liquid error: pygments
- [SPARK-12363] - PowerIterationClustering test case failed if we deprecated KMeans.setRuns
- [SPARK-13142] - Problem accessing Web UI /logPage/ on Microsoft Windows
- [SPARK-13153] - PySpark ML persistence failed when handle no default value parameter
- [SPARK-13047] - Pyspark Params.hasParam should not throw an error
- [SPARK-13265] - Refactoring of basic ML import/export for other file system besides HDFS
- [SPARK-12921] - Use SparkHadoopUtil reflection to access TaskAttemptContext in SpecificParquetRecordReaderBase
- [SPARK-10524] - Decision tree binary classification with ordered categorical features: incorrect centroid
- [SPARK-13210] - NPE in Sort
- [SPARK-13195] - PairDStreamFunctions.mapWithState fails in case timeout is set without updating State[S]
- [SPARK-12629] - SparkR: DataFrame's saveAsTable method has issues with the signature and HiveContext
- [SPARK-12611] - test_infer_schema_to_local depended on old handling of missing value in row
- [SPARK-12222] - deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception
- [SPARK-12520] - Python API dataframe join returns wrong results on outer join
- [SPARK-12682] - Hive will fail if the schema of a parquet table has a very wide schema
- [SPARK-12546] - Writing to partitioned parquet table can fail with OOM
- [SPARK-12617] - socket descriptor leak killing streaming app
- [SPARK-13101] - Dataset complex types mapping to DataFrame (element nullability) mismatch
- [SPARK-12739] - Details of batch in Streaming tab uses two Duration columns
- [SPARK-13122] - Race condition in MemoryStore.unrollSafely() causes memory leak
- [SPARK-13056] - Map column would throw NPE if value is null
- [SPARK-12711] - ML StopWordsRemover does not protect itself from column name duplication
- [SPARK-13121] - java mapWithState mishandles scala Option
- [SPARK-12780] - Inconsistency returning value of ML python models' properties
- [SPARK-13087] - Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations
- [SPARK-12989] - Bad interaction between StarExpansion and ExtractWindowExpressions
- [SPARK-12231] - Failed to generate predicate Error when using dropna
- [SPARK-13088] - DAG viz does not work with latest version of chrome
- [SPARK-13082] - sqlCtx.real.json() doesn't work with PythonRDD
- [SPARK-10847] - Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
- [SPARK-12486] - Executors are not always terminated successfully by the worker.
- [SPARK-12961] - Work around memory leak in Snappy library
- [SPARK-10582] - using dynamic-executor-allocation, if AM failed. the new AM will be started. But the new AM does not allocate executors to driver
- [SPARK-12112] - Upgrade to SBT 0.13.9
- [SPARK-12696] - Dataset serialization error
- [SPARK-12755] - Spark may attempt to rebuild application UI before finishing writing the event logs in possible race condition
- [SPARK-12624] - When schema is specified, we should give better error message if actual row length doesn't match
- [SPARK-12760] - inaccurate description for difference between local vs cluster mode in closure handling
- [SPARK-12859] - Names of input streams with receivers don't fit in Streaming page
- [SPARK-12747] - Postgres JDBC ArrayType(DoubleType) 'Unable to find server array type'
- [SPARK-12841] - UnresolvedException with cast
- [SPARK-12346] - GLM summary crashes with NoSuchElementException if attributes are missing names
- [SPARK-12558] - AnalysisException when multiple functions applied in GROUP BY clause
- [SPARK-12708] - Sorting task error in Stages Page when yarn mode
- [SPARK-12784] - Spark UI IndexOutOfBoundsException with dynamic allocation
- [SPARK-9844] - File appender race condition during SparkWorker shutdown
- [SPARK-12026] - ChiSqTest gets slower and slower over time when number of features is large
- [SPARK-12690] - NullPointerException in UnsafeInMemorySorter.free()
- [SPARK-12268] - pyspark shell uses execfile which breaks python3 compatibility
- [SPARK-12685] - word2vec trainWordsCount gets overflow
- [SPARK-12805] - Outdated details in doc related to Mesos run modes
- [SPARK-7615] - MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero
- [SPARK-12582] - IndexShuffleBlockResolverSuite fails in windows
- [SPARK-12638] - Parameter explanation not very accurate for rdd function "aggregate"
- [SPARK-12734] - Fix Netty exclusions and use Maven Enforcer to prevent bug from being reintroduced
- [SPARK-12654] - sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
- [SPARK-12591] - NullPointerException using checkpointed mapWithState with KryoSerializer
- [SPARK-12598] - Bug in setMinPartitions function of StreamFileInputFormat
- [SPARK-12662] - Add a local sort operator to DataFrame used by randomSplit
- [SPARK-12678] - MapPartitionsRDD should clear reference to prev RDD
- [SPARK-12673] - Prepending base URI of job description is missing
- [SPARK-12016] - word2vec load model can't use findSynonyms to get words
- [SPARK-12453] - Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
- [SPARK-12511] - streaming driver with checkpointing unable to finalize leading to OOM
- [SPARK-12647] - 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator
- [SPARK-12589] - result row size is wrong in UnsafeRowParquetRecordReader
- [SPARK-12579] - User-specified JDBC driver should always take precedence
- [SPARK-12470] - Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
- [SPARK-12562] - DataFrame.write.format("text") requires the column name to be called value
- [SPARK-12327] - lint-r checks fail with commented code
- [SPARK-12399] - Display correct error message when accessing REST API with an unknown app Id
- [SPARK-12300] - Fix schema inference on local collections
- [SPARK-12526] - `ifelse`, `when`, `otherwise` unable to take Column as value
- [SPARK-11394] - PostgreDialect cannot handle BYTE types
- [SPARK-12489] - Fix minor issues found by Findbugs
- [SPARK-12424] - The implementation of ParamMap#filter is wrong.
- [SPARK-12517] - No default RDD name for ones created by sc.textFile
- [SPARK-12010] - Spark JDBC requires support for column-name-free INSERT syntax
- [SPARK-12502] - Script /dev/run-tests fails when IBM Java is used
- [SPARK-12499] - make_distribution should not override MAVEN_OPTS
- [SPARK-12477] - [SQL] Tungsten projection fails for null values in array fields
- [SPARK-12012] - Show more comprehensive PhysicalRDD metadata when visualizing SQL query plan
- [SPARK-12350] - VectorAssembler#transform() initially throws an exception
- [SPARK-11783] - When deployed against remote Hive metastore, HiveContext.executionHive points to wrong metastore
- [SPARK-7743] - Upgrade Parquet to 1.7
- [SPARK-11153] - Turns off Parquet filter push-down for string and binary columns
- [SPARK-9407] - Parquet shouldn't fail when pushing down predicates over a column whose underlying Parquet type is an ENUM
Documentation
- [SPARK-15223] - spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
- [SPARK-14618] - RegressionEvaluator doc out of date
- [SPARK-13439] - Document that spark.mesos.uris is comma-separated
- [SPARK-13350] - Configuration documentation incorrectly states that PYSPARK_PYTHON's default is "python"
- [SPARK-13274] - Fix Aggregator Links on GroupedDataset Scala API
- [SPARK-13214] - Fix dynamic allocation docs
- [SPARK-12894] - Add deploy instructions for Python in Kinesis integration doc
- [SPARK-12814] - Add deploy instructions for Python in flume integration doc
- [SPARK-12722] - Typo in Spark Pipeline example
- [SPARK-12758] - Add note to Spark SQL Migration section about SPARK-11724
- [SPARK-12429] - Update documentation to show how to use accumulators and broadcasts with Spark Streaming
- [SPARK-12487] - Add docs for Kafka message handler
- [SPARK-12507] - Expose closeFileAfterWrite and allowBatching configurations for Streaming
Improvement
- [SPARK-19537] - Move the pendingPartitions variable from Stage to ShuffleMapStage
- [SPARK-16654] - UI Should show blacklisted executors & nodes
- [SPARK-19554] - YARN backend should use history server URL for tracking when UI is disabled
- [SPARK-17874] - Additional SSL port on HistoryServer should be configurable
- [SPARK-12523] - Support long-running of the Spark On HBase and hive meta store.
- [SPARK-12241] - Improve failure reporting in Yarn client obtainTokenForHBase()
- [SPARK-18547] - Decouple I/O encryption key propagation from UserGroupInformation
- [SPARK-8425] - Add blacklist mechanism for task scheduling
- [SPARK-17648] - TaskSchedulerImpl.resourceOffers should take an IndexedSeq, not a Seq
- [SPARK-17623] - Failed tasks end reason is always a TaskFailedReason, types should reflect this
- [SPARK-17649] - Log how many Spark events got dropped in LiveListenerBus
- [SPARK-17485] - Failed remote cached block reads can lead to whole job failure
- [SPARK-17316] - Don't block StandaloneSchedulerBackend.executorRemoved
- [SPARK-15091] - Fix warnings and a failure in SparkR test cases with testthat version 1.0.1
- [SPARK-15761] - pyspark shell should load if PYSPARK_DRIVER_PYTHON is ipython and Python3
- [SPARK-15827] - Publish Spark's forked sbt-pom-reader to Maven Central
- [SPARK-14897] - Upgrade Jetty to latest version of 8/9
- [SPARK-14787] - Upgrade Joda-Time library from 2.9 to 2.9.3
- [SPARK-14149] - Log exceptions in tryOrIOException
- [SPARK-14107] - PySpark spark.ml GBT algs need seed Param
- [SPARK-14058] - Incorrect docstring in Window.orderBy
- [SPARK-3411] - Improve load-balancing of concurrently-submitted drivers across workers
- [SPARK-13760] - Fix BigDecimal constructor for FloatType
- [SPARK-13599] - Groovy-all ends up in spark-assembly if hive profile set
- [SPARK-13601] - Invoke task failure callbacks before calling outputstream.close()
- [SPARK-16796] - Visible passwords on Spark environment page
- [SPARK-12392] - Optimize a location order of broadcast blocks by considering preferred local hosts
- [SPARK-17171] - DAG will list all partitions in the graph
- [SPARK-1239] - Improve fetching of map output statuses
- [SPARK-13904] - Add support for pluggable cluster manager
- [SPARK-5847] - Allow for configuring MetricsSystem's use of app ID to namespace all metrics
- [SPARK-14963] - YarnShuffleService should use YARN getRecoveryPath() for leveldb location
- [SPARK-15205] - Codegen can compile the same source code more than twice
- [SPARK-13810] - Add Port Configuration Suggestions on Bind Exceptions
- [SPARK-14242] - avoid too many copies in network when a network frame is large
- [SPARK-13459] - Separate Alive and Dead Executors in Executor Totals Table
- [SPARK-7729] - Executor which has been killed should also be displayed on Executors Tab.
- [SPARK-12149] - Executor UI improvement suggestions - Color UI
- [SPARK-12716] - Executor UI improvement suggestions - Totals
- [SPARK-12759] - Spark should fail fast if --executor-memory is too small for spark to start
- [SPARK-13279] - Scheduler does O(N^2) operation when adding a new task set (making it prohibitively slow for scheduling 200K tasks)
- [SPARK-12645] - SparkR support hash function
- [SPARK-13280] - FileBasedWriteAheadLog logger name should be under o.a.s namespace
- [SPARK-7889] - Jobs progress of apps on complete page of HistoryServer shows uncompleted
- [SPARK-12967] - NettyRPC races with SparkContext.stop() and throws exception
- [SPARK-13094] - No encoder implicits for Seq[Primitive]
- [SPARK-11780] - Provide type aliases in org.apache.spark.sql.types for backwards compatibility
- [SPARK-12834] - Use type conversion instead of Ser/De of Pickle to transform JavaArray and JavaList
- [SPARK-12932] - Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant
- [SPARK-12701] - Logging FileAppender should use join to ensure thread is finished
- [SPARK-12450] - Un-persist broadcasted variables in KMeans
- [SPARK-12411] - Reconsider executor heartbeats rpc timeout
- [SPARK-12120] - Improve exception message when failing to initialize HiveContext in PySpark
- [SPARK-5273] - Improve documentation examples for LinearRegression
- [SPARK-11155] - Stage summary json should include stage duration
- [SPARK-12471] - Spark daemons should log their pid in the log file
- [SPARK-11929] - spark-shell log level customization is lost if user provides a log4j.properties file
New Feature
- [SPARK-16554] - Spark should kill executors when they are blacklisted
- [SPARK-16956] - Make ApplicationState.MAX_NUM_RETRY configurable
- [SPARK-11515] - QuantileDiscretizer should take random seed
- [SPARK-13465] - Add a task failure listener to TaskContext
- [SPARK-11206] - Support SQL UI on the history server
- [SPARK-3611] - Show number of cores for each executor in application web UI
- [SPARK-12393] - Add read.text and write.text for SparkR
- [SPARK-5682] - Add encrypted shuffle in spark
- [SPARK-9516] - Improve Thread Dump page
- [SPARK-10359] - Enumerate Spark's dependencies in a file and diff against it for new pull requests
- [SPARK-2750] - Add Https support for Web UI
- [SPARK-2805] - Update akka to version 2.3.4
Story
Task
- [SPARK-12297] - Add work-around for Parquet/Hive int96 timestamp bug.
- [SPARK-17675] - Add Blacklisting of Executors & Nodes within one TaskSet
- [SPARK-13474] - Update packaging scripts to stage artifacts to home.apache.org
Test
- [SPARK-18922] - Fix more resource-closing-related and path-related test failures in identified ones on Windows
- [SPARK-13693] - Flaky test: o.a.s.streaming.MapWithStateSuite
- [SPARK-18117] - Add Test for Interaction of TaskSchedulerImpl with TaskSetBlacklist
- [SPARK-17102] - bypass UserDefinedGenerator for json format check
- [SPARK-14391] - Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
- [SPARK-15783] - Fix more flakiness: o.a.s.scheduler.BlacklistIntegrationSuite
- [SPARK-15714] - Fix Flaky Test: o.a.s.scheduler.BlacklistIntegrationSuite
Umbrella
- [SPARK-15613] - Incorrect days to millis conversion
- [SPARK-15723] - SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
- [SPARK-10372] - Add end-to-end tests for the scheduling code
- [SPARK-11031] - SparkR str() method on DataFrame objects
- [SPARK-11563] - Use RpcEnv to transfer generated classes in spark-shell