Enhancements After 13th June 2018

Release Version: 52.50.0

  • PRES-1351: The Qubole Presto Server Bootstrap is an alternative to the Node Bootstrap Script for installing external JARs, such as presto-udfs, before the Presto Server starts. Installing such JARs through the Node Bootstrap Script requires an additional restart of the Presto Server, which causes query failures during node startup. The Qubole Presto Server Bootstrap avoids this restart. It is supported only in Presto 0.180 and later versions.
  • PRES-1928: If the Qubole Presto Server Bootstrap is specified through bootstrap.properties or bootstrap-file-path in the Presto overrides and running it fails, Qubole ensures that Presto does not start.

Release Version: 52.49.0

  • ACM-3307: Qubole supports the AWS London Region.

Release Version: 52.36.0

Qubole Spark 2.3.1 is the first release of Apache Spark 2.3.x on Qubole. It is displayed as 2.3 latest (2.3.1) in the Spark Version field of the Create New Cluster page in the QDS UI.

Apache Spark 2.3 has the following major features and enhancements:

  • Vectorized PySpark UDFs (pandas UDFs) significantly improve the execution performance of PySpark.
  • Structured Streaming
    • Stream-stream joins add the ability to join streaming data from multiple streaming sources.
    • Continuous processing provides millisecond-latency stream processing for certain Spark operations.
  • The vectorized ORC reader is a new ORC reader that improves ORC file read performance.

For more information about all the features and enhancements of Apache Spark 2.3, see:

Enhancements

Apart from the Apache Spark 2.3 enhancements, Qubole Spark 2.3 has the following enhancements:

  • SPAR-2527: Integrated with the newer autoscaling APIs introduced in Apache Spark 2.3.
  • SPAR-2274: Refactored the collection of S3 metrics to use the newer AppStatusListener APIs in Apache Spark 2.3.
  • SPAR-2603: Refactored the Qubole idle timeout functionality, which shuts down Spark applications that have been idle for longer than the idle timeout. The default idle timeout is 60 minutes.
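The idle timeout rule in SPAR-2603 reduces to a simple comparison. The following is a minimal sketch of that rule only, assuming an application is shut down when its idle time strictly exceeds the timeout. The function name `should_shut_down` and its parameters are hypothetical and do not reflect Qubole's actual implementation; only the 60-minute default comes from the note above.

```python
from datetime import datetime, timedelta

# Default idle timeout stated in SPAR-2603. The constant name is hypothetical.
DEFAULT_IDLE_TIMEOUT_MINUTES = 60


def should_shut_down(last_activity: datetime, now: datetime,
                     idle_timeout_minutes: int = DEFAULT_IDLE_TIMEOUT_MINUTES) -> bool:
    """Hypothetical check: True if the app has been idle longer than the timeout."""
    idle_for = now - last_activity
    # "More than the idle timeout" is modeled as a strict comparison.
    return idle_for > timedelta(minutes=idle_timeout_minutes)


if __name__ == "__main__":
    start = datetime(2018, 6, 13, 12, 0)
    print(should_shut_down(start, start + timedelta(minutes=59)))  # False
    print(should_shut_down(start, start + timedelta(minutes=61)))  # True
```

Passing a different `idle_timeout_minutes` value models a cluster configured with a non-default timeout.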

Known Issues and Limitations

  • SPAR-2827: HiveServer2 (HS2) fails to start with Spark 2.3.1

    HS2 fails to start with Spark 2.3.1 due to a known issue in open source Spark (OSS). For details, see https://issues.apache.org/jira/browse/SPARK-22918

  • ZEP-2769: The %knitr interpreter is not supported with Spark 2.3.1.