- Spark provides Dynamic Filtering to improve join query performance. Learn more. Via Support. Disabled
- The experimental open-source Sparklens tool is available at http://sparklens.qubole.net. Learn more.
- Proactive cleanup of shuffle block data allows faster downscaling of nodes. Learn more. Via Support. Disabled
- Autoscaling is enabled by default for Qubole Spark clusters. The default value for the maximum number of autoscaling nodes has been increased from 2 to 10 for a new Spark cluster. Learn more.
- Large Spark SQL commands are now supported in the API and on the Analyze page of the QDS UI. Learn more. Via Support. Disabled
- Spark commands of sub-type command line and sql now support macros in a script file. Learn more. Via Support. Disabled
Spark 2.3.2 will be supported in a patch following the R54 release.
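The Dynamic Filtering item above is easier to picture with a small model. The following is only a conceptual sketch in plain Python, not Spark code and not Qubole's implementation: it assumes the general idea that the join keys surviving a selective predicate on the smaller table are used at run time to prune the larger table before the join. The function name `dynamic_filter_join` and the sample rows are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def dynamic_filter_join(
    fact_rows: Iterable[Tuple[int, str]],
    dim_rows: Iterable[Tuple[int, str]],
    dim_predicate: Callable[[str], bool],
) -> List[Tuple[int, str, str]]:
    """Join fact rows to dimension rows, pruning the fact side first.

    Hypothetical illustration: rows are (join_key, value) pairs and
    dimension keys are assumed to be unique.
    """
    # Step 1: apply the selective predicate to the small (dimension) side.
    kept = {key: value for key, value in dim_rows if dim_predicate(value)}

    # Step 2: the surviving join keys form a filter that is only known at run time.
    key_filter = set(kept)

    # Step 3: prune the large (fact) side with that filter before joining,
    # so rows that cannot match are never carried into the join.
    pruned = [(key, value) for key, value in fact_rows if key in key_filter]

    # Step 4: perform the join on the reduced fact side.
    return [(key, fact_value, kept[key]) for key, fact_value in pruned]


facts = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
regions = [(1, "US"), (2, "EU"), (3, "US")]
joined = dynamic_filter_join(facts, regions, lambda region: region == "US")
```

In this toy case the fact row with key 2 is dropped in step 3, before any join work is done on it.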
Shuffle Data Cleanup
SPAR-2658: Autoscaling is enabled by default for Spark clusters. The default value for the maximum number of autoscaling nodes has been increased from 2 to 10 for a new Spark cluster.
Support for HiveServer2
SPAR-2827: HiveServer2 is now supported with Spark 2.3.x. JDBC and ODBC clients can execute SQL and Hive queries over JDBC and ODBC protocols on Spark 2.3.x.
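As a rough illustration of what a JDBC client needs, the sketch below builds a connection URL in the standard HiveServer2 `jdbc:hive2://` form. The host name, port 10000 (the usual HiveServer2 default), and database name are assumptions for illustration only; the release note does not state the actual endpoint for a Qubole cluster, and the helper `hiveserver2_jdbc_url` is hypothetical.

```python
def hiveserver2_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL.

    Assumptions: the host is a placeholder, and 10000 is the common
    HiveServer2 default port, not a value confirmed for QDS.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


# Placeholder host name; substitute the endpoint of your own cluster.
url = hiveserver2_jdbc_url("spark-master.example.com")
```

A JDBC or ODBC client would then be pointed at a URL of this shape to run SQL against Spark 2.3.x.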
Support for Large SQL Commands
Support for Qubole Macros
- SPAR-2500: Optimizes INSERT OVERWRITE into dynamic partitions in Hive tables via Spark direct writes. Spark writes files directly to the final destination instead of writing to a temporary staging directory, which improves performance. Supported on Spark 2.2.x and 2.3.x. Via Support. Disabled
- SPAR-3042 and SPAR-3060: If the cluster uses a custom package, the package is identified in the Custom Spark Package field under the Configuration tab of the Edit Cluster Settings page. You can remove a custom package and choose a mainline Spark version instead; the default is 2.3.
- SPAR-2975: The following Spark versions are deprecated: 1.5.1, 1.6.0, 1.6.1, 2.0.0, and 2.1.0. QDS continues to support Spark 1.6.2, and the latest maintenance versions of each minor version of Spark 2.x. See the Supported Versions page. Spark 2.3-latest is now the default Spark version.
- SPAR-2127: Spark commands using a query path are now supported.
- SPAR-2866: The legacy Hadoop aws-sdk jar was causing conflicts with the Spark aws-sdk jar. The legacy JAR has been removed.
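For readers unfamiliar with the statement shape that SPAR-2500 targets, here is a minimal sketch that assembles an INSERT OVERWRITE into dynamic partitions using standard Hive SQL syntax. The table and column names are hypothetical, as is the helper `insert_overwrite_dynamic`. The direct-write optimization itself is enabled through Support; no configuration setting for it is shown because the release note does not name one.

```python
from typing import Sequence


def insert_overwrite_dynamic(
    target: str,
    source: str,
    columns: Sequence[str],
    partition_columns: Sequence[str],
) -> str:
    """Build an INSERT OVERWRITE statement with dynamic partitions.

    In Hive SQL the dynamic partition columns must come last in the
    SELECT list, in the same order as in the PARTITION clause.
    """
    if not partition_columns:
        raise ValueError("at least one partition column is required")

    # Data columns first, then the partition columns.
    select_list = ", ".join(list(columns) + list(partition_columns))
    partition_spec = ", ".join(partition_columns)
    return (
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_spec}) "
        f"SELECT {select_list} FROM {source}"
    )


# Hypothetical tables: overwrite every `dt` partition present in the source.
statement = insert_overwrite_dynamic("sales", "staging_sales", ["id", "amount"], ["dt"])
```

Because no partition value is fixed in the PARTITION clause, the partitions that get overwritten are determined by the values of `dt` in the selected rows.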