What’s New

Important new features and improvements are as follows.

Note

Blue text next to a description in these Release Notes indicates the launch state, availability, and default state of the item. For more information, click the label. Unless otherwise stated, features are generally available, available as self-service (without intervention by Qubole support), and enabled by default.

Azure-Specific Improvements

QDS on Azure now supports:

  • RubiX caching in Presto clusters. Provides faster read performance for recently accessed files in Presto. This is a Beta feature, disabled by default. Enable it from the Clusters page of the QDS UI.
  • Disk upscaling. Adds more storage when a node’s available disk space falls below a configured threshold. Enable it from the Clusters page of the QDS UI.
  • Static IP. Allows you to use a static IP address for access to a cluster’s master node. Enable it from the Clusters page of the QDS UI.
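
The disk upscaling rule described above (add storage once available space falls below a configured threshold) can be pictured with a minimal Python sketch. This is an illustration only, not the QDS implementation: the function name `needs_upscaling`, the percentage-based threshold, and the sample values are all hypothetical, and the real thresholds are whatever you configure on the Clusters page.

```python
def needs_upscaling(free_bytes: int, total_bytes: int, threshold_pct: float) -> bool:
    """Return True when free space is below threshold_pct percent of the disk.

    Hypothetical helper for illustration; not part of any QDS API.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    # Strictly below the threshold triggers upscaling; equal does not.
    return free_pct < threshold_pct


# Example with made-up numbers: 10 GB free on a 100 GB disk, 25% threshold.
GB = 1024 ** 3
print(needs_upscaling(10 * GB, 100 * GB, 25.0))
```

In this sketch, a node whose free space sits exactly at the threshold is left alone; whether the real feature treats the boundary that way is an assumption.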

Cluster start-up improvement: R54 uses ARM template-based cluster deployment and UDF improvements to speed cluster start-up, reducing average start times on Azure to under five minutes. Note that this is an average; some clusters (particularly Spark clusters) may still take longer than five minutes to come up.

Root disk size of 60 GB: As of R54, Azure nodes have a root disk size of 60 GB. This leaves at least 15 GB free to install packages as needed, and resolves cluster-start issues that arose when the root disk became full.

Other Important Improvements

Hive

  • Hive 2.1 is now generally available. Cluster Restart Required
  • QDS now uses HAProxy on the cluster master node to balance the load when there are multiple connections between the cluster and a QDS-managed Hive Metastore. Learn more. Via Support

Presto

Spark

  • Spark Dynamic Filtering improves JOIN performance. Via Support. Disabled
  • Sparklens, an experimental open-source tool, is available at http://sparklens.qubole.net. Learn more.
  • Proactive cleanup of shuffle block data allows faster downscaling of nodes. Learn more. Via Support. Disabled
  • Autoscaling is enabled by default for Qubole Spark clusters. The default value for the maximum number of autoscaling nodes has been increased from 2 to 10 for a new Spark cluster.
  • Large Spark SQL commands are now supported in the API and from the Analyze page of the QDS UI. Via Support. Disabled
  • Spark commands of the sub-types scala, python, R, command line, and sql now support macros in a script file. Learn more. Via Support. Disabled

Deprecated Spark Versions as of R54: 1.5.1, 1.6.0, 1.6.1, 2.0.0, 2.1.0.

QDS continues to support Spark 1.6.2, and the latest maintenance versions of each minor version of Spark 2.x. See the Supported Versions page.

Notebooks

Administration

  • QDS has a new Service user type. Beta, Via Support, Disabled
  • Administrators can now allow Data Preview (for Hive tables) from the Manage Roles page of the QDS UI.

Learn more.

Data Analytics

  • QDS now allows you to set a maximum concurrent-command limit percentage for all users of an account. Via Support, Disabled

Learn more.

Data Engineering

Airflow

  • QDS now allows you to monitor the health of Airflow clusters using integrated Monit, and turn certain services on and off. Cluster Restart Required

Learn more.

Security

  • R54 provides Apache Ranger integration for Hive workloads to help security administrators define fine-grained data-access policies for users and groups.
  • Security administrators can define and enforce RBAC policies across multiple QDS artifacts that contain data and metadata, such as commands, data store connections, data previews, and results.

Learn more.