What is New¶
The new features and enhancements are listed in the sections below.
Note
The label (in blue) next to each description indicates the launch state, availability, and default state of the feature or enhancement. For more information, click the label.
Unless stated otherwise, features are generally available, available as self-service and enabled by default.
Cluster Management¶
- Qubole now supports c5d, m5d, r5, r5d, and z1d instances. Learn more.
- QDS has introduced a new cluster console providing granular visibility, enhanced governance, and ease of use, with features such as the ability to track a specific cluster’s activity and view the history of its configuration changes, cluster usage, and instances. Learn more. Beta
- QDS supports configuring multiple subnets in a VPC for clusters of all compositions, including heterogeneous clusters. Learn more. Cluster Restart Required
Hive¶
- Hive version 2.1 is now generally available. Learn more. Cluster Restart Required
- Qubole has added HAProxy on the cluster coordinator node to balance the load when there are multiple connections between the cluster and the Qubole-managed Hive Metastore. This removes a single point of failure and improves stability. Learn more. Via Support
Hadoop 1¶
Hadoop 1 as-a-service is deprecated as of this QDS version. Qubole will support Hadoop 1 on existing clusters until 31 December 2018. Creating a new Hadoop 1 cluster or cloning an existing Hadoop 1 cluster is no longer supported through either the API or the UI. Learn more.
Presto¶
- Qubole has added a feature that automatically retries unsuccessful Presto queries (where possible) while nodes are being added as part of autoscaling. Learn more. Disabled Cluster Restart Required
- Presto Notebooks are now generally available. Learn more.
- The latest supported version is Presto 0.208. Learn more. Beta Cluster Restart Required
Spark¶
- Qubole Spark supports the RubiX distributed file caching system. Learn more. Beta Via Support Disabled
- Qubole Spark provides Dynamic Filtering to improve join query performance. Learn more. Via Support Disabled
- The experimental Sparklens open service tool is available at http://sparklens.qubole.net. Learn more.
- Parquet footer metadata caching to improve query execution performance. Learn more. Via Support Disabled
- Proactive cleanup of shuffle block data to enable faster downscaling of nodes. Learn more. Via Support Disabled
- Autoscaling is enabled by default for clusters. The default value for the maximum number of autoscaling nodes has been increased from 2 to 10 for a new Spark cluster. Learn more.
- Large Spark SQL commands are now supported in the API and on the Analyze page. Learn more. Via Support Disabled
- Qubole Spark command subtypes are now supported with script files containing macros. Learn more. Via Support Disabled
Deprecated Spark Versions¶
In this release, the following Spark versions are deprecated: 1.5.1, 1.6.0, 1.6.1, 2.0.0, and 2.1.0. Qubole will continue to support Spark 1.6.2 and latest maintenance versions of each minor version in Spark 2.x. See version support documentation.
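To check whether existing clusters are affected by this deprecation, a small helper along the following lines could be used. This is only a sketch: the version list is copied from the paragraph above, while the function names and the cluster-label-to-version mapping are hypothetical and are not part of any Qubole API.

```python
# Spark versions listed as deprecated in this release (copied from the text above).
DEPRECATED_SPARK_VERSIONS = {"1.5.1", "1.6.0", "1.6.1", "2.0.0", "2.1.0"}


def is_deprecated(spark_version: str) -> bool:
    """Return True if the given Spark version string is in the deprecated list."""
    return spark_version.strip() in DEPRECATED_SPARK_VERSIONS


def find_deprecated(cluster_versions: dict) -> list:
    """Return the sorted labels of clusters that run a deprecated Spark version.

    `cluster_versions` is a hypothetical mapping of cluster label -> Spark version;
    how that mapping is obtained is left to the reader.
    """
    return sorted(
        label
        for label, version in cluster_versions.items()
        if is_deprecated(version)
    )


if __name__ == "__main__":
    # Example inventory with made-up cluster labels.
    clusters = {"etl": "1.6.1", "adhoc": "2.3.1", "ml": "1.6.2"}
    print(find_deprecated(clusters))
```

Clusters reported by `find_deprecated` would be candidates for moving to Spark 1.6.2 or to the latest maintenance version of the relevant 2.x minor version.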
Spark Structured Streaming¶
Qubole provides comprehensive support for the Kinesis connector in Structured Streaming:
- Support for reading from a Kinesis source in micro-batch and continuous streaming modes.
- Support for ingesting data into Kinesis.
- Support for IAM roles in the Kinesis connector.
Writing a Structured Streaming query to a Spark data source table is supported. Learn more.
Streaming query progress graphs are now displayed in notebooks. Learn more. Via Support Disabled
Direct writes for checkpointing are supported. Learn more. Via Support Disabled
Notebooks¶
- Notebooks can be exported as PNG, PDF, and HTML files. Learn more. Beta Disabled
- Cluster Status is available on the Notebooks page. Learn more. Beta Via Support Disabled
- Spark application status is available on the Notebooks page. Learn more. Beta Via Support Disabled
Administration¶
- Administrators can now track usage on Qubole using the QCUH dashboard. Beta
- Qubole has introduced a new Service user type. Beta, Via Support, Disabled
- Administrators can now configure an additional resource, Data Preview, on Hive tables while managing resources on the Manage Roles page.
Data Analytics¶
- QDS now lets customers request a maximum concurrent command limit, set as a percentage, for all users of an account. Via Support, Disabled
Data Engineering¶
Explore
- Qubole has enabled support for AWS Aurora-MySQL RDS as a database backend for Airflow contents.
Airflow
- Python 3.5 is now supported on Airflow clusters. You can now manage add-on packages using package management. Beta, Via Support, Disabled
- Qubole now enables you to monitor the health of Airflow clusters using integrated Monit, and lets you turn certain services on or off. Cluster Restart Required
- Continuous development, integration, and deployment of Airflow DAGs with the Qubole UI. Beta, Via Support, Disabled Cluster Restart Required
Security¶
- Apache Ranger integration for Hive workloads helps security administrators define fine-grained data access policies across users and user groups.
- Security administrators can define and enforce RBAC policies across multiple Qubole artifacts that contain data and metadata, such as commands, data store connections, data previews, and results.