What is New

Changes in Package Management

Package Management has introduced these features (Beta):

  • Package Management supports uninstalling packages. For more information, see the UI documentation and API documentation.
  • It supports account-level ACLs and object-level ACLs. For more information, see the UI documentation and API documentation.
  • It supports deleting an environment. For more information, see the UI documentation and API documentation.
  • It supports adding more packages via the conda-forge channel.

Read more here.

QDS Supports Multipart File Output Committer

QDS now supports the Multipart File Output Committer (Beta). Currently, it is available only for Hadoop jobs when the s3a file system is enabled. Compared to File Output Committer V2, it improves the performance of S3 data writes.

To enable this committer, set the following properties as Hadoop overrides on the cluster configuration UI page (Cluster Restart Required).

mapreduce.outputcommitter.factory.class=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
fs.s3a.committer.name=directory
mapreduce.fileoutputcommitter.algorithm.version=1

For more information, see the documentation.

Updates to Cluster Monitoring with Datadog

In addition to Ganglia, QDS supports cluster monitoring with Datadog to help users keep track of cluster health and operational metrics (Beta). When a Datadog account is configured, QDS automatically creates dashboards and alerts for each cluster.

Dashboards and alerts were already available for Presto and Hadoop. The following features have now been added:

  • Notebooks - Zeppelin metrics are now published to Datadog.
  • HiveServer2 - Qubole Hive now provides alerts and dashboards in Datadog for HiveServer2 metrics. For more information, see the documentation.

Changes in Cluster Management

  • Qubole has made optimizations to reduce the number of AWS API calls made to start or upscale a cluster, which alleviates the API rate-limiting problem faced in large AWS accounts (Beta).

    For more information, see the documentation.

  • These features/enhancements are now GA:

    • QDS clusters support C5, H1, and M5 AWS instance types.

    • Qubole has moved to an HVM image for better reliability and performance.

    • QDS supports soft enforcement of cluster permissions at the object level. On the Manage Permissions dialog of a specific cluster, when you select one permission, related cluster permissions are automatically selected. You can still deselect those additional permissions in the UI before saving.

      Qubole highly recommends accepting the enforced permissions. For example, the Read permission is enforced along with the Start permission. If you uncheck the Read permission in the UI, Qubole warns you that the product experience will not be optimal. For more information, see this documentation.

Read more here.

Changes in Hive

  • Qubole supports running Hive 2.1 via QDS servers (Beta).

    This is in addition to the existing support for running Hive 2.1 via HiveServer2 on the coordinator node (Hive-on-coordinator). Running Hive 2.1 queries via QDS servers is more scalable because, unlike Hive-on-coordinator, it does not overload the coordinator node.

  • Qubole supports external authentication for HiveServer2 in Hive 2.1 (GA). HiveServer2 authenticates requests that are sent directly to it.

    Read more here.

Changes in Presto

  • QDS supports Presto 0.193 on Presto clusters (Open Beta).

  • The following changes are GA:

    • Changes in supported Presto versions:

      • Presto 0.180 is the default version of Presto.
      • Qubole has deprecated Presto 0.142. It is no longer available when spawning new clusters or changing versions; however, existing clusters will continue to work until their configuration is changed.
    • Qubole now supports the Dynamic Filter feature, a join optimization that improves the performance of JOIN queries. It optimizes Hash JOINs in Presto, which can lead to a significant speedup in relevant cases. It is not enabled by default and is supported only in Presto 0.180 or later versions (Cluster Restart Required). A sample session appears at the end of this list.

      Enable the Dynamic Filter feature:

      • As a session-level property: set session dynamic_filtering = true.
      • As a Presto override in the Presto cluster by setting experimental.dynamic-filtering-enabled=true.
    • All Presto connectors are now available in the Presto 0.180 version.

    • Geospatial functions from 0.193 have been backported into 0.180 in QDS. In addition to the geospatial functions, the hammingDistance string function has also been backported.

    • Support for JOIN reordering based on table statistics has been added. It enables Presto to pick the optimal order for joining tables, which speeds up SQL workload runtimes. It is not enabled by default and is supported only in Presto 0.180 or later versions (Cluster Restart Required). The sample session at the end of this list also shows this switch.

      Enable JOIN reordering:

      • As a session-level property by setting qubole_reorder_joins = true.
      • As a Presto override in the Presto cluster by setting qubole-reorder-joins=true.
    • Basic file-based authentication has been added to authenticate direct connections to the cluster coordinator.

    • Qubole Presto has added a new Hive connector configuration property, hive.skip-corrupt-records, to skip corrupt records in input formats other than ORC, Parquet, and RCFile. It is supported only in Presto 0.180 or later versions. Set hive.skip-corrupt-records=true as a cluster override to ignore corrupt records for all queries on a Presto cluster (Cluster Restart Required).

      This configuration can also be set as a session property: hive.skip_corrupt_records=true.
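
    The following is a minimal sample session that turns on both of the optimizations described above from the Presto query interface. Only the two set session switches come from this release; the table and column names are hypothetical and simply illustrate a hash join that can benefit from dynamic filtering and join reordering.

      set session dynamic_filtering = true;
      set session qubole_reorder_joins = true;

      -- hypothetical join: the filtered dimension table can prune reads of the fact table
      SELECT f.order_id, d.region_name
      FROM orders_fact f
      JOIN region_dim d ON f.region_id = d.region_id
      WHERE d.region_name = 'EMEA';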

    Read more here.

Changes in Spark

  • The default value of max-executors for a Spark application has been increased from 2 to 1000 (Beta). Note that this applies only to Spark applications run from the Analyze interface or through the REST API. The Notebooks interface continues to cap the default max-executors at 10 unless it is overridden using interpreter properties. If you want to use a different value, set the spark.dynamicAllocation.maxExecutors configuration explicitly at the Spark application level, as shown in the sketch at the end of this section.

    If you want a different value for all Spark applications run on a cluster, set the value as a Spark override on that cluster.

  • Spark 2.2.1 is the latest version supported on Qubole Spark, and it is shown as 2.2 latest (2.2.1) in the Spark cluster UI (GA). All 2.2.0 clusters are automatically upgraded to 2.2.1 on cluster restart, in accordance with Qubole’s Spark versioning policy.

    Note

    Spark 2.2.1 will be rolled out as the latest version in a subsequent patch after the R52 release.

  • With DirectFileOutputCommitter (DFOC) in Spark, if a task failed after writing partial files, the reattempt also failed with FileAlreadyExistsException and the job failed. This issue is fixed in Spark versions 2.1.x and 2.2.x (GA).

    At a Spark cluster level, enable the following (Cluster Restart Required):

    spark.hadoop.mapreduce.output.textoutputformat.overwrite true
    spark.qubole.outputformat.overwriteFileInWrite true
    

    At a Spark job level, enable:

    spark.hadoop.mapreduce.output.textoutputformat.overwrite=true
    spark.qubole.outputformat.overwriteFileInWrite=true
    

    Qubole Spark will enable both of these options by default in the near future.

  • Qubole Spark supports the Hive 2.1 metastore for Spark 2.2.x (Beta).
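
  The following is a minimal illustration of overriding max-executors, as referenced in the first item above; the value 50 is only an example. The formats mirror the cluster-level and job-level settings shown for the DFOC options in this section.

    At a Spark job (application) level:

    spark.dynamicAllocation.maxExecutors=50

    At a Spark cluster level, as a Spark override:

    spark.dynamicAllocation.maxExecutors 50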

    Read more here.

Changes in Notebooks

  • Qubole has improved the usability of markdown paragraphs (Beta):
    • When a markdown paragraph goes out of focus, the editor is automatically hidden and, if the user has made changes, the paragraph is automatically run.
    • Double-clicking a markdown paragraph displays the editor.
  • Qubole has improved the user experience with faster Zeppelin startup (Beta). This also helps resolve the following intermittent issues reported by customers:
    • Loss of interpreters
    • The Zeppelin service not being up
  • Existing Spark interpreters have been made more compact (Beta). Properties whose values are not explicitly overridden are removed from the interpreters and are instead picked up from the cluster defaults.
  • A new field called Default Language has been added to Spark notebooks (Beta). Using this field, a user can choose the default supported language for the Spark interpreter while creating the notebook. This default language persists when the notebook is detached from one cluster and attached to another, and also when the notebook is imported or exported.

These enhancements are GA:

  • There is a new option at a notebook level to show/hide line numbers for all paragraphs.

  • Based on feedback from multiple users, QDS will gradually deprecate support for the internal notebook scheduler after R52. After its deprecation, users should use only Qubole’s Scheduler to schedule notebooks.

  • The z.show() functionality now supports a tabular view of the dataframe head.

  • The permalink option is available in the Example Notebook’s settings dropdown, as it is in other notebooks.

  • Qubole pre-caches the left navigation content in Notebooks and Dashboards so that a user does not have to wait for the data to load.

  • On the Interpreters page, a Log link has been added for each Spark interpreter. If the interpreter has not been started as part of the current cluster instance, the link redirects to the logs folder.

  • To ease debugging of the TTransport exception, a hyperlink to the FAQ that contains the solution has been added to the paragraph output.

    Read more here.

Changes in Dashboards

The following changes are GA:

  • QDS has improved the user experience by automatically hiding dashboard paragraphs whose output is empty or whose output the user has chosen to hide.

  • Qubole pre-caches the left navigation content in Notebooks and Dashboards so that a user does not have to wait for the data to load.

  • The default location for a new dashboard is changed to the Home folder.

    Read more here.

Changes in the QDS UI

On the Control Panel > Account Settings page, a new field, datadog_alert_email_list, has been added for receiving Datadog alerts (GA). This parameter is used to update the default Datadog email address that receives alerts. The alert email applies only to new clusters. For existing clusters, if you want to change the email address, you can change it in the Datadog UI.

This field accepts a comma-separated list of email addresses that should receive Datadog alerts.
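
For example, the field might contain a list of placeholder addresses such as the following (hypothetical values):

dev-alerts@example.com, oncall@example.com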

Read more here.

GDPR Readiness

As part of GDPR readiness, Qubole has updated its privacy policy. The updated privacy policy now states how Qubole will collect, use, disclose, and share personal data that belongs to QDS users.

During the sign-up process, when personal information such as name, email ID, company, title, location, and phone number is collected from a new user, the user is asked to consent to Qubole’s Privacy Policy and Terms of Service by selecting the associated check box before proceeding further.

An existing user of an existing account, on a fresh login, will be asked to consent to Qubole processing their personal data by selecting the Privacy Policy check box. This is required to successfully log into the QDS platform. The consent is collected using a one-time pop-up that is not shown during subsequent sign-ins.

The same pop-up with the Privacy Policy check box requesting consent is displayed when a user is invited to join an existing QDS account.