Cluster Management

Qubole Supports New Instance Types

ACM-5266: Qubole supports i3en.large, i3en.xlarge, i3en.2xlarge, i3en.3xlarge, i3en.6xlarge, i3en.12xlarge, and i3en.24xlarge instances.

ACM-5413: Qubole supports m5.8xlarge, m5.16xlarge, m5a.8xlarge, m5a.16xlarge, r5.8xlarge, r5a.8xlarge, r5a.16xlarge, and r5.16xlarge instances.

Configuring Coordinator and Minimum Number of Nodes Separately in Cluster

ACM-5369: You can now configure the coordinator node and the minimum number of nodes separately in the cluster configuration through the UI or API. This gives you the flexibility to configure the coordinator node as an On-Demand node while the rest of the nodes (minimum number of nodes and autoscaled nodes) are Spot nodes (or Spot Block nodes). However, the common Maximum Price (%) and Request Timeout settings apply to all three types of nodes (coordinator, minimum number of nodes, and autoscaling). Node stability must be in descending order from the coordinator to the minimum number of nodes to the autoscaling nodes. (Gradual Rollout)

The following table describes the supported combinations of coordinator, minimum number of nodes, and autoscaling nodes.

[Image: MasterMinAutoscaledNodes.png]
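
The descending-stability rule described in ACM-5369 can be sketched as a small check. This is an illustrative sketch only, not Qubole's implementation: the ranking On-Demand > Spot Block > Spot, the assumption that equal stability between adjacent roles is allowed, and the names STABILITY and is_supported_combination are all introduced here for illustration. The table referenced above remains the authoritative list of supported combinations.

```python
# Assumed stability ranking (higher means more stable).
# These keys and values are hypothetical, not taken from the Qubole API.
STABILITY = {"ondemand": 3, "spotblock": 2, "spot": 1}


def is_supported_combination(coordinator, min_nodes, autoscaling):
    """Return True if stability never increases going from the coordinator
    to the minimum number of nodes to the autoscaling nodes."""
    ranks = [STABILITY[coordinator], STABILITY[min_nodes], STABILITY[autoscaling]]
    return ranks[0] >= ranks[1] >= ranks[2]


# On-Demand coordinator with Spot nodes everywhere else satisfies the rule;
# a Spot coordinator with On-Demand minimum nodes does not.
print(is_supported_combination("ondemand", "spot", "spot"))
print(is_supported_combination("spot", "ondemand", "spot"))
```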

Disable SSH Access to Clusters

ACM-5436: You can now disable SSH access to clusters through the Control Panel > Account Features UI, or by contacting Qubole Support. To debug an issue, Qubole Support may ask you to temporarily open inbound SSH port access through the security group associated with the cluster’s coordinator node. (Feature to opt in, Cluster Restart Required, Beta)

The following conditions must be true to support this configuration:

  • The cluster must be in a private subnet.
  • The Hive metastore must be directly accessible from the cluster. This configuration does not work with the Qubole-managed Hive metastore.

If SSH access is disabled, you must open SSH port 22 yourself when you seek support so that Qubole Support can debug the issue.

Cluster Health Tile Card with Metrics

ACM-4294: A new cluster health tile card on the Cluster details page displays cluster health metrics for all cluster types. It shows each metric with its severity indicated by color.

Configuring Notification Channels to a Given Cluster

ACM-5514: Users can now configure a notification channel for a cluster to receive alerts and notifications about that cluster. (Gradual Rollout)

Hadoop 2 (Hive) Clusters to be Renamed as Hadoop (Hive) Clusters

ACM-4221 and ACM-5016: Qubole supports Hive 3.1.1 (beta) on a Hive cluster. Starting with cluster API v2.1, Hadoop 2 (Hive) clusters are renamed as Hadoop (Hive) clusters. You can set the Hive 3.1.1 (beta) version while creating or editing a cluster. (Via Support)

Enhancements

  • ACM-515: Qubole supports changing cloud authentication credentials when clusters are stuck in the terminating state due to invalid credentials. If the cluster credentials do not work, Qubole automatically tries to use the account credentials to terminate the cluster. You can also contact Qubole Support to forcefully terminate a cluster that is stuck in the terminating state.

  • ACM-1493: The Clusters landing page now shows the email address of the user who manually terminated a cluster.

  • ACM-2552: Account-level tags are attached to the test instances created while validating account credentials.

  • ACM-4323: Qubole supports these query runtime configurations for a given cluster (Via Support):

    • The configuration that sets the query execution timeout in minutes for a cluster. Qubole automatically terminates a query if its runtime exceeds the timeout.
    • The configuration that sets a warning threshold for the query runtime in minutes for a given cluster. Qubole notifies the user of the account by email if a query’s runtime exceeds the configured time.
  • ACM-4801: Qubole provides the details of last cleanup activity for a given cluster through the API. The API provides the reason why the cluster was selected or skipped for auto termination. For more information, see View Cluster Cleanup Activity.

  • ACM-4881: Users can now see the cluster node termination reason in the nodes table on the Cluster Details page.

  • ACM-5096: For Presto version 0.193 onwards, you can now configure a combination of Spot Block and Spot nodes for the autoscaling nodes while the coordinator and the minimum number of nodes are On-Demand nodes (the Spot block rotation feature). (Beta, Via Support)

  • ACM-5142: The Spot block rotation feature is now also supported on Spark clusters from Spark version 2.4 onwards. (Via Support)

    You can now configure a combination of Spot Block and Spot nodes for the autoscaling nodes while the coordinator and the minimum number of nodes are On-Demand nodes.

  • ACM-5194: On the Clusters list and Cluster details UI pages, the Node vs Time graph now also displays the Spot Block count.

  • ACM-5253: Information about the user who started or terminated a cluster is now available in the Cluster State API and the Cluster Usage Report API.

  • ACM-5368: A new cluster API version, 2.2, is available. It supports creating clusters with different coordinator, minimum number of nodes, and autoscaling configurations. (Gradual Rollout)

  • ACM-5405: Spot request timeout has a new option called auto, which lets Qubole decide the timeout at runtime on behalf of the user to optimize Spot fulfillment and minimize Spot losses. It is the default timeout only for new clusters. (Gradual Rollout)

  • ACM-5519: Attaching tags to Spot instances has been made more robust by retrying in case of a failure.

  • ACM-5558: During provisioning of nodes on a heterogeneous cluster, Qubole first tries to get the cheapest Spot instance type for the configured timeout duration. Only if that type is not available within the timeout does Qubole try to get the other, more expensive instance types, which it requests synchronously using the new EC2 Fleet API. For more information, see Additional Permissions.

  • ACM-5649: Users can configure notifications for specific cluster events on the Clusters page of the UI or through the API.

  • TOOLS-1178: Qubole has removed the deprecated pycrypto package as part of this version. Pycryptodome 3.0, a replacement for pycrypto, is already part of AMIs from R53.

  • TOOLS-1440: The s3cmd version is upgraded from 1.5.2 to 2.0.2.
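
The provisioning order described in ACM-5558 (cheapest instance type first, other types only after the timeout) can be sketched as follows. This is a minimal illustration under stated assumptions, not Qubole's implementation: provision_node and try_request are hypothetical names, try_request stands in for whatever call actually requests capacity (the sketch does not call the EC2 Fleet API), and trying the remaining types in ascending price order is an assumption.

```python
def provision_node(instance_types, try_request, timeout_s):
    """Try the cheapest instance type first, then fall back to the others.

    instance_types: list of (name, price) pairs for the heterogeneous cluster.
    try_request:    hypothetical callable (name, timeout_s) -> bool that
                    returns True if capacity was obtained within the timeout.
    Returns the name of the instance type obtained, or None if all failed.
    """
    ordered = sorted(instance_types, key=lambda pair: pair[1])
    cheapest_name = ordered[0][0]

    # Step 1: wait for the cheapest type for the full configured timeout.
    if try_request(cheapest_name, timeout_s):
        return cheapest_name

    # Step 2: only after that fails, try the remaining types one at a time
    # (assumed here to go in ascending price order).
    for name, _price in ordered[1:]:
        if try_request(name, timeout_s):
            return name
    return None
```

A caller would pass its own try_request function; keeping the capacity request behind a callable lets the ordering logic be tested without any cloud access.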

Bug Fixes

  • ACM-3869: The issue where cluster logs and YARN logs were stored without encryption in the S3 location when SSE-KMS was enabled has been resolved.
  • ACM-3984: The issue where a cluster startup used to continue to fail due to usage of stale SSH keys has been resolved.
  • ACM-4163: The issue where the cluster did not start for a very long time but was not cleaned up has been resolved.
  • ACM-4380: The Hadoop reserved memory now has the value 2 GB for 16 GB total memory, 4 GB for 64 GB total memory, 6 GB for 72 GB total memory, 4 GB for 96 GB total memory, 8 GB for 128 GB total memory, and 16 GB for 256 GB total memory.
  • ACM-5059: The issue where the Run Adhoc Scripts API was not working has been resolved.
  • ACM-5121: The issue where an incorrect error message was displayed when none of the Availability Zones (AZs) support the configured instance type has been resolved.
  • ACM-5149: Hadoop Job History Server memory is now configured based on the total memory available on the instance set for the coordinator node. Earlier, the memory was picked based on the instance type; that issue has been resolved.
  • ACM-5403: The issue where the UI prompted for EBS volumes in a heterogeneous cluster configuration, even for instances with storage, has been resolved.
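
The reserved-memory values listed in ACM-4380 can be written as a simple lookup for reference. This is only a restatement of the values in the bug-fix entry; the names RESERVED_MEMORY_GB and reserved_memory_gb are hypothetical, and because the entry does not say how totals outside the list are handled, the sketch returns None for them rather than guessing.

```python
# Total instance memory (GB) -> Hadoop reserved memory (GB), as listed in ACM-4380.
RESERVED_MEMORY_GB = {16: 2, 64: 4, 72: 6, 96: 4, 128: 8, 256: 16}


def reserved_memory_gb(total_gb):
    """Return the reserved memory for a listed total, or None if the total
    is not one of the sizes named in ACM-4380."""
    return RESERVED_MEMORY_GB.get(total_gb)


print(reserved_memory_gb(128))
```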

For a list of bug fixes between versions R56 and R57, see Changelog for api.qubole.com.