Configuring RubiX in Presto and Spark Clusters

To use RubiX in an existing Presto or Spark cluster, perform these steps:

  1. Navigate to the Clusters UI page.

  2. Click Edit for the Presto or Spark cluster on which you want to turn on RubiX.

  3. Go to the cluster’s Advanced Configuration tab.

  4. Select the Enable Rubix checkbox. On a Presto cluster, it is above PRESTO SETTINGS; on a Spark cluster, it is above SPARK SETTINGS.

    [Figure: the Enable Rubix checkbox above PRESTO SETTINGS in a Presto cluster’s Advanced Configuration tab.]

After you select Enable Rubix, QDS automatically configures RubiX to cache data in the cluster. To turn off RubiX on the cluster, clear the Enable Rubix checkbox.

Understanding the RubiX Configuration

Qubole has open-sourced RubiX. For more information on configuration, see RubiX Cache Manager.

Achieving Best Query Performance Using RubiX with Presto

Qubole has developed scheduling optimizations in Presto 0.208 and later versions that take advantage of the data that RubiX caches on worker nodes. This blog post explains the optimizations. This capability is part of Gradual Rollout.

To enable this capability, perform these steps:

  1. For optimum performance, Qubole strongly recommends that you use SSD (Solid State Drive) or NVMe (Non-Volatile Memory Express) disks in the cluster with RubiX. For the list of EC2 instance types provisioned with local NVMe or SSD storage volumes, see instance store.

    Using EBS (Elastic Block Store) disks with RubiX is discouraged because they may degrade performance.

  2. Add node-scheduler.optimized-local-scheduling=true under the config.properties header in Override Presto Configuration, which is on the Advanced Configuration tab of the Presto cluster UI. Click Update after adding the override. This override is required only with Presto version 0.208, because the optimized scheduler is used by default from Presto version 317. For details about Override Presto Configuration, see Configuring a Presto Cluster.

    For more information on cluster configuration options that are common to all cluster types, see Managing Clusters.

  3. Restart the cluster to apply the configuration.
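
As a rough illustration of the override described in step 2, the entry in Override Presto Configuration might look like the sketch below. The property and the config.properties header come from the step itself; the exact layout of the header line (the file name followed by a colon) is an assumption, so check it against Configuring a Presto Cluster before relying on it.

```text
config.properties:
node-scheduler.optimized-local-scheduling=true
```

On Presto version 317 and later, this entry should not be needed, because the optimized scheduler is the default there.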