Configuring RubiX in Presto and Spark Clusters

To use RubiX, select Enable Rubix for a Presto or Spark cluster in the QDS UI; QDS then automatically configures RubiX to cache data in the cluster.

Understanding the RubiX Configuration

Qubole has open-sourced RubiX. For more information on configuration, see RubiX Cache Manager.

Achieving Best Query Performance Using RubiX with Presto

QDS can schedule tasks to take best advantage of where RubiX has cached the data. This blog post explains the approach. This capability is supported in Presto 0.208 and is available as part of Gradual Rollout.

To enable this capability, perform these steps:

  1. Enable RubiX on the Presto-0.208 cluster.

    Note

    For optimum performance, Qubole strongly recommends using SSD (Solid State Drive) or NVMe (Non-Volatile Memory Express) disks in clusters that use RubiX. For the list of EC2 instance types provisioned with local NVMe or SSD instance storage volumes, see instance store.

    Using EBS (Elastic Block Store) disks with RubiX is discouraged because they may degrade performance.

  2. In the Presto cluster UI, open the Advanced Configuration tab and add node-scheduler.optimized-local-scheduling=true under the config.properties header in Override Presto Configuration. Click Update to save the override. Configuring a Presto Cluster describes Override Presto Configuration in detail.

    For more information on cluster configuration options that are common to all cluster types, see Managing Clusters.

  3. Restart the cluster to apply the configuration.
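For reference, the override from step 2 might look like the following in the Override Presto Configuration box. This is a sketch that assumes the usual override layout, where a file name followed by a colon acts as the header and the properties written to that file go on the lines below it. Verify the layout against Configuring a Presto Cluster before applying it. If the box already contains a config.properties header, add only the property line under that existing header rather than repeating the header.

```properties
config.properties:
node-scheduler.optimized-local-scheduling=true
```

The property only takes effect on a Presto 0.208 cluster with RubiX enabled, and only after the restart in step 3.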