Strict Mode for Presto Queries

The Qubole Data Service (QDS) platform orchestrates thousands of clusters in the cloud for its customers on a daily basis. From cluster admins’ experience, it is learnt that even with Qubole’s accurate workload-aware autoscaling, there is always a ceiling for the cluster operation budget (so it is necessary to set the maximum number of nodes for the cluster). In general, a laxly-written SQL statement can lead to a lot of resource wastage and in case of resource contention, it can affect other workloads too.

While handling massive data workloads, issues that arise typically include:

  • Scanning a large amount of data from the entire table

  • Having a massive CROSS JOIN between two large tables without CONDITIONS

  • Sorting millions of ROWs without LIMITS or reduced scope

The above mentioned issues not only result in a poor user-experience but also inflate the cloud cost significantly, which can only be found in a hindsight.

To overcome above issues, Qubole provides a feature known as Presto Strict Mode, which, once enabled, restricts users from executing certain queries.

It supports three types of restrictions as mentioned here:

  • MANDATORY_PARTITION_CONSTRAINT: It restricts queries on a partitioned table to have a PREDICATE on at least one of the partition columns.

    Example:

    SELECT * FROM <TABLE_NAME> - This query fails with an error:

    Table scan on partitioned table: <TABLE_NAME> without filter or constraint.

    Whereas SELECT * FROM <TABLE_NAME> WHERE <predicate on partition> gets executed successfully.

    Qubole has fixed the MANDATORY_PARTITION_CONSTRAINT rule of Strict Mode in Presto 0.208 to allow queries, which use a predicate expression on any partitioned column while scanning a partitioned table.

  • DISALLOW_CROSS_JOIN: It restricts queries with a CROSS JOIN and thus such queries fail.

    Example:

    SELECT * FROM <TABLE_1> CROSS JOIN <TABLE_2> - This query fails with an error:

    Cross joins are not allowed when strict mode is enabled.

  • LIMITED_SORT: Queries are allowed to sort only a limited number of output rows.

    Example:

    SELECT * FROM <TABLE_1> ORDER BY <COL_2> - This query does fail displaying an error:

    Sorting without limit clause is not allowed when strict mode is enabled.

Overtime, Qubole plans to extend this list of restrictions by adding more such constraints based on users’ feedback.

Configuring Presto Strict Mode

To enable Presto Strict Mode at the cluster level, set qubole-strict-mode-restrictions in etc/config.properties to add a semicolon separated list of restrictions.

Example:

config.properties:
qubole-strict-mode-restrictions= MANDATORY_PARTITION_CONSTRAINT;LIMITED_SORT

This restriction fails queries that are without partition constraints or the queries doing an unlimited SORT operation.

Values supported for qubole-strict-mode-restrictions are:

  • NONE

  • MANDATORY_PARTITION_CONSTRAINT

  • DISALLOW_CROSS_JOIN

  • LIMITED_SORT

You can add any combination of the above values as a semicolon-separated list. But if you set NONE as a value, then other restrictions are not applied.

Note

In case, if a query violates the Presto strict mode conditions and if the Presto strict mode is not enabled, then Qubole displays warnings in the specific query’s query info.

To enable Presto Strict Mode at the account level, create a ticket with Qubole Support.