Strict Mode for Presto Queries
The Qubole Data Service (QDS) platform orchestrates thousands of clusters in the cloud for its customers on a daily basis. From cluster admins’ experience, it is learnt that even with Qubole’s accurate workload-aware autoscaling, there is always a ceiling for the cluster operation budget (so it is necessary to set the maximum number of nodes for the cluster). In general, a laxly-written SQL statement can lead to a lot of resource wastage and in case of resource contention, it can affect other workloads too.
While handling massive data workloads, issues that arise typically include:
Scanning a large amount of data from the entire table
Having a massive CROSS JOIN between two large tables without CONDITIONS
Sorting millions of ROWs without LIMITS or reduced scope
The above mentioned issues not only result in a poor user-experience but also inflate the cloud cost significantly, which can only be found in a hindsight.
To overcome above issues, Qubole provides a feature known as Presto Strict Mode, which, once enabled, restricts users from executing certain queries.
It supports three types of restrictions as mentioned here:
MANDATORY_PARTITION_CONSTRAINT: It restricts queries on a partitioned table to have a PREDICATE on at least one of the partition columns.
Example:
SELECT * FROM <TABLE_NAME>
- This query fails with an error:Table scan on partitioned table: <TABLE_NAME> without filter or constraint
.Whereas
SELECT * FROM <TABLE_NAME> WHERE <predicate on partition>
gets executed successfully.Qubole has fixed the
MANDATORY_PARTITION_CONSTRAINT
rule of Strict Mode in Presto 0.208 to allow queries, which use a predicate expression on any partitioned column while scanning a partitioned table.DISALLOW_CROSS_JOIN: It restricts queries with a CROSS JOIN and thus such queries fail.
Example:
SELECT * FROM <TABLE_1> CROSS JOIN <TABLE_2>
- This query fails with an error:Cross joins are not allowed when strict mode is enabled
.LIMITED_SORT: Queries are allowed to sort only a limited number of output rows.
Example:
SELECT * FROM <TABLE_1> ORDER BY <COL_2>
- This query does fail displaying an error:Sorting without limit clause is not allowed when strict mode is enabled
.
Overtime, Qubole plans to extend this list of restrictions by adding more such constraints based on users’ feedback.
Configuring Presto Strict Mode
To enable Presto Strict Mode at the cluster level, set qubole-strict-mode-restrictions
in etc/config.properties
to add a semicolon separated list of restrictions.
Example:
config.properties:
qubole-strict-mode-restrictions= MANDATORY_PARTITION_CONSTRAINT;LIMITED_SORT
This restriction fails queries that are without partition constraints or the queries doing an unlimited SORT operation.
Values supported for qubole-strict-mode-restrictions
are:
NONE
MANDATORY_PARTITION_CONSTRAINT
DISALLOW_CROSS_JOIN
LIMITED_SORT
You can add any combination of the above values as a semicolon-separated list. But if you set NONE
as a value, then
other restrictions are not applied.
Note
In case, if a query violates the Presto strict mode conditions and if the Presto strict mode is not enabled, then Qubole displays warnings in the specific query’s query info.
To enable Presto Strict Mode at the account level, create a ticket with Qubole Support.