Using the Qubole Presto Server Bootstrap

The Qubole Presto Server Bootstrap is an alternative to the node bootstrap script for installing external jars, such as presto-udfs, before the Presto server starts. The Presto server comes up before the node bootstrap process completes, so installing external jars (for example, Presto UDFs) through the node bootstrap requires an explicit restart of the Presto daemons. This is problematic because the server may already be running a task, and restarting the Presto daemons can cause query failures. The Qubole Presto Server Bootstrap is therefore better suited for such changes.

The Qubole Presto Server Bootstrap is only supported in Presto 0.180 and later versions.

Warning

Use the Qubole Presto Server Bootstrap only if you need to run a script before the Presto server starts. Any script in this bootstrap increases the time taken to bring up the Presto server, and therefore the time before the server can accept a query. If nothing in the current cluster node bootstrap script requires a restart of the Presto daemons to pick up changes, use only the cluster’s node bootstrap.

There are two ways to define the Qubole Presto Server Bootstrap:

  • bootstrap.properties - Contains the bootstrap script inline.

  • bootstrap-file-path - The location, in cloud object storage, of the Presto Server Bootstrap file that contains the bootstrap script. Specifying a bootstrap-file-path is recommended when the script is long.

To configure the Qubole Presto Server Bootstrap for a given cluster, use one of the following methods:

  • Through the cluster UI, add it in Advanced Configuration > PRESTO SETTINGS > Override Presto Configuration.

  • Through the REST API, add it using the custom_config parameter under presto_settings. For more information, see presto_settings.

Caution

The Qubole Presto Server Bootstrap eliminates the need to restart the Presto daemons. Do not include any explicit commands to restart or exit the Presto server in the bootstrap script. The Presto server is brought up only after the Server Bootstrap has executed successfully, so verify that the bootstrap script contains no errors. In addition, if any script or part of a script is migrated or copied from the existing cluster node bootstrap, remove it from the node bootstrap or modify it appropriately so that the same script does not run twice.
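
One way to act on the caution above is to check a script locally before adding it to the cluster configuration. The following is a minimal sketch, not a Qubole tool: check_bootstrap_script is a hypothetical helper name, and the pattern used to spot restart commands is a rough heuristic chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a Presto Server Bootstrap script.
# Usage: check_bootstrap_script path/to/script.sh
check_bootstrap_script() {
  local script="$1"

  # bash -n parses the script without executing it, so syntax errors
  # surface before the script ever runs on a cluster node.
  if ! bash -n "$script" 2>/dev/null; then
    echo "syntax error in $script"
    return 1
  fi

  # Heuristic (an assumption, not a Qubole rule): flag lines that mention
  # presto together with restart/stop/kill, because the bootstrap must not
  # restart or exit the Presto server.
  if grep -Eiq 'presto.*(restart|stop|kill)|(restart|stop|kill).*presto' "$script"; then
    echo "possible Presto restart/stop command in $script"
    return 2
  fi

  echo "ok"
}
```

A syntax check only catches parse errors. Commands that fail at run time, such as a wrong bucket path, still need to be verified on a test cluster.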

Example of a Bootstrap Script Specified in the bootstrap.properties

bootstrap.properties:
mkdir /usr/lib/presto/plugin/udfs
hadoop dfs -get <scheme>bucket/udfs_custom.jar /usr/lib/presto/plugin/udfs/

Example of Specifying a Qubole Presto Server Bootstrap Location

In these examples, <scheme> is the cloud-specific URI scheme: s3:// for AWS; wasb[s]://, adl://, or abfs[s]:// for Azure.

bootstrap-file-path:
<scheme>bucket/existing-node-bootstrap-file.sh

The existing-node-bootstrap-file.sh can contain the same script that is shown in Example of a Bootstrap Script Specified in the bootstrap.properties. For example, viewing the content of existing-node-bootstrap-file.sh gives output such as the following.

$ hadoop fs -cat <scheme>bucket/existing-node-bootstrap-file.sh
mkdir /usr/lib/presto/plugin/udfs
hadoop dfs -get <scheme>bucket/udfs_custom.jar /usr/lib/presto/plugin/udfs/
$
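
The script shown above stops being safe to re-run once the udfs directory exists, because a plain mkdir fails on an existing directory. The sketch below is one possible re-runnable variant. It assumes the hadoop command is on the PATH of the node, and install_udf_jar is a hypothetical helper name rather than anything provided by Qubole or Presto. How the Server Bootstrap reacts to a non-zero exit status is not covered here.

```shell
#!/usr/bin/env bash
# Sketch of a re-runnable variant of the bootstrap script.
# install_udf_jar is a hypothetical helper, not part of Qubole or Presto.
install_udf_jar() {
  local jar_uri="$1"      # for example <scheme>bucket/udfs_custom.jar
  local plugin_dir="$2"   # for example /usr/lib/presto/plugin/udfs

  # -p keeps mkdir from failing when the directory already exists.
  mkdir -p "$plugin_dir" || return 1

  # Skip the download if a jar with the same name is already in place.
  local jar_name
  jar_name="$(basename "$jar_uri")"
  if [ -f "$plugin_dir/$jar_name" ]; then
    return 0
  fi

  # Copy the jar from object storage; a failed copy returns non-zero.
  hadoop fs -get "$jar_uri" "$plugin_dir/" || return 1
}
```

In a bootstrap file, the function definition would be followed by a single call, for example install_udf_jar "<scheme>bucket/udfs_custom.jar" /usr/lib/presto/plugin/udfs.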

Using Presto UDFs as a Bootstrap Script

Presto on Qubole provides UDFs as external jars, presto-udfs. You can add them through a Presto Server Bootstrap under Advanced Configuration > PRESTO SETTINGS of the Presto cluster UI. Pick one of the following UDF jars based on your Presto version and pass it as an override in the Override Presto Configuration text box:

Note

The Presto UDF jars below are stored in AWS S3.

  • UDFs for Presto version 0.208

    bootstrap.properties:
    mkdir /usr/lib/presto/plugin/udfs
    hadoop dfs -get s3://paid-qubole/presto-udfs/udfs-2.0.3.jar /usr/lib/presto/plugin/udfs/
    
  • UDFs for Presto version 317

    bootstrap.properties:
    mkdir /usr/lib/presto/plugin/udfs
    hadoop dfs -get s3://paid-qubole/presto-udfs/udfs-3.0.0.jar /usr/lib/presto/plugin/udfs/
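
If the same bootstrap file is shared across clusters that run different Presto versions, the version-to-jar mapping listed above can be kept in one place. This is a sketch only: udf_jar_for_presto is a hypothetical helper name, and the Presto version must be supplied by the caller, since this page does not describe a way to detect it.

```shell
#!/usr/bin/env bash
# Map a Presto version to the UDF jar listed for it on this page.
# udf_jar_for_presto is a hypothetical helper name.
udf_jar_for_presto() {
  case "$1" in
    0.208) echo "s3://paid-qubole/presto-udfs/udfs-2.0.3.jar" ;;
    317)   echo "s3://paid-qubole/presto-udfs/udfs-3.0.0.jar" ;;
    *)
      # Only the two versions above are listed; anything else is an error.
      echo "no UDF jar listed for Presto version $1" >&2
      return 1
      ;;
  esac
}
```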
    

Presto Server Bootstrap Logs

An ec2-user can see the Presto Server Bootstrap logs in /media/ephemeral0/presto/var/log/bootstrap.log. The QDS account admin can see the Presto Server Bootstrap logs by logging into the cluster when the Customer Public SSH Key is configured in the cluster’s security settings. For more information, see Advanced Configuration: Modifying Security Settings (AWS).
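
After logging into a node, the log can be inspected with standard tools. The following is a minimal sketch: check_bootstrap_log is a hypothetical helper name, and the keywords it searches for are an assumption about what failed commands write to the log, not a documented log format.

```shell
#!/usr/bin/env bash
# Default log location taken from the paragraph above.
BOOTSTRAP_LOG="${BOOTSTRAP_LOG:-/media/ephemeral0/presto/var/log/bootstrap.log}"

# Print the tail of the bootstrap log, then list lines that look like errors.
# Returns 1 if the log is unreadable, 2 if suspicious lines are found.
check_bootstrap_log() {
  local log="${1:-$BOOTSTRAP_LOG}"

  if [ ! -r "$log" ]; then
    echo "cannot read $log"
    return 1
  fi

  tail -n 20 "$log"

  # Assumed keywords; adjust them to match what the script actually prints.
  if grep -Eiq 'error|no such file|permission denied' "$log"; then
    echo "--- possible errors ---"
    grep -Ein 'error|no such file|permission denied' "$log"
    return 2
  fi
}
```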

For information on how to log into the clusters, see: