Run Adhoc Scripts on a Cluster

To run an adhoc script on a cluster, use the REST API to execute a script that is stored in S3.

REST API Endpoint Details

  • Request Type: PUT
  • Request Endpoint: https://api.qubole.com/api/v2.2/clusters/9545/runscript.json/ (9545 is the cluster ID in this example)
  • Request Parameters: script: {S3 path of the script}

Note: The adhoc script is executed as the root user on the cluster.

Required Role

The following users can make this API call:

  • A user who is part of the system-admin group.
  • A user who is part of a group associated with a role that allows editing an existing cluster’s configuration. See Managing Groups and Managing Roles for more information.

You can use the curl utility to spawn adhoc scripts. The following command shows the syntax:

curl -X PUT -H "X-AUTH-TOKEN: $Auth-token" -H "Content-Type: application/json" -H "Accept: application/json" -d '{"script":"$script"}' "$Host-Name/api/v2.2/clusters/9545/runscript.json/"

Note: The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.
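The placeholders in the syntax above ($Auth-token, $Host-Name) are not valid shell variable names, and the single quotes around the request body prevent $script from being expanded. A script that calls this API therefore needs its own variables. The following is a minimal sketch of one way to do that. The function names and the AUTH_TOKEN environment variable are illustrative assumptions, not part of the Qubole API.

```shell
#!/usr/bin/env bash
# Sketch only: function names and AUTH_TOKEN are hypothetical.

# Build the JSON request body for a given S3 script path.
# Does not escape quotes, so the path must not contain any.
build_payload() {
  printf '{"script":"%s"}' "$1"
}

# Build the runscript URL from an API base URL and a cluster ID.
build_url() {
  printf '%s/clusters/%s/runscript.json/' "$1" "$2"
}

# Send the PUT request. Stops with an error if AUTH_TOKEN is unset.
run_adhoc_script() {
  local api_base="$1" cluster_id="$2" script_path="$3"
  curl -X PUT \
    -H "X-AUTH-TOKEN: ${AUTH_TOKEN:?set AUTH_TOKEN first}" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -d "$(build_payload "$script_path")" \
    "$(build_url "$api_base" "$cluster_id")"
}
```

With AUTH_TOKEN exported, a call would look like: run_adhoc_script "https://api.qubole.com/api/v2.2" 9545 "s3://paid-qubole/dataset/script_location/testScript.sh". Keeping the token in an environment variable avoids writing it into the script itself.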

Example:

curl -X PUT -H "X-AUTH-TOKEN: xxXXXXXxx" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"script":"s3://paid-qubole/dataset/script_location/testScript.sh"}' \
"https://api.qubole.com/api/v2.2/clusters/9545/runscript.json/"

After the script in the example is spawned successfully, the following message is displayed:

Successfully spawned script on the cluster, Please check the logs for each node at: /media/ephemeral0/logs/{GUID}

In case of any cluster issue, check this log location on the cluster's coordinator and worker nodes for error messages.

A cron job uploads the logs from each node to S3. The logs on S3 are at: s3://<DEFLOC>/logs/hadoop/CLUSTER_ID/CLUSTER_INST_ID/.

Where:

  • DEFLOC refers to the default location of an account.
  • CLUSTER_INST_ID is the cluster instance ID. For a running cluster or the last-terminated cluster, it is the latest folder in s3://<DEFLOC>/logs/hadoop/CLUSTER_ID/.
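The S3 path described above can be assembled in a script. The following sketch builds the prefix for one cluster and lists the instance folders under it. It assumes the AWS CLI is installed and configured with read access to the default location. The function names and the sample default location used below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Compose the S3 prefix that holds the instance folders of one cluster.
# $1 is DEFLOC without the s3:// scheme, $2 is the cluster ID.
cluster_log_prefix() {
  local defloc="${1%/}" cluster_id="$2"   # drop one trailing slash from DEFLOC
  printf 's3://%s/logs/hadoop/%s/' "$defloc" "$cluster_id"
}

# List the CLUSTER_INST_ID folders under the prefix (requires the AWS CLI).
list_cluster_instances() {
  aws s3 ls "$(cluster_log_prefix "$1" "$2")"
}
```

For example, cluster_log_prefix "my-bucket/qubole" 9545 prints s3://my-bucket/qubole/logs/hadoop/9545/, where my-bucket/qubole stands in for the account's real default location.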

Note that every execution of this API creates a new GUID, which distinguishes one execution of the API on the cluster from another. The logs for each execution are located in the corresponding GUID directory inside the /media/ephemeral0/logs/ directory.
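When the API is called from a script, the GUID can be recovered from the message that the call returns. The sketch below extracts the log directory and the GUID from that text. It assumes that the response contains the path in the form shown in the success message. The function names and the sample GUID used below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Read the API response on stdin and print the node log directory it mentions.
# Stops at whitespace or a double quote, in case the message is wrapped in JSON.
extract_log_dir() {
  grep -o '/media/ephemeral0/logs/[^[:space:]"]*'
}

# Read the API response on stdin and print only the GUID (last path component).
extract_guid() {
  basename "$(extract_log_dir)"
}
```

For example, piping a response that ends in /media/ephemeral0/logs/abc-123 through extract_guid prints abc-123, which is the directory name to look for on each node.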