Submit a Pig Command

POST /api/v1.2/commands/

This API is used to submit a Pig command.

Note

You can configure the Pig version on a Hadoop 2 (Hive) cluster. Pig 0.11 is the default version; Pig 0.15 and Pig 0.17 (beta) are also supported. When you select Pig 0.17 (beta), you can also choose between MapReduce and Tez as the execution engine. Pig 0.17 (beta) is supported only with Hive 1.2.0.

Required Role

The following users can make this API call:

  • Users who belong to the system-user or system-admin group.

  • Users who belong to a group associated with a role that allows submitting a command. See Managing Groups and Managing Roles for more information.

Parameters

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

  • script_location: S3 location of the Pig script. The request must contain either latin_statements or script_location.

  • parameters: JSON hash of Pig parameters.

  • latin_statements: Pig Latin statements to execute. The request must contain either latin_statements or script_location.

  • command_type: PigCommand

  • label: The label of the cluster on which this command is to be run.

  • retry: The number of retries for a job. Valid values are 1, 2, and 3.

  • retry_delay: The time interval, in minutes, between retries when a job fails.

  • name: A name for the command, useful when filtering commands in the command history. It cannot contain the special characters & (ampersand), < (less than), > (greater than), " (double quotes), and ' (single quote), or HTML tags. It can contain a maximum of 255 characters.

  • pool: The Fair Scheduler pool name for the command to use.

  • tags: Tags that make a command easily identifiable and searchable in the Commands History; a tag can also be used as a filter value when searching commands. Each tag can contain a maximum of 255 characters, and a comma-separated list of tags can be associated with a single command. Enclose each tag value in square brackets, for example, {"tags":["<tag-value>"]}.

  • macros: Macros, expressed as valid assignment statements that bind a variable to an expression: macros: [{"<variable>":<variable-expression>}, {..}]. You can add more than one variable. For more information, see Macros.

  • timeout: Timeout for command execution, in seconds. The default value is 129600 seconds (36 hours). QDS checks a command's timeout every 60 seconds, so a command whose timeout is set to 80 seconds is killed at the next check, that is, after up to 120 seconds. Set this parameter to prevent a command from running for the full 36 hours.
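Putting the optional parameters together, the request body is plain JSON. Below is a minimal sketch in Python of building such a body; all values are placeholders for illustration, and the snippet only constructs and prints the payload rather than submitting it:

```python
import json

# Hypothetical values for illustration only; replace with your own.
payload = {
    "command_type": "PigCommand",
    "script_location": "s3://paid-qubole/PigAPIDemo/scripts/script1-hadoop-s3-small.pig",
    "label": "default",                    # cluster label (assumed)
    "retry": 2,                            # retry up to 2 times on failure
    "retry_delay": 5,                      # wait 5 minutes between retries
    "name": "pig-demo",                    # shown in the command history
    "tags": ["demo"],                      # tag values go inside a list
    "macros": [{"run_date": "'2014-07-14'"}],  # one assignment per hash
    "timeout": 3600,                       # kill the command after ~1 hour
}
print(json.dumps(payload, indent=2))
```

Because json.dumps produces the body, quoting and escaping are handled automatically when this payload is passed to an HTTP client.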

Response

A JSON object representing the newly created command.

Examples

Files Used In The Examples

These files are cloned from the Apache Pig Tutorial.

  • Dataset: s3://paid-qubole/PigAPIDemo/data/excite-small.log

  • Parametrized Pig Script: s3://paid-qubole/PigAPIDemo/scripts/script1-hadoop-parametrized.pig

  • Java UDF jar: s3://paid-qubole/PigAPIDemo/jars/tutorial.jar

  • Pig Script: s3://paid-qubole/PigAPIDemo/scripts/script1-hadoop-s3-small.pig

Sample Request – Non-Parametrized script

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
"script_location":"s3://paid-qubole/PigAPIDemo/scripts/script1-hadoop-s3-small.pig",
"command_type": "PigCommand"}' "https://api.qubole.com/api/v1.2/commands"

Note

The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.

Sample Request – Parametrized script

export output_location=<your s3 output location>

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{
  "command_type": "PigCommand",
  "parameters": {
    "output": "s3://paid-qubole/PigAPIDemo/output",
    "input": "s3://paid-qubole/PigAPIDemo/data/excite-small.log",
    "udf_jar": "s3://paid-qubole/PigAPIDemo/jars/tutorial.jar"
  },
  "script_location": "s3://paid-qubole/PigAPIDemo/scripts/script1-hadoop-parametrized.pig"
}' "https://api.qubole.com/api/v1.2/commands"

Sample Response – Non-Parametrized script

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

 {
   "timeout": null,
   "id": 283032,
   "path": "/tmp/2014-07-14/235/283032",
   "end_time": null,
   "resolved_macros": null,
   "start_time": null,
   "name": null,
   "label": null,
   "meta_data": {
     "results_resource": "commands/283032/results",
     "logs_resource": "commands/283032/logs"
   },
   "can_notify": false,
   "nominal_time": null,
   "command": {
     "latin_statements": null,
     "script_location": "s3://paid-qubole/PigAPIDemo/scripts/script1-hadoop-s3-small.pig",
     "parameters": null
   },
   "command_type": "PigCommand",
   "pool": null,
   "user_id": 846,
   "num_result_dir": -1,
   "status": "waiting",
   "pid": null,
   "qlog": null,
   "created_at": "2014-07-14T06:53:08Z",
   "sequence_id": null,
   "submit_time": 1405320788,
   "progress": 0,
   "template": "generic",
   "qbol_session_id": 38395
 }
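The meta_data block in the response carries relative resource paths for the command's results and logs. A minimal sketch of extracting them, assuming the same api.qubole.com endpoint as the examples above (the response below is abridged from the sample response):

```python
import json

# Abridged sample response from above.
response_body = '''{
  "id": 283032,
  "status": "waiting",
  "meta_data": {
    "results_resource": "commands/283032/results",
    "logs_resource": "commands/283032/logs"
  }
}'''

resp = json.loads(response_body)
base = "https://api.qubole.com/api/v1.2/"  # endpoint assumed; see the note above
results_url = base + resp["meta_data"]["results_resource"]
logs_url = base + resp["meta_data"]["logs_resource"]
print(results_url)
print(logs_url)
```

The resulting URLs can then be polled with GET requests carrying the same X-AUTH-TOKEN header.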

Sample Request – Using latin_statements

For small scripts, it’s usually convenient to inline the script:

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{"latin_statements":"A = LOAD '\''s3://paid-qubole/PigAPIDemo/data/excite-small.log'\''; dump A;",
"command_type":"PigCommand"}' "https://api.qubole.com/api/v1.2/commands"

Even with short scripts, latin_statements in a curl request can be difficult to construct because of quoting and escaping issues. Instead, copy the script into a local file, say "script.pig", and try this:

Sample Request – Providing a local script file in the request

curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" \
--data-urlencode latin_statements@script.pig -d command_type=PigCommand \
"https://api.qubole.com/api/v1.2/commands"
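If you are scripting the call rather than using curl, letting a JSON library do the escaping sidesteps the quoting problem entirely. A minimal sketch in Python (the payload is built and printed but, for brevity, not sent):

```python
import json

# The quotes embedded in the Pig statement are escaped automatically
# by json.dumps, so no manual shell escaping is needed.
statements = "A = LOAD 's3://paid-qubole/PigAPIDemo/data/excite-small.log'; dump A;"
payload = json.dumps({
    "latin_statements": statements,
    "command_type": "PigCommand",
})
print(payload)  # safe to pass verbatim as the request body
```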

Note that all input/output data and UDF jars must be present in an S3 bucket.
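Since every referenced location must live in S3, a cheap pre-submit check is to verify that each URI in the request uses the s3:// scheme. A minimal sketch (this validates only the URI form, not that the object actually exists in the bucket):

```python
from urllib.parse import urlparse

def is_s3_uri(uri):
    """Return True if uri looks like s3://bucket/key."""
    parsed = urlparse(uri)
    return parsed.scheme == "s3" and bool(parsed.netloc)

# Hypothetical payload values for illustration.
locations = [
    "s3://paid-qubole/PigAPIDemo/scripts/script1-hadoop-s3-small.pig",
    "s3://paid-qubole/PigAPIDemo/jars/tutorial.jar",
]
assert all(is_s3_uri(loc) for loc in locations)
assert not is_s3_uri("/local/path/script.pig")
```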