Create a Schedule on Microsoft Azure
- POST /api/v1.2/scheduler/
This API creates a new schedule to run commands automatically at a specified frequency and interval.
Required Role
The following users can make this API call:
Users who belong to the system-user or system-admin group.
Users who belong to a group associated with a role that allows creating a schedule. See Managing Groups and Managing Roles for more information.
Parameters
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
Parameter | Description
---|---
**command_type** | A valid command type supported by Qubole, for example, HiveCommand, HadoopCommand, or PigCommand.
**command** | JSON object describing the command. Refer to the Command API for more details. Sub fields can use macros. Refer to the Qubole Scheduler for more details.
start_time | Start datetime for the schedule. When a cron expression is used, the scheduler calculates the Next Materialized Time (NMT)/start time from the current time (as the base time) and the cron expression; start_time is not honored with a cron expression.
end_time | End datetime for the schedule.
retry | Denotes the number of retries for a job. Valid values of retry are 1, 2, and 3. **Caution**: Configuring retries just does a blind retry of a Presto query. This may lead to data corruption for non-Insert Overwrite Directory (IOD) queries.
retry_delay | Denotes the time interval between the retries when a job fails.
frequency | Set this option or cron_expression, but not both. Together with time_unit, it defines how often the schedule runs.
time_unit | Denotes the time unit for the frequency. The default value is days.
cron_expression | Set this option or frequency, but not both. The expression follows standard cron syntax.
name | A user-defined name for a schedule. If name is not specified, a system-generated schedule ID is set as the name.
label | Specify a cluster label that identifies the cluster on which the schedule API call must be run.
macros | Expressions to evaluate macros. Macros can be used in parameterized commands. Refer to the Macros in Scheduler page for more details.
no_catch_up | Set this parameter to true to skip schedule instances that were missed, so that only the current instance runs. The default value is false.
time_zone | Timezone of the start and end time of the schedule. The scheduler understands ZoneInfo identifiers, for example, Asia/Kolkata. For a list of identifiers, see column 3 in the List of tz database time zones. The default value is UTC.
command_timeout | Command timeout, configurable in seconds. The default value is 129600 seconds (36 hours), and any value you set must be less than 36 hours. QDS checks a command's timeout every 60 seconds, so a timeout of 80 seconds causes the command to be killed at the next check, that is, after 120 seconds. Set this parameter to keep a command from running for the full 36 hours.
time_out | A number that represents the maximum amount of time, in minutes, the schedule should wait for dependencies to be satisfied.
concurrency | Specify how many schedule actions can run at a time. The default value is 1.
dependency_info | Describes dependencies for this schedule. See Hive Datasets as Schedule Dependency for more information.
notification | An optional parameter that is set to false by default. Set it to true to be notified through email about instance failure. The notification table below provides more information.
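Putting the parameters together, a minimal request payload can be sketched in Python as follows. This is a sketch, not an official client; the values are illustrative, and only command_type and command are mandatory:

```python
import json

# Minimal schedule payload assembled from the parameters above.
# command_type and command are mandatory; the rest fall back to their
# documented defaults when omitted.
payload = {
    "command_type": "HiveCommand",          # mandatory
    "command": {"query": "SHOW TABLES;"},   # mandatory; illustrative query
    "start_time": "2012-07-01T02:00Z",
    "end_time": "2022-07-01T02:00Z",
    "frequency": 1,                         # use frequency + time_unit ...
    "time_unit": "days",
    # "cron_expression": "0 2 * * *",       # ... or a cron expression, not both
    "time_zone": "UTC",                     # default
    "concurrency": 1,                       # default
    "command_timeout": 129600,              # default: 36 hours, in seconds
    "dependency_info": {},
}

# Serialized body for the POST request.
body = json.dumps(payload)
```
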
notification
Parameter | Description
---|---
is_digest | A notification email type that is set to false by default. Set it to true to receive a digest email instead of an email per schedule instance.
notify_failure | If this option is set to true, you receive schedule failure notifications.
notify_success | If this option is set to true, you receive schedule success notifications.
notification_channels | The notification channel IDs. To know more about how to get a notification channel ID, see Creating Notification Channels.
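As a sketch, the notification sub-object from this table can be assembled and sanity-checked before sending. The channel IDs are illustrative placeholders:

```python
# Sketch of the notification sub-object described above.
notification = {
    "is_digest": True,            # send a digest instead of per-instance mail
    "digest_time_hour": 4,        # 24-hour clock: 04:30 in the schedule's time zone
    "digest_time_minute": 30,
    "notify_failure": True,       # email on schedule failure
    "notify_success": False,      # no email on success
    "notification_channels": [728, 400],  # IDs from Creating Notification Channels
}

# Basic sanity checks before including it in the request payload.
assert 0 <= notification["digest_time_hour"] < 24
assert 0 <= notification["digest_time_minute"] < 60
```
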
dependency_info
Parameter | Sub-option | Description
---|---|---
files | | Use this parameter if there is a dependency on Azure Blob storage files. It has the following sub-options. For more information, see Configuring S3/Azure Blob Storage Files Data Dependency.
| path | The Azure Blob storage path of the dependent file (with data) based on which the schedule runs.
| window_start | Denotes the start day or time.
| window_end | Denotes the end day or time.
hive_tables | | Use this parameter if there is a dependency on Hive table data that has partitions. For more information, see Configuring Hive Tables Data Dependency.
| schema | The database that contains the partitioned Hive table.
| name | The name of the partitioned Hive table.
| window_start | Denotes the start day or time.
| window_end | Denotes the end day or time.
| interval | Denotes the dataset interval and defines how often the data is generated. Hive Datasets as Schedule Dependency provides more information. You must also specify the incremental time unit, for example, {"days": "1"} as shown in Example 4.
| column | Denotes the partitioned column name. You must specify the date-time mask as the column value, for example, {"dt": "%Y-%d"} as shown in Example 4.
Response
The response contains a JSON object representing the created schedule.
Note
There is a limit on the number of schedule reruns that can be processed concurrently at any given time. Understanding the Qubole Scheduler Concepts provides more information.
Example 1
Goal: Create a new schedule to run Hive queries.
This example uses the following table:
CREATE EXTERNAL TABLE daily_tick_data (
date2 string,
open float,
close float,
high float,
low float,
volume INT,
average FLOAT)
PARTITIONED BY (
stock_exchange STRING,
stock_symbol STRING,
year STRING,
date1 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 'wasb://paid-qubole/default-datasets/stock_ticker';
date1 is the date in the format YYYY-MM-DD.
The dataset is available from 2012-07-01.
For this example, assume that the dataset is updated every day at 1 AM UTC and the schedule runs every day at 2 AM UTC.
The query shown below aggregates the data for every stock symbol, every day.
Command
curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" -H "Content-type: application/json" \
-d '{
"command_type":"HiveCommand",
"command": {
"query": "select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date1='\''$formatted_date$'\'' group by stock_symbol"
},
"macros": [
{
"formatted_date": "Qubole_nominal_time.format('\''YYYY-MM-DD'\'')"
}
],
"notification":{"is_digest": false,
"notification_channels" : [728, 400],
"notify_failure": true, "notify_success": false},
"start_time": "2012-07-01T02:00Z",
"end_time": "2022-07-01T02:00Z",
"frequency": 1,
"time_unit": "days",
"time_out":10,
"command_timeout":36000,
"dependency_info": {}
}' \
"https://api.qubole.com/api/v1.2/scheduler"
Note
The above syntax uses https://api.qubole.com as the endpoint. Qubole provides other endpoints to access QDS that are described in Supported Qubole Endpoints on Different Cloud Providers.
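As a rough illustration of the macro mechanics (this is a sketch, not Qubole's actual macro engine): the scheduler evaluates formatted_date from the nominal run time and substitutes every $formatted_date$ token in the command before it runs:

```python
from datetime import datetime, timezone

# Nominal run time of the first schedule instance in Example 1.
nominal_time = datetime(2012, 7, 1, 2, 0, tzinfo=timezone.utc)

# 'YYYY-MM-DD' in the macro corresponds to strftime's '%Y-%m-%d'.
macros = {"formatted_date": nominal_time.strftime("%Y-%m-%d")}

query = ("select stock_symbol, max(high), min(low), sum(volume) "
         "from daily_tick_data where date1='$formatted_date$' group by stock_symbol")

# Substitute each $name$ token with its evaluated macro value.
for name, value in macros.items():
    query = query.replace("$%s$" % name, value)
```
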
Sample Response
{
"time_out":10,
"status":"RUNNING",
"start_time":"2012-07-01 02:00",
"label":"default",
"concurrency":1,
"frequency":1,
"no_catch_up":false,
"template":"generic",
"command":{
"sample":false,"loader_table_name":null,"md_cmd":null,"script_location":null,"approx_mode":false,"query":"select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date1='$formatted_date$' group by stock_symbol","loader_stable":null,"approx_aggregations":false
},
"command_timeout":"36000",
"time_zone":"UTC",
"time_unit":"days",
"end_time":"2022-07-01 02:00",
"user_id":108,
"macros":[{"formatted_date":"Qubole_nominal_time.format('YYYY-MM-DD')"}],
"incremental":{},
"command_type":"HiveCommand",
"name":"3159",
"dependency_info":{},
"id":3159,
"next_materialized_time":null,
"pool": null,
"is_digest": false,
"can_notify": false,
"digest_time_hour": 0,
"digest_time_minute": 0,
"email_list": "[email protected]",
"bitmap": 0
}
Note the schedule ID (in this case 3159), which is used in other examples.
export SCHEDID=3159
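The schedule ID can also be pulled out of the response programmatically. A minimal sketch using only the relevant fields of the sample response above:

```python
import json

# Truncated sample response body with just the fields used here.
response_body = '{"id": 3159, "name": "3159", "status": "RUNNING"}'

# The id field is what later view/edit/rerun calls operate on.
schedule = json.loads(response_body)
schedule_id = schedule["id"]
```
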
Example 2
Here is an API sample request that has notification parameters set.
curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" -H "Content-type: application/json" \
-d '{
"command_type":"HiveCommand",
"command": {
"query": "select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date1='\''$formatted_date$'\'' group by stock_symbol"
},
"macros": [
{
"formatted_date": "Qubole_nominal_time.format('\''YYYY-MM-DD'\'')"
}
],
"notification":{"is_digest": true, "digest_time_hour": 4, "digest_time_minute": 30,
"notification_channels" : [728, 400],
"notify_failure": true, "notify_success": false},
"start_time": "2012-07-01T02:00Z",
"end_time": "2022-07-01T02:00Z",
"frequency": 1,
"time_unit": "days",
"time_out":10,
"dependency_info": {}
}' \
"https://api.qubole.com/api/v1.2/scheduler"
Example 3
Here is an API sample request that has dependency on files on Azure blob storage.
curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" -H "Content-type: application/json" \
-d '{
"command_type":"HiveCommand",
"command": {
"query": "select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date1='\''$formatted_date$'\'' group by stock_symbol"
},
"macros": [
{
"formatted_date": "Qubole_nominal_time.format('\''YYYY-MM-DD'\'')"
}
],
"notification":{"is_digest": true, "digest_time_hour": 4, "digest_time_minute": 30,
"notification_channels" : [728, 400],
"notify_failure": true, "notify_success": false},
"start_time": "2012-07-01T02:00Z",
"end_time": "2022-07-01T02:00Z",
"frequency": 1,
"time_unit": "days",
"time_out":10,
"dependency_info": {
"files": [
{
"path" : "wasb://<your wasb bucket>/data/data1_30days/170614/",
"window_start": -29,
"window_end": 0
}
]
}
}' \
"https://api.qubole.com/api/v1.2/scheduler"
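In this request, window_start of -29 and window_end of 0 make the schedule wait for 30 daily inputs relative to each run date. The date range the scheduler would check can be sketched as follows (day granularity is an assumption based on this daily example, and the helper name is illustrative):

```python
from datetime import date, timedelta

# For a nominal run date, offsets window_start..window_end (inclusive)
# cover the dependent dates. With -29..0 that is the 30 days ending on
# the run date itself.
def dependency_dates(nominal, window_start=-29, window_end=0):
    return [nominal + timedelta(days=offset)
            for offset in range(window_start, window_end + 1)]

# Run date taken from the '170614' path segment in the example above.
dates = dependency_dates(date(2017, 6, 14))
```
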
Example 4
Here is an API sample request that has dependency on partitioned columns of a Hive table.
curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" -H "Content-type: application/json" \
-d '{
"command_type":"HiveCommand",
"command": {
"query": "select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date1='\''$formatted_date$'\'' group by stock_symbol"
},
"macros": [
{
"formatted_date": "Qubole_nominal_time.format('\''YYYY-MM-DD'\'')"
}
],
"notification":{"is_digest": true, "digest_time_hour": 4, "digest_time_minute": 30,
"notification_channels" : [728, 400],
"notify_failure": true, "notify_success": false},
"start_time": "2018-02-12 00:00",
"end_time": "2021-02-12 00:00",
"frequency": 12,
"time_unit": "months",
"time_out":10,
"dependency_info": {
"hive_tables":[
{"schema":"daily_tick_data","name":"daily_cluster_nodes","window_start":"-1","window_end":"-1","interval":{"days":"1"},"columns":{"dt":"%Y-%d","source":[""]}}]
}
}' \
"https://api.qubole.com/api/v1.2/scheduler"
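The columns map pairs each partition column with a date-time mask; the scheduler substitutes each window date into the mask to derive the partition value it waits for. A sketch using the %Y-%d mask from the request above (mask semantics follow standard strftime directives; the helper name is illustrative):

```python
from datetime import date

# Format a window date with the column's date-time mask to get the
# partition value the scheduler looks for.
def partition_value(day, mask="%Y-%d"):
    return day.strftime(mask)

# Using the example's start date 2018-02-12 and the '%Y-%d' mask.
value = partition_value(date(2018, 2, 12))
```
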
Example 5
Here is an API sample request to schedule a workflow command.
curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" -H "Content-type: application/json" \
-d '{
"command_type": "CompositeCommand",
"command": {
"sub_commands":
[{
"query": "select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date1='\''$formatted_date$'\'' group by stock_symbol",
"command_type": "HiveCommand"
},
{
"query": "select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date2='\''$formatted_date$'\'' group by stock_symbol",
"command_type": "HiveCommand"
}
]
},
"start_time": "2012-07-01T02:00Z",
"end_time": "2022-07-01T02:00Z",
"frequency": 1,
"time_unit": "days",
"time_out": 10,
"dependency_info": {}
}' \
"https://api.qubole.com/api/v1.2/scheduler"