Create a Cluster on Microsoft Azure
- POST /api/v2/clusters/
Use this API to create a new cluster when you are using Qubole on the Azure cloud. You typically create a new cluster for a workload that must run in parallel with your pre-existing workloads — for example, to run workloads in different geographical locations — though there can be other reasons for creating a new cluster.
Required Role
The following users can make this API call:
- Users who belong to the system-user or system-admin group. 
- Users who belong to a group associated with a role that allows creating a cluster. See Managing Groups and Managing Roles for more information. 
Parameters
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
| Parameter | Description | 
|---|---|
| **label** | A list of labels that identify the cluster. At least one label must be provided when creating a cluster. | 
| cloud_config | It contains the cloud-specific (Azure) configuration of the cluster. | 
| cluster_info | It contains the configurations of a cluster. | 
| engine_config | It contains the configurations of the type of cluster. | 
| monitoring | It contains the cluster monitoring configuration. | 
| security_settings | It contains the security settings for the cluster. | 
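Put together, these top-level sections form the body of a v2 cluster-creation request. A minimal outline, with placeholder values, looks like:

```json
{
  "cloud_config": {
    "provider": "azure"
  },
  "cluster_info": {
    "label": ["<cluster label>"]
  },
  "engine_config": {
    "flavour": "hadoop2"
  },
  "monitoring": {
    "ganglia": false
  },
  "security_settings": {
    "ssh_public_key": "<OpenSSH-format public key>"
  }
}
```

Each section is described in detail in the tables that follow.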
cloud_config
| Parameter | Description | 
|---|---|
| provider | It defines the cloud provider. Set it to `azure`. | 
| compute_config | It defines the Azure account compute credentials for the cluster. | 
| location | It is used to set the geographical Azure location.  | 
| network_config | It defines the network configuration for the cluster. | 
| storage_config | It defines the Azure account storage credentials for the cluster. | 
compute_config
| Parameter | Description | 
|---|---|
| compute_validated | It denotes whether the credentials are validated or not. The default value is `false`. | 
| use_account_compute_creds | Set it to `true` to use the account-level compute credentials. The default value is `false`. | 
| compute_client_id | The client ID of the Azure Active Directory application that has permissions over the subscription. It is required when `use_account_compute_creds` is set to `false`. | 
| compute_client_secret | The client secret of the Azure Active Directory application. It is required when `use_account_compute_creds` is set to `false`. | 
| compute_tenant_id | The tenant ID of the Azure Active Directory. It is required when `use_account_compute_creds` is set to `false`. | 
| compute_subscription_id | The subscription ID of the Azure account where you want to create the compute resources. It is required when `use_account_compute_creds` is set to `false`. | 
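For example, if the account-level compute credentials have already been validated, a cluster can reuse them and the per-cluster credential fields can be omitted:

```json
"compute_config": {
  "use_account_compute_creds": true
}
```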
network_config
| Parameter | Description | 
|---|---|
| vnet_name | Set the virtual network. | 
| subnet_name | Set the subnet. | 
| vnet_resource_group_name | Set the resource group of your virtual network. | 
| bastion_node_public_dns | It is the public DNS/IP address of the bastion node used to access clusters in a private subnet, if required. | 
| persistent_security_group_name | It is the network security group name on the Azure account. | 
| persistent_security_group_resource_group_name | It is the resource group of the network security group of the Azure account. | 
storage_config
| Parameter | Description | 
|---|---|
| disk_storage_account_name | Set your Azure storage account. You must configure either this parameter or `managed_disk_account_type`, but not both. | 
| disk_storage_account_resource_group_name | Set your Azure disk storage account resource group name. | 
| managed_disk_account_type | You can set it if you do not want to configure disk storage account details. Its accepted values are `standard_lrs` and `premium_lrs`. | 
| data_disk_count | It is the number of reserved disks to be attached to each cluster node; so, for example, choosing a Data Disk Count of 2 in a four-node cluster will provision eight disks in all. | 
| data_disk_size | It is used to set the Data Disk Size in gigabytes (GB). The default size is 256 GB. | 
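For example, a storage_config that uses managed disks instead of a disk storage account; with `data_disk_count` set to 2, a four-node cluster is provisioned with eight data disks of 256 GB each:

```json
"storage_config": {
  "managed_disk_account_type": "premium_lrs",
  "data_disk_count": 2,
  "data_disk_size": 256
}
```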
cluster_info
| Parameter | Description | 
|---|---|
| label | A cluster can have one or more labels separated by commas. You can make a cluster the default cluster by including the label “default”. | 
| master_instance_type | To change the coordinator node type from the default (Standard_A5), specify a different instance type. | 
| slave_instance_type | To change the worker node type from the default (Standard_A5), specify a different instance type. | 
| min_nodes | Enter the minimum number of worker nodes if you want to change it (the default is 1). | 
| max_nodes | Enter the maximum number of worker nodes if you want to change it (the default is 1). | 
| node_bootstrap | You can append the name of a node bootstrap script to the default path. | 
| disallow_cluster_termination | Set it to `true` to prevent the cluster from being terminated automatically. The default value is `false`. | 
| custom_tags | It is an optional parameter. Its value contains one or more `<tag>`/`<value>` pairs. | 
| rootdisk | Use this parameter to configure the root volume of cluster instances. You must configure its size within this parameter. The supported range for the root volume size is  | 
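A cluster_info fragment combining the parameters above might look as follows. The nested shapes of `custom_tags` and `rootdisk` are illustrative: `custom_tags` holds tag/value pairs and `rootdisk` carries the root volume size in GB.

```json
"cluster_info": {
  "label": ["default", "azure1"],
  "min_nodes": 1,
  "max_nodes": 4,
  "node_bootstrap": "node_bootstrap.sh",
  "disallow_cluster_termination": true,
  "custom_tags": {"<tag>": "<value>"},
  "rootdisk": {"size": 100}
}
```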
engine_config
| Parameter | Description | 
|---|---|
| flavour | It denotes the type of cluster. The supported values include `hadoop2`, `presto`, `spark`, and `airflow`. | 
| airflow_settings | It provides a list of Airflow-specific configurable sub-options. | 
| hadoop_settings | It provides a list of Hadoop-specific configurable sub-options. | 
| presto_settings | It provides a list of Presto-specific configurable sub-options. | 
| spark_settings | It provides a list of Spark-specific configurable sub-options. | 
| hive_settings | It provides a list of HiveServer2-specific configurable sub-options. | 
hadoop_settings
| Parameter | Description | 
|---|---|
| custom_hadoop_config | The custom Hadoop configuration overrides. The default value is blank. | 
| fairscheduler_settings | The fair scheduler configuration options. | 
fairscheduler_settings
| Parameter | Description | 
|---|---|
| fairscheduler_config_xml | The XML string, with custom configuration parameters, for the fair scheduler. The default value is blank. | 
| default_pool | The default pool for the fair scheduler. The default value is blank. | 
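For example, a hadoop_settings block that overrides one Hadoop property and sets a default fair scheduler pool (the XML string and pool name are placeholders):

```json
"hadoop_settings": {
  "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3",
  "fairscheduler_settings": {
    "fairscheduler_config_xml": "<allocations>...</allocations>",
    "default_pool": "batch"
  }
}
```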
presto_settings
| Parameter | Description | 
|---|---|
| presto_version | Specify the Presto version to be used on the cluster. The default version is  | 
| custom_presto_config | Specifies the custom Presto configuration overrides. The default value is blank. | 
spark_settings
| Parameter | Description | 
|---|---|
| zeppelin_interpreter_mode | The default mode is  | 
| custom_spark_config | Specify the custom Spark configuration overrides. The default value is blank. | 
| spark_version | It is the Spark version used on the cluster. The default version is  | 
monitoring
| Parameter | Description | 
|---|---|
| enable_ganglia_monitoring | Enable Ganglia monitoring for the cluster. The default value is `false`. | 
security_settings
| Parameter | Description | 
|---|---|
| ssh_public_key | SSH key to use to login to the instances. The default value is none. (Note: This parameter is not visible to non-admin users.) The SSH key must be in the OpenSSH format and not in the PEM/PKCS format. | 
airflow_settings
The following table contains engine_config for an Airflow cluster.
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
| Parameter | Description | 
|---|---|
| dbtap_id | ID of the data store inside QDS. Set it to  | 
| fernet_key | Encryption key for sensitive information inside the Airflow database, for example, user passwords and connections. It must be 32 url-safe base64-encoded bytes. | 
| type | Engine type. It is `airflow` for an Airflow cluster. | 
| version | The default version is 1.10.0 (stable version). The other supported stable versions are 1.8.2 and 1.10.2. All the Airflow versions are compatible with MySQL 5.6 or higher. | 
| airflow_python_version | Supported versions are 3.5 (supported using package management) and 2.7. To know more, see Configuring an Airflow Cluster. | 
| overrides | Airflow configuration to override the default settings. Specify overrides as `<section>.<key>=<value>` pairs, one per line. | 
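Based on the table above, an engine_config for an Airflow cluster might look as follows (the data store ID and fernet key are placeholders):

```json
"engine_config": {
  "flavour": "airflow",
  "airflow_settings": {
    "dbtap_id": "<data store ID>",
    "fernet_key": "<32-byte url-safe base64 key>",
    "version": "1.10.0",
    "airflow_python_version": "3.5"
  }
}
```

A fernet key can be generated with, for example, `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"` (requires the `cryptography` package).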
engine_config to enable HiveServer2 on a Hadoop 2 (Hive) Cluster
You can enable HiveServer2 on a Hadoop 2 (Hive) cluster. The following table contains the engine_config for enabling
HiveServer2 on a cluster. Other HiveServer2 settings are configured under the hive_settings parameter. For more
information on HiveServer2 in QDS, see Configuring a HiveServer2 Cluster.
This is an additional setting in the Hadoop 2 request API for enabling HiveServer2; the other settings explained in Parameters must be added as well.
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
| Parameter | Sub-parameter | Description | 
|---|---|---|
| hive_settings | is_hs2 | Set it to `true` to enable HiveServer2 on the cluster. The default value is `false`. | 
| | hive_version | It is the Hive version that supports HiveServer2, for example, `2.3`. | 
| | hive.qubole.metadata.cache | This parameter enables Hive metadata caching, which reduces split computation time for ORC files. This feature is not available by default. Create a ticket with Qubole Support to use this feature on the QDS account. Set it to `true` to enable metadata caching. | 
| | hs2_thrift_port | It is used to set the HiveServer2 port. The default port is `10003`. | 
| | overrides | Hive configuration to override the default settings. | 
| flavour | | It denotes the cluster type. It is `hadoop2`. | 
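As shown in the sample request further below, an engine_config that enables HiveServer2 on a Hadoop 2 cluster looks like this:

```json
"engine_config": {
  "flavour": "hadoop2",
  "hive_settings": {
    "is_hs2": true,
    "hive_version": "2.3",
    "hs2_thrift_port": 10003,
    "overrides": "hive.server2.a=dummy"
  }
}
```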
Request API Syntax
If `use_account_compute_creds` is set to `true`, the per-cluster compute credentials do not need to be set.
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
     "cloud_config" : {
       "provider" : "azure",
       "compute_config" : {
                      "compute_validated": "<default is false; set it to true>",
                     "use_account_compute_creds": false,
                     "compute_client_id": "<your client ID>",
                     "compute_client_secret": "<your client secret key>",
                     "compute_tenant_id": "<your tenant ID>",
                     "compute_subscription_id": "<your subscription ID>"
               },
               "location": {
                     "location": "centralus"
                  },
               "network_config" : {
                     "vnet_name" : "<vpc name>",
                         "subnet_name": "<subnet name>",
                         "vnet_resource_group_name": "<vnet resource group name>",
                         "bastion_node_public_dns": "<bastion node public dns>",
                        "persistent_security_groups": "<persistent security group>",
                        "master_elastic_ip": ""
               },
               "storage_config" : {
                     "disk_storage_account_name": "<Disk storage account name>",
                     "disk_storage_account_resource_group_name": "<Disk account resource group name>",
         //You can either configure "disk_storage_account_name" or "managed_disk_account_type"
         "managed_disk_account_type":"<standard_lrs/premium_lrs>",
         "data_disk_count":"<Count>",
         "data_disk_size":"<Disk Size>"
         }
         },
     "cluster_info": {
          "master_instance_type": "Standard_A6",
          "slave_instance_type": "Standard_A6",
          "label": ["azure1"],
          "min_nodes": 1,
          "max_nodes": 2,
          "cluster_name": "Azure1",
           "node_bootstrap": "node_bootstrap.sh"
           },
     "engine_config": {
          "flavour": "hadoop2",
          "hadoop_settings": {
             "custom_hadoop_config": <default is null>,
             "fairscheduler_settings": {
                "default_pool": <default is null>
             }
          }
     },
      "monitoring": {
             "ganglia": <default is false; set it to true>
            }
      }' "https://azure.qubole.com/api/v2/clusters"
Sample API Request
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
     "cloud_config" : {
       "provider" : "azure",
       "compute_config" : {
                      "compute_validated": false,
                      "use_account_compute_creds": false,
                     "compute_client_id": "<your client ID>",
                     "compute_client_secret": "<your client secret key>",
                     "compute_tenant_id": "<your tenant ID>",
                     "compute_subscription_id": "<your subscription ID>"
               },
       "location": {
                     "location": "centralus"
               },
       "network_config" : {
                     "vnet_name" : "<vpc name>",
                         "subnet_name": "<subnet name>",
                         "vnet_resource_group_name": "<vnet resource group name>",
                         "persistent_security_groups": "<persistent security group>"
               },
       "storage_config" : {
                     "storage_access_key": "<your storage access key>",
                     "storage_account_name": "<your storage account name>",
                     "disk_storage_account_name": "<your disk storage account name>",
                      "disk_storage_account_resource_group_name": "<your disk storage account resource group name>",
                      "data_disk_count": 4,
                      "data_disk_size": 300
               }
     },
     "cluster_info": {
          "master_instance_type": "Standard_A6",
          "slave_instance_type": "Standard_A6",
          "label": ["azure1"],
          "min_nodes": 1,
          "max_nodes": 2,
          "cluster_name": "Azure1",
          "node_bootstrap": "node_bootstrap.sh"
          },
     "engine_config": {
          "flavour": "hadoop2",
            "hadoop_settings": {
                "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"
             },
           "hive_settings":{
            "is_hs2":true,
            "hive_version":"2.3",
            "overrides":"hive.server2.a=dummy",
            "is_metadata_cache_enabled":false,
            "execution_engine":"tez",
            "hs2_thrift_port":10003
            }
           },
      "monitoring": {
             "ganglia": true
            }
     }' "https://azure.qubole.com/api/v2/clusters"
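The same request can also be issued programmatically. Below is a minimal Python sketch: the payload mirrors the sample request above, the `X-AUTH-TOKEN` header is the same as in the curl examples, and error handling is omitted. The helper names are illustrative, not part of any Qubole SDK.

```python
import json
import urllib.request

def build_cluster_payload(label, location="centralus", min_nodes=1, max_nodes=2):
    """Assemble a minimal v2 cluster-creation body (credentials omitted)."""
    return {
        "cloud_config": {
            "provider": "azure",
            "location": {"location": location},
        },
        "cluster_info": {
            "label": [label],
            "min_nodes": min_nodes,
            "max_nodes": max_nodes,
        },
        "engine_config": {"flavour": "hadoop2"},
        "monitoring": {"ganglia": True},
    }

def create_cluster(api_token, payload,
                   endpoint="https://azure.qubole.com/api/v2/clusters"):
    """POST the JSON payload with the X-AUTH-TOKEN header used above."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-AUTH-TOKEN": api_token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
    # Returns the HTTP response object; raises urllib.error.HTTPError on 4xx/5xx.
    return urllib.request.urlopen(req)
```

This mirrors the curl invocation one-to-one; extend `build_cluster_payload` with `compute_config`, `storage_config`, and the other sections described above as needed.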