Configuring AWS Glue Data Catalog as a Metastore for Hive

Qubole supports AWS Glue Data Catalog as an external Hive metastore. When it is configured as the metastore, metadata is read from and written to the AWS Glue Data Catalog instead of the default Hive metastore. AWS Glue is supported only on Hive 2.3, Presto 0.208 and Presto 317 (beta), and Spark 2.4.0.

Note

This feature is not available by default. Create a ticket with Qubole Support to enable it on the Qubole account.

Prerequisites

  • You need AWS IAM Role authentication to access the Glue Data Catalog. Setting-up the Qubole Data Service describes how to set up an IAM Role-based QDS account.
  • Contact Qubole Support to enable AWS Glue on the QDS account.

Assuming you have an IAM Role-based Qubole account with the AWS Glue service enabled, follow these steps to configure and use AWS Glue as a metastore:

Step 1: Add the Policy to Access a Glue Data Catalog to the Existing IAM Role

  1. In the AWS console, go to the IAM service.
  2. Click the Roles tab in the left sidebar.
  3. In the role list, click the role that you used to authenticate in the Qubole account settings.
  4. Add an inline policy for the Glue Data Catalog by following these steps:
     1. In the Permissions tab, click + Add Inline policy.
     2. Click the JSON tab and add the following IAM permissions, which cover AWS Glue databases, tables, partitions, and user-defined functions:

    {
      "Version": "2012-10-17",
      "Statement": [
       {
         "Effect": "Allow",
         "Action": [
               "glue:CreateDatabase",
               "glue:UpdateDatabase",
               "glue:DeleteDatabase",
               "glue:GetDatabase",
               "glue:GetDatabases",
               "glue:CreateTable",
               "glue:UpdateTable",
               "glue:DeleteTable",
               "glue:GetTable",
               "glue:GetTables",
               "glue:GetTableVersions",
               "glue:CreatePartition",
               "glue:BatchCreatePartition",
               "glue:UpdatePartition",
               "glue:DeletePartition",
               "glue:BatchDeletePartition",
               "glue:GetPartition",
               "glue:GetPartitions",
               "glue:BatchGetPartition",
               "glue:CreateUserDefinedFunction",
               "glue:UpdateUserDefinedFunction",
               "glue:DeleteUserDefinedFunction",
               "glue:GetUserDefinedFunction",
               "glue:GetUserDefinedFunctions"
         ],
         "Resource": [
               "*"
         ]
       }
      ]
    }
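
Before pasting a policy into the console, it can help to sanity-check it locally. The following is a minimal Python sketch, not part of Qubole's or AWS's tooling: it parses a policy document and reports which actions from a required list are not granted by any Allow statement. The helper name `missing_actions`, the shortened `SAMPLE_POLICY`, and the `REQUIRED` list are hypothetical and exist only for illustration. The check compares action names literally and does not expand IAM wildcards such as `glue:*`.

```python
import json

def missing_actions(policy_json, required):
    """Return the required actions that no Allow statement in the policy lists.

    Hypothetical helper: names are compared literally, so wildcard
    actions such as "glue:*" are not expanded.
    """
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    granted = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        granted.update(actions)
    return sorted(set(required) - granted)

# Shortened sample policy for illustration; check the full policy in practice.
SAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
      "Resource": ["*"]
    }
  ]
}
"""

# Illustrative subset of the actions a workload might need.
REQUIRED = ["glue:GetDatabase", "glue:GetTable", "glue:CreateTable"]

if __name__ == "__main__":
    print(missing_actions(SAMPLE_POLICY, REQUIRED))  # ['glue:CreateTable']
```

An empty result means every action in the required list is named explicitly in the policy.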
    

Step 2: Ensure that the IAM Role Name is in the EC2 Policy

Verify that the IAM Role name appears in the EC2 policy. It most likely does already, because the role is used to authenticate with AWS services.
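
One way to make this check repeatable is to scan a saved copy of the EC2 policy for the role. The sketch below assumes the role is referenced by ARN in a statement's `Resource` field (for example, alongside `iam:PassRole`). That assumption, the function name `role_in_policy`, and the sample account ID and role name are illustrative and not taken from Qubole's documentation. Roles created with a path prefix are not handled.

```python
import json

def role_in_policy(policy_json, role_name):
    """Return True if any statement's Resource names the given IAM role.

    Assumption: the role appears as an ARN ending in ":role/<role_name>".
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is valid
        statements = [statements]
    suffix = ":role/" + role_name
    for stmt in statements:
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):  # "Resource" may be a string or a list
            resources = [resources]
        if any(res.endswith(suffix) for res in resources):
            return True
    return False

# Hypothetical EC2 policy fragment; account ID and role name are placeholders.
EC2_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["iam:PassRole", "iam:GetInstanceProfile"],
      "Resource": [
        "arn:aws:iam::123456789012:role/qubole-glue-role",
        "arn:aws:iam::123456789012:instance-profile/qubole-glue-role"
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    print(role_in_policy(EC2_POLICY, "qubole-glue-role"))
```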

Step 3: Ensure that the IAM Role is Added on the QDS

Verify that the IAM Role has been added on the QDS platform. In the QDS UI, navigate to Control Panel > Account Settings. Under Access Type, you can see the IAM Role details.

Step 4: Launch a Cluster with the IAM Role

After completing steps 1 through 3, you can launch a cluster and use the AWS Glue Data Catalog as a metastore.

Note

This FAQ provides details about the exception that is returned when the AWS account does not use the AWS Glue Data Catalog.

Limitations

Here are a few limitations on using AWS Glue as a metastore:

  • AWS Glue is supported only on Hive 2.3, Presto 0.208 and Presto 317 (beta), and Spark 2.4.0.
  • These are the limitations on the Qubole side:
    • The AWS Glue Data Catalog cannot serve as the Hive metastore when you run Hive on QDS servers.
    • The Qubole Explore UI does not display the AWS Glue Data Catalog. Hive metadata APIs are also not supported when the AWS Glue Data Catalog is used as an external Hive metastore.
    • A Hive query that is run on any Hive version other than 2.3 returns details from the Hive metastore rather than from the AWS Glue Data Catalog.
    • Exporting data to the AWS Glue metastore and importing data from the AWS Glue metastore are not supported.
  • IAM Roles is the only supported authentication type.
  • For limitations of using AWS Glue as a metastore for Hive, refer to considerations when using AWS Glue.
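
The version and authentication constraints can be captured in a small pre-flight check. The following Python sketch is only an illustration of that idea: the table mirrors the engine versions listed at the top of this page, while the function names and the `"iam_role"` label for the authentication type are hypothetical and do not correspond to any Qubole API. Versions are compared as exact strings.

```python
# Engine versions this page lists as supporting AWS Glue.
# Keys are (engine, version) pairs compared as exact strings.
GLUE_SUPPORT = {
    ("hive", "2.3"): "supported",
    ("presto", "0.208"): "supported",
    ("presto", "317"): "beta",
    ("spark", "2.4.0"): "supported",
}

def glue_support(engine, version):
    """Return "supported", "beta", or "unsupported" for an engine/version pair."""
    return GLUE_SUPPORT.get((engine.lower(), version), "unsupported")

def check_cluster(engine, version, auth_type):
    """Collect reasons a planned cluster cannot use Glue (empty list means OK).

    "iam_role" is a hypothetical label for IAM Roles authentication.
    """
    problems = []
    if glue_support(engine, version) == "unsupported":
        problems.append(f"{engine} {version} is not a Glue-supported version")
    if auth_type != "iam_role":
        problems.append("authentication type must be IAM Roles")
    return problems

if __name__ == "__main__":
    print(glue_support("Presto", "317"))
    print(check_cluster("hive", "1.2", "iam_keys"))
```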