Creating Dual IAM Roles for QDS

Qubole supports creating two IAM roles as part of IAM role authentication for a single Qubole account. With this approach, you can restrict data access to only those commands and queries that must operate on that data, and deny all other access to it for every Qubole user in the account.

Note

Choosing between a Cross-account IAM Role and Dual IAM Roles highlights the difference between the cross-account IAM Role-based authentication and the dual IAM Roles-based authentication.

The functions of the two IAM roles are described below.

Cross Account IAM Role

The cross-account IAM role is a cross-AWS-account role created at the account level as described in Creating a Cross-account IAM Role for QDS. Setting-up the Qubole Data Service provides a detailed step-by-step procedure for creating IAM roles on both the AWS side and the Qubole side.

A cross-account IAM role has the following permissions:

  • EC2 permissions
  • S3 permissions over the default location
  • No S3 permissions over Input S3 buckets
  • Qubole keys assume the cross-account role
  • iam:GetInstanceProfile and iam:PassRole permissions

Dual IAM Role

The Dual IAM Role is configured at the QDS cluster level to interact with data sources. Because Qubole cannot assume this second role to access S3 data, it provides stronger data security. This within-account role has a different cross-account policy and S3 policy for accessing data. It must have the following permissions:

  • EC2 permissions
  • S3 permissions over the default location
  • S3 permissions over Input S3 buckets
  • The cross-account IAM role and Qubole cannot assume this role; they can only start EC2 instances with it
  • iam:GetInstanceProfile and iam:PassRole permissions

Configuring the Cross Account IAM Role

See Managing Roles and Setting-up the Qubole Data Service/Creating a Cross-account IAM Role for QDS to configure the cross-account IAM role, which is the first IAM role.

Note

In a Dual IAM Role setup, you must allow the cross account IAM Role to only access the default location and not the AWS S3 data buckets (Bucket2). For simplicity, S3 data buckets are referred to as Bucket2 and the default location is referred to as Bucket1.
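The Bucket1/Bucket2 rule in the note above can be checked mechanically before a policy is attached to the cross-account role. The following is a minimal Python sketch, not a Qubole or AWS tool. The helper grants_bucket, the sample policy, its action list, and the bucket names bucket1-defloc and bucket2-data are all hypothetical. The check looks only at Allow statements and Resource ARNs; it ignores Deny, NotResource, and Condition elements.

```python
from fnmatch import fnmatchcase

S3_ARN_PREFIX = "arn:aws:s3:::"


def as_list(value):
    """IAM allows a single value where a list is expected; normalize to a list."""
    return value if isinstance(value, list) else [value]


def grants_bucket(policy, bucket):
    """Return True if any Allow statement names a resource in `bucket`."""
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for arn in as_list(statement.get("Resource", [])):
            if arn == "*":
                return True  # a bare wildcard covers every bucket
            if arn.startswith(S3_ARN_PREFIX):
                # Keep only the bucket part of "bucket/key-pattern".
                bucket_pattern = arn[len(S3_ARN_PREFIX):].split("/", 1)[0]
                if fnmatchcase(bucket, bucket_pattern):
                    return True
    return False


# Hypothetical S3 policy for the cross-account role: default location only.
cross_account_s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::bucket1-defloc/*",
                "arn:aws:s3:::bucket1-defloc",
            ],
        }
    ],
}

print(grants_bucket(cross_account_s3_policy, "bucket1-defloc"))  # True
print(grants_bucket(cross_account_s3_policy, "bucket2-data"))    # False
```

If the second call returned True, the cross-account role would be able to reach the data buckets, which is what the Dual IAM Role setup is meant to prevent.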

Important

Ensure that the cross-account policy passes the Dual IAM Role and not the Cross Account IAM Role. For the policy, see step 4 of Configuring the Dual IAM Role below.

Configuring the Dual IAM Role

To configure the Dual IAM Role, perform the following steps:

  1. Create an AWS EC2 policy by performing the following steps:

    1. Log into AWS Console through console.aws.amazon.com.
    2. Navigate to the Identity and Access Management interface.
    3. Navigate to the Policies interface within the Identity and Access Management interface.
    4. Click Create Policy.
    5. Click Create Your Own Policy.
    6. Enter a Policy Name for the EC2 policy.
    7. Provide a Policy Description.
    8. For the Policy Document, use one of the following samples and update the text as required.

    Here is a sample IAM template for EC2 settings.

    Sample 1 - This sample is a simpler AWS policy for EC2 settings.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:AuthorizeSecurityGroupEgress",
                    "ec2:AuthorizeSecurityGroupIngress",
                    "ec2:AttachVolume",
                    "ec2:CancelSpotInstanceRequests",
                    "ec2:CreateSecurityGroup",
                    "ec2:CreateTags",
                    "ec2:CreateVolume",
                    "ec2:DeleteSecurityGroup",
                    "ec2:DeleteTags",
                    "ec2:DeleteVolume",
                    "ec2:Describe*",
                    "ec2:DescribeVolumes",
                    "ec2:DetachVolume",
                    "ec2:ImportKeyPair",
                    "ec2:DescribeKeyPairs",
                    "ec2:ModifyInstanceAttribute",
                    "ec2:RequestSpotInstances",
                    "ec2:RevokeSecurityGroupIngress",
                    "ec2:RunInstances",
                    "ec2:StartInstances",
                    "ec2:StopInstances",
                    "ec2:TerminateInstances"
                ],
                "Resource": ["*"]
            },
            {
                "Effect": "Allow",
                "Action": ["sts:DecodeAuthorizationMessage"],
                "Resource": ["*"]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "iam:CreateServiceLinkedRole",
                    "iam:PutRolePolicy"
                ],
                "Resource": [
                    "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
                    "arn:aws:iam::*:role/aws-service-role/spotfleet.amazonaws.com/AWSServiceRoleForEC2SpotFleet"
                ],
                "Condition": {
                    "StringLike": {
                        "iam:AWSServiceName": ["spot.amazonaws.com", "spotfleet.amazonaws.com"]
                    }
                }
            }
        ]
    }
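When adapting the EC2 sample above, note that it lists ec2:Describe* alongside ec2:DescribeVolumes and ec2:DescribeKeyPairs, which the wildcard already covers. The following minimal Python sketch flags such overlaps in an action list. The helper name redundant_actions and the shortened action list are hypothetical. The sketch approximates IAM wildcard matching with the standard fnmatch module after lowercasing, on the assumption that action names are compared case-insensitively.

```python
from fnmatch import fnmatchcase


def redundant_actions(actions):
    """Return the actions already covered by a wildcard entry in the same list."""
    lowered = [action.lower() for action in actions]
    wildcards = [action for action in lowered if "*" in action or "?" in action]
    return [
        original
        for original, low in zip(actions, lowered)
        # Skip comparing a wildcard entry against itself.
        if any(low != pattern and fnmatchcase(low, pattern) for pattern in wildcards)
    ]


# Shortened, illustrative subset of the actions in the EC2 sample policy.
sample_actions = [
    "ec2:CreateTags",
    "ec2:Describe*",
    "ec2:DescribeVolumes",
    "ec2:DescribeKeyPairs",
    "ec2:RunInstances",
]

print(redundant_actions(sample_actions))
# ['ec2:DescribeVolumes', 'ec2:DescribeKeyPairs']
```

Removing the flagged entries does not change what the policy allows; it only shortens the document.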
    
  2. Create a second IAM Role on the AWS console with S3 policies that grant access to both the S3 data buckets (Bucket2) and the default location (Bucket1).

    Note

    Only the Dual IAM Role, and not the Cross-account IAM Role, must have access to the S3 data buckets (Bucket2).

    Here is an example.

    {
     "Version": "2012-10-17",
     "Statement": [
                    {
                      "Effect": "Allow",
                      "Action": [
                                  "s3:DeleteObject",
                                  "s3:GetObject",
                                  "s3:GetObjectAcl",
                                  "s3:PutObject",
                                  "s3:PutObjectAcl",
                                  "s3:GetBucketAcl",
                                  "s3:GetBucketLocation",
                                  "s3:ListBucket"
                                ],
                      "Resource": [
                                    "arn:aws:s3:::<Bucket1path>/*",
                                    "arn:aws:s3:::<Bucket1path>",
                                    "arn:aws:s3:::<Bucket2path>/*",
                                    "arn:aws:s3:::<Bucket2path>"
                                  ]
                    }
     ]
    }
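The S3 example above lists each location twice, once as the bucket ARN and once with a /* suffix. Bucket-level actions such as s3:ListBucket apply to the bucket ARN, while object-level actions such as s3:GetObject apply to the /* form. The following minimal Python sketch generates the same policy shape for any number of buckets. The function name dual_role_s3_policy and the bucket names are hypothetical, and the sketch assumes bare bucket names with no key prefix.

```python
import json

# Action list copied from the S3 example policy.
S3_ACTIONS = [
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:GetObjectAcl",
    "s3:PutObject",
    "s3:PutObjectAcl",
    "s3:GetBucketAcl",
    "s3:GetBucketLocation",
    "s3:ListBucket",
]


def dual_role_s3_policy(buckets):
    """Build an S3 policy that covers each bucket and every object in it."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level actions
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": S3_ACTIONS, "Resource": resources}
        ],
    }


# Hypothetical names: Bucket1 is the default location, Bucket2 holds the data.
policy = dual_role_s3_policy(["bucket1-defloc", "bucket2-data"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Policy Document field after replacing the bucket names with real ones.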
    
  3. Go to the AWS account and perform the following steps to update trust relationships of the Dual IAM Role:

    1. Navigate to the Identity and Access Management interface.

    2. Navigate to the Roles interface within the Identity and Access Management interface.

    3. Click the new Dual IAM role.

    4. Click the Trust Relationships tab.

    5. Click Edit Trust Relationships.

    6. For the Policy Document, use the following code and update the text as required.

      Note

      In the example below, only the AWS EC2 service is allowed to assume the role; no access is granted to Qubole.

      {
         "Version": "2012-10-17",
         "Statement": [
         {
             "Effect": "Allow",
             "Principal": {
                           "Service": "ec2.amazonaws.com"
             },
            "Action": "sts:AssumeRole"
         }
         ]
      }
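The point of the trust policy above is that the EC2 service is the only principal that can assume the Dual IAM Role. The following minimal Python sketch checks that property for a trust policy document. The helper names and the account ID 111122223333 in the counter-example are hypothetical, and the sketch inspects only the Principal element of Allow statements.

```python
def as_list(value):
    """IAM allows a single value where a list is expected; normalize to a list."""
    return value if isinstance(value, list) else [value]


def trusted_principals(trust_policy):
    """Collect (kind, value) pairs for every principal in an Allow statement."""
    found = set()
    for statement in as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if principal == "*":
            found.add(("*", "*"))  # anyone may assume the role
            continue
        for kind, values in principal.items():
            for value in as_list(values):
                found.add((kind, value))
    return found


def only_ec2_can_assume(trust_policy):
    """True when the EC2 service is the single trusted principal."""
    return trusted_principals(trust_policy) == {("Service", "ec2.amazonaws.com")}


# Trust policy from the step above.
dual_role_trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Hypothetical counter-example that also trusts another AWS account.
too_permissive = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com",
                "AWS": "arn:aws:iam::111122223333:root",
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

print(only_ec2_can_assume(dual_role_trust))  # True
print(only_ec2_can_assume(too_permissive))   # False
```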
      
  4. On the AWS account, perform the following steps to create an IAM policy for the Dual IAM role:

    1. Navigate to the Identity and Access Management interface.

    2. Navigate to the Policies interface within the Identity and Access Management interface.

    3. Click Create Policy.

    4. Click Create Your Own Policy.

    5. Enter a Policy Name for the account policy.

    6. Provide a Policy Description.

    7. For the Policy Document, use the following code and update the text as required.

      {
          "Version": "2012-10-17",
          "Statement": [
          {
              "Effect": "Allow",
              "Action": "iam:GetInstanceProfile",
              "Resource": "arn:aws:iam::<arn_number>:instance-profile/<Dual_Role>"
          },
          {
              "Effect": "Allow",
              "Action": "iam:PassRole",
              "Resource": "arn:aws:iam::<arn_number>:role/<Dual_Role>"
          }
          ]
      }
      

      In the above policy example, <Dual_Role> is a placeholder for the Dual IAM Role name, and <arn_number> occupies the account ID position of the IAM ARN.
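The policy in this step can also be generated and checked programmatically. The following minimal Python sketch fills in the two placeholders and then confirms which role the policy passes. It assumes that <arn_number> stands for the 12-digit AWS account ID, as in the standard IAM ARN layout, and that the instance profile has the same name as the role, as in the sample policy. The function names, the account ID 111122223333, and the role name qubole-dual-role are hypothetical.

```python
import json
import re


def dual_role_pass_policy(account_id, dual_role):
    """Build the GetInstanceProfile/PassRole policy for one Dual IAM Role."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("expected a 12-digit AWS account ID")
    base = f"arn:aws:iam::{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:GetInstanceProfile",
                "Resource": f"{base}:instance-profile/{dual_role}",
            },
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"{base}:role/{dual_role}",
            },
        ],
    }


def passed_roles(policy):
    """Return the role names on which the policy allows iam:PassRole."""
    names = []
    for statement in policy["Statement"]:
        if statement["Effect"] == "Allow" and statement["Action"] == "iam:PassRole":
            # The role name is the last path segment of the role ARN.
            names.append(statement["Resource"].rsplit("/", 1)[-1])
    return names


policy = dual_role_pass_policy("111122223333", "qubole-dual-role")
print(json.dumps(policy, indent=2))
print(passed_roles(policy))  # ['qubole-dual-role']
```

A result that names the cross-account role instead of the Dual IAM Role would indicate the misconfiguration that the Important note in Configuring the Cross Account IAM Role warns about.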

  5. Navigate to the QDS UI’s Clusters page. Edit the cluster that must be accessible to the Dual IAM Role. Configure the Dual IAM Role’s name in the Advanced Configuration > EC2 SETTINGS of the cluster.

    Here is an example of the Instance Profile in a cluster’s configuration UI.

    [Figure: RoleInstanceProfile.png, the Instance Profile field in the cluster configuration UI]

    In the Instance Profile text box, enter the Dual IAM Role name exactly as it appears in the instance-profile resource of the policy. For example, in the policy element "Resource": "arn:aws:iam::<arn_number>:instance-profile/<Dual_Role>", <Dual_Role> is the Dual IAM Role name to enter.

    For more information on configuring role-instance-profile using cluster APIs, see ec2_settings.

Limitations of Using Dual IAM Roles

Using dual IAM roles has the following limitations:

  • Hive queries that access data in S3 buckets reachable only through the Dual IAM Role must run on the cluster’s master node and not on the QDS server. Use Hive on the master node for such queries.

  • S3 Explorer in the Explore tab and Notebooks do not work for S3 buckets that are accessible only through the Dual IAM Role.

  • The S3 Files data dependency on the Qubole Scheduler might not work. The dependencies work only over the default location or an S3 path to which the Cross-account IAM Role has access.

  • Data import and data export jobs must have the Use Hadoop Cluster option enabled for any data that is not accessible to the Cross-account IAM Role. The Use Hadoop Cluster option appears at the end of the query composer for a data import/export command in the Analyze UI, as illustrated here.

    [Figure: UseHadoopClusterOption.png, the Use Hadoop Cluster option in the query composer]

    For more information, see Composing a Data Export Command through the UI and Composing a Data Import Command through the UI.

  • An Amazon S3 path in commands works only when the S3 path is accessible to the Cross-account IAM Role.

  • Data in results and logs is accessible to other users in the same QDS account because the default S3 location that stores them is accessible to the Cross-account IAM Role. However, you can hide commands from other QDS users in the same account as required; the steps are described in Using Role-based Access Control for Commands.
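Several of the limitations above depend on one question: is a given S3 path under a location that the Cross-account IAM Role can reach? The following minimal Python sketch shows one way to answer it before submitting a command. It is an illustrative pre-flight check, not part of any Qubole API. The function names, the bucket names, and the list of accessible prefixes are hypothetical.

```python
def _segments(s3_path):
    """Drop the scheme (s3://, s3n://, s3a://) and split into path segments."""
    _, _, rest = s3_path.partition("://")
    return [segment for segment in rest.split("/") if segment]


def reachable_by_cross_account(s3_path, accessible_prefixes):
    """True if s3_path is at or below one of the accessible locations."""
    target = _segments(s3_path)
    for prefix in accessible_prefixes:
        parts = _segments(prefix)
        # Compare whole segments so that "bucket1" never matches "bucket10".
        if parts and target[:len(parts)] == parts:
            return True
    return False


# Hypothetical setup: only the default location is open to the cross-account role.
accessible = ["s3://bucket1-defloc"]

print(reachable_by_cross_account("s3://bucket1-defloc/logs/q1", accessible))  # True
print(reachable_by_cross_account("s3://bucket2-data/tables/t1", accessible))  # False
```

A False result suggests that the path is reachable only through the Dual IAM Role, so the command has to run on the cluster, for example with the Use Hadoop Cluster option enabled for data import and export.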