Creating Dual IAM Roles for your Account
With a view to providing greater data security, Qubole supports the configuration of two IAM roles for your account. Create one IAM role that can only access the default location buckets on Amazon S3. Let’s call this the Service IAM role. Optionally, you can configure this role to access any non-secure data. Create another IAM role (in the same account), called the Secondary IAM role, that can access Amazon S3 buckets containing sensitive data.
Assign the Service IAM role, with access to only the default location, to Qubole. Assign the Secondary IAM Role to the clusters. With this approach, you can restrict Qubole’s access to only those commands and queries that need to operate on the data.
To set up the Dual IAM roles-based authentication, configure the following 2 IAM roles:
The Secondary IAM role. Only the cluster can assume this role. It has access to Amazon S3 buckets containing sensitive data.
The Service IAM role. The Qubole account is configured with this role, and it is similar to the cross-account IAM role. This role can only access the default location buckets on Amazon S3. The default location is the location where logs, output data, and so on are stored for any data that is created.
Secondary IAM role | Service IAM role |
---|---|
Has Amazon S3 permissions over the default location | Has Amazon S3 permissions over the default location |
Has | Has |
Has Amazon S3 permissions over Input/Data Amazon S3 buckets | No Amazon S3 permissions over Input/Data Amazon S3 buckets |
Service IAM role users and Qubole users cannot assume this role. However, they can start Amazon EC2 instances with this role. | Qubole keys assume the Service role. |
Note
In a Dual IAM roles setup, you must allow the Service IAM role to only access the default location and, optionally, any non-secure data.
Important
Ensure that the cross-account policy passes the Secondary IAM role and not the Service IAM role.
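The requirement above can be checked mechanically before a policy is attached. The following is a minimal sketch, not a Qubole or AWS tool: it inspects a policy document (as a Python dict) and reports which role ARNs `iam:PassRole` is granted for. The function names, the account ID `123456789012`, and the role names `qubole-secondary` and `qubole-service` are hypothetical.

```python
def passrole_targets(policy):
    """Return every Resource ARN granted by an Allow statement for iam:PassRole."""
    targets = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "iam:PassRole" not in actions:
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        targets.extend(resources)
    return targets


def passes_only(policy, role_name):
    """True if iam:PassRole is granted, and only for role/<role_name>."""
    targets = passrole_targets(policy)
    suffix = ":role/" + role_name
    return bool(targets) and all(t.endswith(suffix) for t in targets)


# Hypothetical cross-account policy fragment used for illustration.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::123456789012:role/qubole-secondary",
        }
    ],
}

print(passes_only(example_policy, "qubole-secondary"))
print(passes_only(example_policy, "qubole-service"))
```

The check is a plain string comparison on ARNs. It does not expand wildcards or evaluate conditions, so a policy with `"Resource": "*"` is reported as not passing only the Secondary role.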
Configure the Secondary IAM Role
The Secondary IAM role is configured at the QDS cluster level to interact with data sources. This provides better data security since the Service role (available to Qubole) does not have direct access to Amazon S3 data buckets. Only clusters are provided access to Amazon S3 data buckets using the Secondary role (attached to all instances).
To configure the Secondary IAM role, perform the following steps:
Step 1. Create an Amazon EC2 policy for the Secondary IAM Role
Navigate to the Identity and Access Management page.
On the left pane, click Policies.
On the right pane, click Create policy. This opens on a separate page.
On the Create policy page, click the JSON tab.
Copy and paste the sample policy given below.
Click Review policy.
In the Name field, enter a name for the policy.
Click Create policy. This creates your Amazon EC2 policy.
Sample Amazon EC2 policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AttachVolume",
        "ec2:CancelSpotInstanceRequests",
        "ec2:CreateSecurityGroup",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteTags",
        "ec2:DeleteVolume",
        "ec2:Describe*",
        "ec2:DescribeVolumes",
        "ec2:DetachVolume",
        "ec2:ImportKeyPair",
        "ec2:DescribeKeyPairs",
        "ec2:ModifyInstanceAttribute",
        "ec2:RequestSpotInstances",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RunInstances",
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["sts:DecodeAuthorizationMessage"],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:PutRolePolicy"
      ],
      "Resource": [
        "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
        "arn:aws:iam::*:role/aws-service-role/spotfleet.amazonaws.com/AWSServiceRoleForEC2SpotFleet"
      ],
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": ["spot.amazonaws.com", "spotfleet.amazonaws.com"]
        }
      }
    }
  ]
}
If you are using heterogeneous clusters, see An Overview of Heterogeneous Nodes in Clusters.
Step 2. Create an Amazon S3 Policy
Step 3. Create an IAM Role
Create an IAM role on the Amazon console with Amazon S3 policies and provide access to the Amazon S3 and the default location data buckets.
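As an illustration of Steps 2 and 3, the sketch below builds an Amazon S3 policy document for the Secondary role that covers both the default location and the data buckets. This page does not show the Secondary role's S3 policy, so the action list (copied from the Service role sample on this page), the function name `secondary_s3_policy`, and the bucket paths are assumptions to adapt, not the documented policy.

```python
import json

# Assumption: same S3 actions as the Service role sample policy on this page.
S3_ACTIONS = [
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:GetObjectAcl",
    "s3:PutObject",
    "s3:PutObjectAcl",
    "s3:GetBucketAcl",
    "s3:GetBucketLocation",
    "s3:ListBucket",
]


def secondary_s3_policy(bucket_paths):
    """Build one Allow statement covering each path and the objects under it."""
    resources = []
    for path in bucket_paths:
        resources.append("arn:aws:s3:::" + path + "/*")
        resources.append("arn:aws:s3:::" + path)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(S3_ACTIONS), "Resource": resources}
        ],
    }


# Hypothetical paths: the default location plus one sensitive data bucket.
policy = secondary_s3_policy(["acme-qubole-defloc", "acme-secure-data"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the JSON tab of the Create policy page. Serializing with `json.dumps` avoids hand-editing mistakes such as trailing commas.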
Step 4. Create a PassRole policy for the Secondary IAM Role
On the AWS account, perform the following steps to create a PassRole policy for the Secondary IAM role:
Navigate to the Identity and Access Management page.
On the left pane, click Policies.
On the right pane, click Create policy. This opens on a separate page.
On the Create policy page, click the JSON tab.
Copy and paste the sample policy provided below.
Click Review policy.
In the Name field, enter a name for the policy.
Click Create policy. This creates your IAM policy for the Secondary role.
Use the following sample:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:GetInstanceProfile",
      "Resource": "arn:aws:iam::<arn_number>:instance-profile/<Secondary_Role>"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::<arn_number>:role/<Secondary_Role>"
    }
  ]
}
In the above policy example, <Secondary_Role> is a placeholder for the Secondary IAM role name, and <arn_number> is a placeholder for the AWS account ID that owns the role.
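Filling in the placeholders by hand is error-prone, so the sample can also be generated. The sketch below assumes the placeholder before the role name is the 12-digit AWS account ID, which is the standard position of the account ID in an IAM ARN. The function name, the account ID `123456789012`, and the role name `qubole-secondary` are hypothetical.

```python
import json
import re


def secondary_passrole_policy(account_id, role_name):
    """Render the PassRole sample policy for a given account and role name."""
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits: %r" % account_id)
    base = "arn:aws:iam::" + account_id
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:GetInstanceProfile",
                "Resource": base + ":instance-profile/" + role_name,
            },
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": base + ":role/" + role_name,
            },
        ],
    }


# Hypothetical values for illustration only.
print(json.dumps(secondary_passrole_policy("123456789012", "qubole-secondary"), indent=2))
```

Both statements are derived from the same role name, which keeps the instance profile ARN and the role ARN in sync. This matches the sample only when the instance profile has the same name as the role.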
Step 5. Attach policies to the Secondary IAM Role
Attach the policies created in Steps 1 (EC2), 2 (S3), and 4 (PassRole) to the Secondary role.
Step 6. Configure the Secondary IAM Role
Navigate to the Clusters page. Edit the cluster that must be accessible to the Secondary IAM role. Configure the Secondary IAM role’s name in the Advanced Configuration > EC2 SETTINGS of the cluster.
Here is an example of the Instance Profile in a cluster’s configuration UI.
You must add the Secondary IAM role name that appears for an instance profile in the policy. For example, in the policy element "Resource": "arn:aws:iam::<arn_number>:instance-profile/<Secondary_Role>", <Secondary_Role> is the Secondary IAM role name. You must add it in the Instance Profile text box.
For more information on configuring role-instance-profile using cluster APIs, see ec2_settings.
Step 7. Update the Trust Relationship
On the AWS account, perform the following steps to update trust relationships for the Secondary IAM Role:
Navigate to the Identity and Access Management page.
On the left pane, click Roles.
On the Roles page, select the newly created Secondary IAM role.
Click the Trust Relationship tab, then click Edit trust relationship.
Copy and paste the JSON text below.
Click Update trust policy.
Note
In the sample below, only the Amazon EC2 service (ec2.amazonaws.com) is allowed to assume the role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
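To confirm that a trust policy grants `sts:AssumeRole` to the EC2 service and to nothing else, the principals can be listed programmatically. This is a minimal sketch that reads the trust policy as a Python dict. The function name `assume_role_principals` is hypothetical, and the check does not evaluate conditions or Deny statements.

```python
def assume_role_principals(trust_policy):
    """Return the set of (principal_type, value) pairs allowed to call sts:AssumeRole."""
    found = set()
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal", {})
        if isinstance(principal, str):
            # A bare string such as "*" means any principal.
            found.add(("*", principal))
            continue
        for kind, values in principal.items():
            if isinstance(values, str):
                values = [values]
            for value in values:
                found.add((kind, value))
    return found


# The trust relationship from the sample above.
secondary_trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

only_ec2 = assume_role_principals(secondary_trust) == {("Service", "ec2.amazonaws.com")}
print(only_ec2)
```

If the set contains any additional entry, for example an `AWS` principal for another account, the role is assumable by more than the cluster instances and the trust relationship should be reviewed.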
Configure the Service IAM Role
The Service IAM role is a cross-AWS-account role created at the account level. See Configuring the Qubole Data Service for details on creating IAM roles.
To configure the Service IAM role, perform the following steps:
See Managing Roles.
Step 1. Obtain the Qubole AWS Account ID and External ID
Step 2. Create an Amazon EC2 Policy
Step 3. Create an Amazon S3 Policy for the Service IAM Role
Only the Secondary IAM role, and not the Service IAM role, must have access to the Amazon S3 data buckets (Bucket2).
Here is a sample policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetBucketAcl",
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<Bucket1path>/*",
        "arn:aws:s3:::<Bucket1path>"
      ]
    }
  ]
}
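Because the Service IAM role must not reach the data buckets, it can help to list which buckets a candidate policy actually names. The sketch below extracts bucket names from the S3 ARNs in a policy dict and reports any that fall outside an allowed set. The function names and the bucket names `acme-qubole-defloc` and `acme-secure-data` are hypothetical, and wildcards in ARNs are treated as literal text rather than expanded.

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def buckets_in_policy(policy):
    """Collect the bucket names referenced by Allow statements in a policy."""
    buckets = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        for arn in resources:
            if arn.startswith(S3_ARN_PREFIX):
                # The bucket name is everything up to the first "/".
                buckets.add(arn[len(S3_ARN_PREFIX):].split("/", 1)[0])
    return buckets


def buckets_outside(policy, allowed):
    """Return buckets named in the policy that are not in the allowed set."""
    return buckets_in_policy(policy) - set(allowed)


# Hypothetical Service role policy that wrongly includes a data bucket.
service_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::acme-qubole-defloc/*",
                "arn:aws:s3:::acme-qubole-defloc",
                "arn:aws:s3:::acme-secure-data/*",
            ],
        }
    ],
}

print(buckets_outside(service_policy, {"acme-qubole-defloc"}))
```

An empty result means every S3 resource in the policy belongs to an allowed bucket. Any name returned is a bucket the Service role would be able to access and should be removed from the policy.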
Step 4. Create an IAM Service Role
Create another IAM role (separate from the one created above for the Secondary role). See Create an IAM Service Role.
Step 5. Create an Amazon Cross-account Policy
Step 6. Update the Cross-account Policy for an IAM Role
Step 7. Update IAM Role Trust Relationships
Limitations of Using Dual IAM Roles
The following are the limitations of using dual IAM roles:
Run Hive queries only on the cluster’s coordinator node (and not on the QDS server). Use Hive on the Coordinator node only if Hive queries must access data in Amazon S3 buckets that are only accessible through the Secondary IAM role. See Understanding Different Ways to Run Hive.
The S3 Explorer (available in the Explore, Jupyter, and Notebook tabs) will not work for Amazon S3 buckets that are only accessible through the Secondary IAM role.
The Amazon S3 Files data dependency on the Qubole Scheduler may not work. The dependencies only work over the default location or an Amazon S3 path to which the Service IAM role has access. To overcome this limitation, ensure that the output of the dependent job is always present in an Amazon S3 location that is accessible to the Service IAM role.
An Amazon S3 path in commands works only when the path can be accessed by the Service IAM role. When you pass an Amazon S3 path in commands, Qubole may perform certain Amazon S3 operations over that location in its control plane. These operations may fail if the Service IAM role does not have access to the Amazon S3 path.
Results and logs are accessible to other users in the same QDS account because the default Amazon S3 location that stores them is accessible to the Service IAM role. However, you can hide commands from other QDS users in the same account. See Using Role-based Access Control for Commands for details.
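Several of the limitations above depend on whether a given Amazon S3 path lies within a location that the Service IAM role can access. A rough pre-check is sketched below. It is only a local prefix comparison and not an IAM policy evaluation, and the function names and `s3://` paths are hypothetical.

```python
def is_under(path, prefix):
    """True if path equals prefix or lies beneath it, compared by path segment."""
    path = path.rstrip("/") + "/"
    prefix = prefix.rstrip("/") + "/"
    return path.startswith(prefix)


def service_role_can_reach(s3_path, service_prefixes):
    """True if s3_path falls under any location granted to the Service IAM role."""
    return any(is_under(s3_path, prefix) for prefix in service_prefixes)


# Hypothetical locations the Service IAM role has access to.
service_prefixes = ["s3://acme-qubole-defloc"]

for candidate in [
    "s3://acme-qubole-defloc/scheduler/output/2020-01-01",
    "s3://acme-secure-data/events",
    "s3://acme-qubole-defloc-archive/old",
]:
    print(candidate, service_role_can_reach(candidate, service_prefixes))
```

Paths that fail this check are the ones for which the S3 Explorer, Scheduler file dependencies, and control-plane operations on command paths are likely to be affected. The trailing-slash normalization keeps a bucket such as `acme-qubole-defloc-archive` from being mistaken for a path under `acme-qubole-defloc`.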