Configuring S3Guard for S3a File System

S3Guard addresses S3 consistency issues in QDS. It uses a DynamoDB table to store and read metadata for any S3 path. S3Guard stores only the metadata (not the actual data), so the storage cost is minimal. The major expense in using a DynamoDB table is its provisioned I/O capacity. DynamoDB pricing differs based on the table's capacity mode.

DynamoDB has two capacity modes, Provisioned and On-Demand, and each has its pros and cons. If you can predict the application traffic, use Provisioned mode. If the application traffic is unpredictable, use On-Demand mode. For more information, see Amazon DynamoDB pricing.

You can share one table across multiple buckets. However, weigh the pros and cons of sharing before you do so. For more information, see here.

For S3Guard to work with a DynamoDB table, you must grant read and write permissions on the table that you create. To do so, create an access policy for the DynamoDB table and attach it to your IAM identities, such as roles and users. For more information and examples, see Using Identity-Based Policies (IAM Policies) for Amazon DynamoDB. Here is the list of the non-admin permissions that S3Guard requires to work with DynamoDB:

Non-admin Permissions:

  • dynamodb:GetItem
  • dynamodb:BatchGetItem
  • dynamodb:BatchWriteItem
  • dynamodb:DeleteItem
  • dynamodb:PutItem
  • dynamodb:Query
  • dynamodb:UpdateItem
  • dynamodb:DescribeTable
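
The permissions above can be collected into a single IAM policy document. The following is a minimal sketch rather than an official Qubole policy: the function name build_s3guard_policy and the example ARN are hypothetical, and the policy is scoped to one table on the assumption that S3Guard does not need access to other DynamoDB resources.

```python
import json

# The non-admin DynamoDB actions listed above.
S3GUARD_ACTIONS = [
    "dynamodb:GetItem",
    "dynamodb:BatchGetItem",
    "dynamodb:BatchWriteItem",
    "dynamodb:DeleteItem",
    "dynamodb:PutItem",
    "dynamodb:Query",
    "dynamodb:UpdateItem",
    "dynamodb:DescribeTable",
]


def build_s3guard_policy(table_arn):
    """Return an IAM policy document that allows the S3Guard actions on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(S3GUARD_ACTIONS),
                "Resource": table_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARN: replace the region, account ID, and table name.
    arn = "arn:aws:dynamodb:us-east-1:123456789012:table/s3guard-metadata"
    print(json.dumps(build_s3guard_policy(arn), indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the role or user that the cluster uses.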

Note

Admins must pre-create the table in DynamoDB so that non-admin users do not need permissions such as dynamodb:CreateTable and dynamodb:DeleteTable.
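
As an illustration of how an admin might pre-create the table, here is a hedged sketch that uses the boto3 DynamoDB client. It assumes the table uses a string attribute named parent as the partition key and a string attribute named child as the sort key; verify this against the S3Guard version you run. The helper names, the default capacity units, and the on_demand flag are hypothetical.

```python
def build_create_table_args(table_name, on_demand=True, read_units=5, write_units=5):
    """Build keyword arguments for the DynamoDB CreateTable call.

    Assumption: S3Guard keys its table on (parent, child), both strings.
    """
    args = {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "parent", "AttributeType": "S"},
            {"AttributeName": "child", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "parent", "KeyType": "HASH"},   # partition key
            {"AttributeName": "child", "KeyType": "RANGE"},   # sort key
        ],
    }
    if on_demand:
        # On-Demand capacity mode: suited to unpredictable traffic.
        args["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Provisioned capacity mode: suited to predictable traffic.
        args["BillingMode"] = "PROVISIONED"
        args["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return args


def create_s3guard_table(table_name, region, **options):
    """Create the table; requires boto3 and admin credentials."""
    import boto3  # third-party; imported here so the builder works without it

    client = boto3.client("dynamodb", region_name=region)
    return client.create_table(**build_create_table_args(table_name, **options))
```

Keeping the argument builder separate from the API call makes it possible to review the request before running it with admin credentials.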

Keys and Values for the Table

After you create the table in DynamoDB, enter the following values in the table for S3Guard to work as expected.

Key              Value
parent           ../VERSION
child            ../VERSION
table_version    100
table_created    <current epoch timestamp>

Note

The table_version number is used to check whether S3Guard’s code is compatible to read or write using this table. The table_created value denotes the table creation time.
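
The keys and values above could be written with the boto3 DynamoDB client, as sketched below. The sketch rests on several assumptions: that all four keys belong to one item, that table_version and table_created are stored as numbers, and that the epoch timestamp is in milliseconds. Confirm these against your S3Guard version before you rely on it. The helper names are hypothetical.

```python
import time


def build_version_marker(created_ms=None):
    """Build the item in DynamoDB attribute-value format.

    Assumptions: one item holds all four keys, the numeric fields use
    type N, and the timestamp is epoch milliseconds.
    """
    if created_ms is None:
        created_ms = int(time.time() * 1000)
    return {
        "parent": {"S": "../VERSION"},
        "child": {"S": "../VERSION"},
        "table_version": {"N": "100"},
        "table_created": {"N": str(created_ms)},
    }


def put_version_marker(table_name, region):
    """Write the item to the table; requires boto3 and dynamodb:PutItem."""
    import boto3  # third-party; imported here so the builder works without it

    client = boto3.client("dynamodb", region_name=region)
    client.put_item(TableName=table_name, Item=build_version_marker())
```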

Hadoop Overrides for S3Guard Configuration on a Cluster

To pass Hadoop overrides, navigate to the specific cluster’s configuration UI. Under Advanced Configuration > Hadoop Cluster Settings, pass the override in the Override Hadoop Configuration Variables text box. For more information, see Advanced Configuration: Modifying Hadoop Cluster Settings.

fs.s3a.metadatastore.impl org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore
fs.s3a.s3guard.ddb.table <name of the table>
fs.s3a.s3guard.ddb.region <region of the table>
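
To avoid typing errors, the three overrides above can be generated from the table name and region. This sketch assumes that the text box accepts one key=value pair per line; check the format against the Qubole UI. The function name s3guard_overrides and the example values are hypothetical.

```python
def s3guard_overrides(table_name, region):
    """Return the S3Guard overrides as one key=value pair per line."""
    settings = {
        "fs.s3a.metadatastore.impl":
            "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore",
        "fs.s3a.s3guard.ddb.table": table_name,
        "fs.s3a.s3guard.ddb.region": region,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical table name and region.
    print(s3guard_overrides("s3guard-metadata", "us-east-1"))
```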

Note

Qubole does not support per-bucket configuration in S3Guard for Hadoop version 2.6.