Enabling Encryption for Data at Rest (AWS)

For security reasons, data at rest may need to be encrypted. When you use Qubole on Amazon Web Services (AWS), data is at rest in these two locations:

  1. S3. See Enable Encryption on S3.
  2. The ephemeral HDFS brought up on the EC2 compute nodes. See Enable Encryption on Ephemeral HDFS through QDS UI. To enable encryption on the ephemeral drives through a Cluster REST API, see security_settings.

Note

Encrypting Data on Amazon S3 describes the different encryption mechanisms that Qubole supports.

Enable Encryption on S3

Qubole leverages S3’s server-side encryption. For more information, see this reference.

To enable server-side encryption, set the following property for the S3n and S3a file systems:

"fs.s3n.sse=AES256" (default='None')

The property can be set as a Hadoop configuration override, as a Hive bootstrap or per-command setting, and as a Presto catalog/hive.properties override, as explained below.

Enabling Server-side Encryption as a Hadoop Configuration Override

To set the property as a Hadoop configuration override, navigate to the Control Panel page. In the Clusters tab, click the edit icon to go to the Edit Cluster page, or the + icon to go to the Add Cluster page. In Override Hadoop Configuration Variables, set the override as fs.s3n.sse=AES256.

Enabling Server-side Encryption as a Hive Bootstrap Setting

Set the property as a Hive bootstrap setting to apply it to all Hive commands for a given account. Use this syntax: set fs.s3n.sse=AES256;

Enabling Server-side Encryption as a Hive Command Setting

For Hive commands, you can set the property per command by including set fs.s3n.sse=AES256; in the same command session as the command. Because a Hive set statement applies only to the statements that follow it, place it before the statement it should affect.

For example:

set fs.s3n.sse=AES256;
CREATE EXTERNAL TABLE New2 (`Col0` STRING, `Col1` STRING, `Col2` STRING)
PARTITIONED BY (`20100102` STRING, `IN` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://ap-dev-qubole/common/hive/30day_1/30daysmall';

Enabling Server-side Encryption as a Hive Property Override in Presto

As a Presto catalog/hive.properties override setting, set hive.s3.serverside-encryption-algorithm=AES256. See catalog/hive.properties for more information.

Note

The results of SELECT queries with a LIMIT clause are not encrypted, because the LIMIT clause bypasses the map/reduce flow.

The results of SELECT queries without a LIMIT clause are encrypted. In general, standard Hadoop map/reduce output is encrypted. Presto output, which does not use map/reduce, is not encrypted.
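
The three ways of setting S3 server-side encryption described above differ only in syntax. The following Python sketch shows the exact strings side by side. The function names are hypothetical helpers introduced for illustration; only the property names and the AES256 value come from this page.

```python
# Illustrative helpers only; they are not part of any Qubole SDK.
SSE_ALGORITHM = "AES256"


def hadoop_override(algorithm: str = SSE_ALGORITHM) -> str:
    """Line to enter in Override Hadoop Configuration Variables."""
    return f"fs.s3n.sse={algorithm}"


def hive_set_statement(algorithm: str = SSE_ALGORITHM) -> str:
    """Statement for the Hive bootstrap or for a single Hive command."""
    return f"set fs.s3n.sse={algorithm};"


def presto_hive_property(algorithm: str = SSE_ALGORITHM) -> str:
    """Line for the Presto catalog/hive.properties override."""
    return f"hive.s3.serverside-encryption-algorithm={algorithm}"


def with_sse(hive_query: str, algorithm: str = SSE_ALGORITHM) -> str:
    """Prefix a Hive query so the setting is part of the same command session."""
    return f"{hive_set_statement(algorithm)}\n{hive_query}"


if __name__ == "__main__":
    print(hadoop_override())
    print(presto_hive_property())
    print(with_sse("SELECT * FROM New2;"))
```

The with_sse helper places the set statement first because a Hive set statement only affects the statements that follow it.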

Enabling KMS and Customer Provided Keys Server-side Encryption on the S3a File System

QDS supports SSE-KMS and SSE-Customer Provided Keys (SSE-C) on the S3a file system. For details on client-side KMS encryption, see Enable AWS Key Management Service Client-side Encryption on the S3a File System.

Set the following properties to use the SSE-KMS and SSE-C encryption:

  • fs.s3a.server-side-encryption-algorithm: It is not set by default. Set it to one of these supported values:

    • AES256 (for SSE-S3)
    • SSE-KMS
    • SSE-C
  • fs.s3a.server-side-encryption.key: Its value specifies the encryption key to use if fs.s3a.server-side-encryption-algorithm has been set to SSE-KMS or SSE-C. These conditions apply to this property:

    • For SSE-C, the value of this property must be the Base64 encoded key.
    • For SSE-KMS, leave this property empty to use your default S3 KMS key. Otherwise, set this property to the specific KMS key ID.

Enable Encryption on Ephemeral HDFS through QDS UI

Navigate to the Clusters page and click the edit button to go to the Edit Cluster page.

Select Enable Encryption, listed under Security Settings in Advanced Configuration, as shown in the following figure.

../_images/EnableEncrypt.png

Enable Encryption encrypts the data at rest on the node's ephemeral (local) storage. This includes HDFS and any intermediate output generated by Hadoop. The block device encryption is set up before the node joins the cluster, so it can increase the cluster's bring-up time.

To enable encryption on the ephemeral drives through a Cluster REST API, see security_settings.
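
As a rough illustration of the REST route, the following Python sketch builds a request body fragment with a security_settings block. The field name encrypted_ephemerals is an assumption and is not given on this page. The endpoint, authentication, and remaining cluster fields are deliberately omitted. Check the security_settings reference for the exact schema before using it.

```python
import json


def encryption_settings(enabled: bool = True) -> dict:
    """Build the security_settings fragment of a cluster request body."""
    # "encrypted_ephemerals" is an assumed field name; verify it against
    # the security_settings reference.
    return {"security_settings": {"encrypted_ephemerals": enabled}}


payload = json.dumps(encryption_settings(), indent=2)

if __name__ == "__main__":
    print(payload)
```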