Enabling Encryption for Data at Rest on Amazon S3

For security reasons, data at rest may have to be encrypted. When working with Qubole on Amazon Web Services, data is at rest in these two locations:

  1. S3
  2. The ephemeral HDFS brought up on the EC2 compute nodes

Enable Encryption on S3

Qubole leverages S3’s server-side encryption. For more information, see the Amazon S3 documentation on server-side encryption.

To enable this server-side encryption, set the following property:

"fs.s3n.sse=AES256" (default='None')

The property can be set as a Hadoop configuration override, as a Hive bootstrap setting, as a per-command Hive setting, or through Presto's catalog/hive.properties, as explained below.

Enabling Server-side Encryption as a Hadoop Configuration Override

To set it as a Hadoop configuration override, navigate to the Control Panel page. In the Clusters tab, click the edit icon to go to the Edit Cluster page, or the + icon to go to the Add Cluster page. In Override Hadoop Configuration Variables, set the override as fs.s3n.sse=AES256.
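If the override is managed alongside other settings, a small script can produce the text to paste into the field. The following is only a sketch: the helper name render_overrides is hypothetical, and it assumes the override field accepts one key=value pair per line.

```python
def render_overrides(overrides):
    """Render settings as key=value lines, one per line.

    Assumption: the Override Hadoop Configuration Variables field
    accepts one key=value pair per line.
    """
    lines = []
    for key, value in overrides.items():
        name = key.strip()
        # Reject names that would produce an ambiguous key=value line.
        if not name or "=" in name:
            raise ValueError(f"invalid property name: {key!r}")
        lines.append(f"{name}={value}")
    return "\n".join(lines)


# Property name and value taken from this page.
print(render_overrides({"fs.s3n.sse": "AES256"}))
```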

Enabling Server-side Encryption as a Hive Bootstrap Setting

Set it as a Hive bootstrap setting to apply it to all Hive commands for a given account. Use this syntax: set fs.s3n.sse=AES256;

Enabling Server-side Encryption as a Hive Command Setting

For Hive commands, you can set it per command by including set fs.s3n.sse=AES256; in the same command session, before the statement it should apply to.

For example:

set fs.s3n.sse=AES256;
CREATE EXTERNAL TABLE New2 (`Col0` STRING, `Col1` STRING, `Col2` STRING)
PARTITIONED BY (`20100102` STRING, `IN` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://ap-dev-qubole/common/hive/30day_1/30daysmall';
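A Hive set statement affects only the statements that run after it in the same session, so when queries are assembled programmatically it can help to prepend the setting automatically. The sketch below is illustrative only: with_sse is a hypothetical helper, and the bucket name in the usage line is a placeholder.

```python
SSE_SETTING = "set fs.s3n.sse=AES256;"


def with_sse(hive_query: str) -> str:
    """Return the query text with the SSE setting as its first statement."""
    body = hive_query.strip()
    # Leave the text unchanged if it already starts with the setting.
    if body.lower().startswith(SSE_SETTING.lower()):
        return body
    return f"{SSE_SETTING}\n{body}"


# Hypothetical query; 'example-bucket' is a placeholder, not a real location.
print(with_sse(
    "INSERT OVERWRITE DIRECTORY 's3://example-bucket/out' SELECT * FROM New2;"
))
```

The helper only edits the query text; how the resulting text is submitted to Hive is left to whatever client is already in use.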

Enabling Server-side Encryption as a Hive Property Override in Presto

In Presto, set hive.s3.serverside-encryption-algorithm=AES256 as a catalog/hive.properties override. See catalog/hive.properties for more information.

Note

The results of SELECT queries with a LIMIT clause are not encrypted, because the LIMIT clause causes the query to bypass the map/reduce flow.

The results of SELECT queries without a LIMIT clause are encrypted. In general, standard Hadoop map/reduce output is encrypted. Presto output, which does not use map/reduce, is not encrypted.
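One way to confirm whether a given output object was written with server-side encryption is to inspect its S3 metadata. The sketch below assumes the boto3 library, whose head_object response includes a ServerSideEncryption field for encrypted objects. The helper is_sse_aes256, the bucket name, and the key are hypothetical. A bucket with its own default encryption reports encryption regardless of the Qubole setting.

```python
def is_sse_aes256(head_response: dict) -> bool:
    """Return True if an S3 HeadObject response reports AES256 encryption."""
    return head_response.get("ServerSideEncryption") == "AES256"


# Usage with boto3 (not executed here; needs AWS credentials and a real object):
#   import boto3
#   s3 = boto3.client("s3")
#   resp = s3.head_object(Bucket="example-bucket", Key="path/to/output-file")
#   print(is_sse_aes256(resp))

# Offline illustration with hand-written response dictionaries.
print(is_sse_aes256({"ServerSideEncryption": "AES256"}))  # True
print(is_sse_aes256({"ContentLength": 10}))               # False
```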

Enable Encryption on Ephemeral HDFS through QDS UI

Navigate to the Clusters page and click the edit button to go to the Edit Cluster page.

Select Enable Encryption, listed under Security Settings in Advanced Configuration, as shown in the following figure.

[Figure: the Enable Encryption option under Security Settings (EnableEncrypt.png)]

Enable Encryption is an option to encrypt the data at rest on the node’s ephemeral (local) storage. This includes HDFS and any intermediate output generated by Hadoop. The block device encryption is set up before the node joins the cluster, which can increase the cluster's bring-up time.