Enabling Encryption for Data at Rest (AWS)

For security reasons, data at rest may have to be encrypted. When you use Qubole on Amazon Web Services (AWS), data is at rest in these two locations:

  1. S3. See Enable Encryption on S3.
  2. The ephemeral HDFS brought up on the EC2 compute nodes. See Enable Encryption on Ephemeral HDFS through QDS UI. To enable encryption on the ephemeral drives through the Cluster REST API, see security_settings.

Note

Encrypting Data on Amazon S3 describes the different encryption mechanisms that Qubole supports.

Enable Encryption on S3

Qubole leverages S3’s server-side encryption (SSE). For more information, see this reference.

To enable SSE on S3n file systems, set the following property:

fs.s3n.sse=AES256

To enable server-side encryption on S3a file systems, set the fs.s3a.server-side-encryption-algorithm property to one of these supported values:

  • AES256 (for SSE-S3)
  • SSE-KMS
  • SSE-C
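The mapping between file system scheme, property name, and accepted values can be summarized in a small Python sketch. The helper name sse_property is hypothetical and not part of QDS; it only encodes the property names and values listed above, and it treats AES256 as the only S3n value because that is the only one this section documents.

```python
# Hypothetical helper (not part of QDS): returns the 'key=value' line that enables
# server-side encryption for a file system scheme, using the documented properties.
S3N_SSE_KEY = "fs.s3n.sse"
S3A_SSE_KEY = "fs.s3a.server-side-encryption-algorithm"
S3A_SSE_VALUES = {"AES256", "SSE-KMS", "SSE-C"}


def sse_property(scheme: str, algorithm: str) -> str:
    if scheme == "s3n":
        # Only AES256 is documented for S3n.
        if algorithm != "AES256":
            raise ValueError("S3n: only AES256 is documented")
        return f"{S3N_SSE_KEY}={algorithm}"
    if scheme == "s3a":
        if algorithm not in S3A_SSE_VALUES:
            raise ValueError(f"S3a: unsupported value {algorithm!r}")
        return f"{S3A_SSE_KEY}={algorithm}"
    raise ValueError(f"unknown file system scheme {scheme!r}")


print(sse_property("s3n", "AES256"))   # fs.s3n.sse=AES256
print(sse_property("s3a", "SSE-KMS"))  # fs.s3a.server-side-encryption-algorithm=SSE-KMS
```

The returned line has the same form as the cluster-level Hadoop override; prefixing it with set gives the Hive form.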

For more information, see Enabling KMS and Customer Provided Keys Server-side Encryption on the S3a File System.

SSE can be set in the cluster configuration, in the Hive bootstrap, or in an individual Hive command. Presto, however, honors the open-source server-side encryption setting described in Enabling Server-side Encryption as a Hive Property in Presto.

Enabling Server-side Encryption in Clusters

Navigate to the Control Panel page. In the Clusters tab, click Edit to open the Edit Cluster page, or click New to open the New Cluster page. On the Add/Edit Cluster page, set the override in Override Hadoop Configuration Variables as follows:

  • fs.s3n.sse=AES256 for S3n file systems.
  • fs.s3a.server-side-encryption-algorithm=<value> for S3a file systems, where <value> is AES256, SSE-KMS, or SSE-C.

Enabling Server-side Encryption as a Hive Bootstrap Setting

Set SSE as a Hive bootstrap setting to apply it to all Hive commands in a given account. Use this syntax:

  • set fs.s3n.sse=AES256 for S3n file systems.
  • set fs.s3a.server-side-encryption-algorithm=<value> for S3a file systems, where <value> is AES256, SSE-KMS, or SSE-C.

The same syntax also works within an individual Hive command. In that case, the setting applies only to that command and must be issued in the same session, before the statements it should affect.

For example:

set fs.s3n.sse=AES256;
CREATE EXTERNAL TABLE New2 (`Col0` STRING, `Col1` STRING, `Col2` STRING)
PARTITIONED BY (`20100102` STRING, `IN` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://ap-dev-qubole/common/hive/30day_1/30daysmall';
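Generating these set statements from code can help keep bootstrap scripts and per-command headers consistent. The following is a minimal Python sketch: hive_sse_statements is a hypothetical helper, not a QDS API, and the optional key maps to fs.s3a.server-side-encryption.key, which this page documents for SSE-KMS and SSE-C on S3a.

```python
from typing import List, Optional


# Hypothetical helper (not part of QDS): renders Hive `set` statements that enable SSE.
# The optional key is emitted only for SSE-KMS and SSE-C on S3a; it is ignored for AES256.
def hive_sse_statements(scheme: str, algorithm: str, key: Optional[str] = None) -> List[str]:
    if scheme == "s3n":
        if algorithm != "AES256" or key is not None:
            raise ValueError("S3n: only fs.s3n.sse=AES256 is documented")
        return ["set fs.s3n.sse=AES256;"]
    if scheme != "s3a":
        raise ValueError(f"unknown file system scheme {scheme!r}")
    if algorithm not in {"AES256", "SSE-KMS", "SSE-C"}:
        raise ValueError(f"S3a: unsupported value {algorithm!r}")
    statements = [f"set fs.s3a.server-side-encryption-algorithm={algorithm};"]
    if key is not None and algorithm in {"SSE-KMS", "SSE-C"}:
        statements.append(f"set fs.s3a.server-side-encryption.key={key};")
    return statements


# Example: bootstrap lines for SSE-KMS with a placeholder key ID.
print("\n".join(hive_sse_statements("s3a", "SSE-KMS", key="<kms-key-id>")))
```

Placing the generated lines at the top of a bootstrap or command keeps them ahead of the statements they are meant to affect.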

Enabling Server-side Encryption as a Hive Property in Presto

As a Presto catalog/hive.properties setting, set hive.s3.sse.enabled=true. See catalog/hive.properties for more information.
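To confirm that a catalog file actually turns the setting on, a simple check can parse its text. This is a sketch under stated assumptions: presto_sse_enabled is a hypothetical function, not a Presto or QDS API, and its parser only understands key=value lines and comment lines, not the full Java properties syntax.

```python
# Hypothetical check (not a Presto or QDS API): parses catalog/hive.properties text and
# reports whether hive.s3.sse.enabled is true. Simplified parser: handles 'key=value'
# lines and '#'/'!' comment lines only.
def presto_sse_enabled(properties_text: str) -> bool:
    settings = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    # This check treats a missing property as disabled.
    return settings.get("hive.s3.sse.enabled", "").lower() == "true"


print(presto_sse_enabled("# catalog/hive.properties\nhive.s3.sse.enabled=true\n"))  # True
```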

Note

Results of SELECT queries with a LIMIT clause are not encrypted, because the LIMIT clause causes the query to bypass the MapReduce flow.

Results of SELECT queries without a LIMIT clause are encrypted. In general, standard Hadoop MapReduce output is encrypted; Presto output, which does not use MapReduce, is not encrypted.

Enabling KMS and Customer Provided Keys Server-side Encryption on the S3a File System

QDS supports SSE-KMS and SSE with Customer-Provided Keys (SSE-C) on the S3a file system. For details on client-side KMS encryption, see Enable AWS Key Management Service Client-side Encryption on the S3a File System.

Set the following properties to use SSE-KMS or SSE-C encryption:

  • fs.s3a.server-side-encryption-algorithm: It is not set by default. Set it to one of these supported values:

    • AES256 (for SSE-S3)
    • SSE-KMS
    • SSE-C
  • fs.s3a.server-side-encryption.key: Its value specifies the encryption key to use if fs.s3a.server-side-encryption-algorithm has been set to SSE-KMS or SSE-C. These conditions apply to this property:

    • For SSE-C, the value of this property must be the Base64-encoded key.
    • For SSE-KMS, if you leave this property empty, your default S3 KMS key is used. To use a specific key instead, set this property to its KMS key ID.
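The rules above for fs.s3a.server-side-encryption.key can be checked before the properties are applied. The Python sketch below is illustrative only: s3a_sse_settings is a hypothetical helper, and the 32-byte length check reflects Amazon S3's requirement of a 256-bit customer key for SSE-C, which this page does not state explicitly.

```python
import base64
from typing import Dict, Optional

ALGORITHM_PROPERTY = "fs.s3a.server-side-encryption-algorithm"
KEY_PROPERTY = "fs.s3a.server-side-encryption.key"


def s3a_sse_settings(algorithm: str, key: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: builds and checks the S3a SSE properties described above."""
    if algorithm not in {"AES256", "SSE-KMS", "SSE-C"}:
        raise ValueError(f"unsupported algorithm {algorithm!r}")
    settings = {ALGORITHM_PROPERTY: algorithm}
    if algorithm == "SSE-C":
        if not key:
            raise ValueError("SSE-C requires a Base64-encoded key")
        try:
            raw = base64.b64decode(key, validate=True)
        except ValueError as exc:  # binascii.Error is a subclass of ValueError
            raise ValueError("SSE-C key is not valid Base64") from exc
        # Amazon S3 expects a 256-bit (32-byte) customer key for SSE-C.
        if len(raw) != 32:
            raise ValueError("SSE-C key must decode to 32 bytes")
        settings[KEY_PROPERTY] = key
    elif algorithm == "SSE-KMS" and key:
        # A specific KMS key ID; omitting it falls back to the default S3 KMS key.
        settings[KEY_PROPERTY] = key
    # For AES256 (SSE-S3) the key property is not used, so it is never emitted.
    return settings


demo_key = base64.b64encode(bytes(32)).decode("ascii")  # all-zero key: demo only
for name, value in s3a_sse_settings("SSE-C", demo_key).items():
    print(f"{name}={value}")
```

In practice the SSE-C key should come from a secret store rather than source code; the all-zero key above exists only to make the example run.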

Enabling Server-side Encryption while using Hadoop DistCp

When using Hadoop DistCp, you can set these server-side encryption parameters along with the other DistCp parameters:

  • s3ServerSideEncryption: Enables encryption of data at the object level as S3 writes it to disk.
  • s3SSEAlgorithm: The algorithm used for encryption. If you do not specify it but s3ServerSideEncryption is enabled, AES256 is used by default. Valid values are AES256, SSE-KMS, and SSE-C.
  • encryptionKey: The key used to encrypt the data when the algorithm is SSE-KMS or SSE-C. For SSE-KMS, the key is optional because AWS KMS is used. For SSE-C, the key is required; otherwise, the job fails.
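The defaulting and validation rules for these three parameters can be sketched in Python. The parameter names come from the list above, but distcp_sse_params is a hypothetical helper, and representing the parameters as a dictionary is an assumption; this section does not describe how they are submitted to the job.

```python
from typing import Dict, Optional, Union

SSE_ALGORITHMS = {"AES256", "SSE-KMS", "SSE-C"}


def distcp_sse_params(enabled: bool, algorithm: Optional[str] = None,
                      encryption_key: Optional[str] = None) -> Dict[str, Union[bool, str]]:
    """Hypothetical helper: applies the DistCp SSE rules listed above before submitting a job."""
    if not enabled:
        return {}  # no server-side encryption parameters
    algo = algorithm or "AES256"  # documented default when only s3ServerSideEncryption is set
    if algo not in SSE_ALGORITHMS:
        raise ValueError(f"invalid s3SSEAlgorithm {algo!r}")
    if algo == "SSE-C" and not encryption_key:
        # Fail early instead of letting the DistCp job fail.
        raise ValueError("SSE-C requires encryptionKey")
    params: Dict[str, Union[bool, str]] = {"s3ServerSideEncryption": True, "s3SSEAlgorithm": algo}
    if encryption_key and algo != "AES256":
        params["encryptionKey"] = encryption_key  # optional for SSE-KMS, required for SSE-C
    return params


print(distcp_sse_params(True))  # {'s3ServerSideEncryption': True, 's3SSEAlgorithm': 'AES256'}
```

Rejecting SSE-C without a key up front surfaces the misconfiguration before any data is copied.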

Enable Encryption on Ephemeral HDFS through QDS UI

Navigate to the Clusters page and click Edit to open the Edit Cluster page.

In Advanced Configuration, under Security Settings, select Enable Encryption, as shown in the following figure.

[Figure: the Enable Encryption option under Security Settings in Advanced Configuration]

Enable Encryption is an option to encrypt the data at rest on the node's ephemeral (local) storage. This includes HDFS and any intermediate output generated by Hadoop. The block device encryption is set up before the node joins the cluster, so it can increase the bring-up time of the cluster.

To enable encryption on the ephemeral drives through the Cluster REST API, see security_settings.