Enabling Encryption for Data at Rest (AWS)¶
For security reasons, data at rest may have to be encrypted. When working with Qubole on Amazon Web Services, data is at rest in these two locations:
- S3. See Enable Encryption on S3.
- The ephemeral HDFS brought up on the EC2 compute nodes. See Enable Encryption on Ephemeral HDFS through QDS UI. To enable encryption on the ephemeral drives through a Cluster REST API, see security_settings.
Encrypting Data on Amazon S3 describes the different encryption mechanisms that Qubole supports.
Enable Encryption on S3¶
Qubole leverages S3’s server-side encryption (SSE). For more information, see this reference.
To enable SSE in S3n filesystems, set the following property:
fs.s3n.sse=AES256
To enable server-side encryption in S3a filesystems, use the fs.s3a.server-side-encryption-algorithm property. Its supported values are AES256, SSE-KMS, and SSE-C.
For more information, see Enabling KMS and Customer Provided Keys Server-side Encryption on the S3a File System.
SSE can be set at the cluster level, in the Hive bootstrap, or per command. Presto, however, honors the open-source server-side encryption setting as described in Enabling Server-side Encryption as a Hive Property in Presto.
Enabling Server-side Encryption in Clusters¶
Navigate to the Control Panel page. In the Clusters tab, click Edit to go to the Edit Cluster page, or New to go to the new cluster page. On the Add/Edit Cluster page, under Override Hadoop Configuration Variables, set the override as:
- fs.s3n.sse=AES256 in S3n file systems.
- fs.s3a.server-side-encryption-algorithm=<value> in S3a file systems. The values can be AES256, SSE-KMS, or SSE-C.
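As an illustration of what the override field might contain, the following Python sketch renders settings as key=value pairs. The helper name `render_overrides` is hypothetical and not part of QDS, and the one-setting-per-line layout is an assumption; only the property names come from this page.

```python
def render_overrides(settings):
    """Render Hadoop override settings as one key=value pair per line."""
    lines = []
    for key, value in settings.items():
        # Reject names that would produce an ambiguous key=value line.
        if not key or "=" in key or any(ch.isspace() for ch in key):
            raise ValueError(f"invalid property name: {key!r}")
        lines.append(f"{key}={value}")
    return "\n".join(lines)


# Property names are taken from this page; AES256 is used for both here.
overrides = render_overrides({
    "fs.s3n.sse": "AES256",
    "fs.s3a.server-side-encryption-algorithm": "AES256",
})
print(overrides)
```

Only one of the two properties is usually needed, depending on whether the cluster uses the S3n or the S3a file system.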
Enabling Server-side Encryption as a Hive Bootstrap Setting¶
Set it as a Hive bootstrap setting. This affects all Hive commands for a given account. Use this syntax:
- set fs.s3n.sse=AES256 on S3n file systems.
- set fs.s3a.server-side-encryption-algorithm=<value> in S3a file systems. The values can be AES256, SSE-KMS, or SSE-C.
The same syntax also applies to individual Hive commands. In that case, the property is set per command, in the same command session as the command it applies to. For example:
CREATE EXTERNAL TABLE New2 (`Col0` STRING, `Col1` STRING, `Col2` STRING)
PARTITIONED BY (`20100102` STRING, `IN` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://ap-dev-qubole/common/hive/30day_1/30daysmall';
set fs.s3n.sse=AES256;
Enabling Server-side Encryption as a Hive Property in Presto¶
As a Presto catalog/hive.properties setting, set hive.s3.sse.enabled=true. See catalog/hive.properties for more information.
The results of SELECT queries with a LIMIT clause are not encrypted, because the LIMIT clause bypasses the map/reduce flow.
The results of SELECT queries without a LIMIT clause are encrypted. In short, standard Hadoop map/reduce output is encrypted. Presto output, which does not use map/reduce, is not encrypted.
Enabling KMS and Customer Provided Keys Server-side Encryption on the S3a File System¶
QDS supports SSE-KMS and SSE-Customer Provided Keys (SSE-C) on the S3a file system. For details on the client-side KMS encryption, see Enable AWS Key Management Service Client-side Encryption on the S3a File System.
Set the following properties to use SSE-KMS and SSE-C encryption:
fs.s3a.server-side-encryption-algorithm: It is not set by default. Set it to one of these supported values: AES256, SSE-KMS, or SSE-C.
fs.s3a.server-side-encryption.key: Its value specifies the encryption key to use if fs.s3a.server-side-encryption-algorithm has been set to SSE-KMS or SSE-C. These conditions apply to this property:
- In case of SSE-C, the value of this property must be the Base64-encoded key.
- If you are using SSE-KMS and leave this property empty, your default S3 KMS key is used. Otherwise, you must set this property to the specific KMS key ID.
- In case of
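The key rules above can be sketched in Python. This is a minimal illustration, not QDS code: the function names `new_sse_c_key` and `s3a_sse_properties` are hypothetical, the accepted algorithm values (AES256, SSE-KMS, SSE-C) are an assumption based on the values named on this page, and the 32-byte key length assumes the 256-bit AES key that S3 expects for SSE-C.

```python
import base64
import os


def new_sse_c_key():
    """Return a random 256-bit key, Base64 encoded as text."""
    raw = os.urandom(32)  # assumption: SSE-C uses a 32-byte AES-256 key
    return base64.b64encode(raw).decode("ascii")


def s3a_sse_properties(algorithm, key=None):
    """Build the S3a SSE properties as a dict, applying the key rules."""
    if algorithm not in ("AES256", "SSE-KMS", "SSE-C"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    props = {"fs.s3a.server-side-encryption-algorithm": algorithm}
    if algorithm == "SSE-C":
        # SSE-C always needs a Base64 encoded key.
        if key is None:
            raise ValueError("SSE-C requires a Base64 encoded key")
        if len(base64.b64decode(key, validate=True)) != 32:
            raise ValueError("SSE-C key must decode to 32 bytes")
        props["fs.s3a.server-side-encryption.key"] = key
    elif algorithm == "SSE-KMS" and key:
        # With no key, the default S3 KMS key is used, so the property is omitted.
        props["fs.s3a.server-side-encryption.key"] = key
    return props
```

For example, `s3a_sse_properties("SSE-C", new_sse_c_key())` returns both properties, while `s3a_sse_properties("SSE-KMS")` returns only the algorithm property. A generated SSE-C key must be stored safely, because S3 cannot return objects encrypted with a key that is no longer available.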
Enabling Server-side Encryption while using Hadoop DistCp¶
While using Hadoop DistCp, you can set these server-side encryption parameters along with the other parameters:
s3ServerSideEncryption: It enables encryption of data at the object level as S3 writes it to disk.
s3SSEAlgorithm: It specifies the algorithm used for encryption. If you do not specify it but s3ServerSideEncryption is enabled, the AES256 algorithm is used by default. Valid values are AES256, SSE-KMS, and SSE-C.
encryptionKey: If SSE-KMS or SSE-C is specified as the algorithm, use this parameter to specify the key with which the data is encrypted. If the algorithm is SSE-KMS, the key is not mandatory because AWS KMS is used. If the algorithm is SSE-C, you must specify the key; otherwise, the job fails.
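The interaction between these three parameters can be modeled with a short Python sketch. It only captures the rules described here (the AES256 default, the optional key for SSE-KMS, the mandatory key for SSE-C). The function name `resolve_distcp_sse` is hypothetical, the set of valid algorithm values is an assumption, and the sketch does not build an actual DistCp command line because the flag syntax is not shown on this page.

```python
def resolve_distcp_sse(enabled, algorithm=None, encryption_key=None):
    """Return the effective SSE settings for a DistCp run as a dict."""
    if not enabled:
        return {"s3ServerSideEncryption": False}

    # AES256 is the default when encryption is on and no algorithm is given.
    effective = algorithm or "AES256"
    if effective not in ("AES256", "SSE-KMS", "SSE-C"):
        raise ValueError(f"unsupported s3SSEAlgorithm: {effective}")

    # SSE-C has no fallback key, so a missing key is an error.
    if effective == "SSE-C" and not encryption_key:
        raise ValueError("SSE-C requires encryptionKey")

    settings = {"s3ServerSideEncryption": True, "s3SSEAlgorithm": effective}
    # The key is only meaningful for SSE-KMS and SSE-C.
    if effective in ("SSE-KMS", "SSE-C") and encryption_key:
        settings["encryptionKey"] = encryption_key
    return settings
```

Checking the combination before submitting the job surfaces a missing SSE-C key early instead of as a failed copy.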
Enable Encryption on Ephemeral HDFS through QDS UI¶
Navigate to the Clusters page and click the edit button to go to the Edit Cluster page.
Select Enable Encryption listed below Security Settings in Advanced Configuration as shown in the following figure.
Enable Encryption is an option to encrypt the data at rest on the node’s ephemeral (local) storage. This includes HDFS and any intermediate output generated by Hadoop. The block device encryption is set up before the node joins the cluster and can increase the bring-up time of the cluster.
To enable encryption on the ephemeral drives through a Cluster REST API, see security_settings.