Understanding QDS Network Perimeter Security

Qubole provides several features that secure the network perimeter while you perform data analysis. The following diagram shows the QDS data flow.

[Figure: QDS data flow diagram (../_images/QDSDataFlow.png)]

Whitelisting IP Addresses

In general, Qubole endpoints are accessible from anywhere in the world over HTTPS. This applies to browser-based access as well as API-based access. You can limit access to specific IP addresses. A common arrangement is to put all machines that may access Qubole in a private network/VPN and whitelist the IP address of the NAT gateway, so that only members logged in to the VPN can access Qubole.

Qubole offers IP address whitelisting for cases where a single IP address or a fixed set of IP addresses is used to access the QDS account.

Create a ticket with Qubole Support to enable this feature for the QDS account.

Caution

Once you add IP addresses to the whitelist, logging in to QDS is possible only from the whitelisted IP addresses.
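The effect of a whitelist can be illustrated with a short sketch. This is not Qubole's implementation; it only shows the general idea of matching a client address against a fixed set of allowed addresses. The addresses below come from reserved documentation ranges, and the names ALLOWED_NETWORKS and is_allowed are hypothetical.

```python
import ipaddress

# Hypothetical whitelist using reserved documentation addresses,
# not real Qubole or customer values.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.10/32"),  # e.g. a single NAT gateway address
    ipaddress.ip_network("198.51.100.0/24"),  # e.g. a fixed office range
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any whitelisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.10", "203.0.113.11", "198.51.100.77"):
        print(ip, "allowed" if is_allowed(ip) else "blocked")
```

With a NAT gateway in front of a VPN, every member's traffic arrives from the gateway address, so whitelisting that single address covers all VPN users.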

For more information, see Whitelisting IP Addresses.

Securing with HTTP over SSL

Qubole now supports only HTTPS. All HTTP requests are redirected to HTTPS to provide better security for Qubole users. This applies to all supported clouds, that is, Qubole-on-AWS, Qubole-on-Azure, and Qubole-on-Oracle.
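Because HTTP requests are redirected anyway, client scripts can use HTTPS URLs from the start and avoid the extra round trip. The following is a minimal sketch of such a normalization step; the function name force_https and the host example.com are hypothetical and are not part of any Qubole client.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving other URLs unchanged.

    An explicit port in the URL is kept as-is, so a URL that names
    port 80 would still need to be corrected by hand.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(force_https("http://example.com/api/status?verbose=1"))
```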

Securing Data Traffic on the Cluster Nodes

Each cluster is associated with a unique security group, which acts as a virtual firewall that controls the traffic for the nodes within the cluster. A security group can be configured at the account and cluster levels. For more information, see:

Encrypting Data At Rest

Qubole also encrypts data at rest on AWS S3 to prevent unauthorized access to the S3 data. You can enable it as described in Enabling Encryption for Data at Rest on Amazon S3. This is server-side encryption.

Enable AWS Key Management Service Encryption in QDS describes how to enable client-side encryption on AWS.

Encrypting Data Traffic to AWS S3

Qubole supports encrypting data in transit to S3 for the following cluster types:

  • Airflow: Encryption of data in transit to S3 is not applicable to Airflow clusters.
  • Hadoop 1: Set fs.s3.https.only=true as a Hadoop override to encrypt data in transit to an AWS S3 location.
  • Hadoop 2 (Hive) and Spark: Set fs.s3a.connection.ssl.enabled=true as a Hadoop override to encrypt data in transit to an AWS S3 location.
  • Presto: Set hive.s3.ssl.enabled to true to secure the communication between Amazon S3 and the Presto cluster using SSL.
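The settings in the list above can be summarized as a small lookup. This is a sketch for illustration only: the property names and values are taken from the list, while the cluster-type labels, the dictionary, and the function name are hypothetical. Where the resulting line is entered (a Hadoop override or the Presto configuration) depends on the cluster type.

```python
from typing import Optional

# Property names and values come from the list above; the keys are
# hypothetical labels chosen for this example.
S3_TRANSIT_ENCRYPTION = {
    "hadoop1": ("fs.s3.https.only", "true"),
    "hadoop2": ("fs.s3a.connection.ssl.enabled", "true"),
    "spark": ("fs.s3a.connection.ssl.enabled", "true"),
    "presto": ("hive.s3.ssl.enabled", "true"),
}


def s3_encryption_setting(cluster_type: str) -> Optional[str]:
    """Return the name=value setting for a cluster type.

    Returns None for Airflow, where the setting does not apply, and
    raises ValueError for cluster types not covered by the list.
    """
    key = cluster_type.strip().lower()
    if key == "airflow":
        return None
    if key not in S3_TRANSIT_ENCRYPTION:
        raise ValueError(f"unknown cluster type: {cluster_type!r}")
    name, value = S3_TRANSIT_ENCRYPTION[key]
    return f"{name}={value}"


if __name__ == "__main__":
    for kind in ("hadoop1", "hadoop2", "spark", "presto", "airflow"):
        print(kind, "->", s3_encryption_setting(kind))
```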

Encrypting Data Traffic Among Cluster Nodes

Qubole supports encrypting data traffic among cluster nodes in different types of cluster as mentioned below:

Isolating from Virtual Networks through AWS Virtual Private Clouds

An AWS VPC lets you customize the network configuration, which gives you complete control over the virtual network. You can use IPv4 and IPv6 to securely and easily access resources and applications. Qubole supports configuring clusters in an AWS Virtual Private Cloud (VPC), including VPCs with private and public subnets. Configuring clusters in an AWS VPC with private and public subnets ensures the security of the data that is processed on the QDS platform. You can also secure data export from and data import to the QDS platform in VPCs.

For more information on how to configure a cluster in an AWS VPC, see Configuring a Cluster in a VPC with Public and Private Subnets (AWS). Unless you open the SSH port to the world, you must use tunnels to communicate. For details, see Securing through HTTP Tunnelling.

Securing through HTTP Tunnelling

When a cluster is in a VPC, enable the Qubole tunnel server settings on it unless you want to open the SSH port to the world. Create a ticket with Qubole Support to enable these settings on the cluster. Once tunnelling is enabled on the cluster, it is used automatically for data export/import, running commands, and other operations, which otherwise work as before.

Note

It is highly recommended to use a tunnel and not open SSH to the world.

For more information on the ports that allow inbound traffic, see Understanding Cluster Network Security Characteristics (AWS).