Understanding QDS Network Perimeter Security

Qubole offers features that provide network perimeter security while you perform data analysis on QDS. The following diagram shows how data flows through QDS.

[Figure: QDS data flow diagram]

Allowing IP Addresses

In general, QDS endpoints are accessible from anywhere in the world over HTTPS, for both browser-based and API-based access. You can limit this access to specific IP addresses. A common approach is to place all hosts that need access to QDS in a private network or VPN, and add the IP address of its NAT gateway to an allow-list, so that only users connected to the VPN can reach QDS.

To control access to QDS in this way, create a ticket with Qubole Support to enable this feature for your QDS account.

Caution

Once you add IP addresses to an allow-list, your users can log in to QDS only from the allowed IP addresses.

For more information, see Listing Allowed IP Addresses.
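Once Support enables the feature, the allow-list check runs on the Qubole side. As a rough illustration only, assuming the allow-list is a set of CIDR ranges, the check amounts to a membership test on the client's public address (in a VPN setup, the NAT gateway's address). The helper `is_allowed` is hypothetical, not a QDS API, and the addresses are placeholders from the 203.0.113.0/24 documentation range.

```python
import ipaddress

# Hypothetical allow-list holding the public IP of a VPN's NAT gateway as a
# single-address CIDR. 203.0.113.0/24 is a documentation range (placeholder).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.10/32"),
]


def is_allowed(client_ip: str, allowed=ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip falls inside any allowed network.

    An IPv4 address is never 'in' an IPv6 network (and vice versa), so
    mixed-version comparisons simply return False.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


if __name__ == "__main__":
    # Traffic that leaves through the VPN's NAT gateway is accepted.
    print(is_allowed("203.0.113.10"))  # True
    # Traffic from any other public address is rejected.
    print(is_allowed("198.51.100.7"))  # False
```

This also shows why the Caution above matters: a user whose traffic does not egress through an allowed address is rejected, even with valid credentials.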

Securing with HTTP over SSL

QDS supports only HTTPS; all HTTP requests are redirected to HTTPS. This improves security for Qubole users and applies to all the clouds that QDS supports.
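A redirect happens only after the first plaintext request reaches the server, so an API token sent to an `http://` URL has already crossed the network unencrypted. A minimal client-side guard, assuming your scripts build requests from a configurable base URL, upgrades the scheme before anything is sent. The function name `to_https` and the host `qds.example.com` are hypothetical placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return url with its scheme upgraded from http to https.

    https URLs are returned unchanged. An explicit :80 port is dropped so the
    upgraded URL uses the HTTPS default port (443); any other explicit port is
    kept as-is. Schemes other than http/https are rejected.
    """
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    if parts.scheme == "https":
        return url
    if parts.scheme != "http":
        raise ValueError(f"expected an http or https URL, got {url!r}")
    netloc = parts.netloc
    if parts.port == 80:
        netloc = netloc.rsplit(":", 1)[0]  # strip the trailing ":80"
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # qds.example.com is a placeholder host, not a real QDS endpoint.
    print(to_https("http://qds.example.com/api/commands?limit=10"))
    # https://qds.example.com/api/commands?limit=10
```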

Securing Data Traffic on the Cluster Nodes

Each cluster is associated with a unique security group, which acts as a virtual firewall that controls the traffic to and from the nodes in the cluster. You can configure a security group at the account level and at the cluster level. For more information, see:

Encrypting Data At Rest

Encrypting Data Traffic to AWS S3

Qubole supports encrypting data in transit to S3 for the following cluster types:

  • Airflow: Encrypting S3 data traffic is not applicable to Airflow clusters.
  • Hadoop 2 (Hive) and Spark: Set fs.s3a.connection.ssl.enabled=true as a Hadoop override to encrypt data in transit to an AWS S3 location.
  • Presto: Set hive.s3.ssl.enabled to true to secure the communication between Amazon S3 and the Presto cluster using SSL.
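If you manage override text for many clusters, a small audit can confirm that the properties listed above are actually set. This is a sketch under stated assumptions: the cluster-type keys and the one-property-per-line `key=value` format are assumptions, and the helper names are hypothetical. It reports only what the override text says; it does not know a cluster's built-in defaults.

```python
# Property names are taken from the list above. The cluster-type keys and the
# one-property-per-line key=value override format are assumptions of this sketch.
S3_SSL_PROPERTY = {
    "hadoop2": "fs.s3a.connection.ssl.enabled",
    "spark": "fs.s3a.connection.ssl.enabled",
    "presto": "hive.s3.ssl.enabled",
    "airflow": None,  # S3 traffic encryption does not apply to Airflow
}


def parse_overrides(text: str) -> dict:
    """Parse key=value lines; blanks, '#' comments and lines without '=' are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        props[key.strip()] = value.strip()
    return props


def s3_ssl_override_set(cluster_type: str, override_text: str):
    """Return True if the overrides explicitly set the S3 SSL property to true,
    False if they do not, or None when the cluster type has no such property."""
    prop = S3_SSL_PROPERTY[cluster_type]
    if prop is None:
        return None
    return parse_overrides(override_text).get(prop, "").lower() == "true"


if __name__ == "__main__":
    print(s3_ssl_override_set("spark", "fs.s3a.connection.ssl.enabled=true"))  # True
    print(s3_ssl_override_set("presto", "hive.s3.ssl.enabled=false"))           # False
    print(s3_ssl_override_set("airflow", ""))                                   # None
```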

Encrypting Data Traffic Among Cluster Nodes

Qubole supports encrypting data traffic among cluster nodes in the following types of clusters:

Isolating from Virtual Networks through AWS Virtual Private Clouds

An AWS VPC lets you customize the network configuration, giving you full control over your virtual network. You can use IPv4 and IPv6 to access resources and applications securely. Qubole supports configuring clusters in an AWS Virtual Private Cloud (VPC), including VPCs with both private and public subnets. Configuring clusters in a VPC with private and public subnets helps secure the data that is processed on the QDS platform, including data imported into and exported from QDS.
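Planning a VPC with public and private subnets starts with dividing its CIDR block into non-overlapping ranges. The sketch below, assuming a hypothetical 10.0.0.0/16 VPC and /24 subnets, takes the first two blocks. The role labels in the output reflect a common layout (a bastion node in the public subnet, cluster nodes in the private one) and are illustrative only; `plan_subnets` is a hypothetical helper.

```python
import ipaddress

# Hypothetical VPC CIDR block; substitute your own VPC's range.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")


def plan_subnets(vpc, new_prefix=24):
    """Return the first two /new_prefix blocks of the VPC as (public, private).

    Blocks produced by subnets() never overlap and always lie inside the VPC.
    """
    blocks = vpc.subnets(new_prefix=new_prefix)
    try:
        return next(blocks), next(blocks)
    except StopIteration:
        raise ValueError("VPC is too small to hold two subnets of that size") from None


if __name__ == "__main__":
    public, private = plan_subnets(VPC_CIDR)
    print("public subnet (bastion node):", public)     # 10.0.0.0/24
    print("private subnet (cluster nodes):", private)  # 10.0.1.0/24
```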

For more information on how to configure a cluster in an AWS VPC, see Configuring a Cluster in a VPC with Public and Private Subnets (AWS). Unless you open the SSH port to the world, you must use tunnels to communicate. For details, see Securing through SSH Tunnelling.

Securing through SSH Tunnelling

Unless you want to open the SSH port to the world, enable the Qubole tunnel server settings on any cluster that is in a VPC. Tunneling with Bastion Nodes for Private Subnets in an AWS VPC lists the IP addresses of the Qubole tunnel servers. Once tunnelling is enabled on the cluster, it is used automatically for data import and export, running commands, and other operations, which work the same way as before.
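Qubole's tunnel servers manage the tunnel for you; the sketch below only illustrates the general SSH port-forwarding mechanism that a bastion-based tunnel relies on. It builds an OpenSSH command that forwards a local port, through the bastion in the public subnet, to a host in the private subnet. The user name, hostnames, addresses, and ports are placeholders, and `ssh_tunnel_command` is a hypothetical helper.

```python
import shlex


def ssh_tunnel_command(user: str, bastion: str, target: str,
                       local_port: int, remote_port: int) -> list:
    """Build an OpenSSH argv that forwards local_port on this machine,
    through the bastion host, to remote_port on target."""
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
    return [
        "ssh",
        "-N",  # open no remote shell; only forward the port
        "-L", f"{local_port}:{target}:{remote_port}",  # local_port:target:remote_port
        f"{user}@{bastion}",
    ]


if __name__ == "__main__":
    # All names and addresses are placeholders.
    cmd = ssh_tunnel_command("ec2-user", "bastion.example.com", "10.0.1.25", 8080, 8080)
    print(shlex.join(cmd))
    # ssh -N -L 8080:10.0.1.25:8080 ec2-user@bastion.example.com
```

Returning an argument list rather than a single string lets you pass it directly to `subprocess.run` without shell quoting issues. Only the bastion's SSH port needs to be reachable, and only from trusted sources.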

Note

Qubole strongly recommends using a tunnel rather than opening SSH to the world.

For more information on the ports that allow inbound traffic, see Understanding Cluster Network Security Characteristics (AWS).
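To check a configuration against the recommendation in the note above, you can audit inbound rules for any that expose SSH beyond the Qubole tunnel servers. This is a minimal sketch: `InboundRule` is a hypothetical, simplified model (one TCP port range and one source CIDR), not the structure returned by any AWS API, and the tunnel server address is a placeholder. The real addresses are listed in Tunneling with Bastion Nodes for Private Subnets in an AWS VPC.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical, simplified inbound rule: a TCP port range and one source CIDR."""
    from_port: int
    to_port: int
    cidr: str


def risky_ssh_rules(rules, allowed_cidrs, ssh_port=22):
    """Return rules that open ssh_port to a source outside allowed_cidrs.

    A world-open source (0.0.0.0/0 or ::/0) is flagged because it is not
    contained in any narrower allowed network.
    """
    allowed = [ipaddress.ip_network(c) for c in allowed_cidrs]
    flagged = []
    for rule in rules:
        if not rule.from_port <= ssh_port <= rule.to_port:
            continue  # rule does not cover the SSH port
        source = ipaddress.ip_network(rule.cidr)
        # Compare versions first: subnet_of raises TypeError across IPv4/IPv6.
        if not any(source.version == a.version and source.subnet_of(a) for a in allowed):
            flagged.append(rule)
    return flagged


if __name__ == "__main__":
    tunnel_servers = ["198.51.100.20/32"]  # placeholder tunnel server address
    rules = [
        InboundRule(22, 22, "0.0.0.0/0"),         # SSH open to the world
        InboundRule(22, 22, "198.51.100.20/32"),  # SSH from a tunnel server only
        InboundRule(443, 443, "0.0.0.0/0"),       # not an SSH rule
    ]
    for rule in risky_ssh_rules(rules, tunnel_servers):
        print(rule)
    # InboundRule(from_port=22, to_port=22, cidr='0.0.0.0/0')
```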