Understanding QDS Network Perimeter Security
Qubole provides features that secure the network perimeter while you perform data analysis. Here is a data flow diagram of QDS.
Allowing IP Addresses
In general, QDS endpoints are accessible from anywhere in the world over HTTPS, for both browser-based access and API-based access. You can limit this access to specific IP addresses. A common way to arrange this is to place all hosts that need access to QDS in a private network or VPN, and to add the IP address of the network's NAT gateway to an allow-list, so that only users logged in to the VPN can access QDS.
To control access to QDS in this way, create a ticket with Qubole Support to enable this feature for the QDS account.
Caution
Once you add IP addresses to an allow-list, your users can log in to QDS only from the allowed IP addresses.
For more information, see Listing Allowed IP Addresses.
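QDS enforces the allow-list itself once Qubole Support enables the feature, so you do not write this check. The Python sketch below only illustrates the logic an allow-list applies: a request is accepted only if its source address falls inside an allowed range. The address 203.0.113.10 is a placeholder for a NAT gateway's public IP (taken from a documentation-only range), and `is_allowed` is a hypothetical helper, not a QDS API.

```python
import ipaddress

# Placeholder allow-list: the public IP of a VPN's NAT gateway.
# 203.0.113.0/24 is a documentation-only range (RFC 5737).
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.10/32")]


def is_allowed(client_ip: str, allowed=ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed versions return False.
    return any(addr in net for net in allowed)


if __name__ == "__main__":
    print(is_allowed("203.0.113.10"))  # request routed through the VPN's NAT gateway
    print(is_allowed("198.51.100.7"))  # request from outside the VPN
```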
Securing with HTTP over SSL
Qubole supports only HTTPS: all HTTP requests are redirected to HTTPS for better security. This applies to all the clouds that QDS supports.
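Because plain HTTP requests are redirected rather than refused, the first request from a client that uses an http:// URL may still travel unencrypted, including any API token in its headers. A client-side sketch like the one below upgrades the URL before any request is sent. It uses only Python's standard `urllib.parse`; `upgrade_to_https` is a hypothetical helper and qds.example.com is a placeholder host.

```python
from urllib.parse import urlsplit, urlunsplit


def upgrade_to_https(url: str) -> str:
    """Rewrite an http:// URL to https:// so no request is sent in cleartext.

    Host, path, and query are left unchanged. An explicit port such as :80
    is also kept as-is, so pass URLs without a port.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    elif parts.scheme != "https":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return urlunsplit(parts)


print(upgrade_to_https("http://qds.example.com/api/commands?id=1"))
```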
Securing Data Traffic on the Cluster Nodes
Each cluster is associated with a unique security group that acts as a virtual firewall, controlling the traffic for the nodes within the cluster. A security group can be configured at the account level and at the cluster level. For more information, see:
Encrypting Data At Rest
For AWS, Qubole can encrypt data at rest on S3 to prevent unauthorized access to the S3 data. You can enable server-side encryption as described in Enabling Server-side Encryption in QDS (AWS). Enabling Client-side Encryption (AWS) describes how to enable client-side encryption on AWS.
On Azure, data at rest is encrypted by default; see Encryption for Data at Rest on Azure.
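To illustrate what server-side encryption means at the S3 API level, outside QDS, the sketch below builds the arguments for a boto3 S3 client `put_object` call that asks S3 to encrypt the object at rest with S3-managed keys (`ServerSideEncryption="AES256"`). This is not how QDS configures encryption; for that, follow Enabling Server-side Encryption in QDS (AWS). The bucket name, key, and helper `sse_put_object_args` are placeholders.

```python
def sse_put_object_args(bucket: str, key: str, body: bytes) -> dict:
    """Build keyword arguments for boto3's S3 client put_object call.

    ServerSideEncryption="AES256" asks S3 to encrypt the object at rest
    with S3-managed keys (SSE-S3).
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "AES256",
    }


args = sse_put_object_args("example-bucket", "data/part-0000.csv", b"id,value\n1,42\n")
# With boto3 installed and AWS credentials configured, the upload would be:
#   import boto3
#   boto3.client("s3").put_object(**args)
print(args["ServerSideEncryption"])
```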
Encrypting Data Traffic to AWS S3
Qubole supports encrypting data in transit to S3 for each cluster type as follows:
Airflow: Encrypting data traffic to S3 is not applicable to Airflow clusters.
Hadoop 2 (Hive) and Spark: Set fs.s3a.connection.ssl.enabled=true as a Hadoop override to encrypt data in transit to an AWS S3 location.
Presto: Set hive.s3.ssl.enabled to true to secure the communication between Amazon S3 and the Presto cluster using SSL.
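The per-cluster settings above can be collected in one place. The Python sketch below maps each cluster type to the property this page names and renders it as key=value text. The helper `s3_ssl_override` and the cluster-type labels are hypothetical; where you paste the result (the Hadoop overrides field or the Presto configuration overrides) and the exact format Presto overrides expect are set by your QDS cluster settings and are not shown here.

```python
# Property names come from this page; the mapping and helper are hypothetical.
S3_SSL_OVERRIDES = {
    "hadoop2": {"fs.s3a.connection.ssl.enabled": "true"},
    "spark": {"fs.s3a.connection.ssl.enabled": "true"},
    "presto": {"hive.s3.ssl.enabled": "true"},
}


def s3_ssl_override(cluster_type: str) -> str:
    """Return the key=value override lines that enable SSL for S3 traffic."""
    if cluster_type == "airflow":
        raise ValueError("S3 traffic encryption is not applicable to Airflow clusters")
    overrides = S3_SSL_OVERRIDES[cluster_type]
    return "\n".join(f"{key}={value}" for key, value in overrides.items())


for kind in ("hadoop2", "spark", "presto"):
    print(kind, "->", s3_ssl_override(kind))
```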
Encrypting Data Traffic Among Cluster Nodes
Qubole supports encrypting data traffic among cluster nodes for each cluster type as follows:
Airflow: As Airflow is a single-node cluster, data traffic among cluster nodes is not applicable to it.
Hadoop 2 (Hive) and Presto: Encrypting Communication within a Presto Cluster describes how to encrypt the data among Hadoop 2 (Hive) cluster nodes or Presto cluster nodes.
Spark: Encrypting and Authenticating Spark Data in Transit describes how to encrypt the data in transit on Spark cluster nodes.
Isolating from Virtual Networks through AWS Virtual Private Clouds
An AWS VPC lets you customize the network configuration, giving you complete control over the virtual network. You can use IPv4 and IPv6 to access resources and applications securely. Qubole supports configuring clusters in an AWS Virtual Private Cloud (VPC), including AWS VPCs with private and public subnets. Configuring clusters in an AWS VPC with private and public subnets helps secure the data that is processed on the QDS platform. You can also secure data import into and export from the QDS platform in VPCs.
For more information on how to configure a cluster in an AWS VPC, see Configuring a Cluster in a VPC with Public and Private Subnets (AWS). Unless you open the SSH port to the world, you must use tunnels to communicate. For details, see Securing through SSH Tunnelling.
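A VPC with public and private subnets divides one address range into an internet-facing part and an internal part. The Python sketch below uses the standard `ipaddress` module to carve a hypothetical 10.0.0.0/16 VPC into one public and one private /24 subnet. The CIDR sizes and the helper `plan_subnets` are assumptions; the actual subnets, routes, and gateways are created in AWS as described in the page referenced above.

```python
import ipaddress

# Hypothetical VPC CIDR block; substitute your own.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")


def plan_subnets(vpc, prefix: int = 24):
    """Split a VPC CIDR into one public and one private subnet.

    The public subnet would hold internet-facing hosts such as a bastion
    node; the private subnet would hold the cluster nodes, which then have
    no direct inbound route from the internet.
    """
    subnets = vpc.subnets(new_prefix=prefix)
    public = next(subnets)
    private = next(subnets)
    return public, private


public, private = plan_subnets(VPC_CIDR)
print("public: ", public)    # public:  10.0.0.0/24
print("private:", private)   # private: 10.0.1.0/24
```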
Securing through SSH Tunnelling
When a cluster is in a VPC, enable the Qubole tunnel server settings on the cluster unless you want to open the SSH port to the world. Tunneling with Bastion Nodes for Private Subnets in an AWS VPC lists the IP addresses of the Qubole tunnel servers. Once tunnelling is enabled on the cluster, QDS uses it automatically for data import and export, running commands, and other operations, which work as before.
Note
It is strongly recommended to use a tunnel rather than open the SSH port to the world.
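QDS manages its own tunnel once the tunnel server settings are enabled. If you need to reach a node in a private subnet yourself, a common equivalent is an OpenSSH connection that jumps through a bastion host. The Python sketch below only builds such a command; the host names, user, and ports are placeholders, and `ssh_tunnel_argv` is a hypothetical helper. It relies on OpenSSH's `-J` (ProxyJump), `-L` (local port forwarding), and `-N` (no remote command) options.

```python
import shlex
from typing import List


def ssh_tunnel_argv(bastion: str, target: str, local_port: int,
                    remote_host: str, remote_port: int) -> List[str]:
    """Build an OpenSSH command that reaches `target` through `bastion`.

    -J routes the connection through the bastion, so the target's SSH port
    never needs to be reachable from the internet. -L forwards local_port on
    this machine to remote_host:remote_port as seen from the target, and -N
    keeps the tunnel open without running a remote command.
    """
    return [
        "ssh", "-N",
        "-J", bastion,
        "-L", f"{local_port}:{remote_host}:{remote_port}",
        target,
    ]


argv = ssh_tunnel_argv("ec2-user@bastion.example.com", "ec2-user@10.0.1.15",
                       8080, "localhost", 8088)
print(shlex.join(argv))
```

The command is returned as an argument list so that it can be passed to `subprocess.run` without going through a shell.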
For more information on the ports that allow inbound traffic, see Understanding Cluster Network Security Characteristics (AWS).
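In line with the recommendation not to open SSH to the world, a quick audit can flag inbound rules that expose SSH to every address. The Python sketch below uses a simplified rule type loosely modeled on the port-range and source-CIDR fields of an AWS security group rule (it ignores protocol). `InboundRule` and `ssh_open_to_world` are hypothetical names, and the sample rules are placeholders, not the rules QDS creates.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class InboundRule:
    """A simplified inbound rule: a TCP port range and a source CIDR."""
    from_port: int
    to_port: int
    source_cidr: str


def ssh_open_to_world(rules: Iterable[InboundRule], ssh_port: int = 22) -> List[InboundRule]:
    """Return rules that allow SSH from any address (0.0.0.0/0 or ::/0)."""
    flagged = []
    for rule in rules:
        net = ipaddress.ip_network(rule.source_cidr)
        covers_ssh = rule.from_port <= ssh_port <= rule.to_port
        # A prefix length of 0 matches every address of that IP version.
        if covers_ssh and net.prefixlen == 0:
            flagged.append(rule)
    return flagged


rules = [
    InboundRule(22, 22, "0.0.0.0/0"),        # SSH open to every IPv4 address
    InboundRule(22, 22, "203.0.113.5/32"),   # SSH from a single placeholder address
    InboundRule(0, 65535, "::/0"),           # all ports open to every IPv6 address
]
for rule in ssh_open_to_world(rules):
    print("open to the world:", rule)
```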