Understanding Cluster Network Security Characteristics (AWS)
Each cluster is associated with a unique security group, which acts as a virtual firewall that controls the traffic for the nodes within the cluster. The ports that allow inbound traffic are:
Ports that allow inbound traffic from Qubole’s security group sg-a8c407c0 in the AWS us-east-1 region are:
Port 22, which is the SSH port.
Hadoop 2 requires only port 22 to allow inbound traffic from Qubole’s security group.
Port 9000, which is the NameNode port
Port 50070, which is the NameNode web port
Port 50075, which is the DataNode web port
Port 8081, which is the Presto server port
Port 8443, which is the HTTPS port for a Presto server
Port 8082, which is the Zeppelin server port
Port 18080, which is the Spark History Server port
Port 22 allows inbound traffic from:
- Qubole’s security group sg-a8c407c0 in the AWS us-east-1 region and the EC2-classic platform.
- CIDR 0.0.0.0/0 (world) for all other cases. These include clusters in an AWS VPC (including the default VPC in the us-east-1 region) and clusters in AWS regions other than us-east-1.
Within a cluster, participating nodes can communicate with each other on all ports.
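The port 22 rule above can be summarized as a small decision function. This is only an illustrative sketch: the function name and its `in_vpc` parameter are hypothetical and not part of any Qubole or AWS API; the security group ID and CIDR are taken from the text.

```python
# Values stated in the documentation above.
QUBOLE_SG_US_EAST_1 = "sg-a8c407c0"
WORLD_CIDR = "0.0.0.0/0"


def ssh_inbound_source(region: str, in_vpc: bool) -> str:
    """Return the source that may reach port 22 on cluster nodes.

    Hypothetical helper: models the documented rule, it does not
    query AWS or change any security group.
    """
    # Qubole's security group applies only to EC2-classic in us-east-1.
    if region == "us-east-1" and not in_vpc:
        return QUBOLE_SG_US_EAST_1
    # Every other case (any VPC, any other region) is open to the world.
    return WORLD_CIDR


if __name__ == "__main__":
    print(ssh_inbound_source("us-east-1", in_vpc=False))
    print(ssh_inbound_source("us-east-1", in_vpc=True))
```

Because port 22 is open to 0.0.0.0/0 in the common VPC case, a helper like this can be useful in an audit script that flags clusters needing a tighter, persistent security group.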
Configuring a Cluster Proxy
Configuring a cluster proxy ensures that the outbound data traffic from clusters reaches the Qubole Control Plane as well as other required services such as EC2 and S3.
As a prerequisite, create a ticket with Qubole Support to enable the proxy configuration for the account. In addition, Qubole requires the following from you:
Provide a proxy server URL in this form:
Domain names, URLs, and IP addresses that must bypass the proxy server when connected from the cluster nodes. The default value includes 169.254.169.254, 127.0.0.1, localhost, and S3 endpoints.
Qubole recommends configuring an S3 VPC endpoint as described in endpoints for Amazon S3. This reduces the load on the proxy server and ensures that traffic from the cluster nodes to S3 does not leave the AWS network.
The proxy server protocol to use if the proxy server does not support both
Be sure to provide a persistent security group for Qubole clusters when you configure the outbound communication from the cluster nodes to pass through an Internet proxy server. You can configure a persistent security group in the Advanced Configuration tab of that cluster’s UI as described in Advanced Configuration: Modifying Security Settings (AWS).
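To make the bypass list more concrete, the following sketch checks whether a destination host would skip the proxy. It is a simplified model, not Qubole's implementation. The three exact entries come from the default value described above; the `s3.amazonaws.com` suffix is an assumption standing in for "S3 endpoints", which the text does not enumerate, and the function name is hypothetical.

```python
# Exact entries from the documented default bypass value.
DEFAULT_BYPASS_HOSTS = ("169.254.169.254", "127.0.0.1", "localhost")

# ASSUMPTION: an illustrative suffix standing in for "S3 endpoints".
# Real S3 endpoint names vary by region and are not listed in the text.
EXAMPLE_S3_SUFFIXES = ("s3.amazonaws.com",)


def bypasses_proxy(host, exact=DEFAULT_BYPASS_HOSTS, suffixes=EXAMPLE_S3_SUFFIXES):
    """Return True if traffic to `host` should skip the proxy server."""
    # Normalize case and drop a trailing dot from fully qualified names.
    host = host.lower().rstrip(".")
    if host in exact:
        return True
    # Match the suffix itself or any subdomain of it.
    return any(host == s or host.endswith("." + s) for s in suffixes)


if __name__ == "__main__":
    for name in ("169.254.169.254", "mybucket.s3.amazonaws.com", "api.qubole.com"):
        print(name, bypasses_proxy(name))
```

The address 169.254.169.254 is the EC2 instance metadata endpoint, which is why it must be reached directly from the node rather than through the proxy.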
Configuring Outbound Endpoints for Proxy Server
For a proxy server setup, you must allow access to the following endpoints:
*.qubole.com to ensure that the outbound data traffic from Qubole clusters reaches the Qubole Control Plane.
*.amazonaws.com to ensure that the outbound data traffic from Qubole clusters reaches
*.amazonaws.com to invoke EC2 API calls from the cluster nodes.
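One way to sanity-check a proxy allowlist against the wildcard endpoints above is to match candidate hostnames with shell-style patterns. This is a minimal sketch using Python's standard `fnmatch` module. The function name is hypothetical, and a real proxy server may interpret wildcards differently, so treat the result as a rough check only.

```python
from fnmatch import fnmatchcase

# Wildcard endpoints listed in the documentation above.
REQUIRED_PATTERNS = ("*.qubole.com", "*.amazonaws.com")


def is_required_endpoint(host, patterns=REQUIRED_PATTERNS):
    """Return True if `host` matches one of the required wildcard endpoints."""
    # Hostnames are case-insensitive, so compare in lowercase.
    host = host.lower()
    return any(fnmatchcase(host, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ("api.qubole.com", "ec2.us-east-1.amazonaws.com", "example.org"):
        print(name, is_required_endpoint(name))
```

Note that a pattern such as `*.qubole.com` matches subdomains only. The bare domain `qubole.com` does not match, because the pattern requires a dot before the domain name.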
Additional Endpoints for Jupyter and Zeppelin Notebooks
You must also allow access to the repositories that serve any Maven coordinates defined in the notebooks’ interpreter settings.
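To work out which host a given Maven coordinate needs, the sketch below maps a `groupId:artifactId:version` coordinate to its path in the standard Maven repository layout and extracts the hostname. The Maven Central base URL is an assumption, since the text does not say which repositories your interpreters use. Both function names are hypothetical.

```python
from urllib.parse import urlparse

# ASSUMPTION: Maven Central is the repository in use. Substitute your own
# repository URL if interpreter settings point elsewhere.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def artifact_url(coordinate, repo=MAVEN_CENTRAL):
    """Build the JAR URL for a 'groupId:artifactId:version' coordinate."""
    # Only the three-part form is handled; coordinates with a packaging
    # or classifier part raise ValueError here.
    group, artifact, version = coordinate.split(":")
    group_path = group.replace(".", "/")
    return f"{repo}/{group_path}/{artifact}/{version}/{artifact}-{version}.jar"


def host_to_allow(coordinate, repo=MAVEN_CENTRAL):
    """Return the hostname the proxy must allow for this coordinate."""
    return urlparse(artifact_url(coordinate, repo)).hostname


if __name__ == "__main__":
    coord = "org.apache.commons:commons-lang3:3.12.0"
    print(artifact_url(coord))
    print(host_to_allow(coord))
```

Running this over every coordinate in the interpreter settings gives a list of repository hosts to add to the proxy allowlist.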
Additional Endpoints for Package Management
Allow access to these endpoints for pip/Conda packages:
Allow access to these endpoints for CRAN packages: