Understanding Cluster Network Security Characteristics (AWS)
Each cluster is associated with a unique security group which acts as a virtual firewall that controls the traffic for the nodes within the cluster. The ports that allow inbound traffic are:
Ports that allow inbound traffic from Qubole’s security group sg-a8c407c0 in the AWS us-east-1 region are:
Port 22, which is the SSH port.
Note
Hadoop 2 requires only port 22 to allow inbound traffic from Qubole’s security group.
Port 9000, which is the NameNode port
Port 50070, which is the NameNode web port
Port 50075, which is the DataNode web port
Port 8081, which is the Presto server port
Port 8443, which is the HTTPS port for a Presto server
Port 8082, which is the Zeppelin server port
Port 18080, which is the Spark History Server port
Port 22 allows inbound traffic from:
Qubole’s security group sg-a8c407c0 in the AWS us-east-1 region and the EC2-classic platform.
Note
CIDR 0.0.0.0/0 (world) for all other cases. (These include clusters in an AWS VPC, including the default VPC in the us-east-1 region, and AWS regions other than us-east-1).
Create a ticket with Qubole Support if you want to restrict access to the SSH port (port 22) to a limited set of IP addresses. For more information, see Creating a Security Group in the VPC.
Within a cluster, participating nodes can communicate with each other on all ports.
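The inbound rules above can be written down as plain data, which may help when reviewing a cluster's security group by hand. The following is a minimal sketch, not a Qubole or AWS API: the names QUBOLE_SG_PORTS and ssh_source, and the platform labels "ec2-classic" and "vpc", are assumptions made for illustration.

```python
# Illustrative only: the inbound rules from this section expressed as data.
# All names and the platform labels here are hypothetical.
QUBOLE_SG = "sg-a8c407c0"  # Qubole's security group in us-east-1
WORLD_CIDR = "0.0.0.0/0"

# Ports that accept inbound traffic from Qubole's security group (us-east-1).
# Per the note above, Hadoop 2 requires only port 22 from this list.
QUBOLE_SG_PORTS = {
    22: "SSH",
    9000: "NameNode",
    50070: "NameNode web",
    50075: "DataNode web",
    8081: "Presto server",
    8443: "Presto server (HTTPS)",
    8082: "Zeppelin server",
    18080: "Spark History Server",
}


def ssh_source(region, platform):
    """Return the source that may reach port 22 on cluster nodes.

    Only us-east-1 on EC2-classic is limited to Qubole's security group;
    every other case (any VPC, any other region) is open to the world.
    """
    if region == "us-east-1" and platform == "ec2-classic":
        return QUBOLE_SG
    return WORLD_CIDR


if __name__ == "__main__":
    for region, platform in [("us-east-1", "ec2-classic"),
                             ("us-east-1", "vpc"),
                             ("us-west-2", "vpc")]:
        print(region, platform, "->", ssh_source(region, platform))
```

The function makes the second rule explicit: unless Qubole Support has restricted it, port 22 is reachable from any address for clusters in a VPC or outside us-east-1.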
Configuring a Cluster Proxy
Configuring a cluster proxy ensures that the outbound data traffic from clusters reaches the Qubole Control Plane as well as other required services such as EC2 and S3.
As a prerequisite, create a ticket with Qubole Support to enable the proxy configuration for the account. In addition, Qubole requires the following from you:
Provide a proxy server URL in this form:
<my-squid-proxy.domain>:<port>
Domain names, URLs, and IP addresses that must bypass the proxy server when accessed from the cluster nodes. The default value includes 169.254.169.254, 127.0.0.1, localhost, and S3 endpoints.
Note
Qubole recommends configuring an S3 VPC endpoint as described in endpoints for Amazon S3. This reduces the load on the proxy server and ensures that traffic from the cluster nodes to S3 does not leave the AWS network.
The proxy server protocol to use if the proxy server does not support both http and https protocols.
Note
Ensure that you provide a persistent security group for Qubole clusters when you configure the outbound communication from the cluster nodes to pass through an Internet proxy server. You can configure a persistent security group in the Advanced Configuration tab of that cluster’s UI as described in Advanced Configuration: Modifying Security Settings (AWS).
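Before sending the proxy details to Qubole Support, it can be useful to check that they are well formed. The sketch below is an assumption-laden helper, not part of any Qubole tooling: parse_proxy_address and build_bypass_list are hypothetical names, and the default bypass list contains only the three literal entries named above because the S3 endpoints are not enumerated in this section.

```python
from typing import Iterable, List, Tuple

# Default bypass entries named in this section. S3 endpoints are also part of
# the default, but they are not listed here, so they are left out.
DEFAULT_BYPASS = ["169.254.169.254", "127.0.0.1", "localhost"]


def parse_proxy_address(value: str) -> Tuple[str, int]:
    """Split a '<host>:<port>' proxy address and validate the port."""
    host, sep, port_text = value.strip().rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError("expected <host>:<port>, got %r" % value)
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return host, port


def build_bypass_list(extra: Iterable[str] = ()) -> List[str]:
    """Append extra bypass entries to the defaults, skipping duplicates."""
    merged = list(DEFAULT_BYPASS)
    for entry in extra:
        entry = entry.strip()
        if entry and entry not in merged:
            merged.append(entry)
    return merged


if __name__ == "__main__":
    # The host name and port below are placeholders.
    print(parse_proxy_address("my-squid-proxy.example.com:3128"))
    print(build_bypass_list(["internal.example.com", "localhost"]))
```

The host name, port, and extra bypass entry in the usage lines are placeholders; substitute the values for your own proxy server.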
Configuring Outbound Endpoints for Proxy Server
For a proxy server setup, you must allow access to the following endpoints:
Allow *.qubole.com to ensure that the outbound data traffic from Qubole clusters reaches the Qubole Control Plane.
Allow https://***.cloudfront.net/i.
Allow *.amazonaws.com to ensure that the cluster nodes can reach AWS services and invoke EC2 API calls.
Allow https://app.datadoghq.com.
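One way to sanity-check an allowlist like the one above is to match candidate destinations against it before changing the proxy configuration. This is a minimal sketch using Python's standard fnmatch module; ALLOWED_HOST_PATTERNS and is_allowed are hypothetical names, and the CloudFront entry is omitted because its hostname is elided in the list above.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Host patterns taken from the list above. The CloudFront entry is not
# included because its hostname is elided in this document.
ALLOWED_HOST_PATTERNS = [
    "*.qubole.com",
    "*.amazonaws.com",
    "app.datadoghq.com",
]


def host_of(target: str) -> str:
    """Extract the lowercase hostname from a URL or a bare host[:port]."""
    if "://" not in target:
        target = "//" + target  # let urlsplit treat it as a network location
    return (urlsplit(target).hostname or "").lower()


def is_allowed(target: str) -> bool:
    """Return True if the target's hostname matches an allowed pattern."""
    host = host_of(target)
    return any(fnmatchcase(host, pattern) for pattern in ALLOWED_HOST_PATTERNS)


if __name__ == "__main__":
    for target in ["https://api.qubole.com/",
                   "ec2.us-east-1.amazonaws.com",
                   "https://example.org/"]:
        print(target, "->", "allowed" if is_allowed(target) else "blocked")
```

The check only models hostname matching; the proxy server's own access-control syntax and semantics may differ, so treat the result as a rough guide rather than a guarantee.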
Additional Endpoints for Jupyter and Zeppelin Notebooks
You must also allow access to any Maven coordinates that are defined in notebooks’ interpreter settings.
Additional Endpoints for Package Management
Allow access to these endpoints for pip/Conda packages:
https://<CustomChannelURLs>/
Allow access to these endpoints for CRAN packages:
Additional Endpoints
If you use a public Git repo as a PyPI package (pip install), allow access to Git URLs (GitHub/GitLab/Bitbucket). You must also allow access to web URLs that are added in node bootstraps.