Configuring a Cluster in a VPC with Public and Private Subnets (AWS)

This topic explains how to configure a cluster in a VPC with public and private subnets.

Configuring a cluster in a VPC with public and private subnets involves the following steps:

  1. Creating a VPC with Private and Public Subnets on AWS
  2. Configuring the Route Tables
  3. Creating a Security Group in the VPC
  4. Bringing up the Bastion Host in the Public Subnet
  5. Adding a Custom SSH Key to the Bastion Node (Optional)
  6. Assigning the Security Group to the Bastion Host
  7. Creating a Cluster in the VPC

See reference documentation for more information.

Creating a VPC with Private and Public Subnets on AWS

  1. Create a VPC.
  2. For the VPC, set the enableDnsHostnames attribute to true.
  3. Create the required private and public subnets for the VPC.
  4. If you want to use a private subnet, then create a NAT gateway in a public subnet for that VPC.

See Working with VPCs and Subnets for detailed information.
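The four steps above can be sketched in Python. This is a minimal sketch rather than a definitive setup: it assumes `ec2` is an object that behaves like `boto3.client("ec2")`, the function name and CIDR arguments are hypothetical, and it leaves out the internet gateway and route tables, which are configured separately.

```python
def create_vpc_with_subnets(ec2, vpc_cidr, public_cidr, private_cidr):
    """Create a VPC, one public and one private subnet, and a NAT gateway.

    `ec2` is assumed to behave like boto3.client("ec2"). The function name
    and arguments are illustrative, not part of any Qubole or AWS API.
    """
    # Step 1: create the VPC.
    vpc_id = ec2.create_vpc(CidrBlock=vpc_cidr)["Vpc"]["VpcId"]

    # Step 2: enable DNS hostnames on the VPC.
    ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})

    # Step 3: create the public and private subnets.
    public_subnet_id = ec2.create_subnet(
        VpcId=vpc_id, CidrBlock=public_cidr)["Subnet"]["SubnetId"]
    private_subnet_id = ec2.create_subnet(
        VpcId=vpc_id, CidrBlock=private_cidr)["Subnet"]["SubnetId"]

    # Step 4: create a NAT gateway in the public subnet. A NAT gateway
    # needs an Elastic IP, so one is allocated first.
    allocation_id = ec2.allocate_address(Domain="vpc")["AllocationId"]
    nat_gateway_id = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=allocation_id)["NatGateway"]["NatGatewayId"]

    return {
        "vpc_id": vpc_id,
        "public_subnet_id": public_subnet_id,
        "private_subnet_id": private_subnet_id,
        "nat_gateway_id": nat_gateway_id,
    }
```

With boto3 installed and AWS credentials configured, a call might look like `create_vpc_with_subnets(boto3.client("ec2"), "10.0.0.0/16", "10.0.0.0/24", "10.0.1.0/24")`; the CIDR values here are placeholders.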

Note

Tunnelling with Bastion Nodes for Private Subnets in an AWS VPC provides a list of IP addresses to whitelist for the https://api.qubole.com and https://us.qubole.com QDS environments.

Configuring the Route Tables

You should configure the route tables used with the private and public subnets to control the routing for each subnet.

  • Configure the Route Table used with a private subnet: All outbound traffic (0.0.0.0/0) must go to the NAT gateway or the NAT instance that resides in the public subnet. Qubole recommends using a NAT gateway instead of a customer-setup NAT instance for high reliability. Ensure that the Route Table contains an Amazon S3 endpoint as one of the routes.

    If you do not find an Amazon S3 endpoint, create a VPC endpoint that allows direct access to the AWS S3 object store in the VPC’s region, and add this VPC endpoint as an entry in the private subnet’s Route Table.

    Some AWS regions do not support NAT gateways. If the VPC is in such a region, create a NAT instance as described in NAT Instances, and set the following inbound and outbound rules on the NAT instance.

    Inbound

    • Open HTTP and HTTPS ports for private subnet CIDR.

    Outbound

    • Allow outgoing traffic on all ports to everywhere.
  • Configure the Route Table used with a public subnet: All outbound traffic (0.0.0.0/0) must go to the internet gateway. Ensure that the Route Table contains an Amazon S3 endpoint as one of the routes.

    If you do not find an Amazon S3 endpoint, add the same VPC endpoint that was created for the private subnet to the route table of the public subnet.
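The route table requirements above can be checked mechanically. The sketch below assumes the route table is a dict shaped like one entry of the `RouteTables` list returned by the EC2 DescribeRouteTables API; the function name `check_route_table` is hypothetical. It only confirms that some gateway VPC endpoint route (a `vpce-` target) is present and does not verify that the endpoint is for Amazon S3.

```python
def check_route_table(route_table, subnet_kind):
    """Return a list of problems found in one route table description.

    `route_table` is assumed to look like one "RouteTables" entry from the
    EC2 DescribeRouteTables API. `subnet_kind` is "private" or "public".
    """
    if subnet_kind not in ("private", "public"):
        raise ValueError("subnet_kind must be 'private' or 'public'")

    routes = route_table.get("Routes", [])
    problems = []

    # All outbound traffic (0.0.0.0/0) must go to the NAT gateway or NAT
    # instance (private subnet) or to the internet gateway (public subnet).
    default_routes = [
        r for r in routes if r.get("DestinationCidrBlock") == "0.0.0.0/0"
    ]
    if not default_routes:
        problems.append("no default route (0.0.0.0/0)")
    elif subnet_kind == "private":
        route = default_routes[0]
        if not (route.get("NatGatewayId") or route.get("InstanceId")):
            problems.append(
                "default route does not target a NAT gateway or NAT instance")
    else:
        gateway_id = default_routes[0].get("GatewayId") or ""
        if not gateway_id.startswith("igw-"):
            problems.append("default route does not target an internet gateway")

    # Gateway VPC endpoints appear as routes whose target starts with "vpce-".
    # This does not prove the endpoint is for S3; it only shows one exists.
    has_endpoint = any(
        (r.get("GatewayId") or "").startswith("vpce-") for r in routes
    )
    if not has_endpoint:
        problems.append("no gateway VPC endpoint route found")

    return problems
```

An empty list means the table satisfies both checks; otherwise each string names one missing requirement.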

Creating a Security Group in the VPC

You should create a Security Group for the bastion host in the VPC where you want to launch the cluster, and set the following inbound and outbound rules:

Inbound

  • Refer to Tunnelling with Bastion Nodes for Private Subnets in an AWS VPC to get the Qubole tunnel server’s IP address. Allow inbound SSH access (on port 22) from the Qubole tunnel server. You can open the SSH port of the bastion node to the world or to only a few IP addresses by contacting Qubole Support.

    Note

    It is highly recommended to use a tunnel and not open SSH to the world.

    Create a ticket with Qubole Support to enable the Qubole tunnel server settings on the cluster. Once tunnelling is enabled on the cluster, it is used automatically for data import/export, running commands, and so on, as before.

    For more information on the ports that allow inbound traffic, see Understanding Cluster Network Security Characteristics (AWS).

  • Allow port 7000 access to the private subnet CIDR.

Outbound

  • Allow outbound traffic on all ports to everywhere. Allowing outbound traffic only to the private subnet CIDR is also sufficient.
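The inbound rules above can be expressed as data before they are applied. The sketch below builds a list in the `IpPermissions` format that the EC2 AuthorizeSecurityGroupIngress API accepts. The function name is hypothetical, and the tunnel server addresses must come from the Tunnelling with Bastion Nodes topic; the addresses in the usage note are placeholders only.

```python
import ipaddress


def bastion_ingress_rules(tunnel_server_cidrs, private_subnet_cidr):
    """Build inbound rules for the bastion host security group.

    Returns a list in the IpPermissions format used by the EC2
    AuthorizeSecurityGroupIngress API. The function name is illustrative.
    """
    def tcp_rule(port, cidrs, description):
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # ip_network() raises ValueError for a malformed CIDR or one
            # with host bits set, so typos are caught before any API call.
            "IpRanges": [
                {"CidrIp": str(ipaddress.ip_network(cidr)),
                 "Description": description}
                for cidr in cidrs
            ],
        }

    return [
        # SSH (port 22) from the Qubole tunnel server only.
        tcp_rule(22, tunnel_server_cidrs, "SSH from Qubole tunnel server"),
        # Port 7000 from the private subnet CIDR.
        tcp_rule(7000, [private_subnet_cidr], "Port 7000 from private subnet"),
    ]
```

For example, `bastion_ingress_rules(["203.0.113.10/32"], "10.0.1.0/24")` yields two rules, which could then be passed as `IpPermissions` to boto3's `authorize_security_group_ingress` together with the security group's `GroupId`.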

Bringing up the Bastion Host in the Public Subnet

Qubole uses the bastion host for all forward/reverse communication from the cluster. You should bring up the bastion host in the public subnet of the VPC.

Adding a Unique SSH Key while bringing up the Bastion Host

Note

This is a mandatory step if the QDS account is on https://us.qubole.com or https://in.qubole.com.

Clusters support the account-level unique SSH key feature, which enables Qubole to SSH into the bastion host on subsequent logins using the unique public SSH key.

Important

The account-level unique SSH key feature is enabled by default on https://us.qubole.com and https://in.qubole.com. If your QDS account is on any other QDS environment, create a ticket with Qubole Support to get it enabled.

View the Account-level Public SSH Key describes the API to view the account-level public SSH key.

Refresh the Account-level SSH Key Pair describes the API to refresh/rotate the SSH key pair. You must add this public SSH key to the bastion host by appending it to ~/.ssh/authorized_keys for the ec2-user. Whenever the public SSH key is rotated, ensure that you replace the old public SSH key on the bastion host with the rotated one.
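Appending the key, and replacing it after a rotation, can be sketched as a small text transformation. The function below is hypothetical and works on the contents of an authorized_keys file rather than on the file itself, so it can be tried without touching the bastion host.

```python
def upsert_authorized_key(authorized_keys_text, new_key, old_key=None):
    """Return authorized_keys content that contains `new_key`.

    If `old_key` is given (for example, after a key rotation), lines that
    match it are removed. All other lines are kept unchanged. The function
    name is illustrative.
    """
    new_key = new_key.strip()
    lines = authorized_keys_text.splitlines()

    # On rotation, drop the previous account-level public key.
    if old_key is not None:
        old_key = old_key.strip()
        lines = [line for line in lines if line.strip() != old_key]

    # Append the key only if it is not already present.
    if new_key not in (line.strip() for line in lines):
        lines.append(new_key)

    return "\n".join(lines) + "\n"
```

On the bastion host, the input would be the contents of ~/.ssh/authorized_keys for the ec2-user, and the result would be written back to the same file. The key strings come from the account API described above.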

Bringing up a bastion host involves starting an EC2 instance using either the Qubole-provided AMI or another AMI:

Bring up the Bastion Host using the Qubole-provided AMI

Use the image in the Amazon community labelled qubole-bastion-hvm-amzn-linux (a hardware virtual machine (HVM) image) available in all AWS regions. You can use the HVM image to bring up instances from older generations of instance families (m1 and m2 instance families).

The image contains a Qubole Public Key, which you can use to start an EC2 instance for bringing up a bastion host in the public subnet while creating a cluster.

Bring up the Bastion Host using other AMIs

If you are not using Qubole’s AMI to bring up the bastion node, perform the following steps:

  1. Edit the SSH configuration in /etc/ssh/sshd_config and set GatewayPorts to yes.
  2. After editing the SSH configuration file, restart the SSH service by running sudo /etc/init.d/sshd restart.
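The edit in step 1 can be sketched as a text transformation on the contents of sshd_config. The function name is hypothetical. It replaces an existing or commented-out GatewayPorts line and otherwise appends one; if the file contains Match blocks, an appended line would fall inside the last block, so check the result by hand in that case.

```python
def set_gateway_ports(sshd_config_text, value="yes"):
    """Return sshd_config content with GatewayPorts set to `value`.

    An existing directive, including a commented-out one such as
    "#GatewayPorts no", is replaced in place. If none exists, the
    directive is appended. The function name is illustrative.
    """
    result = []
    replaced = False
    for line in sshd_config_text.splitlines():
        # Strip a leading "#" so commented-out directives are matched too.
        parts = line.strip().lstrip("#").split()
        # sshd keywords are case-insensitive; require exactly
        # "keyword value" so prose comments are left alone.
        if len(parts) == 2 and parts[0].lower() == "gatewayports":
            if not replaced:
                result.append("GatewayPorts " + value)
                replaced = True
            continue  # drop any duplicate directive
        result.append(line)

    if not replaced:
        result.append("GatewayPorts " + value)

    return "\n".join(result) + "\n"
```

After writing the result back to /etc/ssh/sshd_config, restart the SSH service as described in step 2.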

Adding a Custom SSH Key to the Bastion Node

If you want to log in to the cluster, you must add a custom SSH key to the bastion node (brought up using either the Qubole-provided AMI or another AMI).

Follow these steps to log into the bastion node through SSH:

  1. Use the SSH key pair that was used to launch the cluster instance to log into the bastion node.
  2. Ensure that the machine from which you are using SSH has access to the bastion node. Whitelisting the machine’s IP address in the bastion node’s security group gives the machine access to the bastion node. If there is no security group, assign one to the bastion node as described in Assigning the Security Group to the Bastion Host.
  3. After logging into the bastion node, add the custom SSH key. This allows you to log into the bastion node and thus log in to the cluster.

For more information on how to log into clusters in Amazon EC2 and VPC, see:

Assigning the Security Group to the Bastion Host

Assign the Bastion Security Group to the bastion host.

The following figure illustrates a VPC with private and public subnets.

[Figure: VPC-PrivateSubnet.png]

Creating a Cluster in the VPC

After creating a VPC, create a cluster within that VPC by performing the following additional step:

  1. Create a cluster using the vpc-id of the VPC created above and the ID of a private subnet within that VPC. The QDS UI supports specifying a private subnet ID for the corresponding VPC. See Modifying EC2 Settings (AWS) for more information. The QDS UI also allows you to add a bastion host DNS.

Additional Information