Amazon Linux 2 AMI

Introduction

Amazon Linux 2 (AL2) is a Linux operating system from Amazon Web Services (AWS). It is designed to provide a stable, secure, and high-performance environment for running cloud applications.

AL2 is the successor to AL1, with many enhancements that optimize its environment for running virtual machines. It includes the latest Linux kernel and kernel security updates, as well as Long Term Support (LTS) until June 30, 2025.

You can read more about AL2 here.

Prerequisites to launch AL2 clusters

Running the cluster and setting up the QDS environment are unchanged. However, two additional items must be configured before you create clusters with AL2 AMIs (Amazon Machine Images):

  • Bastion Node - AL2 clusters can only be launched in private subnets. Create a bastion instance in the public subnet of the same VPC as the cluster, and make sure it has network connectivity to the private subnet.

  • Custom Hive Metastore - A custom Hive metastore is mandatory for AL2 clusters. Configure one before running an AL2 cluster, and ensure that it has connectivity to the cluster master node.

Note

You can find the steps to set up the Bastion Node and Custom Hive Metastore here:

Configuring a Cluster in a VPC with Public and Private Subnets (AWS)

Configuring a Private Subnet

Creating a Custom Hive Metastore
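
Both prerequisites come down to network reachability, so a quick TCP check can help before you launch a cluster. The following is a minimal sketch, not a Qubole tool: the function names, host names, and ports in it are assumptions made for illustration. It only confirms that a TCP connection can be opened from the machine where it runs, so run it from a host whose network position matches what you want to verify (for example, a host in the private subnet when checking the metastore).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Check several named endpoints and return a {name: bool} mapping.

    `endpoints` maps a descriptive name to a (host, port) tuple.
    """
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

For example, `check_endpoints({"metastore": ("metastore.example.internal", 3306), "bastion": ("bastion.example.com", 22)})` would report whether each endpoint accepts connections. Both host names are placeholders, and the ports assume a MySQL-backed metastore and SSH on the bastion; substitute the values for your own environment.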

Terminology

In QDS environments, Generation 1 means the AL1 AMI and Generation 2 means the AL2 AMI. Release means the QDS environment release version, which is set to latest by default.

Creating and launching a cluster with AL2 AMI

To create a new cluster with the AL2 AMI, open Create New Cluster and, in the first step (Type), click the Customize drop-down on the desired engine. Select ‘2’ under Generation and ‘latest’ under Release.

../../_images/ClusterAL2.png

When you reach the fourth step (Advanced Configuration), select the VPC and then select the private subnet.

In Bastion Node, enter the public IP address of your bastion EC2 instance.

Set any other parameters according to your use case and click Create.

Launch the cluster and verify that it starts successfully.

Note

It is recommended that you update your Node Bootstrap file so that it uses only the commands and services supported by AL2.
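
One way to review a Node Bootstrap file is to scan it for things that the differences listed later on this page say are gone in AL2: monit-managed services, the ec2-user account, Python 2.6 and 3.4, and OpenJDK 7. The following is a minimal sketch under stated assumptions. The pattern list and the function name are illustrative, are not provided by Qubole, and are not exhaustive. A match only marks a line for manual review; it does not prove the line will fail on AL2.

```python
import re

# Illustrative patterns derived from the AL1/AL2 differences on this page.
# This list is an assumption for the sketch; extend it for your own scripts.
AL1_PATTERNS = [
    (re.compile(r"\bmonit\b"),
     "Hadoop services are managed by systemd on AL2, not monit"),
    (re.compile(r"\bec2-user\b"),
     "ec2-user is removed on AL2"),
    (re.compile(r"\bpython(2\.6|3\.4)\b"),
     "Python 2.6 and 3.4 are removed on AL2"),
    (re.compile(r"java-1\.7|jdk-?7", re.IGNORECASE),
     "OpenJDK 7 is not present on AL2"),
]


def find_al1_assumptions(script_text):
    """Return (line_number, line, reason) tuples for lines that look AL1-specific."""
    findings = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines, shell comments, and the shebang
        for pattern, reason in AL1_PATTERNS:
            if pattern.search(line):
                findings.append((number, stripped, reason))
    return findings
```

To use it, read the bootstrap script into a string, call `find_al1_assumptions(text)`, and review each reported line by hand.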

Set AL2 AMI as default for an account

Currently, AL1 is the default image whenever you create a new cluster. To set AL2 as the default for an account, follow these steps:

  • Go to Control Panel > Account Settings

  • In the field Default Cluster Machine Image, select ‘2’ and ‘latest’ as shown in the image below:

../../_images/SetAL2Default.png

  • Click Save

The next time you create a new cluster, you don’t need to select between AL1 and AL2. The cluster will use AL2 by default.

Differences between Qubole’s AL1 and AL2 AMIs

Qubole now provides a new AMI based on the Amazon Linux 2 image. The primary reason is that the current Qubole AMI is based on a very old image, which leads to several issues:

  • Security - newer secure versions of some insecure packages are not available for this image.

  • Incompatibility with some required software, like the drivers required for AWS FSx for Lustre.

  • Unavailability of systemd, thus preventing us from exploiting the parallelism added for cluster start in cloudman.

The AL2 AMI is likely to be incompatible with some customer requirements, and is therefore provided only on an opt-in basis to begin with. For that reason, its availability was combined with some long-standing security requirements, specifically around the (mis)use of well-known users such as ec2-user and root. The following table shows what has changed from AL1 to AL2 and what has been added in the AL2 images:

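+----+--------------------------------------+------------------------------------------------+
| #  | AL1 Image Behavior                   | AL2 Image Behavior                             |
+====+======================================+================================================+
| 1  | Qubole managed Hive metastore is     | Qubole managed Hive metastore is not supported |
|    | supported                            |                                                |
+----+--------------------------------------+------------------------------------------------+
| 2  | Optional to spin up Qubole operated  | Customer VPC and bastion for access are        |
|    | compute clusters in customer VPC     | necessary                                      |
+----+--------------------------------------+------------------------------------------------+
| 3  | Supported HMS versions 1.2, 2.3,     | Supported HMS versions 2.3, 3.1.1 (beta)       |
|    | 3.1.1 (beta)                         |                                                |
+----+--------------------------------------+------------------------------------------------+
| 4  | ec2-user and root login is allowed   | 1. root login disabled                         |
|    |                                      | 2. ec2-user removed                            |
|    |                                      | 3. New users qubole-support and qubole-user    |
|    |                                      |    added. These users are members of the       |
|    |                                      |    hadoop and presto groups on the cluster,    |
|    |                                      |    so operations like jstack etc., may be      |
|    |                                      |    performed conveniently.                     |
+----+--------------------------------------+------------------------------------------------+
| 5  | Substantial number of Python packages| Default Python package list is greatly reduced |
|    | are installed by default             |                                                |
+----+--------------------------------------+------------------------------------------------+
| 6  | Hadoop services are managed by monit | Hadoop services are managed by systemd         |
+----+--------------------------------------+------------------------------------------------+
| 7  | Most system executables are present  | Most system executables moved to /usr/bin      |
|    | in /bin                              |                                                |
+----+--------------------------------------+------------------------------------------------+
| 8  | Hive on Master is allowed            | Hive on Master will not work                   |
+----+--------------------------------------+------------------------------------------------+
| 9  | OpenJDK 7 is present and is used for | OpenJDK 7 is not present                       |
|    | Hadoop/Hive by default               |                                                |
+----+--------------------------------------+------------------------------------------------+
| 10 | Python 2.6 and 3.4 are available     | Python 2.6 and 3.4 are removed. Only 2.7 and   |
|    |                                      | 3.6 are present.                               |
+----+--------------------------------------+------------------------------------------------+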

11

Commands not supported - Pig, data