Known Issues

The known issues with this version are:

s3cmd Command Failures
Boto Client Error

s3cmd Command Failures

s3cmd is a tool for managing objects in Amazon S3 storage. As part of R57, the s3cmd package was upgraded from version 1.5 to version 2.0.2.

After upgrading to QDS version R57, you may see s3cmd command failures in QDS commands or Airflow jobs. s3cmd exits with return code 1. If you call s3cmd with the -d option, you also see a 403 Forbidden error message.

This is a known issue.

Current Scenario

During internal testing, Qubole has discovered an issue where cp and mv commands in s3cmd version 2.0.2 fail in a very specific scenario.

The s3cmd cp and mv commands fail with exit code 1 when you try to copy or move an object that you own in another AWS account. This happens even when you have read/write access to the object and the bucket. Note that the object does get copied in this scenario, but s3cmd returns exit code 1, which indicates a failure.

Solution

If you observe s3cmd command failures after upgrading to R57, you can prevent them by reverting to the older version of s3cmd. To do so, add the following lines to the cluster’s node bootstrap file.

pip uninstall -y s3cmd
pip install s3cmd==1.5.2

Note

You must restart the cluster for the s3cmd version change to take effect.
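
After the restart, it can be useful to confirm which s3cmd version a node is actually running. The following is a minimal sketch, not a Qubole-provided tool. It assumes that s3cmd is on the PATH and that `s3cmd --version` prints a line of the form "s3cmd version X.Y.Z". The function names are hypothetical.

```python
import re
import subprocess


def parse_s3cmd_version(output):
    """Extract a (major, minor, patch) tuple from 's3cmd --version' output.

    Assumes the output contains text such as 's3cmd version 2.0.2'.
    A missing patch component is treated as 0.
    """
    match = re.search(r"s3cmd version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognized s3cmd --version output: %r" % output)
    return tuple(int(part or 0) for part in match.groups())


def installed_s3cmd_version():
    """Run 's3cmd --version' on this node and return the parsed version."""
    raw = subprocess.check_output(["s3cmd", "--version"], stderr=subprocess.STDOUT)
    return parse_s3cmd_version(raw.decode("utf-8", "replace"))


def is_downgraded(version):
    """Return True if the version is older than the 2.x line shipped with R57."""
    return version < (2, 0, 0)
```

Calling `is_downgraded(installed_s3cmd_version())` on a cluster node should return True once the bootstrap lines above have been applied and the cluster has been restarted.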

Boto Client Error

Boto is a Python package that provides interfaces to Amazon Web Services. AWS has deprecated use of the V2 signature for new AWS regions created after January 2014. AWS will allow any new S3 bucket created after June 24, 2020 to use only the V4 signature.

In R57, Qubole changed the /etc/boto.cfg file so that S3 client calls through Boto use the V4 signature by default. When Boto uses the V4 signature, the host parameter is required. If you do not provide the host parameter, the following error appears: BotoClientError: When using SigV4, you must specify a 'host' parameter. You may hit this error when connecting to S3 in commands or in notebooks.

(That is, the error is not an inevitable consequence of using V4 signature. It only occurs when the host parameter is not provided.)

Solution

Qubole recommends that you include the host=s3.amazonaws.com parameter in Boto S3 connect calls. Until you add the host parameter, you can prevent the Boto client error by running the following command through the node bootstrap.

Note

You must restart the cluster for the following command to take effect.

rm /etc/boto.cfg

Removing the boto.cfg file causes the client to use the V2 signature. This may cause failures if the cluster software must communicate with S3 buckets that support only the V4 signature.
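
The recommended fix, passing the host parameter in Boto S3 connect calls, might look like the sketch below. It assumes the legacy boto 2.x package (not boto3) and credentials supplied through Boto's usual mechanisms. The helper name `connect_s3_with_host` and its `connect` argument are hypothetical and exist only to keep the example self-contained. Buckets outside the default region may need a region-specific endpoint instead of s3.amazonaws.com; treat that as an assumption to verify for your setup.

```python
S3_HOST = "s3.amazonaws.com"  # endpoint named in the recommendation above


def connect_s3_with_host(host=S3_HOST, connect=None):
    """Open a Boto S3 connection with the host parameter set explicitly.

    `connect` defaults to boto.connect_s3; it is a parameter only so that
    the helper can be exercised without boto installed (an illustration
    choice, not part of Boto's API).
    """
    if not host:
        raise ValueError("host must be a non-empty S3 endpoint")
    if connect is None:
        import boto  # legacy boto 2.x package, imported lazily

        connect = boto.connect_s3
    # Passing host avoids the "must specify a 'host' parameter" error
    # that is raised when SigV4 is enabled and no host is given.
    return connect(host=host)


# Example use on a cluster node or in a notebook (bucket name is hypothetical):
#   conn = connect_s3_with_host()
#   bucket = conn.get_bucket("my-example-bucket")
```

In existing code, the equivalent change is usually just adding `host="s3.amazonaws.com"` to the current `boto.connect_s3(...)` call.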