Known Issues
The known issues with this version are:
s3cmd Command Failures
s3cmd is a tool for managing objects in Amazon S3 storage. As part of R57, the s3cmd package has been upgraded from version 1.5 to version 2.0.2.
After upgrading to QDS version R57, you may observe s3cmd command failures in QDS commands or Airflow jobs. s3cmd fails with exit code 1. If you call s3cmd with the -d option, you also see the 403 Forbidden error message.
This is a known issue.
Current Scenario
During internal testing, Qubole discovered an issue where the cp and mv commands in s3cmd version 2.0.2 fail in a very specific scenario.
The cp and mv commands fail with exit code 1 when you try to copy or move an object that you own in another AWS account. This happens even when you have read/write access to the object or bucket. Note that the objects do get copied in this scenario, but the exit code returned is 1, which indicates a failure.
Solution
If you observe s3cmd command failures after upgrading to R57, you can prevent them by reverting to the older version of s3cmd. To do so, add the following lines to the cluster's node bootstrap file.
Cluster Restart Required
pip uninstall s3cmd
pip install s3cmd==1.5.2
Note
You must restart the cluster for the s3cmd version change to take effect.
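After the cluster restarts, it can be useful to confirm which s3cmd version a node is actually running. The following is a minimal sketch, not a Qubole-provided tool: it assumes that s3cmd is on the PATH and that `s3cmd --version` prints a line of the form `s3cmd version X.Y.Z`. The function names are hypothetical.

```python
import re
import subprocess


def parse_s3cmd_version(output):
    """Extract (major, minor, patch) from text such as 's3cmd version 1.5.2'."""
    match = re.search(r"s3cmd version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognized s3cmd version output: %r" % output)
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def installed_s3cmd_version():
    """Ask the s3cmd found on PATH for its version (assumes s3cmd is installed)."""
    output = subprocess.check_output(["s3cmd", "--version"])
    return parse_s3cmd_version(output.decode("utf-8", "replace"))


def is_known_affected(version):
    """True only for 2.0.2, the version this page reports as affected."""
    return version == (2, 0, 2)
```

Calling `is_known_affected(installed_s3cmd_version())` on a node should return False once the downgrade from the bootstrap file has been applied.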
Boto Client Error
Boto is a Python package that provides interfaces to Amazon Web Services. AWS has deprecated the V2 signature for AWS regions created after January 2014. In addition, any new S3 bucket created after June 24, 2020 will support only the V4 signature.
Qubole has made changes in R57 so that S3 client calls through Boto use the V4 signature by default. This default is set in the /etc/boto.cfg file. When Boto uses the V4 signature, the host parameter is required. If you have not provided the host parameter, the following error appears:
BotoClientError: When using SigV4, you must specify a 'host' parameter
As a result, you may hit this error when connecting to S3 in commands or in notebooks.
(That is, the error is not an inevitable consequence of using the V4 signature. It occurs only when the host parameter is not provided.)
Solution
Qubole recommends that you include the host=s3.amazonaws.com parameter in Boto S3 connect calls. Until you add the host parameter, you can prevent the Boto client error by running the following command through the node bootstrap.
Note
You must restart the cluster for the following command to be effective.
Cluster Restart Required
rm /etc/boto.cfg
Removing the boto.cfg file results in the client using the V2 signature. This may cause failures if the cluster software must communicate with S3 buckets that support only the V4 signature.
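The recommended fix, passing the host parameter in Boto S3 connect calls, might look like the sketch below. It assumes the boto 2.x package, whose `boto.connect_s3` forwards keyword arguments such as `host` to the S3 connection, and it assumes credentials are resolved the way they already are on the cluster. The helper names are hypothetical.

```python
# Endpoint recommended on this page for Boto S3 connect calls.
S3_HOST = "s3.amazonaws.com"


def s3_connect_kwargs(extra=None):
    """Build keyword arguments for a Boto S3 connect call, always setting host."""
    kwargs = {"host": S3_HOST}
    # Caller-supplied options (hypothetical) may add to or override the defaults.
    kwargs.update(extra or {})
    return kwargs


def connect_to_s3(extra=None):
    """Open an S3 connection with an explicit host (assumes boto 2.x is installed)."""
    import boto  # imported lazily so the helper above works without boto

    return boto.connect_s3(**s3_connect_kwargs(extra))
```

With an explicit host, the connect call no longer depends on removing /etc/boto.cfg, so the client can keep using the V4 signature.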