Single File Result Download
Qubole command results larger than 20 MB are split into multiple result files (see About the Result File Size Limit). The Single file result download feature stitches large results into a single result file.
When the AWS Multipart Upload limit is insufficient to complete the job, Qubole downloads the set of files to your cluster, stitches the results, and uploads a single result file to your defloc bucket (configured through Account Settings) along with the other result and log files.
Qubole provides you with a link to download the result file. This link is valid for 24 hours; after it expires, downloads fail with an error. To generate a fresh download link automatically, open the command in a new window.
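A script that caches the download link may want to decide when to request a fresh one. The following is a minimal sketch, not part of any Qubole API: the helper name `link_is_expired` and the idea of recording the generation time on the client side are assumptions made for illustration. Only the 24-hour validity period comes from the text above.

```python
from datetime import datetime, timedelta, timezone

# Validity period stated in the documentation above.
LINK_VALIDITY = timedelta(hours=24)


def link_is_expired(generated_at, now=None):
    """Return True if a link generated at `generated_at` is no longer valid.

    Hypothetical helper: the caller is assumed to record when the link
    was generated, as a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - generated_at >= LINK_VALIDITY


generated = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(link_is_expired(generated, generated + timedelta(hours=23)))  # False
print(link_is_expired(generated, generated + timedelta(hours=25)))  # True
```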
Qubole recommends enabling EBS upscaling so that the cluster does not run out of local disk space while stitching results that are larger than the available disk space.
Files are merged only once. After the merged file is available, Qubole stores a reference to it, and result stitching is not triggered a second time.
- Contact Qubole Support to enable this feature for your account.
- Attach the advised cluster label, maintenance_single_file_result, to an existing or new cluster.
Use the exact cluster label shown above. The value is case-sensitive.
- Update your AWS policies based on this sample policy.
For more information, see What are some examples of policies I should use to delegate access to Qubole for my Cloud accounts?
If you are using a Dual IAM role, update the Secondary/Dual IAM role.
|Permission||Mandatory||Description|
|s3:ListMultipartUploadParts||Yes||Lists the parts of the multipart upload so that the process can be completed.|
|s3:AbortMultipartUpload||No||Deletes intermediate files generated by the result stitching job, which saves storage costs. Result stitching works without this permission, but Qubole recommends that you grant it.|
- Qubole recommends the following minimum cluster configuration:
|Master Node Type||c3.2xlarge|
|Worker Node Type||c3.large|
|Minimum Worker Nodes||1|
|Maximum Worker Nodes||1|
|EBS Volume Count||1|
|Enable EBS Upscaling||true|
|Maximum EBS Volume Count||3|
|Free Space Threshold (%)||35|
|Absolute Free Space Threshold (in GB)||Set as 35% of EBS volume size|
|Sampling Interval (in seconds)||10|
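The absolute free space threshold in the table above is defined relative to the EBS volume size, so its value depends on the volume you choose. As a rough sketch of that arithmetic (the function name and the example volume sizes are hypothetical; only the 35% figure comes from the table):

```python
def absolute_free_space_threshold_gb(ebs_volume_size_gb, threshold_percent=35):
    """Return the absolute free space threshold in GB.

    Uses the recommendation above: 35% of the EBS volume size by default.
    """
    return ebs_volume_size_gb * threshold_percent / 100


# Hypothetical volume sizes, chosen only to illustrate the calculation.
print(absolute_free_space_threshold_gb(100))  # 35.0
print(absolute_free_space_threshold_gb(500))  # 175.0
```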
What can I do if I get an Access Denied error?
Add s3:ListBucket permissions to your role, or include the AccessKey/SecretKey that you have configured with Qubole.
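To show how the permissions named on this page might fit together, the sketch below builds an IAM policy document as a Python dictionary. It is not Qubole's sample policy and is probably incomplete for a real account, since it covers only the three actions mentioned in this section. The function name and the bucket name `example-defloc-bucket` are placeholders. Note that `s3:ListBucket` applies to the bucket itself, while the multipart actions apply to objects in the bucket.

```python
import json


def build_result_stitching_policy(bucket):
    """Build an IAM policy document (as a dict) for the given bucket name.

    Illustrative only: includes just the actions named in this section.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # The multipart actions are object-level, so they target object ARNs.
                "Effect": "Allow",
                "Action": [
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Placeholder bucket name; substitute your own defloc bucket.
policy = build_result_stitching_policy("example-defloc-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be compared against your existing role policy to check whether any of these actions is missing.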