8. Will HDFS be affected by cluster auto-scaling?

No. Qubole always removes nodes from HDFS gracefully before terminating them. This allows HDFS to safely replicate their data to the surviving nodes of the cluster, so data stored in HDFS is not lost. See Downscaling.
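
The idea behind graceful removal can be illustrated with a small toy model. This is only a conceptual sketch, not Qubole's or HDFS's actual implementation: the cluster is modeled as a plain dictionary, and the function name `decommission`, the helper `all_blocks`, the node and block names, and the replication factor of 2 are all assumptions made for illustration.

```python
from typing import Dict, Set


def all_blocks(cluster: Dict[str, Set[str]]) -> Set[str]:
    """Return every distinct block stored anywhere in the cluster."""
    return set().union(*cluster.values()) if cluster else set()


def decommission(cluster: Dict[str, Set[str]], node: str, replication: int = 2) -> None:
    """Toy model: re-replicate a node's blocks on survivors, then remove the node."""
    survivors = [n for n in cluster if n != node]
    for block in sorted(cluster[node]):
        # Surviving nodes that already hold a replica of this block.
        holders = [n for n in survivors if block in cluster[n]]
        # Surviving nodes that could receive a new replica, least loaded first.
        candidates = sorted(
            (n for n in survivors if block not in cluster[n]),
            key=lambda n: (len(cluster[n]), n),
        )
        # Number of extra copies needed to restore the target replica count.
        needed = min(replication, len(survivors)) - len(holders)
        for target in candidates[: max(needed, 0)]:
            cluster[target].add(block)
    # Only now is the node dropped from the cluster.
    del cluster[node]


# Hypothetical three-node cluster with each block stored twice.
cluster = {
    "node-a": {"b1", "b2"},
    "node-b": {"b1", "b3"},
    "node-c": {"b2", "b3"},
}
before = all_blocks(cluster)
decommission(cluster, "node-c")
assert all_blocks(cluster) == before  # no block was lost
```

In this model the node is deleted only after every block it held has enough copies elsewhere, which is why no data is lost when the cluster shrinks.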

However, HDFS lasts only for the lifetime of a cluster, and its contents are lost when the cluster is terminated. Hence, we recommend using HDFS only as a temporary data store for intermediate data output by jobs and queries (for example, MapReduce shuffle data).