Setting up a Data Store (AWS)
Airflow uses a data store to track the status of jobs, tasks, and other related information. QDS provisions Airflow clusters with a default, cluster-local data store for this purpose. This data store lasts only for the lifetime of the cluster.
For Airflow clusters running on AWS, Qubole recommends that you also configure a persistent data store outside the cluster; this simplifies Airflow upgrades and safeguards DAG metadata against cluster failures. To do this, proceed as follows.
Note:

- Configuring an external, persistent data store for an Airflow cluster is currently supported only on AWS.
- QDS Airflow clusters currently support MySQL, Amazon Aurora-MySQL, and Postgres data stores.
1. Create a MySQL, Amazon Aurora-MySQL, or Postgres database in your Cloud account; you may want to name the database airflow for ease of identification.
2. Use the Explore page in the QDS UI to add the database you created as a data store.
3. Edit your Airflow cluster (from the Clusters section of the UI): under the Configuration tab, select your airflow database from the drop-down in the Data Store field, then select Update to save the change.
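Under the hood, a data store like the one configured above is typically described to Airflow as a SQLAlchemy-style connection URL (the value of Airflow's sql_alchemy_conn setting). As a minimal sketch, with a hypothetical RDS endpoint, user name, and password standing in for your own values:

```python
from urllib.parse import quote_plus

def airflow_conn_url(scheme, user, password, host, port, database):
    """Build a SQLAlchemy-style connection URL of the kind Airflow
    uses for its metadata database. The password is URL-escaped so
    characters such as '/' or '@' do not break the URL."""
    return f"{scheme}://{user}:{quote_plus(password)}@{host}:{port}/{database}"

# Hypothetical endpoint and credentials -- substitute your own values.
url = airflow_conn_url(
    "mysql",
    "airflow_user",
    "s3cret/pw",
    "mydb.example.us-east-1.rds.amazonaws.com",
    3306,
    "airflow",  # the database name suggested in step 1
)
print(url)
# mysql://airflow_user:s3cret%2Fpw@mydb.example.us-east-1.rds.amazonaws.com:3306/airflow
```

For a Postgres data store the scheme would be postgresql instead of mysql; the rest of the URL has the same shape.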