Setting up a Data Store (AWS)
Airflow uses a data store to track the status of jobs, tasks, and other related information. QDS provisions Airflow clusters with a default, cluster-local data store for this purpose. This data store lasts only for the lifetime of the cluster.
For Airflow clusters running on AWS, Qubole recommends you also configure a persistent data store outside the cluster, to simplify the Airflow upgrade process and safeguard DAG metadata from cluster failures. To do this, proceed as follows.
- Configuring an external, persistent data store for your Airflow cluster is currently supported only on AWS.
- QDS Airflow clusters support MySQL, Amazon Aurora-MySQL, and Postgres data stores at present.
Create a MySQL, Amazon Aurora-MySQL, or Postgres database in your Cloud account; you may want to name the database airflow for ease of identification.
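If you create the database from a SQL client rather than from your cloud console, the statement itself is short. The following Python sketch is only an illustration: the helper name `create_database_sql` and the engine labels are hypothetical and not part of QDS, and it assumes you run the resulting statement yourself with a client connected to your database server.

```python
import re

# Hypothetical engine labels for this sketch; they are not QDS identifiers.
MYSQL_ENGINES = {"mysql", "aurora-mysql"}
POSTGRES_ENGINES = {"postgres"}


def create_database_sql(engine: str, name: str = "airflow") -> str:
    """Return a CREATE DATABASE statement for the given engine."""
    # Accept only simple names so the identifier can be quoted safely.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsupported database name: {name!r}")
    engine = engine.lower()
    if engine in MYSQL_ENGINES:
        # MySQL and Aurora-MySQL quote identifiers with backticks.
        return f"CREATE DATABASE `{name}`;"
    if engine in POSTGRES_ENGINES:
        # Postgres quotes identifiers with double quotes.
        return f'CREATE DATABASE "{name}";'
    raise ValueError(f"unsupported engine: {engine!r}")


print(create_database_sql("postgres"))
print(create_database_sql("mysql"))
```

The default name `airflow` follows the naming suggestion above; pass a different `name` if your account already has a database with that name.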
Qubole recommends using Postgres as the data store. MySQL has the following limitations:
- MySQL date-time values are not timezone-aware.
- You can run Airflow on MySQL only if Strict Mode is disabled. If Strict Mode is enabled, all insertions related to date and time fail.
- Under the mysqld section in your my.cnf file, you need to set the value of 1 for Airflow to work. In the case of any value other than 1, it displays an error and fails.
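To check the Strict Mode limitation above, you can inspect the server's `sql_mode` value (for example, the result of `SELECT @@sql_mode`). MySQL treats Strict Mode as enabled when `sql_mode` contains `STRICT_TRANS_TABLES` or `STRICT_ALL_TABLES`; the combination mode `TRADITIONAL` also includes both. The sketch below works on the mode string only. The function names are hypothetical, and fetching or applying the value is left to your own client or server configuration.

```python
# Modes that switch on (or imply) MySQL Strict Mode.
STRICT_FLAGS = {"STRICT_TRANS_TABLES", "STRICT_ALL_TABLES", "TRADITIONAL"}


def _split_modes(sql_mode: str) -> list[str]:
    """Split a comma-separated sql_mode string into upper-cased flags."""
    return [flag.strip().upper() for flag in sql_mode.split(",") if flag.strip()]


def strict_mode_enabled(sql_mode: str) -> bool:
    """Return True if the sql_mode string enables Strict Mode."""
    return any(flag in STRICT_FLAGS for flag in _split_modes(sql_mode))


def without_strict_mode(sql_mode: str) -> str:
    """Return the sql_mode string with the strict flags removed."""
    return ",".join(f for f in _split_modes(sql_mode) if f not in STRICT_FLAGS)


current = "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"
print(strict_mode_enabled(current))
print(without_strict_mode(current))
```

If `strict_mode_enabled` returns `True` for your server, the string from `without_strict_mode` is one candidate value to set as `sql_mode`; review the remaining flags before applying it.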
Use the Explore page in the QDS UI to add the data store you have created.
Edit your Airflow cluster (from the Clusters section of the UI), and select your airflow database from the drop-down in the Data Store field under the Configuration tab. Select Update to save the change.