Setting up an Airflow Cluster and an ETL in Airflow¶
Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows, and it integrates with third-party platforms. In Airflow, you author workflows as directed acyclic graphs (DAGs) of tasks. It ships with a rich feature set that is essential for ETL workloads; its user interface and command-line utilities make it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues as required. See Introduction to Airflow in Qubole for more information.
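For illustration, a minimal DAG might look like the following. This is a generic Airflow 1.x sketch (the DAG id, schedule, and tasks are placeholders), not Qubole-specific code:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Placeholder DAG: two no-op tasks wired into a simple dependency chain.
default_args = {
    'owner': 'airflow',
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'example_etl_dag',                 # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

extract = DummyOperator(task_id='extract', dag=dag)
load = DummyOperator(task_id='load', dag=dag)

extract >> load   # 'load' runs only after 'extract' succeeds
```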
Creating a Data Store¶
Airflow uses a data store to maintain the status of jobs and tasks and other related information. Qubole recommends using a persistent data store so that job history is preserved and can be recovered. See Setting up a Data Store (AWS) for more information. See Understanding a Data Store for how to create a data store through the QDS UI, and Create a DbTap for how to create one through a REST API call.
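As a hedged sketch, creating a data store through the REST API might look like the following. The endpoint and field names follow the Create a DbTap API; the host, credentials, and region shown here are placeholders, so verify the payload against the API documentation before use:

```python
import requests

# Sketch of creating a data store (DbTap) via the QDS REST API.
API_TOKEN = '<your QDS API token>'          # placeholder
BASE_URL = 'https://api.qubole.com/api/v1.2'

payload = {
    'db_name': 'airflow',                   # database that backs the Airflow metastore
    'db_host': 'mydb.example.com',          # hypothetical host
    'db_user': 'airflow_user',
    'db_passwd': '<password>',
    'port': 3306,
    'db_type': 'mysql',
    'db_location': 'us-east-1',
}

resp = requests.post(
    f'{BASE_URL}/db_taps',
    json=payload,
    headers={'X-AUTH-TOKEN': API_TOKEN,
             'Content-Type': 'application/json',
             'Accept': 'application/json'},
)
resp.raise_for_status()
print(resp.json())   # the response includes the id of the new DbTap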
Creating an Airflow Cluster¶
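You can create the cluster through the QDS UI or the Cluster API. The following is a rough, illustrative sketch using the Cluster API (v2); all field names and values here are assumptions that vary by QDS version and cloud, so confirm the payload against the Cluster API documentation for your account:

```python
import requests

# Illustrative sketch of creating an Airflow cluster through the QDS
# Cluster API (v2). Field names and values are assumptions; confirm
# them against the Cluster API documentation before use.
API_TOKEN = '<your QDS API token>'

payload = {
    'cluster_info': {'label': ['airflow-cluster']},
    'engine_config': {
        'flavour': 'airflow',
        'airflow_settings': {
            'dbtap_id': 1234,          # id of the data store created above (hypothetical)
            'fernet_key': '<fernet key>',
        },
    },
}

resp = requests.post(
    'https://api.qubole.com/api/v2/clusters',
    json=payload,
    headers={'X-AUTH-TOKEN': API_TOKEN,
             'Content-Type': 'application/json',
             'Accept': 'application/json'},
)
resp.raise_for_status()
print(resp.json())
```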
Registering a DAG within Airflow¶
You can submit Airflow commands through a QDS Shell command on an Airflow cluster; any command available in the Airflow CLI can be run this way. See Qubole Operator DAG to author a DAG using the Qubole Operator.
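A hedged sketch of a DAG built with the Qubole Operator follows (Airflow 1.x import path; the task ids, query, script, connection id, and cluster label are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.qubole_operator import QuboleOperator

dag = DAG(
    'qubole_operator_example',           # hypothetical DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

# Run a Hive command on QDS.
hive_task = QuboleOperator(
    task_id='show_tables',
    command_type='hivecmd',
    query='show tables',
    qubole_conn_id='qubole_default',     # Airflow connection holding the QDS API token
    dag=dag,
)

# Run a shell command on a cluster; any Airflow CLI command can be
# submitted this way.
shell_task = QuboleOperator(
    task_id='list_airflow_dags',
    command_type='shellcmd',
    script='airflow list_dags',
    cluster_label='airflow-cluster',     # hypothetical cluster label
    dag=dag,
)

hive_task >> shell_task
```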
See Registering a DAG on an Airflow Cluster for more information.