Setting up an Airflow Cluster and an ETL in Airflow

Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows, and it integrates with many third-party platforms. You author workflows as directed acyclic graphs (DAGs) of tasks. Airflow ships with a rich feature set that is essential for ETL work, and its user interface and command-line utilities make it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues as required. See Introduction to Airflow in Qubole for more information.
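
As a minimal sketch of what authoring a DAG looks like, the example below chains three tasks with Airflow 1.x-style imports; the DAG id, schedule, and bash commands are illustrative placeholders, not part of the Qubole documentation.

```python
# Minimal sketch of an Airflow DAG (Airflow 1.x import paths assumed).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="example_etl",              # illustrative DAG name
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
)

# Three placeholder tasks standing in for the extract/transform/load steps.
extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
transform = BashOperator(task_id="transform", bash_command="echo transform", dag=dag)
load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

# Task ordering: extract runs first, then transform, then load.
extract >> transform >> load
```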

Creating a Data Store

Airflow uses a data store to maintain the status of jobs/tasks and other related information. Qubole recommends using a persistent data store so that this history is archived and available for recovery. See Setting up a Data Store (AWS) for more information. See Understanding a Data Store for how to create a data store through the QDS UI, and Create a DbTap for how to create one through a REST API call.
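
As a rough illustration of the REST approach, a DbTap for a MySQL data store could be created along the following lines; the endpoint path, payload field names, and host values shown here are assumptions and should be verified against the Create a DbTap documentation for your QDS environment.

```python
# Hypothetical sketch of creating a DbTap (data store) via the QDS REST API.
# The endpoint and payload fields below are assumptions; confirm them against
# the Create a DbTap documentation before use.
import requests

QDS_API_TOKEN = "<your QDS API token>"                     # placeholder
QDS_ENDPOINT = "https://api.qubole.com/api/v1.2/db_taps"   # assumed endpoint

payload = {
    "db_type": "mysql",                            # type of the backing database
    "db_host": "airflow-metastore.example.com",    # placeholder host
    "port": 3306,
    "db_name": "airflow",
    "db_user": "airflow_user",
    "db_passwd": "<password>",
}

response = requests.post(
    QDS_ENDPOINT,
    json=payload,
    headers={"X-AUTH-TOKEN": QDS_API_TOKEN, "Content-Type": "application/json"},
)
response.raise_for_status()
print(response.json())  # the returned data store id can be referenced when configuring the cluster
```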

Creating an Airflow Cluster

See Configuring an Airflow Cluster to create a cluster using the QDS UI and Create a New Airflow Cluster to create a new cluster using a REST API call.

Registering a DAG within Airflow

You can submit Airflow commands to an Airflow cluster through a QDS Shell command; any command available in the Airflow CLI can be run this way. See Qubole Operator DAG to author a DAG using the Qubole Operator; a sketch of such a DAG follows.
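
As a sketch, a DAG that runs a Hive query through QDS with the QuboleOperator might look like the following (Airflow 1.x contrib import path; the query, cluster label, and connection id are placeholders).

```python
# Hedged sketch of a DAG using the QuboleOperator to run a Hive command on QDS.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.qubole_operator import QuboleOperator

dag = DAG(
    dag_id="qubole_operator_example",   # illustrative DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
)

hive_task = QuboleOperator(
    task_id="hive_show_tables",
    command_type="hivecmd",             # run a Hive command through QDS
    query="SHOW TABLES;",               # illustrative query
    cluster_label="default",            # QDS cluster label to run on
    qubole_conn_id="qubole_default",    # Airflow connection holding the QDS API token
    dag=dag,
)
```

Once the DAG file is placed in the cluster's DAGs folder, the Airflow scheduler picks it up and the DAG appears in the Airflow UI.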

See Registering a DAG on an Airflow Cluster for more information.