
Apache Airflow is a tool for describing, executing, and monitoring workflows. Here are some core concepts you need to know to become productive in Airflow.

In Airflow, a DAG is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. Airflow will load any DAG object it can import from a DAG file, which means the DAG must appear in globals().

Default arguments are passed to a DAG as a default_args dictionary. This makes it easy to apply a common parameter to many operators without having to type it many times. DAGs can also be used as context managers to automatically assign new operators to that DAG.
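A minimal sketch of these ideas, assuming Airflow 2.x import paths; the dag_id, task_ids, schedule, and default_args values are illustrative:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# default_args are applied to every operator in this DAG unless overridden
# on the individual operator.
default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# Using the DAG as a context manager assigns the operators created inside the
# block to it automatically. Because the DAG is defined at module level, it
# appears in globals() and Airflow can load it from this file.
with DAG(
    dag_id="example_core_concepts",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
```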
While DAGs describe how to run a workflow, operators determine what actually gets done. Operators do not have to be assigned to a DAG immediately (previously dag was a required argument); however, once an operator is assigned to a DAG, it cannot be transferred or unassigned. DAG assignment can be done explicitly when the operator is created, through deferred assignment, or even inferred from other operators.

The official Airflow documentation recommends setting up operator relationships with the bitshift operators (>> and <<) rather than set_upstream() and set_downstream(). The chain and cross_downstream functions provide easier ways to set relationships between operators in specific situations, as the sketch below shows.
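A sketch of these three styles of declaring dependencies, assuming Airflow 2.3+ import paths (where chain and cross_downstream live in airflow.models.baseoperator); the dag_id and task names are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.models.baseoperator import chain, cross_downstream
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="example_relationships",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
) as dag:
    t1, t2, t3, t4, t5, t6 = (EmptyOperator(task_id=f"t{i}") for i in range(1, 7))

    # Bitshift operators: t1 runs before t2, and t2 before t3.
    t1 >> t2 >> t3

    # chain() builds the same kind of linear sequence without repeating >>.
    chain(t3, t4, t5)

    # cross_downstream() makes every task in the first list upstream of every
    # task in the second list: here t4 >> t6 and t5 >> t6.
    cross_downstream([t4, t5], [t6])
```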

A task is a parameterized instance of an operator. A task instance is a task that 1) has been assigned to a DAG and 2) has a state associated with a specific run of that DAG.

Hooks are interfaces to external platforms and databases such as Hive, S3, MySQL, Postgres, HDFS, and Pig. Hooks implement a common interface when possible and act as building blocks for operators.

Some systems can get overwhelmed when too many processes hit them at the same time. Airflow pools can be used to limit the execution parallelism of arbitrary sets of tasks. The list of pools is managed in the UI (Menu -> Admin -> Pools) by giving each pool a name and assigning it a number of worker slots.
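A sketch of how a hook and a pool might be used together. It assumes the Postgres provider package is installed, and that a pool named db_pool and a connection with conn_id my_postgres already exist; those names, the dag_id, and the query are made up for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def count_rows():
    # The hook looks up the connection by conn_id (from the metadata database
    # or an environment variable) and exposes a common query interface.
    hook = PostgresHook(postgres_conn_id="my_postgres")
    return hook.get_first("SELECT COUNT(*) FROM my_table")[0]


with DAG(
    dag_id="example_hook_and_pool",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
) as dag:
    # pool= limits how many task instances may occupy "db_pool" worker slots
    # at the same time across the whole environment.
    PythonOperator(task_id="count_rows", python_callable=count_rows, pool="db_pool")
```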

Connection information for external systems is stored in the Airflow metadata database and managed in the UI (Menu -> Admin -> Connections). Airflow can also reference connections via environment variables from the operating system; in that case the connection parameters must be saved in URI format. If connections with the same conn_id are defined in both the metadata database and environment variables, only the one in the environment variables will be used by Airflow.

When using the CeleryExecutor, the Celery queues that tasks are sent to can be specified. The default queue for the environment is defined in airflow.cfg under celery -> default_queue.
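A sketch combining both points; the conn_id, credentials, host, queue name, and dag_id are made-up examples. A connection supplied through the environment uses the AIRFLOW_CONN_ prefix followed by the conn_id, with the value in URI format:

```python
# Example environment-variable connection (set in the shell or service
# configuration, not in Python):
#
#   AIRFLOW_CONN_MY_POSTGRES=postgres://user:password@db-host:5432/mydb
#
# A connection defined this way overrides one with the same conn_id stored in
# the metadata database.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_celery_queue",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
) as dag:
    # With the CeleryExecutor, queue= routes this task to the "gpu" Celery
    # queue; only workers subscribed to that queue will pick it up. Tasks
    # without queue= go to the [celery] default_queue from airflow.cfg.
    BashOperator(task_id="heavy_job", bash_command="echo heavy", queue="gpu")
```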
