Apache Airflow is an open-source workflow orchestrator used to define, schedule, and monitor batch data pipelines as code. It is commonly selected when teams need explicit dependency management, reliable retries, and operational visibility across complex ETL and ML workflows.
- Python-based DAGs keep workflows version-controlled, testable, and reviewable alongside application code.
- Explicit task dependencies model multi-step pipelines and enforce correct execution order across systems.
- Flexible scheduling supports cron-like intervals, event-driven triggers, backfills, and catchup for historical reprocessing.
- Operational controls include retries, timeouts, SLAs, and alert callbacks to improve reliability and incident response.
- Rich observability provides task-level logs, run history, and a UI for debugging failures and bottlenecks.
- Scalable execution supports multiple executors such as Local, Celery, and Kubernetes to match workload and isolation needs.
- Extensible operators and provider packages integrate with common databases, warehouses, object storage, and APIs.
- Dynamic DAG patterns enable parameterized runs and programmatic task generation for large or variable pipelines.
- Centralized metadata records every run and task state in a database, giving audit-friendly history for compliance reviews and troubleshooting.
- Role-based access control supports governance over who can view, trigger, and modify workflows.
Airflow is best suited for batch-oriented orchestration and dependency-heavy pipelines, not low-latency streaming execution. Teams should plan for operational overhead such as scheduler tuning, metadata database management, and disciplined DAG design to avoid brittle workflows.
Common alternatives include Prefect, Dagster, and Argo Workflows, with trade-offs in deployment model, developer experience, and orchestration scope. For core concepts and architecture details, see the Apache Airflow documentation.