Airflow: Tasks are queued but not running
I am new to Airflow and am trying to set it up to run ETL pipelines. I was able to install:
- Airflow
- Postgres
- Celery
- RabbitMQ
I can run the tutorial test DAG. When I try to schedule jobs, the scheduler picks them up and queues them, and I can see them queued in the UI, but the jobs never actually run. Can anyone help me solve this? I suspect I am missing a very basic Airflow concept here.
Here is my airflow.cfg:
[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow
api_client = airflow.api.client.local_client
[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30
[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8793
broker_url = amqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost
celery_result_backend = db+postgresql+psycopg2://postgres:airflow@xxx.amazonaws.com:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default
DAG: this is the tutorial I used,
and the start date for my DAG is 'start_date': datetime(2017, 4, 11).
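For reference, a minimal sketch along the lines of the Airflow 1.x tutorial DAG with the start date above (the single BashOperator task is just the tutorial's placeholder, not my actual ETL):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 4, 11),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# One BashOperator task, scheduled daily, mirroring the tutorial layout.
dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1))

t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)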
Start all three components of Airflow, namely:
airflow webserver
airflow scheduler
airflow worker
If you only run the first two, the tasks will be queued but never executed. airflow worker is what brings up the Celery workers that actually execute the tasks in your DAGs.
Also, by the way, Celery 4.0.2 is not compatible with Airflow 1.7 or 1.8 at the moment. Use Celery 3 instead.
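If you are unsure which Celery version is installed in the environment Airflow runs in, a quick check:

import celery

# Airflow 1.7/1.8 with the CeleryExecutor is only known to work with Celery 3.x;
# with Celery 4.0.2 the tasks just sit in the "queued" state.
print(celery.__version__)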
I also tried to upgrade to Airflow v1.8 today and struggled with Celery and RabbitMQ. What helped was switching the broker transport from librabbitmq (the default when using amqp) to pyamqp in airflow.cfg:
broker_url = pyamqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost
(This is where I got this idea from: https://github.com/celery/celery/issues/3675 )
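If you want to confirm that the broker is reachable over the pyamqp transport before restarting everything, a small sanity check with kombu (which ships with Celery) can help; the URL below is just the placeholder from the config above:

from kombu import Connection

# Same placeholder host/credentials as in broker_url above; replace with your own.
BROKER_URL = 'pyamqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost'

conn = Connection(BROKER_URL)
try:
    conn.ensure_connection(max_retries=3)  # raises if RabbitMQ cannot be reached
    print('broker reachable via pyamqp')
finally:
    conn.release()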
I realize your problem was already answered and was due to a Celery version mismatch, but I have also seen tasks get queued and never start because I had changed the log location to a directory the Airflow service user was not allowed to write to.
In the example airflow.cfg above:
base_log_folder = /root/airflow/logs
I am using an AWS EC2 machine and changed the logs to be written to
base_log_folder = /mnt/airflow/logs
The UI gives no indication of why the tasks are queued; it just says "unknown, all dependencies met ...". Granting the airflow/service daemon user permission to write to the new log folder fixed it.
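A quick way to verify this is to check, as the user the Airflow services run under, that the log folder is actually writable; a rough sketch (using the path from my setup):

import os
import tempfile

BASE_LOG_FOLDER = '/mnt/airflow/logs'  # the value of base_log_folder in airflow.cfg

# os.access reports the permission bits for the current user; creating a real
# temp file also catches issues os.access can miss (e.g. read-only mounts).
print('writable:', os.access(BASE_LOG_FOLDER, os.W_OK))
with tempfile.NamedTemporaryFile(dir=BASE_LOG_FOLDER) as handle:
    handle.write(b'ok')
print('test file created and cleaned up')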
If the LocalExecutor is an option for you, you can always fall back to it. I've heard of some problems with the CeleryExecutor.
Just replace executor = CeleryExecutor with executor = LocalExecutor in your airflow.cfg file (most of the time ~/airflow/airflow.cfg).
Restart the scheduler and that's it!
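If you want to double-check which executor Airflow actually picked up after the edit, this should work on Airflow 1.x, where the parsed config is exposed via airflow.configuration:

from airflow import configuration

# Prints LocalExecutor or CeleryExecutor, depending on what airflow.cfg defines.
print(configuration.get('core', 'executor'))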