Airflow: Tasks are queued but not running

I am new to airflow and am trying to set up airflow to run ETL pipelines. I was able to install

  1. Airflow
  2. Postgres
  3. Celery
  4. RabbitMQ

I can run a test DAG. When I try to schedule jobs, the scheduler picks them up and queues them, which I can see in the UI, but the jobs never actually run. Can anyone help me solve this problem? I feel like I am missing a basic Airflow concept here.

Here is my config file:

[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow
api_client = airflow.api.client.local_client

[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30

[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8793
broker_url = amqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost
celery_result_backend = db+postgresql+psycopg2://postgres:airflow@xxx.amazonaws.com:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default

DAG: I used the standard tutorial DAG, and the start date for my DAG is 'start_date': datetime(2017, 4, 11).
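For reference, a minimal sketch of a tutorial-style DAG under the assumptions above; the dag_id 'tutorial_test' and the single BashOperator task are illustrative placeholders, not taken from the original post (imports are the Airflow 1.x paths this thread is about):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 4, 11),  # start date from the question
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# dag_id and the task below are placeholders for illustration
dag = DAG('tutorial_test', default_args=default_args,
          schedule_interval=timedelta(days=1))

t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)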



4 answers


Start all three components of Airflow, namely:

airflow webserver
airflow scheduler
airflow worker

If you only run the first two, the tasks will be queued but never executed. airflow worker starts the Celery workers that actually execute the tasks in your DAGs.

Also, by the way, Celery 4.0.2 is not compatible with Airflow 1.7 or 1.8 at the moment. Use Celery 3 instead.



I also tried to upgrade to Airflow v1.8 today and struggled with Celery and RabbitMQ. What helped was switching from librabbitmq (the default when using amqp) to pyamqp in airflow.cfg:

broker_url = pyamqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost

(This is where I got this idea from: https://github.com/celery/celery/issues/3675 )
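If you want to confirm that the new broker URL is reachable before restarting the Airflow services, here is a minimal sketch using the Celery client; the host, credentials and vhost below are the placeholders from the question, not real values:

from celery import Celery

# placeholder broker URL from the question; substitute your real host/vhost
app = Celery(broker='pyamqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost')

# raises an error if RabbitMQ is unreachable or the credentials/vhost are wrong
app.connection().ensure_connection(max_retries=3)
print('broker reachable')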



I realize your problem has already been answered and was due to a Celery version mismatch, but I also saw tasks get queued and never start because I had changed the log location to a directory that the airflow service user was not allowed to write to.

In the above example airflow.cfg: base_log_folder = /root/airflow/logs

I am using an AWS EC2 machine and changed the log location to base_log_folder = /mnt/airflow/logs.

The UI gives no indication of why the tasks are stuck in the queued state; it just says "unknown, all dependencies met ...". Granting the airflow service/daemon user write permission on the log directory fixed it.
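A quick way to check for this kind of problem is to test the log directory as the same user that the scheduler and workers run under; a minimal sketch, using the path from this answer:

import os

log_dir = '/mnt/airflow/logs'  # base_log_folder from this answer

# False for the write check means workers cannot create task logs,
# which shows up as tasks stuck in the queued state
print('exists:', os.path.isdir(log_dir))
print('writable:', os.access(log_dir, os.W_OK))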



If LocalExecutor is an option for you, you can always fall back to it. I've heard of some problems with CeleryExecutor.

Just replace executor = CeleryExecutor with executor = LocalExecutor in your airflow.cfg file (most of the time ~/airflow/airflow.cfg).

Restart the scheduler and that's it!
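With the values from the question's airflow.cfg, the relevant [core] lines would then look roughly like this (note that LocalExecutor still needs a real metadata database such as the Postgres instance already configured; SQLite only works with the SequentialExecutor):

[core]
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow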
