7 common mistakes to check when debugging Airflow DAGs

Tasks not running? DAG not working? Logs nowhere to be found? We've had the same problems. Here is a list of common errors and related fixes to keep in mind when debugging your Airflow deployment.







Apache Airflow has become the leading open source task scheduler for almost any kind of work, from training a machine learning model to general ETL orchestration. It's an incredibly flexible tool that, in our experience, supports mission-critical projects for five-person startups and Fortune 50 teams alike.







With that said, the very tool that many consider a powerful "blank canvas" can quickly become a double-edged sword if you're just starting out. And, unfortunately, there aren't many resources or best practices that go a step or two beyond the basics of Apache Airflow.







In an effort to fill this gap, we have compiled some of the most common problems that nearly every user faces, no matter how experienced or large their team is. Whether you're new to Airflow or a power user, check out this list of common mistakes and some related fixes to keep in mind.







1. Your DAG isn't running at the expected time



You wrote a new DAG that should run every hour. You set an hourly interval starting today at 2:00 pm and set a reminder to check back in a couple of hours. You check at 3:30 pm and find that your DAG did run, but your logs show only one recorded execution, dated 2:00 pm. What happened to the 3:00 pm run?







Before you jump into fix-it mode (you wouldn't be the first), rest assured that this is expected behavior. The Airflow scheduler is a bit counterintuitive (and the subject of some controversy in the Airflow community), but you'll get the hang of it. Two things:







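  • An Airflow DAG runs at the end of its schedule_interval.

    In other words, a run starts one schedule_interval after the start date. An hourly DAG, for example, executes its 2:00 pm run when the clock strikes 3:00 pm. The reasoning is that Airflow cannot be sure all the data for the 2:00 pm hour is present until that hour is over.

    This behavior is specific to Airflow, but it is important to keep in mind.
  • Airflow runs in UTC.

    Like many databases and APIs, Airflow stores its datetime information in UTC.

    Don't expect DAG run times to match your local time zone. In a time zone seven hours behind UTC, for example, a DAG run at 19:00 corresponds to 12:00 local time.

    As of 1.10, Airflow is time zone aware, but we'd still recommend writing your DAGs with UTC timestamps in mind.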
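To make the timing concrete, here is a minimal sketch in plain Python (it does not import Airflow). The helper name `run_starts_at` and the UTC-7 offset are assumptions chosen for illustration; the model it encodes is simply "a run stamped with a given time starts one schedule_interval later, and timestamps are in UTC".

```python
from datetime import datetime, timedelta, timezone


def run_starts_at(execution_date, schedule_interval):
    """Return when the run stamped `execution_date` actually starts.

    Simplified model: the run starts once its interval has fully elapsed.
    """
    return execution_date + schedule_interval


hourly = timedelta(hours=1)
two_pm = datetime(2019, 1, 1, 14, 0, tzinfo=timezone.utc)

# The "2:00 pm" run starts at 3:00 pm; the "3:00 pm" run starts at 4:00 pm.
first_start = run_starts_at(two_pm, hourly)
second_start = run_starts_at(two_pm + hourly, hourly)

# A run stamped 19:00 UTC, viewed from a zone seven hours behind UTC.
utc_minus_7 = timezone(timedelta(hours=-7))
run_utc = datetime(2019, 1, 1, 19, 0, tzinfo=timezone.utc)
run_local = run_utc.astimezone(utc_minus_7)

print(first_start.isoformat())
print(second_start.isoformat())
print(run_local.isoformat())
```

Under this model, a check at 3:30 pm finds only the run stamped 2:00 pm, which matches the single execution in the logs.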


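2. One of your DAGs isn't running

If workflows on your deployment are generally running fine but one specific DAG isn't scheduling tasks, or isn't running at all, the cause may be how its schedule is set up.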







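Make sure you aren't using datetime.now() as your start_date.

It's intuitive to think that if you tell your DAG to start "now", it will run "now". But that doesn't take into account how Airflow itself reads datetime.now().

For a DAG to run, its start_date must be a time in the past; otherwise Airflow assumes the DAG isn't ready to run yet. When Airflow evaluates your DAG file, it interprets datetime.now() as the current timestamp (i.e. not a time in the past) and decides the DAG isn't ready. Since this happens every time Airflow evaluates your DAG, every 5-10 seconds, the DAG never runs.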







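To get your DAG to run properly, give it a fixed start time in the past (e.g. datetime(2019,1,1)) and set catchup=False (unless you want to run a backfill).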
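The effect of a moving start_date can be sketched with a simplified model of the scheduler's check. This is plain Python, not Airflow's actual code: the function name `is_ready_to_run` is hypothetical, and it assumes a DAG becomes eligible once one full schedule_interval has passed since start_date.

```python
from datetime import datetime, timedelta


def is_ready_to_run(start_date, schedule_interval, now):
    """Simplified eligibility check: the first interval must be over."""
    return now >= start_date + schedule_interval


interval = timedelta(hours=1)
# Four successive evaluations of the DAG file, ten seconds apart.
heartbeats = [datetime(2019, 6, 1, 12, 0, s) for s in (0, 10, 20, 30)]

# start_date=datetime.now(): re-evaluated on every parse, so it always
# equals "now" and the first interval never finishes.
moving = [is_ready_to_run(now, interval, now) for now in heartbeats]

# start_date=datetime(2019, 1, 1): a fixed time in the past.
fixed = [is_ready_to_run(datetime(2019, 1, 1), interval, now) for now in heartbeats]

print(moving)  # [False, False, False, False]
print(fixed)   # [True, True, True, True]
```

With a fixed start_date the check passes at every evaluation, and catchup=False is what keeps Airflow from running every interval between that date and today.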







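Note: you can trigger a DAG run manually from the Airflow UI (the button that looks like a Play button). A manual trigger runs immediately and does not interrupt regular scheduling, though it is still limited by any concurrency settings you have at the DAG, deployment, or task level. In the corresponding logs, the run_id will show manual__ instead of scheduled__.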







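3. You're seeing a 503 error

If your Airflow deployment is entirely inaccessible from a web browser, you most likely have a webserver issue.

If you've already refreshed the page and still see a 503, read on for some webserver-related tips.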







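Your webserver may be crashing

A 503 error generally points to a problem with your webserver (a deployment in kubernetes), the Airflow component responsible for rendering the Airflow UI. If the webserver is struggling, so is the web interface.

In our experience, a 503 often means that your webserver is crashing (at Astronomer, for example, you might see a kubernetes CrashLoopBackOff state). If you push a deployment to kubernetes and your webserver takes longer to start than the timeout allows (10 seconds by default), it is "crashed" before it has had time to spin up. That triggers a retry, which crashes again, and so on.

If your deployment is in this state, your webserver may be hitting a memory limit when loading your DAGs (even while your workers and scheduler continue to run tasks as expected).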









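  1. Have you tried increasing your webserver resources?

    Airflow 1.10 is a bit greedier than Airflow 1.9 when it comes to CPU (and memory usage), so we've seen a lot of 503 errors lately. A quick fix is to give your webserver more resources.

    If you're using Astronomer, we recommend keeping the webserver at a minimum of 5 AU (Astronomer Units).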







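  2. What about increasing the webserver timeout?

    If more webserver resources don't do the trick (they might not), try increasing web_server_master_timeout or web_server_worker_timeout.

    Raising these values tells your Airflow webserver to wait a bit longer before it hits you with a 503 (a timeout). You may still see slow loading times if your deployment is underpowered, but you'll likely avoid the 503.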







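  3. Are you making requests outside of an operator?

    If you make API calls, JSON requests, or database queries outside of an operator at a high frequency, your webserver is much more likely to time out.

    When Airflow parses a file looking for DAGs, it first runs all top-level code (i.e. code outside of operators). Even if the operator itself only runs at execution time, everything outside of an operator is called on every heartbeat, which can be quite taxing.

    If possible, move the logic that currently runs outside of an operator into a Python operator.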
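The cost of top-level code can be illustrated with a small simulation in plain Python (no Airflow import). Everything here is hypothetical: `fetch_settings` stands in for an API, JSON, or database request, and `parse_dag_file` imitates a DAG file being re-parsed by executing its source repeatedly. In a real DAG, a function like `load_orders` is what you would pass to a PythonOperator as its python_callable.

```python
request_log = []


def fetch_settings():
    """Hypothetical stand-in for an API call, JSON request, or database query."""
    request_log.append("request")
    return {"table": "orders"}


# Two versions of the same (hypothetical) DAG file, kept as source strings
# so that the parse step can be simulated below.
TOP_LEVEL_VERSION = "settings = fetch_settings()\n"
DEFERRED_VERSION = (
    "def load_orders():\n"
    "    settings = fetch_settings()\n"
    "    return settings['table']\n"
)


def parse_dag_file(source, times):
    """Simulate the DAG file being re-parsed `times` times."""
    namespace = {}
    for _ in range(times):
        namespace = {"fetch_settings": fetch_settings}
        exec(source, namespace)
    return namespace


parse_dag_file(TOP_LEVEL_VERSION, times=10)
top_level_requests = len(request_log)

request_log.clear()
namespace = parse_dag_file(DEFERRED_VERSION, times=10)
deferred_requests_at_parse = len(request_log)

# The request happens only when the task's callable is invoked.
table = namespace["load_orders"]()
deferred_requests_at_run = len(request_log)

print(top_level_requests, deferred_requests_at_parse, deferred_requests_at_run)  # 10 0 1
```

Ten parses of the top-level version cost ten requests, while the deferred version makes none until the task actually runs.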









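4. Sensor tasks are failing intermittently

If your sensor tasks are failing, the problem may not be the tasks themselves.

Be careful with Sensors

In Airflow 1.10.1 and earlier, sensors run continuously and occupy a task slot until they find what they're looking for, so they tend to cause concurrency issues. Unless you never have more than a few tasks running at once, we generally recommend avoiding them unless you know they won't take long to finish.

For example, if a worker can only run X tasks at once and you have three sensors running, you can only run X-3 other tasks at any given time. Keep in mind that if sensors are running at all times, that limits how and when a scheduler restart can happen (otherwise the restart will fail the sensors).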







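Depending on your use case, we suggest considering the following:

  1. Create a DAG that runs at a more frequent interval.

    Have it check for whatever the sensor would be waiting on, and skip the downstream tasks if nothing is found.

  2. Trigger a Lambda function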









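Note: Airflow v1.10.2 adds mode = reschedule for sensors, which addresses this issue. If you have more sensors than worker slots, a waiting sensor is now put into a new up_for_reschedule state, which frees up a worker slot.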
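The slot arithmetic can be written as a toy model in plain Python. The function `free_slots` is hypothetical and deliberately simplified: it treats a sensor in reschedule mode as holding no slot while it waits, and it ignores the brief moment when such a sensor takes a slot to perform its check.

```python
def free_slots(worker_slots, running_sensors, mode="poke"):
    """Slots left for other tasks while sensors are waiting (simplified).

    "poke": each waiting sensor holds a slot the whole time.
    "reschedule": a waiting sensor gives its slot back between checks.
    """
    if mode not in ("poke", "reschedule"):
        raise ValueError("mode must be 'poke' or 'reschedule'")
    held = running_sensors if mode == "poke" else 0
    return max(worker_slots - held, 0)


x = 16  # example value for X; 16 is an assumption for illustration
print(free_slots(x, 3, mode="poke"))        # 13
print(free_slots(x, 3, mode="reschedule"))  # 16
```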







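5. Tasks are running, but they're getting bottlenecked

If everything looks like it's running as expected but your tasks are getting bottlenecked, we recommend taking a closer look at two things: your Env variables and concurrency settings, plus your Worker and Scheduler resources.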







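1. Check your Env variables (Concurrency)

What these values should be (and what might be causing the bottleneck) depends on your setup: are you running many DAGs at once, or one DAG with hundreds of concurrent tasks? That said, tuning them can certainly help with performance issues. Here is what to look at: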







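1. Parallelism (parallelism)

  • This determines how many task instances can run in parallel across DAGs, given the resources available at the deployment level. Think of it as "maximum active tasks anywhere".

  • ENV AIRFLOW__CORE__PARALLELISM=18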







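2. DAG concurrency (dag_concurrency)

  • This determines how many task instances the scheduler can schedule at once per DAG. Think of it as "maximum tasks that can be scheduled at once, per DAG".

  • ENV AIRFLOW__CORE__DAG_CONCURRENCY=16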







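3. Non-pooled task slot count (non_pooled_task_slot_count)

  • When you aren't using pools, tasks run in the "default pool", whose size is set by this value.

  • ENV AIRFLOW__CORE__NON_POOLED_TASK_SLOT_COUNT=256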







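4. Max active runs per DAG (max_active_runs_per_dag)

  • Fairly self-explanatory: this determines the maximum number of active DAG runs per DAG.

  • ENV AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG=3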







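5. Worker concurrency (worker_concurrency)

  • This determines how many tasks each worker can run at any given time. The CeleryExecutor, for example, runs at most 16 tasks concurrently by default. Think of it as "how many tasks each of my workers can take on at once".

  • Note that this number is naturally limited by dag_concurrency. If you have 1 worker and want it to match your deployment's capacity, set worker_concurrency = parallelism.

  • ENV AIRFLOW__CELERY__WORKER_CONCURRENCY=9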







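6. Concurrency (concurrency)

  • Not to be confused with the settings above. "Concurrency" here is set at the individual DAG level and determines how many tasks can run concurrently within a single DAG. It may also need tuning, but it will not work if defined in airflow.cfg.

Tip: if you're thinking of setting DAG-level or deployment-level concurrency to a low number to protect against API rate limits, we recommend using "pools" instead: they let you limit parallelism at the task level without limiting scheduling or execution outside of the tasks that need it.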
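To see how these settings interact, here is a rough back-of-the-envelope model in plain Python. The function `max_running_tasks` is hypothetical and simplified: it ignores pools, max_active_runs_per_dag, and the DAG-level concurrency argument, and only takes the tightest of the three caps listed in the ENV examples.

```python
def max_running_tasks(parallelism, dag_concurrency, worker_concurrency,
                      workers=1, dags=1):
    """Rough upper bound on tasks running at once (simplified model)."""
    return min(
        parallelism,                    # deployment-wide cap
        dag_concurrency * dags,         # per-DAG cap, summed over DAGs
        worker_concurrency * workers,   # total worker slots
    )


# Values taken from the ENV examples in this section.
one_worker = max_running_tasks(18, 16, 9, workers=1)
two_workers = max_running_tasks(18, 16, 9, workers=2)
two_workers_two_dags = max_running_tasks(18, 16, 9, workers=2, dags=2)
print(one_worker, two_workers, two_workers_two_dags)  # 9 16 18
```

With one worker the worker slots are the bottleneck; with two workers and a single DAG it shifts to dag_concurrency, and with two DAGs to parallelism. Finding which cap is the tightest is usually the first step in tuning.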







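2. Try scaling up your Scheduler or adding a Worker

If tasks are bottlenecked and your concurrency settings are already tuned, your Scheduler may be underpowered or your deployment may need another worker. If you're using Astronomer, we generally recommend a minimum of 5 AU for the Scheduler and 10 AU for Celery workers, if you have them.

Whether to scale up your existing resources or add a worker depends on your use case, but we generally recommend the following:

  • If you run a relatively large number of light tasks across DAGs at a relatively high frequency, you are likely better off with 2 or 3 "light" workers to spread out the work.
  • If you run fewer but heavier tasks at a lower frequency, you are likely better off with a single "heavier" worker that can run those tasks more efficiently.

For more on Executors, see Airflow Executors: Explained Guide.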







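6. You're missing task logs

Generally, logs fail to show up because a process died on your scheduler or on one or more of your Celery workers.

In the Airflow web interface you might see something like this: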







Failed to fetch log file from worker. Invalid URL 'http://:8793/log/staging_to_presentation_pipeline_v5/redshift_to_s3_Order_Payment_17461/2019-01-11T00:00:00+00:00/1.log': No host supplied
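One detail worth noticing in that message is the URL itself: there is nothing between "http://" and ":8793" (8793 is the default port of the worker log server). A quick check with Python's standard library, shown here only as an illustration, confirms that the host part is empty, which suggests the webserver did not have a hostname for the worker that ran the task.

```python
from urllib.parse import urlparse

# URL copied from the error message above.
log_url = (
    "http://:8793/log/staging_to_presentation_pipeline_v5/"
    "redshift_to_s3_Order_Payment_17461/2019-01-11T00:00:00+00:00/1.log"
)

parts = urlparse(log_url)
print(parts.hostname)  # None: nothing between "http://" and ":8793"
print(parts.port)      # 8793
```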
      
      





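If that's the case, here are a few things to try:

  1. Rerun (clear) the task instance if you can, to see whether the logs show up. This prompts the task to run again.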







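  2. Change log_fetch_timeout_sec to something greater than 5 seconds (the default).

    This is the amount of time (in seconds) that the webserver waits for an initial handshake while fetching logs from other workers.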













  3. Give your workers more resources. If you're using Astronomer, you can do this on the Configure page of the Astronomer UI.







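  4. Are you looking for a log from more than 15 days ago?

    If you're using Astronomer, log retention is an environment variable that is hard-coded on our platform. For now, you won't have access to logs more than 15 days old.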







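  5. Exec into one of your Celery workers and look for the log files there.

    To do this, you need access to the Kubernetes cluster.

    Once Kubectl is set up, you can run: kubectl exec -it {worker_name} bash

    Log files should be in ~/logs. From there, they are split up by DAG/TASK/RUN.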







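  6. Restart your scheduler.

    If your tasks are slower than usual to get scheduled, check how often your scheduler is set to restart. Airflow has a well-known problem where the scheduler's performance degrades over time, and a restart is needed to bring it back.

    How often the scheduler restarts is defined in airflow.cfg as run_duration. A run_duration of -1 means the scheduler is never restarted, whereas a run_duration of 3600 restarts it every hour. How often you should restart depends on your particular use case.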









If you're using Astronomer, you can restart your scheduler in one of two ways:

  • Set AIRFLOW__SCHEDULER__RUN_DURATION={num_seconds_between_restarts} as an environment variable in Astronomer to schedule recurring restarts
  • Run astro airflow deploy through the CLI to restart everything immediately (if you are using Celery, you can use the worker termination grace period to keep tasks that are already running from failing)


This list is based on our experience in helping Astronomer customers solve basic Airflow issues, but we want to hear from you. Feel free to contact us at people@astronomer.io if we missed something that you think would be useful to include.







If you have further questions or are looking for Airflow support from our team, please contact us here.







