Installing and Configuring Airflow on Ubuntu Server 20

The first time, I installed Airflow from tutorials in about an hour. The web interface opened fine and looked nice, but unfortunately nothing actually worked.





Reinstalling and debugging took me another 10-15 hours.





I am writing this article while everything is still fresh, and I will try to note every problem I ran into. Some answers turned up only on the tenth page of English-language Google results. Even the English version of the Airflow documentation does not cover every issue.





To start with, a non-obvious fact: when you begin installing Airflow, you expect it to be a single program. It is not. There are two services:

  • airflow-webserver - serves the part you see in the web interface
  • airflow-scheduler - launches DAGs and, more generally, runs the ETL part





Accordingly, each one is configured separately. If something does not work, the problem may be in just one of these services or in both at once. Pinning down where the error is will cut debugging time in half. To see which one has gone down, check their status:





systemctl status airflow-webserver
systemctl status airflow-scheduler
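
To see at a glance which of the two services is down, the check can be wrapped in a small function. This is only a sketch: the function name check_airflow_services is made up for illustration, and it assumes the units are named exactly as above.

```shell
# Report the state of both Airflow services so it is clear which one to debug.
# The service names are the ones used in this article.
check_airflow_services() {
    failed=0
    for svc in airflow-webserver airflow-scheduler; do
        # `systemctl is-active --quiet` exits 0 only when the unit is active.
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
            failed=1
        fi
    done
    # Non-zero exit status when at least one service is down.
    return "$failed"
}
```

Calling check_airflow_services prints one line per service and returns a non-zero status if either of them is not active.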
      
      



The system log also helps a lot: /var/log/syslog
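
The system log is noisy, so it helps to keep only the lines that mention Airflow. A minimal sketch; the function name airflow_syslog_tail and the default of 50 lines are assumptions chosen for illustration.

```shell
# Print the most recent syslog lines that mention airflow.
# Arguments: [log file] [number of lines]; both defaults are assumptions.
airflow_syslog_tail() {
    log_file="${1:-/var/log/syslog}"
    max_lines="${2:-50}"
    # -i makes the match case-insensitive ("Airflow" and "airflow").
    grep -i 'airflow' "$log_file" | tail -n "$max_lines"
}
```

Run it as airflow_syslog_tail to read the default log, or pass another file and line count, for example airflow_syslog_tail /var/log/syslog.1 100.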





We will use these at the debugging stage, though; first we need to install everything.





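Airflow is a Python package and is installed with pip, and a fresh Ubuntu server does not necessarily have pip.

Airflow needs Python 3.

Install it: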





apt update
apt install software-properties-common
add-apt-repository ppa:deadsnakes/ppa
apt install python3.8
      
      



Then check that it is installed:





python3 --version
      
      



Install pip:





apt install python3-pip
      
      



Installing Airflow

Airflow needs a home folder, which is set through an environment variable:





export AIRFLOW_HOME=~/airflow/
      
      



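I did all of this as root, so the folder ended up inside root's home directory, which, as it turned out later, was a mistake.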





Then install Airflow:





pip3 install apache-airflow
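
A plain pip3 install takes whatever dependency versions are current that day. The Airflow project publishes constraints files for each release, and installing against one makes the result reproducible. The sketch below assumes you pick the release yourself (2.0.1 is only an example); both function names are made up for illustration.

```shell
# Build the URL of the published constraints file for a given Airflow release
# and Python minor version (for example 2.0.1 and 3.8).
airflow_constraints_url() {
    airflow_version="$1"
    python_version="$2"
    echo "https://raw.githubusercontent.com/apache/airflow/constraints-${airflow_version}/constraints-${python_version}.txt"
}

# Install a pinned Airflow release against its constraints file.
install_airflow_pinned() {
    airflow_version="$1"
    # Ask the interpreter for its own major.minor version, e.g. "3.8".
    python_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    pip3 install "apache-airflow==${airflow_version}" \
        --constraint "$(airflow_constraints_url "$airflow_version" "$python_version")"
}
```

For example, install_airflow_pinned 2.0.1 installs that release with the dependency versions it was tested against.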
      
      



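The Airflow home folder ends up with the following files:

  • airflow-webserver.pid - the PID file of the web server
  • airflow.cfg - the Airflow settings file
  • airflow.db - the SQLite database used by default
  • unittests.cfg
  • webserver_config.py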





Next, create a folder for DAGs inside the Airflow home folder:





mkdir dags
      
      



The folder does not have to be in this exact place: Airflow reads its location from the dags_folder parameter in airflow.cfg.
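
To check which folder Airflow will actually scan, the value can be read straight from the config file. A minimal sketch; airflow_cfg_get is a hypothetical helper name, and it simply returns the first line that starts with the given key.

```shell
# Print the value of a key from airflow.cfg (first match wins).
# Arguments: key name, path to airflow.cfg.
airflow_cfg_get() {
    key="$1"
    cfg_file="$2"
    # Strip "key = " from the start of matching lines and print what is left.
    sed -n "s/^${key}[[:space:]]*=[[:space:]]*//p" "$cfg_file" | head -n 1
}
```

For example, airflow_cfg_get dags_folder "$AIRFLOW_HOME/airflow.cfg" prints the configured path, which should match the folder created above.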





Start the web server and the scheduler:





systemctl start airflow-webserver
systemctl start airflow-scheduler
      
      



The web interface is now available on port 8080 at the server's IP address.





Next, replace the default SQLite database with Postgres:

PostgreSQL for Airflow

Install PostgreSQL:





apt-get install postgresql
      
      



The installation creates the postgres user. Open psql as that user:





sudo -u postgres psql
      
      



Create a database and a user for Airflow:





postgres=# CREATE DATABASE airflow_metadata;
postgres=# CREATE USER airflow WITH PASSWORD 'password';
postgres=# GRANT ALL PRIVILEGES ON DATABASE airflow_metadata TO airflow;
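
The same three statements can be generated and piped into psql without typing them at the prompt, which is handy when the server has to be rebuilt. A sketch under the assumption that the names and password are the ones used above; airflow_db_sql is a made-up helper name.

```shell
# Print the SQL that prepares a metadata database for Airflow.
# Arguments: database name, role name, password (all illustrative).
airflow_db_sql() {
    db_name="$1"
    db_user="$2"
    db_password="$3"
    cat <<EOF
CREATE DATABASE ${db_name};
CREATE USER ${db_user} WITH PASSWORD '${db_password}';
GRANT ALL PRIVILEGES ON DATABASE ${db_name} TO ${db_user};
EOF
}
```

Usage: airflow_db_sql airflow_metadata airflow 'password' | sudo -u postgres psql. In real use, choose a stronger password than the placeholder.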
      
      



Now point Airflow at this database. In airflow.cfg, set the sql_alchemy_conn parameter:

  sql_alchemy_conn = postgresql+psycopg2://airflow:password@localhost/airflow_metadata
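
Editing airflow.cfg by hand is easy to get wrong, so the change can also be scripted. This is a rough sketch, not part of Airflow: set_sql_alchemy_conn is a hypothetical helper, and it assumes the URI contains no "|" or "&" characters, which sed would treat specially.

```shell
# Replace the sql_alchemy_conn line in airflow.cfg with a new connection URI.
# Arguments: path to airflow.cfg, connection URI.
set_sql_alchemy_conn() {
    cfg_file="$1"
    uri="$2"
    # Use | as the sed delimiter because the URI itself contains slashes.
    sed "s|^sql_alchemy_conn[[:space:]]*=.*|sql_alchemy_conn = ${uri}|" "$cfg_file" > "${cfg_file}.tmp" &&
        # Write back through cat so the original file keeps its owner and mode.
        cat "${cfg_file}.tmp" > "$cfg_file" &&
        rm -f "${cfg_file}.tmp"
}
```

Usage: set_sql_alchemy_conn "$AIRFLOW_HOME/airflow.cfg" 'postgresql+psycopg2://airflow:password@localhost/airflow_metadata'.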







The connection string uses the psycopg2 driver, so if it is not installed yet, install it:





pip3 install psycopg2-binary
      
      



Initialize the database:





airflow db init
      
      



Restart the web server and the scheduler:





systemctl restart airflow-webserver
systemctl restart airflow-scheduler
      
      



Create an Airflow user for logging in to the web interface:





airflow users create --username AirflowAdmin --firstname name1 --lastname name2 --role Admin --email airflow@airflow.com
      
      



The command will ask you to set a password for this user.





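A separate user instead of root

Airflow is now installed with its home folder inside root's home directory, which is not a good place for it.

Create a dedicated airflow user.

The systemd units below run the services as the airflow user, and that user has no access to root's home directory.

Move the airflow folder out of root's home and make the airflow user its owner. Then search the files in the airflow folder for leftover references to root: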





grep root ./*
      
      



Replace them with the new paths.
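
If grep finds many such lines, the replacement can be done in one pass. A minimal sketch: replace_home_path is a hypothetical helper, and the paths /root/airflow and /airflow in the usage line are assumptions about where the folder was and where it was moved.

```shell
# Rewrite every occurrence of an old path prefix in a config file.
# Arguments: file, old prefix, new prefix.
replace_home_path() {
    file="$1"
    old="$2"
    new="$3"
    # | is the sed delimiter because the paths contain slashes.
    sed "s|${old}|${new}|g" "$file" > "${file}.tmp" &&
        # Write back through cat so the file keeps its owner and mode.
        cat "${file}.tmp" > "$file" &&
        rm -f "${file}.tmp"
}
```

Usage: replace_home_path /airflow/airflow.cfg /root/airflow /airflow, then run the grep again to confirm nothing is left.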





In /usr/lib/systemd/system, create two unit files:





airflow-webserver.service

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/local/bin/airflow webserver --pid /airflow/airflow-webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target





The --pid /airflow/airflow-webserver.pid option is the path where the airflow-webserver.pid file will be created.





airflow-scheduler.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target





Both units read their environment from a file: create /etc/sysconfig/airflow and set the AIRFLOW_CONFIG and AIRFLOW_HOME variables in it.
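
The environment file is a plain list of KEY=value lines. The sketch below writes one; write_airflow_env_file is a made-up helper name, and /airflow as the home folder is an assumption taken from the --pid path in the unit file above.

```shell
# Write the environment file that both systemd units load.
# Arguments: target file, Airflow home folder.
write_airflow_env_file() {
    env_file="$1"
    airflow_home="$2"
    cat > "$env_file" <<EOF
AIRFLOW_HOME=${airflow_home}
AIRFLOW_CONFIG=${airflow_home}/airflow.cfg
EOF
}
```

Usage: mkdir -p /etc/sysconfig && write_airflow_env_file /etc/sysconfig/airflow /airflow. The mkdir is there because the /etc/sysconfig folder may not exist on Ubuntu.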





Reload systemd and restart the services:





systemctl daemon-reload
systemctl restart airflow-scheduler
systemctl restart airflow-webserver
      
      



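The next problem: I could not log in to Airflow. The web interface kept answering "login failed".

I looked at the system log (/var/log/syslog) and then at the status of the web server: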





systemctl status airflow-webserver

● airflow-webserver.service - Airflow webserver daemon
     Loaded: loaded (/lib/systemd/system/airflow-webserver.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Tue 2021-03-16 18:00:03 MSK; 2s ago
    Process: 761523 ExecStart=/usr/local/bin/airflow webserver --pid /run/airflow/webserver.pid (code=exited, status=1/FAILURE)
   Main PID: 761523 (code=exited, status=1/FAILURE)

Mar 16 18:00:03 digitalberd systemd[1]: airflow-webserver.service: Main process exited, code=exited, status=1/FAILURE
Mar 16 18:00:03 digitalberd systemd[1]: airflow-webserver.service: Failed with result 'exit-code'.





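And yet the Airflow interface still opened on port 8080.

Stranger still: after systemctl stop airflow-webserver, the web interface on port 8080 kept responding.

So who was serving it? Check which process holds the port: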





lsof -i tcp:8080
      
      



It turned out that after airflow-webserver was stopped, a gunicorn process kept running; it held port 8080 and kept serving the interface.





After I killed it by PID and restarted the web server, everything finally worked.
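
Finding and stopping the leftover process can be done in one step. This is only a sketch of the idea: the function names are made up, it relies on lsof being installed, and it sends a plain SIGTERM, so a stuck process may need a stronger signal.

```shell
# Print the PIDs of processes listening on a TCP port, one per line.
# `lsof -t` prints bare PIDs; -sTCP:LISTEN keeps only listening sockets.
port_listener_pids() {
    lsof -t -i "tcp:$1" -sTCP:LISTEN
}

# Send SIGTERM to whatever still listens on the port after the service stops.
free_port() {
    # lsof exits non-zero when nothing matches, so ignore its status here.
    pids="$(port_listener_pids "$1" || true)"
    if [ -z "$pids" ]; then
        echo "port $1 is free"
        return 0
    fi
    # $pids is intentionally unquoted so each PID becomes its own argument.
    kill $pids
}
```

Usage: systemctl stop airflow-webserver, then free_port 8080, then systemctl start airflow-webserver.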





That seems to be all. If I forgot something or you still run into problems during installation, write to me and I will add it to the article.







