External files in an Airflow DAG

I am trying to access external files from an Airflow task to read some SQL, and I am getting "file not found". Has anyone come across this?

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta

dag = DAG(
    'my_dag',
    start_date=datetime(2017, 1, 1),
    catchup=False,
    schedule_interval=timedelta(days=1)
)

def run_query():
    # read the query (open() returns a file object, so read it)
    with open('sql/queryfile.sql') as f:
        query = f.read()
    # run the query (execute() stands in for the actual client call)
    execute(query)

task = PythonOperator(
    task_id='run_query', dag=dag, python_callable=run_query)

      

The log states the following:

IOError: [Errno 2] No such file or directory: 'sql/queryfile.sql'

      

I realize that I could just copy and paste the query into the same file, but that is really not a clean solution. There are multiple queries and the text is really long; embedding it in the Python code would hurt readability.

+9




2 answers


Here's an example of using an Airflow Variable to simplify this.

  • First add a Variable in the Airflow UI under Admin > Variables, e.g. {key: 'sql_path', value: 'your_sql_script_folder'}

  • Then add the following code to your DAG to use the Variable you just added.

DAG code:



import airflow
from airflow.models import Variable
from datetime import datetime

# default_args must be defined before it is referenced; a minimal example:
default_args = {
    'start_date': datetime(2017, 1, 1),
}

tmpl_search_path = Variable.get("sql_path")

dag = airflow.DAG(
    'tutorial',
    schedule_interval="@daily",
    template_searchpath=tmpl_search_path,  # this adds the folder to the template search path
    default_args=default_args
)

      

  • Now you can refer to the SQL script by name, or by a path relative to the folder stored in the Variable.

  • You can find out more in the Airflow documentation.
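As a sketch of the last step, here is one way to read a query from the folder stored in the Variable. Assumptions: `sql_path` below is a plain string standing in for `Variable.get("sql_path")`, and `queryfile.sql` is the file name from the question; the temporary folder only stands in for your real script folder.

```python
import os
import tempfile

def read_query(sql_path, filename="queryfile.sql"):
    """Read a SQL file from the configured folder.

    Assumption: sql_path is the value of the 'sql_path' Variable
    (inside the DAG you would call Variable.get("sql_path")).
    """
    with open(os.path.join(sql_path, filename)) as f:
        return f.read()

# Stand-in usage with a temporary folder instead of the real one:
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "queryfile.sql"), "w") as f:
    f.write("SELECT 1;")

print(read_query(tmpdir))  # SELECT 1;
```

Because the folder comes from a Variable, moving the scripts later only requires updating the Variable in the UI, not editing every DAG.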

+6




All relative paths are resolved against the AIRFLOW_HOME environment variable. Try:



  • providing an absolute path;
  • putting the file relative to AIRFLOW_HOME;
  • printing the working directory from the Python callable, then deciding which path to provide (best option).
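A minimal sketch of the third suggestion (the `dags/sql` subfolder and the `~/airflow` fallback are assumptions; adjust them to your layout):

```python
import os

def resolve_sql_path(relative_path):
    # Print the working directory: this is usually why the relative
    # open() failed, since the scheduler's cwd is not your DAG folder.
    print("cwd:", os.getcwd())
    # Build an absolute path against AIRFLOW_HOME instead.
    airflow_home = os.environ.get("AIRFLOW_HOME",
                                  os.path.expanduser("~/airflow"))
    return os.path.join(airflow_home, relative_path)

print(resolve_sql_path("dags/sql/queryfile.sql"))
```

Once the printed cwd confirms where the process actually runs, you can hard-code the path logic with confidence.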
+2








