External files in an Airflow DAG
I am trying to access external files in an Airflow task to read some SQL, and I am getting "file not found". Has anyone come across this?
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta

dag = DAG(
    'my_dag',
    start_date=datetime(2017, 1, 1),
    catchup=False,
    schedule_interval=timedelta(days=1)
)

def run_query():
    # read the query
    query = open('sql/queryfile.sql').read()
    # run the query
    execute(query)

task = PythonOperator(
    task_id='run_query', dag=dag, python_callable=run_query)
The log states the following:
IOError: [Errno 2] No such file or directory: 'sql/queryfile.sql'
I realize I could just copy and paste the query into the same file, but that is not a clean solution. There are multiple queries and the text is really long; embedding it in the Python code would hurt readability.
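A common cause of this error: the Airflow worker does not execute the task from the DAG folder, so a relative path like 'sql/queryfile.sql' resolves against whatever the worker's current working directory happens to be. A minimal sketch of the usual workaround, building an absolute path from the DAG file's own location (the folder layout and helper name are illustrative, not from the question):

```python
import os

# Resolve the sql/ folder relative to this DAG file, not the worker's CWD.
DAG_DIR = os.path.dirname(os.path.abspath(__file__))

def read_query(name):
    """Read a SQL file from the sql/ folder that sits next to this DAG file."""
    path = os.path.join(DAG_DIR, 'sql', name)
    with open(path) as f:
        return f.read()
```

With this, run_query() can call read_query('queryfile.sql') and get the same file regardless of where the scheduler or worker process was started.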
Here is an example of using an Airflow Variable to simplify this.

- First, add a variable in the Airflow UI → Admin → Variables, for example: {key: 'sql_path', value: 'your_sql_script_folder'}
- Then add the following code to the DAG to use the variable you just added.
DAG code:

import airflow
from airflow.models import Variable

tmpl_search_path = Variable.get("sql_path")

dag = airflow.DAG(
    'tutorial',
    schedule_interval="@daily",
    template_searchpath=tmpl_search_path,  # this
    default_args=default_args  # default_args defined elsewhere in the file
)
- Now you can refer to the SQL script by name, or by a path relative to the folder stored in the variable, in any templated field.
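Conceptually, template_searchpath gives the templating engine a list of directories to search when a templated field names a file such as 'queryfile.sql'. A rough stdlib sketch of that lookup, purely to illustrate the behavior (this is not Airflow's actual implementation):

```python
import os

def find_template(filename, searchpath):
    """Return the first existing file named `filename` in the searchpath dirs."""
    # template_searchpath accepts a single directory string or a list of them
    if isinstance(searchpath, str):
        searchpath = [searchpath]
    for directory in searchpath:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(filename)
```

An operator whose templated field is set to 'queryfile.sql' then reads and renders the file found this way, so the DAG code never needs an absolute path.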