I am running Apache Airflow 2.6.4 in Docker, using the official docker-compose file to start the necessary containers. In this setup, I have a DAG that converts Excel (.xls) files to CSV with the xlrd library.
xlrd 2.0.1 is installed inside the scheduler container. I have verified with a bash script that only one Python version is in use there and that xlrd appears in the list of installed pip packages.
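To rule out an environment mismatch, a check like the one below could be run with the same interpreter in every container that might execute the task, for example with docker compose exec airflow-worker python check_env.py (the script name is hypothetical, and the file must be reachable inside the container). This matters because the official Airflow 2.x compose file uses the CeleryExecutor, so tasks run in the airflow-worker service rather than in airflow-scheduler; a package installed only in the scheduler container would not be visible to the worker. This is a minimal sketch using only the standard library.

```python
import importlib.metadata
import importlib.util
import sys


def describe_package(name: str) -> dict:
    """Report which interpreter is running and whether `name` is importable from it."""
    spec = importlib.util.find_spec(name)
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Not installed, or a module without distribution metadata (e.g. stdlib).
        version = None
    return {
        "python": sys.executable,  # interpreter actually running this check
        "package": name,
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
        "version": version,
    }


if __name__ == "__main__":
    # The two libraries mentioned in the question.
    for pkg in ("xlrd", "voluptuous"):
        print(describe_package(pkg))
```

If the scheduler and worker containers print different results for xlrd, the import would succeed in one and fail in the other.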
When xlrd is not installed, the Airflow webserver reports the missing xlrd library; after installing it, that error goes away. The problem arises when I run the DAG: it fails without any descriptive log or error message. This is the error log I am getting:
Log file does not exist: /opt/airflow/logs/dag_id=test_fusion_files_DAG/run_id=manual__2023-08-07T18:02:23.451157+00:00/task_id=read_general_tests_configuration_file/attempt=1.log
*** Fetching from: http://:8793/log/dag_id=test_fusion_files_DAG/run_id=manual__2023-08-07T18:02:23.451157+00:00/task_id=read_general_tests_configuration_file/attempt=1.log
*** Failed to fetch log file from worker. Request URL is missing an 'http://' or 'https://' protocol.
I checked the log file and everything seems to be okay.
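To see which attempt logs actually exist for a task instance, a small helper like the one below could be run inside each container. It is a sketch that assumes the base folder (/opt/airflow/logs) and the dag_id=/run_id=/task_id=/attempt= layout copied from the error message above; the function names are hypothetical. An empty result would suggest that the attempt never wrote a log in that container.

```python
from pathlib import Path

# Assumption: base folder and layout taken from the "Log file does not exist" message.
BASE_LOG_FOLDER = Path("/opt/airflow/logs")


def task_log_path(dag_id: str, run_id: str, task_id: str, attempt: int,
                  base: Path = BASE_LOG_FOLDER) -> Path:
    """Build the per-attempt task log path in the layout shown in the error message."""
    return (base / f"dag_id={dag_id}" / f"run_id={run_id}"
            / f"task_id={task_id}" / f"attempt={attempt}.log")


def find_attempt_logs(dag_id: str, run_id: str, task_id: str,
                      base: Path = BASE_LOG_FOLDER) -> list:
    """List the attempt logs that actually exist on disk for this task instance."""
    task_dir = base / f"dag_id={dag_id}" / f"run_id={run_id}" / f"task_id={task_id}"
    if not task_dir.is_dir():
        return []
    return sorted(task_dir.glob("attempt=*.log"))
```

For the run in the error message, task_log_path("test_fusion_files_DAG", "manual__2023-08-07T18:02:23.451157+00:00", "read_general_tests_configuration_file", 1) reproduces the path from the first line of the log.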
Interestingly, if I remove the xlrd import from the DAG or skip converting .xls files, the DAG runs successfully and produces descriptive log messages.
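One way to narrow this down is to keep the xlrd import out of the top level of the DAG file and import it inside the conversion function instead. A missing or broken xlrd then fails only that call, with a Python traceback, rather than the whole DAG file at parse time. The sketch below is not the original DAG: the function names are hypothetical, and it assumes the task callable (for example one passed to PythonOperator) calls xls_to_csv. Note that xlrd 2.x reads only legacy .xls workbooks, not .xlsx.

```python
import csv
from pathlib import Path


def rows_to_csv(rows, csv_path) -> int:
    """Write an iterable of row sequences to csv_path and return the row count."""
    count = 0
    with Path(csv_path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def xls_to_csv(xls_path, csv_path, sheet_index: int = 0) -> int:
    """Convert one sheet of a .xls workbook to CSV.

    xlrd is imported here instead of at module level, so the DAG file can be
    parsed even where xlrd is missing; the failure shows up in this call.
    """
    import xlrd  # deferred import; xlrd 2.x supports .xls only

    book = xlrd.open_workbook(str(xls_path))
    sheet = book.sheet_by_index(sheet_index)
    rows = (sheet.row_values(i) for i in range(sheet.nrows))
    return rows_to_csv(rows, csv_path)
```

Keeping the CSV writing in a separate function also makes it testable without xlrd installed.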
I have checked the log files inside the container from a bash shell, but there is nothing relevant to these DAG runs. I have also run into similar issues with other libraries, such as voluptuous.
I would appreciate any insights or suggestions regarding this issue. Thank you in advance.