We are using astropy to perform large SKA simulations running under Dask, either on a single node or across a cluster. We use both Time and astroplan.Observer in our calculations. Some fraction of the time, we see errors when accessing the IERS data. For example:
rascil/processing_components/visibility/base.py:275: in create_blockvisibility
    from astroplan import Observer
/usr/local/lib/python3.7/site-packages/astroplan/__init__.py:32: in <module>
    get_IERS_A_or_workaround()
/usr/local/lib/python3.7/site-packages/astroplan/utils.py:57: in get_IERS_A_or_workaround
    if IERS_A_in_cache():
/usr/local/lib/python3.7/site-packages/astroplan/utils.py:78: in IERS_A_in_cache
    with _open_shelve(urlmapfn, True) as url2hash:
/usr/local/lib/python3.7/site-packages/astroplan/utils.py:322: in _open_shelve
    shelf = shelve.open(shelffn, protocol=2)
/usr/local/lib/python3.7/shelve.py:243: in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
/usr/local/lib/python3.7/shelve.py:227: in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   return mod.open(file, flag, mode)
E   _gdbm.error: [Errno 11] Resource temporarily unavailable
/usr/local/lib/python3.7/dbm/__init__.py:94: error
Our interpretation is that this arises from multiple Dask workers trying to access the same cache file concurrently. It occurs whenever we use multiple processes on the same node, or processes spread across a cluster that access a shared mount point. The same functions performed serially do not produce this error.
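The pattern is essentially the following (a stripped-down sketch; worker_task, the task body, and the scheduler address are illustrative rather than our actual pipeline code):

from dask.distributed import Client

import astropy.units as u
from astropy.time import Time

def worker_task(i):
    # Importing astroplan opens the shared IERS shelve cache at import
    # time, via get_IERS_A_or_workaround() as in the traceback above.
    from astroplan import Observer  # noqa: F401
    t = Time("2020-01-01T00:00:00") + i * u.s
    # Apparent sidereal time needs UT1-UTC, i.e. the IERS tables.
    return t.sidereal_time("apparent", "greenwich")

client = Client("scheduler:8786")  # illustrative address; Client() for a local cluster
futures = client.map(worker_task, range(100))
results = client.gather(futures)   # many workers hit the same cache at once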
As workarounds we have tried the following settings:

from astropy.utils import iers, data

iers.conf.auto_max_age = None                # never treat the cached IERS data as stale
iers.conf.remote_timeout = 100.0             # allow slow downloads to complete
data.conf.download_cache_lock_attempts = 10  # retry the download cache lock more times
None of these have helped. We still see the same gdbm errors. Can you offer any suggestions?
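One further idea we have not yet tried is to disable IERS auto-download before any task imports astropy or astroplan, accepting the bundled (possibly stale) tables instead. A minimal sketch, assuming the same Dask client as above; client.run is just one way to apply the setting in every worker process:

from dask.distributed import Client
from astropy.utils import iers

def disable_iers_download():
    # With auto_download off, workers never write to the download cache.
    iers.conf.auto_download = False

client = Client("scheduler:8786")  # illustrative address, as above
client.run(disable_iers_download)  # run on every worker before mapping tasks

Judging by the traceback, though, astroplan's import-time IERS_A_in_cache() check opens the shelve cache regardless of this setting, so it may not be sufficient on its own.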
Thanks.