I have a task to move millions of images from one network share to another on Windows. The images are not very large; there are just many of them: 900,000 directories to move, each containing 1-10 images. My goal is to leverage the OS to achieve maximum concurrency, and I think asyncio might help since most of the time will be spent waiting on network I/O. What I have so far still seems too slow: it takes about 5 minutes to move around 250 MB of images. Here's a sample of what I've got; I'm not completely convinced my implementation is sound.
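import asyncio
import filecmp
import os
import shutil

async def iter_copytree(src, dst):
    # Copy the whole directory; return a list of errors (empty on success).
    try:
        shutil.copytree(src, dst)
        return []
    except Exception as e:
        return [e]

async def iter_dircmp(src, dst):
    # Compare source and destination; return the comparison if anything differs.
    dcmp = filecmp.dircmp(src, dst)
    if dcmp.funny_files or dcmp.diff_files:
        return [dcmp]
    return []

async def iter_rmtree(src):
    # Delete the source directory; return a list of errors (empty on success).
    try:
        shutil.rmtree(src)
        return []
    except Exception as e:
        return [e]

async def iter_move(src, dst):
    # Copy, verify, then delete; stop at the first step that reports a problem.
    if await iter_copytree(src, dst):
        return
    if await iter_dircmp(src, dst):
        return
    await iter_rmtree(src)

async def move_files(src_root, dst_root, file_names):
    tasks = [iter_move(os.path.join(src_root, i), ...) for i in file_names]
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(move_files(...))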
rsync, xcopy, robocopy: there are multiple highly scalable solutions to try before you'd need to write a line of code.
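If you do stay in Python, one thing worth noting about the snippet in the question: shutil.copytree, filecmp.dircmp and shutil.rmtree are ordinary blocking calls, so wrapping them in async def does not make them overlap. Each coroutine runs to completion before the next one starts, and the directories are processed one at a time. A minimal sketch of one way around that is to hand the blocking work to a thread pool via loop.run_in_executor. The names move_dir and move_all and the default of 32 workers are my own assumptions, not something from the question, and the right worker count for a network share would have to be measured.

```python
import asyncio
import filecmp
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def move_dir(src, dst):
    """Blocking copy -> verify -> delete for one directory.

    Returns None on success, or the exception describing what went wrong.
    (Hypothetical helper; mirrors the three steps from the question.)
    """
    try:
        shutil.copytree(src, dst)
        dcmp = filecmp.dircmp(src, dst)
        if dcmp.funny_files or dcmp.diff_files:
            return RuntimeError(f"verification failed for {src}")
        shutil.rmtree(src)
        return None
    except Exception as e:
        return e


async def move_all(src_root, dst_root, names, max_workers=32):
    """Run move_dir for every name on a thread pool; return {name: error}."""
    loop = asyncio.get_running_loop()
    # max_workers=32 is an assumed starting point, not a measured optimum.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            loop.run_in_executor(
                pool,
                move_dir,
                os.path.join(src_root, name),
                os.path.join(dst_root, name),
            )
            for name in names
        ]
        results = await asyncio.gather(*futures)
    # Keep only the directories that failed, so they can be retried or logged.
    return {name: err for name, err in zip(names, results) if err is not None}
```

It would be called as errors = asyncio.run(move_all(src_root, dst_root, file_names)). With 900,000 directories it is probably wise to feed the names in batches rather than creating every future up front, and to compare the throughput against one of the tools mentioned above before investing further.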