How to copy many images fast with Python

725 Views Asked by At

I have a task to move millions of images from one network share drive to another in Windows. These are not very large images there are just many of them. I have 900,000 directories to move each containing 1-10 images. My goal is to leverage the OS to achieve maximum concurrency and I think asyncio might help me achieve this since most of the time will be spent waiting for network io. This is a snippet of what I have so far but it still seems too slow as it takes me 5 mins to move around 250MB of images. Here's a sample of what I've got so far, I'm not completely convinced my implementation is sound.

async def iter_copytree(src, dst):
    try:
        shutil.copytree(src, dst)
        return []
    except Exception e:
        return [e]

async def iter_dircmp(src, dst):
    dcmp = filecmp.dircmp(src, dst)
    if dcmp.funny_files or dcmp.diff_files:
         return [dcmp]
    return []

async def iter_rmtree(src):
    try:
        shutil.rmtree(src)
        return []
    except Exception as e:
        return [e]

async def iter_move(src, dst):
    if await iter_copytree(src, dst):
        return
    if await iter_dircmp(src, dst):
        return
    await iter_rmtree(src)

async def move_files(src_root, dst_root, file_names):
    tasks = [iter_move(os.path.join(src_root, i), ...) for i in file_names]
    await asyncio.gather(*tasks)


loop = asyncio.get_event_loop()
loop.run_until_complete(move_files(...))
1

There are 1 best solutions below

0
On

rsync, xcopy, robocopy multiple highly scalable solutions before you'd need to write a line of code