use asyncio and aiofiles to generate flat files asynchronously isn't working perfectly

1.7k Views Asked by At

I'm learning async programming in python, and knowing asyncio is the latest package we should use, so I tried to write a simple script to generate some files asynchronously(I used threading to generate files in parallel, and it works perfectly). Since write IO costs most of the time when generating a file, what here is my script.

Versions

(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>pip freeze
aiofiles==0.4.0
asn1crypto==0.24.0
certifi==2018.8.24
cffi==1.11.5
chardet==3.0.4
cryptography==2.3.1
idna==2.7
pycparser==2.18
pyOpenSSL==18.0.0
PySocks==1.6.8
requests==2.19.1
six==1.11.0
urllib3==1.23
win-inet-pton==1.0.1
wincertstore==0.2

(py37) C:\Users\Hong\Desktop>python --version
Python 3.7.0

(py37) C:\Users\Hong\Desktop>

Asynchronous way

import os
import asyncio
import aiofiles
import time
import datetime
import urllib

async def produce_content(c):
    return c*1000

async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        async with aiofiles.open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = await rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
            content = await produce_content(content)
            async with aiofiles.open(file_name, mode='w') as wf:
                print('--> write start', datetime.datetime.now(), file_name)
                await wf.write(content)
                await wf.flush()
                print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e
    #return file_name

id = 0
async def my_action(file_name):
    global id
    id += 1
    local_id = id
    print('start to run %s'%local_id, datetime.datetime.now())
    await create_file(file_name)
    print('end to run %s'%local_id, datetime.datetime.now())

def run():
    files = [
        os.path.join(os.path.dirname(os.path.abspath(__file__)), 'f%s.txt'%i) for i in range(0,3)
    ]

    start_ts = datetime.datetime.now()
    print('start', start_ts)

    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(my_action(f)) for f in files]
    try:
        loop.run_until_complete(asyncio.wait(tasks))
    finally:
        loop.close()

    end_ts = datetime.datetime.now()
    print('end', end_ts)
    print('time elapse', end_ts-start_ts)


if __name__=='__main__':
    run()

In my example, tmp.txt is a file the size is 240K, I used it as a base and create target file which is 1000 times bigger than it. In order to compare the time cost between asynchronous way and synchronous way, here is the synchronous way, replace the create_file body with the below (just use regular method instead of aiofiles)

async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        with open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
            content = produce_content(content)
            with open(file_name, mode='w') as wf:
                print('--> write start', datetime.datetime.now(), file_name)
                wf.write(content)
                print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e

There is the result

(py37) C:\Users\Hong\Desktop>python non_async.py
start 2018-09-16 22:33:12.929901
start to run 1 2018-09-16 22:33:12.929901
--> read start 2018-09-16 22:33:12.929901 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:12.945520 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:13.531200 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:33:19.701563 C:\Users\Hong\Desktop\f0.txt
end to run 1 2018-09-16 22:33:19.831177
start to run 2 2018-09-16 22:33:19.831177
--> read start 2018-09-16 22:33:19.831177 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:19.846803 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:33:20.483649 C:\Users\Hong\Desktop\f1.txt
--> write end 2018-09-16 22:33:26.917791 C:\Users\Hong\Desktop\f1.txt
end to run 2 2018-09-16 22:33:27.073904
start to run 3 2018-09-16 22:33:27.073904
--> read start 2018-09-16 22:33:27.075903 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:27.085896 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:27.807891 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:33:34.627992 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:33:34.746507
end 2018-09-16 22:33:34.762129
time elapse 0:00:21.832228

(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>python async.py
start 2018-09-16 22:33:50.945612
start to run 1 2018-09-16 22:33:50.948609
start to run 2 2018-09-16 22:33:50.953824
start to run 3 2018-09-16 22:33:50.953824
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f0.txt
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f1.txt
--> read start 2018-09-16 22:33:50.969449 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f1.txt
end to run 1 2018-09-16 22:34:06.105765
end to run 2 2018-09-16 22:34:06.206030
--> write end 2018-09-16 22:34:07.393667 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:34:07.525176
end 2018-09-16 22:34:07.525176
time elapse 0:00:16.579564

(py37) C:\Users\Hong\Desktop>

Asynchronous method does run faster than synchronous method (16.6s v.s. 21.8s), but my expectation is that async should run more faster... When we look at the log, we can see read tmp file actually done very closely.

--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt

But write start are not close

--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt

What I expect is that "write start" should be very close to "read end" for each task, because produce_content should only take very little time, but why "write start" for each task are so differently?

Thanks,

Hong

0

There are 0 best solutions below