When uploading 100 files of 100 bytes each with SFTP, it takes 17 seconds here (after the connection is established; I don't even count the initial connection time). That is 17 seconds to transfer only 10 KB, i.e. 0.59 KB/sec!
I know that sending SSH commands to open, write, close, etc. probably creates a big overhead, but still, is there a way to speed up the process when sending many small files with SFTP?
Or is there a special mode in paramiko / pysftp to keep all pending write operations in a memory buffer (let's say all operations for the last 2 seconds), and then do everything in one grouped pass of SSH/SFTP? This would avoid waiting for the ping time between each operation.
Note:
- I have a ~ 100 KB/s connection upload speed (tested 0.8 Mbit upload speed), 40 ms ping time to the server
- Of course, if instead of sending 100 files of 100 bytes, I send 1 file of 10 KB, it takes < 1 second
- I don't want to have to run a binary program on the remote; only SFTP commands are accepted
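A rough back-of-the-envelope check of these figures suggests that latency, not bandwidth, is the bottleneck. The sketch below uses only the numbers from the question. The assumption that the per-file cost is almost entirely round-trip waiting is an interpretation, and the variable names are just for illustration.

```python
# Figures taken from the question.
n_files = 100
file_size = 100          # bytes per file
total_time = 17.0        # seconds for the whole upload
ping = 0.040             # seconds per round trip
bandwidth = 100_000      # bytes/s upload speed (~100 KB/s)

total_bytes = n_files * file_size
throughput_kb_s = total_bytes / total_time / 1000   # effective KB/s
time_per_file = total_time / n_files                # seconds spent per file
transfer_time = total_bytes / bandwidth             # time the payload alone would need

# Assumption: the per-file time is almost entirely round-trip waiting,
# since 100 bytes at ~100 KB/s need only about 1 ms on the wire.
round_trips_per_file = time_per_file / ping

print(f"effective throughput: {throughput_kb_s:.2f} KB/s")
print(f"time per file: {time_per_file * 1000:.0f} ms")
print(f"payload alone would take: {transfer_time:.1f} s")
print(f"equivalent round trips per file: {round_trips_per_file:.2f}")
```

So each file costs about 170 ms, which is roughly four 40 ms round trips, while the payload itself would need only about 0.1 s in total.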
import pysftp, time, os
with pysftp.Connection('1.2.3.4', username='root', password='') as sftp:
    with sftp.cd('/tmp/'):
        t0 = time.time()
        for i in range(100):
            print(i)
            with sftp.open('test%i.txt' % i, 'wb') as f:   # even worse in a+ append mode: it takes 25 seconds
                f.write(os.urandom(100))
        print(time.time() - t0)
I'd suggest parallelizing the upload using multiple connections from multiple threads. That's an easy and reliable solution.
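A minimal sketch of the multi-connection approach, assuming one `pysftp.Connection` per worker thread with the same connection parameters as in the question. The helper names (`pysftp_connect`, `upload_batch`, `parallel_upload`) and the default of 8 workers are hypothetical choices for illustration, not part of pysftp or paramiko.

```python
from concurrent.futures import ThreadPoolExecutor

def pysftp_connect():
    # Same connection parameters as in the question; imported lazily so the
    # helpers below can be exercised with a stand-in connection object.
    import pysftp
    return pysftp.Connection('1.2.3.4', username='root', password='')

def upload_batch(connect, batch):
    """Open one connection and upload every (remote_path, data) pair in batch."""
    sftp = connect()
    try:
        for remote_path, data in batch:
            with sftp.open(remote_path, 'wb') as f:
                f.write(data)
    finally:
        sftp.close()
    return len(batch)

def parallel_upload(files, connect=pysftp_connect, workers=8):
    """Split a list of files across up to `workers` threads, one connection each."""
    if not files:
        return 0
    # Round-robin split; drop empty batches when there are fewer files than workers.
    batches = [files[i::workers] for i in range(workers)]
    batches = [b for b in batches if b]
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        return sum(pool.map(lambda b: upload_batch(connect, b), batches))

# Usage (needs a reachable server):
#   import os
#   files = [('/tmp/test%i.txt' % i, os.urandom(100)) for i in range(100)]
#   parallel_upload(files)
```

Each thread still pays the round-trip latency per file, but the waits overlap, so the total time should shrink roughly in proportion to the number of connections, minus the cost of establishing them. The server may cap the number of concurrent connections, so the worker count is something to tune.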
If you want to do it the hard way by buffering the requests, you can base your solution on the following naive example.
The example:
If I do plain SFTPClient.put for 100 files, it takes about 10-12 seconds. Using the code below, I achieve the same about 50-100 times faster.
But! The code is really naive:
- It assumes that each file can be read in a single upload.localhandle.read(32*1024) call. That's true for small files only.
- ... SFTPClient class.