How to run osm2pgsql on multiple files using pexpect? Stuck on "Using PBF parser."

I am attempting to create a single SQL table from multiple .pbf files.

I am using osm2pgsql to load the files into a remote database, and I am trying to automate the process with Python and pexpect.

While the first osm2pgsql command runs successfully, subsequent commands seem to get stuck after printing "Using PBF parser."

Here is my code:

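import os
import sys

import pexpect

# One long-lived bash session; each osm2pgsql command is sent to it in turn
child = pexpect.spawn('bash', timeout=20000)
child.logfile_read = sys.stdout.buffer # show output for debugging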

filenames = os.listdir('pbf_files')
for i, filename in enumerate(filenames):

    print(filename)
    upload_command_args = [
        "pbf_files/{}".format(filename),
        "-l",
        "-s",
        "-d", db_name,
        "-U", username,
        "-P", port,
        "-H", host,
        "-W",
        "-S", "default.style",
        "-r", "pbf",
        "-p", table_name,
        "--hstore",
        ]

    # Need the append option since table already exists after first iteration
    if i > 0:
        upload_command_args = upload_command_args + ["--append"]

    print(upload_command_args)
    child.sendline('osm2pgsql ' + ' '.join(upload_command_args))
    child.expect('Password:')
    child.sendline('myFakePass')
    child.expect('Osm2pgsql took .+ overall')

child.close()
sys.exit(child.status)

The 0th iteration runs normally, but the 1st gets stuck after the shell prints:

Reading in file: pbf_files/my_partition_1.pbf
Using PBF parser.

Am I misunderstanding how .expect() works?

There is 1 answer below

Appending takes much longer than the initial import, so the second command is probably still running rather than stuck. We're talking about hours after the initial import. Try passing -C with a reasonable amount of cache (the default is 800 MB). It also helps to always import the biggest file first, so that the slower appends handle the smaller files. If your files are extremely big, make sure --slim is in use as well (the -s in your command already enables it) so the import doesn't crash when the cache is exhausted.
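A minimal sketch of the "biggest file first" ordering plus an explicit cache size, in case it helps. The helper names order_largest_first and build_osm2pgsql_args are made up for illustration, and the 2000 MB cache value is only an assumption; pick whatever fits your machine. The connection options from the question (-d, -U, -H and so on) are left out for brevity and would need to be added back.

```python
import os


def order_largest_first(directory):
    """Return the paths of all .pbf files in `directory`, biggest file first."""
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".pbf")
    ]
    # Sort by size on disk, descending, so the plain import gets the largest file
    return sorted(paths, key=os.path.getsize, reverse=True)


def build_osm2pgsql_args(path, index, cache_mb=2000):
    """Build the argument list for one import.

    cache_mb is an illustrative value, not a recommendation.
    """
    args = ["osm2pgsql", path, "--slim", "-C", str(cache_mb)]
    # Every file after the first must be appended to the existing tables
    if index > 0:
        args.append("--append")
    return args
```

In the loop from the question you would iterate over enumerate(order_largest_first('pbf_files')) instead of os.listdir('pbf_files'), and send ' '.join(build_osm2pgsql_args(path, i)) plus your connection options to the shell.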