I just upgraded 2 machines from Fedora 31 to 33, and with that upgrade, Perl went from 5.30.3 to 5.32.1.
The first thing I noticed is that GDBM_File.pm is no longer included in the Perl Core, but that was no problem.
The second thing I noticed is that GDBM writes on fc33/perl5.32.1 are incredibly slow. That's a problem.
I noticed something amiss on the first machine, so I ran a little benchmark with fc31/perl5.30.3 on the second machine before doing the upgrade.
gdbm1.pl rebuilds a db file from an ASCII text file of about 33M entries (around 11GB). gdbm0.pl reads the same text file and does everything exactly the same as gdbm1.pl, except that it skips the actual hash assignments "$db{...} = ...". That is the only difference.
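For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the control-vs-write setup. It is not the original gdbm0.pl/gdbm1.pl: the sub name rebuild, the $do_write flag, and the tab-separated "key<TAB>value" input format are all assumptions made for illustration.

```perl
#!/usr/bin/perl
# Hypothetical sketch of the gdbm0/gdbm1 pair; not the original scripts.
use strict;
use warnings;

# Parse "key<TAB>value" lines (assumed format) and, only when $do_write is
# true, store each pair in %$db. Returns the number of entries parsed.
sub rebuild {
    my ($fh, $db, $do_write) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $value;            # skip malformed lines
        $db->{$key} = $value if $do_write;     # the only step gdbm0 skips
        $n++;
    }
    return $n;
}

# Small in-memory demonstration; a real run would read the 11GB file and
# pass a hash tied with: tie %db, 'GDBM_File', $file, GDBM_NEWDB, 0600
my $data = "a\t1\nb\t2\nc\t3\n";
for my $do_write (0, 1) {
    open my $fh, '<', \$data or die "open: $!";
    my %db;
    my $n = rebuild($fh, \%db, $do_write);
    printf "write=%d parsed=%d stored=%d\n", $do_write, $n, scalar keys %db;
}
```

Because both runs go through the same parsing loop, any difference in wall-clock time between them can be attributed to the store itself.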
FC31/Perl5.30.3:
[259] time ./gdbm0.pl 16
real 4m51.593s
user 4m49.808s
sys 0m1.306s
[260] time ./gdbm1.pl 16
real 11m39.682s
user 6m30.619s
sys 3m19.260s
FC33/Perl5.32.1:
[287] time ./gdbm0.pl 16
real 5m10.379s
user 5m8.764s
sys 0m1.299s
[288] time ./gdbm1.pl 16
real 554m48.187s
user 7m49.315s
sys 433m42.435s
Obviously it takes longer to write the DB than not, so I always expect gdbm0.pl to be faster than gdbm1.pl. But the only difference between gdbm0 and gdbm1 is writing the DB, so the time difference is all due to that. On fc31/perl5.30.3, that difference is under 7m. On fc33/perl5.32.1, it is a staggering 550m: over 9 HOURS, vs 7 MINUTES before.
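The subtraction can be checked with a few lines of Perl. This is just a sketch that converts the "real" times from the runs above to seconds; the helper name to_seconds and the hash layout are made up for the example.

```perl
use strict;
use warnings;

# Convert a `time`-style "MmS.SSSs" string to seconds.
sub to_seconds {
    my ($t) = @_;
    my ($min, $sec) = $t =~ /^(\d+)m([\d.]+)s$/
        or die "unrecognized time format: $t";
    return $min * 60 + $sec;
}

# "real" times copied from the runs above.
my %real = (
    fc31 => { gdbm0 => '4m51.593s', gdbm1 => '11m39.682s'  },
    fc33 => { gdbm0 => '5m10.379s', gdbm1 => '554m48.187s' },
);

# Write cost = time with hash assignments minus time without them.
my %write_cost;
for my $os (sort keys %real) {
    $write_cost{$os} = to_seconds($real{$os}{gdbm1}) - to_seconds($real{$os}{gdbm0});
    printf "%s: write cost %.1f s (%.1f min)\n",
        $os, $write_cost{$os}, $write_cost{$os} / 60;
}
printf "slowdown: %.0fx\n", $write_cost{fc33} / $write_cost{fc31};
```

That works out to about 6.8 minutes on fc31 and about 549.6 minutes on fc33, a slowdown of roughly 81x for the write portion alone.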
I've done some web searches for anything about GDBM_File being slow in perl5.32.1, but I've found nothing. I don't even know if Perl is the problem; it might be fc33, or some combination of both.
Or it could be that some C lib is missing in fc33, and GDBM_File is doing everything in pure Perl. I don't know where to go from here.
Update:
@davem: Okay, I have 3 machines: a, b, c. "a" is oldest and slowest, "c" is newest and fastest. Machine "a" runs Ubuntu; the other two both run Fedora:
Linux a 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux b 5.11.18-200.fc33.x86_64 #1 SMP Mon May 3 15:05:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux c 5.11.18-200.fc33.x86_64 #1 SMP Mon May 3 15:05:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I ran two benchmarks on each machine, first with an in-mem hash, then with a gdbm hash. The in-mem hash result gives a very rough idea of the relative single-thread performance of each machine:
[mem] time perl -e'my %h; $h{$_} = 1 for ("a" .. "zzzzz"); print "@{[scalar(keys(%h))]}\n";'
[gdbm] time perl -e'use GDBM_File; my ($h, %h); $h = "gdbm_write_test"; tie(%h, "GDBM_File", $h, GDBM_NEWDB, 0600); $h{$_} = 1 for ("a" .. "zzzzz"); print "@{[scalar(keys(%h))]}\n"; untie(%h);'
machine_a:
[mem] 12356630
real 0m29.051s
user 0m27.975s
sys 0m0.995s
[gdbm] 12356630
real 4m5.431s
user 2m2.033s
sys 1m36.209s
machine_b:
[mem] 12356630
real 0m12.101s
user 0m11.520s
sys 0m0.559s
[gdbm] 12356630
real 106m35.326s
user 1m0.607s
sys 103m48.518s
machine_c:
[mem] 12356630
real 0m9.498s
user 0m9.163s
sys 0m0.317s
[gdbm] 12356630
real 58m46.555s
user 0m39.566s
sys 48m16.447s
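To make the pattern in these numbers easier to see, here is a small sketch that computes, for each machine, how much slower the gdbm run is than the in-mem run and what share of the gdbm wall-clock time is sys time. The helper name secs and the hash layout are invented for the example; the timings are copied from the results above.

```perl
use strict;
use warnings;

# Convert a `time`-style "MmS.SSSs" string to seconds.
sub secs {
    my ($t) = @_;
    my ($m, $s) = $t =~ /^(\d+)m([\d.]+)s$/ or die "bad time: $t";
    return $m * 60 + $s;
}

# mem = "real" of the [mem] run; real/sys = those lines of the [gdbm] run.
my %run = (
    a => { mem => '0m29.051s', real => '4m5.431s',    sys => '1m36.209s'   },
    b => { mem => '0m12.101s', real => '106m35.326s', sys => '103m48.518s' },
    c => { mem => '0m9.498s',  real => '58m46.555s',  sys => '48m16.447s'  },
);

my %sys_share;
for my $m (sort keys %run) {
    my $real = secs($run{$m}{real});
    $sys_share{$m} = secs($run{$m}{sys}) / $real;
    printf "machine_%s: gdbm/mem = %.0fx, sys share of gdbm run = %.0f%%\n",
        $m, $real / secs($run{$m}{mem}), 100 * $sys_share{$m};
}
```

On the Ubuntu machine, sys time is roughly 39% of the gdbm run. On the two Fedora 33 machines it is about 97% and 82%, so nearly all of the extra time is being spent in the kernel rather than in Perl code.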
Update 2:
I spent a while fiddling with Perl-DB_File and Perl-BerkeleyDB as possible replacements for Perl-GDBM_File, because I was too lazy to figure out how to file a bug.
False laziness, of course. I finally filed a bug just 2 days ago, and there's already a fix checked in and pending release.
@davem was exactly right: the issue was not Perl itself, but the underlying gdbm library. From the fix commit comment:
"Commit 4fb2326a4a introduced pre-reading of memory mapped regions. While speeding up searches, it has a negative impact on write operatons, since every remapping effectively re-reads the entire database."