Improve efficiency of sqlite3 query in python

Question

Improve efficiency of sqlite3 query in python

66 Views Asked by Squatch At 18 August 2025 at 07:35

I have a table with around 7 million rows of geographical regions where usage of a communication protocol results in channel conflicts. A simplified example of this table is as follows, where HexGeoBinId comes from HexGeoBins table and ChannelId is from Channels table.

Id	HexGeoBinId	ChannelId
1	450187	1
2	450187	2
3	1200447	7
4	1050395	7
5	1650603	16
6	1650603	18
7	1650603	19
8	1650603	20
9	450187	20
10	450187	22

ChannelId links to the table Channels with the following structure:

Id	Protocol	Frequency
0	protocol1	1005
1	protocol2	1005
2	protocol3	1005
3	protocol1	1010
4	protocol2	1010
5	protocol3	1010
6	protocol1	1015
7	protocol2	1015
8	protocol3	1015
9	protocol1	1020

I am using the below code to loop through all non-repetitive permutations of three potential frequencies, where the frequency can be 1000 to 1200 in steps of 5, to find the number of geographical regions that still have channel conflicts. This essentially counts the number of geographical bins that contain conflicts on all three frequencies.

I can get individual results with nested for loops and individual queries (query below). Each query takes over 60s to process.

    separation = 10
    f = list(map(int, np.arange(1000, 1200, 5)));
    protocol = 'protocol1'

    query = '''SELECT COUNT(DISTINCT("Hex1")) FROM 
    (SELECT HexGeoBins.HexId AS "Hex1", Channels.Frequency 
    FROM ChannelConflicts JOIN Channels ON ChannelConflicts.ChannelId == Channels.Id 
    JOIN HexGeoBins ON ChannelConflicts.HexGeoBinId == HexGeoBins.Id 
    WHERE Channels.Frequency = ? AND Channels.Protocol = ?) 
    JOIN 
    (SELECT HexGeoBins.HexId AS "Hex2", Channels.Frequency 
    FROM ChannelConflicts JOIN Channels ON ChannelConflicts.ChannelId == Channels.Id 
    JOIN HexGeoBins ON ChannelConflicts.HexGeoBinId == HexGeoBins.Id 
    WHERE Channels.Frequency = ? AND Channels.Protocol = ?) 
    ON "Hex1" = "Hex2" 
    JOIN 
    (SELECT HexGeoBins.HexId AS "Hex3", Channels.Frequency 
    FROM ChannelConflicts JOIN Channels ON ChannelConflicts.ChannelId == Channels.Id 
    JOIN HexGeoBins ON ChannelConflicts.HexGeoBinId == HexGeoBins.Id 
    WHERE Channels.Frequency = ? AND Channels.Protocol = ?) 
    ON "Hex1" = "Hex3";'''

    for f1 in f:
        for f2 in f:
            if (f1 <= f2 - separation):
                for f3 in f: 
                    if (f2 <= f3 - separation):
                        hexcount = cur.execute(query,[f1,protocol,f2,protocol,f3,protocol]).fetchone()[0];
                        dict1 = {"Protocol": protocol,
                                 "Frequency1": f1,
                                 "Frequency2": f2,
                                 "Frequency3": f3,
                                 "HexCount": hexcount,
                                 "UsPercent": hexcount/1800631.0*100};
                        rowList.append(dict1);
    df = pd.DataFrame(rowList);

A sample output of this df is:

Id	Protocol	Frequency1	Frequency2	Frequency3	HexCount	UsPercent
0	protocol1	1005	1015	1025	1096	0.060867551
1	protocol1	1005	1015	1030	3185	0.176882437
2	protocol1	1005	1015	1035	1248	0.069309037
3	protocol1	1005	1015	1040	181	0.010052032
4	protocol1	1005	1015	1045	911	0.050593375
5	protocol1	1005	1015	1050	1052	0.058423964
6	protocol1	1005	1015	1055	2935	0.162998416
7	protocol1	1005	1015	1060	1751	0.097243688
8	protocol1	1005	1015	1065	2059	0.114348803
9	protocol1	1005	1015	1070	6043	0.335604574

This currently takes over 60s per query to run. With a significant number of permutations, the overall run time is unacceptable.

Any suggestions on how to expedite this effort?

In the short-term I'd prefer to not have to modify the database, but if that it brings significant performance gains, I will make the changes eventually. Ideally, there is a simpler, faster query that can be created to do this as-is.

UPDATE

I was able to speed up the query with the simple realization that I don't care about the actual HexGeoBinIds, just the count of distinct ones. So, that removes one JOIN. Also, I can avoid another JOIN by using ChannelId instead of using Protocol and Frequency from Channels table. This removes another JOIN.

This gives the current query:

    query = '''SELECT COUNT(DISTINCT("Hex1")) FROM 
    (SELECT ChannelConflicts.HexGeoBinId AS "Hex1", ChannelConflicts.ChannelId 
     FROM ChannelConflicts WHERE ChannelConflicts.ChannelId = ?) 
    JOIN 
    (SELECT ChannelConflicts.HexGeoBinId AS "Hex2", ChannelConflicts.ChannelId 
     FROM ChannelConflicts WHERE ChannelConflicts.ChannelId = ?) 
    ON "Hex1" = "Hex2" 
    JOIN 
    (SELECT ChannelConflicts.HexGeoBinId AS "Hex3", ChannelConflicts.ChannelId 
     FROM ChannelConflicts WHERE ChannelConflicts.ChannelId = ?) 
    ON "Hex1" = "Hex3";'''

I have improved from 60s to 2s per query, but would like it much faster. Any thoughts?

Original Q&A

There are 1 best solutions below

**gog** · Answer 1

Not sure I understood, but FWIW one possible solution would be to treat different valueIds as "bits" and reduce their combinations into numbers ("bitmasks"), on which you can conveniently perform aggregate operations.

Basically, this

select binId, sum(1 << valueId) as bitmask from conflicts group by binId

will return something like

binId   bitmask
1       14
2       12
3       14

where "bitmask" uniquely identifies a specific set of "conflicts".

To find out how many "bins" are associated with each combination, just group once again:

select bitmask, count(*) from (select ... etc) group by bitmask

Here's a full transcript:

sqlite> create table conflicts(binId int, valueId int);
sqlite> insert into conflicts values
(1,2),
(1,3),
(1,1),
(2,2),
(2,3),
(3,1),
(3,3),
(3,2);
sqlite> select binId, sum(1 << valueId) as bitmask from conflicts group by binId;
1|14
2|12
3|14
sqlite> select bitmask, count(*) from (
    select binId, sum(1 << valueId) as bitmask from conflicts group by binId
) group by bitmask;
12|1
14|2

To convert these bitmasks back to valueIds you'll need some trivial code on the app side.

Improve efficiency of sqlite3 query in python

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PERFORMANCE

Related Questions in SQLITE3-PYTHON

Trending Questions

Popular # Hahtags

Popular Questions