best method to improve awk speed in writing to sqlite3 db


i have a device that runs BusyBox, has ONE processor available for client use, and needs to act as a flow pulse counter (among other things), catching zero or more 20ms-wide pulses arriving at random intervals within each second.

the device has a second processor, walled off from the client, dedicated to creating a muxed datapool of its I/O port signals. the muxed signal is available via a manufacturer-provided awk script, which i have modified to maintain a TICKCOUNT that increments whenever the demuxed signal reports '1'. i can't share the manufacturer's code, obviously, but i can share mine.

alongside this runs a python application that collects signal data from the mux stream and reports it via MQTT to an Ignition dashboard for client consumption/management/whatever. the MQTT reporting interval is 5sec. the pulse count is checked against the stored value (text file / sqlite db) every 2sec, which leaves plenty of time to perform a flow rate calculation and report it back via MQTT to the consuming dashboard.
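(the flow rate math itself is trivial; roughly the following, where the counts and the volume-per-pulse factor are made up purely for illustration:)

# illustrative only: the 0.1 volume-units-per-pulse factor is invented
BEGIN {
  prev = 1200; now = 1210     # count stored 2sec ago vs count now
  dt = 2; per_pulse = 0.1
  rate = (now - prev) * per_pulse / dt
  print rate " units/sec"     # -> 0.5 units/sec
}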

in order to be able to capture the pulse signal, i had to get the sampling interval below 15ms. running the awk script flat out in an endless while loop did the trick; i'm able to consistently get ~7-11ms read intervals. at that point i was just outputting to a text file so python could read from THAT and do whatever it wanted with it, using the MQTT timestamp to calculate flow rates.
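that version looked roughly like this (from memory; the path and the file handling are illustrative, and the proprietary mux read is omitted):

#!/bin/awk -f
BEGIN {
  TICKCOUNT = 0
  outfile = "/tmp/pulseCount.txt"
  while (1) {
    # proprietary mux read sets DIN
    if (DIN == "1") {
      TICKCOUNT++
      print TICKCOUNT > outfile   # first print truncates, later prints append
      close(outfile)              # close so the next write starts the file fresh
    }
  }
}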

enter management. flow rate calculations must now be done on the device, not on the dashboard server. wth.

so now read/write times matter a hell of a lot more, and i've switched the script to writing to an sqlite database so that each write gets a rolling timestamp that python can read and pass on via MQTT.

my issue is that the count no longer appears to be accurate, because the read from the database in python's subshell PLUS the subshell required to execute the sqlite3 query from awk are interrupting the sampling. if this thing had another core to dump the sampling onto, it would be a non-issue.

so: how can i best optimize this awk code to claw back as many cycles as possible and keep the stored count coherent with the actual pulse count? i'm fully willing to accept that this is no longer doable.

the awk code (from memory because i'm not currently sitting at the device):

#!/bin/awk -f
# proprietary code that sets the mux config
BEGIN {
  TICKCOUNT = 0
  db = "path/to/pulseCount.db"
  commandSQL = "sqlite3 -noheader " db " \"%s\";"

  while(1) {
    # more proprietary code that reads the mux and dumps value to DIN variable
    if (DIN == "1") {
      TICKCOUNT++
      querySQL = "update pulseCount set pulses=" TICKCOUNT ";"
      system(sprintf(commandSQL, querySQL))   # forks a shell + sqlite3 for every pulse
    }
  }
}
END {
}

this works; it dumps a crapton of return values from the sqlite3 calls into stdout, but i'm told that isn't a performance concern (plus, as i think about it, when running in production nohup will dump it to /dev/null anyway). HOWEVER, i'm told the overhead of system(), which forks a shell plus a separate sqlite3 process for every single pulse, is what's murdering the sampling speed. that's bad.
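the only alternative i've turned up so far (untested on the device, and i'm assuming busybox awk supports printing to a pipe and fflush()) would be to hold ONE long-lived sqlite3 process open and stream SQL statements to its stdin instead of forking per pulse. sketch:

#!/bin/awk -f
# proprietary code that sets the mux config
BEGIN {
  TICKCOUNT = 0
  db = "path/to/pulseCount.db"
  sqlpipe = "sqlite3 " db     # the command string doubles as the pipe handle

  while(1) {
    # more proprietary code that reads the mux and dumps value to DIN variable
    if (DIN == "1") {
      TICKCOUNT++
      print "update pulseCount set pulses=" TICKCOUNT ";" | sqlpipe
      fflush(sqlpipe)         # push the statement through without closing the pipe
    }
  }
}

no idea whether that actually buys back enough cycles on a single core, or whether sqlite3 blocking on a locked database would just stall the pipe instead.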

beyond that, i'm not sure where to optimize. i am historically an InfoSec/Network guy coming out of WebDev; i don't have hardware experience, so go gently. i'm open to any advice on how to beat this thing and, if it can't be done, i'm open to that as well; in which case i'll find some other way to get done what needs doing.

thank you for your patience. this is a long one.
