Saving data without an external .txt file in Python


For some context: I have a greenhouse with a sensor that reads the temperature every minute and appends it to a per-minute array for an hour. After an hour, the average of that array is taken and stored in an hourly array, which is used to build a graph that is emailed to me every 24 hours.

When I let my system run for 24 hours it sends the email as intended. However, there are times when I need to make physical changes to the system, which means turning it off. This causes the per-minute and hourly temperature arrays to be lost on restart.

The solution I came up with is to store the data in two different .txt files. I know this would work because I am currently doing it to track the minutes elapsed. However, I don't really want to keep adding a bunch of text files to store data. Is there another way?


There are 2 best solutions below

OysterShucker

Instead of creating text files for each individual variable, you can simply store the information as json and load it as a dict.

import json
from   datetime import datetime as dt
from   random   import randint


#represents your temperature function
def get_ftemp() -> int:
    return randint(-20,120)


#take readings and save data
def set_sample() -> None:
    with open('sensors.json', 'w') as file:
    
        #sensor data for this sample
        sample = dict(
            timestamp = dt.timestamp(dt.now()),
            temp      = get_ftemp()
        )
    
        #overwrite file with updated sample
        file.write(json.dumps(sample))
    
    
def get_sample() -> dict:
    sample = {}
    
    #get sample as dict
    with open('sensors.json', 'r') as file:
        sample = json.loads(file.read())
        
    return sample
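For example, appended to the script above, a quick round trip of the two helpers could look like this:

#write one sample to sensors.json, then read it back
set_sample()

sample = get_sample()
print(sample['timestamp'], sample['temp'])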

If you wanted to gather statistical data over an extended period of time, you could expand on the above and do something like the following.

import json
from   datetime import datetime as dt
from   random   import randint


LONG_FORMAT = '%I:%M:%S.%f %A, %B %d, %Y'
TIME_FORMAT = '%I:%M:%S.%f'
DATE_FORMAT = '%Y-%m-%d'

#represents your temperature function
def get_ftemp():
    return randint(-20,120)

#take readings and save data
def sample() -> list:
    readings = []
    
    #'a+' creates the file if it doesn't exist yet ('r+' fails on the first run)
    with open('sensors.json', 'a+') as file:
        file.seek(0)

        #get json if file is not empty
        if text := file.read():
            readings = json.loads(text)
            
        #sensor data for this reading
        sample = dict(
            timestamp = dt.timestamp(dt.now()),
            temp      = get_ftemp()
        )
        
        #update readings
        readings.append(sample)
    
        #overwrite file with updated readings
        file.seek(0)
        file.truncate()
        file.write(json.dumps(readings))
    
    return readings

#get readings, reformat where necessary
def readings(fmt:str):
    readings = []
    
    with open('sensors.json', 'r') as file: 
        #get json if file is not empty
        if text := file.read(): 
            readings = json.loads(text)
            
    for entry in readings:
        date = dt.fromtimestamp(entry['timestamp'])
        date = dt.strftime(date, fmt)
        del entry['timestamp']

        #yield reformatted date and "rest"
        yield date, *entry.values()
    
#represents statistical analysis
def print_range(m:int, M:int, fmt:str) -> None:
    for date, temp in readings(fmt):
        if m <= temp <= M:
            print(temp, date)


#take a few readings so there is data to report on, then print the ones in range
for _ in range(5):
    sample()

print_range(60, 120, LONG_FORMAT)

Keep in mind that the json method has an issue: every time you store a sample, the entire json file is loaded, the sample is appended, and the whole file is rewritten with the new data. There will definitely be some point where you should clear the json out. One improvement to my script would be to keep an instance of readings alive at all times; that way you cut read() down to one initial call and just write() when necessary. The only way around the rewrite-everything problem is to use a "real database", meaning a system that does not have to be loaded in its entirety to be updated.
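A rough sketch of that improvement, assuming the same sensors.json layout as above (the Readings class name is mine):

import json
from datetime import datetime as dt
from random import randint


#represents your temperature function
def get_ftemp() -> int:
    return randint(-20, 120)


class Readings:
    """Keep readings in memory: read the file once, write on every new sample."""

    def __init__(self, path: str = 'sensors.json') -> None:
        self.path = path
        try:
            with open(path) as file:
                self.data = json.load(file)
        except (FileNotFoundError, json.JSONDecodeError):
            #missing or empty file: start with no readings
            self.data = []

    def sample(self) -> None:
        #append the new reading in memory, then persist the whole list
        self.data.append(dict(timestamp=dt.timestamp(dt.now()), temp=get_ftemp()))
        with open(self.path, 'w') as file:
            json.dump(self.data, file)


readings = Readings()
readings.sample()
print(len(readings.data), 'readings on disk')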

Mark Setchell

My suggestion would be to use sqlite to store your readings. It is a well-loved and well-tested technique with lots of community support. It is "server-less", which means you do not have to run any big database management process: the whole database lives in a single file, and its Python interface is part of Python's standard library, so there is nothing to install. There is a good write-up of it here.

The benefits of using sqlite are:

  • just a single file for the whole database,
  • simple yet powerful interface, used in many iOS apps by Apple and in many desktop apps, such as Firefox's bookmarks
  • accessible from the command line as well as via Python, so you can check readings or remove old readings from the shell
  • can be created "in-memory" or on disk, as you see fit - some folk seem excessively concerned about SD card wear on the Raspberry Pi, so you can keep it in memory and switch to disk with a single change at any time, see here and the small sketch after this list
  • can be easily read into Python or Pandas and used with matplotlib or other Python ecosystem tools
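As a small illustration of the in-memory versus on-disk point above (a sketch; the table matches the schema shown further down):

import sqlite3

#in memory: no writes to the SD card, but the data is lost when the process exits
conn = sqlite3.connect(':memory:')

#on disk: survives restarts; switching between the two is this single line
#conn = sqlite3.connect('logger.db')

#the rest of the code is identical either way
conn.execute('CREATE TABLE IF NOT EXISTS readings ('
             'timestamp TEXT DEFAULT CURRENT_TIMESTAMP, '
             'reading   NUMERIC NOT NULL)')
conn.execute('INSERT INTO readings (reading) VALUES (?)', (21.5,))
print(conn.execute('SELECT * FROM readings').fetchall())
conn.close()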

I am no expert at all on sqlite, but I can show you enough material to get you started. So, I am suggesting creating a database with one table called readings, with a UTC timestamp for each reading. If you don't supply a timestamp, the database will add the current time for you - simple! Here's the SQL to create the schema; I saved it as schema.sql:

-- schema.sql
-- Schema for "Logger" application

CREATE TABLE readings (
    timestamp   TEXT DEFAULT CURRENT_TIMESTAMP,
    reading     NUMERIC  NOT NULL);

You can run it from the shell to initialise the database (called logger.db) like this:

sqlite3 -init schema.sql logger.db

Or you can create it from Python and write some dummy readings into it as though you had been acquiring data for the last 30 hours at 10 minute intervals like this:

#!/usr/bin/env python3

import os
import random
import sqlite3
from datetime import datetime, timedelta, timezone

dbFilename     = 'logger.db'
schemaFilename = 'schema.sql'

# Check if db exist
newDb = not os.path.exists(dbFilename)

with sqlite3.connect(dbFilename) as conn:
    if newDb:
        # Create schema:
        # You can do same thing from command-line with:
        #   sqlite3 -init schema.sql logger.db
        print('Create schema')
        with open(schemaFilename, 'rt') as f:
            schema = f.read()
        conn.executescript(schema)

        # Populate database with some synthetic data
        # https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook/blob/master/ex13-Using%20SQLite%20as%20a%20Time%20Series%20Database.ipynb
        # H: start this many hours ago
        # M: minutes between readings
        H, M = 30, 10
        print(f'Inserting sample data starting {H} hours ago at {M} minute intervals')
        start = datetime.now(timezone.utc) - timedelta(hours=H)
        for t in range(0, H * 60, M):
            sampleTimestamp = start + timedelta(minutes=t)
            sampleValue     = random.randrange(10, 30)
            print(f'{sampleTimestamp.isoformat()}: {sampleValue}')
            #parameterised query avoids quoting issues in the SQL string
            conn.execute('INSERT INTO readings(timestamp, reading) VALUES (?, ?)',
                         (str(sampleTimestamp), sampleValue))

        # Commit all our sample values
        conn.commit()

Now you can see all the readings in the database from the command-line like this:

sqlite3 -column -header logger.db "select * from readings"

timestamp                         reading
--------------------------------  -------
2023-08-09 10:26:13.741923+00:00  28     
2023-08-09 10:36:13.741923+00:00  10     
2023-08-09 10:46:13.741923+00:00  11     
2023-08-09 10:56:13.741923+00:00  23     
2023-08-09 11:06:13.741923+00:00  23     
2023-08-09 11:16:13.741923+00:00  11     
2023-08-09 11:26:13.741923+00:00  13     
2023-08-09 11:36:13.741923+00:00  29 
...
...    

Let's look at the readings from the last 60 minutes:

sqlite3 -column -header logger.db 'select * from readings WHERE (timestamp >= datetime("now","-60 minutes"))'

Let's delete readings older than 36 hours to be sure the database doesn't grow too large:

sqlite3 logger.db 'DELETE FROM readings WHERE (timestamp <= datetime("now","-36 hour"))'

Insert a reading with default, current timestamp from the command-line:

sqlite3 logger.db "INSERT INTO readings (reading) VALUES (20.4)"

Insert a reading with non-default timestamp:

sqlite3 logger.db "INSERT INTO readings (reading, timestamp) VALUES (21.6, '2023-08-10 17:33:00')"

Let's say you live in UK like me, so you would be concerned about frost and the minimum temperature:

sqlite3 logger.db "select min(reading) from readings;"

Let's say you live in Spain, and are maybe more concerned with the peak temperature:

sqlite3 logger.db "select max(reading) from readings;"

Let's say you want to convert your database to a CSV and email it to yourself before truncating it to the last 36 hours... that's easy:

sqlite3 -header -csv logger.db "select * from readings;" > readings.csv
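The same export can be done from Python; a sketch using pandas (attaching the file to an email would reuse whatever email code you already have for the graph):

import sqlite3
import pandas as pd

with sqlite3.connect('logger.db') as conn:
    df = pd.read_sql("SELECT * FROM readings", conn)

#write the readings out as CSV, ready to attach to your email
df.to_csv('readings.csv', index=False)
print(f'Wrote {len(df)} readings to readings.csv')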

Then you can plot the readings using matplotlib like this:

import matplotlib.pyplot as plt
import pandas as pd
import sqlite3

dbFilename     = 'logger.db'

with sqlite3.connect(dbFilename) as conn:
    df = pd.read_sql("SELECT * FROM readings", conn)
    df.set_index("timestamp", inplace = True)
    df.index = pd.to_datetime(df.index, utc=True, format='ISO8601')
    df.plot(figsize=(12, 6))
    plt.show()

[Plot of the readings against the timestamp, as produced by the code above]

If you want smoothing or averaging, you could certainly add that in here, at the plotting stage.
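For example, a rolling average could be layered on just before plotting (a sketch; the one-hour window is an arbitrary choice):

import matplotlib.pyplot as plt
import pandas as pd
import sqlite3

with sqlite3.connect('logger.db') as conn:
    df = pd.read_sql("SELECT * FROM readings", conn)

df.set_index("timestamp", inplace=True)
df.index = pd.to_datetime(df.index, utc=True, format='ISO8601')

#smooth with a rolling mean over the previous hour of readings
df['smoothed'] = df['reading'].rolling('1h').mean()

df.plot(figsize=(12, 6))
plt.show()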


If you want to show your plot in a web-browser, you can save the plot as an "in-memory" PNG and return that image to browser like in this example. You just add three lines at the end of the code above.
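I can't reproduce the linked example here, but the general idea of an "in-memory" PNG is to render the figure into a bytes buffer instead of a file, roughly like this:

import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot([1, 2, 3], [20, 22, 21])    #stand-in for your readings

#render the figure into an in-memory PNG rather than a file on disk
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
png_bytes = buffer.getvalue()       #hand these bytes to your web framework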


Note that, although I have just illustrated techniques for adding and reading values using the command-line, all those techniques can be done in Python in an exactly analogous fashion and you can see them in the Python script above that initialises a dummy database.
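For instance, the "last 60 minutes" query, the default-timestamp insert and the 36-hour clean-up shown above could look like this from Python (a sketch, not tuned for your data):

import sqlite3

with sqlite3.connect('logger.db') as conn:
    #readings from the last 60 minutes
    recent = conn.execute(
        "SELECT * FROM readings "
        "WHERE timestamp >= datetime('now', '-60 minutes')").fetchall()
    print(recent)

    #insert a reading with the default (current) timestamp
    conn.execute('INSERT INTO readings (reading) VALUES (?)', (20.4,))

    #delete readings older than 36 hours so the database doesn't grow too large
    conn.execute("DELETE FROM readings "
                 "WHERE timestamp <= datetime('now', '-36 hours')")
    conn.commit()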

Note also you can do your averaging and sorting/ordering of your values either with SQL or by grabbing all the values in Python and doing the maths yourself - whichever is easiest.
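As one example of the SQL route, hourly averages (which is what your 24-hour email graph needs) can be grouped with strftime; a sketch:

import sqlite3

with sqlite3.connect('logger.db') as conn:
    #group readings by the hour they fall in and average each group
    hourly = conn.execute(
        "SELECT strftime('%Y-%m-%d %H:00', timestamp) AS hour, "
        "       avg(reading) AS avg_reading "
        "FROM readings GROUP BY hour ORDER BY hour").fetchall()

for hour, avg_reading in hourly:
    print(hour, round(avg_reading, 1))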


As I said, I am no expert at all on sqlite, so there may be some "sub-optimal" things in my code, but they can be addressed by further questions tagged sqlite. I understand it is best practice to store all times in UTC and then adjust to local time for presentation - but I may be wrong. It may be better to add an index to the database for faster retrieval, but I don't think performance is an issue if you are storing readings once a minute for just a few days. If any sqlite experts have any thoughts, I am here to learn and be corrected.