Avoiding duplicate data in HDF5 or SQLite

I am taking 30 days of historical data and modifying it.

Ideally, I can read the historical data filtered to a dynamic rolling window of the last 30 days. The 'DateTime' value is a column in the raw data.

df_new = pd.read_csv(loc+filename)
max_date = df_new['DateTime'].max()
date_range = max_date - pd.Timedelta(30, unit='d')

df_old = pd.read_hdf(loc+filename, 'TableName', where='DateTime > date_range')

Then I read the new data, which is a separate file that always contains month-to-date values (all of June, for example; this file is replaced daily with the latest data), and concat it with the old dataframe.

frames = [df_old, df_new]
df = pd.concat(frames)

Then I do some things to the file (I am checking if certain values repeat within a 30 day window, if they do then I place a timestamp in a column).

Now I want to write this modified data back into my original file (it was HDF5, but it could be a .sqlite file too). There are bound to be a ton of duplicates, since I am reading both the previous 30 days of data and the MTD data. How do I manage this?

My only solution is to read the entire file (df_old along with the new data I added) and then drop duplicates and then overwrite it again. This isn't very efficient.
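If you stay with pandas, the rewrite can at least be cheapened by dropping duplicates on the three key columns instead of comparing whole rows, with keep='last' so the freshly modified rows win. A minimal sketch (the column names Date, EmpID, CustomerID come from the question; the sample values and the Flag column are made up for illustration):

```python
import pandas as pd

# Hypothetical data: df_old stands in for the stored history,
# df_new for the month-to-date file.
df_old = pd.DataFrame({
    "Date": ["2023-06-01", "2023-06-02"],
    "EmpID": [1, 2],
    "CustomerID": [10, 20],
    "Flag": [None, None],
})
df_new = pd.DataFrame({
    "Date": ["2023-06-02", "2023-06-03"],
    "EmpID": [2, 3],
    "CustomerID": [20, 30],
    "Flag": ["2023-06-02 09:00", None],
})

combined = pd.concat([df_old, df_new], ignore_index=True)
# Keep only the last (newest) copy of each (Date, EmpID, CustomerID) key.
deduped = combined.drop_duplicates(
    subset=["Date", "EmpID", "CustomerID"], keep="last"
).reset_index(drop=True)
print(len(deduped))  # 3 unique keys remain out of 4 rows
```

This still rewrites the file, but the duplicate check touches only the key columns, which is noticeably cheaper than full-row comparison on wide frames.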

Can the .sqlite or .hdf formats enforce non-duplicates? If so, I have three columns that together identify a unique row (Date, EmpID, CustomerID). I do not want exact duplicate rows.

1 Answer

Define those columns as a composite primary key in SQLite. It won't allow you to insert rows with a non-unique primary key.

e.g.

CREATE TABLE mytable (
a INT,
b INT,
c INT,
PRIMARY KEY(a, b)
);

won't allow duplicate (a, b) pairs to be added to the data. Then use INSERT OR IGNORE to add data, and any duplicates will be silently skipped.
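Since the question uses pandas, the same idea can be driven from Python's built-in sqlite3 module. A sketch (the table name and the three key columns are taken from the question; the Flag column and sample rows are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # swap for a file path in practice
conn.execute("""
    CREATE TABLE IF NOT EXISTS visits (
        Date TEXT,
        EmpID INTEGER,
        CustomerID INTEGER,
        Flag TEXT,
        PRIMARY KEY (Date, EmpID, CustomerID)
    )
""")

rows = [
    ("2023-06-01", 1, 10, None),
    ("2023-06-02", 2, 20, None),
    ("2023-06-02", 2, 20, "duplicate key"),  # same key -> silently skipped
]
# INSERT OR IGNORE drops rows whose primary key already exists.
conn.executemany("INSERT OR IGNORE INTO visits VALUES (?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(count)  # 2
```

With this in place you can dump the concatenated dataframe in every day; only genuinely new (Date, EmpID, CustomerID) combinations are stored, so no read-dedupe-rewrite pass is needed.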

http://sqlite.org/lang_insert.html