An illustration of my question
I am dealing with hourly time series data containing datetimes from 2016-01-01 00:00 to 2016-01-07 23:00, plus some features corresponding to each timestamp.
Ideally, there would be 7 x 24 rows of data covering the whole period. But sometimes the hourly data is interrupted for some reason (e.g. when capturing hourly weather information, the website may be down during certain hours).
My dataframe currently contains all the valid data, with no rows for the missing hours. Its length is 7 * 24 - 5, which means 5 hourly records are missing.
The datetimes are saved in the format 201601010100, representing 2016-01-01 01:00:00.
My attempt
I have tried to generate a new dataframe of length 7 * 24 containing a continuous hourly datetime column:
import pandas as pd

data = {"datetime": [], "feature1": [], "feature2": []}
ff = pd.DataFrame(data)
# 7 * 24 = 168 hourly timestamps covering the whole week
rng = pd.date_range('2016-01-01', periods=7 * 24, freq='H')
# the format must match df's 12-digit datetime strings (YYYYMMDDHHMM)
new_date = list(rng.strftime('%Y%m%d%H%M'))
ff['datetime'] = new_date
The original dataframe is df, which contains the columns datetime, feature1 and feature2.
I tried f3 = ff.merge(df, on='datetime'), but f3 still did not line the features up against the full 7 * 24 hours.
My target
Because of the large temporal coverage and feature dimension in practice, I don't want to loop over the two dataframes and compare their datetime values one by one.
I want to fill the feature values in ff from the rows of df with identical datetimes, and keep NaN for the missing hours.
Create an empty dataframe with rng as index:
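For example (a sketch; the names rng and ff follow the question):

```python
import pandas as pd

# complete hourly index for the week: 7 * 24 = 168 timestamps
rng = pd.date_range('2016-01-01', periods=7 * 24, freq='H')

# empty frame whose index is the full hourly range
ff = pd.DataFrame(index=rng)
```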
Convert the original dates to numpy.datetime64 to make them compatible with the date range:
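A sketch of the conversion, using a toy stand-in for the question's df with its 12-digit string format (YYYYMMDDHHMM); the column name datetime_ns is an assumption carried into the next step:

```python
import pandas as pd

# toy stand-in for the question's df (strings like '201601010100')
df = pd.DataFrame({
    'datetime': ['201601010000', '201601010100', '201601010300'],
    'feature1': [1.0, 2.0, 3.0],
    'feature2': [4.0, 5.0, 6.0],
})

# parse the strings into datetime64[ns], comparable with date_range values
df['datetime_ns'] = pd.to_datetime(df['datetime'], format='%Y%m%d%H%M')
```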
Perform an outer join of the frames on index and datetime_ns:
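One way to sketch this step is DataFrame.join against df re-indexed by datetime_ns (equivalent to pd.merge with left_index=True, right_on='datetime_ns'); the example uses a shortened four-hour range, with df deliberately missing two hours:

```python
import pandas as pd

# full hourly index (shortened to 4 hours for the example)
rng = pd.date_range('2016-01-01', periods=4, freq='H')
ff = pd.DataFrame(index=rng)

# incomplete data: hours 01:00 and 02:00 are missing
df = pd.DataFrame({
    'datetime_ns': pd.to_datetime(['201601010000', '201601010300'],
                                  format='%Y%m%d%H%M'),
    'feature1': [1.0, 2.0],
})

# outer join of ff's index with df's datetime_ns: every hour is kept,
# and hours absent from df get NaN in the feature columns
f3 = ff.join(df.set_index('datetime_ns'), how='outer')
```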
Sort the new dataframe, if necessary:
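For example, sorting by the datetime index (a sketch with a deliberately out-of-order toy frame):

```python
import pandas as pd
import numpy as np

# toy joined result whose rows are out of chronological order
f3 = pd.DataFrame(
    {'feature1': [2.0, np.nan, 1.0]},
    index=pd.to_datetime(['2016-01-01 02:00', '2016-01-01 01:00',
                          '2016-01-01 00:00']),
)

# sort chronologically; the join output order is not always guaranteed
f3 = f3.sort_index()
```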