I'm experimenting with characterizing data over time after downsampling. After studying this post, I generated some synthetic data with certain periodic patterns at a granularity of 5 minutes (i.e., one data point every 5 minutes). That means 12 observations per hour, so for a full day (24 hours) I have 12 × 24 = 288 observations (data points). I generated a PWM data pattern with a certain period. I could have generated other patterns, e.g. positive triangle pulses, positive rectangular pulses or impulses, but I picked PWM because it makes changes in behavior easier to see.

import numpy as np
import pandas as pd
import scipy.signal as signal
import matplotlib.pyplot as plt

# Generate periodic data: PWM-modulated sinusoidal signal
# Set the duty cycle percentage for PWM
percent = 40.0
# Set the time period for one cycle of the PWM signal
TimePeriod = 6.0
# Set the desired number of samples
desired_samples = 278
# Calculate the time step (dt) to achieve the desired number of samples
# 30 is the original number of cycles in the provided code
dt = TimePeriod / (desired_samples / 30)
# Number of cycles; round() guards against floating-point error (29.999... -> 30).
# The extra +1 cycle (31 in total) is why the series ends up with 288 samples, not 278.
Cycles = int(round(desired_samples * dt / TimePeriod)) + 1
# Create a time array
t = np.arange(0, Cycles * TimePeriod, dt)
# Generate a PWM signal (True while inside the duty cycle)
pwm = (t % TimePeriod) < (TimePeriod * percent / 100)
# Create a sinusoidal signal
x = np.linspace(-10, 10, len(pwm))
y_pwm = np.sin(x)
# Zero out the sinusoidal signal where PWM is off
y_pwm[pwm == 0] = 0

# Assumed timestamps: minutes since the start of the day, one observation every 5 minutes
t_num = np.arange(len(y_pwm)) * 5

# Convert data to a Pandas DataFrame
data = {'datetime': t_num, 'PWM': y_pwm}
df = pd.DataFrame(data)
df.shape  # (288, 2)

So now I have a univariate time series with a datetime timestamp column and periodic values in the form of a PWM signal.

Before downsampling, I made sure the datetime column has a proper datetime type by using:

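  • df['datetime'] = pd.to_timedelta(df['datetime'], unit='min') ref (minutes since start become timedeltas; 'T' is an older alias for the minute unit)
  • df['datetime'] = pd.Timestamp('2023-01-01') + df['datetime'] ref (pd.to_datetime() does not accept a timedelta column, so the offsets are added to a start date; the date itself is arbitrary)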
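A minimal sketch of two equivalent ways to get a proper datetime column for 288 five-minute samples, assuming an arbitrary start date of 2023-01-01 (hypothetical; any date works). Option A builds the timestamps directly with `pd.date_range`; option B mirrors the two steps above (minutes → timedelta → timestamp). The PWM values are replaced by zeros here only to keep the example short.

```python
import numpy as np
import pandas as pd

n_samples = 288                        # 24 h of 5-minute observations
start = pd.Timestamp("2023-01-01")     # hypothetical start date

# Option A: generate the 5-minute timestamps directly
idx = pd.date_range(start, periods=n_samples, freq="5min")

# Option B: "minutes since start" -> timedelta -> timestamp
minutes = np.arange(n_samples) * 5
idx_b = start + pd.to_timedelta(minutes, unit="min")

df = pd.DataFrame({"datetime": idx, "PWM": np.zeros(n_samples)})
print((idx == idx_b).all())            # True: both options give the same timestamps
```

Option A avoids the intermediate numeric column entirely, which removes one place where a wrong unit could slip in.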

Then I applied resample() to downsample from 5 minutes to 1 hour, as in this post.

resampled_df = (df.set_index('datetime')          # Conform data by setting a datetime column as dataframe index needed for resample
                  .resample('1H')                 # resample with frequency of 1 hour
                  .mean()                         # used mean() to aggregate
                  .interpolate()                  # filling NaNs and missing values [just in case] 
                )
resampled_df.shape                                # (24, 1)

Another way, inspired by the approach here:

resampled_df2 = (df.set_index('datetime')                   # Conform data by setting a datetime column as dataframe index needed for resample
                  .groupby([pd.Grouper(freq='1H')])         # resample with frequency of 1 hour
                  .mean()                                   # used mean() to aggregate
                 )

resampled_df2.shape                                         # (24, 1)
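The two variants should bin the same 12 samples per hour, so their results ought to be identical. A small check on a hypothetical stand-in frame (a plain sine on a 5-minute grid starting 2023-01-01); `"60min"` is used as the frequency string because it is accepted by both older and newer pandas versions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the PWM frame: 288 samples at 5-minute spacing
idx = pd.date_range("2023-01-01", periods=288, freq="5min")
df = pd.DataFrame({"datetime": idx, "PWM": np.sin(np.linspace(-10, 10, 288))})

by_resample = df.set_index("datetime").resample("60min").mean()
by_grouper = df.set_index("datetime").groupby(pd.Grouper(freq="60min")).mean()

# Same hourly bins, same aggregation -> same values (index freq metadata ignored)
pd.testing.assert_frame_equal(by_resample, by_grouper, check_freq=False)
print(by_resample.shape)               # (24, 1)
```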

I used the mean() method because I think the average of the 12 observations within each hour is a good representative of the behavior and has less negative impact on identifying periodic patterns/behavior.
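One caveat with the mean: it is a 12-sample moving average followed by keeping one value per hour, so any cycle of about one hour or shorter gets smeared out. A sketch with a hypothetical on/off pulse train whose period is exactly one hour (5 of every 12 samples "on") shows the extreme case: every hourly mean equals 5/12, so the hourly series is flat and the cycle disappears, while extra statistics per bin (min, max, std) still reveal the on/off structure.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=288, freq="5min")
# Hypothetical on/off pulse train: 5 "on" samples in every 12, i.e. a 1-hour period
s = pd.Series((np.arange(288) % 12 < 5).astype(float), index=idx, name="PWM")

# Several statistics per hourly bin instead of the mean alone
hourly = s.resample("60min").agg(["mean", "min", "max", "std"])
print(hourly.head(3))
```

Keeping a few statistics per bin costs little extra storage but preserves amplitude information that the mean alone discards.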

Now I want to plot the raw periodic data next to the resampled version:

import matplotlib.pyplot as plt
import pandas as pd

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 4))

# Raw PWM (5-minute resolution)
axes[0].plot(   df['datetime'],  df['PWM'], color='blue')
axes[0].scatter(df['datetime'],  df['PWM'], color='blue', marker='o', s=10)
axes[0].set_title(f'PWM incl. {len(df)} observations')

# Resampled PWM (1-hour resolution)
axes[1].plot(   resampled_df.index,  resampled_df['PWM'], color='blue')
axes[1].scatter(resampled_df.index,  resampled_df['PWM'], color='blue', marker='o', s=10)
axes[1].set_title(f'PWM (resampled frequency=1H) incl. {len(resampled_df)} observations')

# selected_ticks: assumed here to be one tick per hour, labelled HH:MM
selected_ticks = resampled_df.index
for ax in axes:
    ax.set_xticks(selected_ticks)
    ax.set_xticklabels(selected_ticks.strftime('%H:%M'), rotation=90)

plt.tight_layout()
plt.show()

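Output: [image: left panel "PWM incl. 288 observations", right panel "PWM (resampled frequency=1H) incl. 24 observations"]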

My objective is to detect periodic/cyclic behavior for further characterization of the data over time. However, potential (nearly) periodic patterns may only be visible at certain data resolutions, and downsampling is sometimes necessary anyway because of the data volume (big data). So I'm looking for best practices to downsample the data and then detect periodic/cyclic behavior or (nearly) periodic patterns.
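One standard signal-processing practice for downsampling without creating fake periodicities is to low-pass filter before thinning (anti-aliasing); `scipy.signal.decimate` does both steps. This is a different technique from the hourly mean (which is itself only a crude 12-sample boxcar filter with weak suppression of fast components). The sketch below uses a hypothetical signal, a slow 6-hour cycle plus a 50-minute ripple, on the same 5-minute grid, and compares naive thinning with decimation against the slow component alone.

```python
import numpy as np
from scipy import signal

q = 12                                        # 5-minute samples per hour
t_h = np.arange(288) / q                      # 24 h time axis, in hours
slow = np.sin(2 * np.pi * t_h / 6.0)          # hypothetical 6-hour cycle to preserve
fast = 0.5 * np.sin(2 * np.pi * t_h / (50 / 60))  # hypothetical 50-minute ripple
x = slow + fast

# Naive thinning: keep every 12th sample with no filtering. The ripple
# (1.2 cycles/h) lies above the new Nyquist limit (0.5 cycles/h) and
# aliases into a spurious 5-hour cycle (1.2 - 1.0 = 0.2 cycles/h).
x_naive = x[::q]

# Anti-aliased thinning: decimate low-pass filters first (order-8 Chebyshev
# type I IIR by default, run forward and backward), then keeps every 12th sample
x_dec = signal.decimate(x, q, zero_phase=True)

ref = slow[::q]                               # what an ideal hourly sampler should see
rmse_naive = np.sqrt(np.mean((x_naive - ref) ** 2))
rmse_dec = np.sqrt(np.mean((x_dec - ref) ** 2))
print(len(x_dec))                             # 24
print(rmse_naive, rmse_dec)
```

Whether a filter this sharp suits a given dataset is a judgment call; the point of the sketch is only that components faster than the new Nyquist limit must be removed before thinning, or they reappear as false slower cycles.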

I found the approach Averaging periodic data with a variable sampling rate, but I'm not sure whether it fits my problem.

The closest workaround I have found so far is "How to decompose multiple periodicities present in the data without specifying the period?". Another point is that people always propose the FFT, i.e. looking for patterns in the frequency domain instead of the time domain, but some discussions state:

The FFT is the wrong tool to use for finding the periodicity. ... ref
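A common time-domain alternative to the FFT is the autocorrelation function (ACF): a signal with period P correlates strongly with itself shifted by P samples. The sketch below (a swapped-in technique, not taken from the linked posts) returns the lag of the highest ACF peak; `dominant_period_acf` is a hypothetical helper name, and limiting the search to half the record length is an assumption, since long lags rest on few overlapping samples.

```python
import numpy as np
from scipy.signal import find_peaks

def dominant_period_acf(x, max_lag=None):
    """Return the lag (in samples) of the highest autocorrelation peak, or None."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                  # remove the DC offset
    n = len(x)
    max_lag = n // 2 if max_lag is None else max_lag
    acf = np.correlate(x, x, mode="full")[n - 1:]     # lags 0 .. n-1
    if acf[0] == 0:                                   # constant signal: no periodicity
        return None
    acf = acf / acf[0]                                # normalise so that acf[0] == 1
    peaks, _ = find_peaks(acf[: max_lag + 1])         # local maxima; lag 0 is an edge, so excluded
    if len(peaks) == 0:
        return None
    return int(peaks[np.argmax(acf[peaks])])

# Hypothetical test: on/off pulse train, 5 of every 12 samples "on" (12 x 5 min = 1 h)
pulse = (np.arange(288) % 12 < 5).astype(float)
lag = dominant_period_acf(pulse)
print(lag, "samples =", lag * 5, "minutes")
```

Multiplying the lag by the sampling interval (5 minutes here, 60 minutes after hourly resampling) converts it into a period in time units.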

If FFT or DFT approaches are useful, what is the best practice for setting the sampling frequency for the PWM signal I generated, according to sample_freq = n_samples/(tmax - tmin), which should reflect periodicity via sample_period = 1/sample_freq ref?
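For uniformly sampled data the sampling frequency is simply 1/Δt. With one sample every 5 minutes that is fs = 12 samples per hour (n_samples/(tmax - tmin) approximates this; strictly it is (n_samples - 1)/(tmax - tmin)). Note that 1/fs is the sampling interval, not the period of the pattern. After resampling to 1 hour, fs = 1 per hour and the Nyquist frequency is 0.5 cycles per hour, so no period shorter than 2 hours can be represented. Over a 24-hour record the frequency resolution is fs/N = 12/288 = 1/24 cycles per hour. A sketch with `scipy.signal.periodogram` on a hypothetical 2-hour sine:

```python
import numpy as np
from scipy import signal

fs = 12.0                                 # samples per hour (one sample every 5 min)
t_h = np.arange(288) / fs                 # 24 h time axis, in hours
x = np.sin(2 * np.pi * t_h / 2.0)         # hypothetical test signal with a 2-hour period

f, pxx = signal.periodogram(x, fs=fs)     # f is in cycles per hour because fs is per hour
k = np.argmax(pxx[1:]) + 1                # skip the DC bin at f = 0
period_h = 1.0 / f[k]                     # dominant period in hours
print(period_h)
```

Passing fs in samples per hour makes the frequency axis come out in cycles per hour, so the dominant period is read off directly in hours.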

Please note that I'm not interested in solutions like this answer, which uses .iloc, because I believe (de-)selecting rows by position would ruin the periodic patterns in the data.
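For context: on a regular 5-minute grid without gaps, taking every 12th row by position selects exactly the same samples as the time-based `resample("60min").first()`. A sketch under that no-gaps assumption, with a hypothetical sine standing in for the PWM column. It suggests that what affects the periodic content is whether samples are averaged or filtered before thinning, rather than whether rows are addressed by position or by timestamp. With missing timestamps the two would no longer coincide.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=288, freq="5min")
s = pd.Series(np.sin(np.linspace(-10, 10, 288)), index=idx, name="PWM")  # hypothetical stand-in

by_position = s.iloc[::12]                 # every 12th row: 00:00, 01:00, ...
by_time = s.resample("60min").first()      # first sample in each hourly bin

# On a gap-free 5-minute grid both pick exactly the same samples
print(np.array_equal(by_position.to_numpy(), by_time.to_numpy()))   # True
```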

I'm also aware of some facts, such as:

  1. "Downsampling always means a loss of information, which is why in general downsampling is preferably avoided." ref. The following picture from my own experiments illustrates this: [image]
  2. "...down sample how much of data observations..." ref.
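The loss of information can be made concrete by bringing the hourly means back onto the 5-minute grid (step-wise hold) and measuring the error against the raw series. A sketch with two hypothetical pulse trains; `hourly_mean_rmse` is a hypothetical helper name. A slow pattern aligned to whole hours survives exactly (error 0), while a pattern with a 1-hour period collapses to a constant and loses all of its on/off detail (error √35/12 ≈ 0.49).

```python
import numpy as np
import pandas as pd

def hourly_mean_rmse(s):
    """RMSE between a 5-minute series and its hourly means held over each hour."""
    hourly = s.resample("60min").mean()
    back = hourly.reindex(s.index, method="ffill")   # hold each hourly mean for its 12 samples
    return float(np.sqrt(((s - back) ** 2).mean()))

idx = pd.date_range("2023-01-01", periods=288, freq="5min")
n = np.arange(288)
slow_pulse = pd.Series((n % 36 < 12).astype(float), index=idx)  # 1 h on, 2 h off (3-hour period)
fast_pulse = pd.Series((n % 12 < 5).astype(float), index=idx)   # 25 min on, 35 min off (1-hour period)

print(hourly_mean_rmse(slow_pulse), hourly_mean_rmse(fast_pulse))
```

How much is lost therefore depends on how the pattern's time scale compares with the bin width, not on downsampling as such.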

