Pandas resample signal series with its corresponding label

Question


Asked by Muhammad Ikhwan Perwira on 21 January 2024 at 10:19

I have a table with these columns: Seconds, Amplitude, Labels, Metadata. Basically, it's an ECG signal.

You can download the csv here: https://tmpfiles.org/3951223/question.csv

(Screenshots of the first, fourth, fifth, and last pages of the table appeared here as images.)

As you can see, the seconds timestep is 0.004. How can I resample the data to a new timestep, such as 0.002, without destroying the other columns?
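Resampling first needs a target time grid. A minimal sketch, assuming the signal starts at 0 s with a constant step; `make_time_grid` is a hypothetical helper, not part of pandas. Because the seconds values carry floating-point noise (for example 0.036000000000000004 in the table below), the number of new samples is computed with a small tolerance, and each timestamp is an integer index times the step rather than a running sum.

```python
import math


def make_time_grid(n_samples: int, old_dt: float, new_dt: float) -> list[float]:
    """Timestamps of a uniform grid with step new_dt over the original span.

    Assumes the original signal starts at t = 0 with a constant step old_dt.
    """
    duration = (n_samples - 1) * old_dt
    # Count whole new steps inside the span; the small epsilon absorbs float
    # noise so that e.g. 0.996 / 0.002 does not floor to 497.
    n_new = math.floor(duration / new_dt + 1e-9) + 1
    # Multiply integer indices by the step instead of adding the step
    # repeatedly, so rounding errors do not accumulate along the grid.
    return [i * new_dt for i in range(n_new)]


grid = make_time_grid(250, 0.004, 0.002)
print(len(grid), grid[:3])
```

For 250 samples at 0.004 s and a 0.002 s target this yields 499 timestamps, the same row count the accepted answer reports. Using `floor` rather than `round` keeps the last timestamp inside the original span when the ratio is not an integer (for example a 1/360 s step).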

For example, the label_encoding column is intended as the y label for machine learning, specifically a multiclass classification problem: it encodes segmentation regions. Its region values are 24, 1, and 27, with 0 for samples outside any region (as the table below shows).

bound_or_peak, on the other hand, is intended for displaying or plotting the regions. It consists of 3 bits (maximum value 7). If the most significant bit is set (value 4), the sample is the onset of a region to plot. If the second bit is set (value 2), the sample is the peak of an ECG wave. If the least significant bit is set (value 1), the sample is the offset of the region.
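A small sketch of how these three bits could be decoded. The bit meanings follow the description above; the constant and function names are hypothetical. In the table below, rows 86, 102, and 111 carry the values 4, 2, and 1 for the first label-24 region.

```python
# Bit masks for bound_or_peak (names are illustrative, not from the question).
ONSET = 0b100   # most significant of the three bits, value 4
PEAK = 0b010    # middle bit, value 2
OFFSET = 0b001  # least significant bit, value 1


def decode_bound_or_peak(value: int) -> list[str]:
    """Return the markers encoded in one bound_or_peak value (0..7)."""
    if not 0 <= value <= 7:
        raise ValueError(f"bound_or_peak must be in 0..7, got {value}")
    names = []
    if value & ONSET:
        names.append("onset")
    if value & PEAK:
        names.append("peak")
    if value & OFFSET:
        names.append("offset")
    return names


for v in (0, 4, 2, 1):
    print(v, decode_bound_or_peak(v))
```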

Here is the table produced by this code:

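# Colab only: render DataFrames as interactive tables
%load_ext google.colab.data_table
import numpy as np
import pandas as pd

# signal.signal_arr is the ECG record as a 2-D NumPy array, one row per sample
# (defined elsewhere in the notebook; not shown in this question)
matrix_data = signal.signal_arr

# Column names and the dtype each column should have
dtype_dict = {'seconds': float, 'amplitude': float, 'label_encoding': int, 'bound_or_peak': int}

# Convert the NumPy matrix to a pandas DataFrame with named, typed columns
df = pd.DataFrame(matrix_data, columns=list(dtype_dict)).astype(dtype_dict)

# Display the first 250 rows
df[:250]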

What I mean by "without destroying the other columns" is this: after resampling, the other columns, such as label_encoding and bound_or_peak, should stay where they were in df before resampling, while amplitude should be interpolated, preferably linearly.
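To see why the label columns cannot simply be interpolated like amplitude, here is a small sketch using two neighbouring rows of the table (rows 85 and 86, where label_encoding switches from 0 to 24 and the onset flag 4 appears). Time is written in milliseconds only to keep the arithmetic exact.

```python
import numpy as np

# Rows 85 and 86 of the table, time in milliseconds.
t_old = np.array([340.0, 344.0])
labels_old = np.array([0, 24])
flags_old = np.array([0, 4])

# A new sample halfway between them, as a 0.002 s (2 ms) step would create.
t_mid = 342.0

# Linear interpolation invents values that are neither valid classes nor
# meaningful bit flags.
mid_label = np.interp(t_mid, t_old, labels_old)
mid_flag = np.interp(t_mid, t_old, flags_old)
print(mid_label)  # 12.0, not one of the classes 0, 1, 24, 27
print(mid_flag)   # 2.0, which would decode as a "peak"
```

So labels need a nearest-sample rule, and flags need to stay only at their original instants, with 0 at the new samples; the accepted answer does the latter with `asfreq(fill_value=0)`.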

Actually, I have an idea to ignore the seconds column. Instead, that column can be compressed into a single value: the sampling frequency. So I think converting the timestep to a sampling frequency is a good idea: a timestep of 0.004 s gives 1/0.004 = 250 Hz.
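The conversion can be done directly from the seconds column. A sketch, assuming uniform sampling; taking the median step is a choice made here to stay insensitive to the float noise visible in the table, and `sampling_frequency` is a hypothetical name.

```python
import statistics


def sampling_frequency(seconds: list[float]) -> float:
    """Estimate the sampling frequency (Hz) of a uniformly sampled signal."""
    steps = [b - a for a, b in zip(seconds, seconds[1:])]
    # The median step ignores tiny float errors such as 0.036000000000000004.
    return 1.0 / statistics.median(steps)


# First ten seconds values from the table.
seconds = [0.0, 0.004, 0.008, 0.012, 0.016, 0.02, 0.024, 0.028, 0.032,
           0.036000000000000004]
fs = sampling_frequency(seconds)
print(round(fs))
```

Going the other way, a target sampling frequency gives the new step as its reciprocal, e.g. 1/500 = 0.002 s.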

Now the problem is how to resample, i.e. interpolate, the amplitude to another sampling frequency without destroying the other columns.
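If pandas' time-based resampling is not required, the same idea can be sketched with NumPy alone: amplitude linearly interpolated with `np.interp`, labels copied from the nearest original sample, and each nonzero bound_or_peak flag moved to the new sample closest to its original time. `resample_ecg` is a hypothetical helper; moving the flags this way is an assumption about what "located as is" should mean when the new grid does not contain every original timestamp (for example 250 Hz to 360 Hz).

```python
import numpy as np


def resample_ecg(seconds, amplitude, labels, flags, fs_new):
    """Resample one ECG record to fs_new Hz (a sketch, not a reference method).

    Assumes seconds is sorted and uniformly spaced and flags are integers.
    """
    seconds = np.asarray(seconds, dtype=float)
    labels = np.asarray(labels)
    flags = np.asarray(flags)

    # New grid from integer indices, kept inside the original span.
    n_new = int(np.floor((seconds[-1] - seconds[0]) * fs_new + 1e-9)) + 1
    t_new = seconds[0] + np.arange(n_new) / fs_new

    # Amplitude: linear interpolation between original samples.
    amp_new = np.interp(t_new, seconds, amplitude)

    # Labels: index of the nearest original sample (ties go to the left one).
    right = np.clip(np.searchsorted(seconds, t_new), 1, len(seconds) - 1)
    left = right - 1
    nearest = np.where(t_new - seconds[left] <= seconds[right] - t_new, left, right)
    labels_new = labels[nearest]

    # Flags: each nonzero flag goes to the closest new sample; OR-ing keeps
    # all bits if two flags land on the same sample when downsampling.
    flags_new = np.zeros(n_new, dtype=flags.dtype)
    for i in np.flatnonzero(flags):
        j = int(np.argmin(np.abs(t_new - seconds[i])))
        flags_new[j] |= flags[i]

    return t_new, amp_new, labels_new, flags_new


# Toy record at 250 Hz with one region: onset, peak, offset.
t = np.arange(6) * 0.004
amp = np.array([0.0, 0.04, 0.08, 0.04, 0.0, -0.04])
lab = np.array([0, 24, 24, 24, 0, 0])
flg = np.array([0, 4, 2, 1, 0, 0])
t2, a2, l2, f2 = resample_ecg(t, amp, lab, flg, fs_new=500)
print(f2.tolist())
```

When downsampling, a moved flag can land on a sample whose nearest-neighbour label differs from the flag's original label at a region boundary, so that case is worth checking if it matters for plotting.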

Update: As a commenter pointed out, I should have used a textual representation of the table instead of a picture:

index	seconds	amplitude	label_encoding	bound_or_peak
0	0.0	0.035	0	0
1	0.004	0.06	0	0
2	0.008	0.065	0	0
3	0.012	0.075	0	0
4	0.016	0.085	0	0
5	0.02	0.075	0	0
6	0.024	0.065	0	0
7	0.028	0.065	0	0
8	0.032	0.065	0	0
9	0.036000000000000004	0.07	0	0
10	0.04	0.075	0	0
11	0.044	0.075	0	0
12	0.048	0.075	0	0
13	0.052000000000000005	0.07	0	0
14	0.056	0.065	0	0
15	0.06	0.065	0	0
16	0.064	0.065	0	0
17	0.068	0.065	0	0
18	0.07200000000000001	0.065	0	0
19	0.076	0.06	0	0
20	0.08	0.055	0	0
21	0.084	0.04	0	0
22	0.088	0.03	0	0
23	0.092	0.015	0	0
24	0.096	0.0	0	0
25	0.1	-0.01	0	0
26	0.10400000000000001	-0.02	0	0
27	0.108	-0.03	0	0
28	0.112	-0.04	0	0
29	0.116	-0.05	0	0
30	0.12	-0.06	0	0
31	0.124	-0.07	0	0
32	0.128	-0.08	0	0
33	0.132	-0.09	0	0
34	0.136	-0.095	0	0
35	0.14	-0.09	0	0
36	0.14400000000000002	-0.085	0	0
37	0.148	-0.085	0	0
38	0.152	-0.085	0	0
39	0.156	-0.09	0	0
40	0.16	-0.095	0	0
41	0.164	-0.09	0	0
42	0.168	-0.085	0	0
43	0.17200000000000001	-0.085	0	0
44	0.176	-0.085	0	0
45	0.18	-0.085	0	0
46	0.184	-0.08	0	0
47	0.188	-0.075	0	0
48	0.192	-0.075	0	0
49	0.196	-0.075	0	0
50	0.2	-0.075	0	0
51	0.20400000000000001	-0.075	0	0
52	0.20800000000000002	-0.075	0	0
53	0.212	-0.075	0	0
54	0.216	-0.07	0	0
55	0.22	-0.065	0	0
56	0.224	-0.06	0	0
57	0.228	-0.055	0	0
58	0.232	-0.055	0	0
59	0.23600000000000002	-0.055	0	0
60	0.24	-0.065	0	0
61	0.244	-0.075	0	0
62	0.248	-0.075	0	0
63	0.252	-0.075	0	0
64	0.256	-0.07	0	0
65	0.26	-0.065	0	0
66	0.264	-0.06	0	0
67	0.268	-0.06	0	0
68	0.272	-0.07	0	0
69	0.276	-0.075	0	0
70	0.28	-0.075	0	0
71	0.28400000000000003	-0.075	0	0
72	0.28800000000000003	-0.075	0	0
73	0.292	-0.07	0	0
74	0.296	-0.06	0	0
75	0.3	-0.06	0	0
76	0.304	-0.07	0	0
77	0.308	-0.075	0	0
78	0.312	-0.08	0	0
79	0.316	-0.085	0	0
80	0.32	-0.085	0	0
81	0.324	-0.085	0	0
82	0.328	-0.08	0	0
83	0.332	-0.075	0	0
84	0.336	-0.075	0	0
85	0.34	-0.08	0	0
86	0.34400000000000003	-0.085	24	4
87	0.34800000000000003	-0.08	24	0
88	0.352	-0.075	24	0
89	0.356	-0.06	24	0
90	0.36	-0.045	24	0
91	0.364	-0.035	24	0
92	0.368	-0.025	24	0
93	0.372	-0.025	24	0
94	0.376	-0.025	24	0
95	0.38	-0.02	24	0
96	0.384	-0.015	24	0
97	0.388	-0.01	24	0
98	0.392	-0.005	24	0
99	0.396	0.005	24	0
100	0.4	0.02	24	0
101	0.404	0.035	24	0
102	0.40800000000000003	0.045	24	2
103	0.41200000000000003	0.05	24	0
104	0.41600000000000004	0.055	24	0
105	0.42	0.05	24	0
106	0.424	0.035	24	0
107	0.428	0.015	24	0
108	0.432	-0.005	24	0
109	0.436	-0.035	24	0
110	0.44	-0.05	24	0
111	0.444	-0.065	24	1
112	0.448	-0.08	0	0
113	0.452	-0.09	0	0
114	0.456	-0.095	0	0
115	0.46	-0.09	0	0
116	0.464	-0.085	0	0
117	0.468	-0.09	0	0
118	0.47200000000000003	-0.095	0	0
119	0.47600000000000003	-0.095	0	0
120	0.48	-0.095	0	0
121	0.484	-0.1	0	0
122	0.488	-0.105	0	0
123	0.492	-0.105	0	0
124	0.496	-0.105	0	0
125	0.5	-0.105	0	0
126	0.504	-0.115	0	0
127	0.508	-0.115	0	0
128	0.512	-0.11	0	0
129	0.516	-0.105	0	0
130	0.52	-0.105	0	0
131	0.524	-0.105	0	0
132	0.528	-0.095	0	0
133	0.532	-0.085	0	0
134	0.536	-0.09	0	0
135	0.54	-0.095	0	0
136	0.544	-0.09	0	0
137	0.548	-0.085	0	0
138	0.552	-0.08	1	4
139	0.556	-0.075	1	0
140	0.56	-0.08	1	0
141	0.5640000000000001	-0.07	1	0
142	0.5680000000000001	-0.025	1	0
143	0.5720000000000001	0.075	1	0
144	0.5760000000000001	0.25	1	0
145	0.58	0.54	1	0
146	0.584	0.96	1	0
147	0.588	1.41	1	2
148	0.592	1.885	1	0
149	0.596	1.735	1	0
150	0.6	1.09	1	0
151	0.604	0.35	1	0
152	0.608	-0.455	1	0
153	0.612	-0.725	1	0
154	0.616	-0.705	1	0
155	0.62	-0.54	1	0
156	0.624	-0.315	1	0
157	0.628	-0.195	1	0
158	0.632	-0.115	1	1
159	0.636	-0.09	0	0
160	0.64	-0.08	0	0
161	0.644	-0.075	0	0
162	0.648	-0.08	0	0
163	0.652	-0.085	0	0
164	0.656	-0.085	0	0
165	0.66	-0.085	0	0
166	0.664	-0.08	0	0
167	0.668	-0.08	0	0
168	0.672	-0.085	0	0
169	0.676	-0.085	0	0
170	0.68	-0.085	0	0
171	0.684	-0.075	0	0
172	0.6880000000000001	-0.065	0	0
173	0.6920000000000001	-0.07	0	0
174	0.6960000000000001	-0.075	0	0
175	0.7000000000000001	-0.07	0	0
176	0.704	-0.065	0	0
177	0.708	-0.06	0	0
178	0.712	-0.055	0	0
179	0.716	-0.05	0	0
180	0.72	-0.045	0	0
181	0.724	-0.04	0	0
182	0.728	-0.035	27	4
183	0.732	-0.035	27	0
184	0.736	-0.035	27	0
185	0.74	-0.035	27	0
186	0.744	-0.035	27	0
187	0.748	-0.03	27	0
188	0.752	-0.02	27	0
189	0.756	-0.01	27	0
190	0.76	-0.005	27	0
191	0.764	0.0	27	0
192	0.768	0.005	27	0
193	0.772	0.005	27	0
194	0.776	0.005	27	0
195	0.78	0.01	27	0
196	0.784	0.025	27	0
197	0.788	0.04	27	0
198	0.792	0.045	27	0
199	0.796	0.05	27	0
200	0.8	0.055	27	0
201	0.804	0.055	27	0
202	0.808	0.055	27	0
203	0.812	0.06	27	0
204	0.8160000000000001	0.065	27	0
205	0.8200000000000001	0.07	27	0
206	0.8240000000000001	0.085	27	0
207	0.8280000000000001	0.1	27	0
208	0.8320000000000001	0.105	27	0
209	0.836	0.105	27	0
210	0.84	0.11	27	0
211	0.844	0.115	27	0
212	0.848	0.12	27	0
213	0.852	0.125	27	0
214	0.856	0.12	27	2
215	0.86	0.115	27	0
216	0.864	0.115	27	0
217	0.868	0.115	27	0
218	0.872	0.115	27	0
219	0.876	0.115	27	0
220	0.88	0.115	27	0
221	0.884	0.115	27	0
222	0.888	0.115	27	0
223	0.892	0.115	27	0
224	0.896	0.11	27	0
225	0.9	0.105	27	0
226	0.904	0.1	27	0
227	0.908	0.09	27	0
228	0.912	0.07	27	0
229	0.916	0.05	27	0
230	0.92	0.035	27	0
231	0.924	0.015	27	0
232	0.928	-0.005	27	0
233	0.932	-0.02	27	0
234	0.936	-0.03	27	0
235	0.9400000000000001	-0.04	27	0
236	0.9440000000000001	-0.05	27	0
237	0.9480000000000001	-0.055	27	0
238	0.9520000000000001	-0.06	27	0
239	0.9560000000000001	-0.07	27	1
240	0.96	-0.08	0	0
241	0.964	-0.085	0	0
242	0.968	-0.085	0	0
243	0.972	-0.085	0	0
244	0.976	-0.085	0	0
245	0.98	-0.085	0	0
246	0.984	-0.08	0	0
247	0.988	-0.075	0	0
248	0.992	-0.08	0	0
249	0.996	-0.085	0	0

Original Q&A


**Reinderien** · Accepted Answer · 2024-01-21T14:06:10.107000

This is a straightforward application of resample(), but you have to make some aggregation decisions.

from io import StringIO

import pandas as pd

# Paste the full table here; '...' marks rows elided from this post
content = '''
index   seconds     amplitude   label_encoding  bound_or_peak
0   0.0     0.035   0   0
1   0.004   0.06    0   0
2   0.008   0.065   0   0
3   0.012   0.075   0   0
4   0.016   0.085   0   0
5   0.02    0.075   0   0
6   0.024   0.065   0   0
...
245     0.98    -0.085  0   0
246     0.984   -0.08   0   0
247     0.988   -0.075  0   0
248     0.992   -0.08   0   0
249     0.996   -0.085  0   0
'''
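with StringIO(content) as file:
    # sep=r'\s+' replaces delim_whitespace=True, deprecated since pandas 2.2
    df = pd.read_csv(file, sep=r'\s+')
# Convert float seconds to Timedelta so resample() can use a time rule like '2ms'
df['seconds'] *= pd.Timedelta(1, 's')
df.set_index('seconds', drop=True, inplace=True)

sampler = df.resample(rule='2ms')
# Label columns: copy the value of the nearest original sample
resampled = sampler.nearest()[['index', 'label_encoding']].copy()
# Amplitude: linear interpolation weighted by elapsed time
resampled['amplitude'] = sampler.interpolate('time')['amplitude']
# Markers: keep them at the original timestamps only, 0 at new samples
resampled['bound_or_peak'] = sampler.asfreq(fill_value=0)['bound_or_peak']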

pd.options.display.width = 200
pd.options.display.max_columns = 10
print(resampled)

                        index  label_encoding  amplitude  bound_or_peak
seconds                                                                
0 days 00:00:00             0               0     0.0350              0
0 days 00:00:00.002000      1               0     0.0475              0
0 days 00:00:00.004000      1               0     0.0600              0
0 days 00:00:00.006000      2               0     0.0625              0
0 days 00:00:00.008000      2               0     0.0650              0
...                       ...             ...        ...            ...
0 days 00:00:00.988000    247               0    -0.0750              0
0 days 00:00:00.990000    248               0    -0.0775              0
0 days 00:00:00.992000    248               0    -0.0800              0
0 days 00:00:00.994000    249               0    -0.0825              0
0 days 00:00:00.996000    249               0    -0.0850              0

[499 rows x 4 columns]
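A quick way to check that the upsampled frame still marks every region exactly once is to compare the ordered sequence of nonzero bound_or_peak values before and after. A plain-Python sketch with hypothetical function names; with the answer's variables it could be called as `markers_preserved(df['bound_or_peak'], resampled['bound_or_peak'])`. Filling bound_or_peak with `nearest()` instead of `asfreq(fill_value=0)` could copy a flag onto a neighbouring new sample, which this check would catch.

```python
def marker_sequence(flags):
    """Return the nonzero bound_or_peak values in their original order."""
    return [int(f) for f in flags if f != 0]


def markers_preserved(flags_before, flags_after) -> bool:
    """True if resampling neither dropped, duplicated, nor reordered a marker."""
    return marker_sequence(flags_before) == marker_sequence(flags_after)


before = [0, 4, 0, 2, 0, 1, 0]
good = [0, 0, 4, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0]   # flags at original instants only
duplicated = [0, 4, 4, 0, 2, 2, 0, 1, 1, 0]     # flags copied to neighbours
print(markers_preserved(before, good))
print(markers_preserved(before, duplicated))
```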
