Pandas resample signal series with its corresponding label


I have a table with these columns: Seconds, Amplitude, Labels, Metadata. It is essentially an ECG signal.

You can download the csv here: https://tmpfiles.org/3951223/question.csv

[Images: first, fourth, fifth, and last pages of the data]

As you can see, the timestep of the seconds column is 0.004. How can I resample the data to a new timestep, such as 0.002, without destroying the other columns?

Take label_encoding, for example: that column is the y label for machine learning, specifically a multiclass classification problem, and it marks the segmentation region. Its unique values are (24, 1, 27).

bound_or_peak, on the other hand, is intended for displaying or plotting the regions. It consists of 3 bits (max value is 7). If the most significant bit is set, the sample starts a region to plot (onset). If the second bit is set, the sample is the peak of an ECG wave. If the least significant bit is set, the sample ends the region to plot (offset).
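The bit layout described above can be decoded with plain bitwise masks. This is only a minimal sketch: the constant names and the `decode_bound_or_peak` helper are made up for illustration and are not part of the original code.

```python
# Hypothetical mask names for the 3-bit bound_or_peak value described above.
ONSET = 0b100   # most significant bit: region starts here
PEAK = 0b010    # second bit: peak of the wave
OFFSET = 0b001  # least significant bit: region ends here


def decode_bound_or_peak(value: int) -> dict:
    """Split a bound_or_peak value into its three boolean flags."""
    return {
        "onset": bool(value & ONSET),
        "peak": bool(value & PEAK),
        "offset": bool(value & OFFSET),
    }


print(decode_bound_or_peak(4))  # {'onset': True, 'peak': False, 'offset': False}
```

In the sample data only the values 0, 4, 2, and 1 occur, but the masks also handle combined values such as 6 (onset and peak on the same sample).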

Here is the table produced by this code:

%load_ext google.colab.data_table
import numpy as np
import pandas as pd

# Create a NumPy matrix with row and column labels
matrix_data = signal.signal_arr

dtype_dict = {'seconds': float, 'amplitude': float, 'label_encoding': int, 'bound_or_peak': int}

# Convert the NumPy matrix to a Pandas DataFrame with labels
df = pd.DataFrame(matrix_data, columns=dtype_dict.keys()).astype(dtype_dict)

# Display the DataFrame
df[:250]

What I mean by not destroying the other columns: after resampling, the other columns, such as label_encoding and bound_or_peak, should stay where they were in the df before resampling, while amplitude should be interpolated, specifically linearly interpolated.

Actually, I have an idea: ignore the seconds column. That column can be compressed into a single value, the sampling frequency, so converting the timestep to a sampling frequency seems like a good idea. A timestep of 0.004 means 1/0.004, so the sampling frequency is 250.
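The timestep-to-frequency conversion can be sketched as below. The short `seconds` array is a stand-in for the real column, and taking the median step before rounding is an assumption made here so that float noise in the timestamps (such as 0.036000000000000004 in the table) does not affect the result.

```python
import numpy as np

# Stand-in for the first few values of the seconds column.
seconds = np.array([0.0, 0.004, 0.008, 0.012, 0.016])

# Typical spacing between consecutive samples.
step = float(np.median(np.diff(seconds)))

# Sampling frequency in Hz, rounded to the nearest integer.
fs = round(1.0 / step)
print(fs)  # 250
```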

Now the problem is how to resample or interpolate the amplitude to another sampling frequency without destroying the other columns.
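One way to do this purely by sample index, without the seconds column, is sketched below. It is an illustration under stated assumptions, not a definitive method: the function name `resample_by_rate` is hypothetical, labels are forward-filled from the preceding original sample, and each non-zero bound_or_peak marker is moved to the nearest new sample (markers that land on the same sample are OR-ed together).

```python
import numpy as np
import pandas as pd


def resample_by_rate(df: pd.DataFrame, fs_old: float, fs_new: float) -> pd.DataFrame:
    """Resample by sample index; column names follow the question's table."""
    n_old = len(df)
    ratio = fs_new / fs_old
    n_new = int(np.floor((n_old - 1) * ratio)) + 1

    # Positions of the new samples, measured in units of the old sample index.
    new_pos = np.arange(n_new) / ratio

    # Amplitude: linear interpolation between neighbouring original samples.
    amplitude = np.interp(new_pos, np.arange(n_old), df["amplitude"].to_numpy())

    # Labels: take the preceding original sample (forward fill).
    # The small epsilon guards against positions landing just below an integer.
    src = np.floor(new_pos + 1e-9).astype(int)
    labels = df["label_encoding"].to_numpy()[src]

    # Markers: point events, so place each one on the nearest new sample.
    flags_old = df["bound_or_peak"].to_numpy()
    flags_new = np.zeros(n_new, dtype=flags_old.dtype)
    marked = np.flatnonzero(flags_old)
    target = np.clip(np.rint(marked * ratio).astype(int), 0, n_new - 1)
    np.bitwise_or.at(flags_new, target, flags_old[marked])

    return pd.DataFrame({
        "seconds": np.arange(n_new) / fs_new,
        "amplitude": amplitude,
        "label_encoding": labels,
        "bound_or_peak": flags_new,
    })


# Tiny made-up frame: a region labelled 24 with an onset (4) and an offset (1).
demo = pd.DataFrame({
    "amplitude": [0.0, 0.1, 0.3, 0.2],
    "label_encoding": [0, 24, 24, 0],
    "bound_or_peak": [0, 4, 1, 0],
})
out = resample_by_rate(demo, fs_old=250, fs_new=500)
print(out)
```

With forward-filled labels, a region extends one new sample past its offset marker (the half-way point still carries the preceding label). Whether that is acceptable is a decision that depends on how the labels are used.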

Update: As a commenter said, I should have used a textual representation to show the table instead of a picture:

index seconds amplitude label_encoding bound_or_peak
0 0.0 0.035 0 0
1 0.004 0.06 0 0
2 0.008 0.065 0 0
3 0.012 0.075 0 0
4 0.016 0.085 0 0
5 0.02 0.075 0 0
6 0.024 0.065 0 0
7 0.028 0.065 0 0
8 0.032 0.065 0 0
9 0.036000000000000004 0.07 0 0
10 0.04 0.075 0 0
11 0.044 0.075 0 0
12 0.048 0.075 0 0
13 0.052000000000000005 0.07 0 0
14 0.056 0.065 0 0
15 0.06 0.065 0 0
16 0.064 0.065 0 0
17 0.068 0.065 0 0
18 0.07200000000000001 0.065 0 0
19 0.076 0.06 0 0
20 0.08 0.055 0 0
21 0.084 0.04 0 0
22 0.088 0.03 0 0
23 0.092 0.015 0 0
24 0.096 0.0 0 0
25 0.1 -0.01 0 0
26 0.10400000000000001 -0.02 0 0
27 0.108 -0.03 0 0
28 0.112 -0.04 0 0
29 0.116 -0.05 0 0
30 0.12 -0.06 0 0
31 0.124 -0.07 0 0
32 0.128 -0.08 0 0
33 0.132 -0.09 0 0
34 0.136 -0.095 0 0
35 0.14 -0.09 0 0
36 0.14400000000000002 -0.085 0 0
37 0.148 -0.085 0 0
38 0.152 -0.085 0 0
39 0.156 -0.09 0 0
40 0.16 -0.095 0 0
41 0.164 -0.09 0 0
42 0.168 -0.085 0 0
43 0.17200000000000001 -0.085 0 0
44 0.176 -0.085 0 0
45 0.18 -0.085 0 0
46 0.184 -0.08 0 0
47 0.188 -0.075 0 0
48 0.192 -0.075 0 0
49 0.196 -0.075 0 0
50 0.2 -0.075 0 0
51 0.20400000000000001 -0.075 0 0
52 0.20800000000000002 -0.075 0 0
53 0.212 -0.075 0 0
54 0.216 -0.07 0 0
55 0.22 -0.065 0 0
56 0.224 -0.06 0 0
57 0.228 -0.055 0 0
58 0.232 -0.055 0 0
59 0.23600000000000002 -0.055 0 0
60 0.24 -0.065 0 0
61 0.244 -0.075 0 0
62 0.248 -0.075 0 0
63 0.252 -0.075 0 0
64 0.256 -0.07 0 0
65 0.26 -0.065 0 0
66 0.264 -0.06 0 0
67 0.268 -0.06 0 0
68 0.272 -0.07 0 0
69 0.276 -0.075 0 0
70 0.28 -0.075 0 0
71 0.28400000000000003 -0.075 0 0
72 0.28800000000000003 -0.075 0 0
73 0.292 -0.07 0 0
74 0.296 -0.06 0 0
75 0.3 -0.06 0 0
76 0.304 -0.07 0 0
77 0.308 -0.075 0 0
78 0.312 -0.08 0 0
79 0.316 -0.085 0 0
80 0.32 -0.085 0 0
81 0.324 -0.085 0 0
82 0.328 -0.08 0 0
83 0.332 -0.075 0 0
84 0.336 -0.075 0 0
85 0.34 -0.08 0 0
86 0.34400000000000003 -0.085 24 4
87 0.34800000000000003 -0.08 24 0
88 0.352 -0.075 24 0
89 0.356 -0.06 24 0
90 0.36 -0.045 24 0
91 0.364 -0.035 24 0
92 0.368 -0.025 24 0
93 0.372 -0.025 24 0
94 0.376 -0.025 24 0
95 0.38 -0.02 24 0
96 0.384 -0.015 24 0
97 0.388 -0.01 24 0
98 0.392 -0.005 24 0
99 0.396 0.005 24 0
100 0.4 0.02 24 0
101 0.404 0.035 24 0
102 0.40800000000000003 0.045 24 2
103 0.41200000000000003 0.05 24 0
104 0.41600000000000004 0.055 24 0
105 0.42 0.05 24 0
106 0.424 0.035 24 0
107 0.428 0.015 24 0
108 0.432 -0.005 24 0
109 0.436 -0.035 24 0
110 0.44 -0.05 24 0
111 0.444 -0.065 24 1
112 0.448 -0.08 0 0
113 0.452 -0.09 0 0
114 0.456 -0.095 0 0
115 0.46 -0.09 0 0
116 0.464 -0.085 0 0
117 0.468 -0.09 0 0
118 0.47200000000000003 -0.095 0 0
119 0.47600000000000003 -0.095 0 0
120 0.48 -0.095 0 0
121 0.484 -0.1 0 0
122 0.488 -0.105 0 0
123 0.492 -0.105 0 0
124 0.496 -0.105 0 0
125 0.5 -0.105 0 0
126 0.504 -0.115 0 0
127 0.508 -0.115 0 0
128 0.512 -0.11 0 0
129 0.516 -0.105 0 0
130 0.52 -0.105 0 0
131 0.524 -0.105 0 0
132 0.528 -0.095 0 0
133 0.532 -0.085 0 0
134 0.536 -0.09 0 0
135 0.54 -0.095 0 0
136 0.544 -0.09 0 0
137 0.548 -0.085 0 0
138 0.552 -0.08 1 4
139 0.556 -0.075 1 0
140 0.56 -0.08 1 0
141 0.5640000000000001 -0.07 1 0
142 0.5680000000000001 -0.025 1 0
143 0.5720000000000001 0.075 1 0
144 0.5760000000000001 0.25 1 0
145 0.58 0.54 1 0
146 0.584 0.96 1 0
147 0.588 1.41 1 2
148 0.592 1.885 1 0
149 0.596 1.735 1 0
150 0.6 1.09 1 0
151 0.604 0.35 1 0
152 0.608 -0.455 1 0
153 0.612 -0.725 1 0
154 0.616 -0.705 1 0
155 0.62 -0.54 1 0
156 0.624 -0.315 1 0
157 0.628 -0.195 1 0
158 0.632 -0.115 1 1
159 0.636 -0.09 0 0
160 0.64 -0.08 0 0
161 0.644 -0.075 0 0
162 0.648 -0.08 0 0
163 0.652 -0.085 0 0
164 0.656 -0.085 0 0
165 0.66 -0.085 0 0
166 0.664 -0.08 0 0
167 0.668 -0.08 0 0
168 0.672 -0.085 0 0
169 0.676 -0.085 0 0
170 0.68 -0.085 0 0
171 0.684 -0.075 0 0
172 0.6880000000000001 -0.065 0 0
173 0.6920000000000001 -0.07 0 0
174 0.6960000000000001 -0.075 0 0
175 0.7000000000000001 -0.07 0 0
176 0.704 -0.065 0 0
177 0.708 -0.06 0 0
178 0.712 -0.055 0 0
179 0.716 -0.05 0 0
180 0.72 -0.045 0 0
181 0.724 -0.04 0 0
182 0.728 -0.035 27 4
183 0.732 -0.035 27 0
184 0.736 -0.035 27 0
185 0.74 -0.035 27 0
186 0.744 -0.035 27 0
187 0.748 -0.03 27 0
188 0.752 -0.02 27 0
189 0.756 -0.01 27 0
190 0.76 -0.005 27 0
191 0.764 0.0 27 0
192 0.768 0.005 27 0
193 0.772 0.005 27 0
194 0.776 0.005 27 0
195 0.78 0.01 27 0
196 0.784 0.025 27 0
197 0.788 0.04 27 0
198 0.792 0.045 27 0
199 0.796 0.05 27 0
200 0.8 0.055 27 0
201 0.804 0.055 27 0
202 0.808 0.055 27 0
203 0.812 0.06 27 0
204 0.8160000000000001 0.065 27 0
205 0.8200000000000001 0.07 27 0
206 0.8240000000000001 0.085 27 0
207 0.8280000000000001 0.1 27 0
208 0.8320000000000001 0.105 27 0
209 0.836 0.105 27 0
210 0.84 0.11 27 0
211 0.844 0.115 27 0
212 0.848 0.12 27 0
213 0.852 0.125 27 0
214 0.856 0.12 27 2
215 0.86 0.115 27 0
216 0.864 0.115 27 0
217 0.868 0.115 27 0
218 0.872 0.115 27 0
219 0.876 0.115 27 0
220 0.88 0.115 27 0
221 0.884 0.115 27 0
222 0.888 0.115 27 0
223 0.892 0.115 27 0
224 0.896 0.11 27 0
225 0.9 0.105 27 0
226 0.904 0.1 27 0
227 0.908 0.09 27 0
228 0.912 0.07 27 0
229 0.916 0.05 27 0
230 0.92 0.035 27 0
231 0.924 0.015 27 0
232 0.928 -0.005 27 0
233 0.932 -0.02 27 0
234 0.936 -0.03 27 0
235 0.9400000000000001 -0.04 27 0
236 0.9440000000000001 -0.05 27 0
237 0.9480000000000001 -0.055 27 0
238 0.9520000000000001 -0.06 27 0
239 0.9560000000000001 -0.07 27 1
240 0.96 -0.08 0 0
241 0.964 -0.085 0 0
242 0.968 -0.085 0 0
243 0.972 -0.085 0 0
244 0.976 -0.085 0 0
245 0.98 -0.085 0 0
246 0.984 -0.08 0 0
247 0.988 -0.075 0 0
248 0.992 -0.08 0 0
249 0.996 -0.085 0 0
Best answer, by Reinderien:

This is a straightforward application of resample(), but you have to make some aggregation decisions.

from io import StringIO

import pandas as pd

content = '''
index   seconds     amplitude   label_encoding  bound_or_peak
0   0.0     0.035   0   0
1   0.004   0.06    0   0
2   0.008   0.065   0   0
3   0.012   0.075   0   0
4   0.016   0.085   0   0
5   0.02    0.075   0   0
6   0.024   0.065   0   0
...
245     0.98    -0.085  0   0
246     0.984   -0.08   0   0
247     0.988   -0.075  0   0
248     0.992   -0.08   0   0
249     0.996   -0.085  0   0
'''
with StringIO(content) as file:
    df = pd.read_csv(file, sep=r'\s+')  # delim_whitespace is deprecated since pandas 2.2
df['seconds'] *= pd.Timedelta(1, 's')
df.set_index('seconds', drop=True, inplace=True)

sampler = df.resample(rule='2ms')
resampled = sampler.nearest()[['index', 'label_encoding']]
resampled['amplitude'] = sampler.interpolate('time')['amplitude']
resampled['bound_or_peak'] = sampler.asfreq(fill_value=0)['bound_or_peak']

pd.options.display.width = 200
pd.options.display.max_columns = 10
print(resampled)
                        index  label_encoding  amplitude  bound_or_peak
seconds                                                                
0 days 00:00:00             0               0     0.0350              0
0 days 00:00:00.002000      1               0     0.0475              0
0 days 00:00:00.004000      1               0     0.0600              0
0 days 00:00:00.006000      2               0     0.0625              0
0 days 00:00:00.008000      2               0     0.0650              0
...                       ...             ...        ...            ...
0 days 00:00:00.988000    247               0    -0.0750              0
0 days 00:00:00.990000    248               0    -0.0775              0
0 days 00:00:00.992000    248               0    -0.0800              0
0 days 00:00:00.994000    249               0    -0.0825              0
0 days 00:00:00.996000    249               0    -0.0850              0

[499 rows x 4 columns]
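Because the table in the snippet above is elided, here is a small self-contained variant of the same idea that can be run as is. The four-row `small` frame is made up for illustration. Two details differ from the code above and are assumptions of this sketch: the timestamps are rounded to whole microseconds before building the index, so float noise cannot shift a sample, and the index is converted back to float seconds at the end.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the question's table.
small = pd.DataFrame({
    "seconds": [0.0, 0.004, 0.008, 0.012],
    "amplitude": [0.0, 0.1, 0.3, 0.2],
    "label_encoding": [0, 24, 24, 0],
    "bound_or_peak": [0, 4, 1, 0],
})

# Round to whole microseconds first so float noise cannot shift a timestamp.
micros = np.rint(small["seconds"].to_numpy() * 1e6).astype("int64")
small = small.drop(columns="seconds")
small.index = pd.to_timedelta(micros, unit="us")
small.index.name = "seconds"

sampler = small.resample("2ms")
out = pd.DataFrame({
    # Signal values: linear interpolation between original samples.
    "amplitude": sampler.interpolate("linear")["amplitude"],
    # Region labels: copy from the nearest original sample.
    "label_encoding": sampler.nearest()["label_encoding"],
    # Point markers: keep only on the original timestamps, 0 elsewhere.
    "bound_or_peak": sampler.asfreq(fill_value=0)["bound_or_peak"],
})

# Back to plain float seconds for plotting or export.
out.index = out.index.total_seconds()
print(out)
```

The three columns use three different fill rules on purpose: interpolating labels or markers would produce values that are not valid classes or bit patterns.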