Looking for Python help converting data from long to wide format
My data looks something like this:
channelId,utc,scet,val1,val2
A-0001,2024-061T22:00:05.02064,0.03,3,
A-0002,2024-061T22:00:06.02064,0.07,2,
A-0001,2024-061T22:00:11.02064,0.02,2,
A-0002,2024-061T22:00:12.02064,0.05,7,
A-0001,2024-061T22:01:12.365611,0.01,1.5,
A-0002,2024-061T22:01:14.365611,.07,16
and I want to generate a table with the following columns:
Time, A-0001_val1, A-0001_val2, A-0002_val1, A-0002_val2, ...
Because the channels are not sampled at the same timestamps, I would like to collapse the data into 1-minute intervals.
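Collapsing to 1-minute intervals means picking a bucketing rule. The small sketch below uses three timestamps from the sample plus one hypothetical reading at 22:00:45 (not in the posted data). It shows that dt.floor('min') keeps each reading in the minute it occurred in, while dt.round('min') moves readings from the second half of a minute into the next one.

```python
import pandas as pd

# Three timestamps from the sample, plus a HYPOTHETICAL reading at 22:00:45
# (not in the posted data) to show where floor and round disagree
raw = pd.Series([
    "2024-061T22:00:05.02064",
    "2024-061T22:00:11.02064",
    "2024-061T22:01:12.365611",
    "2024-061T22:00:45.000000",
])
# %j is the day of the year, so 2024-061 is 1 March 2024 (a leap year)
ts = pd.to_datetime(raw, format="%Y-%jT%H:%M:%S.%f")

floored = ts.dt.floor("min")  # the minute each reading falls in
rounded = ts.dt.round("min")  # nearest minute: 22:00:45 becomes 22:01
print(pd.DataFrame({"utc": ts, "floor": floored, "round": rounded}))
```

For "collapse into the minute it happened in", floor is usually the safer choice. Round is fine if nearest-minute alignment is what you want.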
So far I have this:
import pandas as pd
# Read the input table into a DataFrame
df = pd.read_csv('~/Desktop/test_file_1.csv')
# Parse the day-of-year (%j) timestamps, e.g. 2024-061T22:00:05.02064
df['utc'] = pd.to_datetime(df['utc'], format='%Y-%jT%H:%M:%S.%f')
# Round timestamps to the nearest minute
df['utc'] = df['utc'].dt.round('min')
# Pivot to wide format; pivot_table averages duplicates within a minute (aggfunc='mean' by default)
# The resulting index is already a DatetimeIndex named 'utc'
df_pivot = df.pivot_table(index='utc', columns='channelId', values=['val1', 'val2'])
# Resample to get a row for every minute ('min' replaces the deprecated 'T' alias)
df_resampled = df_pivot.resample('min').last().ffill()
# Flatten the multi-level column index: ('val1', 'A-0001') -> 'A-0001_val1'
df_resampled.columns = [f'{col[1]}_{col[0]}' for col in df_resampled.columns.values]
# Move the timestamp index back into a column (it is named 'utc')
df_resampled.reset_index(inplace=True)
# Rename the timestamp column to Time
df_resampled.rename(columns={'utc': 'Time'}, inplace=True)
df_final = df_resampled[['Time', *sorted(df_resampled.columns[1:])]]
# Write the output table to a CSV file
df_final.to_csv('output_table_3.csv', index=False)
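In the sample every minute has at least one reading, so the resample('min').last().ffill() step does not change anything there. The sketch below uses a small hypothetical wide frame (not the posted data) with a gap between 22:00 and 22:03. It shows what that step does when a minute has no readings.

```python
import pandas as pd

# HYPOTHETICAL wide frame with readings at 22:00 and 22:03 only
wide = pd.DataFrame(
    {"A-0001_val1": [0.02, 0.01], "A-0002_val1": [0.05, 0.07]},
    index=pd.to_datetime(["2024-03-01 22:00", "2024-03-01 22:03"]),
)

# resample('min') creates a row for every minute in the range; empty minutes
# come out as NaN from last(), and ffill() copies the previous values into them
filled = wide.resample("min").last().ffill()
print(filled)
```

The 22:01 and 22:02 rows repeat the 22:00 values. If stale readings should stay visible as missing, drop the ffill() and keep the NaNs.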
My output looks like this:
Time, A-0001_val1, A-0001_val2, A-0002_val1, A-0002_val2
2024-03-01 22:00:00,0.02,2,0.05,7
2024-03-01 22:01:00,0.01,1.5,0.06,16
I think it's okay, but I'm curious whether anyone has a better approach.
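One possible alternative is sketched below; it is not necessarily the best one. It buckets with dt.floor, keeps the last reading per channel in each minute with groupby(...).last(), and pivots with unstack. swaplevel and sort_index then order the columns as A-0001_val1, A-0001_val2, and so on, so no string sort is needed. Two assumptions are made here. First, each row carries one val1/val2 pair: the scet column is dropped from the header, because the posted output pairs 0.02 with 2. Second, the last reading is wanted rather than the mean that pivot_table uses by default.

```python
import io

import pandas as pd

# Sample rows from the question, ASSUMING one val1/val2 pair per row
# (scet dropped from the header, as the posted output suggests)
DATA = """channelId,utc,val1,val2
A-0001,2024-061T22:00:05.02064,0.03,3
A-0002,2024-061T22:00:06.02064,0.07,2
A-0001,2024-061T22:00:11.02064,0.02,2
A-0002,2024-061T22:00:12.02064,0.05,7
A-0001,2024-061T22:01:12.365611,0.01,1.5
A-0002,2024-061T22:01:14.365611,.07,16
"""

df = pd.read_csv(io.StringIO(DATA))
df["utc"] = pd.to_datetime(df["utc"], format="%Y-%jT%H:%M:%S.%f")
# Assign each reading to the minute it falls in
df["Time"] = df["utc"].dt.floor("min")

wide = (
    df.groupby(["Time", "channelId"])[["val1", "val2"]]
    .last()                # latest reading per channel within each minute
    .unstack("channelId")  # columns become (value, channelId)
    .swaplevel(axis=1)     # -> (channelId, value)
    .sort_index(axis=1)    # A-0001/val1, A-0001/val2, A-0002/val1, ...
)
# Flatten ('A-0001', 'val1') -> 'A-0001_val1' and turn Time back into a column
wide.columns = [f"{channel}_{value}" for channel, value in wide.columns]
wide = wide.reset_index()
print(wide)
```

If every minute must appear even when no channel reported, the original code's resample('min').last().ffill() can be applied to wide before reset_index().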
You were very close to solving it:
The original data is
and the code ought to be
which gives