Column and Row manipulation Python Pandas

1k Views Asked by At

This is my first program in Pandas on my own and I am trying to perform some csv manipulations by column and rows. I have a transition repository of multiple files. The transition repository keep adding file into it. I am trying to dynamically read the files and perform some operations and write to a master csv file in another folder.

Input

1. Folder_1: `Transition_Data`  


Test_1.csv, Test_2.csv

    Nos,Time,Count          Nos,Time,Count
    -------------------     ------------------
    2341,12:00:00,9865      1234,12:30:00,7865
    7352,12:00:00,8969      8435,12:30:00,7649

2. Folder2: Data_repository:Master_2.csv


    Nos,00:00:00
    ------------
    1234,1000
    8435,5243
    2341,563
    7352,345

3.Expected Output 

Nos,00:00:00,12:00:00,12:30:00
----------------------------------
1234,1000,0,6865
8435,5243,0,2406
2341,563,9302,0
7352,345,8624,0

Read Nos column from transition_data files and check where Nos is located in Master_2.csv The create a new column with Time as new header every time and subtract the col[2] of Transition_data files with col[1] of Master_2.csv fill the new value in newly created column if data gaps be filled with 0. I did try with few examples around but I messed up.

Program Updates as explained below and now have problem routing the logic with file read and write

    import pandas as pd
    import os
    import numpy as np
    import glob

path_1 = '/Transition_Data/'
path_2 = 'Data_repository/Master_2.csv'

df_1 = pd.DataFrame(dict(Nos=Nos, Time=Time, Count=Count))

pivot = pd.pivot_table(path_1, '/.*CSV, index='Nos', columns='Time', values='Count')

df_master = pd.DataFrame('Master_2.csv', {'Nos':, '00:00:00':}).set_index('Nos')

result = df_master.join(pivot, how='inner')

result[result.columns[1:]] = result[result.columns[1:]].sub(result[result.columns[0]], axis=0)

result.fillna(0)

I tried the above program and got the following error

Traceback (most recent call last):
  File "read_test.py", line 19, in <module>
    df = pd.read_csv(filename, header='Count')
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 476, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4538)
TypeError: an integer is required
1

There are 1 best solutions below

0
On

The easiest way I can see to do this is to join them all into a single DataFrame, sort the columns by time, then shift and subtract to get the delta:

import pandas as pd
import os

path_1 = 'Transition_Data/'
path_2 = 'Data_repository/Master_2.csv'

# Read data, and combine "transition" data into 
# single joined data frame
master = pd.read_csv(path_2)
other_data = pd.concat([
        pd.read_csv(path_1 + f) for f in os.listdir(path_1)
    ])

# Index master data frame by Nos
master.set_index('Nos', inplace=True)

# Index transition data by Nos and Time
other_data.set_index(['Nos', 'Time'], inplace=True)

# Convert to series (to remove Count column heading)
# and unstack time to convert to columns
other_data = other_data['Count'].unstack('Time')

# Join the data sets on the Time axis
joined = pd.concat([master, other_data], axis=1)

# Sort the data sets by Time
joined = joined.sort_index(axis=1)

# Fill na values with data in previous period
joined = joined.fillna(method='pad',axis=1)

# Shift dataframe and subtract to get delta
delta = joined - joined.shift(axis=1).fillna(0)

print(delta)

This gives your desired output:

      00:00:00  12:00:00  12:30:00
Nos                               
1234      1000         0      6865
2341       563      9302         0
7352       345      8624         0
8435      5243         0      2406