Operations on columns of multiple Pandas files
I am trying to perform some arithmetic operations with pandas in Python and merge the result back into one of the files.
Path_1: File_1.csv, File_2.csv, ...
This path contains several files, expected to grow in number over time, with the following columns:
File_1.csv | File_2.csv
Nos,12:00:00 | Nos,12:30:00
123,1451     | 485,5464
656,4544     | 456,4865
853,5484     | 658,4584
Path_2: Master_1.csv
Nos,00:00:00
123,2000
485,1500
656,1000
853,2500
456,4500
658,5000
I am trying to read the n .csv files from Path_1 and compare their time headers (col[1] to col[last]) against the time columns in Master_1.csv.

If Master_1.csv does not have a given time, a new column with that time header should be created, matched on col['Nos'], and filled with the file's values subtracted from col[1] of Master_1.csv.

If a column with that time from the path_1 file is already present, it should look up col['Nos'] and replace NAN with the subtracted values for that col['Nos'].
i.e. the expected result in Master_1.csv:
Nos,00:00:00,12:00:00,12:30:00
123,2000,549,NAN
485,1500,NAN,3964
656,1000,3544,NAN
853,2500,2984,NAN
456,4500,NAN,365
658,5000,NAN,-416
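For concreteness, the first expected value can be checked by hand: for Nos 123, the 12:00:00 reading from File_1.csv is subtracted from the 00:00:00 value in Master_1.csv, and 12:30:00 stays NAN because 123 does not appear in File_2.csv. A minimal sketch of that arithmetic:

```python
# Values copied from the sample data above for Nos 123.
master_value = 2000   # Master_1.csv, column 00:00:00
file_1_value = 1451   # File_1.csv, column 12:00:00

print(master_value - file_1_value)  # 549, the value expected under 12:00:00
```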
I can manage the arithmetic itself, but I get hung up on matching Nos and the timeseries columns. I tried flattening the code and working around the loop, without success. Any help in this context is appreciated. Thanks.
import os
import pandas as pd
import numpy as np

path_1 = '/'
path_2 = '/'

# the time series header is different in every file, e.g. 12:00, 12:30, 17:30
df_1 = pd.read_csv(os.path.join(path_1, '*.csv'), index_col=None, names=['Nos', 'timeseries'])
df_2 = pd.read_csv('master_1.csv', index_col=None, names=['Nos', '00:00:00'])  # 00:00:00 time series

for Nos in df_1 and df_2:
    df_1['Nos'] = df_2['Nos']
    new_tseries = df_2['00:00:00'] - df_1['timeseries']

# new_tseries is the dynamic time series that every .csv file from path_1 will have
merged = pd.concat([df_2, new_tseries], axis=0)
You can do this in three steps:
- Read your csv files into a list of data frames.
- Merge them into the master data frame (equivalent to a SQL left join or an Excel VLOOKUP).
- Calculate the derived columns using vectorized subtraction.
Here's some code you can try:

#read dataframes into a list
import glob
import pandas as pd

L = []
for fname in glob.glob(path_1 + '*.csv'):
    L.append(pd.read_csv(fname))

#read the master dataframe, and merge in the other dataframes (left join on 'Nos')
df_2 = pd.read_csv('master_1.csv')
for df in L:
    df_2 = pd.merge(df_2, df, on='Nos', how='left')

#for each merged time column, calculate the difference from the master column
time_cols = [c for c in df_2.columns if c not in ('Nos', '00:00:00')]
df_2[time_cols] = df_2[time_cols].rsub(df_2['00:00:00'], axis=0)
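Putting the three steps together on the sample data from the question — a self-contained sketch using in-memory strings in place of the actual csv files, and assuming the sign convention "master minus file value" (as in the 549 example; a couple of signs in the question's expected table differ from this):

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the files in path_1 and for Master_1.csv.
file_1 = io.StringIO("Nos,12:00:00\n123,1451\n656,4544\n853,5484\n")
file_2 = io.StringIO("Nos,12:30:00\n485,5464\n456,4865\n658,4584\n")
master = io.StringIO("Nos,00:00:00\n123,2000\n485,1500\n656,1000\n853,2500\n456,4500\n658,5000\n")

# Left-join each file onto the master on 'Nos'; missing Nos become NaN.
df_2 = pd.read_csv(master)
for src in (file_1, file_2):
    df_2 = pd.merge(df_2, pd.read_csv(src), on='Nos', how='left')

# Subtract each merged time column from the master 00:00:00 column, row-aligned.
time_cols = [c for c in df_2.columns if c not in ('Nos', '00:00:00')]
df_2[time_cols] = df_2[time_cols].rsub(df_2['00:00:00'], axis=0)

print(df_2)
```

The merge leaves NaN where a Nos is absent from a file, and the subtraction propagates those NaN values, so only the rows present in each file get a computed difference.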