Discussion in CSV file
I have a CSV file with three columns as shown below and I want to count the "AT" and "AT + LH" value for each type and date, and then put it in a separate column in another CSV file.
Input CSV file:
╔═══════╦═══════╦═══════╗
║ Date ║ Type ║ Value ║
╠═══════╬═══════╬═══════╣
║ Nov-2 ║ M2222 ║ AT ║
║ Nov-2 ║ M2222 ║ LH ║
║ Nov-3 ║ M2222 ║ LH ║
║ Nov-3 ║ M2223 ║ LH ║
║ Nov-3 ║ M2223 ║ AT ║
╚═══════╩═══════╩═══════╝
I need the output like this:
Output CSV file:
╔═══════╦═══════╦═════════════════════════╦═════════════╗
║ Date ║ Type ║ Total (Total = AT + LH) ║ Value of AT ║
╠═══════╬═══════╬═════════════════════════╬═════════════╣
║ Nov-2 ║ M2222 ║ 2 ║ 1 ║
║ Nov-3 ║ M2222 ║ 1 ║ 0 ║
║ Nov-3 ║ M2223 ║ 2 ║ 1 ║
╚═══════╩═══════╩═════════════════════════╩═════════════╝
I've tried using dictionaries for this and couldn't get the desired result. But as a result I end up with Date and Type columns in one column.
Code:
from collections import Counter
import csv
from collections import defaultdict
from collections import OrderedDict
import collections
import pandas as pd
with open(r'file1','wb') as crack,open(r'file2','wb') as crack1:
w = csv.writer(crack,delimiter=',', quoting = csv.QUOTE_ALL)
w1 = csv.writer(crack1,delimiter=',', quoting = csv.QUOTE_ALL)
for row in zip(['DATE & BASIC TYPE'],['Total Triggerred']):
w.writerow(row)
for row in zip(['DATE & BASIC TYPE'],['No of LOH']):
w1.writerow(row)
with open(r'inputfile, 'rb') as infile:
reader = csv.reader(infile)
next(reader, None)
dict1=defaultdict(list)
for row in reader:
dict1[(row[0],row[1])].append(int(row[3]))
for key, value in dict1.iteritems():
for row in zip([key], [sum(dict1[key])]):
#print key, sum(dict1[key])
w.writerow(row)
with open(r'inputfile', 'rb') as infile:
reader = csv.reader(infile)
next(reader, None)
dict2=defaultdict(list)
for row in reader:
if row[2]=='AUTO RELEASE':
dict2[(row[0],row[1])].append(int(row[3]))
for key, value in dict2.iteritems():
for row in zip([key], [sum(dict2[key])]):
#print key, sum(dict2[key])
w1.writerow(row)
a = pd.read_csv(r'file1.csv')
b = pd.read_csv(r'file2.csv')
merged = a.merge(b, on='DATE & BASIC TYPE')
merged.to_csv(r'outputfile.csv', index=False)
I am getting Output like this:
╔═══════════════════╦═══════╦═════════════╗
║ Date & Type ║ Total ║ Value of AT ║
╠═══════════════════╬═══════╬═════════════╣
║ ('Nov-2','M2222') ║ 2 ║ 1 ║
║ ('Nov-3','M2222') ║ 1 ║ 0 ║
║ ('Nov-3','M2223') ║ 2 ║ 1 ║
╚═══════════════════╩═══════╩═════════════╝
source to share
I think this might help you:
import pandas as pd
df = pd.read_csv('input_csv.csv', sep=';')
print df.groupby(['Date','Type'])['Type','Value'].count()
Here's some additional code that Albe originally provided in the comments.
import pandas as pd
#read csv file using data frame
df = pd.read_csv('input_csv.csv', sep=',')
#count how many "AT" values in a data frame
df['Value AT'] = (df['Value'] == 'AT').astype(int)
#sum "AT" and "LH" of a data frame
df['TOT'] = (df['Value'] == 'AT').astype(int) + (df['Value'] == 'LH').astype(int)
#grouping by data and type
print df.groupby(['Date', 'Type'])['TOT', 'Value AT'].sum()
... At the end, you can add this piece of code to create the output csv file:
#print df_grouped
df_grouped.to_csv(r'outputfile.csv', index=True)
source to share
I think you are exaggerating the task. In such a situation, it is much easier to use T-SQL. For example, you can use MySQL, create a table for a CSV file, load data into it using this approach: Syntax LOAD DATA INFILE . Then select the aggregated data you need using a query like this:
SELECT
t.Date,
t.Type,
(
SELECT COUNT(*)
FROM table as t1
WHERE t1.Date = t.Date and t1.Type = t.Type and t1.Value IN ('AT', 'LH')
) as `Total (Total = AT + LH)`,
(
SELECT COUNT(*)
FROM table as t2
WHERE t2.Date = t.Date and t2.Type = t.Type and t2.Value IN ('AT')
) as `Value of AT`
FROM table as t
GROUP BY t.Data, t.Type;
And load the data back to CSV: SELECT ... INTO Syntax
source to share