Discussion in CSV file

Question

Discussion in CSV file

I have a CSV file with three columns as shown below and I want to count the "AT" and "AT + LH" value for each type and date, and then put it in a separate column in another CSV file.

Input CSV file:

╔═══════╦═══════╦═══════╗
║ Date  ║ Type  ║ Value ║
╠═══════╬═══════╬═══════╣
║ Nov-2 ║ M2222 ║ AT    ║
║ Nov-2 ║ M2222 ║ LH    ║
║ Nov-3 ║ M2222 ║ LH    ║
║ Nov-3 ║ M2223 ║ LH    ║
║ Nov-3 ║ M2223 ║ AT    ║
╚═══════╩═══════╩═══════╝

I need the output like this:

Output CSV file:

╔═══════╦═══════╦═════════════════════════╦═════════════╗
║ Date  ║ Type  ║ Total (Total = AT + LH) ║ Value of AT ║
╠═══════╬═══════╬═════════════════════════╬═════════════╣
║ Nov-2 ║ M2222 ║                       2 ║           1 ║
║ Nov-3 ║ M2222 ║                       1 ║           0 ║
║ Nov-3 ║ M2223 ║                       2 ║           1 ║
╚═══════╩═══════╩═════════════════════════╩═════════════╝

I've tried using dictionaries for this and couldn't get the desired result. But as a result I end up with Date and Type columns in one column.

Code:

from collections import Counter
import csv
from collections import defaultdict
from collections import OrderedDict
import collections
import pandas as pd

with open(r'file1','wb') as crack,open(r'file2','wb') as crack1:

  w = csv.writer(crack,delimiter=',', quoting = csv.QUOTE_ALL)
  w1 = csv.writer(crack1,delimiter=',', quoting = csv.QUOTE_ALL)

  for row in zip(['DATE & BASIC TYPE'],['Total Triggerred']):
    w.writerow(row)

  for row in zip(['DATE & BASIC TYPE'],['No of LOH']):
   w1.writerow(row)

 with open(r'inputfile, 'rb') as infile:
  reader = csv.reader(infile)
  next(reader, None)
  dict1=defaultdict(list)

  for row in reader:
   dict1[(row[0],row[1])].append(int(row[3]))

  for key, value in dict1.iteritems():
   for row in zip([key], [sum(dict1[key])]):
    #print key, sum(dict1[key])
      w.writerow(row)

 with open(r'inputfile', 'rb') as infile:
   reader = csv.reader(infile)
   next(reader, None)
   dict2=defaultdict(list)

 for row in reader:
   if row[2]=='AUTO RELEASE':
      dict2[(row[0],row[1])].append(int(row[3]))

 for key, value in dict2.iteritems():
   for row in zip([key], [sum(dict2[key])]):
    #print key, sum(dict2[key])
     w1.writerow(row)

 a = pd.read_csv(r'file1.csv')
 b = pd.read_csv(r'file2.csv')
 merged = a.merge(b, on='DATE & BASIC TYPE')
 merged.to_csv(r'outputfile.csv', index=False)

I am getting Output like this:

╔═══════════════════╦═══════╦═════════════╗
║    Date & Type    ║ Total ║ Value of AT ║
╠═══════════════════╬═══════╬═════════════╣
║ ('Nov-2','M2222') ║     2 ║           1 ║
║ ('Nov-3','M2222') ║     1 ║           0 ║
║ ('Nov-3','M2223') ║     2 ║           1 ║
╚═══════════════════╩═══════╩═════════════╝

+3

python-2.7 csv

user3827728 Dec 15. '14 at 1:15

source to share

2 answers

Albe · Answer 1 · 2014-12-16T15:15:26+0000

I think this might help you:

import pandas as pd

df = pd.read_csv('input_csv.csv', sep=';')
print df.groupby(['Date','Type'])['Type','Value'].count()

Here's some additional code that Albe originally provided in the comments.

import pandas as pd

#read csv file using data frame 
df = pd.read_csv('input_csv.csv', sep=',')

#count how many "AT" values in a data frame 
df['Value AT'] = (df['Value'] == 'AT').astype(int)

#sum "AT" and "LH" of a data frame
df['TOT'] = (df['Value'] == 'AT').astype(int) + (df['Value'] == 'LH').astype(int)

#grouping by data and type 
print df.groupby(['Date', 'Type'])['TOT', 'Value AT'].sum()

... At the end, you can add this piece of code to create the output csv file:

#print df_grouped 
df_grouped.to_csv(r'outputfile.csv', index=True)

AlexB · Answer 2 · 2014-12-16T06:47:15+0000

I think you are exaggerating the task. In such a situation, it is much easier to use T-SQL. For example, you can use MySQL, create a table for a CSV file, load data into it using this approach: Syntax LOAD DATA INFILE . Then select the aggregated data you need using a query like this:

SELECT 
t.Date,
t.Type,
(
    SELECT COUNT(*) 
    FROM table as t1
    WHERE t1.Date = t.Date and t1.Type = t.Type and t1.Value IN ('AT', 'LH')
) as `Total (Total = AT + LH)`,
(
    SELECT COUNT(*) 
    FROM table as t2
    WHERE t2.Date = t.Date and t2.Type = t.Type and t2.Value IN ('AT')
) as `Value of AT`
FROM table as t
GROUP BY t.Data, t.Type;

And load the data back to CSV: SELECT ... INTO Syntax

Discussion in CSV file

More articles: