Concatenate two csv files according to matching lines and add new columns in linux

I am developing an application using java, but for this I need a csv file in some order. I don't know Linux very well, but wondering if there is a way to combine csv files in the required format.

I have two csv files containing hundreds of thousands of entries. sample below:

name,Direction,Date
abc,sent,Jan 21 2014 02:06 
xyz,sent,Nov 21 2014 01:09
pqr,sent,Oct 21 2014 03:06  

      

and

name,Direction,Date
abc,received,Jan 22 2014 02:06
xyz,received,Nov 22 2014 02:06

      

So this second csv file will contain some records of file 1. I need a new csv like this:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

      

You must add columns (4th and 5th columns) according to the corresponding data in column1. if there is no corresponding data in the second file, the columns must be empty as stated above.

so is there a bash command in linux for this?

+3


source to share


2 answers


awk might work for you:

kent$  awk -F, -v OFS="," 
       'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$0;next}
        FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,abc,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,xyz,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

      



Update

kent$  awk -F, -v OFS="," 'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$2 FS $3;next}
        FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv 
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

      

+3


source


You can use this command join

to accomplish this. The first file is 1.csv

and the second is2.csv

join -1 1 -2 1 -t, -a 1  1.csv 2.csv | sed "s/Direction,Date/currentDirection,receivedDate/2"

      

Output:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06 

      

Explanation:



You want to join the first field in both files, so -1 1 -2 1

You want to use a comma, so -t,

You want to display all unmatched records in file 1, so -a 1

you can add a -a 2

.

The / 2 command in sed tells you sed

to replace the second occurrence

+3


source







All Articles