How to transpose in bash / awk / other
How to transfer the following data
colA colB colC colD
val1 val2 val3 val4
val5 val6 val7 val8
To make it look like this:
colA val1 val5
colB val2 val6
colC val3 val7
colD val4 val8
This data is delimited by tabs, but can also be separated by commas.
I can do this very easily in excel, but I am wondering how to do it in bash using awk or something?
EDIT1
I can make it work with spaces
$ cat testdata2_withspace.txt
colA colB colC colD
val1 val2 val3 val4
val5 val6 val7 val8
$ ./transpose3.sh testdata2_withspace.txt
colA val1 val5
colB val2 val6
colC val3 val7
colD val4 val8
but it doesn't do the same with comma delimited files
$ cat testdata2.txt colA,colB,colC,colD val1,val2,val3,val4 val5,val6,val7,val8 $ ./transpose3.sh testdata2.txt colA,colB,colC,colD val1,val2,val3,val4 val5,val6,val7,val8
source to share
$ cat tst.awk
{ for (i=1; i<=NF; i++) cell[NR,i]=$i }
END {
for (row=1; row<=NF; row++) {
for (col=1; col<=NR; col++) {
printf "%s%s", cell[col,row], (col<NR?OFS:ORS)
}
}
}
$ awk -f tst.awk file
colA val1 val5
colB val2 val6
colC val3 val7
colD val4 val8
To use commas instead of spaces, add BEGIN{ FS=OFS="," }
at the beginning as a separator .
source to share
There is an awk script here that might work:
#!/bin/bash
transpose()
{
awk '
BEGIN {
FS = ",";
OFS = ",";
}
{
if (max_nf<NF)
max_nf=NF
max_nr=NR
for (x=1; x<=NF; ++x)
vector[x, NR]=$x
}
END {
for (x=1; x<=max_nf; ++x) {
for (y=1; y<=max_nr; ++y)
printf("%s ", vector[x, y])
printf("\n")
}
}' ${1}
}
transpose ${1}
source to share
Since the title of the question contains "/ other" I like to post the Python + Bash answer.
The following is for a tab / space delimited file, minor adjustments needed for CSV
$ trans () { python -c 'print "\n".join("\t".join(t)for t in zip(*[l.split()for l in open("'"$1"'")]))' ; }
$ cat test2.csv
Roll_num Marks Grade
1 75 A
2 60 C
27 68 B
61 45 E
$ trans test2.csv
Roll_num 1 2 27 61
Marks 75 60 68 45
Grade A C B E
The wrapper function is defined to be reliable with respect to the names "fun ny"
$ cp test2.csv fun\ ny
$ trans fun\ ny
Roll_num 1 2 27 61
Marks 75 60 68 45
Grade A C B E
$
Adding
Below are the minor adjustments needed for the CSV file
$ trans, () { python -c'print"\n".join(",".join(r)for r in zip(*[[i.strip()for i in l.strip().split(",")]for l in open("'"$1"'")]))'; }
$ cat test111.csv
Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214
$ trans, test111.csv
Sales #,393ED3,585E54,585868
Date,5/12/2010,6/15/2014,8/16/2010
Tel Number,5555551212,5555551213,
Comment,left message,voice mail,number is 5555551214
For those who are not familiar with Python and do not know how it works,
$ python -c'string'
executes a Python statement 'string'
, which in our case (given that it fname
is an extension S1
)
print "\n".join("\t".join(t)for t in zip(*[l.split()for l in open("fname")]))
Python strings have a method .join(iterable)
that acts on the iteration of other strings,
' r '.join(('a','b','c')) => 'a r b r c'
and here we have '\n
.join (...) that takes the elements in
... and joins them with a new line, but what we have in
... `?
We have a generator expression where for each iterable element t
returned by the inline zip
we attach the elements of the iterable using tabs
We can conclude that we are printing individual lines of tab-delimited elements, with those elements being elements t
, each line printed corresponding to an iterable t
returned zip
.
The next question is, what are the elements of each printed line? we must study
zip(*[l.split() for l in open("fname")])
internally, l
is a line in the original file, we get them iterated over the file object returned open("fname")
, we work on each line with a method .split()
that breaks into spaces by default, and returns a split list
elements
[l.split() for l in open("fname")]
is a so-called list comprehension, meaning that in this case the value of the expression is a list of lists, one list of separated elements for each line of the input file.
We are now left with the equivalent
lol = [l.split() for l in open("fname")]
zip(*lol)
zip
is a builtin that takes N
sequences as arguments and returns a sequence of sequences as a list of tuples, taking one element from each sequence in turn
zip((1,2,3),(4,5,6),(7,8,9)) => [(1,4,7),(2,5,8),(3,6,9)]
ultimately the prefix operator *
unpacks the list, so we can conclude that the elements of each printable line (elements t for t in zip(....)
) are the first elements from each line, the second elements from each line, etc. so we completed the task of printing the transposition of the input file.
source to share