How to transpose in bash / awk / other

How to transfer the following data

colA    colB    colC    colD
val1    val2    val3    val4
val5    val6    val7    val8

      

To make it look like this:

colA    val1    val5
colB    val2    val6
colC    val3    val7
colD    val4    val8

      

This data is delimited by tabs, but can also be separated by commas.

I can do this very easily in excel, but I am wondering how to do it in bash using awk or something?

EDIT1

I can make it work with spaces

$ cat testdata2_withspace.txt
colA colB colC colD
val1 val2 val3 val4
val5 val6 val7 val8

$ ./transpose3.sh testdata2_withspace.txt
colA val1 val5
colB val2 val6
colC val3 val7
colD val4 val8

      

but it doesn't do the same with comma delimited files

$ cat testdata2.txt
colA,colB,colC,colD
val1,val2,val3,val4
val5,val6,val7,val8

$ ./transpose3.sh testdata2.txt
colA,colB,colC,colD val1,val2,val3,val4 val5,val6,val7,val8

      

+3


source to share


3 answers


$ cat tst.awk
{ for (i=1; i<=NF; i++) cell[NR,i]=$i }
END {
    for (row=1; row<=NF; row++) {
        for (col=1; col<=NR; col++) {
            printf "%s%s", cell[col,row], (col<NR?OFS:ORS)
        }
    }
}

$ awk -f tst.awk file
colA val1 val5
colB val2 val6
colC val3 val7
colD val4 val8

      



To use commas instead of spaces, add BEGIN{ FS=OFS="," }

at the beginning as a separator .

+1


source


There is an awk script here that might work:



#!/bin/bash 

transpose() 
{ 
  awk ' 
      BEGIN {
          FS = ",";
          OFS = ",";
      }

      { 
              if (max_nf<NF) 
                    max_nf=NF 
              max_nr=NR 
              for (x=1; x<=NF; ++x) 
                     vector[x, NR]=$x 
      } 

      END { 
              for (x=1; x<=max_nf; ++x) { 
                   for (y=1; y<=max_nr; ++y) 
                        printf("%s ", vector[x, y]) 
                   printf("\n") 
               } 
          }'  ${1} 
} 

transpose ${1} 

      

0


source


Since the title of the question contains "/ other" I like to post the Python + Bash answer.

The following is for a tab / space delimited file, minor adjustments needed for CSV

$ trans () { python -c 'print "\n".join("\t".join(t)for t in zip(*[l.split()for l in open("'"$1"'")]))' ; }
$ cat test2.csv
Roll_num Marks Grade
1        75    A
2        60    C
27       68    B
61       45    E
$ trans test2.csv 
Roll_num        1       2       27      61
Marks   75      60      68      45
Grade   A       C       B       E

      

The wrapper function is defined to be reliable with respect to the names "fun ny"

$ cp test2.csv fun\ ny
$ trans fun\ ny
Roll_num        1       2       27      61
Marks   75      60      68      45
Grade   A       C       B       E
$

      

Adding

Below are the minor adjustments needed for the CSV file

$ trans, () { python -c'print"\n".join(",".join(r)for r in zip(*[[i.strip()for i in l.strip().split(",")]for l in open("'"$1"'")]))'; }
$ cat test111.csv
Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214
$ trans, test111.csv
Sales #,393ED3,585E54,585868
Date,5/12/2010,6/15/2014,8/16/2010
Tel Number,5555551212,5555551213,
Comment,left message,voice mail,number is 5555551214

      

For those who are not familiar with Python and do not know how it works,

$ python -c'string'

      

executes a Python statement 'string'

, which in our case (given that it fname

is an extension S1

)

 print "\n".join("\t".join(t)for t in zip(*[l.split()for l in open("fname")]))

      

Python strings have a method .join(iterable)

that acts on the iteration of other strings,

' r '.join(('a','b','c')) => 'a r b r c'

      

and here we have '\n

.join (...) that takes the elements in

... and joins them with a new line, but what we have in

... `?

We have a generator expression where for each iterable element t

returned by the inline zip

we attach the elements of the iterable using tabs

We can conclude that we are printing individual lines of tab-delimited elements, with those elements being elements t

, each line printed corresponding to an iterable t

returned zip

.

The next question is, what are the elements of each printed line? we must study

zip(*[l.split() for l in open("fname")])

      

internally, l

is a line in the original file, we get them iterated over the file object returned open("fname")

, we work on each line with a method .split()

that breaks into spaces by default, and returns a split list

elements

[l.split() for l in open("fname")]

      

is a so-called list comprehension, meaning that in this case the value of the expression is a list of lists, one list of separated elements for each line of the input file.

We are now left with the equivalent

lol = [l.split() for l in open("fname")]
zip(*lol) 

      

zip

is a builtin that takes N

sequences as arguments and returns a sequence of sequences as a list of tuples, taking one element from each sequence in turn

zip((1,2,3),(4,5,6),(7,8,9)) => [(1,4,7),(2,5,8),(3,6,9)]

      

ultimately the prefix operator *

unpacks the list, so we can conclude that the elements of each printable line (elements t for t in zip(....)

) are the first elements from each line, the second elements from each line, etc. so we completed the task of printing the transposition of the input file.

-1


source







All Articles