Sort columns from BASH file
I have the following shell script that reads data from a file entered at the command line. The file is a matrix of numbers and I need to separate the file by columns and then sort the columns. Right now I can read the file and output the individual columns, but I am at a loss as to how to sort. I entered a sort operator, but it only sorts the first column.
EDIT: I decided to take another route and actually transpose the matrix in order to turn the columns into rows. Since I have to calculate the mean and mean later and have already successfully done this for the file in different ways earlier in the script - I was suggested to try and "spin" the matrix if you want to turn columns into rows.
Here is my UPDATED code
declare -a col=( )
read -a line < "$1"
numCols=${#line[@]} # save number of columns
index=0
while read -a line ; do
for (( colCount=0; colCount<${#line[@]}; colCount++ )); do
col[$index]=${line[$colCount]}
((index++))
done
done < "$1"
for (( width = 0; width < numCols; width++ )); do
for (( colCount = width; colCount < ${#col[@]}; colCount += numCols ) ); do
printf "%s\t" ${col[$colCount]}
done
printf "\n"
done
This gives me the following output:
1 9 6 3 3 6
1 3 7 6 4 4
1 4 8 8 2 4
1 5 9 9 1 7
1 5 7 1 4 7
Although I'm currently looking for:
1 3 3 6 6 9
1 3 4 4 6 7
1 2 4 4 8 8
1 1 5 7 9 9
1 1 4 5 7 7
To try and sort the data, I've tried the following:
sortCol=${col[$colCount]}
eval col[$colCount]='($(sort <<<"${'$sortCol'[*]}"))'
Also: (how I sorted the string after reading from the string)
sortCol=( $(printf '%s\t' "${col[$colCount]}" | sort -n) )
If you could provide any insight on this it would be appreciated!
source to share
Please note that, as mentioned in the comments, a pure bash solution is not very pretty. There are several ways to do this, but this is probably the most direct one. The following requires reading all the values ββin the row into an array and storing the matrix stride
so that it can be transposed to read all the column values ββinto the row matrix and sort. All sorted columns are inserted into a new matrix of rows a2
. Transposing this matrix of rows returns the original matrix in the sort order of the columns.
Note , this will work for any rank of the column matrix in your file.
#!/bin/bash
test -z "$1" && { ## validate number of input
printf "insufficient input. usage: %s <filename>\n" "${0//*\//}"
exit 1;
}
test -r "$1" || { ## validate file was readable
printf "error: file not readable '%s'. usage: %s <filename>\n" "$1" "${0//*\//}"
exit 1;
}
## function: my sort integer array - accepts array and returns sorted array
## Usage: array=( "$(msia ${array[@]})" )
msia() {
local a=( "$@" )
local sz=${#a[@]}
local _tmp
[[ $sz -lt 2 ]] && { echo "Warning: array not passed to fxn 'msia'"; return 1; }
for((i=0;i<$sz;i++)); do
for((j=$((sz-1));j>i;j--)); do
[[ ${a[$i]} -gt ${a[$j]} ]] && {
_tmp=${a[$i]}
a[$i]=${a[$j]}
a[$j]=$_tmp
}
done
done
echo ${a[@]}
unset _tmp
unset sz
return 0
}
declare -a a1 ## declare arrays and matrix variables
declare -a a2
declare -i cnt=0
declare -i stride=0
declare -i sz=0
while read line; do ## read all lines into array
a1+=( $line );
(( cnt == 0 )) && stride=${#a1[@]} ## calculate matrix stride
(( cnt++ ))
done < "$1"
sz=${#a1[@]} ## calculate matrix size
## print original array
printf "\noriginal array:\n\n"
for ((i = 0; i < sz; i += stride)); do
for ((j = 0; j < stride; j++)); do
printf " %s" ${a1[i+j]}
done
printf "\n"
done
## sort columns from stride array
for ((j = 0; j < stride; j++)); do
for ((i = 0; i < sz; i += stride)); do
arow+=( ${a1[i+j]} )
done
a2+=( $(msia ${arow[@]}) ) ## create sorted array
unset arow
done
## print the sorted array
printf "\nsorted array:\n\n"
for ((j = 0; j < cnt; j++)); do
for ((i = 0; i < sz; i += cnt)); do
printf " %s" ${a2[i+j]}
done
printf "\n"
done
exit 0
Output
$ bash sort_cols2.sh dat/matrix.txt
original array:
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
sorted array:
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
source to share
Awk script
awk '
{for(i=1;i<=NF;i++)a[i]=a[i]" "$i} #Add to column array
END{
for(i=1;i<=NF;i++){
split(a[i],b) #Split column
x=asort(b) #sort column
for(j=1;j<=x;j++){ #loop through sort
d[j]=d[j](d[j]~/./?" ":"")b[j] #Recreate lines
}
}
for(i=1;i<=NR;i++)print d[i] #Print lines
}' file
Output
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
Here's my entry for this little exercise. An arbitrary number of columns should be processed. I assume they are separated by spaces:
#!/bin/bash
linenumber=0
while read line; do
i=0
# Create an array for each column.
for number in $line; do
[ $linenumber == 0 ] && eval "array$i=()"
eval "array$i+=($number)"
(( i++ ))
done
(( linenumber++ ))
done <$1
IFS=$'\n'
# Sort each column
for j in $(seq 0 $i ); do
thisarray=array$j
eval array$j='($(sort <<<"${'$thisarray'[*]}"))'
done
# Print each array 0'th entry, then 1, then 2, etc...
for k in $(seq 0 ${#array0[@]}); do
for j in $(seq 0 $i ); do
eval 'printf ${array'$j'['$k']}" "'
done
echo ""
done
source to share
Not bash
, but I think this code is python
worth looking at how this task can be achieved with built-in functions.
From interpreter
:
$ cat matrix.txt
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
$ python
Python 2.7.3 (default, Jun 19 2012, 17:11:17)
[GCC 4.4.3] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> f = open('./matrix.txt')
>>> for row in zip(*[sorted(list(a))
for a in zip(*[a.split() for a in f.readlines()])]):
... print ' '.join(row)
...
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
source to share