Sort a file to put 10, 11, 12 ... before 1, 2, 3 ... and then X, Y
I have a file of chromosomal data with three columns (chromosome, start and end), for example:
chr1 6252071 6253740
chr1 6965107 6966070
chr1 6966038 6967016
chr1 7066595 7068694
chr1 7100956 7102296
chr1 7153422 7154635
chr1 7155112 7156181
....
chr2
....
chr10
....
chrX
....
chrY
....
and so on.
I am trying to use bash to sort the chromosome sections in this order:
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY
by the first column, and then numerically by start position in the second column, but no variation of sort
seems to do the job. Any ideas? Thanks.
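To make the problem concrete, here is what the two most obvious sort variants do with a few chromosome names. This is a sketch assuming GNU coreutils sort (-V is a GNU extension); the helper names sample, lexical and natural are made up for illustration:

```shell
#!/bin/sh
# Three chromosome names in arbitrary order (illustrative input only).
sample() { printf 'chr10\nchr2\nchr1\n'; }

# Plain lexical sort in the C locale: chr1, chr10, chr2 (chr10 lands between chr1 and chr2).
lexical() { sample | LC_ALL=C sort; }

# GNU version sort (-V): chr1, chr2, chr10 (natural numeric order, chr10 comes last).
natural() { sample | sort -V; }

lexical
natural
```

Neither variant puts chr10 before chr1.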
perl -e '
  open my $f, "<", shift or die "cannot open file: $!";
  print join "",
    map  { $_->[0] }
    sort { length($b->[1]) <=> length($a->[1]) or $a->[1] cmp $b->[1] }
    map  { [$_, (split)[0]] }
    <$f>
' file
First the file is opened. Then the Schwartzian transform is applied; read the chain of commands from bottom to top:
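- read the lines:
<$f>
- convert the lines to a list of pairs, each holding the original line and its first word (the chromosome name):
map {[$_, (split)[0]]}
- sort the pairs, first by the length of the name (longest first), then lexically by character code. chr10 to chr19 have five characters, so they come before the four-character names; among those, digits sort before uppercase letters, so chr1 to chr9 come before chrM, chrX and chrY:
sort {length($b->[1]) <=> length($a->[1]) or $a->[1] cmp $b->[1]}
- convert the list of pairs back to a list of lines (the first element of each pair):
map {$_->[0]}
- join the lines and print them (each line still has its newline, so they are joined with an empty string)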
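The Perl command keys only on the chromosome name, while the question also asks for a numeric sort on the start column. Since the question mentions bash, here is a rough shell sketch of the same decorate-sort-undecorate idea with that extra key. It assumes whitespace-separated columns (chromosome, start, end) and standard awk, sort and cut; the function name sort_chroms is hypothetical:

```shell
#!/bin/sh
# Decorate-sort-undecorate, the shell counterpart of the Schwartzian transform.
# Assumes whitespace-separated columns: chromosome, start, end.
sort_chroms() {
    # Decorate: prefix each line with the length of the chromosome name and a tab.
    awk '{ print length($1) "\t" $0 }' "$@" |
    # Sort: name length descending, then name in byte order, then start numerically.
    LC_ALL=C sort -k1,1nr -k2,2 -k3,3n |
    # Undecorate: drop the length field added by awk.
    cut -f2-
}
```

Usage: sort_chroms file > file.sorted, or pipe the data in on standard input. LC_ALL=C keeps the name comparison in byte order, so digits sort before uppercase letters regardless of the user's locale.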