Sort a file to put 10, 11, 12 ... before 1, 2, 3 ... and then X, Y

I have a list of chromosomal data with columns (chromosome, start and end), for example:

chr1    6252071 6253740
chr1    6965107 6966070
chr1    6966038 6967016
chr1    7066595 7068694
chr1    7100956 7102296
chr1    7153422 7154635
chr1    7155112 7156181
....
chr2
....
chr10
....
chrX
....
chrY
....

      

and so on.

I am trying to use bash to sort the chromosome sections in this order:

chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY

      

in the first column and then numerically by starting position in the second column, but no variation of sort seems to do the job. Any ideas? Thanks.



2 answers


Split the file into two streams, filter each one separately, and then recombine them:



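# First stream: chr10..chr19; second stream: everything else.
# -k1,1 sorts by chromosome name, -k2,2n sorts numerically by start position.
cat <(grep    '^chr1[[:digit:]][[:space:]]' <inputfile | LC_ALL=C sort -k1,1 -k2,2n) \
    <(grep -v '^chr1[[:digit:]][[:space:]]' <inputfile | LC_ALL=C sort -k1,1 -k2,2n) \
    >outputfile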
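
For trying this out on a few rows, here is a minimal sketch of the same two-pass idea wrapped in a function. The name `sort_chr` and the sample rows are made up for illustration. It swaps the process substitutions for a plain brace group so it also runs in shells without `<(...)`, and it assumes whitespace-separated columns with a numeric key on the second column so start positions order by value.

```shell
#!/bin/sh
# sort_chr FILE: hypothetical helper, not part of any tool.
# Prints the chr10..chr19 rows first, then all remaining rows.
sort_chr() {
    {
        # Pass 1: two-digit chromosomes chr10..chr19.
        grep    '^chr1[[:digit:]][[:space:]]' "$1" | LC_ALL=C sort -k1,1 -k2,2n
        # Pass 2: everything else (chr1..chr9, chrM, chrX, chrY).
        grep -v '^chr1[[:digit:]][[:space:]]' "$1" | LC_ALL=C sort -k1,1 -k2,2n
    }
}

# Made-up sample rows, deliberately out of order.
sample=$(mktemp)
printf 'chrX\t100\t200\nchr2\t500\t600\nchr10\t900\t950\n' >  "$sample"
printf 'chr1\t6252071\t6253740\nchr1\t999\t1200\nchr10\t50\t80\n' >> "$sample"
sort_chr "$sample"
rm -f "$sample"
```

`LC_ALL=C` is there so the name comparison is plain byte order (digits before uppercase letters), whatever locale the shell happens to be in.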

      



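perl -E '
  # Open the file named on the command line.
  open($f, "<", shift) or die "cannot open file: $!";
  print join "",
      map {$_->[0]}                # undecorate: keep the original line
      sort {length($b->[1]) <=> length($a->[1]) or $a->[1] cmp $b->[1]}
      map {[$_, (split)[0]]}       # decorate: [line, first word]
      <$f>
' file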

      

First the file is opened. Then it uses a Schwartzian transform: read the command above from bottom to top:



  • read the lines: <$f>

  • convert the lines to a list of pairs, the original line and its first word:
    map {[$_, (split)[0]]}

  • sort, first by length of the first word (longest to shortest), then lexically (A to Z)
  • convert the list of pairs back to a list of lines (the first element of each pair):
    map {$_->[0]}

  • join (the lines still have their newlines, so join on the empty string)
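
The same decorate, sort, undecorate steps can be sketched with standard shell tools instead of Perl. This is only an illustration: the function name `dsu_sort` is made up, and unlike the Perl one-liner it adds a third key so that start positions are ordered numerically within each chromosome.

```shell
#!/bin/sh
# dsu_sort FILE: hypothetical helper illustrating decorate-sort-undecorate.
#   decorate:   awk prefixes each line with the length of its first field
#   sort:       length descending, then chromosome name, then numeric start
#   undecorate: cut drops the helper column again
dsu_sort() {
    awk '{ print length($1) "\t" $0 }' "$1" |
        LC_ALL=C sort -k1,1nr -k2,2 -k3,3n |
        cut -f2-
}

# Made-up sample rows, deliberately out of order.
sample=$(mktemp)
printf 'chr2\t500\t600\nchr10\t900\t950\nchr1\t6252071\t6253740\n' >  "$sample"
printf 'chr1\t999\t1200\nchrX\t100\t200\n' >> "$sample"
dsu_sort "$sample"
rm -f "$sample"
```

Sorting on the name length first is what puts chr10 to chr19 ahead of chr1 to chr9; chrM, chrX and chrY have four-character names, so they land after chr9 in the second group.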