# Sorting strings by those that contain numbers, ignoring numbers attached to the letter

I need to sort the lines in a file so that lines containing at least one digit (0-9) move to the end of the file. Digits 1-5 do not count when they are immediately preceded by one of the letters "a", "e", "g", "i", "n", "o", "r", "u", "v", or by "u:" (u followed by a colon).

Here's a sample file:

```
I want to buy some food.
I want 3 chickens.
I have no3 basket for the eggs.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?
```

Here is the same file, annotated to show which lines match:

```
I want to buy some food. % does not match
I want 3 chickens. % matches
I have no3 basket for the eggs. % does not match, because "3" is preceded by "o"
I have no3 basket which can hold 24 eggs. % matches, because it contains "24"
Move the king to A3. % matches, digits preceded by uppercase "A" are not ignored
Can you move the king to a6? % matches, 6 is not in 1-5
```

The output will contain all matching lines at the bottom:

```
I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.
```

Preferably (although not required), the solution sorts the lines containing the most matching digits to the end. For example, "I have 10 chickens and 12 bats." (4 digits) appears after "I have 99 chickens." (2 digits).

Solutions using Bash, Perl, Python 2.7, Ruby, `sed`, `awk`, or `grep` are acceptable.

+3

7 replies

If your `grep` supports the `-P` (perl-regexp) option:

```
pat='(?<=[^0-9]|^)((?<!u:)(?<![aeginoruv])[1-5]|[06-9])'

{ grep -vP "$pat" input.txt; grep -P "$pat" input.txt; } >output.txt
```

If you have `ssed` (super sed) installed:

```
ssed -nR '
/(?<=[^0-9]|^)((?<!u:)(?<![aeginoruv])[1-5]|[06-9])/{
  H
  $!d
}
$!p
${
  g
  s/\n//
  p
}' input.txt
```
+5

When this program runs on your dataset:

```
#!/usr/bin/env perl
use strict;
use warnings;

my @moved = ();

my $pat = qr{
    [67890]                   # these big digits anywhere, or else...
  | (?<! [aeginoruv]   )      # none of those letters before
    (?<! u:            )      # nor a "u:" before
    [12345]                   # these little digits
}x;

while (<>) {
    if (/$pat/) {
        push @moved, $_;
    } else {
        print;
    }
}

print @moved;
```

It produces the desired output, with the matching lines moved to the end in their original order:

```
I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?
```

# EDIT

To enable sorting by digit count, change the final `print` to this:

```
print for sort {
    $a =~ y/0-9// <=> $b =~ y/0-9//
} @moved;
```

And now the output will be as follows:

```
I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.
```
+3

This seems like a job for Perl!

Seriously, sed will struggle with the request to move "u:" to the end of the file, because sed is line-based. Awk can do it, but Perl is probably better.

Use `\d+` to match a line containing numbers.

Then use `[aeginorv]\d+` to filter out your letters.

Use `u:\d+` to handle your special-case "u:" lines. You'll need to buffer the matching lines (for example, just store them in an array) so you can output them at the end.
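The buffering idea above can be sketched in Python. This is only an illustration under stated assumptions: the names `COUNTED_DIGIT` and `move_numbered_lines` are made up for this example, and instead of three separate `\d+` checks it uses a single pattern with lookbehinds, the same shape as the pattern in the Perl and Ruby answers on this page.

```python
import re

# Assumed rule: 0 and 6-9 always count; 1-5 count only when they are not
# directly preceded by one of the listed letters or by "u:".
COUNTED_DIGIT = re.compile(r"[06-9]|(?<![aeginoruv])(?<!u:)[1-5]")

def move_numbered_lines(lines):
    """Return non-matching lines first, then the buffered matching lines."""
    kept, buffered = [], []
    for line in lines:
        if COUNTED_DIGIT.search(line):
            buffered.append(line)   # hold back until the end
        else:
            kept.append(line)       # pass through in input order
    return kept + buffered

sample = [
    "I want to buy some food.",
    "I want 3 chickens.",
    "I have no3 basket for the eggs.",
    "I have no3 basket which can hold 24 eggs.",
    "Move the king to A3.",
    "Can you move the king to a6?",
]

for line in move_numbered_lines(sample):
    print(line)
```

Both groups keep their input order here; sorting the buffered list by digit count before returning would cover the optional part of the question.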

+1

[Edited because everyone else had code that took a file argument:]

For a non-regex solution in Python, how about

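```
import sys

def keyfunc(s):
    # Count the digits in s, skipping any digit 1-5 that is directly
    # preceded by one of these letters (or by "u:").
    ignores = ("a", "e", "g", "i", "n", "o", "r", "u", "v", "u:")
    return sum(c.isdigit() and not (1 <= int(c) <= 5 and s[:i].endswith(ignores))
               for i, c in enumerate(s))

with open(sys.argv[1]) as infile:
    # sorted() is stable, so lines with equal counts keep their input order.
    for line in sorted(infile, key=keyfunc):
        print line,  # Python 2 print statement; the comma avoids a doubled newline
```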

which produces:

```
I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.
I have 99 chickens.
I have 10 chickens and 12 bats.
```
+1

```
use strict;
use v5.10.1;
my @matches;
my @no_matches;
while (my $line = <DATA>) {
    chomp $line;

    if ($line =~ / \d+\W/) {
        #say "MATCH $line";
        push @matches, $line;
    }
    elsif ($line =~ /u:[1-5]+\b/) {
        #say "NOMATCH   $line";
        push @no_matches, $line;
    }
    elsif ($line =~ /[^aeginoruv][1-5]+\b/) {
        #say "MATCH $line";
        push @matches, $line;
    }
    elsif ($line =~ /.[6-90]/) {
        #say "MATCH $line";
        push @matches, $line;
    }
    else {
        #say "NOMATCH   $line";
        push @no_matches, $line;
    }
}

foreach (@no_matches){
    say $_;
}
foreach (@matches){
    say $_;
}

__DATA__
I want to buy some food.
I want 3 chickens.
I have no3 basket for the eggs.
I have no3 basket which can hold 24 eggs.
What is u:34?                              <- custom test
Move the king to A3.
Can you move the king to a6?
```

```
PROMPT> perl regex.pl
I want to buy some food.
I have no3 basket for the eggs.
What is u:34?                              <- custom test
I want 3 chickens.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?
```
+1

# ruby

(Edit: now includes optional sorting)

```
matches = []
non_matches = []
File.open("lines.txt").each do |line|
  if line.match(/[67890]|(?<![aeginoruv])(?<!u:)[12345]/)
    matches.push line
  else
    non_matches.push line
  end
end
puts non_matches + matches.sort_by{|m| m.scan(/\d/).length}
```

gives:

```
I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.
```
+1

This might work for you:

```
sed 'h;s/[aeginoruv][1-5]\|u:[1-5]//g;s/[^0-9]//g;s/^$/0/;G;s/\n/\t/' file |
sort -sn |
sed 's/^[^\t]*\t//'
I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.
```

Basically a three-stage process:

• Make a numeric key to sort the output. Lines that do not need moving are assigned key 0; all others are assigned the number formed by their remaining digits.
• Sort numerically by that key, keeping the input order for equal keys (`-s`, stable sort).
• Remove the numeric key.
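The three stages above can be sketched in Python as a rough equivalent of the pipeline. The function names `sort_key` and `sort_lines` are hypothetical, chosen only for this example. Like the `sed` version, the key is the numeric value of the remaining digits rather than a count of them.

```python
import re

def sort_key(line):
    # Stage 1: drop the ignorable letter+digit pairs, keep only digits,
    # and fall back to 0 when nothing is left.
    stripped = re.sub(r"[aeginoruv][1-5]|u:[1-5]", "", line)
    digits = re.sub(r"[^0-9]", "", stripped)
    return int(digits) if digits else 0

def sort_lines(lines):
    # Stage 2: sorted() is stable, like `sort -s`.
    # Stage 3 is implicit: the key is never attached to the line itself.
    return sorted(lines, key=sort_key)

sample = [
    "I want to buy some food.",
    "I want 3 chickens.",
    "I have no3 basket for the eggs.",
    "I have no3 basket which can hold 24 eggs.",
    "Move the king to A3.",
    "Can you move the king to a6?",
]

for line in sort_lines(sample):
    print(line)
```

Because the key is a value, a line containing "6" sorts after a line containing "3" even though both have one digit; sorting by `len(digits)` instead would follow the digit-count preference stated in the question.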
+1
