Remove spaces between individual letters

Question

I have a string that can contain an arbitrary number of individual letters, separated by spaces. I'm looking for a regex (in Perl) that will remove spaces between all (unknown number) single letters.

For example:

ab c d

should become ab cd

a bcd e f gh

should become a bcd ef gh

a b c

should become abc

and

abc d

must remain unchanged (because there are no single letters followed or preceded by a single space).

Thanks for any ideas.

0

regex perl

itzy 19 nov. 10 at 19:42

7 replies

You can do this with lookahead and lookbehind assertions, as described in perldoc perlre:

use strict;
use warnings;

use Test::More;

is(tran('ab c d'), 'ab cd');
is(tran('a bcd e f gh'), 'a bcd ef gh');
is(tran('a b c'), 'abc');
is(tran('abc d'), 'abc d');

sub tran
{
    my $input = shift;

    (my $output = $input) =~ s/(?<![[:lower:]])([[:lower:]]) (?=[[:lower:]])/$1/g;
    return $output;
}

done_testing;

Note that the current code fails in the second test case, as the output is:

ok 1
not ok 2
#   Failed test at test.pl line 7.
#          got: 'abcd efgh'
#     expected: 'a bcd ef gh'
ok 3
ok 4
1..4
# Looks like you failed 1 test of 4.

I left it this way because your second and third examples seem to contradict each other as to how leading single characters should be handled. However, this framework should be enough for you to experiment with different lookaheads and lookbehinds to get the exact results you are looking for.
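
One possible adjustment, sketched under the assumption that the second example shows the intended behaviour (a single letter is joined only to a neighbour that is itself a single letter): give the lookahead its own negative lookahead, so the letter after the space must not be followed by another letter. The name tran_strict is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical variant of tran(): the letter after the space must itself
# stand alone, i.e. it must not be followed by another lowercase letter.
sub tran_strict {
    my $input = shift;

    (my $output = $input)
        =~ s/(?<![[:lower:]])([[:lower:]]) (?=[[:lower:]](?![[:lower:]]))/$1/g;
    return $output;
}

print tran_strict($_), "\n" for 'ab c d', 'a bcd e f gh', 'a b c', 'abc d';
```

With this change the leading "a" in 'a bcd e f gh' is left alone, because the "b" that follows the space is part of a longer word.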

+5

Ether 19 nov. 10 at 19:57

This piece of code

#!/usr/bin/perl

use strict;

my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');

foreach my $string (@strings) {
   print "$string --> ";
   $string =~ s/\b(\w)\s+(?=\w\b)/$1/g; # the only line that actually matters
   print "$string\n";
}

prints this:

a b c --> abc
ab c d --> ab cd
a bcd e f gh --> a bcd ef gh
abc d --> abc d

I think / hope this is what you are looking for.
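
One caveat worth checking against your data: \w matches digits and the underscore as well as letters, so the pattern above also joins isolated digits. A small sketch of the difference, assuming that only letters should be joined; the sub names and the letter-only variant are illustrations, not part of the answer.

```perl
use strict;
use warnings;

# The pattern from the answer: \w covers letters, digits and underscore.
sub join_word_chars {
    my $s = shift;
    $s =~ s/\b(\w)\s+(?=\w\b)/$1/g;
    return $s;
}

# Hypothetical letter-only variant: \pL matches letters only, and the
# word boundaries are replaced by explicit "no letter next to it" checks.
sub join_letters {
    my $s = shift;
    $s =~ s/(?<!\pL)(\pL)\s+(?=\pL(?!\pL))/$1/g;
    return $s;
}

print join_word_chars('a b 1 2'), "\n";   # digits are joined too: ab12
print join_letters('a b 1 2'),    "\n";   # digits are left alone: ab 1 2
```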

+1

canavanin 19 nov. 10 at 21:02

This should do the trick:

my $str = ...;

$str =~ s/ \b(\w) \s+ (\w)\b /$1$2/gx;

This removes the whitespace between all single word characters. Feel free to replace \w with a stricter character class if needed. There may also be some punctuation-related edge cases that you need to deal with, but I can't guess that from the information you provided.

As Ether points out, this fails in one case. Here's a version that should work (although it is not as clean as the first one):

s/ \b(\w) ( (?:\s+ \w\b)+ ) /$1 . join '', split m|\s+|, $2/gex;

I liked Ether's test-based approach (imitation is the sincerest form of flattery and all that):

use warnings;
use strict;
use Test::Magic tests => 4;

sub clean {
    (my $x = shift) =~ s{\b(\w) ((?: \s+ (\w)\b)+)}
                        {$1 . join '', split m|\s+|, $2}gex;
    $x
}

test 'space removal',
  is clean('ab c d')       eq 'ab cd',
  is clean('a bcd e f gh') eq 'a bcd ef gh',
  is clean('a b c')        eq 'abc',
  is clean('abc d')        eq 'abc d';

returns:

1..4
ok 1 - space removal 1
ok 2 - space removal 2
ok 3 - space removal 3
ok 4 - space removal 4

0

Eric Strom 19 nov. 10 at 19:51

It's not a regex, but since I'm naturally lazy, I would do it like this.

#!/usr/bin/env perl
use warnings;
use 5.012;

my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
for my $string ( @strings ) {
    my @s; my $t = '';
    for my $el ( split /\s+/, $string ) {
        if ( length $el > 1 ) {
            push @s, $t if $t;
            $t = '';
            push @s, $el;
        } else { $t .= $el; }
    }
    push @s, $t if $t;
    say "@s";
}

OK, my way is the slowest:

               Rate   no_regex Alan_Moore Eric_Storm  canavanin
no_regex   130619/s         --       -60%       -61%       -63%
Alan_Moore 323328/s       148%         --        -4%        -8%
Eric_Storm 336748/s       158%         4%         --        -5%
canavanin  352654/s       170%         9%         5%         --

I did not include Ether's code because (as tested) it returns different results.

0

sid_com 19 nov. At 21:22

Now I have the slowest and fastest.

#!/usr/bin/perl
use 5.012;
use warnings;
use Benchmark qw(cmpthese);
my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');

cmpthese( 0, {
    Eric_Storm  => sub{ for my $string (@strings) { $string =~ s{\b(\w) ((?: \s+ (\w)\b)+)}{$1 . join '', split m|\s+|, $2}gex; } },
    canavanin   => sub{ for my $string (@strings) { $string =~ s/\b(\w)\s+(?=\w\b)/$1/g; } },
    Alan_Moore  => sub{ for my $string (@strings) { $string =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g; } },
    keep_uni    => sub{ for my $string (@strings) { $string =~ s/\PL\pL\K (?=\pL(?!\pL))//g; } },
    keep_asc    => sub{ for my $string (@strings) { $string =~ s/[^a-zA-Z][a-zA-Z]\K (?=[a-zA-Z](?![a-zA-Z]))//g; } },
    no_regex    => sub{ for my $string (@strings) { my @s; my $t = ''; 
    for my $el (split /\s+/, $string) {if (length $el > 1) { push @s, $t if $t; $t = ''; push @s, $el; } else { $t .= $el; } }
    push @s, $t if $t;
    #say "@s";
    } },
});

...

           Rate  no_regex Alan_Moore Eric_Storm canavanin  keep_uni keep_asc
no_regex    98682/s        --       -64%       -65%      -66%      -81%     -87%
Alan_Moore 274019/s      178%         --        -3%       -6%      -48%     -63%
Eric_Storm 282855/s      187%         3%         --       -3%      -46%     -62%
canavanin  291585/s      195%         6%         3%        --      -45%     -60%
keep_uni   528014/s      435%        93%        87%       81%        --     -28%
keep_asc   735254/s      645%       168%       160%      152%       39%       --
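
A caution about the two \K variants before relying on them: both begin with a consuming non-letter (\PL or [^a-zA-Z]), so they cannot match a single letter at the very start of the string, and they also consume the character before each letter, which can interrupt a run of single letters. The sketch below compares keep_uni with a variant that swaps the consuming \PL for a negative lookbehind; the sub names are hypothetical, and \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Pattern taken from the keep_uni benchmark entry.
sub keep_uni {
    my $s = shift;
    $s =~ s/\PL\pL\K (?=\pL(?!\pL))//g;
    return $s;
}

# Hypothetical variant: a zero-width lookbehind instead of a consumed \PL,
# so a letter at the start of the string still qualifies.
sub keep_lookbehind {
    my $s = shift;
    $s =~ s/(?<!\pL)\pL\K (?=\pL(?!\pL))//g;
    return $s;
}

for my $in ('a b c', 'a bcd e f gh', 'abc d') {
    printf "%-14s keep_uni: %-14s keep_lookbehind: %s\n",
        "'$in'", keep_uni($in), keep_lookbehind($in);
}
```

For 'a b c', keep_uni yields 'a bc' while the lookbehind variant yields 'abc'; whether the lookbehind variant keeps the speed advantage shown in the table is not something this sketch measures.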

0

sid_com 24 nov. 10 at 8:37 am

This will do the job:

(?<=\b\w)\s(?=\w\b)
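
A minimal sketch of how that pattern might be used in a substitution, run against the four examples from the question. The sub name squeeze is hypothetical. Note that \s matches exactly one whitespace character here, so letters separated by two or more spaces are left as they are.

```perl
use strict;
use warnings;

# Delete a single whitespace character that sits between two
# one-character words; both neighbours are checked with lookarounds,
# so nothing but the whitespace itself is consumed.
sub squeeze {
    my $s = shift;
    $s =~ s/(?<=\b\w)\s(?=\w\b)//g;
    return $s;
}

print squeeze($_), "\n" for 'ab c d', 'a bcd e f gh', 'a b c', 'abc d';
```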

0

Irwin Nawrocki Oct 20 17 at 14:45

Alan Moore · Accepted Answer · 2010-11-20T15:12:54+0000

Your description doesn't match your examples. It looks to me like you want to remove any space that is (1) preceded by a letter that is not itself preceded by a letter, and (2) followed by a letter that is not itself followed by a letter. These conditions can be expressed precisely as nested lookarounds:

/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))/

tested:

use strict;
use warnings;

use Test::Simple tests => 4;

sub clean {
  (my $x = shift) =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g;
  $x;
}

ok(clean('ab c d')        eq 'ab cd');
ok(clean('a bcd e f gh')  eq 'a bcd ef gh');
ok(clean('a b c')         eq 'abc');
ok(clean('abc d')         eq 'abc d');

output:

1..4
ok 1
ok 2
ok 3
ok 4

I'm assuming you really meant a single space character (U+0020); if you want to match any whitespace, you can replace the space with \s+.
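
A sketch of that \s+ variant, assuming any run of whitespace (spaces, tabs, newlines) between two single letters should be removed. The sub name clean_ws is hypothetical.

```perl
use strict;
use warnings;

# Same nested lookarounds as above, but the literal space is replaced
# by \s+ so that a whole run of whitespace is removed at once.
sub clean_ws {
    my $x = shift;
    $x =~ s/(?<=(?<!\pL)\pL)\s+(?=\pL(?!\pL))//g;
    return $x;
}

print clean_ws("a \t b\nc"), "\n";   # abc
print clean_ws('abc  d'),    "\n";   # unchanged: "abc" is not a single letter
```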
