How do I write a regex that will match characters in any order?

I am trying to write regular expressions that will match a character set without regard for order. For example:

str = "act" 
str.scan(/Insert expression here/)

      

will match:

cat
act
tca
atc
tac
cta

      

but will not match ca, ac, or cata.

I've read a lot of similar questions and answers here on StackOverflow but haven't found one that exactly matches my goals.

To clarify a bit, I'm using Ruby and don't want to allow duplicate characters.

+3




5 answers


Here is your solution

^(?:([act])(?!.*\1)){3}$

      

See here in Regexr



^                  # matches the start of the string
    (?:            # open a non-capturing group
        ([act])    # one of the allowed characters, captured as group 1
        (?!.*\1)   # negative lookahead: that character must not occur again later
    ){3}           # repeat the group exactly three times
$                  # matches the end of the string

      

The only special part is the negative lookahead assertion, which ensures that a character does not repeat.

^ and $ are anchors that match the beginning and end of the line.
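Since the question is about Ruby, here is a minimal sketch of how the pattern above could be checked against the sample words. One assumption on my part: in Ruby, ^ and $ match at the start and end of each line, so the sketch swaps in \A and \z to test whole strings. The helper name permutation_of_act? is made up for illustration.

```ruby
# Pattern from the answer, with \A and \z swapped in for ^ and $
# (assumption: whole strings are being tested, not lines within a string).
NO_REPEAT = /\A(?:([act])(?!.*\1)){3}\z/

# Hypothetical helper name, for illustration only.
def permutation_of_act?(word)
  word.match?(NO_REPEAT)
end

# Print one line per sample word from the question, plus a duplicate case.
%w[cat act tca atc tac cta ca ac cata aac].each do |w|
  puts format('%-5s %s', w, permutation_of_act?(w) ? 'matches' : "doesn't match")
end
```

The six permutations should match, while ca, ac, cata, and aac should not.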

+5




[act]{3} or ^[act]{3}$ will do it in most regex dialects. If you can narrow down the system you are using, it will help you get a more specific answer.

Edit: As mentioned by @georgydyer in the comments below, it is not clear from your question whether duplicate characters are allowed. If not, you can adapt the answer from this question and get:



^(?=[act]{3}$)(?!.*(.).*\1).*$

      

That is, a positive lookahead to check for a match, and then a negative lookahead with a backreference to eliminate duplicate characters.
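To see the difference between the two patterns in this answer, here is a small Ruby sketch. The constant names LOOSE and STRICT are mine, and I've used \A and \z instead of ^ and $ on the assumption that whole strings are being tested.

```ruby
# Character class only: accepts any three characters drawn from a, c, t.
LOOSE  = /\A[act]{3}\z/
# Same length/character check, plus a negative lookahead that rejects
# any string in which some character appears twice.
STRICT = /\A(?=[act]{3}\z)(?!.*(.).*\1).*\z/

%w[cat tca aac ccc cata].each do |w|
  puts format('%-5s loose=%-5s strict=%s', w, w.match?(LOOSE), w.match?(STRICT))
end
```

Words such as aac and ccc pass the loose pattern but are rejected by the strict one, which is the behaviour the question asks for.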

+3




This is how I would do it:

str = "act"
regex = /\b(?:#{ Regexp.union(str.split('').permutation.map{ |a| a.join }).source })\b/
# => /\b(?:act|atc|cat|cta|tac|tca)\b/

%w[
  cat act tca atc tac cta
  ca ac cata
].each do |w|
  puts '"%s" %s' % [w, w[regex] ? 'matches' : "doesn't match"]
end

      

This outputs:

"cat" matches
"act" matches
"tca" matches
"atc" matches
"tac" matches
"cta" matches
"ca" doesn't match
"ac" doesn't match
"cata" doesn't match

      

I use the ability to pass an array to Regexp.union for many things; it works well with the keys of a hash, with the hash then passed to gsub for quick find/replace on text templates. This is the example from the gsub documentation:

'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
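As a rough sketch of the hash-keys idea, something like the following should work. The REPLACEMENTS table and the swap_words helper are hypothetical names made up for this example.

```ruby
# Hypothetical replacement table, for illustration only.
REPLACEMENTS = { 'cat' => 'feline', 'dog' => 'canine' }.freeze

# Regexp.union escapes each key and joins them with "|".
PATTERN = Regexp.union(REPLACEMENTS.keys)

# gsub looks up each matched text in the hash and substitutes its value.
def swap_words(text)
  text.gsub(PATTERN, REPLACEMENTS)
end

puts swap_words('my cat chased the dog')
```

Because the pattern is built from the hash keys, adding a new replacement only means adding a new entry to the hash.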

      

Regexp.union creates a regex, and it is important to use source instead of to_s when extracting the actual pattern that is generated:

puts regex.to_s
=> (?-mix:\b(?:act|atc|cat|cta|tac|tca)\b)

puts regex.source
=> \b(?:act|atc|cat|cta|tac|tca)\b

      

Note that to_s embeds the pattern's flags inside the string. If you aren't expecting them, you might accidentally insert that pattern into another one, which then won't behave the way you expect. Been there, done that, and have the dented helmet as proof.
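Here is a small sketch of how that can bite, assuming you want the combined pattern to be case-insensitive. The constant names are invented for the example.

```ruby
INNER = /cat/

# to_s yields "(?-mix:cat)", so the embedded part keeps its own
# case-sensitive flags even though the outer pattern is case-insensitive.
VIA_TO_S   = Regexp.new("#{INNER.to_s}s", Regexp::IGNORECASE)

# source yields just "cat", so the outer flag applies to everything.
VIA_SOURCE = Regexp.new("#{INNER.source}s", Regexp::IGNORECASE)

puts 'CATS'.match?(VIA_TO_S)   # false: "cat" is still matched case-sensitively
puts 'CATS'.match?(VIA_SOURCE) # true
```

Interpolating a Regexp directly into a regex literal goes through to_s as well, so the same surprise applies there.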

If you really want to have some fun, take a look at Perl's Regexp::Assemble module, available on CPAN. Using it together with List::Permutor lets us generate more complex patterns. On a string this simple it won't save much space, but on long strings or large arrays of desired hits it can make a huge difference. Unfortunately, Ruby has nothing like this, but you can write a simple Perl script that takes a word or an array of words, generates the regular expression, and passes it back:

use List::Permutor;
use Regexp::Assemble;

my $regex_assembler = Regexp::Assemble->new;
my $perm = new List::Permutor split('', 'act');
while (my @set = $perm->next) {
    $regex_assembler->add(join('', @set));
}
print $regex_assembler->re, "\n";
# => (?-xism:(?:a(?:ct|tc)|c(?:at|ta)|t(?:ac|ca)))

      

See "Is there an efficient way to perform hundreds of textual replacements in Ruby?" for more information on using Regexp::Assemble with Ruby.

+2




I'll assume a few things here:
- You are looking for permutations of the given characters
- You are using Ruby

str = "act"
permutations = str.split(//).permutation.map{|p| p.join("")}

# and for the actual test
permutations.include?("cat")

      

This doesn't need a regular expression.
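A related check that also avoids regular expressions, and avoids building the permutation list at all, is to compare sorted characters. This is a different technique from the one in this answer, offered only as a sketch; the method name permutation_of? is hypothetical.

```ruby
# Two strings are permutations of each other exactly when their
# sorted character lists are equal.
def permutation_of?(candidate, letters)
  candidate.chars.sort == letters.chars.sort
end

%w[cat tca ca cata].each do |w|
  puts format('%-5s %s', w, permutation_of?(w, 'act'))
end
```

Like the permutation list, this tests a whole word; it won't find matches inside a larger string the way scan with a regex would.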

+1




No doubt about it: the regex using positive/negative lookaheads and backreferences is slick, but if you're only dealing with three characters, I'd err on the side of verbosity and explicitly enumerate the character permutations, as @scones does.

"act".split('').permutation.map(&:join)
=> ["act", "atc", "cat", "cta", "tac", "tca"]

      

And if you really need a regex to scan a larger string, you can always:

/\b(?:#{Regexp.union("act".split('').permutation.map(&:join)).source})\b/
=> /\b(?:act|atc|cat|cta|tac|tca)\b/

      

Obviously this strategy does not scale as your search string grows, but code like this is much easier to read, in my opinion.
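To put a rough number on the scaling problem: for a string of n distinct characters, the alternation contains n! entries. A quick sketch, with a hypothetical helper name:

```ruby
# Number of alternatives the permutation-based regex would contain
# for a string of n distinct characters (n factorial).
def alternation_size(n)
  (1..n).reduce(1, :*)
end

[3, 5, 8, 10].each do |n|
  puts format('%2d characters -> %d alternatives', n, alternation_size(n))
end
```

Three characters give 6 alternatives, but ten already give 3,628,800, at which point the lookahead approach is the more practical choice.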

EDIT: Added word boundaries to avoid false positives such as cata, based on @theTinMan's feedback.

+1

