How to use FINDSTR in PowerShell to find lines where all words in the search string match, in any order

The following findstr.exe command almost does what I want, but not quite:

findstr /s /i /c:"word1 word2 word3" *.abc

I used:

  • /s to search all subfolders.
  • /c: to use the specified text as a literal search string.
  • /i to make the search case-insensitive.
  • *.abc to search files of type abc.

The expression "word1 word2 word3" above is a literal and therefore only finds the words in that exact order.

In contrast, I want all words to match individually, in any order (AND logic, conjunction).

If I remove /c: from the command above, lines matching any of the words (OR logic, disjunction) are returned, which is not what I want.

Can this be done in PowerShell?

3 answers


You can use Select-String to search multiple files using regular expressions.

To require that all of several search terms occur on the same line, in any order, with a single regular expression, you need to use lookahead assertions:

Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

In the example above, this is what happens in the first part of the pipeline:

Get-ChildItem -Filter *.abc -Recurse

  • Get-ChildItem searches for files in the current directory.
  • -Filter *.abc returns only files whose names end in .abc.
  • -Recurse makes it search all subfolders as well.

We then pipe the resulting FileInfo objects to Select-String and use the following regex pattern:

^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$
^              # start of string  
 (?=           # open positive lookahead assertion containing
    .*         # any number of any characters (like * in wildcard matching)
      \b       # word boundary
        word1  # the literal string "word1"
      \b       # word boundary
 )             # close positive lookahead assertion
 ...           # repeat for remaining words
 .*            # any number of any characters
$              # end of string

Because a lookahead assertion only tests for a match without consuming any characters, the current position in the string never moves: every lookahead scans the line from its start, so the order in which the words occur doesn't matter.
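To see the conjunctive pattern at work without creating any *.abc files, you can apply it with PowerShell's -match operator to a few sample lines. The sample strings are made up for illustration; -match is case-insensitive by default, just like Select-String:

```powershell
# The conjunctive (AND) pattern from above: one lookahead per required word.
$allPattern = '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

$lines = @(
    'word1 word2 word3'          # all three words, in order
    'word3 then word1 and word2' # all three words, different order
    'word1 word2 only'           # word3 is missing
    'word10 word2 word3'         # "word10" is not the whole word "word1"
)

# Keep only the lines on which every lookahead succeeds.
$matched = $lines | Where-Object { $_ -match $allPattern }
$matched
```

Only the first two lines are kept: the third lacks word3, and in the fourth, word10 does not satisfy \bword1\b.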


If you want to match lines containing any of the words instead, you can use an alternation inside a simple non-capturing group:



Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '\b(?:word1|word2|word3)\b'

\b(?:word1|word2|word3)\b
\b          # word boundary
  (?:       # open non-capturing group
     word1  # the literal string "word1"
     |      # or
     word2  # the literal string "word2"
     |      # or
     word3  # the literal string "word3"
  )         # close non-capturing group
\b          # word boundary
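The disjunctive pattern can be tried the same way on a few made-up sample lines; note how the surrounding \b anchors reject a search word that is only part of a longer word:

```powershell
# Disjunctive (OR) pattern: any one of the alternatives, as a whole word.
$anyPattern = '\b(?:word1|word2|word3)\b'

$lines = @(
    'only word2 here'         # one of the words is enough
    'nothing relevant'        # none of the words
    'WORD3 in capitals'       # -match is case-insensitive by default
    'password1 is not a hit'  # "word1" is inside a longer word, so \b fails
)

$matched = $lines | Where-Object { $_ -match $anyPattern }
$matched
```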

      


This can, of course, be wrapped in a simple proxy function.

I generated the param block and most of the body of the Select-Match function definition below with:

$slsmeta = [System.Management.Automation.CommandMetadata]::new((Get-Command Select-String))
[System.Management.Automation.ProxyCommand]::Create($slsmeta)

Then I removed unnecessary parameters (including -AllMatches and -Pattern) and added the pattern generator (see the inline comments):

function Select-Match
{
    [CmdletBinding(DefaultParameterSetName='Any', HelpUri='http://go.microsoft.com/fwlink/?LinkID=113388')]
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [string[]]
        ${Substring},

        [Parameter(Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias('PSPath')]
        [string[]]
        ${LiteralPath},

        [Parameter(ParameterSetName='Any')]
        [switch]
        ${Any},

        [Parameter(ParameterSetName='All')]
        [switch]
        ${All},

        [switch]
        ${CaseSensitive},

        [switch]
        ${NotMatch},

        [ValidateNotNullOrEmpty()]
        [ValidateSet('unicode','utf7','utf8','utf32','ascii','bigendianunicode','default','oem')]
        [string]
        ${Encoding},

        [ValidateNotNullOrEmpty()]
        [ValidateCount(1, 2)]
        [ValidateRange(0, 2147483647)]
        [int[]]
        ${Context}
    )

    begin
    {
        try {
            $outBuffer = $null
            if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
            {
                $PSBoundParameters['OutBuffer'] = 1
            }

            # Escape literal input strings
            $EscapedStrings = foreach($term in $PSBoundParameters['Substring']){
                [regex]::Escape($term)
            }

            # Construct the pattern based on whether -Any or -All was specified
            if($PSCmdlet.ParameterSetName -eq 'Any'){
                $Pattern = '\b(?:{0})\b' -f ($EscapedStrings -join '|')
            } else {
                $Clauses = foreach($EscapedString in $EscapedStrings){
                    '(?=.*\b{0}\b)' -f $EscapedString
                }
                $Pattern = '^{0}.*$' -f ($Clauses -join '')
            }

            # Remove the parameters that Select-String doesn't know about
            # (-Substring, -Any, -All) from PSBoundParameters before splatting
            foreach($name in 'Substring', 'Any', 'All'){
                $PSBoundParameters.Remove($name) |Out-Null
            }

            # Add the Pattern parameter argument
            $PSBoundParameters['Pattern'] = $Pattern

            $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('Microsoft.PowerShell.Utility\Select-String', [System.Management.Automation.CommandTypes]::Cmdlet)
            $scriptCmd = {& $wrappedCmd @PSBoundParameters }
            $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
            $steppablePipeline.Begin($PSCmdlet)
        } catch {
            throw
        }
    }

    process
    {
        try {
            $steppablePipeline.Process($_)
        } catch {
            throw
        }
    }

    end
    {
        try {
            $steppablePipeline.End()
        } catch {
            throw
        }
    }
    <#

    .ForwardHelpTargetName Microsoft.PowerShell.Utility\Select-String
    .ForwardHelpCategory Cmdlet

    #>

}

Now you can use it like this, and it will behave much like Select-String:

Get-ChildItem -Filter *.abc -Recurse |Select-Match word1,word2,word3 -All
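The pattern-building step in Select-Match's begin block can also be run on its own, which makes it easy to check what will be handed to Select-String -Pattern. The snippet below copies that logic outside the function; the sample terms (including a.b, chosen to show the escaping) are made up:

```powershell
# Stand-alone copy of the pattern construction from Select-Match's begin block.
$Substring = 'word1', 'a.b', 'word3'

# Escape each term so that regex metacharacters (here: '.') are matched literally.
$EscapedStrings = foreach ($term in $Substring) { [regex]::Escape($term) }

# -Any: a single alternation, wrapped in word boundaries.
$anyPattern = '\b(?:{0})\b' -f ($EscapedStrings -join '|')

# -All: one lookahead per term, then consume the whole line.
$clauses = foreach ($escaped in $EscapedStrings) { '(?=.*\b{0}\b)' -f $escaped }
$allPattern = '^{0}.*$' -f ($clauses -join '')

$anyPattern   # \b(?:word1|a\.b|word3)\b
$allPattern   # ^(?=.*\bword1\b)(?=.*\ba\.b\b)(?=.*\bword3\b).*$
```

Because \b needs a word character on one side, a term that begins or ends with punctuation (for example c++) may never match under either pattern; such terms would need different anchors.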

      


Another (admittedly less sophisticated) approach would be to simply chain filters, since word order doesn't matter: filter the files for the first word, then filter that output for lines that also contain the second word, then filter that output for lines that also contain the third word.

findstr /s /i "word1" *.abc | findstr /i "word2" | findstr /i "word3"

Using PowerShell cmdlets, it looks like this:

Get-ChildItem -Filter '*.abc' -Recurse | Get-Content | Where-Object {
  $_ -like '*word1*' -and
  $_ -like '*word2*' -and
  $_ -like '*word3*'
}

or (using aliases):

ls '*.abc' -r | cat | ? {
  $_ -like '*word1*' -and
  $_ -like '*word2*' -and
  $_ -like '*word3*'
}

Note that aliases are merely meant to save typing on the command line, so I don't recommend using them in scripts.
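One difference from the regex-based answer is worth knowing: -like '*word1*' is a substring test, so it also accepts lines in which word1 is merely part of a longer word. A quick check with made-up sample lines:

```powershell
$lines = 'word3 word2 word1', 'word1 word2', 'word10 word2 word3'

# Same conjunctive filter as above, applied to in-memory strings instead of file content.
$matched = $lines | Where-Object {
  $_ -like '*word1*' -and
  $_ -like '*word2*' -and
  $_ -like '*word3*'
}
$matched
```

The third line is kept because "word10" contains "word1"; if whole-word matching matters, the lookahead regex from the first answer is the better fit.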


Note:

  • The first part of this answer does NOT solve the OP's problem; for solutions, see Mathias R. Jessen's helpful answer and Ansgar Wiechers's answer. Alternatively, see the bottom of this answer, which offers a generic solution adapted from Mathias' code.

    • (Due to an initial misreading of the question,) this part of the answer uses disjunctive logic (matching lines that contain at least one of the search terms), which is the only logic that findstr.exe and PowerShell's Select-String support directly.

    • By contrast, the OP asks for conjunctive logic, which requires additional work.

  • This part of the answer may still be of interest for translating findstr.exe commands to PowerShell using Select-String.


The PowerShell equivalent of the findstr command from the question, but without /c:, that is,

FINDSTR /s /i "word1 word2 word3" *.abc

is:

(Get-ChildItem -File -Filter *.abc -Recurse |
  Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3').Count

  • /s → Get-ChildItem -File -Filter *.abc -Recurse lists all *.abc files in the current directory's subtree.

    • Note: while Select-String is capable of accepting a filename pattern (wildcard expression) such as *.abc, it does not support recursion, so a separate Get-ChildItem call is required, whose output is piped to Select-String.
  • findstr → Select-String, PowerShell's more flexible counterpart:

    • -SimpleMatch specifies that the -Pattern arguments are interpreted as literals rather than as regular expressions. Note how the defaults differ:

      • findstr expects literals by default (you can switch to regular expressions with /R).
      • Select-String expects regular expressions by default (you can switch to literals with -SimpleMatch).
    • /i → (default behavior); like most of PowerShell, Select-String is case-insensitive by default; add -CaseSensitive to change this.

    • "word1 word2 word3" → -Pattern 'word1', 'word2', 'word3'; specifying an array of patterns looks for a match of at least one of the patterns on each line (disjunctive logic).

      • That is, all of the following lines would match: ... word1 ..., ... word2 ..., ... word2 word1 ..., ... word3 word1 word2 ...

  • /c → (...).Count: Select-String outputs a collection of objects representing the matching lines, which this expression simply counts. The output objects are [Microsoft.PowerShell.Commands.MatchInfo] instances, which include not only the matching line but also metadata about the input and the specifics of what matched.
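The disjunctive behavior of an array passed to -Pattern, and the (...).Count idiom, can be checked without any files by piping sample strings (made up for illustration) directly to Select-String:

```powershell
$lines = 'a word1 line', 'nothing here', 'word3 and word2', 'WORD2 in capitals'

# Each line needs to contain at least ONE of the literal patterns (OR logic);
# matching is case-insensitive unless -CaseSensitive is added.
$result = $lines | Select-String -SimpleMatch -Pattern 'word1', 'word2', 'word3'

# Select-String emits one MatchInfo object per matching line; .Line holds the text.
$result.Count                        # 3
$result | ForEach-Object { $_.Line }
```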


Solution built on Mathias R. Jessen's elegant wrapper function:

Select-AllStrings is a conjunctive-only companion function to the disjunctive-only Select-String cmdlet; it uses the same syntax as the latter, except that it doesn't support the -AllMatches switch.

That is, Select-AllStrings requires all patterns passed to it, whether they are regular expressions (the default) or literals (with -SimpleMatch), to match a line.

With regard to the OP's problem, we get:

(Get-ChildItem -File -Filter *.abc -Recurse |
  Select-AllStrings -SimpleMatch word1, word2, word3).Count

Note the differences compared to the command at the top:

  • The -Pattern parameter is bound implicitly, by argument position.
  • The patterns are specified as barewords (unquoted) for convenience, although it is generally safer to quote them, because it is not easy to remember which arguments need quoting.
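The definition of Select-AllStrings itself is not reproduced here. As a rough, hypothetical stand-in (not the answer's actual function), the same conjunctive, literal search can be expressed with a single Select-String call: escape each word and combine one lookahead per word, without \b, so that, as with -SimpleMatch, any substring occurrence counts. The sample lines are made up:

```powershell
$words = 'word1', 'word2', 'word3'

# One lookahead per escaped word; no \b, mirroring -SimpleMatch's substring semantics.
$lookaheads = foreach ($word in $words) { '(?=.*{0})' -f [regex]::Escape($word) }
$pattern = '^' + ($lookaheads -join '') + '.*'

$lines = 'word3 x word1 y word2', 'word1 word2', 'WORD2 word1 word3'

# Every word must occur somewhere on the line, in any order (AND logic).
$hits = $lines | Select-String -Pattern $pattern
$hits.Count   # 2
```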
