How do I create a RegEx pattern that returns the first N words using a custom word boundary?
I need a RegEx pattern that will return the first N words using a custom word boundary, which is the usual RegEx whitespace (\s) plus punctuation marks like .,;:!?-*_
EDIT #1: Thanks for your comments.
To be clear:
- I would like to define which characters act as word delimiters.
- Let's call this the "Delimiter Set", or strDelimiters.
- strDelimiters = ".,;:!?-*_"
- nNumWordsToFind = 5
- A word is any contiguous text that does NOT contain a character in strDelimiters.
- A RegEx word boundary is any contiguous text consisting of one or more characters in strDelimiters.
- I would like a RegEx pattern that returns the first nNumWordsToFind words using strDelimiters.
EDIT #2 (Sat 08 Aug 2015, 12:49 pm US CT)
@maraca definitely answered my question as originally stated. But what I really need is to return at most nNumWordsToFind words. So if the source text has only 3 words but my RegEx asks for 4, it should return those 3 words. The answer provided by maraca fails if nNumWordsToFind > the number of actual words in the source text.
For example:
one,two;three-four_five.six:seven eight nine! ten
This should be seen as 10 words. If I want the first 5 words, it should return:
one,two;three-four_five.
I have this pattern using ordinary whitespace that works, but it is NOT exactly what I need:
([\w]+\s+){<NumWordsOut>}
where <NumWordsOut> is the number of words returned.
I also found this word boundary pattern, but I don't know how to use it.
It is a "real word boundary", which detects the boundary between an ASCII letter and a non-letter:
(?i)(?<=^|[^a-z])(?=[a-z])|(?<=[a-z])(?=$|[^a-z])
However, I would like my words to include digits as well.
In any case, I was unable to use the above word boundary pattern to return the first N words of my text.
By the way, I will be using this in a Keyboard Maestro macro.
Can anyone please help? TIA.
All you have to do is adapt your pattern ([\w]+\s+){<NumWordsOut>} to the following, which also covers some special cases:
^[\s.,;:!?*_-]*([^\s.,;:!?*_-]+([\s.,;:!?*_-]+|$)){<NumWordsOut>}
The parts of the pattern, from left to right:
1. ^[\s.,;:!?*_-]* matches any number of delimiters before the first word.
2. [^\s.,;:!?*_-]+ matches a word (at least one non-delimiter character).
3. [\s.,;:!?*_-]+ requires the word to be followed by at least one delimiter,
4. |$ or lets the word sit at the end of the line (in case there is no delimiter at the end).
5. {<NumWordsOut>} repeats steps 2 to 4 <NumWordsOut> times.
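To see how the pattern behaves on the sample text from the question, here is a minimal sketch in Python. Python's re module is an assumption made for illustration (Keyboard Maestro uses its own regex engine), and first_words and its at_most switch are hypothetical names. The at_most variant swaps {N} for {1,N}, which is one possible way to get fewer words back when the text is shorter than requested; it is not part of the pattern above.

```python
import re

# Delimiter class body used in the pattern: whitespace plus punctuation, "-" last.
DELIMS = r"\s.,;:!?*_-"


def first_words(text, n, at_most=False):
    """Return the leading text holding the first n words, or None if no match.

    Assumes n >= 1. With at_most=True the quantifier becomes {1,n}, so a text
    with fewer than n words still matches (an illustrative variant only).
    """
    quantifier = "{1,%d}" % n if at_most else "{%d}" % n
    pattern = (
        "^[" + DELIMS + "]*"            # optional delimiters before the first word
        + "([^" + DELIMS + "]+"         # a word: one or more non-delimiters
        + "([" + DELIMS + "]+|$))"      # followed by delimiters or end of text
        + quantifier                    # repeated for the requested word count
    )
    match = re.match(pattern, text)
    return match.group(0) if match else None


sample = "one,two;three-four_five.six:seven eight nine! ten"
print(first_words(sample, 5))                        # one,two;three-four_five.
print(first_words("one two three", 4))               # None
print(first_words("one two three", 4, at_most=True)) # one two three
```

Note that the match includes the delimiter after the last word, which is the same shape as the expected output shown in the question.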
Notice how I moved the - within the character class: it must be at the beginning or at the end of the class; otherwise it must be escaped: \-
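The placement of - matters because, inside a character class, a - between two characters is read as a range. A small Python sketch of the effect, assuming Python's re module (other engines may report the problem differently); the compiles helper is hypothetical, and re.escape is used here as one way to build the class from strDelimiters without worrying about the position of -:

```python
import re

STR_DELIMITERS = ".,;:!?-*_"  # the question's delimiter set, with "-" in the middle


def compiles(pattern):
    """Return True if the pattern compiles in Python's re module."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Pasted as-is, "?-*" is parsed as a range from "?" down to "*", which is invalid.
naive = "[" + STR_DELIMITERS + "]"

# re.escape backslash-escapes "-" (and other special characters), so order is irrelevant.
escaped = "[" + r"\s" + re.escape(STR_DELIMITERS) + "]"

print(compiles(naive))                              # False
print(compiles(escaped))                            # True
print(re.split(escaped + "+", "three-four_five"))   # ['three', 'four', 'five']
```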
Thanks to @maraca for providing a complete answer to my question.
I just wanted to post the Keyboard Maestro macro that I created using @maraca's RegEx pattern, for anyone interested in a complete solution.