Get the string of the first argument of a function call
I want to search PHP files for a particular function call. The reason is that I want to generate .mo files for the gettext extension, so I first need to create a .po file that contains all the required text strings.
I already find a lot of the texts, but there are some problems.
Here is my regex to find the first argument of the function:
/\_\([\'|\"]{1}(.+?[^\\\])[\'|\"]{1}[,]{0,1}.*?\)+/si
I need to find function calls with the following patterns:
_("text");
_("text %s", 3);
_('text');
The text can contain escaped quotes. My problem is that I really need to know whether the call used an apostrophe or a regular double quote.
If I have a call like
_('"text"');
then the problem is that I get the text
"text
without an end quote.
Do any of you have an idea how I can get my Regex to work?
I would use a PHP tokenizer for this kind of thing, not regular expressions:
$funcName = '_';
$tokens = token_get_all(file_get_contents('path/to/your/script.php'));
$strings = array();

foreach ($tokens as $index => $token) {
    // single-character tokens such as "(" are plain strings, not arrays
    if (!is_array($token))
        continue;

    if ($token[0] === T_CONSTANT_ENCAPSED_STRING) {
        // the string must be directly preceded by "(" and a function name
        if (!isset($tokens[$index - 2]) || ($tokens[$index - 1] !== "("))
            continue;

        list($id, $text, $line) = $tokens[$index - 2];

        // this is your string (substr drops the quotes around it)
        if (($id === T_STRING) && ($text === $funcName))
            $strings[] = substr($token[1], 1, -1);
    }
}

var_dump($strings);
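Since the question also asks which quote character was used, here is a minimal sketch of the same tokenizer idea that keeps that information. The function name find_first_string_args and the keys of the returned arrays are made up for this illustration, and skipping whitespace tokens is an addition that the snippet above does not do.

```php
<?php
// Hypothetical helper (not part of the answer above): collects the first
// argument of every call to $funcName when that argument is a plain string
// literal, and records which quote character the literal was written with.
function find_first_string_args($source, $funcName = '_')
{
    // Drop whitespace tokens so that `_ ( 'text' )` is treated like `_('text')`.
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($t) { return !(is_array($t) && $t[0] === T_WHITESPACE); }
    ));

    $found = array();
    foreach ($tokens as $i => $token) {
        // Only string literals without interpolation are of interest.
        if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue;
        }
        // The literal must come right after "(" and a function name.
        if ($i < 2 || $tokens[$i - 1] !== '(') {
            continue;
        }
        $prev = $tokens[$i - 2];
        if (is_array($prev) && $prev[0] === T_STRING && $prev[1] === $funcName) {
            $found[] = array(
                'quote' => $token[1][0],             // ' or "
                'text'  => substr($token[1], 1, -1), // literal without its quotes
                'line'  => $token[2],
            );
        }
    }
    return $found;
}

// Sample input; the tokenizer needs the opening tag to see PHP code.
$code = <<<'CODE'
<?php
echo _('"text"');
echo _("text %s", 3);
CODE;

print_r(find_first_string_args($code));
```

The 'text' value is the literal exactly as written in the source, so escape sequences such as \' are still present and would need to be resolved separately before writing the .po file.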
Raw regex:
_\((?|'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")
Delimited regex:
~_\((?|'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")~
The result is in capturing group 1. I used the branch reset group (?|pattern) so that the capture group numbering restarts for each alternative separated by |.
There are 2 patterns inside the branch reset group (?|'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)"):
- '((?:[^'\\]|\\.)*)' : Matches and captures the content of a single-quoted string, which consists of characters that are neither a quote nor a backslash, or of escape sequences. Actually, I'm a bit sloppy here, since a (raw) newline character is considered part of the string. I don't think the spec allows this, but if the input contains valid code there should be no problem.
- "((?:[^"\\]|\\.)*)" : Same as above, but for a double-quoted string.
Note that I am not matching the rest of the function's arguments.
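As a rough usage sketch, the delimited regex can be passed to preg_match_all as shown here. The sample input and the variable names are made up for this illustration; the quote character is read from the full match, where it is always the third character after `_(`.

```php
<?php
// The delimited pattern from above, kept in a nowdoc so that no extra
// escaping for PHP string syntax is needed.
$pattern = <<<'REGEX'
~_\((?|'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")~
REGEX;

// Made-up sample input for this sketch.
$code = <<<'CODE'
echo _('"text"');
echo _("text %s", 3);
echo _('it\'s');
CODE;

preg_match_all($pattern, $code, $matches, PREG_SET_ORDER);

$found = array();
foreach ($matches as $m) {
    $found[] = array(
        'quote' => $m[0][2], // full match is `_(` followed by the opening quote
        'text'  => $m[1],    // group 1, whichever branch matched
    );
}

print_r($found);
```

Keep in mind that this pattern matches `_(` wherever it occurs in the text, for example at the end of a longer function name such as `foo_(`, so a check on the preceding character may be worth adding.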