Regex Code Recursive Content
I need to get the content between two directives (@embed and @endembed) using a regex. My current pattern handles the simple case correctly: /(?<!\w)(\s*)@embed(\s*\(.*\))([\w\W]*?)@endembed/g
However, when the directives are nested, it does not match the blocks correctly (https://regex101.com/r/nL8gV5/2). Sample template:
@extends('layouts/default')
@section('content')
<div class="row">
<div class="col-md-6">
@embed('components/box')
@section('title', 'Box title')
@section('content')
<h4>Haai</h4>
Box content
@stop
@endembed
</div>
<div class="col-md-6">
@embed('components/box')
@section('title', 'Box2 title')
@section('content')
@embed('components/timeline')
@section('items')
@stop
@endembed
@stop
@endembed
</div>
</div>
@stop
Desired output:
1:
@section('title', 'Box title')
@section('content')
<h4>Haai</h4>
Box content
@stop
2:
@section('title', 'Box2 title')
@section('content')
@embed('components/timeline')
@section('items')
@stop
@endembed
@stop
3:
@section('items')
@stop
I've tried different patterns but I can't seem to get it right. As I understand it, I should be using a recursive token (?R) in combination with a backreference, something like this: https://regex101.com/r/nL8gV5/3. After a few hours of fiddling around, I still haven't got it working.
What am I doing wrong, and what is the correct pattern?
To capture the outer @embed blocks as well as the nested ones, use a recursive regex:
$pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';
At (?0) the whole pattern is pasted in again (recursion). See the test at regex101. On each match, replace the block with the captured $1:
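$res = array();
// Each pass matches the current outermost blocks, then unwraps them
// so that the next pass sees the blocks nested one level deeper.
while (preg_match_all($pattern, $str, $out)) {
    $str = preg_replace($pattern, "$1", $str);
    $res = array_merge($res, $out[1]);
}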
This gives you the contents of the outer blocks first, then the nested ones, down to the innermost. Test at eval.in
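To make the order of the results concrete, here is a small self-contained sketch of the loop above. The input string is a made-up, shortened template (not the one from the question), and the final trim() call is only there to make the output easier to read:

```php
<?php
// Hypothetical, shortened input: one flat block and one block with a nested block.
$str = <<<'TPL'
@embed('a')
outer A
@endembed
@embed('b')
before
@embed('c')
inner C
@endembed
after
@endembed
TPL;

// Same pattern as in the answer: group 1 is the body between @embed(...) and @endembed.
$pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';

$res = array();
// Pass 1 finds blocks a and b; pass 2 finds block c, which was unwrapped from b.
while (preg_match_all($pattern, $str, $out)) {
    $str = preg_replace($pattern, '$1', $str);
    $res = array_merge($res, $out[1]);
}

// Strip the surrounding newlines for readability (not part of the original answer).
$res = array_map('trim', $res);
print_r($res);
```

With this input, $res should hold three entries: the body of a, the body of b (still containing the complete nested @embed('c') block), and finally the body of c.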
The basic recursive pattern without capturing is as simple as this:
/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s
- @embed\b matches a literal @embed followed by a \b word boundary
- (?> opens a non-capturing atomic group for the alternation:
- (?!@(?:end)?embed). matches a character that does not start @embed or @endembed
- |(?0) OR pastes in the pattern from the start (recursion)
- )* repeats all of this any number of times
- @endembed matches a literal @endembed
The s (PCRE_DOTALL) flag makes the dot also match newlines.
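As a quick check of the basic pattern, the sketch below runs it with preg_match_all on a made-up one-off string (an assumption for illustration, not the template from the question). Because every nested block is consumed by the recursion, only the outermost blocks come back as matches:

```php
<?php
// Hypothetical input: block a contains block b, block c stands alone.
$tpl = "x @embed('a')\none @embed('b') two @endembed\nthree @endembed y @embed('c') four @endembed z";

// Basic recursive pattern from the answer, no capturing group.
$basic = '/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s';

// $m[0] receives the full text of each outermost block.
preg_match_all($basic, $tpl, $m);
print_r($m[0]);
```

The first match spans two newlines, which only works because of the s flag; without it the dot would stop at the first line break and the block would not match.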