Regex Code Recursive Content

I need to get the content between two directives (@embed and @endembed) using RegEx. My current pattern handles the simple case correctly:

/(?<!\w)(\s*)@embed(\s*\(.*\))([\w\W]*?)@endembed/g

However, when the directives are nested, it does not match the blocks correctly (https://regex101.com/r/nL8gV5/2):

@extends('layouts/default')

@section('content')
    <div class="row">
        <div class="col-md-6">
            @embed('components/box')
                @section('title', 'Box title')
                @section('content')
                    <h4>Haai</h4>
                    Box content
                @stop
            @endembed
        </div>
        <div class="col-md-6">
            @embed('components/box')
                @section('title', 'Box2 title')
                @section('content')

                    @embed('components/timeline')
                        @section('items')
                        @stop
                    @endembed

                @stop
            @endembed
        </div>
    </div>
@stop

      

Desired output:

1:    
@section('title', 'Box title')
@section('content')
    <h4>Haai</h4>
    Box content
@stop

2:
@section('title', 'Box2 title')
@section('content')
    @embed('components/timeline')
        @section('items')
        @stop
    @endembed
@stop

3:
@section('items')
@stop

      

I've tried different patterns but I can't get it right. As I understand it, I should be using a recursive token (?R) in combination with a backreference, something like this: https://regex101.com/r/nL8gV5/3 . After a few hours of fiddling around, I still haven't gotten it to work.

What am I doing wrong and what is the correct pattern?

2 answers


To capture both the outer and the nested @embed blocks, use a recursive regex:

$pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';

      

At (?0) the whole pattern is inserted again (recursion). See the test at regex101. Replace each match with the captured $1 and repeat for as long as there are matches:

$res = array();

while (preg_match_all($pattern, $str, $out)) {
  $str = preg_replace($pattern, "$1", $str);
  $res = array_merge($res, $out[1]);
}

      

This gives you the outer blocks first, then the nested ones down to the innermost. Test at eval.in.
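The loop above can be tried end to end with a minimal sketch like the following. The template in $str is a hypothetical, shortened version of the one in the question, and the printing at the end is only there for illustration.

```php
<?php
// Hypothetical sample template, shortened from the question.
$str = <<<'TPL'
@embed('components/box')
    @section('content')
        @embed('components/timeline')
            @section('items')
            @stop
        @endembed
    @stop
@endembed
TPL;

$pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';

$res = array();
// Each pass collects the bodies of the outermost remaining blocks,
// then strips that layer of @embed(...) / @endembed wrappers.
while (preg_match_all($pattern, $str, $out)) {
    $res = array_merge($res, $out[1]);
    $str = preg_replace($pattern, '$1', $str);
}

// Print the collected bodies, outermost first.
foreach ($res as $i => $block) {
    echo ($i + 1) . ":\n" . trim($block) . "\n\n";
}
```

Note that $str is consumed by the loop, so work on a copy if the original template is still needed afterwards.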




The basic recursive pattern without capturing is as simple as this:

/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s

      

  • @embed\b matches a literal @embed followed by a \b word boundary
  • (?> opens a non-capturing atomic group for the alternation
  • (?!@(?:end)?embed\b). matches one character that does not start @embed or @endembed
  • |(?0) OR inserts the whole pattern from the start (recursion)
  • )* repeats all of this any number of times
  • @endembed matches a literal @endembed

The s (PCRE_DOTALL) flag makes the dot match newlines as well.
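As a small sketch of how the non-capturing pattern behaves, the snippet below runs it over a hypothetical one-line input with one nested block and one flat block. Each match is a whole balanced block including its wrappers; a nested block is consumed as part of its parent and is not reported separately.

```php
<?php
// Hypothetical input: one nested block followed by a flat one.
$str = "@embed('a') x @embed('b') y @endembed z @endembed @embed('c') w @endembed";

$pattern = '/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s';

// $m[0] holds the full text of every top-level balanced block.
$count = preg_match_all($pattern, $str, $m);

echo $count, "\n";
print_r($m[0]);
```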



I came up with this recursive regex, based on an example from this Stack Overflow answer:

(?=(@embed(?:(?>(?:(?!@embed|@endembed).)+)*|(?1))*@endembed))

      



Try on regex101
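A rough sketch of how this lookahead variant could be used from PHP follows. The delimiters, the s flag, and the one-line input are assumptions added for the demo. Because the whole block sits inside a lookahead, the match itself is empty and the scan can start again inside a block, so nested blocks are captured in group 1 as well. The captures include the @embed and @endembed wrappers.

```php
<?php
// Hypothetical input: one nested block followed by a flat one.
$str = "@embed('a') x @embed('b') y @endembed z @endembed @embed('c') w @endembed";

// Pattern from the answer, wrapped in delimiters; the s flag is an
// assumption so that the dot also matches newlines in multi-line templates.
$pattern = '/(?=(@embed(?:(?>(?:(?!@embed|@endembed).)+)*|(?1))*@endembed))/s';

// Group 1 holds each balanced block, outer and nested, in document order.
preg_match_all($pattern, $str, $m);

print_r($m[1]);
```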
