PHP: parsing .ini files with newlines in values / regex needed?

I have some trouble parsing .ini files that have unquoted values with newlines in them. Here's an example:

[Section1]
ID=xyz

# A comment
Foo=BAR

Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Screenshot=url-goes-here.png
Categories=some,categories

Vendor=abc

[Section2]
Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,

 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Somekey=somevalue

      

When I try to parse this string with parse_ini_string($file_content, true, INI_SCANNER_RAW), it returns either false or only the first line of Description, e.g.:

["Description"]=> "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod" // next lines are missing

      

I've already tried removing the newlines and enclosing the values in quotes, but I can't find a regex that works. I need a pattern that matches each key/value pair up to the next key/value pair or the start of a comment.

Unfortunately, sometimes a key starts after a blank line and sometimes it doesn't. Values can also contain empty lines (see Description in Section2).

So the question is: how do I change/clean up this content so that it is readable by parse_ini_string?



1 answer


You can describe a multi-line key / value with this pattern:

/^\w+=\N*(?:\R++(?!\w+=|[[#;])\N+)+/m

      

With the default INI_SCANNER_NORMAL mode, multi-line values are allowed when they are enclosed in quotes, so all you need to do is add quotes:

$content = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);
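To see the replacement end to end, here is a minimal sketch that quotes the multi-line values of a shortened sample and then parses the result in the default mode. The helper name quoteMultilineValues and the shortened sample are made up for illustration, and the sketch assumes the values contain no double quotes of their own.

```php
<?php
// Hypothetical helper (not part of PHP): wraps every multi-line value in
// double quotes so that parse_ini_string() can read it in its default mode.
// Assumption: the values themselves contain no double quotes.
function quoteMultilineValues(string $content): string
{
    $pattern = '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m';
    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($pattern, '"$0"', $content) ?? $content;
}

// Shortened stand-in for the sample file: one value spans several lines
// and contains an empty line.
$content = "[Section1]\nID=xyz\nDescription=Lorem ipsum\n dolor sit\n\n amet\nVendor=abc\n";

$quoted = quoteMultilineValues($content);
$ini    = parse_ini_string($quoted, true); // INI_SCANNER_NORMAL is the default

echo $quoted;
```

Only Description gets quoted here; ID and Vendor are single-line values, so the pattern does not match them.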

      

more details:

~                  # pattern delimiter
^                  # start of the line
\w+                # key name
=
\K                 # discards characters on the left from the match result
\N*                # zero or more characters except newlines
(?:                # non-capturing group: possible empty lines up to a non-empty line
    \R++           # one or more newlines
    (?!\w+=|[[#;]) # not followed by another key/value, a section or a comment
    \N+            # one or more characters except newlines
)+                 # at least one occurrence
~m                 # switch on the multiline mode, ^ means "start of the line"

      

This pattern only matches multi-line values; other values are left unquoted.



Notes: I assumed that every key, comment and section starts at the beginning of a line. If that is not the case, for example when there is leading whitespace, you can easily adapt the pattern by adding \h*+ after each newline.
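As a rough illustration of that adaptation, the sketch below adds \h*+ after ^ and after \R++. This is only one possible reading of the note, and the indented sample is invented for the example.

```php
<?php
// Variant of the quoting pattern that tolerates leading horizontal
// whitespace before keys, sections and comments (an assumed adaptation).
$pattern = '~^\h*+\w+=\K\N*(?:\R++\h*+(?!\w+=|[[#;])\N+)+~m';

// Invented sample: every line is indented, and a comment follows the value.
$content = "  ID=xyz\n  Description=Lorem\n ipsum\n  ; note\n  Vendor=abc\n";

$quoted = preg_replace($pattern, '"$0"', $content);

echo $quoted;
```

One side effect of this variant: a line that contains only whitespace now ends the value, because \h*+ consumes the whole line and leaves nothing for \N+.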

If comments are allowed anywhere in the line, change \N to [^#\r\n].


If you want to use the INI_SCANNER_RAW mode, you must remove the newlines in the values:

$pattern = '~(?:\G(?!\A)|^\w+=[^#\r\n]*)\K\R++(?!\w+=|[[#])([^#\r\n]+)~m'; // m modifier: ^ must match at every line start
$content = preg_replace($pattern, ' $1', $content);
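A minimal sketch of this second approach on a shortened sample follows. The helper name joinContinuationLines and the sample are made up for illustration. The pattern is written with the m modifier, since ^ has to match at the start of every line and not only at the start of the string.

```php
<?php
// Hypothetical helper (not part of PHP): joins the continuation lines of a
// value onto the key's line so that INI_SCANNER_RAW sees one line per value.
function joinContinuationLines(string $content): string
{
    // \G(?!\A) continues right after the previous match, so each
    // continuation line of the same value is handled one by one.
    $pattern = '~(?:\G(?!\A)|^\w+=[^#\r\n]*)\K\R++(?!\w+=|[[#])([^#\r\n]+)~m';
    return preg_replace($pattern, ' $1', $content) ?? $content;
}

$content = "[Section1]\nID=xyz\nDescription=Lorem ipsum\n dolor sit\n\n amet\nVendor=abc\n";

$joined = joinContinuationLines($content);
$ini    = parse_ini_string($joined, true, INI_SCANNER_RAW);

echo $joined;
```

Because the continuation lines in the sample already start with a space, the joined value contains double spaces; trimming the captured group in a callback would be one way to avoid that.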

      

The pattern matches each run of consecutive newlines followed by a non-empty line, one at a time, and replaces the newlines with a space.

Another way to do this is to use the first pattern, but this time with preg_replace_callback and a simple character translation in the callback function. Note that this approach can be useful if you also want to escape or remove special or problematic characters.
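The callback variant could look roughly like the sketch below. The helper name flattenMultilineValues is hypothetical, and the callback uses a small nested preg_replace instead of a plain character translation so that the whitespace around each line break collapses into a single space.

```php
<?php
// Hypothetical helper (not part of PHP): matches each multi-line value with
// the first pattern and flattens it into a single line inside a callback.
function flattenMultilineValues(string $content): string
{
    $pattern = '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[0] is the whole value; the key and "=" are excluded by \K.
            // Collapse every line break plus surrounding blanks to one space.
            return preg_replace('~\h*\R+\h*~', ' ', $m[0]);
        },
        $content
    ) ?? $content;
}

$content = "[Section1]\nID=xyz\nDescription=Lorem ipsum\n dolor sit\n\n amet\nVendor=abc\n";

$flat = flattenMultilineValues($content);
$ini  = parse_ini_string($flat, true, INI_SCANNER_RAW);

echo $flat;
```

The callback is also the natural place to strip or escape characters that would otherwise confuse the INI parser.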
