Cumulative memory in string assignment: $a = $a . $b vs $a .= $b

Some of you are probably familiar with how PHP handles memory in various string situations.

When a string is reassigned, it is not updated in place; it is cloned. At least that's my understanding.

$a = 'a';
$b = 'b';
$a = $a . $b; // uses sizeof($a)*2 + sizeof($b) bytes
$a .= $b; // uses sizeof($a) + sizeof($b) bytes

      

In a template engine I am developing, this means huge memory consumption. I am using over 128MB of memory for a page string that is actually less than 512KB. This is because the string is being copied over and over.

Simply put, these copies are created every time I do something like:

$page = str_replace($find, $replace, $page);

      

Is there a general workaround for this cloning?

I tested this a bit. The two snippets below give the same result but use completely different amounts of memory. The former consumes a huge amount of memory, while the latter consumes only about the actual size of the string.

$iterations = 100000;
$a = 'a';
$b = 'b';
echo "start peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "start current memory usage " . (memory_get_usage()/1024).'k<br>';

for($i = 0; $i<$iterations; $i++) {
    $a = $a . $b;
}
echo "end peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "end current memory usage " . (memory_get_usage()/1024).'k<br>';

      

versus

$iterations = 100000;
$a = 'a';
$b = 'b';
echo "start peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "start current memory usage " . (memory_get_usage()/1024).'k<br>';

for($i = 0; $i<$iterations; $i++) {
    $a .= $b;
}
echo "end peak memory usage " . (memory_get_peak_usage()/1024).'k<br>';
echo "end current memory usage " . (memory_get_usage()/1024).'k<br>';

      

Regarding the templating engine, what would be the best way to avoid unnecessary memory usage? In a development environment this is not a problem, but in production it can become a scalability issue.

Naturally, speed is also a concern, so any alternative should be about as fast as this one.

Finally, I think it also has something to do with variable scope. Feel free to correct me, as I am not a professional. As I understand it, variables are unset by the PHP garbage collector (?) when the function or method ends, but in my case $page naturally lives for the entire duration of the script, since it is a class variable accessed as $this->page, and therefore old copies cannot be freed.

EDIT 10/16/2014: To follow up on this question, I did some testing and I'm leaning towards the suggested solution of breaking the page apart into pieces. Here's a rough, simple outline of the structure; the explanation follows below.

class PageObjectX {
    protected $_parent;

    // Objects are passed by handle, so no & is needed on $parent.
    public function __construct($parent) { $this->_parent = $parent; }
    /* has a __toString() method, handles how the variable/section is outputted. */
}

class Page {
    protected $_parts;
    protected $_source_parts;
    protected $_variables;

    public function __construct($s) {
        $this->_source_parts = preg_split($s, ...);
        foreach ($this->_source_parts as $part) {
            $this->_parts[] = new PageObject($this, ...);
        }
    }

    // implode() casts each part to a string, which calls its __toString().
    public function __toString() { return implode('', $this->_parts); }

    public function setVariables($k, $v) { $this->_variables[$k] = $v; }
}

      

What I do is explode the template string into an array of parts: regular strings, variables, strings to fetch from the database, and regions / sections. The management of the array of parts is encapsulated in the Page class. The array has objects as elements: PageVariable, PageString, PageRepeatable, PagePlaintext. Each object provides a __toString() method, which lets each type of item control how it is displayed and helps keep the classes relatively small and manageable. Feels "clean" to me.

Each PageXXX class gets its data from the main class through its parent reference, so the Page class holds all the global variables and handles a single database query to fetch all the translated strings, etc.

The repeatables are probably the least obvious part. I use them to display lists or anything else that can be repeated several times, like news. The content changes; the structure doesn't. So I pass the following array to the page, and when the repeatable named "news" looks for its data, it gets the data for two news items, for example.
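A minimal sketch of the parts-and-parent structure described above. Only the class names Page, PagePlaintext and PageVariable come from this post; the {name} placeholder syntax, the getVariable() and setVariable() methods and the sample values are assumptions made for illustration, not the actual engine.

```php
<?php
// Hypothetical sketch: placeholder syntax and method names are assumed.

class PagePlaintext
{
    private $text;

    public function __construct($text)
    {
        $this->text = $text;
    }

    public function __toString()
    {
        return $this->text;
    }
}

class PageVariable
{
    private $parent;
    private $name;

    public function __construct(Page $parent, $name)
    {
        $this->parent = $parent;
        $this->name = $name;
    }

    public function __toString()
    {
        // The value is looked up in the parent page at render time.
        return (string) $this->parent->getVariable($this->name);
    }
}

class Page
{
    private $parts = array();
    private $variables = array();

    public function __construct($source)
    {
        // Assumed placeholder syntax: {name}. The capture group keeps the names.
        $chunks = preg_split('/\{(\w+)\}/', $source, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($chunks as $i => $chunk) {
            // With one capture group, odd indexes hold the captured names.
            $this->parts[] = ($i % 2 === 1)
                ? new PageVariable($this, $chunk)
                : new PagePlaintext($chunk);
        }
    }

    public function setVariable($key, $value)
    {
        $this->variables[$key] = $value;
    }

    public function getVariable($key)
    {
        return isset($this->variables[$key]) ? $this->variables[$key] : '';
    }

    public function __toString()
    {
        // implode() casts each part to a string, which calls its __toString().
        return implode('', $this->parts);
    }
}

$page = new Page('Hello {name}, welcome to {site}!');
$page->setVariable('name', 'World');
$page->setVariable('site', 'example.org');
echo $page, "\n"; // prints "Hello World, welcome to example.org!"
```

The template source is split once, and the full output string is only assembled in the final implode() call rather than being rewritten for every variable.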

$regions['news'][0]['news title'] = 'Todays news';
$regions['news'][0]['news desc'] = 'The united nations...';
$regions['news'][1]['news title'] = 'Yesterdays news';
$regions['news'][1]['news desc'] = 'Meanwhile in Afghanistan the rebels...';

      

If a page element has no data, it's easy to just exclude it in __toString(). This reduces the need to clean up unused parts in the template.

The overall efficiency of this approach seems to be pretty good. In initial comparisons, memory consumption is about half: 2M versus 4M. I expect the balance to be even better on large pages, as the test page is pretty simple. The speed boost is quite remarkable compared to the string version, where the cleanup takes quite a bit of juice: 0.1s versus 0.6s.

I'll post an update with final results, but this is what I have so far. Hope this helps those who stumbled upon this page from Google ;)
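A rough sketch of how a repeatable part could render the $regions data and skip itself when there is none. PageRepeatable is named in this post, but the constructor signature, the {key} placeholder syntax and the row template are assumptions for illustration only.

```php
<?php
// Hypothetical sketch: constructor arguments and {key} syntax are assumed.

class PageRepeatable
{
    private $rowTemplate;
    private $rows;

    public function __construct($rowTemplate, array $rows)
    {
        $this->rowTemplate = $rowTemplate;
        $this->rows = $rows;
    }

    public function __toString()
    {
        // No data: render nothing, so the template needs no cleanup pass.
        if (count($this->rows) === 0) {
            return '';
        }

        $out = array();
        foreach ($this->rows as $row) {
            // Build a "{key}" => value map and substitute it in one strtr() call.
            $map = array();
            foreach ($row as $key => $value) {
                $map['{' . $key . '}'] = $value;
            }
            $out[] = strtr($this->rowTemplate, $map);
        }
        return implode('', $out);
    }
}

$regions = array();
$regions['news'][0]['news title'] = 'Todays news';
$regions['news'][0]['news desc'] = 'The united nations...';
$regions['news'][1]['news title'] = 'Yesterdays news';
$regions['news'][1]['news desc'] = 'Meanwhile in Afghanistan the rebels...';

$rowTemplate = "<h2>{news title}</h2><p>{news desc}</p>\n";

$news = new PageRepeatable($rowTemplate, $regions['news']);
echo $news; // one rendered block per news item

$empty = new PageRepeatable($rowTemplate, array());
var_dump((string) $empty === ''); // bool(true): no data, no output
```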

2 answers


In your specific example ($page = str_replace($find, $replace, $page);) you cannot avoid making a copy of $page. This applies to all functions (string-related or not) that require parameters to be passed by value. However, PHP garbage collection should reclaim these unused copies at regular intervals.
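One way to cut down the number of intermediate copies is to do all replacements in a single call instead of one call per placeholder. The sketch below uses strtr() with an array of pairs; the placeholder names and values are hypothetical, and whether this lowers peak memory in your engine is something you would have to measure.

```php
<?php
// Hypothetical placeholders and values, for illustration only.
$page = 'Hello {name}, today is {day}.';
$values = array('{name}' => 'World', '{day}' => 'Monday');

// Variant 1: one str_replace() call per placeholder.
// Each call returns a new string that replaces the previous one.
$looped = $page;
foreach ($values as $find => $replace) {
    $looped = str_replace($find, $replace, $looped);
}

// Variant 2: one strtr() call that handles every pair in a single pass.
$single = strtr($page, $values);

var_dump($looped === $single); // bool(true)
```

Note that strtr() does not rescan text it has already replaced, so the two variants can differ if a replacement value itself contains a placeholder.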

If you are still experiencing excessive memory usage, I strongly recommend reviewing your code. Make sure that variables have a well-defined scope and that only the required data is stored. There are tools to help diagnose PHP memory usage, such as php-memprof.

In addition, I would make sure you are using the latest PHP version available, as garbage collection is constantly improving.



What system are you using? For me the difference is not that huge:
In a simple script:
peak 325.1k, current 218.7k versus peak 219.6k, current 218.7k
In a class function:
peak 327.2k, current 220.8k versus peak 221.8k, current 220.8k



I would expect the difference in peak usage to come from the last operation, where the new $a is being built while the old value of $a is still in memory. That accounts for the roughly 100k difference in peak.
