Is there a way to use memoryview with regular expressions in Python 2?
In Python 3, the re module can be used with memoryview:
~$ python3
Python 3.2.3 (default, Feb 20 2013, 14:44:27)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = b"abc"
>>> import re
>>> re.search(b"b", memoryview(x))
<_sre.SRE_Match object at 0x7f14b5fb8988>
However, this does not work in Python 2:
~$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "abc"
>>> import re
>>> re.search(b"b", memoryview(x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
I can convert the str to a buffer, but the buffer documentation does not say exactly how buffer works compared with memoryview.
An empirical comparison shows that using a buffer object in Python 2 does not offer the performance benefits that using a memoryview does in Python 3:
playground$ cat speed-test.py
import timeit
import sys
print(timeit.timeit("regex.search(mv[10:])", setup='''
import re
import sys
regex = re.compile(b"ABC")
PYTHON_3 = sys.version_info >= (3, )
if PYTHON_3:
    mv = memoryview(b"Can you count to three or sing 'ABC?'" * 1024)
else:
    mv = buffer(b"Can you count to three or sing 'ABC?'" * 1024)
'''))
playground$ python2.7 speed-test.py
2.33041596413
playground$ python2.7 speed-test.py
2.3322429657
playground$ python3.2 speed-test.py
0.381270170211792
playground$ python3.2 speed-test.py
0.3775448799133301
playground$
If the argument to regex.search is changed from mv[10:] to mv, Python 2's performance is about the same as Python 3's, but the code I'm writing does a lot of repeated slicing.
Is there a way to get around this issue in Python 2 while still having the performance benefits of memoryview?

As I understand the buffer object in Python 2, you are supposed to use it without slicing:
>>> s = b"Can you count to three or sing 'ABC?'"
>>> str(buffer(s, 10))
"unt to three or sing 'ABC?'"
So instead of slicing the resulting buffer, you pass the offset to the buffer function directly, which gives fast access to the substring you are interested in:
import timeit
import sys
import re
r = re.compile(b'ABC')
s = b"Can you count to three or sing 'ABC?'" * 1024
PYTHON_3 = sys.version_info >= (3, )
if len(sys.argv) > 1: # standard slicing
    print(timeit.timeit("r.search(s[10:])", setup='from __main__ import r, s'))
elif PYTHON_3: # memoryview in Python 3
    print(timeit.timeit("r.search(s[10:])", setup='from __main__ import r, s; s = memoryview(s)'))
else: # buffer in Python 2
    print(timeit.timeit("r.search(buffer(s, 10))", setup='from __main__ import r, s'))
I got very similar results in Python 2 and 3, which suggests that using buffer like this with the re module has a similar effect to the newer memoryview (which then seems to be a lazily evaluated buffer):
$ python2 .\speed-test.py
0.681979371561
$ python3 .\speed-test.py
0.5693422508853488
And as a comparison, with standard string slicing:
$ python2 .\speed-test.py standard-slicing
7.92006735956
$ python3 .\speed-test.py standard-slicing
7.817641705304309
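The gap between the two sets of timings is consistent with standard slicing copying the tail of the string on every call, while a memoryview slice only refers to the existing memory. A minimal Python 3 sketch of that difference, using a bytearray purely so that the underlying data can be modified for the demonstration (the variable names are illustrative and not part of the benchmark):

```python
# Python 3 only: slicing bytes copies the data, slicing a memoryview does not.
data = bytearray(b"Can you count to three or sing 'ABC?'")

copied = bytes(data)[10:]        # new bytes object holding a copy of the tail
shared = memoryview(data)[10:]   # view onto the same underlying memory

data[10] = ord("U")              # modify the original buffer in place

copy_first = copied[:1]            # the copy still holds the old byte
shared_first = bytes(shared[:1])   # the view reflects the change
print(copy_first, shared_first)    # b'u' b'U'
```
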
If you want to maintain slice access (so the same syntax can be used everywhere), you can easily create a type that dynamically creates a new buffer when you slice it:
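class slicingbuffer(object):
    def __init__(self, source):
        self.source = source

    def __getitem__(self, index):
        if not isinstance(index, slice):
            # single index: a one-byte buffer starting at that offset
            return buffer(self.source, index, 1)
        # a missing start (as in s[:5]) means offset 0;
        # negative indices and slice steps are not supported
        start = index.start or 0
        if index.stop is None:
            return buffer(self.source, start)
        size = max(index.stop - start, 0)
        return buffer(self.source, start, size)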
If you only use it with the re module, it can probably work as a direct replacement for memoryview. However, my tests show that this already adds a lot of overhead. So you may want to do the opposite instead and wrap Python 3's memoryview object in a wrapper that gives you the same interface as buffer:
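import sys
import timeit
# r and s are the compiled regex and the byte string from the script above
def memoryviewbuffer(source, offset, size=None):
    # Mirror buffer(source, offset[, size]) with a memoryview slice.
    # A default end of -1 would silently drop the last byte.
    if size is None:
        return source[offset:]
    return source[offset:offset + size]
PYTHON_3 = sys.version_info >= (3, )
if PYTHON_3:
    b = memoryviewbuffer
    s = memoryview(s)
else:
    b = buffer
print(timeit.timeit("r.search(b(s, 10))", setup='from __main__ import r, s, b'))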
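One thing to keep in mind with any of these approaches is that match positions are relative to the start of the view, not to the original string. The Python 3 sketch below checks this; view_from is a hypothetical helper name mirroring the buffer(source, offset[, size]) interface. For comparison it also uses the pos argument of a compiled pattern's search method, which is a different technique from the one described above: it avoids slicing altogether, but ^ then still anchors at the real start of the string rather than at pos.

```python
import re

def view_from(source, offset, size=None):
    """Hypothetical helper mirroring buffer(source, offset[, size])."""
    view = memoryview(source)
    if size is None:
        return view[offset:]
    return view[offset:offset + size]

data = b"Can you count to three or sing 'ABC?'" * 1024
pattern = re.compile(b"ABC")
offset = 10

m_view = pattern.search(view_from(data, offset))  # no copy of the tail
m_copy = pattern.search(data[offset:])            # copies the tail first
m_pos = pattern.search(data, offset)              # pos argument, no slicing

# Positions from sliced inputs are relative to the slice, so add the offset
# back to get a position in the original data.
view_start = m_view.start() + offset
copy_start = m_copy.start() + offset
pos_start = m_pos.start()
print(view_start, copy_start, pos_start)
```

All three calls should report the same position in the original data once the offset is added back.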