Why is my search in BBEdit throwing a stack overflow error?

I am stumped by the "stack overflow" error, "out of stack space (application error code: 12246)", that I get in BBEdit when I do a "replace all" searching for

(@article(((?!eprint|@article|@book).)*\r)*)pmid = {(.+)}((((?!eprint|@article|@book).)*\r)*(@|\r*\z))

      

and replacing with

\1eprinttype = {pubmed}, eprint = {\4}\5

      

I can use these same patterns to find and replace one match at a time without any error, even once no matches remain. I can also avoid the error by working with smaller files.

I suspect that my inefficient and sloppy regex is to blame, and I would appreciate expert help in doing this more efficiently. I'm trying to find all entries in a BibLaTeX bibliography that don't yet have an eprint field but do have a pmid field, and replace the pmid field with the appropriate e-print specification (using eprint and eprinttype).


Update: After some experimentation, I found that a different approach is the only one I can get to work. Searching for

(?(?=@article(.+\r)+eprint = {(.+\r)+}\r*)(?!)|(@article(.+\r)+)pmid = {(.+)}((.+\r)+}\r*))

      

and replacing with

\3eprinttype = {pubmed}, eprint = {\5}\6

      

does the trick. The only problem with this is that the backreferences are fragile, but I can't get named backreferences to work in BBEdit.



2 answers


This is probably catastrophic backtracking caused by this last part:

.)*\r)*(@|\r*\z))

      

If you break it down and simplify it, you essentially have a .* followed by a \r* and one more \r*. Now picture a run of \r characters at the end of your input: how should each \r be distributed? Which of those little parts will absorb each \r character? If you have \r\r\r\r\r, you could eat all five \r with the .* part and leave nothing for the \r* parts... or you could come up with any number of permutations that would still match. Since * is greedy, it will try to fill the .* first, but if that fails, it has to keep trying the permutations until one of them works. So it is probably tying up a bunch of your resources with unnecessary backtracking until it finally crashes.

I'm not an expert on optimization techniques for regex, but I would start there if I were you.
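To make the "permutations" point concrete, here is a small Python sketch of the simplified model above. It does not run BBEdit's regex engine; it only enumerates, in greedy order, the ways a run of n \r characters could be divided among the three parts (.*, \r*, \r*). The function name is made up for illustration, and the count is a worst case: a real engine stops at the first split that works, and the nested groups in the actual pattern can make things considerably worse than this.

```python
def splits_in_greedy_order(n):
    """Return every (dot_star, cr_star_1, cr_star_2) split of n characters.

    Hypothetical helper: models only the simplified three-part pattern,
    not the behaviour of any particular regex engine.
    """
    splits = []
    # Greedy: the first part takes as much as it can, then gives back one
    # character at a time whenever the rest of the match fails.
    for first in range(n, -1, -1):
        for second in range(n - first, -1, -1):
            third = n - first - second
            splits.append((first, second, third))
    return splits


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, "carriage returns:", len(splits_in_greedy_order(n)), "possible splits")
```

For five \r characters the first split tried is (5, 0, 0), and there are 21 splits in total; the number grows quadratically with the length of the run even in this simplified model.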



Update:

Check out the Wikipedia article on PCRE:

If the PCRE build option "NoRecurse" (aka "--disable-stack-for-recursion") is not set, sufficient stack space must be allocated to PCRE by the calling application or operating system. ... Although the PCRE documentation warns that the "NoRecurse" build option makes PCRE slower than the alternative, using it avoids the problem entirely.

So I think catastrophic backtracking is a good bet. I would try to fix the problem by changing your regex before touching the PCRE build options.



Obviously, this is some kind of bug. But you can try changing the expression a little. It's hard to optimize an expression without knowing the requirements, but here's a guess:

(@article(?:(?:(?!eprint|@article|@book|pmid)[^\r])*+\r)*+)pmid = {([^\n\r]+)}((?:(?:(?!eprint|@article|@book)[^\r])*+\r)*(?:@|\r*\z))

      

Replace with:



\1eprinttype = {pubmed}, eprint = {\2}\3

      

BBEdit seems to use PCRE; if its version is (very) outdated, the expression above may not be compatible.
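As a cross-check outside BBEdit, the intended transformation can also be sketched in Python by handling one entry at a time, which sidesteps the nested quantifiers altogether. This is a different technique from the single search-and-replace above, not a translation of it. It assumes that entries start with @ at the beginning of a line, that lines end in \n, and that fields are written as "name = {value}"; the function and constant names are made up for illustration.

```python
import re

# Assumptions (not from the original post): entries begin with "@" at the
# start of a line, lines end in "\n", and fields look like "name = {value}".
ENTRY_START = re.compile(r"^(?=@)", re.MULTILINE)
PMID_FIELD = re.compile(r"pmid = \{([^{}\r\n]+)\}")
EPRINT_FIELD = re.compile(r"\beprint\s*=")


def add_eprint_from_pmid(bib_text):
    """Replace pmid with eprinttype/eprint in @article entries lacking eprint."""
    converted = []
    # Splitting on the zero-width position before each "@" keeps every
    # entry's text intact, so joining the pieces restores the file.
    for entry in ENTRY_START.split(bib_text):
        if entry.startswith("@article") and not EPRINT_FIELD.search(entry):
            entry = PMID_FIELD.sub(
                r"eprinttype = {pubmed}, eprint = {\1}", entry, count=1
            )
        converted.append(entry)
    return "".join(converted)


if __name__ == "__main__":
    sample = (
        "@article{a,\n  title = {T},\n  pmid = {123},\n}\n\n"
        "@article{b,\n  eprint = {9},\n  pmid = {456},\n}\n"
    )
    print(add_eprint_from_pmid(sample))
```

Because each entry is examined on its own, the amount of backtracking is bounded by the size of a single entry rather than the whole file.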
