Regex that identifies sections in COBOL

I am setting up the Brackets outline plugin, which uses a regex to define the outline of the currently open file.

Using regex101.com, I created the following regular expression (uses lookarounds to determine that the string starts with seven spaces and ends with "SECTION."):

(?<=^       )([A-Za-z\-0-9]*)(?= SECTION\.[ ]*$)

      

According to regex101.com the expression is fine, but JSHint / JSLint reports it as invalid. When I test it, it doesn't work, so I suspect JSHint / JSLint is correct.

Below is an example of some COBOL code from which I want to get 2000-GET-EXPECTED-BY-DATE and 2020-GET-DUE-DATE.

          ...
      2000-GET-EXPECTED-BY-DATE SECTION.
          MOVE '2' TO W10-OPTION.

          ...

          ELSE                                                     
              MOVE 'Y' TO W10-NO-ERRORS                         
          END-IF.                                                  

      2017-EXIT.                                                   
          EXIT.                                                   
     /
      2020-GET-DUE-DATE SECTION.
      2020.

          MOVE 'N' TO W10-USER-INPUT-DUE-DATE-SW.
          MOVE '1' TO W10-OPTION.
          ...

      

So my questions are:

  • Is the regex valid?
  • If it is not valid, what did I get wrong?
  • How do I write a regex to find the name of each section?


2 answers


This works for me to find lines with "SECTION":

^[ ]{7}(.*)[ ]SECTION\.$

      



DEMO: http://regex101.com/r/zC1xY6/2

If you only want the section names without the numeric prefix: ^[ ]{7}\d+\-(.*)[ ]SECTION\.$
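As a rough sketch of how the first pattern above could be applied from JavaScript: the sample text, the seven-space indent, and the helper name `sectionNames` are assumptions for illustration, not part of the plugin.

```javascript
// Sample COBOL source; the seven-space indent is assumed from the question.
const source = [
  "       2000-GET-EXPECTED-BY-DATE SECTION.",
  "           MOVE '2' TO W10-OPTION.",
  "       2017-EXIT.",
  "           EXIT.",
  "       2020-GET-DUE-DATE SECTION.",
  "       2020.",
].join("\n");

// Hypothetical helper: collects capture group 1 from every matching line.
function sectionNames(text) {
  // "m" lets ^ and $ match at line boundaries; "g" finds every match.
  const pattern = /^[ ]{7}(.*)[ ]SECTION\.$/gm;
  const names = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    names.push(match[1]);
  }
  return names;
}

console.log(sectionNames(source));
// → [ '2000-GET-EXPECTED-BY-DATE', '2020-GET-DUE-DATE' ]
```

Because the seven spaces are matched outside the capture group, no lookbehind is needed to keep them out of the captured name.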



Ok, it turns out that what I was using works, with two caveats:

  • The global and multiline modifiers must be added when testing on regex101.com.
  • It runs very slowly, so regex101.com times out on a large program.

However, I found (via the regex101.com experiment) that if I change it to

^       (.*)(?= SECTION\.[ ]*$)

      



Then it works with no timeout problem. It looks like the pattern becomes very slow if I use ^[ ]{7} as the prefix or ([A-Za-z0-9-]*) as the capture group to match the name.

The main issue is the performance of (.*) compared to ([A-Za-z0-9-]*): the latter is much slower.
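The lookahead-only pattern can be tried in JavaScript roughly as follows. This is a minimal sketch: the two-line sample (with trailing spaces after the period) and the variable names are assumptions for illustration.

```javascript
// Two-line sample; the seven-space indent is assumed from the question.
const cobol = "       2020-GET-DUE-DATE SECTION.   \n       2020.";

// Lookahead-only pattern from the answer, with global and multiline flags.
const sectionPattern = /^       (.*)(?= SECTION\.[ ]*$)/gm;

const found = [...cobol.matchAll(sectionPattern)].map((m) => ({
  whole: m[0], // full match, including the seven leading spaces
  name: m[1],  // capture group 1, the section name only
}));

console.log(found);
```

The full match still contains the leading spaces, so the section name has to be read from capture group 1 rather than from the match as a whole.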

I could use the lookbehind version that regex101.com accepts, (?<=^       )(.*)(?= SECTION\.[ ]*$), but it throws an error with JSLint / JSHint, so I will not use it.
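A likely reason for the JSLint / JSHint error is that lookbehind only entered the JavaScript language with ES2018, so older engines and linters reject it. The sketch below probes for support at run time; the helper name `supportsLookbehind` is hypothetical.

```javascript
// Hypothetical helper: true if this engine can compile a lookbehind pattern.
function supportsLookbehind() {
  try {
    // Built from a string so that an unsupported engine fails here at run
    // time instead of rejecting the whole script at parse time.
    new RegExp("(?<=^       )(.*)(?= SECTION\\.[ ]*$)", "gm");
    return true;
  } catch (err) {
    return false;
  }
}

// Lookahead-only fallback from the answer; it needs no lookbehind support.
const fallback = /^       (.*)(?= SECTION\.[ ]*$)/gm;

console.log("lookbehind supported:", supportsLookbehind());
console.log("       2000-GET-EXPECTED-BY-DATE SECTION.".match(fallback));
```

Sticking to the fallback pattern avoids the dependency on engine version entirely.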

I tested the first one, ^       (.*)(?= SECTION\.[ ]*$), in a fork of the Brackets outline plugin and it works! :-)
