How to handle nested comments in an ANTLR lexer

How do I handle nested comments in an ANTLR4 lexer? That is, I need to count the number of "/*" inside the token and close it only after the same number of "*/" has been seen. For example, the D language has nested comments of the form "/+ ... +/".

For example, the following lines should be treated as one comment block:

/* comment 1
   comment 2
   /* comment 3
      comment 4
   */
   // comment 5
   comment 6
*/

      

My current code is as follows and it doesn't work on the above nested comment:

COMMENT : '/*' .*? '*/' -> channel(HIDDEN)
        ;
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n'  -> channel(HIDDEN)
        ;
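
The rule above stops at the first '*/' it finds, because '.*?' is non-greedy and knows nothing about nesting. As a rough analogy only (Python's re module standing in for the lexer; the names FLAT_COMMENT and SAMPLE are made up for this illustration), the same pattern applied to the example comment leaves the tail of the outer comment unmatched:

```python
import re

# Stand-in for the lexer rule  '/*' .*? '*/'  (DOTALL lets '.' match newlines).
FLAT_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

SAMPLE = """/* comment 1
   comment 2
   /* comment 3
      comment 4
   */
   // comment 5
   comment 6
*/"""

match = FLAT_COMMENT.match(SAMPLE)
matched_text = match.group(0)

# The match ends at the inner '*/', so the rest of the outer comment is left over.
leftover = SAMPLE[match.end():]
print(repr(matched_text))
print(repr(leftover))
```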

      



5 answers


Terence Parr has these two lexer rules in the Swift ANTLR4 grammar for lexing nested comments:



COMMENT : '/*' (COMMENT|.)*? '*/' -> channel(HIDDEN) ;
LINE_COMMENT  : '//' .*? '\n' -> channel(HIDDEN) ;
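
For comparison, the counting that the question describes can be sketched in plain Python, outside ANTLR. This is only an illustration under my own assumptions: the function name scan_nested_comment is hypothetical, and the sketch is strict about balance, so it does not reproduce how the ANTLR rule can fall back to '.' for a stray '/*'.

```python
def scan_nested_comment(text, start=0):
    """Return the index just past the nested comment at `start`, or None."""
    if not text.startswith("/*", start):
        return None
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1          # one more level to close
            i += 2
        elif text.startswith("*/", i):
            depth -= 1          # one level closed
            i += 2
            if depth == 0:
                return i        # the outermost comment is complete
        else:
            i += 1
    return None                 # ran out of input before the depth reached zero


SAMPLE = """/* comment 1
   comment 2
   /* comment 3
      comment 4
   */
   // comment 5
   comment 6
*/"""

end = scan_nested_comment(SAMPLE)
print(end == len(SAMPLE))
```

With this counter the whole example from the question is consumed as a single comment, while an input with a missing '*/' yields None.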

      



I use:

COMMENT: '/*' ('/'*? COMMENT | ('/'* | '*'*) ~[/*])*? '*'*? '*/' -> skip;

      



This forces any /* inside a comment to be treated as the start of a nested comment, and likewise any */ as the end of one. In other words, /* and */ cannot be recognized anywhere other than at the beginning and end of the COMMENT rule.

So something like /* /* /* */ a */ will not be recognized in its entirety as a (bad) comment with mismatched /*s and */s, as it would be with COMMENT: '/*' (COMMENT|.)*? '*/' -> skip;. Instead it is recognized as / followed by *, followed by the correctly nested comment /* /* */ a */.
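
The strict behaviour can be imitated in a few lines of Python. This is a sketch of the idea, not a translation of the grammar rule: strict_comment_end is a hypothetical helper in which every '/*' must open a properly closed nested comment, and the generator expression simply tries each start position in turn, roughly as a lexer would after failing at the first characters.

```python
from typing import Optional


def strict_comment_end(text: str, pos: int) -> Optional[int]:
    """Index just past a strictly balanced /* ... */ comment at pos, else None."""
    if not text.startswith("/*", pos):
        return None
    i = pos + 2
    while i < len(text):
        if text.startswith("*/", i):
            return i + 2
        if text.startswith("/*", i):
            # Every '/*' must start a nested comment; there is no fallback.
            nested_end = strict_comment_end(text, i)
            if nested_end is None:
                return None
            i = nested_end
        else:
            i += 1
    return None


text = "/* /* /* */ a */"

# First position at which a balanced comment can start.
first_match = next(
    p for p in range(len(text)) if strict_comment_end(text, p) is not None
)
print(first_match, repr(text[first_match:strict_comment_end(text, first_match)]))
```

Starting at position 0 fails because one '/*' is never closed; the first successful match starts at the second '/*'.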



This works for ANTLR3.

It allows nested comments and '*' inside comments.

fragment
F_MultiLineCommentTerm
:
(   {LA(1) == '*' && LA(2) != '/'}? => '*'
|   {LA(1) == '/' && LA(2) == '*'}? => F_MultiLineComment
|   ~('*') 
)*
;   

fragment
F_MultiLineComment
:
'/*' 
F_MultiLineCommentTerm
'*/'
;   

H_MultiLineComment
:   r=  F_MultiLineComment
    {   $channel=HIDDEN;
        fprintf(stderr, "F_MultiLineComment[\%s]", $r->getText($r)->chars);
    }
;

      



I can give you an ANTLR3 solution that you can adapt to work in ANTLR4:

I think you can use a recursive rule call. Write a non-greedy comment rule for /* ... */ that calls itself. This should allow unlimited nesting without having to count the opening and closing comment markers:

COMMENT options { greedy = false; }:
    ('/*' ({LA(1) == '/' && LA(2) == '*'}? => COMMENT | .) .* '*/') -> channel(HIDDEN)
;

      

or maybe even:

COMMENT options { greedy = false; }:
    ('/*' .* COMMENT? .* '*/') -> channel(HIDDEN)
;

      

I'm not sure whether ANTLR will choose the correct path between the any-character alternative and the nested COMMENT alternative. Try it.



  • This will handle '/*/*/' and '/*.../*/', where the comment body is '/' and '.../' respectively.
  • Multiline comments do not nest inside single-line comments, so you cannot start or end a multiline comment within a single-line comment.
    • This is an invalid comment: '/*//*/'.
    • You need a newline to end the single-line comment before '*/' can be used to end the multiline comment.
    • This is a valid comment: '/*//*/\n/*/'.
    • Comment body: '//*/\n/'. As you can see, the complete single-line comment is included in the multiline comment body.
  • Although '/*/' may end a multiline comment, if the preceding character is '*', the comment ends at the first '/' and the remaining '*/' must end a nested comment, otherwise there is an error. The shortest path wins; it is not greedy!
    • This is an invalid comment: '/****/*/'.
    • This is a valid comment: '/*/****/*/'. The body of the comment is '/****/', which is itself a nested comment.
  • The prefix and suffix themselves are never matched as part of the multiline comment body.
  • If you want to implement this for the "D" language, change the '*' to '+'.

  COMMENT_NEST : '/*' ( ('/'|'*'+)? ~[*/] | COMMENT_NEST | COMMENT_INL )*? ('/'|'*'+?)? '*/' ;

  COMMENT_INL : '//' ( COMMENT_INL | ~[\n\r] )* ;
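
The cases listed above can be approximated by a small hand-written scanner. The following Python sketch is not a translation of the two rules. It is a hypothetical helper (comment_end, with made-up opener and closer parameters) that skips '//' line comments inside a block comment, tries a nested comment at every opener, and falls back to treating the character as plain text when no nested comment fits. It agrees with the examples in the list, but may differ from the grammar on other inputs.

```python
from typing import Optional


def comment_end(text: str, pos: int = 0,
                opener: str = "/*", closer: str = "*/") -> Optional[int]:
    """Return the index just past the block comment starting at pos, or None."""
    if not text.startswith(opener, pos):
        return None
    i = pos + len(opener)
    while i < len(text):
        if text.startswith("//", i):
            # A line comment hides everything up to (not including) the newline.
            while i < len(text) and text[i] not in "\r\n":
                i += 1
        elif text.startswith(closer, i):
            return i + len(closer)
        elif text.startswith(opener, i):
            nested = comment_end(text, i, opener, closer)
            # If no nested comment fits here, treat the character as plain text.
            i = nested if nested is not None else i + 1
        else:
            i += 1
    return None


for sample in ["/*/*/", "/*//*/", "/*//*/\n/*/", "/****/*/", "/*/****/*/"]:
    print(repr(sample), comment_end(sample))

# D-style delimiters, as suggested in the last bullet.
print(comment_end("/+ a /+ b +/ c +/", 0, "/+", "+/"))
```

A return value of None means no complete comment starts at that position; a value shorter than the input means the comment ended early and the rest is left for other rules.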
