ANTLR4 is slow on Raspberry Pi
We are trying to parse a custom language on a Raspberry Pi B using ANTLR4 (Python2 target). However, it is too slow to do anything serious: parsing a few lines takes about ten seconds. This is my code:
# -*- coding:Utf-8 -*-
from antlr4 import *
from TransposeurLexer import TransposeurLexer
from TransposeurParser import TransposeurParser
import sys
from Listener import Listener

def transpose(file_path):
    # Lex and parse the whole file
    input = FileStream(file_path)
    lexer = TransposeurLexer(input)
    stream = CommonTokenStream(lexer)
    parser = TransposeurParser(stream)
    tree = parser.myfile()
    # Walk the parse tree with the custom listener
    listener = Listener()
    walker = ParseTreeWalker()
    walker.walk(listener, tree)
    return listener.array
grammar Transposeur;

myfile: block+;
block: title | paragraph ;
title: firstTitle | secondTitle ;
firstTitle: '#' ' '? unit+ newline;
secondTitle: '##' ' '? unit+ newline;
paragraph: unit+ newline;
unit: low+
    | upper
    | (low | cap)* cap (low | cap)*
    | ponctuation
    | number
    | space
    ;
upper: cap cap+;
number: digit+;
low: LOW;
cap: CAP;
newline: NEWLINE;
ponctuation: SPACE? PONCT;
space: SPACE;
digit: DIGIT;

LOW: [a-z] | 'ç' | 'é' | 'è' | 'à' | 'â' | 'ê' | 'ù' | 'î' | 'ô' | 'û' | 'ë' | 'ï' | 'ü' | 'œ';
CAP: [A-Z];
NEWLINE: '\r'? '\n';
SPACE: ' ';
DIGIT: [0-9];
PONCT: ',' | '!' | '?' | ';' | '.' | ':';
The statement that takes all the time is tree = parser.myfile(). Is there a way to make things faster?
I suspect a lookahead problem in deciding between low+ and (low | cap)* cap (low | cap)*, where the parser might have to look arbitrarily far ahead to determine which alternative applies.
I think the real problem is that unit+ is ambiguous with respect to low+. Take input for unit+ consisting of:
aaaaaaaaaa...
(fifty a's). You can parse it like this:
- one unit whose low+ matches all the a's,
- two units, where the first low+ takes any prefix and the second low+ takes the rest of the a's (49 ways to split fifty a's),
- three units, where the first low+ takes any prefix, the last low+ takes any suffix, and the middle low+ takes the characters in between (far, far more options),
- four units, and so on ...
So I think this part of your grammar is highly ambiguous, and ANTLR is exploring a huge number of alternatives trying to pick one. You're probably lucky that ANTLR is fast enough to finish at all :-}
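To get a feel for the size of that ambiguity, here is a small Python sketch. It is not part of ANTLR, and the function names are made up for illustration. It assumes every unit in a run of n lowercase letters is matched by low+, so a parse into k units corresponds to choosing k - 1 cut points among the n - 1 gaps between letters.

```python
from math import factorial


def binomial(n, k):
    """Number of ways to choose k items out of n."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


def splits_into_units(n, k):
    """Ways to cut a run of n letters into k non-empty low+ units."""
    return binomial(n - 1, k - 1)


def total_parses(n):
    """All ways to cut a run of n letters into one or more units."""
    return sum(splits_into_units(n, k) for k in range(1, n + 1))


if __name__ == "__main__":
    n = 50
    for k in (1, 2, 3):
        print("%d unit(s): %d" % (k, splits_into_units(n, k)))
    print("total: %d" % total_parses(n))
```

For fifty a's this gives 1, 49 and 1176 ways for one, two and three units, and 2**49 (roughly 5.6e14) ways in total. This counts the distinct parse trees the grammar allows, not the work ANTLR actually does, but it shows why the rule is a problem.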
You will have the same problem between unit+ and upper (essentially cap+).
It is not clear to me what part of the structure you really need to capture. It looks to me like you just want a string. Try rewriting it as:
unit: low | cap | ponctuation | number | space ;
Better yet, define the unit this way:
unit: LOW | CAP | PONCT | DIGIT | SPACE ;
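If the listener still needs some word-level structure (lowercase runs, all-caps runs, numbers), one option is to rebuild it in Python after the parse instead of encoding it in the grammar. The following is only a sketch under that assumption: classify and group_units are hypothetical helpers, not part of ANTLR or of the Listener class in the question, and they work on the plain text of the units rather than on parse-tree nodes.

```python
# -*- coding: utf-8 -*-
from itertools import groupby

# Character sets copied from the LOW and PONCT lexer rules of the grammar
LOW_EXTRA = u"çéèàâêùîôûëïüœ"
PONCT = u",!?;.:"


def classify(ch):
    """Map one character to the name of the token type it would lex as."""
    if u"a" <= ch <= u"z" or ch in LOW_EXTRA:
        return "LOW"
    if u"A" <= ch <= u"Z":
        return "CAP"
    if u"0" <= ch <= u"9":
        return "DIGIT"
    if ch == u" ":
        return "SPACE"
    if ch in PONCT:
        return "PONCT"
    return "OTHER"


def group_units(chars):
    """Merge adjacent characters of the same class into (class, text) runs."""
    return [(kind, u"".join(run)) for kind, run in groupby(chars, key=classify)]
```

Each character belongs to exactly one run, so there is only one possible grouping and nothing for the parser to choose between. Note that a mixed-case word comes out as separate CAP and LOW runs; merging those would need an extra pass.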