Generalized flow analysis?

Are there any libraries or technologies (in any language) that provide a regular-expression-like tool over arbitrary sequences, such as streams or lists, as opposed to just character strings?

For example, suppose you are writing a parser for your pet programming language, and you have already lexed the source into a Common Lisp list of objects representing tokens.

You can use a pattern like this to parse function calls (using C-style syntax):

(pattern (:var (:class ident)) (:class left-paren) (:optional (:var object)) (:star (:class comma) (:var object)) (:class right-paren))

to bind variables for the function name and each of the function arguments. (In practice it would probably be implemented so that this pattern binds one variable for the function name and a list for the arguments, but that detail is not very important.)
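To make the idea concrete, here is a minimal sketch in Python of what such a token-level matcher could do for this particular pattern. Everything here (the `Token` class, `match_call`, the token class names) is hypothetical, hand-rolled for illustration rather than taken from any existing library:

```python
from dataclasses import dataclass

@dataclass
class Token:
    cls: str    # token class, e.g. "ident", "left-paren", "comma"
    text: str   # the lexeme

def match_call(tokens):
    """Match ident '(' [arg (',' arg)*] ')' against a token list.

    Returns (bindings, remaining_tokens) on success, or None on failure,
    where bindings maps "name" to the function name and "args" to a list
    of argument names -- the variable binding described above.
    """
    if len(tokens) < 3 or tokens[0].cls != "ident" or tokens[1].cls != "left-paren":
        return None
    name, args, i = tokens[0].text, [], 2
    if i < len(tokens) and tokens[i].cls == "ident":          # (:optional ...)
        args.append(tokens[i].text)
        i += 1
        while i + 1 < len(tokens) and tokens[i].cls == "comma":  # (:star ...)
            if tokens[i + 1].cls != "ident":
                return None
            args.append(tokens[i + 1].text)
            i += 2
    if i < len(tokens) and tokens[i].cls == "right-paren":
        return {"name": name, "args": args}, tokens[i + 1:]
    return None

# Tokens for the call f(x, y):
toks = [Token("ident", "f"), Token("left-paren", "("),
        Token("ident", "x"), Token("comma", ","),
        Token("ident", "y"), Token("right-paren", ")")]
bindings, rest = match_call(toks)
print(bindings)   # {'name': 'f', 'args': ['x', 'y']}
```

A real generalized-regex library would of course compile the declarative pattern into such a matcher automatically, rather than requiring it to be written by hand.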

Would something like this be useful at all?

+2




3 answers


I don't know how many answers you will get on this question: most languages do not have the kind of robust stream APIs you seem to have in mind, so most people reading this probably won't know what you are talking about.

Smalltalk is a notable exception, shipping with a rich hierarchy of Stream classes which, combined with the Collection classes, let you do some pretty impressive things. While most Smalltalks also come with regex support (Vassili Bykov's pure-Smalltalk implementation is a popular choice), the regex classes are unfortunately not integrated with the Stream classes the way the Collection classes are. This means that using streams and regular expressions together in Smalltalk usually involves reading character strings off the stream and then testing those strings separately against regex patterns, rather than the "read characters until the pattern matches" or "read the longest run of characters matching this pattern" kind of functionality you probably mean.



I think a powerful stream API combined with powerful regex support would be great. However, I think you will have trouble generalizing over the different types of streams. Reading from a stream over a character string presents few difficulties, but file and TCP streams have their own failure modes and latencies that you would have to handle gracefully.

+1




Have a look at scala.util.regexp

There is both API documentation and sample code at http://scala.sygneca.com/code/automata . I believe the intent is to let, for example, a computational linguist match sequences of words in order to search for part-of-speech patterns.



0




This is the principle behind most parsers, which work in two phases. The first phase is the lexer, where identifiers, language keywords, and other special characters (arithmetic operators, curly braces, etc.) are identified and turned into Token objects, which usually have a numeric field indicating the type of the token and, optionally, another field holding the token's text.
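A minimal sketch of that first phase might look like this (in Python; the token names, magic numbers, and `lex` function are all made up for illustration):

```python
import re

# "Magic number" token types, bound to named constants.
IDENT, NUMBER, PLUS, LBRACE, RBRACE = range(5)

TOKEN_SPEC = [
    (IDENT,  r"[A-Za-z_]\w*"),
    (NUMBER, r"\d+"),
    (PLUS,   r"\+"),
    (LBRACE, r"\{"),
    (RBRACE, r"\}"),
]

def lex(source):
    """Split raw text into (type, text) token pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        for ttype, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                tokens.append((ttype, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
    return tokens

print(lex("x + 42"))   # [(0, 'x'), (2, '+'), (1, '42')]
```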

In the second phase, the parser works with the Token objects, matching them by their type number alone in order to recognize phrases. (Software for this includes ANTLR, yacc/bison, Scala's scala.util.parsing.combinator.syntactical library, and a bunch of others.) The two phases don't have to be tightly coupled: you can get your Token objects from anywhere you like. The magic-number aspect matters, though, because the numbers are bound to named constants, and that is what makes it easy to express your grammar readably.
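Here is a toy sketch of that second phase (again in Python; `parse_sum` and the grammar are invented for illustration). The point is that the parser dispatches on the numeric type field alone and never re-inspects the raw text for structure, so the tokens can come from a lexer, a file, or be built by hand:

```python
# Token-type constants (arbitrary magic numbers bound to names).
NUMBER, PLUS = 1, 2

def parse_sum(tokens):
    """Parse the grammar NUMBER (PLUS NUMBER)* and evaluate the sum."""
    it = iter(tokens)
    ttype, text = next(it)
    assert ttype == NUMBER, "expected a number"
    total = int(text)
    for ttype, text in it:
        assert ttype == PLUS, "expected '+'"
        ttype, text = next(it)
        assert ttype == NUMBER, "expected a number after '+'"
        total += int(text)
    return total

# The Token objects can come from anywhere -- here, hand-built:
print(parse_sum([(NUMBER, "1"), (PLUS, "+"), (NUMBER, "2"), (PLUS, "+"), (NUMBER, "39")]))  # 42
```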

And remember that anything you can accomplish with a regular expression can also be accomplished with a context-free grammar, usually just as easily.

0








