LISP NFA Recognizer

I need to define a function in lisp that, given a regex and e-NFA as input, returns true if the expression is accepted by automata.

First, I defined a function that generates an e-NFA (like a cons-cell) from a regular expression with these operators {|, *, +, ...}.

For example: with expression (or b) the output would be:

((INITIAL 0) (DELTA 0 EPSILON 1) (DELTA 0 EPSILON 3) (DELTA 2 EPSILON 5) (DELTA 4 EPSILON 5) (DELTA 1 A 2) (DELTA 3 B 4) (FINAL 5))

      

I came up with this idea: I wrote a recognition function - or (which handles or case):

(defun nfa-recognize-or (fa input init final)
 (let ((arc (cdr (car fa))))
  (cond ((and (equal (first arc) init) (equal (second arc) 'epsilon))
         (nfa-recognize-or (cdr fa) input (third arc) final))
        ((not (equal (first arc) init))
         (nfa-recognize-or (cdr fa) input init final))
        ((and (equal (first arc) init) (equal (second arc) (car input)))
         (nfa-recognize-or fa (cdr input) (third arc) final))
        ((equal (third arc) final)
         T)
  )
 )
)

      

If I call the function like this:

(nfa-recognize-or (cdr fa) '(a) 0 5)

      

It returns a "stack overflow". The problem is that after some calls, the value fa =

(DELTA 1 A 2) (DELTA 3 B 4) (FINAL 5))

      

with init = 2 and final = 5 as before. At this point, the next state that the program should consider should be

(DELTA 2 EPSILON 5)

      

to return TRUE, but that is not possible because at the moment the NFA is "being consumed" and I cannot reverse it to check for other states.

Do you have any suggestions?

+3


source to share


2 answers


I will start at a high level and work in detail. I won't give a complete answer as it is not short, but hopefully it will be enough to set you on the right track.

given regex and e-NFA as input, it returns true if the expression is accepted by the automata.

For the sake of simplicity, I will assume that the regular expression is in a form compatible with a string of characters from the alphabet E

plus normal (, ), +, *

. I also assume that it is e-NFA

given in a form that contains all the information you need to make up all the information you would expect in a formal definition e-NFA

. The difficulty of translating between the actual formats and my preferred formats is not addressed.

I always recommend figuring out how you manually fix the problem before writing any code to do it. How would you solve this problem on a test? How can you tell if a e-NFA

regex is accepted ? More fundamentally, what is a reasonable interpretation of this requirement? My interpretation is this:

An e-NFA

M

accepts a regular expression r

if it L(r)

is a subset L(M)

.



In other words, if it w

is a string to be matched r

, it should be accepted M

. This first step is important because it poses a problem into one that we can begin to formally address. We need to understand if the language of one thing is a subset of the language of another. There is no direct mechanism for an official answer to this question directly for regular expressions that I am aware of. However, there is a well-known construct for answering questions like this with finite state machines: I call this construct the Cartesian Product Machine construct, and it is used to create a state machine that accepts a language based on the languages ​​of two other state machines. In particular: if L \ R = 0

(where 0

is the empty set, and the \

difference is given), thenL

is a subset of r

. Cartesian Product Machine design allows us to build a machine for L \ R

given machines for L

and r

. We already have one for M

; it is given. All we need is a machine for L(r)

, and we are ready to start a deterministic algorithm to release a machine for language difference. Then all that remains is to check if the resulting language is empty. For details on the design of the Cartesian Product Machine see my other answer here . Don't worry about having NFA

s; construction work is the same as for DFA as indicated here .

Given a regex r

, there is an algorithm to create NFA

that takes L(r)

. I don't have a ready-made link for this, so I'll mask it. Basically we define some recursive rules to build e-NFA

based on each member of the regex. Here are the rules:

1. Alphabet symbol a: M(a)

--->()-a->(O)


2. Concatenation rs: M(rs)

--->[M(r)]-e->[M(s)]

* Note: one -e-> from each accepting state of M(r) to initial state of M(s)
        initial state is that of M(r)
        accepting states are those of M(s)


3. M(r+s):

-->()-e->[M(r)]
    \-e->[M(s)]

* Note: new initial state added
        accepting states are those of M(r) and M(s)

4. M(r*):

     /--e--\
    V       \
--->()-e->[M(r)]

* Note: one -e-> from each accepting state of M(r) to initial state
        new initial state
        accepting state is only the new initial state

      

We now have an NFA for your regex, and we know how to build a Cartesian products machine for that difference. As a result, we get a big e-NFA

one representing the difference L(r)

and L(M)

. We have already said that the difference between these languages ​​is empty if f L(r)

is a subset L(M)

, and L(r)

is a subset L(M)

if f M

accepts r

. The question remains: is the language of our Cartesian product machine empty or not?

There are many ways to answer this question, but the most straightforward one would be to simply run the graph search algorithm starting from the initial state to see if any of the receiving states are available. If a graph search shows that a state is available, then there are multiple lines in that language. If none of the states available from the original state accept, the language is empty. Any search algorithm for directed graphs - depth-first, width, etc. - will be good. Note that the graph is not acyclic, so do not use methods that require acyclic graphs.

+1


source


I think I worked out what you are trying to do.

Edit: I thought you were trying to convert regex to e-NFA, but it looks like you want to do something else. I'll leave it anyway.

Try to make some progress on creating a function re->enfa

that takes the following parameters:

  • Regular expression in some Lisp format
  • Status symbol that will have an epsilon transition to sub-e-NFA matching this regex
  • Symbol for the corresponding leaving state

We will use symbols to identify states so that we can easily find unique identifiers. We'll write a few small functions if we change our mind.

(defun new-state () (gensym))
(defun transition (from to on) `((delta ,from ,on ,to)))
(defun nfa-union (&rest args) (apply #'append args))

      



Now let's look at the two cases that we will introduce into functions. One for concatenating two regular expressions and one for alternating.

(defun concat-nfa (a b s0 sn)
  (let ((m (new-state)))
    (nfa-union (re->enfa a s0 m) (re->enfa a m sn))))
(defun or-nfa (a b s0 sn)
  (let ((s1 (new-state))
        (s2 (new-state))
        (sk (new-state))
        (sl (new-state)))
    (nfa-union (transition s0 s1 'epsilon)
               (transition s0 s2 'epsilon)
               (transition sk sn 'epsilon)
               (transition sl sn 'epsilon)
               (re->enfa a s1 sk)
               (re->enfa b s2 sl))))

      

Then you can determine what might look like re->enfa

:

(defun re->enfa (x s0 sn)
  (if (atom x) (transition s0 sn x)
    (case (car x)
      (or (if (cddr x) (or-nfa (cadr x) (cons 'or (cddr x) s0 sn)
              (re->enfa (cadr x) s0 sn)))
      (concat  (if (cddr x) (concat-nfa (cadr x) (cons 'concat (cddr x) s0 sn)
              (re->enfa (cadr x) s0 sn)))
      (* (kleene-nfa (cadr x) s0 sn))
      (+ (re->nfa `(concat ,(cadr x) (* ,(cadr x))))))))

      

This should give you a starting point with some others to fill out. There are also some changes you could make. For example, you can implement +

such that the e-NFA for the inner expression is evaluated only once.

You also need to add a function that adds a start and end state marker and wraps re->enfa

.

0


source







All Articles