How to prove the correctness of a given grammar?

I am wondering how langauge programmers assert and prove their grammar is correct. Suppose I have created a new grammar for the new langauge. I can test my grammar with the unit test tool by providing different types of test programs. However, I will never be 100% sure that my grammar is correct. How do language designers ensure that their grammar is correct in the real world?

Let's say I created a grammar for a new language using pencil and paper. However, I made a mistake and my grammar accepts expressions that end with + + 2 + 2+. I will use my language using this incorrect grammar unless I find a mistake in it. After doing and unit testing, I can find the error. Is it possible to find it before starting any implementation?

Definitely, I can try my grammar with some examples of pencil and paper input (output, etc.), but I might miss some corner cases. Is there a better approach, or how do developers test their grammar in real language?

+3


source to share


1 answer


Proof is a logical argument demonstrating the truth of the claim. There are as many ways to prove something, as there are ways of thinking about the problem. A common way of proving things about discrete structures (like grammars) is by using mathematical induction. Basically, you show that something is true in base cases - the simplest cases are possible - and then show that if it is true for all cases with a certain size, it should be true for the next size cases.

In our case: suppose we only wanted to prove that your grammar did not spawn a + at the end of a word. We could do induction on the number of products used to construct a string in a language. We would define all relevant base cases, show the hold property for those strings, and then show that longer strings in the language are structured in such a way that it is impossible to get a + at the end. Here's an example.

S: = S + S | (S) | x



Base case: shortest string in x language, generated as S β†’ x. It does not end with +.

Induction hypothesis: Suppose all strings created using up to and including k productions do not end with +.

Induction step: we have to show strings created using more than k productions do not end with +. If we apply rule (S) to any string generated from S, we don't add +, so the property holds. If we apply S + S to strings generated from S, the last character in S + S is the last character of a shorter string (at least 2 characters shorter) generated by S. By induction, this string does not end with +, nor does doesn't do it. There are no other settings, so not a single line in the language ends with +. QED

0


source







All Articles