CS154 Assignment #4 Solutions

(a) All strings of 0's and 1's are generated. In fact, the third production, S -> S0, is not necessary.
(b) The string 0 has two leftmost derivations. They are S=>0S=>0 and S=>S0=>0.
(c) P=({q}, {0,1}, {S,0,1}, delta, q, S) where delta is given by the following rules:
```
     delta(q,0,0) = {(q,epsilon)}
     delta(q,1,1) = {(q,epsilon)}
     delta(q,S) = {(q,0S), (q,1S), (q,S0), (q,epsilon)}
```
(d) P=({p,q,r}, {0,1}, {Z,S,0,1}, delta, p, Z, {r}) where delta is given by the following rules:
```
     delta(p,epsilon,Z) = {(q,SZ)}
     delta(q,0,0) = {(q,epsilon)}
     delta(q,1,1) = {(q,epsilon)}
     delta(q,S) = {(q,0S), (q,1S), (q,S0), (q,epsilon)}
     delta(q,Z) = {(r,epsilon)}
```
Note that state p causes the PDA from (b) to operate, with Z as a bottom-of-stack marker. If Z is ever exposed after that first step, then this PDA accepts by going to state r.
Just remove the third production, leaving:
```
     S -> 0S | 1S | epsilon
```
(a) Our PDA will count on its stack whichever of 0 and 1 it has seen in excess so far. It could remember which it is counting in the state, but to avoid having more than one state q, it will instead use three stack symbols: X as a bottom-marker, Z to count 0's, and U (unit) to count 1's. The PDA is P = ({q},{0,1}, {X,Z,U}, delta, q, X), where delta is given by:
```
     delta(q,epsilon,X) = {(q,epsilon)} /* we may empty the stack
                                           if the difference is 0 */
     delta(q,0,X) = {(q,ZX)} /* push Z on a 0 input */
     delta(q,0,Z) = {(q,ZZ)}
     delta(q,1,X) = {(q,UX)} /* push U on a 1 input */
     delta(q,1,U) = {(q,UU)}
     delta(q,0,U) = {(q,epsilon)} /* pop when the input is different
     delta(q,1,Z) = {(q,epsilon)}    the symbol being counted */
```
(b) Having only one state makes life a lot easier than the general case. In particular, each delta rule becomes only one production, while if there were s states, a rule with a stack of length k would become s^k productions. The variables are S, [qXq], [qZq] and [qUq]. The productions are:
```
     S -> [qXq]
     [qXq] -> epsilon
     [qXq] -> 0[qZq][qXq]
     [qZq] -> 0[qZq][qZq]
     [qXq] -> 1[qUq][qXq]
     [qUq] -> 1[qUq][qUq]
     [qUq] -> 0
     [qZq] -> 1
```
First, there are no useless symbols. We find that B and S derive epsilon, then A derives a string with production body 0B1. Also, S derives itself and A, while A derives a string with B.
To eliminate epsilon-productions, we note that S and B are nullable, while A is not. Since no body consists of only S's and B's, it is sufficient to eliminate each S and B optionally from the bodies, yielding a grammar without epsilon-productions:
```
     S -> AAS | AA | A
     A -> 0A1 | 01 | 0B1
     B -> B1 | 1
```
The next step is to eliminate unit productions. There is only one: S -> A. Thus, we replace this production by all the bodies of the A-productions:
```
     S -> AAS | AA | 0A1 | 01 | 0B1
     A -> 0A1 | 01 | 0B1
     B -> B1 | 1
```
Now, we replace all terminals in bodies, unless that body is a single terminal. To do so, we introduce variables C for 0 and D for 1:
```
     S -> AAS | AA | CAD | CD | CBD
     A -> CAD | CD | CBD
     B -> BD | 1
     C -> 0
     D -> 1
```
Lastly, we need to shorten the three bodies AAS, CAD, and CBD that are too long for CNF. We introduce variables E for AS, F for AD, and G for BD, yielding the following CNF grammar:
```
     S -> AE | AA | CF | CD | CG
     A -> CF | CD | CG
     B -> BD | 1
     C -> 0
     D -> 1
     E -> AS
     F -> AD
     G -> BD
```
Note that the formal construction given in the reader didn't take advantage of the possibility that the same body appears several times. However, it is safe to do so, introducing only as many new variables as you need for one copy of the body.