That is, your program, perhaps called myProg, should be able to run with the command line
    myProg reFile dataFile
where reFile is a file containing a regular expression, and dataFile is a file containing the data that will be input to the resulting FA.
We will also accept programs that take a different input form, if you are more comfortable with that. For example, you might want to concatenate the RE in front of the data file, and use only the standard input. However, if you do so, better make sure that you have a way of telling where the RE ends.
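One way to tell where the RE ends, if you use the single-input form, is to require that the RE occupy exactly the first line of the input. The sketch below assumes that convention; the function name split_re_and_data is made up for illustration and is not part of the assignment.

```python
def split_re_and_data(text):
    """Split combined input into (regular expression, data).

    Assumption: the RE is exactly the first line, so the first newline
    marks where the RE ends and the data begins.
    """
    regex, _, data = text.partition("\n")
    return regex, data


# Example with a literal string standing in for the standard input.
regex, data = split_re_and_data("abc\nxxabcxx\nabc\n")
print(repr(regex), repr(data))
```

This only works if the RE itself can never contain a newline; if yours can, you would need some other terminator.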
Your output should be an indication (discussed below) of the points in string w at which the automaton for the regular expression .*R accepts. Notice that while R might be a simple pattern, say abc, what you are looking for is all the positions in w where abc occurs. Thus, the regular expression you match against w is not R, but .*abc, i.e., anything followed by abc.
Presumably, your program will modify R to become .*R, then feed that expression to your program from Project 1. The new code you write for this project will simulate whatever kind of automaton your code from Project 1 produces. If for some reason you can't handle the ".*", then an alternative is to start your automaton at every character of w, and simulate all these (copies of) automata, throwing them away when they reach a dead state.
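The alternative of starting a copy of the automaton at every character can be sketched as follows. This is only an illustration under stated assumptions: the automaton for R is taken to be a DFA stored as a dictionary from (state, character) to state, and the names match_end_positions, delta, start, and accepting are hypothetical. Your Project 1 code may produce a quite different representation.

```python
def match_end_positions(delta, start, accepting, w):
    """Return each index i of w at which some copy of the automaton accepts.

    delta maps (state, char) -> state; a missing entry means a dead state.
    Copies that are in the same state behave identically from then on,
    so a set of states is enough to represent all live copies.
    """
    active = set()
    ends = []
    for i, ch in enumerate(w):
        active.add(start)  # start a fresh copy at this character
        # Advance every copy; copies with no transition are thrown away.
        active = {delta[(q, ch)] for q in active if (q, ch) in delta}
        if active & accepting:
            ends.append(i)
    return ends


# Hypothetical hand-built DFA for R = abc (states 0..3, state 3 accepting).
abc_delta = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3}
print(match_end_positions(abc_delta, 0, {3}, "xabcabcx"))
```

The sketch does not report anything for an R that accepts the empty string, since it only checks for acceptance after a character has been read.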
We strongly advise you to design your program to ignore any input character that the automaton cannot use. The reason is that we cannot be sure the 100Mb of text we downloaded doesn't have some weird character that is not part of the usual 128-character ASCII set. If there is even one such character, the FA will die on reading it, and you won't be able to find anything beyond that point. Note that reading an unexpected character is different from reading an expected character that leads to a dead state (or, if the FA is an NFA, having no transitions on that character).
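The distinction between an unexpected character and a dead state can be made explicit in the simulation loop. The sketch below assumes a DFA stored as a dictionary from (state, character) to state, plus a separate set giving the alphabet the automaton knows about; the names accepting_positions, ab_delta, and alphabet are hypothetical.

```python
def accepting_positions(delta, start, accepting, alphabet, w):
    """Return the indices of w at which the DFA is in an accepting state."""
    state = start
    ends = []
    for i, ch in enumerate(w):
        if ch not in alphabet:
            continue  # unexpected character: ignore it and keep the state
        if (state, ch) not in delta:
            break     # expected character with no transition: dead state
        state = delta[(state, ch)]
        if state in accepting:
            ends.append(i)
    return ends


# Hypothetical complete DFA for .*ab over the alphabet {a, b}.
ab_delta = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
# The character between the two occurrences of ab is outside the alphabet.
print(accepting_positions(ab_delta, 0, {2}, set("ab"), "ab\u00e9ab"))
```

Note that ignoring a character leaves the state unchanged, so with this policy an a and a b separated only by ignored characters would also be reported as an occurrence of ab.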
Either way, you have to "buffer" w, that is, remember the last so-many characters. You can make a reasonable assumption about how long lines could ever be, if you choose line-based output.
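For line-based output, the buffering might look like the sketch below. It rests on several assumptions that are not part of the assignment: the automaton is a DFA for .*R stored as a dictionary from (state, character) to state, lines are taken to be at most MAX_LINE characters long, characters with no entry in the dictionary are simply ignored, and the automaton is reset at each newline so that matches never span lines. All names are hypothetical.

```python
from collections import deque

MAX_LINE = 4096  # assumed upper bound on how long a line could ever be


def lines_with_matches(delta, start, accepting, chars):
    """Yield each line during which the DFA entered an accepting state."""
    buf = deque(maxlen=MAX_LINE)  # last MAX_LINE characters of this line
    state, hit = start, False
    for ch in chars:
        if ch == "\n":
            if hit:
                yield "".join(buf)
            buf.clear()
            state, hit = start, False  # assumption: matches do not span lines
            continue
        buf.append(ch)
        if (state, ch) in delta:  # characters the DFA cannot use are ignored
            state = delta[(state, ch)]
            hit = hit or state in accepting
    if hit:  # final line without a trailing newline
        yield "".join(buf)


# Hypothetical complete DFA for .*ab over the alphabet {a, b}.
demo_delta = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
for line in lines_with_matches(demo_delta, 0, {2}, "xxab\nnone\nabab\n"):
    print(line)
```

Because the deque has a fixed maximum length, a line longer than MAX_LINE is printed with its beginning cut off rather than growing the buffer without bound.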