Credit for the project will be based on correct use of your software (e.g., selection of appropriate regular expressions), but there will be cheap prizes for originality.
Selecting the proper regular expressions is not trivial, as those of you following the FAQ sheet for Project 2 will note. For example, my test proposal that you look for capitalized Something of Something was badly done at least twice. First, it was pointed out that you want to include the whitespace at the end of the second Something, or you get Something of S. Then, it was pointed out that the actual context could involve punctuation, such as period or comma, and that these had to be added as options. Is that good enough to capture all, or the great majority, of instances of the intended pattern?
time foo
runs command foo and prints a synopsis of the user, system, and wall-clock time taken by foo at the end. However, we are only interested in whether you are able to do the search in a reasonable amount of time. Thus, anything that is too short to measure on a watch can be reported as ``instantaneous.'' For example, tell us whether you were able to get through the full 100Mb at all, or whether you took long or short times on the shorter files.
All this information can be handed in electronically using submit, or given to us in class by hardcopy.