Primitives for Content Check
Good Word List for Text
- domain specific to increase precision and reliability
- created by processing good documents
- any word not in list shown to SO with context
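A minimal sketch of the good word list check, assuming simple lowercase tokenization. The function names, the regular expression, and the `context` window size are illustrative assumptions, not part of the original design:

```python
import re

# Assumed tokenization: runs of letters and apostrophes, lowercased.
WORD_RE = re.compile(r"[a-z']+")

def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD_RE.findall(text.lower())

def build_good_word_list(good_documents):
    """Collect every word that appears in known-good documents."""
    good_words = set()
    for doc in good_documents:
        good_words.update(tokenize(doc))
    return good_words

def flag_unknown_words(text, good_words, context=2):
    """Return (word, surrounding words) for each word not in the good list."""
    tokens = tokenize(text)
    flagged = []
    for i, tok in enumerate(tokens):
        if tok not in good_words:
            start = max(0, i - context)
            flagged.append((tok, " ".join(tokens[start:i + context + 1])))
    return flagged
```

Each flagged pair would be what gets shown to the SO: the unknown word plus enough neighboring words to judge it. A domain-specific set of good documents keeps the list small, so fewer legitimate words are flagged.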
Bad Word List (optional)
- not reliable (misspellings, accidental or intentional)
- no increase in efficiency given good word list processing
- trigger special case rules
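One way the optional bad word list could trigger special case rules, sketched under assumptions: the example words and the rule names in `BAD_WORD_RULES` are hypothetical placeholders, and exact token matching is assumed.

```python
import re

# Hypothetical mapping from a bad word to the special-case rule it triggers.
BAD_WORD_RULES = {
    "secret": "escalate_to_reviewer",
    "password": "block_release",
}

def triggered_rules(text, rules=BAD_WORD_RULES):
    """Return the sorted names of rules triggered by bad words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sorted({rules[t] for t in tokens if t in rules})
```

Because matching is exact, a misspelling such as "passw0rd" triggers nothing here, which illustrates why the bad word list alone is not reliable. Such a word would still be caught by the good word list check, since it does not appear in any good document.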
Image data (current research)
- extract text and analyze as above
- recognize objectionable images by sketch or color