Information Pragmatics: Lessons for the web from human languages

Christopher Manning
Assistant Professor of Computer Science and Linguistics
Dept of Computer Science, Gates Building 4A,
353 Serra Mall, Stanford CA 94305-9040, USA


When people see web pages, they understand their meaning. When computers see web pages, they get only words and HTML tags. We'd like computers to see meanings as well, so that computer agents could process the web more intelligently. These desires have led to XML, RDF, agent markup languages, and a host of other technologies that attempt to impose more syntax and semantics on the web -- to make life easier for agents. Now, while some of these technologies are certain to see a lot of use (XML), and others may or may not, I think their proponents all rather miss the mark in believing that the solution lies in mandating standards for semantics. This will fail for several reasons: (i) much of the meaning in web pages (as in any communication) derives from the context -- what is referred to in the philosophy of language tradition as pragmatics; (ii) semantic needs and usages evolve (like languages) more rapidly than standards (cf. the Académie française); (iii) meaning transfer frequently has to occur across the subcommunities that are currently designing *ML languages, at which point all the problems reappear, and the current proposals do little to help; and (iv) much of the time people simply won't use the standards -- just as newspaper advertisements rarely contain spec sheets. I will argue that, yes, agents need knowledge, ontologies, etc., to interpret web pages, but that the aim necessarily has to be to design agents that can interpret information in context, regardless of the form in which it appears. For that goal, work in natural language processing is of some use, because that field has long dealt with the uncertain contextual interpretation of ambiguous information. In case the abstract so far hasn't made it obvious: I intend this as more a pontifical than a technical talk, though I will discuss a few relevant natural language processing technologies.


Christopher Manning is assistant professor of computer science and linguistics at Stanford University. Previously, he held faculty positions at Carnegie Mellon University and the University of Sydney. His research interests include probabilistic models of language and statistical natural language processing, constraint-based theories of grammar, parsing systems, computational lexicography, information extraction and text mining, and topics in syntactic theory and typology.