Research Interests
I have worked, and mostly still work, on a number of topics, including:
- Probabilistic databases, in particular Uncertainty and Lineage Databases (ULDBs)
- Top-k ranking and efficiency guarantees
- Efficient XML-IR and XPath full-text search
- Document classification and focused crawling
- Ontology extraction, fusion, and automatic maintenance
Current Projects
- The Stanford Trio Project
A System for Integrated Management of Data, Uncertainty, and Lineage.
- The Stanford WebBase Project
A long-running project for large-scale Web archive management.
- TopX
Efficient and Versatile Top-k Query Processing for Text, Structured, and Semistructured Data (my Ph.D. thesis).
TopX is hosting the INEX topic development and interactive track.
You can download our current Java prototype from topx.sourceforge.net.
Past Projects
- DFG CLASSIX Project
Classification and Intelligent Search on Information in XML. Joint work of MPII and the University of Duisburg/Essen.
- The BINGO! Focused Crawler
Bookmark-Induced Gathering of Information with Adaptive Classification into Personalized Ontologies (my Diploma thesis).
It's still maintained at MPI!
Other Open-Source Stuff
- JNI_SVM-light-6.01
A true native interface for Thorsten Joachims' genuine SVM-light v6.01 in a compact Java API. Originally written as part of BINGO!, this is probably the fastest currently available Java Native Interface (JNI) for SVM-light.
It comes with two precompiled shared libraries, one for Windows (svmlight.dll) and one for RedHat/Debian/Suse Linux (svmlight.so), and supports the full functionality of SVM-light, such as classification, regression, and full Java-side parameterization.
All sources can easily be recompiled for more eccentric operating systems.
See the JavaDoc and the JNI_SVMLight_Test.java test class for more details.