News
From Walking to Thinking: Feedback, Memory, and Causal Reasoning for Embodied AGI, Symposium on Humanoid Robotics & Sovereign AI for Future Living — Keynote speaker alongside robotics pioneers Oussama Khatib (Director, Stanford Robotics Center) and Hiroshi Ishiguro (Director, Intelligent Robotics Lab, Osaka University), February 2026
Multi-LLM Agent Collaborative Intelligence: The Path to AGI — First edition published by SocraSynth, March 2024; acquired and published by ACM Books, December 2025
Two Paradigm Bridges to AGI — Presented to Stanford PhD students, November 2025
Pioneering Data-Centric AI (2007–2012)
Between NeurIPS 2007 and 2012, while serving as Director of Google Research (Beijing), our team built the scalable infrastructure and large-scale datasets that would become foundational to modern data-centric AI — years before the term was coined. We produced one of the first web-scale annotated image datasets (30,000+ real web images with multimodal signals), sponsored Fei-Fei Li's ImageNet project at Google, and published a series of parallel machine learning algorithms on MapReduce that enabled training at unprecedented scale. This body of work was consolidated in the Springer book Foundations of Large-Scale Multimedia Information Management and Retrieval (2011), whose Chapter 2 explicitly formulated a data-driven + model-based hybrid architecture (DMD), asking "Can more data help a model?" — a decade before "data-centric AI" became a recognized paradigm.
Research
My research focuses on building the theoretical and practical foundations for safe, reliable AGI systems.
Developing System-2 on LLMs for AGI
Enabling multiple LLM agents to collaborate through structured debate, perspective synthesis, and consensus-building. Includes SocraSynth, CRIT, EVINCE, SagaLLM, and the UCCT theoretical foundation.
SagaLLM: Transaction Guarantees for Multi-Agent Planning
Bringing database-style transactional guarantees (atomicity, consistency, recovery via compensating actions) to multi-agent LLM planning. Focuses on robust context management, validation, and failure-safe orchestration.
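The compensating-action mechanism behind this line of work can be illustrated with a minimal saga-pattern sketch: each step of a plan pairs an action with a compensation that undoes it, and a failure rolls back completed steps in reverse order. The `SagaStep`/`Saga` names and the booking example below are illustrative assumptions, not SagaLLM's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SagaStep:
    """One step of a plan: an action plus the compensation that undoes it."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]

@dataclass
class Saga:
    steps: List[SagaStep]
    completed: List[SagaStep] = field(default_factory=list)

    def run(self) -> bool:
        """Execute steps in order; on any failure, run the compensations
        of the already-completed steps in reverse order and report failure."""
        for step in self.steps:
            try:
                step.action()
                self.completed.append(step)
            except Exception:
                for done in reversed(self.completed):
                    done.compensate()  # undo in reverse order
                return False
        return True
```

In a travel-planning saga, for example, a failed hotel booking would trigger the compensation (cancellation) of an already-booked flight, leaving no partially committed plan behind.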
AI Safety & Alignment
Checks-and-balances frameworks for ethical AI, including RAudit for real-time verification and multi-branch governance architectures.
UCCT: Unified Cognitive Consciousness Theory
A theoretical foundation for how language models convert pretrained capacity into goal-directed behavior through semantic anchoring and threshold effects. Formalizes anchoring strength and connects in-context learning, retrieval, and fine-tuning under a unified mechanism.
UAudit: Enhancing Reasoning Capability of LLMs
Auditing and strengthening LLM reasoning through blind verification protocols, structured probes, and consistency checks — enabling third-party evaluation of black-box model reasoning.
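One ingredient of such black-box auditing, a consistency check over paraphrased probes, can be sketched as follows: query the model with several rephrasings of the same question and flag disagreement among the answers. The `ask`/`audit` names and the 0.8 agreement threshold are illustrative assumptions, not the UAudit protocol itself.

```python
from collections import Counter

def consistency_score(answers):
    """Fraction of responses that agree with the most common answer."""
    if not answers:
        return 0.0
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

def audit(ask, probes, threshold=0.8):
    """Blind audit: query the black-box callable `ask` with paraphrased
    probes and flag inconsistency without seeing any reasoning trace."""
    answers = [ask(p) for p in probes]
    score = consistency_score(answers)
    return {"score": score, "consistent": score >= threshold}
```

Because only input/output behavior is inspected, a check like this can be run by a third party with no access to model weights or chain-of-thought.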
Transactional Swarm Orchestration (TSO)
Enabling robots to discover causal relationships through physical intervention, with transactional guarantees and epistemic regret minimization.
Recent Publications
View full publication list on Google Scholar →
Working Papers & Preprints
- arXiv 2026 — Right for the Wrong Reasons: Epistemic Regret Minimization for Causal Rung Collapse in LLMs. TL;DR: Identifies the causal origin of "right for the wrong reasons" answers (rung collapse and aleatoric entrenchment) and shows that epistemic-regret-minimization belief revision, grounded in a three-layer theory, improves LLM recovery.
- arXiv 2026 — CausalT5K: An Extensive Benchmark for Conducting Causal Reasoning Research. TL;DR: A large-scale benchmark (5,000+ samples) for evaluating causal reasoning in LLMs, covering intervention queries, counterfactual reasoning, and causal graph discovery across multiple domains.
- arXiv 2026 — RAudit: A Blind Auditing Protocol for Large Language Model Reasoning. TL;DR: A protocol for verifying LLM reasoning correctness without access to the reasoning trace, enabling third-party auditing of black-box models through structured probes and consistency checks.
- arXiv 2025 — UCCT: The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning. TL;DR: A unified theory explaining how LLMs turn pretrained capacity into goal-directed behavior via semantic anchoring. Formalizes anchoring strength S = ρd − dr − log k, predicting threshold-like performance flips and generalizing ICL, retrieval, and fine-tuning as anchoring variants.
- arXiv 2024 — EVINCE: Optimizing Adversarial LLM Dialogues via Conditional Statistics and Information Theory. TL;DR: Uses information-theoretic metrics to optimize multi-agent debates, measuring when additional dialogue rounds yield diminishing returns.
- KDD 2026 — REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems. TL;DR: Benchmark featuring real-world planning tasks (travel, scheduling, logistics) that exposes the gap between LLM reasoning capabilities and practical deployment.
- VLDB 2025 — SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning. TL;DR: Brings database-style ACID guarantees to multi-agent LLM systems — ensuring plans are atomic, consistent, and recoverable through compensating transactions.
- ICML 2025 — A Checks-and-Balances Framework for Ethical AI Alignment. TL;DR: A three-branch governance architecture (Executive, Legislative, Judicial) for AI systems that prevents any single component from taking unilateral harmful actions.
- NeurIPS AI Safety 2024 — A Three-Branch Checks-and-Balances Framework for Context-Aware Ethical Alignment of Large Language Models. TL;DR: Early version of the checks-and-balances framework, demonstrating how separation of powers prevents single points of failure in AI alignment.
- IEEE MIPR 2024 — Behavioral Emotion Analysis Model for Large Language Models. TL;DR: A framework for analyzing and modeling emotional behaviors in LLM responses, enabling more nuanced human-AI interaction.
- IEEE CCWC 2023 (100+ citations) — Prompting Large Language Models With the Socratic Method. TL;DR: Introduces SocraSynth — using Socratic questioning to elicit deeper reasoning from LLMs through structured multi-turn dialogue and adversarial probing.
- IEEE CSCI 2023 (100+ citations) — Examining GPT-4's Capabilities and Enhancement with SocraSynth (CRIT). TL;DR: Systematic evaluation of GPT-4's reasoning capabilities and introduction of CRIT — a critique-based method that improves accuracy through iterative refinement and self-correction.
Books
Foundations of Large-Scale Multimedia Information Management and Retrieval
The Journey of Mind
Teaching (Stanford)
-
Spring 2026
CS486 — Advanced Large Language Models Research Seminar
-
Winter 2026
CS372 — Artificial General Intelligence for Reasoning, Planning, and Decision Making
-
Spring 2025
CS372 — Artificial Intelligence for Reasoning, Planning, and Decision Making
-
2023–2024
CS372 — Artificial Intelligence for Precision Medicine and Psychiatric Disorders
-
2019–2022
CS372 — Artificial Intelligence for Disease Diagnosis and Information Recommendations
Background
Education
- Ph.D., Electrical Engineering, Stanford University
- M.S., Computer Science, Stanford University
- M.S., IEOR, University of California, Berkeley
Industry Experience
- Director of Research, Google, 2006–2012
- President, HTC Healthcare, 2012–2021
Selected Honors
- XPRIZE Tricorder, $1M Award for AI Medical Diagnosis, 2017
- ACM Fellow and IEEE Fellow, for contributions in scalable machine learning and healthcare
Previous Academic
- Professor (tenured), UC Santa Barbara, 1999–2006