CS486: Advanced Large Language Models Research Seminar

Spring 2026 · Stanford University

2026 focus: Human-LLM Collaboration for Scientific Discovery

Instructor: Edward Y. Chang

Lectures: Mondays & Wednesdays, 4:30 PM – 5:50 PM

Operational format: Monday seminar; Wednesday research clinic, office hours, and team check-ins

Dates: March 30, 2026 – June 3, 2026

Announcements

3.2026 Prerequisite: CS372 is strongly recommended. Students who have not taken CS372 may be admitted by exception with instructor approval, typically only if they are exceptionally strong coders and have substantial OpenClaw experience. Also, please check out this free online textbook for the required background.
3.2026 The main goal of the seminar is to mentor student teams toward one or more NeurIPS-style submissions, with the paper ready and submitted by the tentative deadline of May 11, 2026.

Course Overview

This seminar studies how scientists and mathematicians can conduct exploratory research through disciplined human-LLM collaboration, and how to build a platform that makes such collaboration trustworthy, productive, and publishable.

The purpose of this course is to mentor students as they build a platform for scientific discovery in human-LLM collaboration settings and write a paper for submission to NeurIPS. The tentative submission deadline is May 11, 2026, and teams should plan to have the paper ready and submitted by that date. The course builds directly on ideas covered in CS372 but shifts from conceptual foundations to implementation, integration, evaluation, and research writing.

The main theme is AGI-oriented scientific discovery. We will design and implement a platform that helps scientists and mathematicians search literature, organize memory, generate hypotheses, run controlled exploration, track provenance, validate intermediate results, and produce research artifacts under human supervision. The platform itself is a research contribution. Case studies that apply the platform to mathematical or scientific problems may also lead to research papers.

The course is motivated in part by recent remarks from Terence Tao, who has argued that AI is becoming a practical research assistant in mathematics and theoretical physics, especially for literature search, coding, calculation, and rapid exploration of candidate ideas, while the human remains responsible for selecting problems, designing workflows, and verifying correctness.

Core Questions

  1. What are the essential components of a trustworthy human-LLM research platform?
  2. How should such a system support memory, provenance, validation, rollback, and multi-path exploration?
  3. How do we keep the human in control of significance, correctness, and scientific judgment?
  4. How can we evaluate whether a human-LLM platform truly improves exploratory scientific work?
  5. How do we turn both the platform and its usage into publishable research papers?
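
As one way to make question 2 concrete, the sketch below shows a tiny exploration log that supports provenance, validation, rollback, and multi-path exploration. It is an illustration only, not the course platform: the names `Step`, `ExplorationLog`, `record`, `validate`, `rollback`, and `provenance` are hypothetical, and the rule that only a human reviewer may mark a step as validated is an assumption chosen to reflect question 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One recorded research step (hypothetical schema)."""
    step_id: int
    parent: Optional[int]  # None for the root step
    author: str            # "human" or "llm"
    claim: str
    validated: bool = False


@dataclass
class ExplorationLog:
    """Append-only tree of steps; `head` marks the active branch tip."""
    steps: Dict[int, Step] = field(default_factory=dict)
    head: Optional[int] = None

    def record(self, author: str, claim: str) -> int:
        """Append a step as a child of the current head and advance the head."""
        step_id = len(self.steps)
        self.steps[step_id] = Step(step_id, self.head, author, claim)
        self.head = step_id
        return step_id

    def validate(self, step_id: int, reviewer: str) -> None:
        """Mark a step as validated; assumed here to be a human-only action."""
        if reviewer != "human":
            raise PermissionError("only a human reviewer may validate a step")
        self.steps[step_id].validated = True

    def rollback(self, step_id: int) -> None:
        """Move the head to an earlier step; nothing is deleted."""
        if step_id not in self.steps:
            raise KeyError(step_id)
        self.head = step_id

    def provenance(self, step_id: Optional[int] = None) -> List[Step]:
        """Return the chain of steps from the root to the given step (or head)."""
        chain: List[Step] = []
        current = self.head if step_id is None else step_id
        while current is not None:
            chain.append(self.steps[current])
            current = self.steps[current].parent
        return list(reversed(chain))


log = ExplorationLog()
root = log.record("human", "Problem statement")
branch_a = log.record("llm", "Conjecture A")
log.rollback(root)                      # abandon A without erasing it
branch_b = log.record("llm", "Conjecture B")
log.validate(branch_b, reviewer="human")
print([s.claim for s in log.provenance()])
```

Because rollback only moves the head, the abandoned branch remains available for audit through `log.provenance(branch_a)`.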

Topics the Course Examines

  1. Human-LLM collaboration protocols for scientific and mathematical discovery
  2. Platform design for memory, validation, provenance, and research orchestration
  3. Exploratory workflows for conjecture generation, branching search, and verification
  4. Failure modes such as hallucination, sycophancy, context drift, and brittle overconfidence
  5. Evaluation of scientific usefulness, controllability, and reproducibility
  6. Paper writing and revision discipline for NeurIPS-style submissions

Expected Deliverables

  1. A platform module or subsystem
  2. An integrated prototype by mid-May
  3. A submission-quality paper on the platform and/or its use
  4. A final presentation and demonstration

Tentative Grading

  1. Weekly milestone reports and participation: 20%
  2. Engineering contribution to the platform: 30%
  3. Paper draft quality and revision discipline: 25%
  4. Final presentation and demo: 15%
  5. Final report or submission-ready paper: 10%
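
The weights above sum to 100%. As a worked example of how they might combine, the sketch below computes a weighted final score. The component keys, the 0-100 scale for each component, and the sample scores are assumptions made for illustration; the syllabus specifies only the percentages.

```python
# Tentative weights from the grading list above (percent of the final grade).
WEIGHTS = {
    "milestones_and_participation": 20,
    "engineering_contribution": 30,
    "paper_draft": 25,
    "presentation_and_demo": 15,
    "final_report": 10,
}

# Sanity check: the tentative weights should account for the whole grade.
assert sum(WEIGHTS.values()) == 100


def weighted_grade(scores):
    """Combine per-component scores (each assumed to be 0-100) into one score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student.
example = {
    "milestones_and_participation": 90,
    "engineering_contribution": 80,
    "paper_draft": 85,
    "presentation_and_demo": 95,
    "final_report": 100,
}
print(weighted_grade(example))  # 87.5
```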

Resources

Anchor Direction

The seminar is centered on building a platform for exploratory research in human-LLM collaboration settings, motivated by recent work on mathematical exploration and broader questions about AI-assisted scientific discovery.

Meeting Structure

Monday: main seminar meeting, lecture, design review, milestone planning, and paper discussion.

Wednesday: research clinic, implementation support, office hours, debugging, and team check-ins.

Schedule

The schedule below is a draft and will be refined.

# | Date | Topic | Focus | Milestone
1 | 3/30 | Kickoff: Course aims, platform vision, and team formation | Scientific discovery as human-LLM collaboration; course roadmap; paper targets | Team formation begins
2 | 4/6 | Systems #1: Requirements for a scientific discovery platform | User stories, provenance, memory, validation, rollback, trust | 2-page design brief
3 | 4/13 | Systems #2: Search, synthesis, and research memory | Literature workflows, persistent state, versioning, state management | Module ownership and interface spec
4 | 4/20 | Systems #3: Conjecture generation, branching exploration, and verification | Multi-path reasoning, validator roles, computational checks, audit trails | Prototype checkpoint #1
5 | 4/27 | Paper #1: Human-in-the-loop research workflows | Moderator roles, refusal, escalation, significance judgments, writing plan | Evaluation plan and paper outline
6 | 5/4 | Build #1: Integrated prototype and internal review | End-to-end prototype, first case studies, debugging, interface refinement | Integrated prototype v1
7 | 5/11 | Paper #2: Writing sprint and launch week | Experiments, figures, system diagrams, submission packaging | Paper ready and submitted by tentative 5/11 deadline
8 | 5/18 | Build #2: Post-submission refinement and usage studies | Additional experiments, failure analysis, case-study extension | Usage-paper or extension draft
9 | 5/25 | Generalization: Platform portability across domains | Mathematics, science, causal discovery, long-term roadmap | Final demo preparation
10 | 6/1 | Finale: Final presentations, demos, and next steps | Lessons learned, summer continuation plans, release discussion | Final presentation and report