Independent Study, Possibly RAship Opportunities

with Andreas Paepcke

1. Which Courses, and Why?

Explore what vector space embeddings of courses taken by students can reveal about pathways through college.

Prerequisites: CS224W preferred; OK to take simultaneously. Alternatively: CS224N, CS246. Light-weight SQL familiarity, knowledge of PyTorch/TensorFlow a plus.


If we understand how students make choices on their way through college, we can improve decision support for these important pathways. Courses chosen today determine which other courses become options down the road, and which remain closed for lack of prerequisite knowledge. Overly narrow course choices leave on the table important contributions that college can make to students' lives. A first goal is to understand what Stanford students have chosen over the past 18 years. That understanding can inform both future students and university policy.

We will use historical enrollment data, and possibly other data. Vector embeddings of courses taken by past students are already available, and exploring the corresponding clusters is a first step. Beyond that, we plan to generate a network of course sequences and their frequencies, and then apply network analytics to these structures. Our hope is to find course-taking patterns, as well as instances of unusual, innovative course choice behaviors.
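
As a rough illustration of the network of course sequences and their frequencies, the sketch below counts how often one course is followed by another in the next term. The course codes, the per-term list format, and the function name `transition_counts` are made up for this example; the real enrollment data will look different.

```python
from collections import Counter

def transition_counts(histories):
    """Count directed edges (a, b): course a in one term, course b in the next.

    `histories` is a list of students; each student is a list of terms;
    each term is a list of course codes (hypothetical format).
    """
    counts = Counter()
    for terms in histories:
        # Pair each term with the term that immediately follows it.
        for earlier, later in zip(terms, terms[1:]):
            for a in earlier:
                for b in later:
                    counts[(a, b)] += 1
    return counts

# Invented toy data: two students, three terms each.
histories = [
    [["A"], ["B", "M"], ["C"]],
    [["A"], ["B"], ["C", "D"]],
]
edges = transition_counts(histories)
for (a, b), n in edges.most_common():
    print(f"{a} -> {b}: {n}")
```

The resulting edge weights could then be loaded into a graph library for the network analytics step. Rare edges are candidates for the unusual course choices mentioned above.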

Scatter plot of course enrollments since 2000.
Each color corresponds to a discipline:
engineering, law, H&S, etc.

2. Discover Majors' Degree Requirements

Prerequisites: Any neural networks course.


The University has no central place where the requirements for obtaining a Bachelors or Masters degree are described in a unified form. Every department decides its own palette of options. Each organization chooses how to represent the requirement alternatives in writing. Finding and parsing these documents would be tedious. But thousands of students have fulfilled those requirements over the years. We will try to derive the requirements from the history of course taking.

We will use analytic tools, such as neural networks, to derive every department's Masters degree requirements at Stanford, and develop a unified form to describe them. Using eighteen years of enrollment history, and the majors of the respective students, we hope to learn the sometimes widely branching alternative paths to the degrees. We will also attempt to describe the undergraduate requirements from observed data, and express them in the form we develop.

Example descriptions of masters
degrees in CS and English.

3. Breadth of Student Interests Over Time

Prerequisites: Linear algebra and basic statistics.


Most universities encourage students to take advantage of courses offered outside of students' focus of study. Policy changes over time have attempted to encourage breadth of study, particularly for undergraduates. Has the breadth of interest changed among students during the past eighteen years? If so, let's understand what may have triggered those changes.

We will use vector embeddings of course choices to compute the intellectual spread of student choices. Student choices are of course also motivated by requirements and by background. Still, given these embeddings, we will analyze how the resulting per-year distributions of spread have changed during the past n years.
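
One possible way to make "spread" concrete is sketched below: a student's spread is taken to be the mean Euclidean distance of their course vectors from the centroid of those vectors, and spreads are then averaged per year. This definition, the two-dimensional toy embeddings, and the names `spread` and `spread_by_year` are assumptions for illustration only; the project may settle on a different measure.

```python
import math
from collections import defaultdict
from statistics import mean

def spread(vectors):
    """Mean Euclidean distance of the vectors from their centroid."""
    dim = len(vectors[0])
    centroid = [mean(v[i] for v in vectors) for i in range(dim)]
    return mean(math.dist(v, centroid) for v in vectors)

def spread_by_year(records, embeddings):
    """Average the per-student spread for each year.

    `records` is a list of (year, [course codes]) pairs, one per student
    (hypothetical format); `embeddings` maps course code to vector.
    """
    per_year = defaultdict(list)
    for year, courses in records:
        per_year[year].append(spread([embeddings[c] for c in courses]))
    return {year: mean(values) for year, values in sorted(per_year.items())}

# Invented toy embeddings and enrollment records.
embeddings = {"X": [0.0, 0.0], "Y": [2.0, 0.0], "Z": [0.0, 0.0]}
records = [(2001, ["X", "Y"]), (2002, ["X", "Z"])]
yearly = spread_by_year(records, embeddings)
print(yearly)
```

Averaging hides the shape of each year's distribution, so the actual analysis would likely keep the full per-year list of spreads rather than only the mean.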

4. The Gist of Course Evaluations

Deploy NLP on course evaluation answers to the question "What would you like to say about this course to a student who is considering taking it in the future?"

Prerequisites: Some NLP class.


When students use Carta, they often glean information from the textual course evaluation part. We aim to extract salient course information from the text. If successful, the results of this work might end up in Carta to help future students.

The first thought when thinking of applying NLP to opinions tends to be 'sentiment analysis.' We can of course run such techniques over evaluations, particularly because the domain of discourse is narrow: The content is always about Stanford courses.

But more interesting will be the subtler gems. Hints such as "Definitely do the reading every week." Or "Problem sets are only every other week." Or "Find your project partner early, because you will need all the time you can get for completing the project." These hints will be harder to isolate, but could be extremely useful as a potential addition to Carta one day.

5. Automatic Study Guides for MOOCs

Prerequisites: Python, CS221N and/or CS229.

Given video closed caption files of instructional videos, student forum posts, the Web, and derived resources, create personal study guides for students.


We will take the view that many online courses will be modular, like Stanford's self-paced database course. We will extract word clusters from closed caption files of course videos to identify topics. We will then attach learning resources to each topic. Resources are relevant course forum question/answer pairs, video snippets, Wikipedia search results, and student-identified entities. We will use these resources to automatically create study guides and learning hints.

We can obtain closed caption files for a number of Stanford's online courses. These will be the source of word clusters that each define a topic. We also have half a billion individual 'events' of learners interacting with Stanford's open online courses. Events include starting or rewinding videos, forum posts, and assignment submissions. Forum posts identified by an existing poster-confusion classifier, as well as repeated incorrect assignment submissions, will serve as triggers to offer topic- and student-specific study resources. We will need to identify those resources automatically.
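
The repeated-incorrect-submission trigger could look roughly like the sketch below. The event tuple layout, the threshold of three wrong attempts, and the function name `find_triggers` are all assumptions made for this example; the actual event data has its own schema.

```python
from collections import Counter

def find_triggers(events, threshold=3):
    """Return (student, problem) pairs with at least `threshold` wrong attempts.

    `events` is an iterable of (student, problem, is_correct) tuples
    (hypothetical format, not the real event schema).
    """
    wrong = Counter()
    flagged = set()
    for student, problem, is_correct in events:
        if is_correct:
            continue
        wrong[(student, problem)] += 1
        if wrong[(student, problem)] >= threshold:
            flagged.add((student, problem))
    return flagged

# Invented toy events.
events = [
    ("s1", "p1", False),
    ("s1", "p1", False),
    ("s2", "p1", False),
    ("s1", "p1", False),
    ("s2", "p1", True),
]
print(find_triggers(events))
```

Each flagged pair would then be matched to the topic of the problem, so that the study resources offered are specific to both topic and student.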

Specific Projects

6. Predicting Sensitivity of Coral Reefs to Heat Stress

Prerequisites: Python, CS231N.


An existing biology project is researching the impact of artificially introduced heat stress on coral bleaching. The 400 colonies under investigation are surrounded by sand, other types of corals, and algae. Given the heat stimuli and coral response data, can we create a predictor of coral response from photos taken around the corals? For example, can we predict how a colony surrounded by 25% sand, 30% branching corals, 35% encrusting algae, and 10% mounding corals will respond to heat?

I don't yet know details about the number of photos, or availability of training data. In the absence of such labeled examples we may not be able to identify each surrounding species. But we might well still be able to determine how color and edge density distributions predict measured data. Our contact will be in the field between Jan 18 and early February, generating more photos. He is hoping for instructions from us as to how best to shoot the images. I will keep this entry updated as I learn more.
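
To give a feel for the kind of label-free features mentioned above, here is a minimal sketch that computes a coarse intensity histogram and a simple edge density from a grayscale image given as a nested list. The thresholds, bin count, and function names are assumptions for illustration; a real pipeline would more likely use an image library and per-channel color histograms.

```python
def intensity_histogram(gray, bins=4):
    """Normalized histogram of 0-255 intensities over `bins` equal-width bins."""
    counts = [0] * bins
    total = 0
    for row in gray:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

def edge_density(gray, threshold=30):
    """Fraction of pixels (excluding the last row and column) whose
    difference to the right or lower neighbor exceeds `threshold`."""
    rows, cols = len(gray), len(gray[0])
    edges = 0
    total = 0
    for r in range(rows - 1):
        for c in range(cols - 1):
            dx = abs(gray[r][c + 1] - gray[r][c])
            dy = abs(gray[r + 1][c] - gray[r][c])
            total += 1
            if max(dx, dy) > threshold:
                edges += 1
    return edges / total if total else 0.0

# Invented 4x4 test image: dark left half, bright right half.
image = [[0, 0, 255, 255] for _ in range(4)]
print(intensity_histogram(image))
print(edge_density(image))
```

Features like these, computed per photo, could serve as inputs to a regression against the measured heat response.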

7. Teaching Choreography Online

Prerequisites: Python. Experience with either HCI or distributed systems.


We will develop infrastructure and (with help) pedagogy for teaching choreography entirely online. Choreography is the activity of designing dances. Geographically distant students will be able to work on dance design exercises together. The 'performers' will be avatars of any shape. They will operate in a 3D robotics simulation environment. Students will continuously be able to observe their teammates' work.

We will try to use Gazebo, an existing high-fidelity robot simulation environment. The software was developed for the DARPA robotics challenges, and can take into account mass distributions of simulated avatars. Gazebo is by nature distributed, but we may additionally need to use a high-function distributed messaging system.

Three main elements are involved in this work: development of a Web-based UI for easily manipulating avatars, the distributed messaging for allowing distant Gazebo instances to be coupled, and some choreography pedagogy. We will consult with a professional choreographer.
