  1. Read Logan’s paper in full
  2. Read the Cunningham paper cited by Anthropic
  3. Read “Finding Neurons in a Haystack” by Wes Gurnee

Think about differences between residual stream and MLP “mathematically speaking”
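
A starting point for the comparison, written out as a rough sketch. The sequential pre-LayerNorm form and the symbol names ($x_l$, $h_l$, $W_{\text{in}}$, $W_{\text{out}}$, $\sigma$) are assumptions for illustration; the exact block layout varies by model and should be checked against the model actually used.

```latex
% x_l : residual stream entering layer l (width d_model)
% h_l : MLP hidden activations in layer l (width d_mlp)
\begin{align}
x_l^{\text{mid}} &= x_l + \mathrm{Attn}_l\big(\mathrm{LN}(x_l)\big) \\
h_l &= \sigma\big(W_{\text{in}}\,\mathrm{LN}(x_l^{\text{mid}}) + b_{\text{in}}\big) \\
x_{l+1} &= x_l^{\text{mid}} + W_{\text{out}}\,h_l + b_{\text{out}}
\end{align}
```

Under these assumptions the residual stream $x_l$ is only ever read from and added to through linear maps, so (setting LayerNorm aside) a change of basis on $x$ could be absorbed into the surrounding weights. The MLP activations $h_l$ pass through the elementwise nonlinearity $\sigma$, which ties them to the neuron basis. That may be one reason features could look different in the two places.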

Think about what synthetic dataset would be good to construct (chess!)

Notes with Logan, Naomi, Laker (10/20/2023)

Directions Logan is interested in:

Advice/Resources:

Next Steps:

  1. The two most promising directions are PPO with specific trainable parameters and chess!
  2. Practice loading Pythia 70M and Pythia 70M fine-tuned on chess, and running a forward pass.
  3. Practice loading Logan’s sparse autoencoder for Pythia 70M, running it, and looking at features.
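
For step 3, a minimal sketch of what "run, look at features" could look like once activations are in hand. Everything here is a stand-in: the weights are random rather than Logan’s trained autoencoder, the activations are random rather than taken from Pythia 70M, and the architecture (ReLU encoder, tied decoder weights, 2048 features) is an assumption, not the real one. The width of 512 is assumed to match Pythia 70M’s hidden size. The helper names `sae_forward` and `top_features` are made up for this sketch.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Run a simple sparse autoencoder on activations x of shape (n_tokens, d_model).

    Returns (features, reconstruction).
    """
    # Encoder: linear map followed by ReLU, so feature activations are non-negative.
    features = np.maximum(0.0, x @ W_enc + b_enc)
    # Decoder: linear map back to the activation space.
    reconstruction = features @ W_dec + b_dec
    return features, reconstruction


def top_features(features, k=5):
    """Indices of the k features with the largest mean activation, in descending order."""
    mean_act = features.mean(axis=0)
    return np.argsort(-mean_act)[:k]


rng = np.random.default_rng(0)
d_model, n_features, n_tokens = 512, 2048, 16  # assumed sizes, not from the notes

# Stand-in for activations collected from a model forward pass.
x = rng.normal(size=(n_tokens, d_model))

# Stand-in weights; a trained autoencoder would be loaded from disk instead.
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = W_enc.T.copy()  # tied weights: an assumption for this sketch
b_dec = np.zeros(d_model)

features, reconstruction = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
top = top_features(features, k=5)

print("features:", features.shape, "reconstruction:", reconstruction.shape)
print("most active features on this batch:", top)
```

With a real autoencoder, the next thing to look at would be which tokens most strongly activate each of the top features.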