Sunday, 11/26/2023

Roadmap for finishing the project:

Train an interpretable sparse autoencoder
Learn how to use RLHF (notebook, can do in parallel)
RLHF the sparse autoencoder, and interpret the results
Write up a blog post (can do in parallel)

Timeline:

Saturday: inspect, tweak, and become happy with our sparse autoencoders.