Roadmap for finishing the project:
- Train an interpretable sparse autoencoder
- Learn how to use RLHF (notebook, can do in parallel)
- Apply RLHF to the sparse autoencoder and interpret the results
- Write up a blog post (can do in parallel)
Timeline:
- Saturday: inspect and tweak our sparse autoencoders until we are happy with them.