Laker worked on the blog while Naomi looked at TRL's PPO trainer (https://huggingface.co/docs/trl/main/en/ppo_trainer) and got the blog working on her end. We forked the repository so we can push changes there.
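For reference while digging into the TRL PPO trainer, here is a minimal sketch of the core PPO clipped surrogate objective in plain PyTorch. This is an illustration of the formula, not TRL's actual implementation; the tensor values are made up.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss: -mean(min(r*A, clip(r, 1-eps, 1+eps)*A))."""
    ratio = torch.exp(logp_new - logp_old)          # importance ratio r
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy values (hypothetical, just to exercise the function)
logp_old = torch.tensor([-1.0, -2.0])
logp_new = torch.tensor([-0.9, -2.5])
adv = torch.tensor([1.0, -0.5])
loss = ppo_clip_loss(logp_new, logp_old, adv)
print(loss.item())
```

The clipping keeps the updated policy from moving too far from the old one in a single step, which is the piece TRL wires up around the language model's per-token log-probs.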
TODOs:
- Work on the blog (literature review, explaining SAEs, Laker)
- Look at features in SAE 5
- Surgically insert SAE 5 into Pythia-6.9B, with HookedTransformer (Naomi)
- Do PPO on overall new model (Naomi)
- Do PPO on only the SAE in the new model? (Laker + Naomi, when the time comes)
- First step, if helpful: try PPO on a toy 2-parameter model where only 1 parameter is allowed to change
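For the "surgically insert" step: a minimal sketch of splicing an autoencoder into a model's forward pass with a plain PyTorch forward hook, standing in for the HookedTransformer hook-point version. Everything here (the tiny model, the SAE dimensions) is hypothetical, not the actual Pythia-6.9B setup.

```python
import torch
import torch.nn as nn

class SAE(nn.Module):
    """Toy sparse autoencoder: linear encode, ReLU sparsity, linear decode."""
    def __init__(self, d_model=8, d_hidden=32):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
sae = SAE(d_model=8)

# Forward hook on the mid-layer activations: returning a value from the hook
# replaces that layer's output with the SAE reconstruction downstream.
def splice_sae(module, inputs, output):
    return sae(output)

handle = model[1].register_forward_hook(splice_sae)
out = model(torch.randn(3, 4))
print(out.shape)  # activations now flow through the SAE; output shape unchanged
handle.remove()   # detach the hook to restore the original model
```

With TransformerLens the same idea would use a hook on the relevant `hook_` point (e.g. the residual stream or MLP output at the target layer) instead of `register_forward_hook`.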
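The "2-parameter model, 1 trainable parameter" first step can be sketched with ordinary gradient descent rather than full PPO: the mechanism that matters (freezing everything except the part you want to optimize, as in PPO-on-only-the-SAE) is just controlling `requires_grad` and which parameters the optimizer sees. All names and numbers here are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Toy 2-parameter model: y = w * x + b
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=False)  # frozen, like the base model under SAE-only training

# Only the trainable parameter is handed to the optimizer
opt = torch.optim.SGD([w], lr=0.1)

x = torch.tensor([1.0, 2.0, 3.0])
target = 2.0 * x  # data generated with true w = 2, b = 0

for _ in range(100):
    opt.zero_grad()
    loss = ((w * x + b - target) ** 2).mean()
    loss.backward()
    opt.step()

print(w.item())  # converges toward 2.0
print(b.item())  # stays exactly 0.0
```

In the real experiment the same pattern would apply: set `requires_grad_(False)` on all Pythia parameters, leave the spliced-in SAE trainable, and pass only the SAE's parameters to the PPO optimizer.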