Goal for today: interpret and become happy with our sparse autoencoder.

While it trains, we can learn how to use the blog format well.

Progress today:

  1. Naomi found a PPO notebook that she is working to understand and run: https://github.com/huggingface/trl/tree/a60ceefa694d62565789b03e8fa35244bc46c9ba
  2. We interpreted the two most frequent features in the 4_checkpoint_7.pt SAE.
  3. Laker implemented dead neuron resampling and changed to a lower learning rate. We are training a new sparse autoencoder now!
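
For reference, dead neuron resampling can be sketched roughly as below. This is a simplified illustration, not Laker's actual code: the function names, the weight shapes, the 0.2 scale factor, and the choice to sample inputs in proportion to their squared reconstruction error are all assumptions, and NumPy stands in for whatever framework the real training loop uses. The same firing-frequency helper is one way to rank features by how often they are active.

```python
import numpy as np


def firing_frequency(acts):
    """Fraction of inputs on which each feature is active (activation > 0)."""
    return (acts > 0).mean(axis=0)


def resample_dead_features(W_enc, b_enc, W_dec, x, rng, scale=0.2):
    """Reinitialize features that never fire on the batch x (modifies weights in place).

    Assumed shapes (hypothetical, for illustration only):
      W_enc: (d_in, d_hidden), b_enc: (d_hidden,), W_dec: (d_hidden, d_in), x: (n, d_in)
    Returns the indices of the features that were resampled.
    """
    acts = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU encoder
    freq = firing_frequency(acts)
    dead = np.flatnonzero(freq == 0)
    if dead.size == 0:
        return dead

    # Prefer inputs the autoencoder currently reconstructs poorly.
    err = ((x - acts @ W_dec) ** 2).sum(axis=1)
    total = err.sum()
    probs = err / total if total > 0 else np.full(len(x), 1.0 / len(x))

    # Scale new encoder columns relative to the average norm of the live ones.
    alive = np.flatnonzero(freq > 0)
    alive_norm = np.linalg.norm(W_enc[:, alive], axis=0).mean() if alive.size else 1.0

    picks = rng.choice(len(x), size=dead.size, p=probs)
    for j, i in zip(dead, picks):
        direction = x[i] / (np.linalg.norm(x[i]) + 1e-8)
        W_dec[j] = direction                        # unit-norm decoder row
        W_enc[:, j] = direction * alive_norm * scale
        b_enc[j] = 0.0
    return dead
```

In a real training loop the optimizer state for the resampled features would usually be reset as well; that step is omitted here.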

Next steps:

  1. Naomi had trouble compiling the blog format and needs help from TAs (try on Tuesday).
  2. On Monday, Laker will begin writing our blog post. He will focus on the literature review and may explore interactive Plotly figures.
  3. For the rest of Sunday, Naomi will continue exploring PPO in the notebook she found.

We will meet on Tuesday at 11 AM.