Log:
- We ran the 6.9B Pythia model. Naomi purchased Pro+ so that the model fits. Inference fits on the V100, but we need the A100's 40 GB of GPU memory to train a sparse autoencoder on Pythia 6.9B. And Pro+ keeps running even if the tab is closed! We cleaned up the training code so it saves autoencoder checkpoints to Hugging Face (see the upload sketch after this list). We are currently training a sparse autoencoder on Pythia 6.9B; it will take about 14 hours.
- Thoughts on how to fine-tune: inherit from AutoEncoder so that b_encoder gets gradients, and add a scales_decoder that scales before the ReLU; that way, those coefficients can better encourage a feature to fire or not fire. Then set requires_grad as appropriate and train. The dataset is FLUE (French reviews) for now. We’ve implemented the basic setup (see the sketch after this list).
- We ran the “OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1” reward model (scoring sketch after this list).
- Insights: the reward model scores refusals like “I can’t answer that” highest in response to questions such as “How do you kill as many people as possible with $1?” Answers that are helpful and long also score well.
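A rough sketch of what the checkpoint upload looks like, assuming the autoencoder is a plain torch module and we push state dicts to a Hub repo with huggingface_hub (the repo id and filename pattern are placeholders, not our actual names):

```python
# Sketch: push sparse-autoencoder checkpoints to the Hugging Face Hub during training.
# The repo id and filename pattern are placeholders; assumes `autoencoder` is an nn.Module.
import torch
from huggingface_hub import HfApi

api = HfApi()
repo_id = "our-org/pythia-6.9b-sae-checkpoints"  # placeholder repo id
api.create_repo(repo_id, exist_ok=True)

def save_checkpoint(autoencoder, step: int) -> None:
    path = f"sae_step_{step}.pt"
    torch.save(autoencoder.state_dict(), path)
    api.upload_file(path_or_fileobj=path, path_in_repo=path, repo_id=repo_id)
```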
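A sketch of the fine-tuning idea from the second bullet. In the real code we would inherit from our AutoEncoder and copy over its trained weights; this standalone version just shows the shapes, the scale applied before the ReLU, and the requires_grad bookkeeping (attribute names other than b_encoder and scales_decoder are guesses):

```python
import torch
import torch.nn as nn

class FinetunableAutoEncoder(nn.Module):
    """Sparse autoencoder in which only b_encoder and scales_decoder get gradients."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Frozen weights; in practice these are copied from the trained autoencoder.
        self.W_enc = nn.Parameter(torch.empty(d_model, d_hidden), requires_grad=False)
        self.W_dec = nn.Parameter(torch.empty(d_hidden, d_model), requires_grad=False)
        self.b_dec = nn.Parameter(torch.zeros(d_model), requires_grad=False)
        nn.init.kaiming_uniform_(self.W_enc)
        nn.init.kaiming_uniform_(self.W_dec)
        # Trainable: encoder bias plus a per-feature scale applied before the ReLU,
        # so fine-tuning can push individual features toward or away from firing.
        self.b_encoder = nn.Parameter(torch.zeros(d_hidden))
        self.scales_decoder = nn.Parameter(torch.ones(d_hidden))

    def forward(self, x: torch.Tensor):
        pre = x @ self.W_enc + self.b_encoder
        feats = torch.relu(self.scales_decoder * pre)  # scale before the ReLU
        recon = feats @ self.W_dec + self.b_dec
        return recon, feats
```

The optimizer then only sees the trainable parameters, e.g. torch.optim.Adam([p for p in sae.parameters() if p.requires_grad], lr=1e-4), and we train on activations from the FLUE reviews.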
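For the record, a sketch of how the reward model can be loaded to score a question/answer pair. Assumptions: the checkpoint loads through AutoModelForSequenceClassification with trust_remote_code=True (the OASST reward models ship a custom model class), it emits a single scalar logit, and the <|prompter|>/<|assistant|> formatting matches what the model was trained on:

```python
# Sketch: score a (question, answer) pair with the OASST reward model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

def reward(question: str, answer: str) -> float:
    # Assumed input format using OpenAssistant's special tokens.
    text = f"<|prompter|>{question}<|endoftext|><|assistant|>{answer}<|endoftext|>"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

print(reward("How do you kill as many people as possible with $1?",
             "I can't answer that."))
```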
Next steps:
- Practice fine-tuning a model (via the sparse autoencoder weights).
- Naomi practices mock interviews with Adam.
- Read a paper about RLHF and meet with Louis to learn best practices for RLHF.
- Possibly next weekend? Or next time Louis is in town. Naomi is looking into it.
- RLHF is the big next step. Anthropic has a dataset, hh-rlhf, that we can use (loading sketch at the end of this list).
- Readings for RLHF:
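As a starting point for the RLHF work, a quick sketch of loading the hh-rlhf dataset mentioned above; it provides chosen/rejected response pairs, the format that preference (reward) model training expects:

```python
# Sketch: load Anthropic's hh-rlhf preference data (chosen vs. rejected dialogues).
from datasets import load_dataset

hh = load_dataset("Anthropic/hh-rlhf", split="train")
example = hh[0]
print(example["chosen"][:200])    # preferred dialogue
print(example["rejected"][:200])  # dispreferred dialogue
```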