Metadata-Version: 2.3
Name: sae-lens
Version: 5.5.1
Summary: Training and Analyzing Sparse Autoencoders (SAEs)
License: MIT
Keywords: deep-learning,sparse-autoencoders,mechanistic-interpretability,PyTorch
Author: Joseph Bloom
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: mamba
Requires-Dist: automated-interpretability (>=0.0.5,<1.0.0)
Requires-Dist: babe (>=0.0.7,<0.0.8)
Requires-Dist: datasets (>=2.17.1,<3.0.0)
Requires-Dist: mamba-lens (>=0.0.4,<0.0.5) ; extra == "mamba"
Requires-Dist: matplotlib (>=3.8.3,<4.0.0)
Requires-Dist: matplotlib-inline (>=0.1.6,<0.2.0)
Requires-Dist: nltk (>=3.8.1,<4.0.0)
Requires-Dist: plotly (>=5.19.0,<6.0.0)
Requires-Dist: plotly-express (>=0.4.1,<0.5.0)
Requires-Dist: pytest-profiling (>=1.7.0,<2.0.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
Requires-Dist: pyzmq (==26.0.0)
Requires-Dist: safetensors (>=0.4.2,<0.5.0)
Requires-Dist: simple-parsing (>=0.1.6,<0.2.0)
Requires-Dist: transformer-lens (>=2.0.0,<3.0.0)
Requires-Dist: transformers (>=4.38.1,<5.0.0)
Requires-Dist: typer (>=0.12.3,<0.13.0)
Requires-Dist: typing-extensions (>=4.10.0,<5.0.0)
Requires-Dist: zstandard (>=0.22.0,<0.23.0)
Project-URL: Homepage, https://jbloomaus.github.io/SAELens
Project-URL: Repository, https://github.com/jbloomAus/SAELens
Description-Content-Type: text/markdown

<img width="1308" alt="Screenshot 2024-03-21 at 3 08 28 pm" src="https://github.com/jbloomAus/mats_sae_training/assets/69127271/209012ec-a779-4036-b4be-7b7739ea87f6">

# SAE Lens 
[![PyPI](https://img.shields.io/pypi/v/sae-lens?color=blue)](https://pypi.org/project/sae-lens/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![build](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml)
[![Deploy Docs](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml)
[![codecov](https://codecov.io/gh/jbloomAus/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/jbloomAus/SAELens)

SAELens exists to help researchers:
- Train sparse autoencoders.
- Analyse sparse autoencoders and research mechanistic interpretability.
- Generate insights which make it easier to create safe and aligned AI systems.

Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for information on how to:
- Download and analyse pre-trained sparse autoencoders.
- Train your own sparse autoencoders.
- Generate feature dashboards with the [SAE-Vis Library](https://github.com/callummcdougall/sae_vis/tree/main).

SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to [safeguard humanity from risks posed by artificial intelligence](https://80000hours.org/problem-profiles/artificial-intelligence/).

This library is maintained by [Joseph Bloom](https://www.jbloomaus.com/) and [David Chanin](https://github.com/chanind).

## Loading Pre-trained SAEs

Pre-trained SAEs for various models can be loaded through SAE Lens. See the [pre-trained SAE table](https://jbloomaus.github.io/SAELens/sae_table/) in the documentation for a list of all available SAEs.
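
As a rough illustration of what loading looks like, here is a minimal sketch. It assumes that `SAE.from_pretrained` accepts `release`, `sae_id` and `device` arguments and, in the 5.x series, returns a tuple whose first element is the SAE. The helper names `resid_pre_sae_id` and `load_pretrained_sae` are hypothetical and not part of the library, and the release and SAE id in the usage comment are examples that should be checked against the table linked above.

```python
from typing import Any


def resid_pre_sae_id(layer: int) -> str:
    """Build an SAE id naming the residual-stream input of a layer.

    Hypothetical helper: the id format is an assumption based on
    TransformerLens hook names; check the SAE table for the ids that
    a given release actually provides.
    """
    if layer < 0:
        raise ValueError("layer must be non-negative")
    return f"blocks.{layer}.hook_resid_pre"


def load_pretrained_sae(release: str, sae_id: str, device: str = "cpu") -> Any:
    """Download a pre-trained SAE and return the SAE object (hypothetical wrapper)."""
    # Imported lazily because sae_lens pulls in torch and transformer-lens.
    from sae_lens import SAE

    loaded = SAE.from_pretrained(release=release, sae_id=sae_id, device=device)
    # Assumption: the 5.x series returns (sae, cfg_dict, sparsity).
    # A bare SAE object is accepted too, in case the return type differs.
    return loaded[0] if isinstance(loaded, tuple) else loaded


# Example usage (downloads weights from the Hugging Face Hub on first call):
# sae = load_pretrained_sae("gpt2-small-res-jb", resid_pre_sae_id(8))
# print(sae.cfg.d_in, sae.cfg.d_sae)
```

The import sits inside the function so that the id helper can be used without the heavier dependencies installed. The tutorials below cover the full workflow.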

## Tutorials

- [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
- [Loading and Analysing Pre-Trained Sparse Autoencoders](tutorials/basic_loading_and_analysing.ipynb)
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
- [Understanding SAE Features with the Logit Lens](tutorials/logits_lens_with_features.ipynb)
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
- [Training a Sparse Autoencoder](tutorials/training_a_sparse_autoencoder.ipynb)
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)


## Join the Slack!

Feel free to join the [Open Source Mechanistic Interpretability Slack](https://join.slack.com/t/opensourcemechanistic/shared_invite/zt-2k0id7mv8-CsIgPLmmHd03RPJmLUcapw) for support!


## Citation

Please cite the package as follows:

```
@misc{bloom2024saetrainingcodebase,
   title = {SAELens},
   author = {Joseph Bloom and Curt Tigges and David Chanin},
   year = {2024},
   howpublished = {\url{https://github.com/jbloomAus/SAELens}},
}
```


