EleutherAI, 1 year ago
Open Source Automated Interpretability for Sparse Autoencoder Features
Building and evaluating an open-source pipeline for auto-interpretability