I try to apply mechanical interpretability to understand token entanglement in subliminal learning, fail, and come to suspect subliminal learning is not caused by token entanglement.
Was It Owl a Dream?
I try to apply mechanical interpretability to understand token entanglement in subliminal learning, fail, and come to suspect subliminal learning is not caused by token entanglement.