Explainability for Authorship Attribution Analysis
Developing methods to explain the latent representations learned by authorship attribution models
The HIATUS program aims to develop robust, multilingual systems that identify authors' stylistic signatures, provide human-interpretable explanations of model decisions, and preserve author anonymity where needed. Through novel explainable AI and linguistic representation techniques, it addresses challenges such as combating information manipulation and protecting vulnerable writers.
My work within HIATUS spans several core advances in interpretable stylometric modeling and explainable representation learning. In Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution, we demonstrated how latent representations can be mapped to descriptive stylistic elements that meaningfully aid both model performance and human understanding. Later, in IXAM: Interactive Explainability for Authorship Attribution Models, we introduced an interactive framework that lets users explore a model's embedding space and derive interpretable style features for authorship attribution tasks. You can test our demo here. Our work on Layered Insights: Generalizable Analysis of Authorial Style by Leveraging All Transformer Layers showed that leveraging hierarchical transformer features yields more robust and generalizable authorship signals across domains. Finally, with iBERT: Interpretable Style Embeddings via Sense Decomposition, we developed an encoder that produces inherently interpretable and controllable stylistic embeddings, advancing both representation transparency and style-focused performance. Together, these contributions move explainable authorship attribution toward practical, auditable systems for real-world linguistic analysis.
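As a rough illustration of the layer-wise idea behind Layered Insights, the sketch below mean-pools hidden states from every transformer layer into a single style vector and compares two texts by cosine similarity. This is a minimal sketch under stated assumptions, not our actual implementation: the roberta-base checkpoint, the mean pooling, and the simple concatenation across layers are all illustrative placeholders.

```python
# Minimal sketch (not the Layered Insights implementation): pool hidden states
# from every transformer layer into one style vector. Model name and pooling
# scheme are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "roberta-base"  # assumption: any encoder exposing hidden states works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layerwise_style_embedding(text: str) -> torch.Tensor:
    """Mean-pool each layer's hidden states, then concatenate across layers."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors, each (1, seq_len, dim)
    hidden_states = torch.stack(outputs.hidden_states)          # (L+1, 1, seq, dim)
    mask = inputs["attention_mask"].unsqueeze(0).unsqueeze(-1)  # (1, 1, seq, 1)
    summed = (hidden_states * mask).sum(dim=2)                  # (L+1, 1, dim)
    counts = mask.sum(dim=2).clamp(min=1)                       # (1, 1, 1)
    per_layer = (summed / counts).squeeze(1)                    # (L+1, dim)
    return per_layer.flatten()                                  # concatenated layers

emb_a = layerwise_style_embedding("The sky above the port was the color of television.")
emb_b = layerwise_style_embedding("It was a bright cold day in April.")
print(f"style similarity: {torch.cosine_similarity(emb_a, emb_b, dim=0).item():.3f}")
```

In practice, lower layers tend to capture surface and lexical cues while upper layers encode more abstract semantics, which is why combining all layers rather than using only the final one can give a more robust authorship signal.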