Genre Mimicry vs. Ethical Reasoning in Abliterated Language Models
Abstract
When safety fine-tuning is removed from language models ('abliteration'), the resulting behavior reveals important distinctions between learned genre conventions and genuine ethical reasoning. This paper analyzes how abliterated models respond to adversarial prompts, demonstrating that much apparent 'alignment' reflects pattern matching rather than robust ethical judgment.
Suggested Citation
Murad Farzulla (2025). Genre Mimicry vs. Ethical Reasoning in Abliterated Language Models. ASCRI Discussion Paper DP-2503. DOI: 10.5281/zenodo.17957694
BibTeX
@misc{farzulla2025_genre_mimicry,
  author       = {Farzulla, Murad},
  title        = {Genre Mimicry vs. Ethical Reasoning in Abliterated Language Models},
  year         = {2025},
  howpublished = {ASCRI Discussion Paper DP-2503},
  doi          = {10.5281/zenodo.17957694},
  url          = {https://systems.ac/5/DP-2503}
}