Showcase of EMNLP Sycophancy Papers
The EMNLP 2025 papers are out! Congrats to everyone who had their papers accepted.
As an accepted author myself, I decided to explore other papers that examined my paper’s topic: LLM Sycophancy. Below are a few bullet points for each paper.
Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs
- Created SAA dataset, which contains queries with user pressumption, correct answer, and incorrect alternatives.
- The pressumptions are sometimes correct and sometimes incorrect.
- Trained LLM with SAA using Supervised Fine Tuning and Proximal Policy Optimization. Results show that in both scenarios, the rate of sycophancy decreased.
- I wonder if model trained with SAA works with out of domain pressumptions. the examples in the dataset are simple: they are single turn and the prompts pertain to simple questions.
Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent
- Created a framework called X-Agent, an agent that checks for sycophancy in LLM responses.
- Their X-Agent successfully detects sycophantic responses,
- Defined sycophancy as when LLM accepts incorrect user presupposition.
- Overall accuracy with X-Agent increased in most LLMs.
- The models fine tuned on the X-Agent rollout also improved performance for both accuracy and sycophancy
Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
- Created a SMART (Sycophancy Mitigate Through Adaptive Reasoning Trajectories) Framework
- Smart uses UA-MCTS(Uncertainty-Aware Adaptive Monte Carlo Tree Search) to generate high quality reaoning trajectory
- In each reasoning step, future rollouts are created and scored based on the improvement of answer correctness probability compared to the last step.
- The rollouts with scores are then used to train Large Language Models.
- The implementation outperforms other approaches.
- Dense Reinforcement Learning on the rollouts outperforms SFT.
Pointing to a Llama and Call it a Camel On the Sycophancy of Multimodal Large Language Models
- Multimodal Large Language Models are more susceptible to Sycophancy compared to Large Language Models.
- Created a pipeline called SRT (Sycophancy Reflective Tuning).
- This pipeline consists of image textualization -> reflection -> Summarization.
- Fine tuning on the SRT rollout resulted in better overall score compared to original prompting.
- Correction Rate (original answer incorrect, user suggestion correct) was reduced slightly, but Sycophancy Rate (original answer correct, user suggestion incorrect) increased considerably.
- Removing reasoning in SRT results in low Correction Rate.
Measuring Sycophancy of Language Models in Multi-turn Dialogues
- Authors proposed a benchmark called SYCON-BENCH, a benchmark for evaluating sycophantic behavior in multi-turn, free-form conversational settings
- Alignment Tuning increases sycophancy
- Model scaling reduces sycophancy
- Reasoning Models are More Robust
- Prompting in third person, creating a Non-sycophantic prompt can mitigate sycophancy
- Models can identify sycophancy when asked to, but are often sycophantic. This shows that sycophancy is not driven by ignorance.
Challenging the Evaluator: LLM Sycophancy Under User Rebuttal
- Authors explored the discrepancy: LLMs are good as evaluative agents, but are sycophantic in follow-up questions. Since two actions are similar: choosing the better of the two different argument, what prompts an LLM to have this difference
- Found that conversational framework makes LLM more sycophantic
- More reasoning results in better persuasion, even if the reasoning is incorrect
- LLMs are more likely to be persuaded by casual prompts than by reasoned prompts.
- While casual prompt results in the best persuasion rate, the best correction rate is given when LLM is prompted as a judge.
Echoes of Agreement: Argument Driven Sycophancy in Large Language Models
- LLMs are likely to echo user’s presented stance
- LLMs tend to agree more when supporting arguments were provided and disagree more when refuting arguments were provided.
- LLMs are likely to change their stance both in single and multi turn settings.
- LLMs are more likely to change their stance for arguments with “better” strength.
Enjoy Reading This Article?
Here are some more articles you might like to read next: