Showcase of EMNLP Sycophancy Papers

The EMNLP 2025 papers are out! Congrats to everyone who had their papers accepted.

As an accepted author myself, I decided to explore other papers that examined my paper’s topic: LLM Sycophancy. Below are a few bullet points for each paper.

Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs

Created SAA dataset, which contains queries with user pressumption, correct answer, and incorrect alternatives.
- The pressumptions are sometimes correct and sometimes incorrect.
Trained LLM with SAA using Supervised Fine Tuning and Proximal Policy Optimization. Results show that in both scenarios, the rate of sycophancy decreased.
I wonder if model trained with SAA works with out of domain pressumptions. the examples in the dataset are simple: they are single turn and the prompts pertain to simple questions.

Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent

Created a framework called X-Agent, an agent that checks for sycophancy in LLM responses.
Their X-Agent successfully detects sycophantic responses,
- Defined sycophancy as when LLM accepts incorrect user presupposition.
- Overall accuracy with X-Agent increased in most LLMs.
The models fine tuned on the X-Agent rollout also improved performance for both accuracy and sycophancy

Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories

Created a SMART (Sycophancy Mitigate Through Adaptive Reasoning Trajectories) Framework
- Smart uses UA-MCTS(Uncertainty-Aware Adaptive Monte Carlo Tree Search) to generate high quality reaoning trajectory
- In each reasoning step, future rollouts are created and scored based on the improvement of answer correctness probability compared to the last step.
The rollouts with scores are then used to train Large Language Models.
- The implementation outperforms other approaches.
- Dense Reinforcement Learning on the rollouts outperforms SFT.

Pointing to a Llama and Call it a Camel On the Sycophancy of Multimodal Large Language Models

Multimodal Large Language Models are more susceptible to Sycophancy compared to Large Language Models.
Created a pipeline called SRT (Sycophancy Reflective Tuning).
- This pipeline consists of image textualization -> reflection -> Summarization.
Fine tuning on the SRT rollout resulted in better overall score compared to original prompting.
- Correction Rate (original answer incorrect, user suggestion correct) was reduced slightly, but Sycophancy Rate (original answer correct, user suggestion incorrect) increased considerably.
- Removing reasoning in SRT results in low Correction Rate.

Measuring Sycophancy of Language Models in Multi-turn Dialogues

Authors proposed a benchmark called SYCON-BENCH, a benchmark for evaluating sycophantic behavior in multi-turn, free-form conversational settings
Alignment Tuning increases sycophancy
Model scaling reduces sycophancy
Reasoning Models are More Robust
Prompting in third person, creating a Non-sycophantic prompt can mitigate sycophancy
Models can identify sycophancy when asked to, but are often sycophantic. This shows that sycophancy is not driven by ignorance.

Challenging the Evaluator: LLM Sycophancy Under User Rebuttal

Authors explored the discrepancy: LLMs are good as evaluative agents, but are sycophantic in follow-up questions. Since two actions are similar: choosing the better of the two different argument, what prompts an LLM to have this difference
Found that conversational framework makes LLM more sycophantic
More reasoning results in better persuasion, even if the reasoning is incorrect
LLMs are more likely to be persuaded by casual prompts than by reasoned prompts.
While casual prompt results in the best persuasion rate, the best correction rate is given when LLM is prompted as a judge.

Echoes of Agreement: Argument Driven Sycophancy in Large Language Models

LLMs are likely to echo user’s presented stance
LLMs tend to agree more when supporting arguments were provided and disagree more when refuting arguments were provided.
LLMs are likely to change their stance both in single and multi turn settings.
LLMs are more likely to change their stance for arguments with “better” strength.