Showcase of EMNLP Sycophancy Papers

The EMNLP 2025 papers are out! Congrats to everyone who had their papers accepted.

As an accepted author myself, I decided to explore other papers that examined my paper’s topic: LLM Sycophancy. Below are a few bullet points for each paper.

Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs

  • Created the SAA dataset, which contains queries with a user presupposition, the correct answer, and incorrect alternatives.
    • The presuppositions are sometimes correct and sometimes incorrect.
  • Trained LLMs on SAA using Supervised Fine-Tuning and Proximal Policy Optimization. In both settings, the rate of sycophancy decreased.
  • I wonder whether a model trained on SAA handles out-of-domain presuppositions. The examples in the dataset are simple: they are single-turn, and the prompts pertain to straightforward questions.
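
From the description above, an SAA-style record might look like the following sketch. The field names and the example content are my own illustration of the described structure, not actual entries from the dataset:

```python
# Hypothetical SAA-style record, illustrating the fields described above:
# a query carrying a user presupposition (which may be right or wrong),
# the correct answer, and incorrect alternatives. Not from the real dataset.
example = {
    "query": "Since the Great Wall of China is visible from the Moon, "
             "why did its builders make it so large?",
    "presupposition": "The Great Wall of China is visible from the Moon.",
    "presupposition_is_correct": False,
    "correct_answer": "The presupposition is false: the Great Wall is not "
                      "visible to the naked eye from the Moon.",
    "incorrect_alternatives": [
        "It was built large so that it could be seen from space.",
    ],
}

# A non-sycophantic model should reject the false presupposition
# rather than answer the loaded question directly.
print(example["presupposition_is_correct"])  # -> False
```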

Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent

  • Created a framework called X-Agent, an agent that checks for sycophancy in LLM responses.
  • Their X-Agent successfully detects sycophantic responses.
    • Sycophancy is defined as the LLM accepting an incorrect user presupposition.
    • Overall accuracy with X-Agent increased for most LLMs.
  • Models fine-tuned on X-Agent rollouts also improved on both accuracy and sycophancy.

Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories

  • Created the SMART (Sycophancy Mitigation Through Adaptive Reasoning Trajectories) framework.
    • SMART uses UA-MCTS (Uncertainty-Aware Adaptive Monte Carlo Tree Search) to generate high-quality reasoning trajectories.
    • At each reasoning step, future rollouts are created and scored by how much they improve the probability of a correct answer relative to the previous step.
  • The scored rollouts are then used to train Large Language Models.
    • The implementation outperforms other approaches.
    • Dense Reinforcement Learning on the rollouts outperforms SFT.
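
The per-step scoring described above can be sketched as a difference in correctness probability between consecutive steps. This is my own minimal illustration of the idea, not the paper's UA-MCTS implementation (which also involves uncertainty estimation and tree search):

```python
def step_rewards(correct_probs):
    """Score each reasoning step by the change in estimated probability
    of reaching the correct final answer.

    correct_probs[0] is the estimate before any reasoning; correct_probs[i]
    is the estimate after step i. (Illustrative only; the paper's scoring
    is richer than this.)
    """
    return [after - before
            for before, after in zip(correct_probs, correct_probs[1:])]

# A trajectory where the second step hurts and the others help:
rewards = step_rewards([0.2, 0.5, 0.4, 0.9])
print([round(r, 2) for r in rewards])  # -> [0.3, -0.1, 0.5]
```

Steps that lower the correctness probability receive a negative score, so dense RL can penalize exactly the step that went wrong rather than the whole trajectory.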

Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models

  • Multimodal Large Language Models are more susceptible to sycophancy than text-only Large Language Models.
  • Created a pipeline called SRT (Sycophancy Reflective Tuning).
    • The pipeline consists of image textualization → reflection → summarization.
  • Fine-tuning on the SRT rollouts produced a better overall score than the original prompting.
    • Correction Rate (original answer incorrect, user suggestion correct) dropped slightly, but Sycophancy Rate (original answer correct, user suggestion incorrect) decreased considerably.
    • Removing the reasoning step from SRT results in a low Correction Rate.
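
The two rates in the bullets above can be made concrete with a small sketch. This is my own illustration of how such metrics are typically computed, with assumed field names, not the paper's evaluation code:

```python
def correction_and_sycophancy_rates(records):
    """Each record holds the model's original answer, the user's suggested
    answer, the model's final answer after the suggestion, and the truth.
    (Field names are assumptions for this sketch.)"""
    # Correction cases: original answer wrong, user suggestion correct.
    corr = [r for r in records
            if r["original"] != r["truth"] and r["suggestion"] == r["truth"]]
    # Sycophancy cases: original answer correct, user suggestion wrong.
    syco = [r for r in records
            if r["original"] == r["truth"] and r["suggestion"] != r["truth"]]
    correction_rate = sum(r["final"] == r["truth"] for r in corr) / len(corr)
    sycophancy_rate = sum(r["final"] == r["suggestion"] for r in syco) / len(syco)
    return correction_rate, sycophancy_rate

records = [
    {"original": "A", "suggestion": "B", "final": "B", "truth": "B"},  # corrected
    {"original": "A", "suggestion": "B", "final": "A", "truth": "B"},  # not corrected
    {"original": "B", "suggestion": "A", "final": "A", "truth": "B"},  # flipped (sycophantic)
    {"original": "B", "suggestion": "A", "final": "B", "truth": "B"},  # held firm
]
print(correction_and_sycophancy_rates(records))  # -> (0.5, 0.5)
```

The trade-off the bullets describe is visible here: training a model to hold firm lowers the sycophancy rate, but an over-stubborn model also rejects correct user suggestions, lowering the correction rate.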

Measuring Sycophancy of Language Models in Multi-turn Dialogues

  • Authors proposed SYCON-BENCH, a benchmark for evaluating sycophantic behavior in multi-turn, free-form conversational settings.
  • Alignment tuning increases sycophancy.
  • Model scaling reduces sycophancy.
  • Reasoning models are more robust to sycophancy.
  • Prompting in the third person, or crafting an explicitly non-sycophantic prompt, can mitigate sycophancy.
  • Models can identify sycophancy when asked to, yet they are often sycophantic anyway; this suggests sycophancy is not driven by ignorance.

Challenging the Evaluator: LLM Sycophancy Under User Rebuttal

  • The authors explore a discrepancy: LLMs perform well as evaluative agents, yet turn sycophantic when the user pushes back in follow-up turns. Since the two actions are similar (both amount to choosing the better of two arguments), what prompts an LLM to behave so differently?
  • Found that a conversational framing makes LLMs more sycophantic.
  • More elaborate reasoning makes a rebuttal more persuasive, even when the reasoning is incorrect.
  • LLMs are more likely to be persuaded by casual prompts than by reasoned prompts.
  • While casual prompts yield the highest persuasion rate, the best correction rate comes from prompting the LLM as a judge.

Echoes of Agreement: Argument Driven Sycophancy in Large Language Models

  • LLMs are likely to echo the user's presented stance.
  • LLMs tend to agree more when supporting arguments are provided and disagree more when refuting arguments are provided.
  • LLMs are likely to change their stance in both single-turn and multi-turn settings.
  • LLMs are more likely to change their stance for arguments of "better" strength.


