Knowledge or Sycophancy?

Premise

Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions showed that when LLMs are prompted with questions containing false medical presuppositions, they do not readily detect them. There are two possible points of failure.

  1. The LLM does not know the medical fact that the presupposition contradicts.
  2. The LLM blindly follows the user's framing, a sign of sycophancy.

Problem (1) would indicate that LLMs lack medical knowledge, while problem (2) would suggest sycophancy.

Dataset

The Cancer-Myth paper includes a public dataset; the first 20 questions were sampled for this experiment. Each item in the dataset includes a question, its incorrect presupposition, and other information.
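Sampling the subset might look like the sketch below. The field names (`question`, `incorrect_presupposition`) are assumptions for illustration; the actual Cancer-Myth schema may differ.

```python
import json

def load_first_n(path: str, n: int = 20) -> list[dict]:
    """Load the dataset from a JSON file and keep only the first n items.

    Field names in the items (e.g. "question", "incorrect_presupposition")
    are hypothetical placeholders for the real schema.
    """
    with open(path) as f:
        items = json.load(f)
    return items[:n]
```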

Experiment

Implemented a basic presupposition checker. The LLMs were made to

  1. Deconstruct the user query into its presuppositions
  2. Check the factuality of each presupposition
  3. Condense the report to the presuppositions shown to be false
  4. Compare the result with the ground-truth information
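Steps 1–3 above can be sketched as a small pipeline parameterized by any text-in/text-out completion function (e.g. a thin wrapper around an LLM API). The prompts here are illustrative assumptions, not the exact ones used in the experiment.

```python
from typing import Callable

def extract_presuppositions(complete: Callable[[str], str], question: str) -> list[str]:
    """Step 1: deconstruct the query into individual presuppositions."""
    prompt = ("List every factual presupposition in the question below, "
              f"one per line.\n\nQuestion: {question}")
    return [line.strip() for line in complete(prompt).splitlines() if line.strip()]

def check_factuality(complete: Callable[[str], str], claim: str) -> bool:
    """Step 2: ask the model whether a single claim is accurate."""
    prompt = ("Is the following claim medically accurate? "
              f"Answer TRUE or FALSE.\n\nClaim: {claim}")
    return complete(prompt).strip().upper().startswith("TRUE")

def find_false_presuppositions(complete: Callable[[str], str], question: str) -> list[str]:
    """Step 3: condense to only the presuppositions judged false."""
    claims = extract_presuppositions(complete, question)
    return [c for c in claims if not check_factuality(complete, c)]
```

Passing the completion function in explicitly makes the pipeline easy to test with a stub before wiring up a real API client.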

Tested with Anthropic’s Claude 4 Sonnet and Claude 3.5 Haiku models.

Results

On the first 20 questions, both LLMs almost always found the correct response. Using the scoring rubric of [-1, 0, 1] provided in the paper, we have

  Model              +1   0   -1
  Claude 3.5 Haiku   20   0    0
  Claude 4 Sonnet    17   2    1
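Tallying per-question rubric scores into the table above is a one-liner worth making explicit (the score lists below are reconstructed from the table, not raw experiment output):

```python
from collections import Counter

def tally(scores: list[int]) -> dict[int, int]:
    """Count rubric scores (+1 / 0 / -1) for one model, in table order."""
    counts = Counter(scores)
    return {s: counts.get(s, 0) for s in (1, 0, -1)}
```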

Limitations

  • Tested on only 20 questions.
  • Didn’t compare the results to simple LLM QA (it might be that the chosen 20 questions were easy).
  • Deconstructing and validating might be overkill.
    • Simply prompting the LLM to detect incorrect presuppositions might have been good enough.
  • The evaluator, deconstructor, and presupposition checker all used the same model. Using a different LLM as the evaluator would have been better.
  • Only tested on Anthropic’s Claude models. Would be interesting to see how other models behave.

github link



