publications

2026

  1. Lancet Digital Health
    CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research
    Owen Bianchi, Maya Willey, Chelsea X. Alvarado, and 21 more authors
    Lancet Digital Health, Jan 2026

2025

  1. EMNLP
    Challenging the Evaluator: LLM Sycophancy Under User Rebuttal
    Sungwon Kim and Daniel Khashabi
    In Findings of the Association for Computational Linguistics: EMNLP 2025, Nov 2025
  2. arXiv
    BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
    Mathew J. Koretsky, Maya Willey, Adi Asija, and 8 more authors
    arXiv preprint arXiv:2505.20321, May 2025