BrowseComp-Plus
Multi-hop deep-research retrieval over a 100k-doc web corpus.
Dataset
BrowseComp-Plus is a deep-research benchmark with multi-hop questions over ~100k web documents, designed to isolate retrieval quality from model capability. We report results in both the default (standardized) scaffold and the stronger get_document scaffold.
Evaluation methodology
We evaluate retrieval pipelines using the BrowseComp-Plus harness, reporting accuracy and tool-call counts as published on the benchmark leaderboard.
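The two reported metrics can be aggregated from per-question run records. This is a minimal sketch, not the harness's actual API: the `QuestionResult` record and its field names are hypothetical, assuming the harness exposes a per-question correctness flag and a tool-call count.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    # Hypothetical per-question record; field names are illustrative
    # assumptions, not the BrowseComp-Plus harness's actual schema.
    correct: bool      # whether the graded answer matched
    tool_calls: int    # number of tool invocations the agent made


def summarize(results: list[QuestionResult]) -> tuple[float, float]:
    """Return (accuracy, mean tool calls) over one evaluation run."""
    n = len(results)
    accuracy = sum(r.correct for r in results) / n
    mean_calls = sum(r.tool_calls for r in results) / n
    return accuracy, mean_calls


# Toy run of three questions: two correct, 50 tool calls in total.
run = [
    QuestionResult(correct=True, tool_calls=12),
    QuestionResult(correct=False, tool_calls=30),
    QuestionResult(correct=True, tool_calls=8),
]
acc, calls = summarize(run)
```

Here `acc` is 2/3 and `calls` is 50/3, matching the leaderboard's two headline numbers for a run.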
Testing dates
March 2026