BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Why a new benchmark?

Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many real-world, complex queries necessitate in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. We introduce BRIGHT to better benchmark retrieval on such challenging and realistic scenarios.

BRIGHT

We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. We collect 1,385 real-world queries from diverse domains (StackExchange, LeetCode, and math competitions), sourced from naturally occurring or carefully curated human data. We pair these queries with relevant documents, such as web pages linked in StackExchange answers and tagged theorems in math Olympiad questions, all of which require deliberate reasoning to identify the connections.

Leaderboard submission

If you would like to submit your results to the leaderboard, email them to suhongjin96@gmail.com! You are encouraged to include a link to your open-source codebase. Otherwise, you may provide a short description of the models and approaches used (e.g., size of the retrieval model, whether LLMs like GPT-4 or reranking are used, etc.)!

Have Questions?

Ask us questions at our GitHub issues page or contact Hongjin Su, Howard Yen, or Mengzhou Xia.

We report the average nDCG@10 score across the 12 datasets in BRIGHT. Instead of the original query, a retriever may use LLM-generated reasoning steps as the query when retrieving relevant documents.
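
For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@10 and a cross-dataset average could be computed. It is not the official BRIGHT evaluation code: the function names (dcg_at_k, ndcg_at_k, macro_average) are hypothetical, the gain is assumed to be linear in the relevance grade, and each dataset's score is assumed to be a single number that is then averaged with equal weight per dataset.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades.

    Assumes linear gain: the document at 0-based position `rank`
    contributes rel / log2(rank + 2).
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, gold, k=10):
    """nDCG@k for a single query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    gold: dict mapping relevant document ids to relevance grades
          (hypothetical format; binary grades in the usage example).
    """
    gains = [gold.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(gold.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    # A query with no relevant documents scores 0 by convention here.
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def macro_average(per_dataset_scores):
    """Unweighted mean of per-dataset scores (one number per dataset)."""
    return sum(per_dataset_scores.values()) / len(per_dataset_scores)


# Usage example with made-up ids: one of two relevant documents
# is retrieved, at position 2.
example_gold = {"d1": 1, "d2": 1}
example_ranking = ["d3", "d1", "d7"]
example_score = ndcg_at_k(example_ranking, example_gold)
```

In this example only the second retrieved document is relevant, so the DCG is 1/log2(3), while the ideal ranking would place both relevant documents first. Averaging with equal weight per dataset means a small dataset counts as much as a large one toward the reported score.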

Rank and date	Retriever and organization	Score
1 June 2, 2025	XRR2 Jataware Corp	40.3
2 June 5, 2025	RaDeR with Qwen reranking CIIR, University of Massachusetts Amherst	39.2
3 May 29, 2025	ReasonIR with Rank-R1 CSIRO, University of Waterloo, The University of Queensland	38.8
4 June 4, 2025	ReasonIR with TongSearch Reasoner 7B Reasoning and reranker Beijing Institute for General Artificial Intelligence (BIGAI)	38.3
5 May 14, 2025	BM25 with GPT4 reasoning and Qwen QWQ reranking Zach Nussbaum (NomicAI)	37.8
6 Apr 29, 2025	ReasonIR with reranker Meta, University of Washington, etc.	36.9
7 Oct 22, 2024	JudgeRank ICLR submission	35.8
8 Aug 28, 2024	BM25, with GPT-4 reasoning and top-100 reranking by Llama-3.1-70B Salesforce Research (proprietary code)	30.4
9 May 10, 2025	BM25 with TongSearch Reasoner 7B Reasoning Beijing Institute for General Artificial Intelligence (BIGAI)	27.9
10 Apr 30, 2025	Qwen1.5-7B with InteRank-3B re-ranking University of Massachusetts Amherst	27.4
11 May 10, 2025	Seed1.5-Embedding ByteDance	27.2
12 July 11, 2024	BM25, with gpt-4-0125-preview reasoning Microsoft	27.0
13 July 11, 2024	instructor-xl, with gpt-4-0125-preview reasoning The University of Hong Kong, University of Washington	26.9
14 July 11, 2024	BM25, with Claude-3-Opus reasoning Microsoft	26.8
15 July 11, 2024	instructor-xl, with Claude-3-Opus reasoning The University of Hong Kong, University of Washington	26.4
16 July 11, 2024	instructor-xl, with Llama-3-70B-Instruct reasoning The University of Hong Kong, University of Washington	26.3
17 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with gpt-4-0125-preview reasoning Google	26.2
18 July 11, 2024	BM25, with Llama-3-70B-Instruct reasoning Microsoft	25.7
19 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with Claude-3-Opus reasoning Google	25.6
20 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with Llama-3-70B-Instruct reasoning Google	24.9
21 July 11, 2024	gte-Qwen1.5-7B-instruct, with gpt-4-0125-preview reasoning Alibaba	24.8
22 July 11, 2024	gte-Qwen1.5-7B-instruct, with Claude-3-Opus reasoning Alibaba	24.8
23 July 11, 2024	voyage-large-2-instruct, with gpt-4-0125-preview reasoning Voyage AI	24.7
24 May 10, 2025	BM25 with TongSearch Reasoner 1.5B Reasoning Beijing Institute for General Artificial Intelligence (BIGAI)	24.6
25 July 11, 2024	GritLM-7B, with gpt-4-0125-preview reasoning ContextualAI, The University of Hong Kong, Microsoft	24.5
26 July 11, 2024	instructor-xl, with Gemini-1.0-pro reasoning The University of Hong Kong, University of Washington	24.5
27 July 11, 2024	BM25, with Gemini-1.0-pro reasoning Microsoft	23.9
28 July 11, 2024	instructor-large, with gpt-4-0125-preview reasoning The University of Hong Kong, University of Washington	23.5
29 July 11, 2024	gte-Qwen1.5-7B-instruct, with Llama-3-70B-Instruct reasoning Alibaba	23.4
30 July 11, 2024	GritLM-7B, with Claude-3-Opus reasoning ContextualAI, The University of Hong Kong, Microsoft	23.4
31 July 11, 2024	text-embedding-3-large, with gpt-4-0125-preview reasoning OpenAI	23.3
32 July 11, 2024	voyage-large-2-instruct, with Claude-3-Opus reasoning Voyage AI	23.1
33 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with Gemini-1.0-pro reasoning Google	23.0
34 July 11, 2024	voyage-large-2-instruct, with Llama-3-70B-Instruct reasoning Voyage AI	22.9
35 July 11, 2024	instructor-large, with Llama-3-70B-Instruct reasoning The University of Hong Kong, University of Washington	22.7
36 July 11, 2024	text-embedding-3-large, with Claude-3-Opus reasoning OpenAI	22.7
37 July 11, 2024	Cohere-embed-english-v3.0, with gpt-4-0125-preview reasoning Cohere	22.6
38 July 11, 2024	gte-Qwen1.5-7B-instruct, with Gemini-1.0-pro reasoning Alibaba	22.6
39 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-100 reranking by gpt-4-0125-preview Google	22.6
40 July 11, 2024	gte-Qwen1.5-7B-instruct Alibaba	22.5
41 July 11, 2024	instructor-xl, with GritLM-7B reasoning The University of Hong Kong, University of Washington	22.4
42 July 11, 2024	voyage-large-2-instruct, with Gemini-1.0-pro reasoning Voyage AI	22.4
43 July 11, 2024	Cohere-embed-english-v3.0, with Llama-3-70B-Instruct reasoning Cohere	22.2
44 July 11, 2024	e5-mistral-7b-instruct, with gpt-4-0125-preview reasoning Microsoft	22.1
45 July 11, 2024	text-embedding-3-large, with Llama-3-70B-Instruct reasoning OpenAI	22.1
46 July 11, 2024	instructor-large, with Claude-3-Opus reasoning The University of Hong Kong, University of Washington	22.1
47 July 11, 2024	bge-large-en-v1.5, with gpt-4-0125-preview reasoning Beijing Academy of Artificial Intelligence	22.0
48 July 11, 2024	SFR-Embedding-Mistral, with gpt-4-0125-preview reasoning Salesforce	22.0
49 July 11, 2024	Cohere-embed-english-v3.0, with Claude-3-Opus reasoning Cohere	21.9
50 July 11, 2024	SFR-Embedding-Mistral, with Claude-3-Opus reasoning Salesforce	21.7
51 July 11, 2024	text-embedding-3-large, with Gemini-1.0-pro reasoning OpenAI	21.5
52 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-10 reranking by gpt-4-0125-preview Google	21.5
53 July 11, 2024	e5-mistral-7b-instruct, with Claude-3-Opus reasoning Microsoft	21.4
54 July 11, 2024	bge-large-en-v1.5, with Claude-3-Opus reasoning Beijing Academy of Artificial Intelligence	21.1
55 July 11, 2024	GritLM-7B ContextualAI, The University of Hong Kong, Microsoft	21.0
56 July 11, 2024	GritLM-7B, with Llama-3-70B-Instruct reasoning ContextualAI, The University of Hong Kong, Microsoft	20.9
57 July 11, 2024	instructor-large, with Gemini-1.0-pro reasoning The University of Hong Kong, University of Washington	20.8
58 July 11, 2024	bge-large-en-v1.5, with Llama-3-70B-Instruct reasoning Beijing Academy of Artificial Intelligence	20.7
59 July 11, 2024	GritLM-7B, with Gemini-1.0-pro reasoning ContextualAI, The University of Hong Kong, Microsoft	20.7
60 July 11, 2024	SFR-Embedding-Mistral, with Gemini-1.0-pro reasoning Salesforce	20.1
61 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-10 reranking by Gemini-1.0-pro Google	20.1
62 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768 Google	20.0
63 July 11, 2024	SFR-Embedding-Mistral, with Llama-3-70B-Instruct reasoning Salesforce	20.0
64 July 11, 2024	e5-mistral-7b-instruct, with Llama-3-70B-Instruct reasoning Microsoft	19.9
65 July 11, 2024	gte-Qwen1.5-7B-instruct, with GritLM-7B reasoning Alibaba	19.9
66 July 11, 2024	Cohere-embed-english-v3.0, with Gemini-1.0-pro reasoning Cohere	19.8
67 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, with GritLM-7B reasoning Google	19.6
68 July 11, 2024	e5-mistral-7b-instruct, with Gemini-1.0-pro reasoning Microsoft	19.6
69 July 11, 2024	BM25, with GritLM-7B reasoning Microsoft	19.4
70 July 11, 2024	instructor-xl The University of Hong Kong, University of Washington	18.9
71 July 11, 2024	voyage-large-2-instruct, with GritLM-7B reasoning Voyage AI	18.7
72 July 11, 2024	bge-large-en-v1.5, with Gemini-1.0-pro reasoning Beijing Academy of Artificial Intelligence	18.7
73 July 11, 2024	SFR-Embedding-Mistral Salesforce	18.3
74 July 11, 2024	GritLM-7B, with GritLM-7B reasoning ContextualAI, The University of Hong Kong, Microsoft	18.3
75 July 11, 2024	text-embedding-3-large, with GritLM-7B reasoning OpenAI	18.0
76 July 11, 2024	e5-mistral-7b-instruct Microsoft	17.9
77 July 11, 2024	text-embedding-3-large OpenAI	17.9
78 July 11, 2024	voyage-large-2-instruct Voyage AI	17.9
79 July 11, 2024	sentence-transformers, with gpt-4-0125-preview reasoning Technische Universität Darmstadt	17.7
80 July 11, 2024	e5-mistral-7b-instruct, with GritLM-7B reasoning Microsoft	17.6
81 July 11, 2024	SFR-Embedding-Mistral, with GritLM-7B reasoning Salesforce	17.4
82 July 11, 2024	BM25, top-10 reranking by gpt-4-0125-preview Microsoft	17.4
83 July 11, 2024	BM25, top-100 reranking by gpt-4-0125-preview Microsoft	17.0
84 July 11, 2024	Cohere-embed-english-v3.0 Cohere	16.6
85 July 11, 2024	sentence-transformers, with Claude-3-Opus reasoning Technische Universität Darmstadt	16.4
86 July 11, 2024	Cohere-embed-english-v3.0, with GritLM-7B reasoning Cohere	16.4
87 July 11, 2024	sentence-transformers, with Llama-3-70B-Instruct reasoning Technische Universität Darmstadt	16.3
88 July 11, 2024	bge-large-en-v1.5, with GritLM-7B reasoning Beijing Academy of Artificial Intelligence	16.0
89 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-10 reranking by MiniLM Google	16.0
90 July 11, 2024	instructor-large, with GritLM-7B reasoning The University of Hong Kong, University of Washington	15.8
91 July 11, 2024	BM25, top-10 reranking by Gemini-1.0-pro Microsoft	15.7
92 July 11, 2024	sentence-transformers, with Gemini-1.0-pro reasoning Technische Universität Darmstadt	15.5
93 July 11, 2024	sentence-transformers Technische Universität Darmstadt	14.9
94 July 11, 2024	BM25 Microsoft	14.5
95 July 11, 2024	instructor-large The University of Hong Kong, University of Washington	14.2
96 July 11, 2024	sentence-transformers, with GritLM-7B reasoning Technische Universität Darmstadt	13.9
97 July 11, 2024	bge-large-en-v1.5 Beijing Academy of Artificial Intelligence	13.7
98 July 11, 2024	BM25, top-10 reranking by MiniLM Microsoft	13.1
99 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768, top-100 reranking by MiniLM Google	9.2
100 July 11, 2024	BM25, top-100 reranking by MiniLM Microsoft	8.3

Rank and date	Retriever and organization	Score
1 Dec 10, 2024	Google-Gecko-Text_Embedding-004 Google Cloud AI (with Max-Mean Aggregation)	33.2
2 July 11, 2024	gte-Qwen1.5-7B-instruct Alibaba	27.8
3 July 11, 2024	SFR-Embedding-Mistral Salesforce	26.0
4 July 11, 2024	GritLM-7B ContextualAI, The University of Hong Kong, Microsoft	26.0
5 July 11, 2024	e5-mistral-7b-instruct Microsoft	25.5
6 July 11, 2024	voyage-large-2-instruct Voyage AI	24.6
7 July 11, 2024	google-gecko.text-embedding-preview-0409, dim=768 Google	22.4
8 July 11, 2024	text-embedding-3-large OpenAI	21.9
9 July 11, 2024	Cohere-embed-english-v3.0 Cohere	18.4
10 July 11, 2024	instructor-large The University of Hong Kong, University of Washington	18.2
11 July 11, 2024	instructor-xl The University of Hong Kong, University of Washington	17.8
12 July 11, 2024	sentence-transformers Technische Universität Darmstadt	17.4
13 July 11, 2024	bge-large-en-v1.5 Beijing Academy of Artificial Intelligence	14.8
14 July 11, 2024	BM25 Microsoft	11.4