Draft:LegalBench
LegalBench[1] is an open-source benchmark designed to evaluate the legal reasoning capabilities of large language models (LLMs). Developed as a collaborative initiative in 2023, LegalBench includes over 160 legal tasks contributed by legal scholars, practitioners, and computational researchers. It serves both as a testbed for AI researchers and as a practical resource for legal professionals exploring the capabilities of language models in law-related applications.
Overview
LegalBench comprises a diverse set of tasks intended to assess how well LLMs can understand, reason through, and apply legal principles. Examples of tasks include:
- Determining whether a passage constitutes hearsay
- Identifying whether a statute includes a private right of action
- Answering substantive questions about legal rules and cases
Each task is associated with a dataset of input-output examples and is suitable for evaluation through prompting, fine-tuning, or retrieval-based techniques. Tasks span a wide variety of legal domains, document types, and complexity levels.
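The sketch below illustrates a prompting-based evaluation loop of the kind the benchmark supports. The Hugging Face dataset path (nguha/legalbench), the text and answer field names, and the llm_generate() stub are illustrative assumptions rather than official LegalBench tooling.

```python
# Minimal sketch of few-shot prompting evaluation on one LegalBench task.
# The dataset path, field names, and llm_generate() stub are assumptions,
# not part of the benchmark's published code.
from datasets import load_dataset


def llm_generate(prompt: str) -> str:
    """Stand-in for a call to any large language model API."""
    raise NotImplementedError("Replace with an actual LLM call.")


def evaluate_task(task_name: str = "hearsay") -> float:
    data = load_dataset("nguha/legalbench", task_name)
    train, test = data["train"], data["test"]

    # Use the small labeled split as in-context (few-shot) examples.
    shots = "\n\n".join(
        f"Text: {row['text']}\nAnswer: {row['answer']}" for row in train
    )

    correct = 0
    for row in test:
        prompt = f"{shots}\n\nText: {row['text']}\nAnswer:"
        prediction = llm_generate(prompt).strip().lower()
        correct += int(prediction == row["answer"].strip().lower())
    return correct / len(test)  # task accuracy
```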
Origins
LegalBench was created through a crowdsourced effort involving over 40 contributors, including law professors, practicing attorneys, legal technologists, and public interest legal organizations. Many tasks were newly created for the benchmark, while others were adapted from existing legal NLP datasets such as CUAD[2], ContractNLI[3], MAUD[4], and CaseHold. Contributors were encouraged to submit tasks they deemed "interesting" (i.e., reflective of reasoning challenges) or "useful" (i.e., applicable to real-world legal work).
Applications
LegalBench is intended for two main audiences:
- AI researchers seeking to test the capabilities of LLMs in domains requiring long-context reasoning, complex terminology, and minimal labeled data.
- Legal professionals and organizations evaluating the utility of LLMs for tasks such as legal research, contract analysis, or regulatory compliance.
LegalBench-RAG
In 2024, a derivative benchmark called LegalBench-RAG[5] was introduced by ZeroEntropy. This extension adapts tasks from LegalBench for use in evaluating retrieval-augmented generation (RAG) systems, AI models that combine document retrieval with text generation to improve factual accuracy.
LegalBench-RAG focuses on assessing retrieval quality in legal settings by providing precision and recall metrics for document-level retrieval over unstructured legal corpora. It is used to benchmark systems that rely on vector search, reranking, and prompt augmentation for generating legally accurate responses.
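A minimal sketch of the document-level precision and recall computation that such an evaluation performs is shown below; the retrieve_fn interface and identifiers are hypothetical and not taken from the published LegalBench-RAG code.

```python
# Sketch of document-level retrieval precision/recall used to score retrieval
# quality; retrieve_fn is a placeholder for any retriever (vector search,
# reranker, etc.) that returns document IDs for a query.
from typing import Callable, List, Set, Tuple


def retrieval_precision_recall(
    query: str,
    relevant_ids: Set[str],
    retrieve_fn: Callable[[str, int], List[str]],
    k: int = 10,
) -> Tuple[float, float]:
    retrieved = retrieve_fn(query, k)  # top-k document IDs for the query
    hits = sum(1 for doc_id in retrieved if doc_id in relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall
```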
Related benchmarks
LegalBench builds on or integrates tasks from several existing legal datasets, including:
- CUAD – Contract Understanding Atticus Dataset
- ContractNLI – Contractual natural language inference
- MAUD – Merger Agreement Understanding Dataset
- CaseHold – Case law entailment classification
- CLAUDETTE – Unfair terms detection in consumer contracts
- PolicyQA – Privacy policy question answering
References
- ^ Guha, Neel; Nyarko, Julian; Ho, Daniel E.; Ré, Christopher; Chilton, Adam; et al. (2023). "LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models". arXiv:2308.11462 [cs.CL].
- ^ Hendrycks, Dan; Burns, Collin; Chen, Anya; Ball, Spencer (2021). "CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review". arXiv:2103.06268 [cs.CL].
- ^ Koreeda, Yuta; Manning, Christopher D. (2021). "ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts". arXiv:2110.01799 [cs.CL].
- ^ Wang, Steven H.; Scardigli, Antoine; Tang, Leonard; Chen, Wei (2023). "MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding". arXiv:2301.00876 [cs.CL].
- ^ Pipitone, Nicholas; Alami, Ghita Houir (2024). "LegalBench-RAG: Evaluating Retrieval-Augmented Generation for Legal Reasoning". arXiv:2408.10343 [cs.AI].