The Allens AI Australian law benchmark

The future of law

Will GenAI take the place of the human lawyer?

Lawyers still needed—but AI-assisted lawyers will excel

While our results demonstrate that the current cohort of LLMs still falls short of the critical analysis skills necessary to provide quality legal advice consistently, they also show that—for a significant portion of the time—LLMs can provide impressive responses that supplement and elevate lawyers' skills. We see room for GenAI's performance in the legal sphere to continue to improve, but some limitations remain (and in some respects have worsened), and may be inherent to LLMs.

Coupled with our own experience leveraging many of these tools, our 2025 test results confirm that LLMs are good at:

  • Streamlining routine writing tasks by generating initial drafts of correspondence and legal documents.
  • Distilling complex information into concise summaries through analysis of materials across various modalities.
  • Conducting high-level legal research in general practice areas without needing specific citations, especially in well-understood, internationally harmonised areas of law.

Based on our test results, however, LLMs lack the precision to differentiate legal from other contexts, or to identify the specific legal context required without extensive prompting. This limitation appears to be inherent, and is likely to mean that LLMs, in the legal context, are only safe to use in the hands of people who already have legal expertise. The current batch of LLMs also cannot convey the uncertainty inherent in legal analysis—though this is something that could potentially be addressed through targeted tuning.

Technology also won't replace many of the human elements of being a great lawyer—judgement, intuition, and critical and innovative thinking, to name a few. These qualities remain central to the roles lawyers play, whether in private practice, public service or in-house.

Ongoing improvement of LLM technology

Since we completed our 2025 Allens AI Australian law benchmark testing, new iterations of existing LLMs have emerged (such as GPT-4.1 and OpenAI o4-mini released in April 2025, Claude 4.0 Sonnet released in May 2025, and Gemini 2.5 released in March 2025), with ongoing advances from other LLM developers not included in the test, such as Grok.

As an ongoing project, Allens continually monitors these new developments and welcomes discussions with any clients interested in the subject.

Speciality models and 'hybrid search'

While our benchmark focuses on each LLM's capacity to provide legal advice in response to legal problem questions as a proxy for each LLM's overall utility, the last 12 months have also seen an increasing number of new products specifically designed for use by lawyers.

For example, a number of providers are now developing and launching products specifically trained on vetted legal content (eg databases of cases, law and guidance—though often they're focused on US law) and designed to base their conclusions only on that content. Developers have also begun tackling the citation problem by adopting 'hybrid search' techniques that combine LLM responses with links to substantiating vetted sources, and have started integrating multi-model functionality into their solutions. Over time, this may enable lawyers to leverage the 'best' LLM for the particular legal use case to help optimise results.
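The 'hybrid search' idea described above can be illustrated with a minimal sketch: a generative response is paired with a retrieval step over a curated corpus, so every answer carries links to substantiating sources. Everything here is illustrative—the corpus snippets, the keyword-overlap ranking and the stubbed LLM call are placeholder assumptions, not any vendor's actual implementation.

```python
# Illustrative sketch of a 'hybrid search' pipeline. The corpus entries,
# the retrieval method (naive keyword overlap) and the stubbed LLM call
# are all hypothetical stand-ins for a real product's components.

# Hypothetical vetted corpus: source name -> short description (paraphrased
# for illustration, not verbatim legal text).
VETTED_SOURCES = {
    "Corporations Act 2001 (Cth) s 180": "directors must exercise care and diligence",
    "ASIC v Cassimatis": "directors' duties and the business judgment rule",
    "Privacy Act 1988 (Cth)": "handling of personal information by entities",
}

def retrieve(query: str, corpus: dict, top_k: int = 2) -> list:
    """Rank vetted sources by keyword overlap with the query; keep top_k."""
    q_words = set(query.lower().split())
    scored = [
        (len(q_words & set(text.lower().split())), name)
        for name, text in corpus.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

def answer_with_citations(query: str) -> dict:
    """Combine a (stubbed) LLM response with links to substantiating sources."""
    draft = f"Draft analysis for: {query}"  # placeholder for a real LLM call
    return {"response": draft, "sources": retrieve(query, VETTED_SOURCES)}

result = answer_with_citations("What care and diligence must directors exercise?")
```

Because the retrieval step only ever draws on the vetted corpus, a user can check each cited source directly—which is the design choice these products use to address the citation problem.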

Other providers have focused on tailoring their user experience to existing law firm workflows. AI-powered 'legal assistants' now perform specific tasks, such as summarising sets of documents and searching databases.