The Allens AI Australian law benchmark

The future of law

Will GenAI take the place of the human lawyer?

AI-driven answer tools will likely allow legal advice to be delivered quickly and efficiently, with less need for time-consuming research or specialist expertise. But that advice may not always be correct.

As our benchmarking shows, the current generation of large language models (LLMs) still falls short of the critical analysis skills required to deliver consistently reliable legal advice. Generative AI tools often struggle to identify the right legal context without detailed prompting, and they can’t express the nuance or uncertainty that is fundamental to legal reasoning. These limitations are unlikely to be solved by simply adding more data or training—they stem from how LLMs work. At their core, these tools generate text by predicting likely word sequences based on patterns in data, not by reasoning or understanding the law in any meaningful sense.
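For readers who want an intuition for what 'predicting likely word sequences' means in practice, the deliberately simplified sketch below (a toy bigram model in Python, nothing like a production LLM in scale or architecture) generates fluent-sounding legal prose purely from co-occurrence statistics. The training text and every identifier are illustrative inventions.

```python
# Toy bigram 'language model': always emits the statistically likeliest
# next word seen in training. Purely illustrative; real LLMs are vastly
# larger, but the core mechanism is likewise next-token prediction.
from collections import Counter, defaultdict

training_text = (
    "the court held the contract was void "
    "the court held the claim was dismissed "
    "the court found the contract was valid"
)

# Count which word follows each word in the training data.
follows: defaultdict[str, Counter] = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation most frequently observed in training."""
    return follows[word].most_common(1)[0][0]

# Generate five more words: the output reads fluently, but nothing in
# the model 'knows' whether the contract is actually void or valid.
word = "the"
generated = [word]
for _ in range(5):
    word = predict_next(word)
    generated.append(word)
print(" ".join(generated))  # -> "the court held the court held"
```

The model reproduces the most common pattern in its data regardless of whether that pattern is legally correct, which is the same limitation our benchmark exposed at far larger scale.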

That said, for a significant portion of tasks, LLMs can generate impressive responses that supplement and elevate lawyers’ work. We expect their performance to continue improving, particularly as hallucinated citations are reduced through better safeguards and verification tools. Drawing on our 2025 benchmarking results and hands-on experience with these technologies, we found LLMs to be particularly effective at:

  • streamlining routine writing tasks by generating initial drafts for correspondence and legal drafting;
  • distilling complex information into concise summaries through analysis of materials across various modalities; and
  • conducting high-level legal research in general practice areas without needing specific citations, especially in well-understood, internationally harmonised areas of law.

Even as the tools improve, many of the qualities that make a great lawyer—judgement, intuition, critical thinking, creativity—remain uniquely human. In legal practice, LLMs are safest and most effective in the hands of those who already have legal expertise.

Ongoing improvement of LLM technology

Since we completed our 2025 Allens AI Australian law benchmark testing, new iterations of existing LLMs have emerged (such as GPT-4.1 and OpenAI o4-mini, released in April 2025; Claude Sonnet 4, released in May 2025; and Gemini 2.5, released in March 2025), alongside ongoing advances in models not included in our testing, such as xAI's Grok.

Allens continually monitors these new developments and welcomes discussions with any clients interested in the subject.

Speciality models and 'hybrid search'

While our benchmark focuses on each LLM's capacity to provide legal advice in response to legal-problem questions (as a proxy for its overall utility), the past 12 months have also seen a growing number of new products designed specifically for use by lawyers.

For example, a number of providers are now developing and launching products trained specifically on vetted legal content (eg databases of cases, legislation and guidance, though these are often focused on US law) and designed to base their conclusions solely on that content. Developers have also begun tackling the citation problem by adopting 'hybrid search' techniques, which pair an LLM's generated response with links to the vetted sources that substantiate it, and are starting to integrate multi-model functionality into their solutions. Over time, this may enable lawyers to select the 'best' LLM for a particular legal use case and so optimise results.
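To illustrate the pattern for technically minded readers, the sketch below shows one common way 'hybrid search' grounding can be wired together: blend keyword and vector relevance scores over a vetted corpus, then ask the model to answer only from the retrieved passages and return their links. This is a minimal sketch under assumed interfaces, not any provider's actual product; the Passage structure, the scoring functions and the call_llm helper are all hypothetical placeholders.

```python
# Minimal sketch of 'hybrid search' grounding over a vetted legal corpus.
# All names here (Passage, call_llm, the 50/50 scoring blend) are hypothetical.
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source_url: str  # link to the vetted source (case, statute, guidance)

def keyword_score(query: str, passage: Passage) -> float:
    """Crude lexical overlap, standing in for a BM25-style keyword score."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.text.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)

def vector_score(query: str, passage: Passage) -> float:
    """Placeholder for embedding similarity; a real system would call an
    embedding model here and compare vectors."""
    return keyword_score(query, passage)  # stand-in only

def hybrid_retrieve(query: str, corpus: list[Passage], k: int = 3) -> list[Passage]:
    """Blend both scores and return the top-k vetted passages."""
    ranked = sorted(
        corpus,
        key=lambda p: 0.5 * keyword_score(query, p) + 0.5 * vector_score(query, p),
        reverse=True,
    )
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return "[model answer grounded in the supplied sources]"

def answer_with_citations(query: str, corpus: list[Passage]) -> str:
    """Answer only from the retrieved passages and append their source links."""
    passages = hybrid_retrieve(query, corpus)
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using ONLY the sources below. "
        "If they are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    links = "\n".join(f"- {p.source_url}" for p in passages)
    return f"{call_llm(prompt)}\n\nSources consulted:\n{links}"

# Illustrative usage with a one-passage corpus and a placeholder URL.
corpus = [
    Passage(
        "A contract requires offer, acceptance and consideration.",
        "https://example.com/contract-basics",
    ),
]
print(answer_with_citations("What does a contract require?", corpus))
```

Because every proposition in the response is tied back to a retrievable, vetted source, a lawyer can verify each citation directly, which is the main appeal of this design over free-form generation.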

Other providers have focused on tailoring the user experience to existing law firm workflows. AI-powered 'legal assistants' now perform specific, commonly recurring tasks, such as summarising sets of documents and searching databases.