
Vals AI, a provider of AI evaluation and benchmarking studies for large language models (LLMs), shared its recent VLAIR – Legal Research report, an in-depth examination of how various AI products handle traditional legal research questions compared with lawyers, scored on accuracy, authoritativeness, and appropriateness.
As reported by Law Sites Inc, both legal-specific and general-purpose generative AI tools achieved an accuracy rate of 80%, outperforming lawyers by nine points. Surprisingly, in certain comparisons, the tested general-purpose AI product (ChatGPT) provided a more accurate response than the legal AI products.
According to the report, underperformance among legal LLMs was due to either technical issues or a lack of available source data. While ChatGPT matched its legal-AI rivals on accuracy, it lagged in authoritativeness. Vals AI noted that this gap reflects the legal tools' access to proprietary legal databases and curated citation sources, which remain differentiators for legal-domain systems.
Not all is lost for humans: the report also found that lawyers outperformed AI on questions requiring deep interpretive analysis or nuanced reasoning, underscoring “the enduring edge of human judgment in complex, multi-jurisdictional reasoning.” This study is well worth a closer look.
Report Link: https://lnkd.in/d4AVAFgK
