In the last year or so there have been approximately a skillion gajillion articles written on the ethics of AI use by lawyers. I’m not even going to attempt to provide a comprehensive list of links here – just search “ethics lawyers AI” and you’ll find articles in bar journals, newsletters from law firms, and ethics opinions from bar associations like the ABA, the New York City Bar, the Florida Bar, and the State Bar of California. Many courts have entered orders governing the use of AI, including one district court in Texas that bluntly warned lawyers that AI tools “make stuff up – even quotes and citations.” My colleague Frank Pasquale has developed an online course covering the ethics of generative AI.
With all this attention, you’d think lawyers would have a pretty good level of awareness of the danger of trusting AI-generated briefs and research results – at the same level as, say, the general knowledge that crypto tokens named after internet memes are risky investments. Based on my unscientific method of reading legal news, however, it appears that the rate of lawyers getting in trouble for filing briefs including citations made up by generative AI tools is actually increasing. Recent examples include:
(1) Lawyers at a big firm, K&L Gates, sanctioned for submitting a brief written by AI, including fabricated authorities. The special master concluded the lawyers’ conduct was tantamount to bad faith, which is the key predicate for sanctions under the court’s inherent power and a serious aggravating factor under Fed. R. Civ. P. 11. It didn’t help that the lawyers tried to tapdance around their responsibility for the submission of phony cases:
[T]he conduct of the lawyers at K&L Gates is also deeply troubling. They failed to check the validity of the research sent to them. As a result, the fake information found its way into the Original Brief that I read. That’s bad. But, when I contacted them and let them know about my concerns regarding a portion of their research, the lawyers’ solution was to excise the phony material and submit the Revised Brief – still containing a half-dozen AI errors. Further, even though the lawyers were on notice of a significant problem with the legal research (as flagged by the brief’s recipient: the Special Master), there was no disclosure to me about the use of AI. Instead, the e-mail transmitting the new brief merely suggested an inadvertent production error, not improper reliance on technology. Translation: they had the information and the chance to fix this problem, but didn’t take it (p. 7).
[Law.com article here; parties’ briefing and blistering opinion by the special master here.]
(2) The ubiquitous plaintiffs’ firm Morgan & Morgan was sanctioned for filing motions in limine that had been drafted, in part, by using an in-house AI tool called “MX2.law,” into which the lawyers uploaded a document with the prompt to “add federal caselaw from Wyoming.” The lawyers in this case at least admitted their mistake, but only after the district court entered an order to show cause. They were sanctioned nonetheless, with the district court concluding, quite reasonably, that made-up cases are not “existing law” for the purposes of Fed. R. Civ. P. 11(b). A more demanding duty, but one certainly consistent with Rule 11 and Rule 5.2(a) of the Model Rules of Professional Conduct, is that a lawyer may not rely uncritically on another lawyer’s compliance with the rule:
Signing a legal document ensures that the attorney read the document and conducted a reasonable inquiry into the existing law. See . . . Adamson, 855 F.2d at 673 (“The attorney must ‘stop, look, and listen’ before signing a document subject to Rule 11.”). This duty is a “nondelegable responsibility.” Thus, blind reliance on another attorney can be an improper delegation of this duty and a violation of Rule 11 (p. 8, some citations removed).
[ABA Journal article here; district court order here.]
(3) And, in a highly amusing irony, lawyers from Latham & Watkins included bogus citations in an expert declaration in support of the position of the AI firm Anthropic in defense of a copyright infringement action brought by music publishers. The bogus citations came from – you guessed it! – Anthropic’s Claude chatbot.
What’s going on here? Why are lawyers, even at sophisticated law firms, continuing to make lazy use of generative AI tools without checking the citations for accuracy? At one time the problem could be written off as lawyers not understanding the risks, but today it’s impossible to credit that as an excuse. Of course the legal profession, like every other industry, is rushing to adopt AI tools to find new efficiencies. (Or, at least, law firms are worried that if they’re not using AI, their competitors may be.) And there’s an important long-term payoff to using AI to increase access to legal services for poor and middle-income clients. But I’m less interested in the technology itself than in the more human ethics question of why lawyers keep screwing this up.
I want to start with a familiar story, because I think it may help answer that question.
Sympathy for the Early Poster Children
A couple of years ago there was a case called Mata v. Avianca, 678 F. Supp. 3d 443 (S.D.N.Y. 2023). It started as a personal-injury case brought by a guy whose knee was crashed into by a beverage cart on an airline flight (as a tall guy with long legs who travels on airlines a lot, I have the most sincere sympathy for the plaintiff). He hired a couple of personal-injury lawyers who filed a lawsuit in New York State court, which was promptly removed to federal court because the injury had occurred on an international flight. The lawyers, as far as anyone can tell, had never even heard of the Montreal Convention that governs the liability of international air carriers – they mostly did workers’ comp and car accident cases. The airline’s lawyer moved to dismiss because the statute of limitations under the Montreal Convention had run. To make things even more complicated, however, the airline had recently emerged from bankruptcy and there was some uncertainty over whether the bankruptcy proceedings tolled the statute under the Convention.
The lawyers, who were in way over their heads in an international aviation law proceeding in federal court, used a newfangled research tool they had heard about, called ChatGPT. The AI model, asked to draft an argument contending that the bankruptcy filing tolled the statute of limitations, dutifully responded with a brief citing a bunch of cases from other federal circuits supporting the proposition that the statute had not run.
Even if you are one of the few people who has not heard of this specific case, I’m sure you can guess the next part: Opposing counsel went to look up the cases cited by the plaintiff’s lawyers, couldn’t find them, and responded with a filing saying [head-scratching emoji]. The judge smelled a rat, asked his law clerk to look up the cases, and confirmed that they had been hallucinated by the AI app. In a detailed order, U.S. District Judge Kevin Castel sanctioned the lawyers under Fed. R. Civ. P. 11. Citing one of the few Supreme Court cases on Rule 11, Cooter & Gell v. Hartmarx Corp., 496 U.S. 384 (1990), Judge Castel found:
The filing of papers “without taking the necessary care in their preparation” is an “abuse of the judicial system” that is subject to Rule 11 sanction. Rule 11 creates an “incentive to stop, think and investigate more carefully before serving and filing papers.” “Rule 11 ‘explicitly and unambiguously imposes an affirmative duty on each attorney to conduct a reasonable inquiry into the viability of a pleading before it is signed.’”
The specific reasoning here is important, because other, related, legal rules impose penalties on lawyers only for knowingly making misrepresentations of fact or law to a tribunal (see, e.g., Rule 3.3(a) of the Model Rules of Professional Conduct). The lawyers in this case were exceedingly careless, but they plausibly denied intending to deceive the court. Rule 11, on the other hand, does not have a “pure heart, empty head” defense. (I had a question on this on my PR exam last year.) Thus, the lawyers’ failure to read the cases cited in their brief was sanctionable.
This is certainly not a novel problem. When I was in practice in the mid-1990s, my firm had a weekly “litigation lunch” where lawyers in the litigation department would sit around and tell war stories. I remember a junior partner in the maritime law department talking about a case in which a local PI lawyer found himself in a maritime proceeding in federal court, which was way outside his wheelhouse. (This was not uncommon in Seattle, where people injured in Alaska fisheries would sometimes hire regular personal-injury lawyers to sue vessel owners – bad idea.) The plaintiff’s lawyer had made an argument that was a real stretch on existing law, and to the judge’s question about the way he was reading one of the cases he had cited, the lawyer replied, “Uh, I’m not sure, your Honor . . . I just read the West headnotes.” Game over. It was a great litigation lunch story and we all had a good laugh at the expense of the knucklehead lawyer who didn’t bother to read the whole case and then admitted it in court.
The Mata v. Avianca case had a similar professional context, which is why the reaction of many lawyers was to mock the workers’ comp/PI lawyers who handled that case and became the poster children for the risks of fake citations hallucinated by large language model AI tools. There’s a certain snobbery about being able to operate in federal court, with its complex procedures and FAFO judges, in contrast to the loosey-goosey way that some state courts operate. The judge even made a bit of fun of the lawyers, observing that in the hearing one of the lawyers said he thought the citation “F.3d” meant “federal district, third department.” Hardy har har, but in defense of the lawyers, it’s customary in New York State courts to cite the department of the Appellate Division that decided a case. But the judge, understandably annoyed that these lawyers were wasting his time, seemed to go out of his way to portray them as hayseeds.
I’ll admit that I initially mocked the lawyers, too, but there is a sympathetic version of the story that goes like this: The client didn’t know he had a highly specialized legal problem – he just knew that his knee got hurt by a clumsy flight attendant. He called some lawyers who advertised that they handle injury cases and retained them. The lawyers didn’t realize that suing a South American airline is not more or less the same as a routine injury case governed by familiar substantive law and state court procedures, and when the case was removed to federal court by the airline’s lawyers, they did what lawyers are trained to do – they did some research. In an ideal world they should have declined the case at the outset or associated with co-counsel when it became clear that this was a much more complex case than they had signed on for. But it’s not an ideal world, and anyway, the bar tends to be highly stratified and it’s not obvious that these guys knew the “right” kinds of lawyers to handle this case. The market for legal services is highly imperfect and the problem of matching suitable lawyers with particular clients is a hard one for all but very sophisticated clients. The plaintiff in Mata v. Avianca deserved quality representation and it should be possible for the lawyers he retained to provide it, perhaps with a little help from technology.
Jimmy McGill, not Saul Goodman
In other words, it’s easy to make the plaintiffs’ lawyers sound like Saul Goodman, but really they’re more like Jimmy McGill. We learn from the outstanding television series Better Call Saul that, before becoming Saul Goodman and providing legal advice to the criminal enterprise run by Walter and Jesse in Breaking Bad, Jimmy is a scrappy, hustling lawyer who works hard for his clients but never enjoys the respect paid to his brother Chuck, a name partner at a fancy law firm. (He also never gets respect from Chuck, which sets into motion a series of problems for Jimmy.) Jimmy would totally represent a guy whose knee got banged into by a beverage cart on an airline flight.
Go, Land Crabs!
For a lawyer like Jimmy McGill, Westlaw and Lexis are expensive luxuries. The plaintiffs’ lawyers in Mata v. Avianca had a subscription to a service that provided all the New York State law they needed for most of their practice, but no database of federal caselaw. So, according to the brief they filed in opposition to the district court’s order to show cause why they should not be sanctioned, they took advantage of a new, highly publicized legal research tool. The lawyers’ brief attached a bunch of articles in the legal press saying that generative AI is so good that it’s about to render legal research obsolete. As the brief explains, the lawyer “approached this task the way a lawyer would generally approach a legal research project using a standard database – unaware that ChatGPT involved a very different technology.” We don’t really expect lawyers to look under the hood of new technology to try to understand how it works. The brief argues that the lawyers believed ChatGPT worked like a search engine or a legal database. Sure, AI researchers know these systems are prone to what are called “hallucinations” – confidently generating plausible-sounding but nonexistent material, because the model predicts likely-looking text rather than retrieving real documents – but do lawyers know that? Should they be expected to know that?
That’s not a bad argument, but the decisive response is the lesson from my litigation lunch story: You always have to read the cases cited in anything you submit, including any cases generated by whatever research tool you’re using. The duty of competence does not require lawyers to understand exactly how the technology works – the statistical techniques employed by the large language models underlying generative AI applications – but it does require them to cross-check the results of any search to ensure that the cases cited actually say what the AI tool says they say. This isn’t new at all. Lawyers understood this back when the newfangled research tools were Lexis and Westlaw. As the district court noted, there were also telltale signs that the research tool used by the lawyers was a bit hinky, including the fact that the excerpt from the fictional case that was made up by ChatGPT “shows stylistic and reasoning flaws that do not generally appear in decisions issued by United States Courts of Appeal,” including mixing up Chapter 7 and Chapter 13 bankruptcy proceedings, a reference out of the blue to an arbitration proceeding, and ending without a conclusion. That’s true, but a lawyer using an AI tool may notice those signs only if they’re already primed to look for them. A lawyer who assumes the results of a search are reliable may not be on the lookout for indications that the cases cited in support are simply made up.
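In fact, the existence-check part of that verification is mechanical enough that it can be scripted. Here is a minimal sketch, in Python, of what an automated first pass might look like: it sends the text of a draft brief to a citation-lookup service and flags any citation the database cannot find. The endpoint URL, the authentication scheme, and the response fields are my assumptions (loosely modeled on CourtListener’s public citation-lookup API), and the file name and token are placeholders, so treat this as an illustration of the workflow rather than a working integration.

```python
# Illustrative sketch only: the endpoint and response fields below are my
# assumptions about a citation-lookup API, not a vetted integration.
import requests

LOOKUP_URL = "https://www.courtlistener.com/api/rest/v3/citation-lookup/"


def flag_suspect_citations(brief_text: str, api_token: str) -> list[str]:
    """Return the citations in brief_text that the database cannot match.

    A flagged citation is not proof of fabrication, and an unflagged one is
    not proof the case says what the brief claims; the output just tells the
    human reader where to start.
    """
    resp = requests.post(
        LOOKUP_URL,
        data={"text": brief_text},
        headers={"Authorization": f"Token {api_token}"},
        timeout=30,
    )
    resp.raise_for_status()

    suspects = []
    for item in resp.json():
        # Assumed response shape: one entry per citation detected in the text,
        # with a "status" of 200 when the cite matches a real decision.
        if item.get("status") != 200:
            suspects.append(item.get("citation", "<unparsed citation>"))
    return suspects


if __name__ == "__main__":
    with open("draft_brief.txt") as f:  # placeholder file name
        draft = f.read()
    for cite in flag_suspect_citations(draft, api_token="YOUR_API_TOKEN"):
        print(f"Could not verify: {cite}")
```

Even if a firm ran every filing through something like this, the litigation-lunch lesson would still apply: a script can tell you whether a citation exists, but only a lawyer who reads the case can tell you whether it supports the proposition for which it is cited.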
I’m on this little digression about Jimmy McGill not only because I loved Better Call Saul but because it is important to situate the debate about lawyers and technology in the context of a broader and longer-running debate about access to justice. There is no doubt there are excellent human lawyers out there who could have dealt with the Montreal Convention, bankruptcy, and statute of limitations issues competently. But they’re expensive. They probably work for Chuck’s firm, Hamlin, Hamlin & McGill. Lawyers like the workers’ comp/PI lawyers in the Mata v. Avianca case run a low-overhead, high-volume, no-frills practice that does a good job of helping people with minor injuries obtain compensation. In order to enhance access to legal services, we should encourage this form of law practice.
There is a ton of empirical evidence on the “justice gap” in America. A 2017 study by the Legal Services Corporation found that 86% of civil legal matters reported by low-income Americans in the prior year received no or inadequate legal help. These unmet legal concerns often pertain to basic human needs such as housing (evictions and foreclosures), consumer debt, and child custody proceedings. That’s the demand side. On the supply side, there is also evidence showing the decline of what legal scholar Bill Henderson refers to as the “PeopleLaw sector” – that is, lawyers dealing with the legal needs of individuals in matters such as personal injury, matrimonial, personal bankruptcy, preparing simple wills, and minor criminal matters. One study found that the average solo or small-firm lawyer in the PeopleLaw sector earns $422 per day before paying office overhead expenses. Assuming weekends off and two weeks of vacation (about 250 working days), that’s gross revenue of roughly $105,500 per year – again, before subtracting office overhead expenses. So the Jimmy McGills of the world have to care a lot about efficiency and cost-containment. For 99% of the work they do, they don’t need a research service that covers anything outside of New York State law. When encountering a novel issue, it’s understandable that they may be drawn to emerging technologies like generative AI based on large language models. If we care about access to legal services, we should presumably encourage AI use by solo practitioners and small-firm lawyers.
Still, you’d think that, as a matter of minimum standards of professional competency, these lawyers would know Rule 11’s requirement that they read the cases they cite in their filings and verify the propositions for which they cite them. If legal technology, including generative AI, is going to make a difference in fostering access to legal services, it cannot be by providing crappy legal services to individuals who can’t afford to pay for a fancy law firm . . . oops, but wait a second – even fancy law firms are making these mistakes. Again, what’s going on here?
Human-Computer Interaction
When I posted something about this recently on Bluesky, Canadian legal ethics scholar Amy Salyzyn, who has written a lot about lawyers and AI, suggested that the explanation for the continuing misuse of generative AI tools by lawyers is not ignorance of the risks, but subtle patterns that tend to develop in the way people interact with technology. She pointed me to an article in the Australian Law Society Journal about the sirens’ song of AI (the title of which I borrowed as the subtitle of this post). Based on an empirical study of law students, the article explained the problem as one of “verification drift,” where people may start out skeptical of some piece of technology, but as they become increasingly comfortable with its use, they
become overconfident in its reliability and find verification less necessary. This misplaced trust may stem from GenAI’s authoritative tone and ability to present incorrect details alongside accurate, well-articulated data. This suggests that the challenge is not just a lack of awareness but a cognitive bias that lulls users into a false sense of security.
This helps explain why it seems like the rate of erroneous AI-generated legal filings is increasing, rather than decreasing, as lawyers become more familiar with the technology. It’s not so much an incompetence or laziness story as a trust story. The proliferation of AI tools is gradually seducing us into complacency about the risks. It’s similar to the automation dependency problem in commercial aviation, famously described by American Airlines training captain Warren Vanderburgh as creating pilots who are “children of the magenta line”: Pilots have come to rely on automation, adopted by the industry in the name of safety, to the extent that more fundamental flying skills have eroded, to the detriment of safety. The deep lesson of the automation-dependency phenomenon is that poor human performance motivates the adoption of technological solutions, which in turn worsen human performance.
Some scholars, like Suffolk Law School Dean Andrew Perlman, argue that the use of AI by lawyers is not only ethically permissible but may be required, as a matter of the duty of competency. However, the recent run of incidents of AI misuse suggests that, somehow, the training and regulation of lawyers are not adequate to mitigate some of the risks of using this technology. The Australian study, which posits that the problem is due to “verification drift,” concludes with four recommendations for meaningful AI literacy by lawyers:
Evidence from this study and other reported cases of GenAI misuse in courtrooms shows that interacting with GenAI requires a distinct set of skills. Acquiring these skills goes beyond following a list of instructions. While guidelines such as the Supreme Court of NSW and the Law Society of NSW provide valuable directions, they are not sufficient to acquire skills for the responsible use of this technology. Legal professionals need access to training courses that provide a hands-on, interactive experience.
As generative AI becomes an increasingly important tool and expected skill in the legal profession, it is difficult to imagine any lawyer who will not use GenAI at some point in their career, and a lack of understanding—even of its basic limitations—can lead to serious consequences. In this context, mandatory AI literacy training may be warranted.
To mitigate the risk of verification drift and emphasise the importance of a rigorous verification process, legal practitioners should be engaged with case studies in which lawyers previously introduced fabricated AI-generated material in their court submissions. Exploring the thought process of their peers can possibly encourage them to adopt a more sceptical and careful approach to AI-generated content.
It is important to approach claims about AI capabilities with caution. While GenAI promises benefits in the legal domain such as enhanced research capabilities and document drafting, these claims are often overstated. Companies like OpenAI highlight their models’ performance on legal tasks, including outperforming 90 per cent of the US Bar Exam test takers. However, studies – including an Australian empirical study – have challenged these assertions, showing that GenAI tools perform below the average law student. Until comprehensive and up-to-date benchmarks are available for legal tasks, such claims should be treated with scepticism.
I am as prone to Schadenfreude as anyone else, and I have to admit that it’s funny when lawyers for Anthropic screw up by using Claude to generate fake citations. But generative AI is not going away, and lawyers are going to have to get better at interacting with this technology to streamline their work while not introducing new types of errors. Understanding the problem as fundamentally coming down to the way in which human-computer interaction can subtly influence human behavior is key to mitigating these risks.
Thank you for the insightful piece—it really made me reflect. It brought to mind a scenario: Counsel relies on AI to generate case law and submits it as part of testimony. Then, the judge uses the same AI to interpret the results and finds them to be inaccurate. You're absolutely right—AI, at this stage, is still a language model, not true general intelligence.
Ultimately, I believe that whatever you submit is your work—accurate or not. I’d be extremely frustrated if my attorney relied on AI-generated content without proper review, only for it to lead to an unfavorable ruling. There’s no question AI has tremendous potential, but using it without verifying or understanding the output is a significant risk.
I work in business valuation and have to be very cautious myself. I primarily use AI to streamline redundancies, but when it comes to core analysis and conclusions, the responsibility—and liability—remains entirely with me.
Thanks for the article. Very interesting and also concerning.
I just read the paper discussing verification drift; it seems the author is not generalising verification drift to any piece of technology, but only to GenAI.
My understanding of verification drift is this: the emphasis is on users who are aware of GenAI’s limitations (hallucination) and understand the need to verify the outputs. Despite this, given GenAI’s authoritative tone, users find the output convincing and drift away from verifying it, hence “verification drift.”
In another piece by the same author, I read that he believes various factors contribute to verification drift. He says those who use GenAI don’t use it just once a day; they use it frequently. The burden of verifying the outputs every time, given that they sound credible, means that despite knowing they should verify, users sometimes decide not to do so.
He also elaborates in his study that the burden of verification is considerable, as evidenced by his experiment, where he sometimes had to spend a few hours verifying AI-generated content.