
AI Detection False Positive Research

Peer-reviewed studies documenting how often AI detection tools incorrectly flag human-written text. Updated as new research is published.

Key statistics

Last updated: March 2026

All statistics from peer-reviewed research. Each entry includes source citation and context.

11.8%

False positive rate for non-native English speakers

Measured against GPTZero and Turnitin AI detection on human-written academic essays

Liang et al., 2023 · Among students writing in a second language

6.8%

False positive rate across all tested AI detectors

Study tested 6 AI detectors against known human-written text from diverse writing styles and academic levels

Weber-Wulff et al., 2023 · Range: 1% to 32% across individual detectors

32%

Maximum false positive rate observed in any individual detector

Worst-performing detector tested against human-written academic essays in formal style

Weber-Wulff et al., 2023 · Formal, edited academic prose is highest-risk

3–4×

Higher false positive rate for non-native speakers vs. native speakers

Consistent finding across multiple studies: writing style, not intent, drives false positive risk

Multiple studies, 2023–2024 · ESL writers face disproportionate risk

False positive probability calculator

Estimate your false positive risk based on your writing context and the tool used. Based on the studies above.

Calculator coming soon — we are collating study data to ensure accurate inputs. For immediate help, see the accusation guide.
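In the meantime, the arithmetic behind such an estimate is straightforward. A minimal sketch, assuming independent checks and using the per-check rates cited above (the 6.8% average and the roughly 3× non-native multiplier) purely as illustrative inputs, not a validated model:

```python
def false_positive_risk(base_rate: float, n_checks: int, multiplier: float = 1.0) -> float:
    """Probability of at least one false flag across n independent detector checks.

    base_rate:  per-check false positive rate of the detector (e.g. 0.068)
    n_checks:   number of submissions run through the detector
    multiplier: writing-context adjustment (e.g. ~3.0 for non-native speakers,
                per the studies cited above)
    """
    # Per-check probability, capped at 1.0 so the multiplier can't overshoot
    p = min(base_rate * multiplier, 1.0)
    # Complement rule: chance of never being flagged is (1 - p)^n
    return 1.0 - (1.0 - p) ** n_checks

# One essay, average detector: the per-check rate itself, ~6.8%
print(false_positive_risk(0.068, 1))

# Ten essays over a semester: risk compounds well past 50%
print(false_positive_risk(0.068, 10))

# Ten essays, non-native speaker (3x multiplier): higher still
print(false_positive_risk(0.068, 10, multiplier=3.0))
```

The key takeaway from the compounding is that even a single-digit per-check rate becomes a coin flip or worse over a semester of submissions, which is why per-check accuracy figures understate the real-world risk.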

Source studies

Peer-reviewed and preprint studies on AI detection accuracy and false positive rates.

GPT detectors are biased against non-native English writers

Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou · arXiv / PNAS · 2023

Key finding: GPT detectors flag non-native English writing at significantly higher rates than native English writing, even when all text is human-authored.

Testing of Detection Tools for AI-Generated Text

Debora Weber-Wulff et al. · International Journal of Educational Technology in Higher Education · 2023

Key finding: No tested detector was consistently reliable. False positive rates ranged from 1% to 32%. Detectors frequently missed AI text while flagging human text.

Can AI-Generated Text be Reliably Detected?

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi · arXiv · 2023

Key finding: AI-generated text detectors are fundamentally limited. Simple paraphrasing attacks reduce detection rates to near-random while preserving text quality.

Know of a relevant study not listed here? Send it to us.

Don't rely on a detector to prove your work

Scripli records your writing session and issues a Human Authenticity Certificate — proof that exists before any dispute begins.