The scientific world is facing a new reality: the rise of AI-written papers. While large language models (LLMs) like ChatGPT offer powerful tools for researchers, their widespread use has raised concerns about the integrity and quality of scientific publications.
A recent study, published as a preprint on arXiv, suggests that the problem is more pervasive than previously thought. Researchers from the University of Tübingen in Germany and Northwestern University in America analyzed a massive dataset of English-language papers published on PubMed, a biomedical research search engine, between 2010 and 2024. Their analysis revealed that at least one in ten new scientific papers contains material generated by an LLM. This equates to over 100,000 papers per year, and the figure is even higher in some fields, such as computer science, where over 20% of research abstracts are estimated to contain LLM-generated text.
Spotting these AI-written sections is not straightforward. Traditional approaches, such as detection algorithms or lists of words known to be favored by LLMs, have limitations: they rely on “ground truth” data, separate corpora of human-written and machine-written text, which are hard to assemble because both language and the models themselves keep changing.
To address this challenge, the researchers developed a new method they call “excess vocabulary,” inspired by demographic work on excess deaths. Just as excess mortality compares observed deaths with the number expected from past trends, the method flags words that appear in scientific abstracts significantly more often than their frequency in earlier years’ literature would predict.
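The preprint spells out the statistics in full; what follows is only a minimal sketch of the underlying idea, comparing how often each word appears in abstracts from a target year against a baseline drawn from earlier years. The function names, the choice of baseline years, and the one-percentage-point threshold are illustrative assumptions, not the authors’ implementation (which projects expected frequencies from pre-ChatGPT years).

```python
# Minimal sketch of the excess-vocabulary idea; not the authors' code.
# `abstracts_by_year` maps a year to a list of abstract strings; the
# baseline years and threshold below are illustrative assumptions.
from collections import Counter

def word_frequencies(abstracts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = Counter()
    for text in abstracts:
        for word in set(text.lower().split()):
            counts[word] += 1
    total = len(abstracts)
    return {word: n / total for word, n in counts.items()}

def excess_vocabulary(abstracts_by_year, target_year=2024,
                      baseline_years=(2021, 2022), threshold=0.01):
    """Words used far more often in `target_year` than the baseline suggests."""
    observed = word_frequencies(abstracts_by_year[target_year])
    baselines = [word_frequencies(abstracts_by_year[y]) for y in baseline_years]
    excess = {}
    for word, freq in observed.items():
        # Simplification: take the highest baseline frequency as the "expected"
        # value, rather than the study's extrapolated counterfactual.
        expected = max(b.get(word, 0.0) for b in baselines)
        if freq - expected > threshold:  # abnormal jump in usage
            excess[word] = freq - expected
    return sorted(excess.items(), key=lambda kv: -kv[1])
```

Run over PubMed abstracts, a procedure along these lines would surface COVID-era terms for 2020 and style words such as “delves” for 2024, which is essentially the pattern the researchers report.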
Their analysis revealed a striking trend. In 2020, COVID-related words surged, reflecting the pandemic’s impact on scientific research. By early 2024, however, a different set of words began appearing with unusual frequency: words tied to writing style rather than subject matter, such as “delves,” “potential,” “intricate,” and “insights.” These words, the researchers suggest, are likely telltale signs of LLM assistance.
Their estimates indicate that at least 10% of scientific abstracts published on PubMed in 2024 may contain LLM-generated text. Prevalence varies across fields, with computer science showing the heaviest use and ecology the lightest. Geography matters too: scientists from Taiwan, South Korea, Indonesia, and China appear to use LLMs more often than those from Britain and New Zealand.
The excess-vocabulary method offers a valuable new lens on the extent of LLM use in scientific publishing, but it is not foolproof. Because it works at the level of the whole corpus, it cannot point to LLM-generated text in any particular abstract, and researchers can evade it simply by steering clear of the telltale words.
The growing use of AI in scientific writing raises critical questions. How can we ensure the integrity and quality of research when LLMs are increasingly used to generate text? How can we distinguish between genuine human contributions and machine-generated content? How can we address the ethical challenges associated with using AI for writing scientific papers?
As the scientific community grapples with these questions, it is clear that the future of scientific publishing is intertwined with the development and use of AI. Finding a balance between harnessing the power of LLMs for research and safeguarding the integrity of scientific communication will be a major challenge in the years to come.