OpenAI develops watermarking system to detect ChatGPT text

OpenAI has built a watermarking system designed to detect text generated by ChatGPT, according to a report by The Wall Street Journal. The system has reportedly been ready for about a year and is intended to help teachers identify AI-generated assignments, which could create problems for students who rely on such tools for their work.

The watermarking system works by subtly altering how ChatGPT chooses each next word, so that the resulting text carries a statistical pattern a detector can pick up. The pattern can still be defeated, for example by rewording the text with another model, so the system is not foolproof.
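The idea of biasing word choice toward a detectable pattern can be illustrated with a toy sketch. This is not OpenAI's method, whose details have not been published; it is a simplified "green list" scheme of the kind described in public research. The vocabulary, the uniform-sampling stand-in for a language model, and the function names `green_set`, `generate`, and `green_fraction` are all assumptions made for illustration.

```python
import hashlib
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = [f"w{i}" for i in range(50)]


def green_set(prev_token, fraction=0.5):
    """Pick a pseudo-random 'green' subset of the vocabulary, seeded by the previous token."""
    digest = hashlib.sha256(prev_token.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))


def generate(n_tokens, watermark, seed=0):
    """Stand-in for a language model: samples tokens uniformly at random.

    With watermark=True, each token is drawn only from the green set
    determined by the token before it (a deliberately blunt simplification;
    published schemes only nudge probabilities toward green tokens).
    """
    rng = random.Random(seed)
    tokens = ["w0"]
    for _ in range(n_tokens):
        candidates = sorted(green_set(tokens[-1])) if watermark else VOCAB
        tokens.append(rng.choice(candidates))
    return tokens


def green_fraction(tokens):
    """Detector: share of tokens that fall in the green set of their predecessor."""
    hits = sum(tok in green_set(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)


if __name__ == "__main__":
    print(green_fraction(generate(200, watermark=True)))   # 1.0 by construction
    print(green_fraction(generate(200, watermark=False)))  # close to one half
```

In this sketch, unwatermarked text lands in the green set about half the time by chance, while watermarked text does so every time, which is what lets a detector tell them apart. Replacing tokens after generation, as a paraphrasing tool would, pulls the fraction back toward chance.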

Despite the tool's reported 99.9% accuracy and its resistance to localized tampering, OpenAI is still debating internally whether to release it. One major concern is that it could drive users away: about 30% of users surveyed said they would use ChatGPT less if watermarking were implemented.

Even under ideal conditions, the watermark could be removed by rewording the AI-generated text with a third-party tool. OpenAI acknowledges that its watermarking strategy, while effective in many cases, has limitations and may not always be the best solution.

To increase reliability, OpenAI is exploring the use of cryptographically signed metadata. This method, although not yet widely available, is also being developed by other companies such as Google.
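As a rough illustration of what signed metadata means, the sketch below attaches a signature to a small record describing a piece of text and later checks it. It makes several assumptions that are not from the report: it uses an HMAC with a shared secret from Python's standard library as a stand-in for the public-key signatures a real provenance system would more likely use, and the key, the record layout, and the names `sign_text` and `verify_text` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical provider key, for demonstration only.
SECRET_KEY = b"demo-key-not-for-production"


def sign_text(text, model):
    """Build a metadata record for the text and sign it with the provider key."""
    metadata = {
        "model": model,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"metadata": metadata, "signature": signature}


def verify_text(text, record):
    """Return True only if the signature is valid and the text matches the signed hash."""
    payload = json.dumps(record["metadata"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, record["signature"])
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return signature_ok and record["metadata"]["sha256"] == text_hash


if __name__ == "__main__":
    record = sign_text("An essay written by a model.", "demo-model")
    print(verify_text("An essay written by a model.", record))  # True
    print(verify_text("An essay edited by a student.", record))  # False
```

Unlike a statistical watermark, a check like this either passes or fails outright. The trade-off is that the record travels alongside the text, so copying the text without its metadata leaves nothing to verify.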

The Wall Street Journal’s report indicates that OpenAI’s technology is essentially ready to go and could detect AI-written text with 99.9% certainty. The article notes that internal debates about the technology have been ongoing for two years.

In response to the report, OpenAI confirmed in a blog post that it has been researching watermarking technology internally. The company stated that while the system has been highly accurate and effective against localized tampering, such as paraphrasing, it is less effective with text that has been translated or reworded using an outside model.

The watermark is also vulnerable to simple workarounds, such as adding junk characters to the text and then deleting them, which OpenAI says makes it easy for bad actors to circumvent.