How to keep major AIs from learning your content



Disclosure under Japan's Act against Unjustifiable Premiums and Misleading Representations: the content on this site may include product promotions.

Lately I have been using generative AI from time to time myself. If you don't phrase your questions carefully, you get answers that make you go "Huh?", so my current impression is that you can't take its output at face value. Still, if these models keep learning correct answers, a Terminator-like world may eventually arrive.

This time, the theme is how to keep this useful AI from learning from your own content.

When I looked into it, the major AI bots as of this writing appear to be those of OpenAI's "ChatGPT", Google's "Gemini (formerly Bard)", and "CCBot" from the non-profit organization Common Crawl. This article shows how to block these AI crawlers from collecting information from a WordPress site.

If you installed WordPress and haven't done anything special, there is no physical robots.txt file anywhere on your server; WordPress generates it dynamically whenever it is requested.

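Specifically, as shown in the image below, open "Settings" → "Reading" (表示設定 in the Japanese admin screen) and add the rules to the "Additional rules for robots.txt" field at the bottom; they are reflected as soon as you save. In my environment this works without installing a plugin or writing any special code, so in principle there is no need to place a robots.txt file on your site yourself. (A stock WordPress installation does not have this field, so it may be provided by your theme; if you don't see it, check your theme's settings.)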

By the way, clicking the "robots.txt" link enclosed in the red frame shows the contents of the robots.txt that WordPress is currently outputting dynamically.

On the other hand, if a physical robots.txt file already exists in your site's root directory, the web server returns that file directly and the rules entered in WordPress will not be reflected. Be sure to check for one, and if it exists, add the rules to that file instead.

WordPress "Reading" settings screen (Additional rules for robots.txt)
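Besides clicking the link, you can check what your site actually serves from a script. The following is a minimal sketch using only Python's standard library; `https://example.com/` is a placeholder for your own site, and `robots_url`, `fetch_robots`, and `missing_agents` are hypothetical helper names for illustration. It only checks that a `User-agent:` line exists for each AI bot, not that the `Disallow` rules under it are correct. If agents are reported as missing even though you added them in WordPress, one possible cause is a physical robots.txt in the root directory overriding the generated one.

```python
import urllib.parse
import urllib.request

# The AI user agents blocked in this article (edit to match your own rules).
AI_AGENTS = ["GPTBot", "ChatGPT-User", "Google-Extended", "CCBot"]


def robots_url(site: str) -> str:
    """robots.txt is always served from the root of the host."""
    return urllib.parse.urljoin(site, "/robots.txt")


def fetch_robots(site: str, timeout: float = 10.0) -> str:
    """Download the robots.txt that the site actually serves."""
    with urllib.request.urlopen(robots_url(site), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def missing_agents(robots_text: str, agents: list[str]) -> list[str]:
    """Return the agents that have no 'User-agent:' line in robots_text."""
    found = set()
    for line in robots_text.splitlines():
        # Strip '#' comments, then look for 'User-agent: <name>' lines.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            found.add(value.strip().lower())
    return [a for a in agents if a.lower() not in found]
```

For example, `missing_agents(fetch_robots("https://example.com/"), AI_AGENTS)` returns an empty list when all four user agents appear in the served file.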

Below are the rules that deny each AI crawler. Add or remove entries as needed and the change is reflected immediately (be sure to click the robots.txt link to confirm the final output).

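# OpenAI: crawler that collects data for model training
User-agent: GPTBot
Disallow: /

# OpenAI: fetches pages when a ChatGPT user asks it to
User-agent: ChatGPT-User
Disallow: /

# Google: token for AI use such as Gemini (formerly Bard); not a separate crawler
User-agent: Google-Extended
Disallow: /

# Common Crawl
User-agent: CCBot
Disallow: /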
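To confirm that these rules mean what you intend, you can feed them to Python's standard `urllib.robotparser` module as a local sanity check. This is a sketch under that parser's interpretation only; each AI company's crawler uses its own parser. The `*` group with `Disallow: /wp-admin/` is an assumed example of what WordPress already outputs, the page URL is a placeholder, and `check` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# The rules from this article, plus an assumed existing group for all crawlers.
RULES = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""


def check(rules: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether it may fetch url under rules.

    Pass product tokens such as "GPTBot", not full User-Agent headers:
    this parser only looks at the text before the first '/'.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["GPTBot", "ChatGPT-User", "Google-Extended", "CCBot", "Googlebot"]
PAGE = "https://example.com/2024/04/sample-post/"  # placeholder URL
```

With these rules, `check(RULES, AGENTS, PAGE)` reports `False` for the four AI agents and `True` for Googlebot: Googlebot matches none of the four named groups, so it falls back to the `*` group, which only blocks `/wp-admin/`.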

Google states that blocking its AI token (Google-Extended) does not affect search rankings, but there is no way for us to know what really happens behind the scenes, so please add these rules at your own risk.

I added the rules above to this site on April 19, 2024, and I am currently monitoring the results. That said, the content here isn't especially informative, so there may be little point in blocking it (lol). Still, it's always worth trying things out…


This page is an English translation of the Japanese page. Because the Japanese page is maintained as the primary version and not all content matches, the information there may be more accurate.
You can view the Japanese page by clicking the "View Japanese Page" button at the top right of the page, so please take a look.

If you found the content or the reference code on this page helpful, please leave a comment in the comment section at the end of the page or share it on social media.

The methods and code on this page have been confirmed to work in my environment as of the last update date. Please note that, in principle, I will not respond to reports of malfunctions or requests for detailed customization in the comments section.

Thank you for reading to the end.

The copyright and ownership of all content published on this page, including text, images, and code, belong to this site, and reproduction is strictly prohibited.