With the launch of ChatGPT last year, new AI tools started appearing almost daily.
Soon after ChatGPT, Microsoft launched Bing AI, and Google followed with Bard.
These three tools then became the inspiration for the wave of AI tools we see across the industry right now.
All of these tools fall into the category of Generative AI: a machine learning technology in which a transformer model is trained on a set of data and then produces entirely new content (text, video, audio, and more) based on what it has learned from that data.
So whether it is ChatGPT, Bing AI, or Bard, each of these tools is Generative AI, and each has been trained on data collected from the Internet, which largely means data taken from websites.
Initially, users (or more specifically webmasters) were not really aware that these tools had used their websites' data for training. As awareness grew, they learned what had happened and began asking the AI companies for some means of disallowing these tools from accessing their websites.
Respecting that demand, back in April this year OpenAI released a user agent that webmasters can use in the robots.txt file to disallow ChatGPT's crawler from accessing their website.
Soon after the launch, around 242 of the 1,000 most popular websites had blocked OpenAI's web crawler from accessing their sites.
In July this year, Google also opened a public discussion around this issue to hear users' opinions. Users clearly want more control and choice over their data, which is why Google has now launched a user agent that lets webmasters block Bard and Vertex AI from accessing their websites.
Block Bard and Vertex AI using Google Extended
Google has launched a user agent called "Google-Extended". If a webmaster references it in the robots.txt file, it completely blocks Bard and Vertex AI from accessing the website's data.
The webmaster just has to copy and paste the two lines below into their website's robots.txt file.
User-agent: Google-Extended
Disallow: /
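
These two lines block Bard and Vertex AI from the entire site. If you only want to keep them away from certain sections, the usual robots.txt rules apply; the paths below are placeholders, so replace them with your own directories:

User-agent: Google-Extended
Disallow: /premium-content/
Disallow: /research/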
After launching Google-Extended, the company said it wants to give webmasters more control and choice over what they share with Google.
Google has also added Google-Extended to its published list of web crawlers.
The Real Problem
At the SEO level, what do you think the real problem is when so many such crawlers are accessing your website?
Server Resources
We all know that server resources are limited. When so many such crawlers hit your website, there comes a point where those resources get exhausted, and the crawlers you actually want on your site, the ones that help you market it, are left without the bandwidth they need.
So, although OpenAI and Google have each launched their own crawler user agents, we really need a universal method to block all AI tools except the ones we want to give access to.
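
Until such a universal method exists, the closest workaround is to list every AI crawler you know about individually in robots.txt. Here is a rough sketch assuming you want to block OpenAI's GPTBot, Google-Extended, and Common Crawl's CCBot while leaving regular search crawlers untouched; the list is not exhaustive, and new AI user agents keep appearing:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /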
In fact, blocking access via robots.txt is not the most effective way either. The directives we place in it are simply instructions to web bots about how, or which parts of, our website they should crawl; nothing guarantees that a bot will respect those instructions.
Google itself says that a noindex meta tag is better than robots.txt when we don't want a webpage or file to be indexed in search.
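
For reference, keeping a page out of the search index with that approach just means placing the following tag in the page's head section (non-HTML files such as PDFs can send the equivalent X-Robots-Tag HTTP header instead):

<meta name="robots" content="noindex">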
So there should be a similarly simple, page-level approach, like meta tags, for disallowing AI tools from accessing a website.
What do you think? Let me know in the comments down below.