Good afternoon,
In the past I have written web crawlers: programs that visit web sites automatically. Mine were well-behaved ones that obey a file called robots.txt; many bad ones don't.
The CycleChat robots.txt is:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Allow: /
User-agent: BoardReader
Disallow: /
User-agent: Mediapartners-Google
Disallow:
Sitemap: https://www.cyclechat.net/sitemap.php
The lines
User-agent: *
and
Allow: /
are a public statement that any automated tool, such as a search engine, is actively given permission to read URLs such as
https://www.cyclechat.net/threads/privacy-on-this-site.249843/
(the URL of your post), unless the tool identifies itself as one of the more specific
User-agent: .........
Disallow:
pairs, which carry their own rules.
I understand that you may be surprised, but permission to read the site with a tool such as a search engine has been explicitly given by the site owner.
This is not a legal grey area: someone has created a file that says "yes, it is okay for a robot to read these pages" :-)
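For anyone who wants to check this for themselves, here is a minimal sketch using Python's standard-library robots.txt parser, fed the wildcard block quoted above. The crawler name "SomeCrawler" is just a made-up example; any name that isn't listed separately falls under User-agent: *.

from urllib.robotparser import RobotFileParser

# The wildcard rules quoted above, fed straight to the standard parser.
rules = """\
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# "SomeCrawler" is a made-up name, so only the wildcard block applies to it.
print(rp.can_fetch("SomeCrawler",
                   "https://www.cyclechat.net/threads/privacy-on-this-site.249843/"))
# -> True: no Disallow prefix matches the thread URL, so Allow: / applies
#    and a polite crawler may fetch the page.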
Bye
Ian
Edit: It is possible that there is an error in the file, and that
Disallow: /posts/
was intended to say
Disallow: /threads/
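A quick way to see the difference that would make, again using Python's standard parser with just the relevant rules (the /posts/ URL below is made up purely for illustration):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /posts/
Allow: /
""".splitlines())

# With the file as published, /posts/ URLs are blocked but /threads/ URLs are not.
print(rp.can_fetch("SomeCrawler", "https://www.cyclechat.net/posts/123456/"))
# -> False (made-up URL, shown only to illustrate the rule)
print(rp.can_fetch("SomeCrawler",
                   "https://www.cyclechat.net/threads/privacy-on-this-site.249843/"))
# -> True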