@OwOarchist @Rhoeri Unlike many AI crawlers, search engines generally respect robots.txt and noindex tags, which tell them not to index or surface those pages in search results. This is how fediverse profiles that have opted out of search indexing do so.
You should still assume that anything you post in public with no auth required is public, of course.
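For anyone curious what "respecting" these signals looks like from the crawler's side, here is a rough sketch in Python. It assumes a made-up robots.txt that blocks profile pages under `/u/`, a made-up bot name `ExampleBot`, and the placeholder host `instance.example`; none of these come from a real instance. The helper names `allowed_by_robots` and `has_noindex` are also just illustrative.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of /u/ (profile pages).
ROBOTS_TXT = """\
User-agent: *
Disallow: /u/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Looks for <meta name="robots" content="... noindex ...">."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        directives = [d.strip().lower() for d in (attr.get("content") or "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    print(allowed_by_robots(ROBOTS_TXT, "ExampleBot", "https://instance.example/u/alice"))
    print(has_noindex('<head><meta name="robots" content="noindex"></head>'))
```

Both checks are voluntary on the crawler's part: nothing stops a client from skipping them, which is why the "assume it is public" advice still applies.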
Does robots.txt really work in the fediverse? At least on lemmy, the content can be retrieved on different hosts, all of which have different robots.txt files. Unless it is somehow “baked” into the protocol.
Why wouldn’t they? You don’t even have to be logged in to view them.
You should never assume anything you post publicly online is at all private or hidden from any search engine/AI.
Could you imagine someone legitimately looking some shit up and having trash from lemmy.ml be the result?
The world isn’t really ready for that level of misinformation.
As if the general level of misinformation online isn’t already several orders of magnitude worse than anything on lemmy.ml.
misinformation > smug and arrogant misinformation
I don’t know about that… smug and arrogant at least turns a lot of people off.
Regular misinformation flies under the radar.
🙄
Major search engines respect robots.txt, but as you said, each instance serves its own file and some instances allow crawlers, so this is not a scalable way to opt out.
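A small sketch of why this does not scale, assuming two made-up instances (`instance-a.example` and `instance-b.example`) that both serve a copy of the same federated post but publish different robots.txt files. The hosts, the file contents, the path, and the `hosts_allowing` helper are all hypothetical; no network requests are made.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents per instance, for illustration only.
ROBOTS_BY_HOST = {
    "instance-a.example": "User-agent: *\nDisallow: /\n",  # blocks all crawling
    "instance-b.example": "User-agent: *\nDisallow:\n",    # empty Disallow allows everything
}


def hosts_allowing(user_agent: str, path: str) -> list[str]:
    """Return the hosts whose robots.txt lets user_agent fetch the given path."""
    allowed = []
    for host, robots_txt in ROBOTS_BY_HOST.items():
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        if parser.can_fetch(user_agent, f"https://{host}{path}"):
            allowed.append(host)
    return allowed


if __name__ == "__main__":
    # The same federated post is blocked on one host and crawlable on the other.
    print(hosts_allowing("ExampleBot", "/post/123"))
```

So a poster's home instance can opt out, yet the content stays indexable through any other instance that federates it and permits crawlers.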