I have a WordPress site, hosted on WPEngine, that serves as a CMS for our website through an endpoint.
On the WordPress site, I have installed the Yoast SEO plugin and have edited the robots.txt file to the following:
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: https://[my-site].wpengine.com/sitemap_index.xml
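As a sanity check that the directives themselves do what I intend, here is a minimal sketch using Python's standard-library robots.txt parser (the test paths are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# The rules I intend to serve.
rules = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot should be blocked from /nogooglebot/ but allowed elsewhere;
# every other crawler should be allowed everywhere.
print(parser.can_fetch("Googlebot", "/nogooglebot/page"))  # False
print(parser.can_fetch("Googlebot", "/"))                  # True
print(parser.can_fetch("SomeOtherBot", "/nogooglebot/"))   # True
```

The rules parse and behave as expected, so the file contents are not the problem.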
However, when I navigate to [my-site].com/robots.txt (in an incognito window, or via a robots.txt tester), it shows up as:
User-agent: *
Disallow: /
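Here is roughly how I'm checking what is actually served (example.com is a placeholder for [my-site].com); the response headers can hint at whether a cache or CDN layer is answering rather than the origin:

```python
import urllib.request

# Request the live robots.txt, asking intermediaries not to serve from cache.
req = urllib.request.Request(
    "https://example.com/robots.txt",
    headers={"Cache-Control": "no-cache"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
    # Headers like Via, X-Cache, and Age typically indicate a CDN/cache hop.
    for name in ("server", "via", "x-cache", "age"):
        print(name, "=", resp.headers.get(name))
    print(resp.read().decode("utf-8"))
```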
To verify that the changes from the Yoast plugin were being applied, I accessed the WordPress files via SFTP; the robots.txt in the root directory matches the one Yoast presents on the dashboard:
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: https://[my-site].wpengine.com/sitemap_index.xml
I've cleared the WP cache and my local cache multiple times, but the live robots.txt is still incorrect.
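To rule out caching, I've also been comparing what the custom domain and the .wpengine.com domain each serve. A quick sketch, with example.com / example.wpengine.com standing in for [my-site].com / [my-site].wpengine.com:

```python
import urllib.request

def fetch_robots(host: str) -> str:
    """Fetch robots.txt from the given host, bypassing intermediate caches."""
    req = urllib.request.Request(
        f"https://{host}/robots.txt",
        headers={"Cache-Control": "no-cache"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

custom = fetch_robots("example.com")
platform = fetch_robots("example.wpengine.com")
print("Same file served on both hosts?", custom == platform)
print("--- example.com ---", custom, sep="\n")
print("--- example.wpengine.com ---", platform, sep="\n")
```

If the two differ, something between the file on disk and the live domain is substituting its own robots.txt.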
Why would there be a difference here? How can I force the WP environment to use the robots.txt I've edited instead of the one it's providing?
I've just run into this myself; it's due to the following.
As documented here: https://wpengine.com/support/read-use-robots-txt/
WP Engine blocks crawlers on .wpengine.com domains by default, which overrides the robots.txt file on disk. In my case, I had implemented a CloudFront CDN with the .wpengine.com domain as the origin, so requests to the live site were being answered with WP Engine's restrictive platform-level robots.txt. I talked to WPEngine support and they changed a setting to allow crawling on that domain.
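After support flips that setting, a quick way to confirm the fix is to point Python's robots.txt parser at the live URL (example.com is a placeholder for the live domain):

```python
from urllib.robotparser import RobotFileParser

# Read and parse the robots.txt actually served on the live domain.
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

# Googlebot should no longer be blocked site-wide, only from /nogooglebot/.
print(parser.can_fetch("Googlebot", "https://example.com/"))             # expect True
print(parser.can_fetch("Googlebot", "https://example.com/nogooglebot/")) # expect False
```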