How to resolve 403 errors using nitpick_ignore_regex in Sphinx with linkcheck?


I am using make linkcheck in Sphinx to find broken or otherwise problematic links in my documentation (.rst files). I have fixed every link except two Intel links, which keep failing with the following 403 errors:

(CustomizingTheWorkflow/ConfigWorkflow: line 1109) broken    https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-10/thread-affinity-interface.html - 403 Client Error: Forbidden for url: https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-10/thread-affinity-interface.html
(BuildingRunningTesting/ContainerQuickstart: line   26) broken    https://www.intel.com/content/www/us/en/developer/tools/oneapi/hpc-toolkit-download.html - 403 Client Error: Forbidden for url: https://www.intel.com/content/www/us/en/developer/tools/oneapi/hpc-toolkit-download.html

However, both links work fine in the browser, and a "curl -I https://link..." command from the Terminal returns a 200 status code.

I tried to use nitpick_ignore_regex to ignore the links, but I think I must be doing something wrong because the error messages still come up. At the moment, I have the following in my conf.py file:

nitpick_ignore_regex = [r'https://www\.intel\.com/content/www/us/en/docs/cpp\-compiler/developer\-guide\-reference/2021\-10/thread\-affinity\-interface\.html',
                        r'https://www\.intel\.com/content/www/us/en/developer/tools/oneapi/hpc\-toolkit\-download\.html',
                       ]

I have also tried using tuples in the form (type, target) like in the Sphinx documentation, but I'm not sure what "type" or domain is appropriate, and r'.*:.*' has not yielded any success. I assume target is the link, but perhaps my regular expression is wrong. The only other thing I could think of is a user-agent issue. I've tried these two (one at a time) to no avail:

user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"

Can anyone suggest a solution? Is there a better way to troubleshoot the appropriate user-agent string or more detailed information on how to determine the "type"/domain other than guess-and-check?

There is 1 best solution below

nitpick_ignore_regex does not work here because it only suppresses warnings about unresolved internal cross-references (in "nitpicky mode"); it has no effect on the linkcheck builder.
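
For reference, here is a rough sketch of what nitpick_ignore_regex is meant for. The "type" is the domain:role pair that Sphinx prints in a nitpicky warning (for example "py:class reference target not found: ..."), and the target is the unresolved reference name, not a URL. The numpy entry and the is_nitpick_ignored helper are made up for illustration; the helper only mimics the whole-string matching described in the Sphinx documentation.

```python
import re

# Hypothetical entry: silence nitpicky warnings for unresolved Python
# class references whose target starts with "numpy.".
nitpick_ignore_regex = [
    (r"py:class", r"numpy\..*"),
]


def is_nitpick_ignored(ref_type, target, entries=nitpick_ignore_regex):
    """Illustrative check: both regexes must match the whole string."""
    return any(
        re.fullmatch(type_pat, ref_type) and re.fullmatch(target_pat, target)
        for type_pat, target_pat in entries
    )
```

An external hyperlink never produces this kind of warning, so no (type, target) pair will affect linkcheck output.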

You should use linkcheck_ignore instead.

Here is an example that makes linkcheck ignore your problematic links:

linkcheck_ignore = ['.*thread-affinity', '.*hpc-toolkit-download']
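
If you would rather not skip every URL that happens to contain those words, a stricter variant is to spell out the two URLs in full. This is a sketch that assumes linkcheck applies each pattern with re.match (anchored at the start of the URL), which is my understanding of the builder. The is_ignored helper is hypothetical and only there so you can test patterns in a Python shell; conf.py does not need it.

```python
import re

# Entries are regular expressions, so literal dots are escaped.
# Hyphens need no escaping outside a character class.
linkcheck_ignore = [
    r"https://www\.intel\.com/content/www/us/en/docs/cpp-compiler/"
    r"developer-guide-reference/2021-10/thread-affinity-interface\.html",
    r"https://www\.intel\.com/content/www/us/en/developer/tools/oneapi/"
    r"hpc-toolkit-download\.html",
]


def is_ignored(url, patterns=linkcheck_ignore):
    """Illustrative check: does any pattern match from the start of url?"""
    return any(re.match(pattern, url) for pattern in patterns)
```

Calling is_ignored() with one of the Intel URLs from the error output should return True, while any other URL in the docs is still checked.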