Disable caching in open-uri

307 Views Asked by At

I have to, sadly, poll an endpoint and update another system when the data changes. I wrote a loop (with a sleep statement so I don’t DOS the server):

require 'nokogiri'
require 'open-uri'

desired_data = 'foo'
data = nil
url = nil

while data != desired_data do
  sleep(2)
  url = "https://elections.wi.gov/index.php/elections-voting/statistics"
  doc = Nokogiri::HTML.parse(open(url))

  puts doc

  # do some nokogiri stuff to extract the information I want.

  # store information to `data` variable.
end

# if control is here it means the data changed

This works fine except when the server updates, open(url) still returns the old content (even if I restart the script).

It seems like there may be some caching at play. How do I disable it?

Here are the HTTP headers returned:

HTTP/2 200
date: Fri, 02 Oct 2020 14:00:44 GMT
content-type: text/html; charset=UTF-8
set-cookie: __cfduid=dd8fca84d468814dd199dfc08d45c98831601647244; expires=Sun, 01-Nov-20 14:00:44 GMT; path=/; domain=.elections.wi.gov; HttpOnly; SameSite=Lax; Secure
x-powered-by: PHP/7.2.24
cache-control: max-age=3600, public
x-drupal-dynamic-cache: MISS
link: <https://elections.wi.gov/index.php/elections-voting/statistics>; rel="canonical"
x-ua-compatible: IE=edge
content-language: en
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
expires: Sun, 19 Nov 1978 05:00:00 GMT
last-modified: Fri, 02 Oct 2020 12:47:38 GMT
vary: Cookie
x-generator: Drupal 8 (https://www.drupal.org)
x-drupal-cache: HIT
x-speed-cache: HIT
x-speed-cache-key: /index.php/elections-voting/statistics
x-nocache: Cache
x-this-proto: https
x-server-name: elections.wi.gov
access-control-allow-origin: *
x-xss-protection: 1; mode=block
cache-control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
cf-cache-status: DYNAMIC
cf-request-id: 058b368b9f00002ff234177200000001
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 5dbef38c3b6a2ff2-ORD```

If it matters, I’m using Ruby 2.7 on macOS Big Sur. 
1

There are 1 best solutions below

0
On

It might be a problem on the Drupal 8 website itself as it has its own cache manager - and it seems like there's a cache per user somewhere if you have new content using a web browser.

It is easy to see which cache contexts a certain page varies by and which cache tags it is invalidated by: one must only look at the X-Drupal-Cache-Contexts and X-Drupal-Cache-Tags headers!

But those headers are not available in your list. If you're in touch with the website's developers ask them to do the following:

You can debug cacheable responses (responses that implement this interface, which may be cached by Page Cache or Dynamic Page Cache) by setting the http.response.debug_cacheability_headers container parameter to true, in your services.yml. Followed by a container rebuild, which is necessary when changing a container parameter.

That will cause Drupal to send X-Drupal-Cache-Tags, X-Drupal-Cache-Contexts headers.