Facebook crawler cURL error 28 (OPERATION_TIMEOUTED) on AWS CloudFront


I deployed my site behind AWS CloudFront and everything is served over HTTPS. With Postman the file downloads in 66 ms, but the Facebook debugger shows the following error:

Curl Timeout: The request to scrape the URL timed out.
cURL Error: cURL error: 28 (OPERATION_TIMEOUTED)

Using curl as described here: https://developers.facebook.com/docs/sharing/webmasters/crawler/ the result is:

curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "https://ubiqq.com/IngenierosCHILE/dia-de-la-ingenieria-2020"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 99.84.191.88...
* Connected to ubiqq.com (99.84.191.88) port 443 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 594 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: ubiqq.com (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=ubiqq.com
*        start date: Fri, 01 May 2020 00:00:00 GMT
*        expire date: Tue, 01 Jun 2021 12:00:00 GMT
*        issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
*        compression: NULL
* ALPN, server accepted to use http/1.1
> GET /IngenierosCHILE/Educacion-en-ingenieria-en-tiempos-de-pandemia HTTP/1.1
> Host: ubiqq.com
> User-Agent: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
> Accept: */*
> Accept-Encoding: deflate, gzip
> Range: bytes=0-524288
> Connection: close
>
< HTTP/1.1 200 OK
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: close
< Date: Sat, 23 May 2020 18:35:29 GMT
< x-amzn-RequestId: 8d6763e8-755d-4c25-8a03-dae704aebac1
< Access-Control-Allow-Origin: *
< x-amz-apigw-id: M_3yuF5pIAMF2HA=
< Cache-Control: max-age = 86300
< X-Amzn-Trace-Id: Root=1-5ec96cde-2d079cbf298be05bf81fc01e;Sampled=0
< Access-Control-Allow-Credentials: false
< Via: 1.1 2ad0cde89ab58d454177893ae4447f50.cloudfront.net (CloudFront), 1.1 9742923607374c982a5b7e9258144eab.cloudfront.net (CloudFront)
< X-Amz-Cf-Pop: IAD89-C1
< Content-Encoding: gzip
< Vary: Accept-Encoding
< X-Cache: Hit from cloudfront
< X-Amz-Cf-Pop: IAD89-C2
< X-Amz-Cf-Id: QKbU0J_IgXlTdcGG4lMV7KftU2Y3TsdC1UQi7azGXMhiaAzDp_WfLA==
< Age: 52
<
{ [16360 bytes data]
100  223k    0  223k    0     0  2336k      0 --:--:-- --:--:-- --:--:-- 2351k
* Closing connection 0

I don't know how to fix this. :/

EDIT:

I found the cause, and it's tricky.

My CloudFront origin is a server-side renderer (SSR) that takes more than 10 seconds to render a page. I created a script that makes the first call to each page: that first call takes 10+ seconds and the result is stored in CloudFront, so the following calls are served from the edge in less than 100 ms.
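For reference, the warming call is nothing special: a plain request sent with the crawler's user agent so that the fully rendered page gets cached at the edge. A minimal sketch of such a script (the page list is just an example):

#!/bin/sh
# Warm the CloudFront cache: request each page once with the Facebook
# crawler's user agent. The first request pays the 10s+ SSR cost at the
# origin; later requests to the same edge are served from cache.
for path in "/IngenierosCHILE/dia-de-la-ingenieria-2020" \
            "/IngenierosCHILE/Educacion-en-ingenieria-en-tiempos-de-pandemia"; do
  curl -sS -o /dev/null \
    -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" \
    "https://ubiqq.com${path}"
done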

The issue is that Facebook's crawler hits CloudFront at a different edge. That edge does not have the page in its cache, so instead of getting it from the edge where I made the warming call, it goes to the origin, and since the origin takes more than 10 seconds to respond, the crawler aborts: it only waits up to 10 seconds.
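You can reproduce the crawler's behavior by giving curl the same budget with --max-time (the ~10-second limit is my observation, not something Facebook documents): on a cache miss the slow origin blows the budget and curl exits with code 28, exactly the error the debugger reports, while an edge hit returns almost immediately.

# Emulate the crawler: give up after 10 seconds. A cache miss that has
# to wait on the 10s+ origin makes curl exit with code 28 (timeout).
curl -sS -o /dev/null --max-time 10 \
  -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" \
  -w "status: %{http_code}  total: %{time_total}s\n" \
  "https://ubiqq.com/IngenierosCHILE/dia-de-la-ingenieria-2020"
echo "curl exit code: $?"   # 28 means the operation timed out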

To solve this I must either make the SSR respond in less than 10 seconds, or warm every edge by figuring out which ones Facebook's crawler hits.
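For the second option, one sketch, under the assumption that edge IPs can be collected (e.g. by resolving the domain through resolvers in different regions): pin curl to each candidate IP with --resolve and check the X-Amz-Cf-Pop response header to see which POP actually answered. The IP below is the one from the trace above; other edge IPs would have to be substituted in.

# Warm a specific CloudFront edge by pinning the request to its IP.
# X-Amz-Cf-Pop in the response headers confirms which POP served it.
EDGE_IP="99.84.191.88"   # from the trace above; repeat with other edge IPs
curl -sS -o /dev/null -D - \
  --resolve "ubiqq.com:443:${EDGE_IP}" \
  -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" \
  "https://ubiqq.com/IngenierosCHILE/dia-de-la-ingenieria-2020" \
  | grep -i x-amz-cf-pop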
