Here's a simple Scrapy spider that anyone can use for testing:
import json

import scrapy
from scrapy.utils.response import open_in_browser


class TestSpider(scrapy.Spider):
    name = "test-spider"
    allowed_domains = ["shopee.ph"]
    # Browser cookies as a JSON string (the request needs them).
    shopee_cookies = (
        '[{"name": "csrftoken", "value": "RvxBdTixvBfdTR3xfQwbcYippqz8jEbF", "domain": "shopee.ph", "path": "/", "expires": -1, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "_gcl_au", "value": "1.1.1251411089.1692464842", "domain": ".shopee.ph", "path": "/", "expires": 1700240842, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "SPC_SI", "value": "sTLbZAAAAABwY1ZrR1NNU+WdNgAAAAAAdzlCYXIyVVQ=", "domain": ".shopee.ph", "path": "/", "expires": 1692551246.336331, "httpOnly": true, "secure": true, "sameSite": "Lax"}, {"name": "_fbp", "value": "fb.1.1692464842990.689078803", "domain": ".shopee.ph", "path": "/", "expires": 1700240846, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "SPC_R_T_IV", "value": "NnVEbThnRjREMnNMZVpGVQ==", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.336348, "httpOnly": false, "secure": true, "sameSite": "Lax"}, {"name": "SPC_T_ID", "value": "fn/OKngQO3doGdfFGyo/6mzLiviELHkKEbWM9J+x/ezTl/baT96grQer6ILrYX9tj3Kqs71Jg+hCimaK/XauidJXrd6HdPd2Smbxbu/fEStjOJi5g9/ucMmbBwuyh5M6H3TOGdpUop/9Q/zdpNj6MyxZaODnNsT5XprfsQxjB5g=", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.336355, "httpOnly": true, "secure": true, "sameSite": "Lax"}, {"name": "SPC_T_IV", "value": "NnVEbThnRjREMnNMZVpGVQ==", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.336362, "httpOnly": true, "secure": true, "sameSite": "Lax"}, {"name": "SPC_F", "value": "jiOtuCSNUaap3U4BHHfzhDihWwFht32f", "domain": ".shopee.ph", "path": "/", "expires": 1727024843.162052, "httpOnly": false, "secure": true, "sameSite": "Lax"}, {"name": "REC_T_ID", "value": "dc8a2570-3eb2-11ee-ac9b-2cea7fce6c95", "domain": ".shopee.ph", "path": "/", "expires": 1727024843.16206, "httpOnly": true, "secure": true, "sameSite": "Lax"}, {"name": "SPC_R_T_ID", "value": '
        '"fn/OKngQO3doGdfFGyo/6mzLiviELHkKEbWM9J+x/ezTl/baT96grQer6ILrYX9tj3Kqs71Jg+hCimaK/XauidJXrd6HdPd2Smbxbu/fEStjOJi5g9/ucMmbBwuyh5M6H3TOGdpUop/9Q/zdpNj6MyxZaODnNsT5XprfsQxjB5g=", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.33634, "httpOnly": false, "secure": true, "sameSite": "Lax"}, {"name": "_QPWSDCXHZQA", "value": "4a585493-a7a0-4f0e-d696-687295d3a4c3", "domain": "shopee.ph", "path": "/", "expires": 1692496379, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "IDE", "value": "AHWqTUm1b5ZflCqDTn6cpHDjyoeqH6iLfXcCOOm4YNaP8CHTsAZ7F_Daq4-zO-bsGIk", "domain": ".doubleclick.net", "path": "/", "expires": 1727024843.787698, "httpOnly": true, "secure": true, "sameSite": "None"}, {"name": "AMP_TOKEN", "value": "%24NOT_FOUND", "domain": ".shopee.ph", "path": "/", "expires": 1692468444, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "_ga", "value": "GA1.2.833255521.1692464843", "domain": ".shopee.ph", "path": "/", "expires": 1727024844.498551, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "_gid", "value": "GA1.2.1347861977.1692464844", "domain": ".shopee.ph", "path": "/", "expires": 1692551244, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "_dc_gtm_UA-61918643-6", "value": "1", "domain": ".shopee.ph", "path": "/", "expires": 1692464904, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "shopee_webUnique_ccd", "value": "raj%2F3ukNopIWTrFjVLQeGA%3D%3D%7C1%2BjiV3ga9OlzuAELTZtedUY5BlP1ZNVH5ybZJx2D4KNA9dGTvtFakjnNZvR64zKNG6yBDfEXdabTE%2FRKow%3D%3D%7CsWIQ7u7pR4F3BD7E%7C08%7C3", "domain": "shopee.ph", "path": "/", "expires": 1692496381, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "ds", "value": "065598fda3b7cca4e5e241e446a075e9", "domain": "shopee.ph", "path": "/", "expires": 1692496381, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "SPC_EC", "value": '
        '"RTJYa2Q5WEV4UDNnN3VGWr68rFv1FRJEeVkpwAzlu09WhtwSxFE1cZlwpQYRhhR56REixPuKfekz6oioE4EaDK12bvALil+QZ5B0EfG42psIFWNDe1moiErTZndyu1502KUlh5+OQoUWCvm1XkVY+2Iy7Jk5qyPI2J655JeZwv0=", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.336291, "httpOnly": true, "secure": true, "sameSite": "Lax"}, {"name": "SPC_ST", "value": ".ek1DVmo5aGJjaVBxcklYU5o4/3v/8ndPeV2/fwtzWYUh1kWOopWvn7SFoQXWuS37Rs+J+Ym7U8OwOG73JbiFRWyOOo1GhKBgwhUeeWfE+q9XPDZXACC33t7qphoBu5hyWvR/G+WkpSUbIkmGPzprCIvhw7Qwyt8UFxk/4bA+47QQQUiDcPfHIq/sJqmVMEqH3Al6nCTDeEh/JCDLALRvNQ==", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.336324, "httpOnly": true, "secure": true, "sameSite": "Lax"}, {"name": "SPC_CLIENTID", "value": "amlPdHVDU05VYWFwgvlavxoisbqjmacw", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.336374, "httpOnly": false, "secure": false, "sameSite": "Lax"}, {"name": "_ga_CB0044GVTM", "value": "GS1.1.1692464843.1.0.1692464846.57.0.0", "domain": ".shopee.ph", "path": "/", "expires": 1727024846.367333, "httpOnly": false, "secure": false, "sameSite": "Lax"}]'
    )
    shopee_cookies = json.loads(shopee_cookies)

    def start_requests(self):
        yield scrapy.Request(
            "https://shopee.ph/api/v4/pdp/get_pc?shop_id=237078553&item_id=6929743700",
            cookies=self.shopee_cookies,
            headers={"x-api-source": "pc", "af-ac-enc-dat": "null"},
            callback=self.parse_item,
        )

    def parse_item(self, response):
        # Open the raw API response in the default browser for inspection.
        open_in_browser(response)
Feel free to test it out; I've included the cookies because the request needs them. This code actually worked before, around early August 2023. I had trouble getting it to work at first, but thanks to this answer I managed to get the product data. You can even see my comment there. Here's a screenshot I took at the time, proving that it did work around early August.
As you can see, the data is there and everything works, thanks to the headers {"x-api-source":"pc","af-ac-enc-dat":"null"}. However, as of August 20, 2023, as I am typing this, it no longer works. I'm not sure why, but I think something has changed in the API. I spent all day experimenting with the headers, with no luck. All I get now is this.
Output I am getting right now:
{"is_customized":false,"is_login":true,"platform":0,"action_type":2,"error":90309999,"tracking_id":"24d95bd5-40e5-44cd-b30b-885711481170","report_extra_info":""}
Here is the actual product page link I used for testing. You can see the API call there under "Inspect Element" -> "Network" tab. Note that the output I am getting now is the same one I got before I implemented this solution; I'm back where I started. So the question is: is there a way to make it work again? I suspect I'm not getting the headers right, but I don't know how to figure it out, which is why I'm asking for help. I'm out of ideas.
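To make this failure easier to spot than by eyeballing the page opened in the browser, the response body can be checked for the error field. This is only a minimal sketch: the helper name describe_shopee_error is made up for illustration, and it assumes that a non-empty "error" value in the JSON (as in the output above) always means the request was rejected.

```python
import json


def describe_shopee_error(body):
    """Return a short message if the body looks like an error payload, else None.

    Hypothetical helper: assumes a truthy "error" field signals a rejected request.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "response is not JSON"
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if error:
        return f"API error {error} (tracking_id={payload.get('tracking_id')})"
    return None
```

Inside parse_item, something like describe_shopee_error(response.text) could be logged before deciding whether to keep the item.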
It looks like Shopee's development team has recently ramped up their anti-crawler measures. It took me several weeks of reverse engineering to "open" their API.
If you want to forge a request to their API, you need to figure out how to generate the following HTTP headers:
- x-csrftoken is the same as the cookie "csrftoken".
- af-ac-enc-dat, af-ac-enc-sz-token, x-sap-access-f, x-sap-access-s and x-sap-access-t are provided by the Security SDK, found in a script like https://deo.shopeemobile.com/shopee/web-sdk/js/live/*.js. It is accessible via the global variable window.ssdk00oQOOooO00QoQO. For example, you can get the SDK version by calling a function on it. To understand which arguments are needed to obtain the various HTTP header values, put a breakpoint in that function.
- x-sap-ri and x-sap-sec are set by another script. You don't need to set them manually; just use the fetch function, as it has been monkey patched to include these headers automatically.
- x-sz-sdk-version is a concatenation of the Security SDK version (obtainable as described above) and the "Base SDK Util" version, which can be tricky to find directly. It's in the https://deo.shopeemobile.com/shopee/shopee-pcmall-live-sg/assets/*.js script that includes the createBaseSdkUtils function definition; the version is identifiable by the regex var version="([0-9.]+)". (Screenshot: Base SDK Util version)
Effectively scraping Shopee, especially at scale, demands web browser automation tools like Puppeteer, alongside strategies like fingerprint injection and residential proxies. If you prefer a ready-to-use solution, you can use my Apify scraper.
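Two of the simpler pieces described above can be sketched in Python: copying the "csrftoken" cookie into the x-csrftoken header, and pulling the Base SDK Util version out of already-downloaded script text with the regex mentioned. The function names here are hypothetical, the cookie list is assumed to have the same name/value shape as in the question, and this does not cover the Security SDK headers, which still have to come from the browser.

```python
import re

# Regex quoted in the answer for locating the Base SDK Util version.
VERSION_RE = re.compile(r'var version="([0-9.]+)"')


def csrf_header_from_cookies(cookies):
    """Build the x-csrftoken header from a list of cookie dicts (hypothetical helper)."""
    for cookie in cookies:
        if cookie.get("name") == "csrftoken":
            return {"x-csrftoken": cookie["value"]}
    return {}  # no csrftoken cookie present


def find_base_sdk_version(js_source):
    """Return the first version string matched in the script text, or None."""
    match = VERSION_RE.search(js_source)
    return match.group(1) if match else None
```

How the two version strings are joined into x-sz-sdk-version is not shown here, since the exact format should be confirmed against a real request in the browser's Network tab.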