IMPORTXML function (Google Sheets) not working for Skillshare website?


I have a strange problem using the IMPORTXML function in Google Sheets.

I'm trying to extract the social-link hrefs (Facebook, Twitter, YouTube, etc.) from profiles on two websites: udemy.com and skillshare.com.

It works for Udemy profiles, but I get nothing from Skillshare profiles.

After many attempts (even with unusual xpath_query combinations), I think the problem has one of two causes: 1. the website is blocking me (is this possible?), or 2. my xpath_query is wrong. Still, I suspect the solution is straightforward and I just can't see what I'm missing.

Please see my Google Sheet with a couple of samples for better understanding; you can find more details there.

Here are the examples. GOOD RESULTS: for Udemy, I used the following Google Sheets formula:

=IMPORTXML("https://www.udemy.com/user/saddam-kassim-2/"; "//div[@class='instructor-profile--social-links--3Kub5']/a/@href")

N/A RESULTS: for Skillshare, I used the following Google Sheets formula:

=IMPORTXML("https://www.skillshare.com/user/sridhar"; "//div[@class='user-information-social-links']/a/@href")

Here are the relevant HTML snippets: UDEMY SAMPLE HTML

SKILLSHARE SAMPLE HTML

I'd really appreciate your thoughts and ideas. What am I doing wrong? What else could I try? Thanks in advance!

BEST ANSWER

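The social links are inserted by a script after the page loads. IMPORTXML only parses the HTML the server returns and does not run JavaScript, so no XPath against the rendered markup will find them (you are not being blocked; the element simply isn't in the initial HTML). However, the same data is embedded in the third script tag of the initial document. Using the first Skillshare URL from the sheet (/user/profkarim), we can construct the following formula: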
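One way to check this diagnosis outside of Sheets is to download the raw HTML (no JavaScript is executed, which is roughly what IMPORTXML works with) and ask two questions: does the social-links div exist at all, and which script tags mention socialLinks? The sketch below is a hypothetical helper built only on Python's standard html.parser and urllib; the helper names and the User-Agent string are assumptions. The script positions it reports may not line up exactly with the rows IMPORTXML returns for "//script", so treat them as a hint rather than a guarantee.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ScriptCollector(HTMLParser):
    """Hypothetical helper: records the text of every <script> and the class of every <div>."""

    def __init__(self):
        super().__init__()
        self.scripts = []      # text content of each <script>, in document order
        self.div_classes = []  # raw class attribute of each <div> ("" if absent)
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")  # one entry per script, even if it is empty (src=...)
        elif tag == "div":
            self.div_classes.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def fetch_static_html(url):
    """Download the page without running JavaScript. The User-Agent value is an arbitrary assumption."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def diagnose(html, div_class, marker):
    """Hypothetical helper: (is div_class present?, 1-based indexes of scripts containing marker)."""
    parser = ScriptCollector()
    parser.feed(html)
    # Class-token match; more lenient than XPath's exact @class='...' test,
    # so False here means the div really is missing from the static HTML.
    has_div = any(div_class in c.split() for c in parser.div_classes)
    # 1-based, in the spirit of INDEX(IMPORTXML(url, "//script"), n)
    script_hits = [i + 1 for i, s in enumerate(parser.scripts) if marker in s]
    return has_div, script_hits
```

For a live check you would call diagnose(fetch_static_html("https://www.skillshare.com/user/profkarim"), "user-information-social-links", "socialLinks"). If the explanation above holds, the first value should be False and the list should point at the script that carries the data.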

=ArrayFormula(
    IFNA(
        REGEXEXTRACT(
            SPLIT(
                REGEXEXTRACT(
                    REGEXEXTRACT(
                        INDEX(
                            IMPORTXML(
                                "https://www.skillshare.com/user/profkarim",
                                "//script"
                            ),
                            3
                        ),
                        "socialLinks.*sections"
                    ),
                    "(\{.*\})"
                ),
                "},{",
                0,
                1
            ),
            """url"":""?(.*?)""?,"
        )
    )
)
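To see what each layer of the formula does, here is a minimal Python sketch that mirrors it step by step with the standard re module. The sample script text is invented for illustration; it only assumes the shape the formula itself implies (a socialLinks array of objects with a "url" key, followed somewhere by "sections"). Python's re and the RE2 engine used by Sheets behave the same for these particular patterns.

```python
import re


def extract_social_urls(script_text):
    """Mirror of the Sheets formula; returns [] where the formula would give a blank via IFNA."""
    # REGEXEXTRACT(..., "socialLinks.*sections"): greedy, runs to the LAST "sections"
    m = re.search(r"socialLinks.*sections", script_text)
    if m is None:
        return []
    # REGEXEXTRACT(..., "(\{.*\})"): from the first "{" to the last "}"
    m = re.search(r"(\{.*\})", m.group(0))
    if m is None:
        return []
    # SPLIT(..., "},{", 0, 1): split on the whole delimiter, drop empty pieces
    pieces = [p for p in m.group(1).split("},{") if p]
    # REGEXEXTRACT(piece, """url"":""?(.*?)""?,") per piece; IFNA turns misses into ""
    urls = []
    for piece in pieces:
        hit = re.search(r'"url":"?(.*?)"?,', piece)
        urls.append(hit.group(1) if hit else "")
    return urls


# Invented sample data, shaped the way the formula expects
sample = ('window.pageData = {"user":{"socialLinks":['
          '{"url":"https://twitter.com/example","type":"twitter"},'
          '{"url":"https://youtube.com/example","type":"youtube"}],'
          '"sections":[]}};')
print(extract_social_urls(sample))
```

With this sample it prints ['https://twitter.com/example', 'https://youtube.com/example']. Note that the last regex requires a comma after the URL, so an object whose "url" is its final key would come back blank; the formula behaves the same way.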