URL parsing: Get N number of folders followed by filename starting from a certain folder

Question

URL parsing: Get N number of folders followed by filename starting from a certain folder

114 Views Asked by coredumped0x At 01 October 2020 at 17:42

I have a URL which could have any number of folders, and it ends with a filename.extension.

example:

https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg

I am trying to get everything after the /v87879798 version folded, thus:

images/profile/oaz4wkjkjsbzxa3xlkmu.jpg

I have tried a number of approaches as below, but nothing worked as I know I will most likely need a regular expression, but my knowledege of them don't allow me to construct such ones yet. Some of the methods I tried are:

import os
from urllib.parse import urlparse
# url https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
parsed_url = urlparse(url)
# parsed_url.path /lang-code/image/upload/v1601568948/images/profile
path = os.path.dirname(parsed_url.path)
# file_name oaz4wkjkjsbzxa3xlkmu.jpg
file_name = os.path.basename()

But nothing that works so far. Any help would be greatly appreciated.

Edit: Sorry, forgot to mention What I meant by saying N number of folders is that any of the below urls could be possible:

https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg

https://cdn.example.com/user/image/upload/v87879798/images/oaz4wkjkjsbzxa3xlkmu.jpg

https://cdn.example.com/user/image/upload/v87879798/oaz4wkjkjsbzxa3xlkmu.jpg

Original Q&A

There are 1 best solutions below

**Rakesh** · Accepted Answer · 2020-10-01T17:47:20.750000

Using Regex pattern --> r"v\d+\/(.+)$". Assuming the random numbers start with v

Ex:

import re
from urllib.parse import urlparse

ptrn = re.compile(r"v\d+\/(.+)$")
url = "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg"
parsed_url = urlparse(url)
print(ptrn.search(parsed_url.path).group(1))

Output:

images/profile/oaz4wkjkjsbzxa3xlkmu.jpg

Demo:

ptrn = re.compile(r"v\d+\/(.+)$")
urls = ["https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg", "https://cdn.example.com/user/image/upload/v87879798/images/profile/oaz4wkjkjsbzxa3xlkmu.jpg",
       "https://cdn.example.com/user/image/upload/v87879798/images/oaz4wkjkjsbzxa3xlkmu.jpg", "https://cdn.example.com/user/image/upload/v87879798/oaz4wkjkjsbzxa3xlkmu.jpg"]

for url in urls:
    parsed_url = urlparse(url)
    print(ptrn.search(parsed_url.path).group(1))

Output:

images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
images/profile/oaz4wkjkjsbzxa3xlkmu.jpg
images/oaz4wkjkjsbzxa3xlkmu.jpg
oaz4wkjkjsbzxa3xlkmu.jpg

URL parsing: Get N number of folders followed by filename starting from a certain folder

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in REGEX

Related Questions in URLPARSE

Trending Questions

Popular # Hahtags

Popular Questions