Regex lookbehind to extract string

So I have this ugly string which I'm picking up off the wire:

{"feedtype": "playlist", "base_url": "http://feeds.xhis.com/rteavgen/player/", "feed_title": "Single Item Playlist", "feedid": "playlist", "alt_url": "http://www.xhis.com/player/#v=10322367", "platform": "iptv", "current_date": "2014-11-14T12:24:39.84167", "full_url": "http://feeds.xhis.com/rteavgen/player/playlist?type=iptv&showId=10343367", "shows": [{"itemid": 10332367, "showid": 11544367, "valid_start": "2014-11-13T21:37:39", "ispodcast": 0, "programmeid": 1, "BRINumber": "ih011305791", "duration": 2053247, "id": 10323367, "media:group": [{"rte:server": "http://vod.hds.xhis.com/hds-vod", "medium": "video", "url": "/2014/1113/20141113-dumbydoozle_cl10344367_10344406_260_/manifest.f4m", "type": "video/mp4", "i

It's sorta JSONy - the string I get isn't always guaranteed to be complete, so I can't parse it. Also, the protocol could change.

Anyway, I'm trying to do this:

  • find "manifest.f4m"
  • extract the string: "/2014/1113/20141113-dumbydoozle_cl10344367_10344406_260_/manifest.f4m"

Once I have the location of manifest.f4m, I'm done.


So I'm trying to formulate a regex to do this reliably, but I'm having terrible trouble...

Here's my regex so far:

/(?<=\/)manifest\.f4m(?=("|\s))/

It matches "manifest.f4m" (with either a " or a space after it).

I'm a bit stuck with the lookbehind - I want to look back to the first "/" and extract the entire string that is pointed to by "url".

Though maybe there's a much better way of doing all this?

There are 2 best solutions below

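Could you just start from the url part, wrap it in a non-capturing group, and capture the path that follows? I presume that the url key, at least, will always be present. I tested this against your example and it seems to work. Note that the first quantifier is lazy: with a greedy .+ the capture would begin at a later "/" and drop the leading "/2014/1113".

\b(?:url.+?)(/.+manifest\.f4m)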
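
For illustration, here is a minimal Python sketch of the same capture-group idea. The pattern is an assumption of mine rather than the regex above: it anchors on the "url" key and uses a negated character class, so the capture cannot run past the closing quote or a space. SAMPLE is a shortened copy of the string from the question, and extract_manifest_path is a hypothetical helper name.

```python
import re

# Shortened copy of the payload from the question; it is cut off mid-key on purpose.
SAMPLE = (
    '{"feedtype": "playlist", "base_url": "http://feeds.xhis.com/rteavgen/player/", '
    '"shows": [{"itemid": 10332367, "media:group": [{"medium": "video", '
    '"url": "/2014/1113/20141113-dumbydoozle_cl10344367_10344406_260_/manifest.f4m", '
    '"type": "video/mp4", "i'
)

# Assumed pattern: \b keeps "base_url" and friends from matching, and
# [^"\s]* stops the capture at the first quote or whitespace character.
MANIFEST_RE = re.compile(r'\burl"\s*:\s*"(/[^"\s]*manifest\.f4m)(?=["\s])')


def extract_manifest_path(payload):
    """Return the path ending in manifest.f4m, or None if it is not present."""
    match = MANIFEST_RE.search(payload)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_manifest_path(SAMPLE))
```

Because the capture starts right after the opening quote of the value, no lookbehind is needed to find the first "/".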
0
On

So I came up with this regex:

[-A-Za-z0-9+&@#\/%?=~_|!:,.;]+[-A-Za-z0-9+&@#\/%=~_|]manifest\.f4m(?=("|\s))

It seems to work rather well.

http://regex101.com/r/iT7vG2/2
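
As a rough check outside regex101, the pattern above can be run through Python's re module. This is only a sketch: FRAGMENT is a shortened piece of the string from the question, and find_manifest is a hypothetical helper name.

```python
import re

# The regex from this answer, unchanged. The whole match (group 0) is the path;
# group 1 only belongs to the lookahead and is not useful to the caller.
ANSWER_RE = re.compile(
    r'[-A-Za-z0-9+&@#\/%?=~_|!:,.;]+[-A-Za-z0-9+&@#\/%=~_|]manifest\.f4m(?=("|\s))'
)

# Shortened fragment of the payload from the question.
FRAGMENT = (
    '"medium": "video", '
    '"url": "/2014/1113/20141113-dumbydoozle_cl10344367_10344406_260_/manifest.f4m", '
    '"type": "video/mp4", "i'
)


def find_manifest(text):
    """Return the first match of the answer's pattern, or None."""
    match = ANSWER_RE.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_manifest(FRAGMENT))
```

One thing to keep in mind: the lookahead requires a quote or whitespace after manifest.f4m, so if the incoming string is cut off immediately after the file name, this returns None.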