Extract strings from a patterned string list and convert them into a DataFrame in Python

I have a list of patterned strings like this:

['"Bandcamp" (2014)\t\t\t\t\ttv-mini-series',
'"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title',
'"Elmira" (2014)\t\t\t\t\telmira-new-york',
'"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend',
...]

Now I am trying to extract substrings from each line and turn them into a DataFrame like this:

Movie    Year Keyword
Bandcamp 2014 tv-mini-series
ByMySide 2012 twitter-hashtag-in-title
Elmira   2014 elmira-new-york
Elmira   2014 friend
...
Best answer:

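Here you go. A regular expression pulls the three fields out of each string, and the resulting list of tuples becomes the DataFrame: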

>>> import re
>>> a
['"Bandcamp" (2014)\t\t\t\t\ttv-mini-series', '"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title', '"Elmira" (2014)\t\t\t\t\telmira-new-york', '"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend']
>>> data = []
>>> for x in a:
...     # groups: quoted title, year in parentheses, keyword after the run of tabs
...     data.append(re.findall(r'"(\w+)" \((\d+)\).*\t{2,5}(\S+)', x)[0])
... 
>>> import pandas as pd
>>> pd.DataFrame(data, columns=['Movie', 'Year', 'Keyword'])
      Movie  Year                   Keyword
0  Bandcamp  2014            tv-mini-series
1  ByMySide  2012  twitter-hashtag-in-title
2    Elmira  2014           elmira-new-york
3    Elmira  2014                    friend
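One caveat with the pattern above: `\w+` only matches single-word titles, so a line whose quoted title contains a space makes `findall` return an empty list and `[0]` raises `IndexError`. Below is a minimal sketch of an alternative, assuming every line keeps the same shape (quoted title, a four-digit year in parentheses, then tabs and a single keyword). It swaps the loop for pandas' `Series.str.extract`, whose named groups become the column names and which yields an all-NaN row for lines that don't match; the variable names `lines` and `pattern` are just for illustration.

```python
import pandas as pd

lines = [
    '"Bandcamp" (2014)\t\t\t\t\ttv-mini-series',
    '"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title',
    '"Elmira" (2014)\t\t\t\t\telmira-new-york',
    '"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend',
]

# Named groups become the DataFrame column names:
#   Movie:   everything between the first pair of double quotes (spaces allowed)
#   Year:    the four digits inside the first "(...)"
#   Keyword: the final run of non-whitespace characters after one or more tabs
pattern = r'^"(?P<Movie>[^"]+)" \((?P<Year>\d{4})\).*\t+(?P<Keyword>\S+)$'

df = (
    pd.Series(lines)
    .str.extract(pattern)   # lines that don't match become all-NaN rows
    .dropna()               # drop those rows instead of raising an error
    .reset_index(drop=True)
)
df["Year"] = df["Year"].astype(int)  # captured groups are strings
print(df)
```

Dropping unmatched rows is a design choice; if you would rather find out which input lines failed, inspect the NaN rows before calling `dropna()`.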