I have the following source code from the Wikipedia page of a list of Games. I need to grab the name of the game from the source, which is located within the title attribute, as follows:
<td><i><a href="/wiki/007:_Quantum_of_Solace" title="007: Quantum of Solace">007: Quantum of Solace</a></i><sup id="cite_ref-4" class="reference"><a href="#cite_note-4"><span>[</span>4<span>]</span></a></sup></td>
As you can see above, in the title attribute there's a string. I need to use GREP to search through every single line for when that occurs, and remove everything excluding:
title="Game name"
I have the following (in TextWrangler) which returns every single occurrence:
title="(.*)"
How can I now set it to remove everything surrounding that, but to ensure it keeps either the string alone, or title="string".
I've figured this problem out, it was quite simple. Instead of retrieving the content in the title attribute, I'd retrieve the page name.
To ensure I only struck the correct line where the content was, I'd use the following string for searching the code.
(.)/wiki/(.)" Returning \2
After that, I simply remove any cases where there is HTML code:
<(.*) Returning ''
Finally, I'll remove the remaining content after the page name:
"(.*) Returning ''
A bit of cleaning up the spacing and I have a list for all game names.