I am scraping data from a local jail site. I am trying to remove all the elements from a list except for the charges. I want all the statutes, bonds, etc. gone.
Here is what I have tried:
charges = [[], ['13A-12-214.1'], ["ECSO (ETOWAH COUNTY SHERIFF\\'S OFFICE)"], ['SALVIA MISD POSS'], [''], ['M'], ['$1000.00'], [], [], ['13A-10-41'], ["ECSO (ETOWAH COUNTY SHERIFF\\'S OFFICE)"], ['RESISTING ARREST'], [''], ['M'], ['$1000.00'], [], [], ['32.5A.88'], ["ECSO (ETOWAH COUNTY SHERIFF\\'S OFFICE)"], ['IMPROPER LANE USAGE'], [''], ['U'], ['$500.00'], [], [], [''], [''], ['DET FOR COMM CORR'], [''], ['U'], ['$0.00'], [], [], ['<tr>\\r\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t <td class="SearchHeader" colspan="2">']]
for string in charges:
    if string == arrestedBy:
        charges.remove(string)
    elif string.isalpha() == False:
        charges.remove(string)
    elif len(string) < 2:
        charges.remove(string)
if charges[-1] == '<tr>\\r\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t <td class="SearchHeader" colspan="2">':
    charges.remove(charges[-1])
charges = filter(None, charges)
charges = str(charges)
What I get instead is:
"ECSO (ETOWAH COUNTY SHERIFF\\S OFFICE)", $1000.00, "ECSO (ETOWAH COUNTY SHERIFF\\S OFFICE)", $1000.00, "ECSO (ETOWAH COUNTY SHERIFF\\S OFFICE)", $500.00, $0.00
What I want is:
"SALVIA MISD POSS, RESISTING ARREST, IMPROPER LANE USAGE, DET FOR COMM CORR"
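If you can't limit what you are getting to just the charges when you are scraping, consider using a Python list comprehension rather than iterating over the list and deleting elements as you go. Deleting during iteration is inadvisable because each removal shifts the remaining items, so the loop skips some of them.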
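To illustrate why removing during iteration goes wrong, here is a small sketch. The list `items` and the length check are made up for the demonstration and are not the real filtering rule:

```python
# Illustrative data only: two empty strings, one charge, one class code.
items = ['', '', 'RESISTING ARREST', 'M']

# Removing inside the loop shifts later items left, so the element that
# slides into the removed slot is never visited.
mutated = list(items)
for text in mutated:
    if len(text) < 2:
        mutated.remove(text)

# Building a new list visits every element exactly once.
filtered = [text for text in items if len(text) >= 2]

print(mutated)   # one empty string survives the loop
print(filtered)  # only the charge remains
```

The loop leaves an unwanted empty string behind, while the comprehension keeps only the entry that passes the check.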
For example, if you define some function is_charge that contains your logic for defining a charge and returns a boolean:
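One possible shape for this is sketched below. The rule inside is_charge (letters and spaces only, more than one character) is an assumption inferred from the sample data in the question, not something the site guarantees, and the name `scraped` is hypothetical. Real charge descriptions containing digits or punctuation would need a looser rule.

```python
def is_charge(text):
    """Return True if text looks like a charge description.

    Assumption: charges are made of letters and spaces only and are
    longer than one character, as in the sample data.
    """
    stripped = text.strip()
    return len(stripped) > 1 and all(ch.isalpha() or ch.isspace() for ch in stripped)


# Abbreviated sample in the same shape as the scraped list:
# a list of inner lists holding zero or one string each.
scraped = [[], ['13A-12-214.1'], ["ECSO (ETOWAH COUNTY SHERIFF\\'S OFFICE)"],
           ['SALVIA MISD POSS'], [''], ['M'], ['$1000.00'], [], [],
           ['13A-10-41'], ['RESISTING ARREST'], [''], ['U'], ['$0.00'], []]

# Flatten the inner lists and keep only strings that pass the predicate.
charges = [text for row in scraped for text in row if is_charge(text)]

# Join into the single comma-separated string the question asks for.
result = ", ".join(charges)
print(result)
```

Because the comprehension builds a new list instead of modifying the one being looped over, no element is skipped, and the empty inner lists drop out on their own.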