List comprehension to remove elements from a Python list if they contain only digits (even if there are "_" or "-" in them)


I have many lists like this:

synonyms = ["3,2'-DIHYDROXYCHALCONE", '36574-83-1', '36574831', "2',3-Dihydroxychalcone", '(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one', 'MLS002693861']

from which I need to remove all elements that consist only of digits. I can't figure out how to remove element [1] ('36574-83-1'), because it is all digits apart from the intervening dashes.

Of course, this doesn't work, since the dashes make isdigit() return False for that element:

synonym_subset = [x for x in synonyms if not x.isdigit()]

And I can't just remove the dashes because I want the dashes in the other elements to be retained:

synonym_subset = [x.replace('-', '') for x in synonyms]

I could run the above to find the index of the elements to be removed, then remove them that way, but I was hoping for a one-liner.

Thanks.

Answer by Andrej Kesely (best answer)

Try:

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

out = [s for s in synonyms if not all(ch in "0123456789-_" for ch in s)]
print(out)

Prints:

[
    "3,2'-DIHYDROXYCHALCONE",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]
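
One edge case worth knowing: all() returns True for an empty iterable, so the one-liner above also drops empty strings, and it drops strings made only of "-" or "_". If that is not wanted, a minimal sketch that additionally requires at least one digit could look like this (the helper name is_numeric_id and the sample list are made up for illustration):

```python
DIGITS = set("0123456789")
ALLOWED = DIGITS | set("-_")


def is_numeric_id(s):
    # True only if s contains at least one digit and nothing but digits, "-" and "_".
    return any(ch in DIGITS for ch in s) and all(ch in ALLOWED for ch in s)


# Hypothetical sample data covering the edge cases.
samples = ["36574-83-1", "36574831", "", "--", "MLS002693861"]

kept = [s for s in samples if not is_numeric_id(s)]
print(kept)  # ['', '--', 'MLS002693861']
```

Whether empty or separator-only strings should be kept depends on the data, so treat this as an optional variation rather than a correction.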
Answer by e-motta

You can use filter [edit: although in this case you probably shouldn't, as mentioned in the comments]:

import re

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

filtered_synonyms = list(
    filter(lambda x: not re.sub(r"[-_]", "", x).isdigit(), synonyms)
)

Results in:

["3,2'-DIHYDROXYCHALCONE", "2',3-Dihydroxychalcone", '(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one', 'MLS002693861']
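
For comparison, here is a sketch of the same idea as a list comprehension, using re.fullmatch instead of re.sub plus isdigit(). It assumes that only ASCII digits, "-" and "_" should count; str.isdigit() also accepts some non-ASCII digit characters (for example "²"), so the two versions can differ on such input. The name NUMERIC_ID is made up for this example.

```python
import re

# Assumption: a "numeric" element consists solely of ASCII digits, "-" and "_".
NUMERIC_ID = re.compile(r"[0-9_-]+")

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

# Keep every element that is not matched in full by the pattern.
filtered_synonyms = [s for s in synonyms if not NUMERIC_ID.fullmatch(s)]
print(filtered_synonyms)
```

Because the pattern uses "+", an empty string is not treated as numeric and is kept.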
Answer by srn

As a minor addition to the already posted replies, depending on the length of such strings, it could make sense to use set() - for instance:

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

myset = set("0123456789-_")
[s for s in synonyms if not set(s).issubset(myset)]

Edit: As mentioned by @no comment, this can be improved further by using issuperset like so:

isdigits = set("0123456789-_").issuperset
[s for s in synonyms if not isdigits(s)]

Each gives:

["3,2'-DIHYDROXYCHALCONE", "2',3-Dihydroxychalcone", '(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one', 'MLS002693861']

P.S. Another way would be to use ord(), but this is typically slower and, of course, less readable:

[s for s in synonyms if not all(ord(k) in (*range(48, 58), 45, 95) for k in s)]
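
To check the speed claims on your own data, a rough timeit sketch such as the following could be used. The labels in the candidates dict are made up for this example, and the timings will vary with the machine, Python version, and string lengths, so no numbers are shown here.

```python
import timeit

synonyms = [
    "3,2'-DIHYDROXYCHALCONE",
    "36574-83-1",
    "36574831",
    "2',3-Dihydroxychalcone",
    "(E)-1-(2-hydroxyphenyl)-3-(3-hydroxyphenyl)prop-2-en-1-one",
    "MLS002693861",
]

allowed = set("0123456789-_")
is_numeric = allowed.issuperset

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    "all + str": lambda: [s for s in synonyms if not all(ch in "0123456789-_" for ch in s)],
    "issubset": lambda: [s for s in synonyms if not set(s).issubset(allowed)],
    "issuperset": lambda: [s for s in synonyms if not is_numeric(s)],
    "ord": lambda: [s for s in synonyms if not all(ord(k) in (*range(48, 58), 45, 95) for k in s)],
}

# Sanity check: every approach should return the same list.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{name:12s} {seconds:.4f} s")
```

With lists this small the differences are unlikely to matter in practice; readability is probably the better tiebreaker.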