I have an ordered list of things to process that includes some duplicates and I only want to process the first occurrence. Presently, I'm doing it like this in Python v2.7:
seen = set()
for (value, fmt) in formats:
    if fmt not in seen:
        seen.add(fmt)
        process(value, fmt)
Is there any way to simultaneously insert a new element into seen and detect whether or not it was already present? (This would avoid the repeated lookup of fmt in the set.)
seen = set()
for (value, fmt) in formats:
    # myInsert() would return true if item was not already present.
    if seen.myInsert(fmt):
        process(value, fmt)
Or, alternatively, can I somehow just filter my formats to exclude duplicate entries before looping?
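from operator import itemgetter

# removeDuplicates() is hypothetical: it would keep the first item for each key.
unique_formats = removeDuplicates(formats, key=itemgetter(1))
for (value, fmt) in unique_formats:
    process(value, fmt)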
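One possible shape for such a helper is a generator that yields only the first item seen for each key. This is a rough sketch, not a standard library function: the name unique_by and the sample formats list are made up for illustration, and it still uses the in test followed by add() internally.

```python
from operator import itemgetter


def unique_by(iterable, key):
    """Yield items whose key(item) has not been seen before, preserving order."""
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Hypothetical sample data: (value, fmt) pairs with a repeated format.
formats = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]

# Keep only the first pair for each fmt (the second element of each pair).
unique_formats = list(unique_by(formats, key=itemgetter(1)))
```

Because it is a generator, the filtering happens lazily, so the loop over the result can start processing before the whole input has been scanned.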
You could take the length of the set before and after the add(). If it didn't change, the format was already in the set.

Your question presumes that the in test is a costly operation. This turns out not to be the case. Using len() can take more time, although both are quite fast (measured with CPython 2.7.3 on a 2.5 GHz Core 2 Quad Q9300).
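A minimal sketch of the length-comparison idea follows. The helper name add_if_new, the sample formats list, and the stand-in process() that only records its arguments are all assumptions for illustration.

```python
def add_if_new(seen, item):
    """Add item to seen; return True only if it was not already present."""
    before = len(seen)
    seen.add(item)
    # The set grows only when the item was new.
    return len(seen) != before


# Hypothetical sample data: (value, fmt) pairs with repeated formats.
formats = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")]

processed = []


def process(value, fmt):
    # Stand-in for the real processing step; it just records the call.
    processed.append((value, fmt))


seen = set()
for value, fmt in formats:
    if add_if_new(seen, fmt):
        process(value, fmt)
```

Whether this is any faster than the plain in test followed by add() is worth checking with timeit on your own interpreter. The extra function call and the two len() calls may well cost more than the lookup they replace.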