Remove substring from OrderedDict keys

I want to remove the "chr" prefix from all the keys in an OrderedDict and update the OrderedDict with the new keys.

from collections import OrderedDict

for key, item in hg38_genome.items():
    key = {x.removeprefix("chr") for x in key}
    hg38_genome = OrderedDict([(key, item)])

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [29], in <cell line: 1>()
      1 for key, item in hg38_genome.items():
      2     key = {x.removeprefix("chr") for x in key}
----> 3     hg38_genome = OrderedDict([(key, item)])

TypeError: unhashable type: 'set'
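A minimal reproduction of where the set comes from. The sample key "chr1" is only illustrative, and str.removeprefix needs Python 3.9 or later. The braces in the loop form a set comprehension that iterates over the characters of the key string, so key becomes a set of characters. Sets are mutable and therefore unhashable, so they cannot be used as dictionary keys.

```python
from collections import OrderedDict

key = "chr1"  # illustrative key taken from the data below

# A set comprehension iterates over the string one character at a time,
# so this builds a set of single characters, not a new string.
result = {x.removeprefix("chr") for x in key}
print(type(result).__name__)  # set
print(sorted(result))         # ['1', 'c', 'h', 'r']

# Using that set as a key fails because sets are unhashable.
try:
    OrderedDict([(result, "value")])
except TypeError as exc:
    print(exc)  # the message mentions: unhashable type: 'set'

# Calling the string method on the key itself gives the intended result.
print(key.removeprefix("chr"))  # 1
```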

Contents of hg38_genome:

OrderedDict([('chr1',
              <bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9100>),
             ('chr10',
              <bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9130>),
             ('chr11',
              <bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9610>),
             ('chr11_KI270721v1_random')]

Expected output:

OrderedDict([('1',
              <bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9100>),
             ('10',
              <bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9130>),
             ('11',
              <bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9610>),
             ('11_KI270721v1_random')]
Answer by suraj sharma:

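The error happens because {x.removeprefix("chr") for x in key} is a set comprehension over the characters of the key string, so key becomes a set, and sets are unhashable and cannot be dictionary keys. Even without the error, reassigning hg38_genome inside the loop would leave only the last key/value pair. Instead, call removeprefix on each key directly and build a new OrderedDict with the modified keys.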

# assuming hg38_genome is your initial OrderedDict
new_hg38_genome = OrderedDict((key.removeprefix('chr'), val) for key, val in hg38_genome.items())


print(new_hg38_genome)
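The question asks to update the OrderedDict itself. If other code holds a reference to the same object, rebinding the name to a new dict will not be visible there. A minimal in-place sketch, assuming that matters for your use case: the helper name strip_key_prefix_in_place is hypothetical, and the string values are placeholders for the PysamFastaRecord objects. It computes all renamed pairs first, then clears and refills the same object, which keeps the original order.

```python
from collections import OrderedDict


def strip_key_prefix_in_place(od, prefix="chr"):
    """Hypothetical helper: rename keys of `od` in place, keeping their order."""
    # Compute all new keys first, so nothing is overwritten mid-loop.
    renamed = [(key.removeprefix(prefix), value) for key, value in od.items()]
    if len({key for key, _ in renamed}) != len(renamed):
        # e.g. both "chr1" and "1" present: stripping would merge them.
        raise ValueError(f"removing {prefix!r} would make two keys identical")
    od.clear()          # same object, now empty
    od.update(renamed)  # re-insert the pairs in their original order
    return od


# Placeholder strings stand in for the PysamFastaRecord objects.
genome = OrderedDict([("chr1", "rec1"), ("chr10", "rec10"), ("chrX", "recX")])
alias = genome  # a second reference to the same object
strip_key_prefix_in_place(genome)
print(list(alias))  # ['1', '10', 'X']
```

Raising on a collision is a design choice made for this sketch. Without the check, a later key would silently overwrite an earlier one, as it also would when building a new OrderedDict.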

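Here is a complete, runnable version of the rebuild approach, with placeholder strings standing in for the PysamFastaRecord objects: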

from collections import OrderedDict

hg38_genome = OrderedDict([
    ('chr1', 'bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9100'),
    ('chr10', 'bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9130'),
    ('chr11', 'bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9610'),
    ('chr11_KI270721v1_random', 'bioframe.io.fileops.PysamFastaRecord at 0x2baa3dca9610')
])


new_hg38_genome = OrderedDict((key.removeprefix('chr'), val) for key, val in hg38_genome.items())


print(new_hg38_genome)
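str.removeprefix only exists in Python 3.9 and later. On an older interpreter, a small helper that checks startswith and then slices can stand in for it. This is a sketch: strip_prefix is a hypothetical name, and the string values are placeholders for the PysamFastaRecord objects.

```python
from collections import OrderedDict


def strip_prefix(text, prefix):
    # Same behaviour as str.removeprefix (Python 3.9+): remove `prefix` once,
    # and only if `text` actually starts with it.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


# Placeholder strings stand in for the PysamFastaRecord objects.
hg38_genome = OrderedDict([
    ("chr1", "record for chr1"),
    ("chr10", "record for chr10"),
    ("chr11_KI270721v1_random", "record for chr11_KI270721v1_random"),
])

new_hg38_genome = OrderedDict(
    (strip_prefix(key, "chr"), val) for key, val in hg38_genome.items()
)
print(list(new_hg38_genome))  # ['1', '10', '11_KI270721v1_random']
```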