Select i+4 elements from a Python dictionary based on a keyword

I have a Python dictionary as follows:

ip_dict = {
  "doc_1": "ADMINISTRATION LIABILITY COVERAGE PART CG7023 1096 EXCL-ASBESTOS",
  "doc_2": "DIRECT BILL L7F6 20118 INSURED COPY ACP GLDO 7285650787 919705952 43 0001404",
  "doc_3": "What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES",
  "doc_4": "That portion of \"your work\" out of which the 1. Required by the contract or agreement",
  "doc_5": "LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY",
  "doc_6": "That portion of \"your work\" out of which the 1. Contractor Additional Insured Required",
  "doc_7": "LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.",
  "doc_8": "COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.",
  "doc_9": "Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added",
  "doc_10": "POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 ",
  "doc_11": "Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 25 03 05 09 B"
}

Now I want to search the values for the keyword "Contractor Additional Insured" and, whenever it is found, extract that element plus the 4 consecutive elements that follow it, storing them in a new dictionary. So my output would look something like this:

op_dict = {
  "doc_3": "What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES",
  "doc_4": "That portion of \"your work\" out of which the 1. Required by the contract or agreement",
  "doc_5": "LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY",
  "doc_6": "That portion of \"your work\" out of which the 1. Contractor Additional Insured Required",
  "doc_7": "LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.",
  "doc_8": "COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.",
  "doc_9": "Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added",
  "doc_10": "POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 ",
}

Here the keyword appears in the third element, doc_3, so we take the 4 elements after doc_3, i.e. doc_4, doc_5, doc_6 and doc_7. Hence the elements up to doc_7 are selected.

Next, the keyword appears in doc_5, so the 4 elements after doc_5 (doc_6, doc_7, doc_8, doc_9) are selected.

Similarly, the keyword next appears in doc_6, so the 4 consecutive elements after it (doc_7, doc_8, doc_9, doc_10) are selected. Overlapping selections are merged, so each element appears only once and the result runs from doc_3 to doc_10.

Any help is appreciated!
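To make the window arithmetic in the question concrete, here is a minimal sketch that lists, for every matching key, which keys its window covers (the match itself plus the next 4). It assumes Python 3.7+, where dicts keep insertion order; windows_per_match is a hypothetical helper name, and sample is a made-up stand-in for ip_dict in which only doc_3, doc_5 and doc_6 contain the keyword.

```python
KEYWORD = "Contractor Additional Insured"

def windows_per_match(data, keyword, after=4):
    """Map each key whose value contains `keyword` to the keys in its window:
    the matching key itself plus up to `after` keys that follow it."""
    keys = list(data)  # dicts keep insertion order (guaranteed since Python 3.7)
    return {
        key: keys[i:i + after + 1]  # slicing silently clips at the end of the list
        for i, key in enumerate(keys)
        if keyword in data[key]
    }

# Hypothetical stand-in for ip_dict: same keys, keyword only in doc_3, doc_5, doc_6.
sample = {f"doc_{n}": "no match" for n in range(1, 12)}
for n in (3, 5, 6):
    sample[f"doc_{n}"] = f"... {KEYWORD} ..."

for match, window in windows_per_match(sample, KEYWORD).items():
    print(match, "->", window)
```

This prints one line per match (doc_3 covers doc_3 to doc_7, doc_5 covers doc_5 to doc_9, doc_6 covers doc_6 to doc_10), matching the walkthrough; the union of those windows is exactly the desired op_dict.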

Answer by Chris (accepted)

Let's convert your dict to a list of tuples with indices.

>>> lst = list(enumerate(ip_dict.items()))
>>> lst
[(0, ('doc_1', 'ADMINISTRATION LIABILITY COVERAGE PART CG7023 1096 EXCL-ASBESTOS')), 
 (1, ('doc_2', 'DIRECT BILL L7F6 20118 INSURED COPY ACP GLDO 7285650787 919705952 43 0001404')), 
 (2, ('doc_3', 'What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES')), 
 (3, ('doc_4', 'That portion of "your work" out of which the 1. Required by the contract or agreement')), 
 (4, ('doc_5', 'LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY')), 
 (5, ('doc_6', 'That portion of "your work" out of which the 1. Contractor Additional Insured Required')), 
 (6, ('doc_7', 'LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.')), 
 (7, ('doc_8', 'COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.')), 
 (8, ('doc_9', 'Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added')), 
 (9, ('doc_10', 'POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 ')), 
 (10, ('doc_11', 'Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 25 03 05 09 B'))]

Now, get all indices where the keyword is found.

>>> idxs = [i for i, x in lst if 'Contractor Additional Insured' in x[1]]
>>> idxs
[2, 4, 5]

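Now we can use a set comprehension to collect every index from each match up to 4 positions after it (range(i, i+5) covers i itself plus the next 4 indices). Indices past the end of the list are harmless, because no entry will ever have them.

>>> keep = {j
...         for i in idxs
...         for j in range(i, i+5)}
>>> keep
{2, 3, 4, 5, 6, 7, 8, 9}

And then a dictionary comprehension over lst keeps only the entries whose index is in that set. Since lst follows the dict's insertion order (guaranteed since Python 3.7), the result keeps the original order.

>>> {key: value
...  for i, (key, value) in lst
...  if i in keep}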
{'doc_3': 'What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES', 
 'doc_4': 'That portion of "your work" out of which the 1. Required by the contract or agreement',
 'doc_5': 'LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY',
 'doc_6': 'That portion of "your work" out of which the 1. Contractor Additional Insured Required',
 'doc_7': 'LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.',
 'doc_8': 'COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.',
 'doc_9': 'Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added',
 'doc_10': 'POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 '}
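The same index-window idea can also be written by slicing the list of keys directly, so the set holds keys rather than indices and no enumerated copy of the items is needed. A minimal sketch, assuming Python 3.7+ insertion-ordered dicts; select_with_following is a hypothetical name and demo is made-up data:

```python
def select_with_following(data, keyword, after=4):
    """Return the entries whose value contains `keyword`, plus the `after`
    entries that follow each of them, in the dict's original order."""
    keys = list(data)
    keep = set()
    for i, key in enumerate(keys):
        if keyword in data[key]:
            keep.update(keys[i:i + after + 1])  # slicing clips at the end
    # Iterate over the original dict so the output preserves its order.
    return {k: v for k, v in data.items() if k in keep}

demo = {"a": "hit", "b": "", "c": "", "d": "hit", "e": "", "f": ""}
print(select_with_following(demo, "hit", after=1))
# {'a': 'hit', 'b': '', 'd': 'hit', 'e': ''}
```

With the question's data, select_with_following(ip_dict, "Contractor Additional Insured") should select doc_3 through doc_10, the same as the comprehension above.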

Answer by ti7

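You can do this with a generator that loops over the items and keeps a countdown: every match resets the counter to count, and items are yielded for as long as the counter is still non-negative.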

def also_a_few_more(data_dict, substring, count=4):
    """Yield each matching (key, value) pair plus the `count` pairs after it."""
    countdown = -1  # negative means "not inside a window"
    for key, value in data_dict.items():
        if substring in value:  # a match (re)starts the window
            countdown = count
        if countdown >= 0:      # still inside a window
            yield key, value
            countdown -= 1
            # uncomment to stop as soon as the first run of windows ends
            #if countdown < 0:   # window exhausted
            #    break           # end the generator

>>> dict(also_a_few_more(ip_dict, "Contractor Additional Insured"))
{'doc_3': 'What Contractor Additional Insured LIABILITY CG 20 10 04 13 THIS ENDORSEMENT CHANGES', 'doc_4': 'That portion of "your work" out of which the 1. Required by the contract or agreement', 'doc_5': 'LIABILITY CG 20 10 04 13 Contractor Additional Insured THIS ENDORSEMENT CHANGES THE POLICY', 'doc_6': 'That portion of "your work" out of which the 1. Contractor Additional Insured Required', 'doc_7': 'LIABILITY CG 20 26 04 13 THIS ENDORSEMENT CHANGES THE POLICY.', 'doc_8': 'COMMERCIAL GENERAL LIABILITY CG 21 87 0115 THIS ENDORSEMENT CHANGES THE POLICY.', 'doc_9': 'Page 2 of 2 ACP GLDO7285650787 L7F6 20118 CG 21 87 01 15 B. The following definitions are added', 'doc_10': 'POLICY NUMBER: THIS ENDORSEMENT CHANGES THE POLICY. COMMERCIAL GENERAL LIABILITY CG 25 03 05 09 '}
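Another way to read the countdown: an entry is kept exactly when it sits at most count positions after the most recent match. A sketch of that formulation (within_reach is a hypothetical name, and demo is made-up data) which also shows that entries falling between two non-overlapping windows are skipped:

```python
def within_reach(data_dict, substring, count=4):
    """Yield (key, value) pairs that are at most `count` positions
    after the most recent entry whose value contains `substring`."""
    last_match = None  # position of the most recent match, if any
    for pos, (key, value) in enumerate(data_dict.items()):
        if substring in value:
            last_match = pos
        if last_match is not None and pos - last_match <= count:
            yield key, value

demo = {"k1": "hit", "k2": "", "k3": "", "k4": "", "k5": "hit", "k6": ""}
print(dict(within_reach(demo, "hit", count=1)))
# {'k1': 'hit', 'k2': '', 'k5': 'hit', 'k6': ''}
```

Both versions make a single pass and never need the indices of later matches, so they also work on streams where the full list of matches is not known in advance.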
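
Answer by arjunsiva

Collect the matching keys first, then copy each one together with the next four doc_<n> keys by number. This relies on the keys being named doc_<n> with consecutive numbers.
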
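keyword = "Contractor Additional Insured"

# Collect the keys whose value contains the keyword.
keys = [k for k, v in ip_dict.items() if keyword in v]

new_dict = {}

for k in keys:
    doc_num = int(k[4:])  # strip the "doc_" prefix to get the number
    new_dict[k] = ip_dict[k]
    # Copy the next 4 documents by number, skipping any that don't exist.
    for d_n in range(doc_num + 1, doc_num + 5):
        next_key = f"doc_{d_n}"
        if next_key in ip_dict:
            new_dict[next_key] = ip_dict[next_key]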

print(new_dict)