Python: DataFrame loop repeats only one element


When I run the code and print the result, the news items for the third element in the list appear three times in a row. For the elements before and after it, everything works without problems. Can anyone help me with this, or does anyone know how to at least remove such duplicates?

Nachrichten = []

    for row in googlenews.results():
        table_new.append({
            'City': ort,
            'Title': row['title'],
            'URL': row['link'],
            'Source': row['site'], })

        df = pd.DataFrame(table_new)

dfges = pd.concat(nachrichten, axis='index')

Best answer

Your code mixes lower and upper case in variable names, e.g. nachrichten vs. Nachrichten. Python is case-sensitive, so these are two different names.
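
As a minimal sketch of the case-sensitivity point: the snippet below defines the list with a capital N and then uses the lower-case spelling, as the question's code does. The string "row" is only placeholder data.

```python
Nachrichten = []                 # defined with a capital N

try:
    nachrichten.append("row")    # lower-case n: a different, undefined name
except NameError as exc:
    error_message = str(exc)

print(error_message)
```

Python raises a NameError here because the lower-case name was never assigned. Using one spelling throughout avoids the problem.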

To answer your question, you could use drop_duplicates() to eliminate duplicates based on 'Title'.

After that, no title occurs more than once:

>>> dfges['Title'].value_counts().max()
1
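
To make the effect of subset and keep more concrete, here is a small sketch on made-up data (the cities and titles are placeholders, not real GoogleNews results). It also shows a stricter variant that only treats rows as duplicates when both 'City' and 'Title' match.

```python
import pandas as pd

# Made-up rows: title "A" appears twice for Madrid and once for London
df = pd.DataFrame({
    "City": ["Madrid", "Madrid", "London", "London"],
    "Title": ["A", "A", "A", "B"],
})

# Keep only the last row for each title, regardless of city
deduped_last = df.drop_duplicates(subset=["Title"], keep="last")

# Keep the first row for each (city, title) pair
deduped_pair = df.drop_duplicates(subset=["City", "Title"])

print(deduped_last)
print(deduped_pair)
```

Note that deduplicating on 'Title' alone also drops a headline that legitimately shows up under two different cities. If that matters, the (City, Title) variant may be the better fit.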

Extended code:

import pandas as pd
from GoogleNews import GoogleNews

googlenews = GoogleNews()
googlenews.set_encode('utf_8')
googlenews.set_lang('en')
googlenews.set_period('7d')

orte = ["Munich", "New York", "Madrid", "London", "Los Angeles", "Frankfurt", "Rom"]
nachrichten = []  # one DataFrame per city

for ort in orte:
    googlenews.clear()  # discard the results of the previous city
    googlenews.get_news(ort)
    table_new = []

    for row in googlenews.results():
        table_new.append({
            'City': ort,
            'Title': row['title'],
            'Date': row['date'],
            'URL': row['link'],
            'Source': row['site'],
        })

    # Build the DataFrame once per city, after all rows are collected
    df = pd.DataFrame(table_new)
    nachrichten.append(df)

dfges = pd.concat(nachrichten, axis='index')
# Keep only the last occurrence of each title
dfges.drop_duplicates(subset=['Title'], keep='last', inplace=True)
print(dfges)
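
One possible cause of the repeats, stated as an assumption because the question does not show the full loop: when the DataFrame is built inside the inner loop, a city with no results leaves df pointing at the previous city's rows, which are then appended again. The sketch below reproduces this without any network access. fake_results and collect are hypothetical stand-ins for googlenews.results() and the loop above.

```python
import pandas as pd

# Hypothetical stand-in for googlenews.results(); "London" finds nothing
fake_results = {
    "Madrid": [{"title": "M1"}, {"title": "M2"}],
    "London": [],
    "Rom": [{"title": "R1"}],
}

def collect(build_inside_inner_loop):
    frames = []
    for ort, rows in fake_results.items():
        table_new = []
        for row in rows:
            table_new.append({"City": ort, "Title": row["title"]})
            if build_inside_inner_loop:
                df = pd.DataFrame(table_new)  # never runs when rows is empty
        if not build_inside_inner_loop:
            # Rebuilt for every city, even when there are no rows
            df = pd.DataFrame(table_new, columns=["City", "Title"])
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

stale = collect(build_inside_inner_loop=True)
fresh = collect(build_inside_inner_loop=False)

print(stale["Title"].tolist())  # ['M1', 'M2', 'M1', 'M2', 'R1']
print(fresh["Title"].tolist())  # ['M1', 'M2', 'R1']
```

If this is what happens in your case, building the DataFrame once per city after the inner loop removes the repeats at the source, and drop_duplicates() is then only needed for headlines that genuinely occur more than once.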