How can I remove datetime elements of a list outside of a specified startdate and enddate period?


I have a list of datetime objects called 'date', and I am trying to remove the elements of the list which are outside of a startdate and an enddate. Can anyone help me understand how to properly do this and why I am getting this list index out of range error? I feel like I am so close!

My code:

startDate = datetime.strptime('1948-1-1',"%Y-%m-%d")
endDate = datetime.strptime('1950-2-1',"%Y-%m-%d")

for row in range(0,len(date)):
  if date[row] < startDate:
    del date[row]
  elif date[row] > endDate:
    del date[row]

This raises a "list index out of range" error.

I have also tried the following way and it runs but does not delete the list elements:

count = 0

for row in date:
  if row < startDate:
    del date[count]
  elif row > endDate:
    del date[count]
  count += 1

There are 3 best solutions below

4
On BEST ANSWER

You are deleting from the list while looping over it, which is what pushes the index out of range. The loop runs over the original len(list), but the list no longer has that length once some entries have been deleted.

A list comprehension is helpful here. Note that I flipped the < and > comparisons to >= and <= so that the comprehension keeps the dates inside the range instead of deleting those outside it. See the example below:

from datetime import datetime
# Sample data
date = ['1947-01-01', '1948-01-01', '1948-02-02', '1951-01-01']
date = [datetime.strptime(each, "%Y-%m-%d") for each in date]
# Date bounds (inclusive)
startDate = datetime.strptime('1948-1-1', "%Y-%m-%d")
endDate = datetime.strptime('1950-2-1', "%Y-%m-%d")
# Keep only the dates inside the range
date = [each for each in date if each >= startDate and each <= endDate]
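
The comprehension rebinds the name date to a new list. If other parts of a program hold a reference to the original list, a slice assignment is one way to filter that same list object in place. The following is a minimal sketch under that assumption; the helper name keep_in_range and the sample dates are hypothetical and not part of the original question.

```python
from datetime import datetime


def keep_in_range(dates, start, end):
    """Filter `dates` in place, keeping only start <= d <= end (hypothetical helper)."""
    # Slice assignment replaces the contents of the existing list object
    # instead of binding the name to a new list.
    dates[:] = [d for d in dates if start <= d <= end]


# Sample data (made up for illustration)
date = [datetime(1947, 1, 1), datetime(1948, 1, 1),
        datetime(1948, 2, 2), datetime(1951, 1, 1)]
alias = date  # a second reference to the same list

keep_in_range(date, datetime(1948, 1, 1), datetime(1950, 2, 1))
print(date)
print(alias is date)
```

Because the list object is reused, alias sees the filtered contents as well.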

Taking the solution further: download the data from Google Drive, filter the needed rows with pandas, and then plot the result for analysis.

Step 1: download the data

import pandas as pd
import requests
from io import StringIO

gd_url='https://drive.google.com/file/d/1N2J136mog2CZK_XRyL3pxocaoUV8DByS/view?usp=sharing'
file_id = gd_url.split('/')[-2]
download_url='https://drive.google.com/uc?export=download&id=' + file_id
url = requests.get(download_url).text # get the file
csv_raw = StringIO(url)
df = pd.read_csv(csv_raw)
print(df.head(1))

Step 2: filter the data

# Date bounds (inclusive)
startDate = '1948-01-01'
endDate = '1950-02-01'
# This compares strings, which only works if DATE holds zero-padded YYYY-MM-DD text;
# otherwise convert the column to datetime before comparing.
df_new = df.loc[(df['DATE'] >= startDate) & (df['DATE'] <= endDate)]
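
The string comparison in Step 2 relies on DATE being zero-padded YYYY-MM-DD text. If that is in doubt, converting the column first turns it into a true date comparison. Below is a minimal sketch, assuming a DATE column as in the answer; the small DataFrame and its VALUE column are made up so the example runs without the download.

```python
import pandas as pd

# Made-up stand-in for the downloaded CSV; only the DATE column name comes from the answer.
df = pd.DataFrame({
    'DATE': ['1947-01-01', '1948-01-01', '1949-06-15', '1950-02-01', '1951-01-01'],
    'VALUE': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Parse the text column into real datetime values.
df['DATE'] = pd.to_datetime(df['DATE'])

start_date = pd.Timestamp('1948-01-01')
end_date = pd.Timestamp('1950-02-01')

# Series.between is inclusive on both ends by default.
df_new = df.loc[df['DATE'].between(start_date, end_date)]
print(df_new)
```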

Step 3: show the graph.

import pandas as pd
import matplotlib.pyplot as plt
df_new.plot()
plt.show() 


0
On

Here is a similar snippet of code with the same problem as yours.

numbers = list(range(10))

for i in range(len(numbers)):
    if numbers[i] < 3:
        del numbers[i]
    elif numbers[i] > 7:
        del numbers[i]

The problem with this is that the range(len(numbers)) is created right at the beginning of the loop, and it doesn't notice that the length of numbers has changed as it iterates.
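
To see the effect concretely, the sketch below runs the same loop but catches the exception and records where it happened. The variable name failed_index is introduced only for this illustration.

```python
numbers = list(range(10))
failed_index = None  # hypothetical name, used only to record where the loop breaks

try:
    # range(len(numbers)) is fixed at range(10) before any deletion happens.
    for i in range(len(numbers)):
        if numbers[i] < 3:
            del numbers[i]
        elif numbers[i] > 7:
            del numbers[i]
except IndexError:
    failed_index = i

print(failed_index, len(numbers), numbers)
# → 7 7 [1, 3, 4, 5, 6, 7, 9]
```

The loop stops at index 7 with only 7 items left, and the surviving list still contains 1 and 9: each deletion shifts the later items left, so the loop both skips elements and eventually indexes past the end.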

This could be fixed with a while loop:

numbers = list(range(10))

i = 0
while i < len(numbers):
    if numbers[i] < 2:
        del numbers[i]
    elif numbers[i] > 7:
        del numbers[i]
    else:
        i += 1

print(numbers)

Note that i is only incremented when nothing is deleted from the list. If, say, index 1 is deleted, the item at index 2 moves left to fill the gap, so index 1 needs to be checked again.

However, this solution is verbose, not very Pythonic, and quite inefficient (O(n^2) complexity, since deleting an item from a list is O(n) and it may be done n times). I would recommend using a list comprehension to filter values like this:

numbers = list(range(10))

print([number for number in numbers if 2 <= number <= 7])

Or, if more complex computation is needed inside the loop, you could append to a new list (O(n) complexity in total):

numbers = list(range(10))
new_numbers = []
for i in range(len(numbers)):
    do_delete = False
    if numbers[i] < 2:
        do_delete = True
    elif numbers[i] > 7:
        do_delete = True

    if not do_delete:
        new_numbers.append(numbers[i])

print(new_numbers)

Or you could use a generator function (also O(n)):

numbers = list(range(10))

def my_filter(numbers):
    for i in range(len(numbers)):
        do_delete = False
        if numbers[i] < 2:
            do_delete = True
        elif numbers[i] > 7:
            do_delete = True

        if not do_delete:
            yield numbers[i]

print(list(my_filter(numbers)))
0
On

Use the sample code below:

from datetime import datetime


startDate = datetime.strptime('1948-1-1',"%Y-%m-%d")
endDate = datetime.strptime('1950-2-1',"%Y-%m-%d")


date_list = []

date_list.append( datetime.strptime('1949-1-1',"%Y-%m-%d"))
date_list.append( datetime.strptime('1949-2-1',"%Y-%m-%d"))
date_list.append( datetime.strptime('1949-2-3',"%Y-%m-%d"))
date_list.append( datetime.strptime('1950-2-3',"%Y-%m-%d"))
date_list.append( datetime.strptime('1950-2-1',"%Y-%m-%d"))
date_list.append( datetime.strptime('1999-2-1',"%Y-%m-%d"))
date_list.append( datetime.strptime('1993-2-1',"%Y-%m-%d"))
date_list.append( datetime.strptime('1995-2-1',"%Y-%m-%d"))


# Build a new list containing only the dates inside the range (inclusive)
new_list = []

for date in date_list:
    if startDate <= date <= endDate:
        new_list.append(date)

print(new_list)