How to modify subsetting and datetime handling with .loc[] to avoid SettingWithCopyWarning?


I am practising time series analysis with real data. The difficult part is the data wrangling.

This exercise shows the trend in local passenger departures at one of Hong Kong's border control points in 2023.

Jupyter warned me twice about subsetting and datetime handling, but I do not know how to change my code accordingly, or why the warning is raised. I would be grateful if you could point out the solution. Thank you very much.

The code is here:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from datetime import date
from datetime import datetime

df = pd.read_csv("https://www.immd.gov.hk/opendata/eng/transport/immigration_clearance/statistics_on_daily_passenger_traffic.csv")

df = df.iloc[: , :-1]
df = df[df["Date"].str.contains("2023") == True]

The first warning:

df["Date"] = df["Date"].apply(lambda x: datetime.strptime(str(x), "%d-%m-%Y"))
SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Date"] = df["Date"].apply(lambda x: datetime.strptime(str(x), "%d-%m-%Y"))
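For reference, here is a minimal sketch of one common way to avoid this first warning. It assumes the warning comes from assigning a column to a frame produced by the boolean filter on "Date", so it takes an explicit .copy() after filtering. It also swaps the apply/strptime call for pd.to_datetime, which is a substitution and not part of the script above. The rows in `raw` are made up and stand in for the real CSV.

```python
import pandas as pd

# Made-up sample rows standing in for the real CSV
# (assumption: dates are DD-MM-YYYY strings, as the strptime format suggests).
raw = pd.DataFrame({
    "Date": ["31-12-2022", "01-01-2023", "02-01-2023"],
    "Control Point": ["Lo Wu", "Lo Wu", "Airport"],
    "Hong Kong Residents": [10, 20, 30],
})

# Filter first, then take an explicit copy so that later column
# assignments target an independent DataFrame rather than a slice.
df = raw[raw["Date"].str.contains("2023")].copy()

# Vectorised parsing; replaces apply(lambda x: datetime.strptime(...)).
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

print(df.dtypes)
```

The .copy() call tells pandas that the filtered frame is meant to be independent, so assigning to df["Date"] afterwards is unambiguous.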

Continue the script:

options = ['Airport', 'Express Rail Link West Kowloon', 'Lo Wu', 'Lok Ma Chau Spur Line', 'Heung Yuen Wai', 'Hong Kong-Zhuhai-Macao Bridge', 'Shenzhen Bay'] 
df_clean = df.loc[df['Control Point'].isin(options)] 
df_XRL = df[df["Control Point"].str.contains("Heung Yuen Wai") & df["Arrival / Departure"].str.contains("Departure")]

df_XRL = df_XRL[["Date","Hong Kong Residents"]]

The second warning:

df_XRL['Month'] = pd.DatetimeIndex(df_XRL['Date']).strftime("%b")
df_XRL['Week day'] = pd.DatetimeIndex(df_XRL['Date']).strftime("%a")
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_XRL['Month'] = pd.DatetimeIndex(df_XRL['Date']).strftime("%b")
C:\Users\User\AppData\Local\Temp\ipykernel_28232\787368475.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_XRL['Week day'] = pd.DatetimeIndex(df_XRL['Date']).strftime("%a")
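The second warning probably has the same cause: df_XRL is built by slicing df twice (rows, then columns), and new columns are then assigned to that slice. A minimal sketch of one way around it, assuming "Date" has already been converted to datetime, is to select rows and columns in a single .loc call, copy the result, and use the .dt accessor instead of pd.DatetimeIndex. The sample rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; "Date" is assumed to be datetime64 already.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-02", "2023-01-03", "2023-02-04"]),
    "Control Point": ["Heung Yuen Wai", "Heung Yuen Wai", "Lo Wu"],
    "Arrival / Departure": ["Departure", "Arrival", "Departure"],
    "Hong Kong Residents": [100, 200, 300],
})

# Build the row mask once, using the same conditions as the script above.
mask = (
    df["Control Point"].str.contains("Heung Yuen Wai")
    & df["Arrival / Departure"].str.contains("Departure")
)

# Select rows and columns in one step and copy the result,
# so the new columns are added to an independent DataFrame.
df_XRL = df.loc[mask, ["Date", "Hong Kong Residents"]].copy()

# The .dt accessor formats each timestamp; output depends on the locale.
df_XRL["Month"] = df_XRL["Date"].dt.strftime("%b")
df_XRL["Week day"] = df_XRL["Date"].dt.strftime("%a")

print(df_XRL)
```

With the copy in place, the two column assignments should no longer trigger SettingWithCopyWarning, and the rest of the script can use df_XRL unchanged.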

Continue the script:

from numpy import nan

monthOrder = ['Jan', 'Feb', 'Mar', 'Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
dayOrder = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']

pivot_XRL = pd.pivot_table(df_XRL, index=['Month'],
                        values=['Hong Kong Residents'],
                        columns=['Week day'], aggfunc=('sum')).loc[monthOrder, (slice(None), dayOrder)]

pivot_XRL.plot(figsize = (20,8))
plt.grid()
plt.legend(loc='best');
plt.xlabel('Month',fontsize=15)
plt.ylabel('Persons',fontsize=15)
plt.rc('xtick',labelsize=15)
plt.rc('ytick',labelsize=15)
plt.legend(fontsize="x-large")