I am new to Python.

This is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go

url = "https://storage.googleapis.com/courses_data/Assignment%20CSV/finance_liquor_sales.csv"
df = pd.read_csv(url)

print("Missing Data: \n", df.isna().sum())
df.dropna(inplace=True)

time_period = pd.date_range(start="2016-01-01", end="2019-12-31")
print(df[df["date"].isin(time_period)])

while df[df["date"]] in time_period:

    popular_item = df.groupby("zip_code")["bottles_sold"].sum().sort_values(ascending=False)
    print(popular_item)

    popular_item = plt.scatter(df["zip_code"], df["bottles_sold"])
    plt.title("Bottles Sold per region in 2016-2019")
    plt.xlabel("Zip Code")
    plt.ylabel("Bottles Sold")
    plt.show()

I want to visualize the bottles sold per zip code in the time range 2016-2019, and I tried to write this code

time_period = pd.date_range(start="2016-01-01", end="2019-12-31")
print(df[df["date"].isin(time_period)])

while df[df["date"]] in time_period:

to obtain that time range from my data, so that the calculations are derived only from the specified time period.

There is 1 best solution below

liaifat85 On

You can modify your code like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go

# Load the data
url = "https://storage.googleapis.com/courses_data/Assignment%20CSV/finance_liquor_sales.csv"
df = pd.read_csv(url)

# Check for missing data
print("Missing Data: \n", df.isna().sum())

# Drop rows with missing values
df.dropna(inplace=True)

# Convert 'date' column to datetime type
df['date'] = pd.to_datetime(df['date'])

# Filter data for the specified time period
start_date = "2016-01-01"
end_date = "2019-12-31"
filtered_df = df[(df['date'] >= start_date) & (df['date'] <= end_date)]

# Calculate total bottles sold per zip code
bottles_sold_per_zip = filtered_df.groupby("zip_code")["bottles_sold"].sum().sort_values(ascending=False)

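# Plotting (zip codes are cast to strings so they are drawn as category labels, not as numbers on a continuous axis)
plt.figure(figsize=(10, 6))
plt.bar(bottles_sold_per_zip.index.astype(str), bottles_sold_per_zip.values)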
plt.title("Total Bottles Sold per Zip Code (2016-2019)")
plt.xlabel("Zip Code")
plt.ylabel("Total Bottles Sold")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
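
The reason the original attempt fails is that `while` repeats a block; it does not select rows. In addition, `df[df["date"]]` treats the date values as column labels, which raises a `KeyError`. Filtering in pandas is done with a boolean mask instead. Here is a minimal sketch of the same idea on a small made-up table, so it runs without downloading the CSV. The sample rows and the names `sample`, `in_range`, `filtered` and `totals` are only illustrative and are not taken from the real dataset; `Series.between` is used here as an alternative to the two comparisons above.

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV (all values are made up).
sample = pd.DataFrame(
    {
        "date": ["2015-06-30", "2016-03-15", "2018-11-02", "2019-12-31", "2020-01-01"],
        "zip_code": ["50314", "50314", "52402", "52402", "50010"],
        "bottles_sold": [5, 12, 7, 3, 8],
    }
)

# Parse the text dates so they are compared as dates rather than as strings.
sample["date"] = pd.to_datetime(sample["date"])

# Boolean mask: True for rows whose date falls inside the range.
# between() includes both end points by default.
in_range = sample["date"].between("2016-01-01", "2019-12-31")
filtered = sample[in_range]

# Total bottles sold per zip code, largest first.
totals = (
    filtered.groupby("zip_code")["bottles_sold"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```

Only the three rows dated within 2016-2019 contribute to the totals; the 2015 and 2020 rows are dropped by the mask before the `groupby` runs.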