Pulling weather data into excel in bulk

92 Views Asked by At

We have a government source of weather data at https://climate.weather.gc.ca/historical_data/search_historic_data_e.html

I go and search for the station I want and then get to hourly data. It shows me one day worth, but will let me download the whole month as a csv. However, I am hoping to download or get many years of hourly data for the specific station. I would like to not do this one month at a time. I was wondering if anyone has insight on how to efficiently do this?

There is a section for get more data: https://collaboration.cmc.ec.gc.ca/cmc/climate/Get_More_Data_Plus_de_donnees/ but I am not well versed enough to fully understand how to use this and then what is the best way to get it into excel format.

Thanks!!

I'm not sure how to pull more than the one month csv file which is what I have tried so far.

1

There are 1 best solutions below

0
On

The site says

Hourly data interval:

for year in seq 1998 2008;do for month in seq 1 12;do wget --content-disposition "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=1706&Year=${year}&Month=${month}&Day=14&timeframe=1&submit=Download+Data"

So the server will only give one month per request. However, if we can make a table of years and months (as numbers) then we can make a list of urls and have PowerQuery get them all!

So let's say you want to get 5 years of data starting with November 2001. We can get the URL per month with a query like...

let
  StartDate = #date(2001, 11, 1),
  Years = 5,
  Dates = #table(
    type table [Date=date],
    {List.Dates(
      StartDate,
      Years * 12, //months
      #duration(0,1,0,0,0,0)
    )}
  ),
  #"Added Year" = Table.AddColumn(
    Dates,
    "Year",
    each Date.Year([Date]),
    Int64.Type
  ),
  #"Added Month" = Table.AddColumn(
    #"Added Year",
    "Month",
    each Date.Month([Date]),
    Int64.Type
  ),
  #"Added Day" = Table.AddColumn(
    #"Added Month",
    "Day",
    each Date.Day([Date]),
    Int64.Type
  ),
  #"URLs" = Table.TransformRows(
    #"Added Day",
    each
      "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=1706&Year=" &
      Text.From([Year]) &
      "&Month=" &
      Text.From([Month]) &
      "&Day=" &
      Text.From([Day]) &
      "&timeframe=1&submit=Download+Data"
  )
in
  #"URLs"

If you called that query "URLs", and then create a function query called "fCsvFromUrl" with code like

(URL as text) as table =>
let
  ToBinary = Web.Content(URL),
  ToText = Binary.ToText(ToBinary),
  ToTable = Csv.Document(ToText),
  #"Promoted Headers" = Table.PromoteHeaders(ToTable)
in
  #""Promoted Headers"

Then the combined data would be:

let
  URLs = URLs,
  Responses = List.Transform(
    URLs,
    fCsvFromUrl
  ),
  Combined = Table.Combine(Responses)
in
  Combined

Now, I have not debugged any of this, but once you debug I believe this will work!