I am trying to calculate the sum of volume for the last thirty days for a set of stocks on particular days in the table important_stock_dates. The table all_stock_dates contains the same stocks but with trading volume for all dates, not just the particular days.
Sample data
all_stock_dates
stockid, date, volume  
0231245, 20060314, 153  
0231245, 20060315, 154  
2135411, 20060314, 23  
important_stock_dates
stockid, date, thirtydaysprior  
0231245, 20060314, 20060130  
0231245, 20060315, 20060201  
2135411, 20060314, 20060130  
My code
create table sum_trading_volume as
select a.stockid, a.date, sum(b.volume) as thirty_day_volume
from important_stock_dates a, all_stock_dates b
where b.date<a.date AND b.date ge a.thirtydaysprior
group by a.stockid, a.date;
Desired outcome
A table with all the observations from important_stock_dates that also has the sum of the volume from the previous 30 days based on matching stockid and dates in all_stock_dates.
Problem
The problem I'm running into is that important_stock_dates has 15 million observations and all_stock_dates has 350 million. It uses up a few hundred gigabytes of swap file running this code (maxes out the hard drive) then aborts. I can't see how to optimize the code. I couldn't find a similar problem on StackOverflow or Google.
 
                        
Presumably, the query that you want joins on
stockid:If this worked, it will probably run to completion.