lapply() and spline() on two data frames in R, no merging


I have two data frames (df, df5) that share a factor, "Auction_ID". df has the columns num.bidders, res.bid, and Auction_ID; df5 has bid.points and Auction_ID.

I used the smooth.spline() function to get spline estimates and saved the result as a new column in df (I am not sure whether I should save it in df5 instead):

    spline  <- smooth.spline(df$c_bidders,df$res.bid)

The question is how to use predict() on df$spline1 and df5$bid.points for each level of Auction_ID. I tried to use lapply() with df and df5 as inputs to the function, but it does not seem to work:

    lapply(df, df5, function(t, t1) {
        tt <- predict(t$spline, t1$bid.points, deriv = 0)$y
        return(tt)
    })

Would it help if I introduced a list variable?

If I use merge(df, df5, by = "Auction_ID"), I end up with a very large data frame:

    str(df1)
    Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    3967 obs. of  17 variables:

    str(df5)
    'data.frame':    18338 obs. of  2 variables:

    x <- merge(df5, df1, by = "Auction_ID")
    str(x)
    'data.frame':   501367 obs. of  19 variables:

(I already tried merge() with the "all" options, such as all.y = TRUE; they give the same number of observations, which does not suit my purpose.)

2 Answers

Best answer

Is the issue that you don't want to deal with the large merged data frame of roughly 500k rows?

Maybe a merge (aka join) isn't what you need. Perhaps you just need the match() function, which works like a vlookup: for each Auction_ID in the target data frame it returns the position of the first matching Auction_ID in the other data frame, so you can pull df$spline1 into df5 (or df5$bid.points into df) without adding rows.

See if this works for your purposes:

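    # assuming df5 is the target df (one lookup per row of df5):
    df5$spline1 <- df$spline1[match(df5$Auction_ID, df$Auction_ID)]

    ## OR

    # assuming df is the target df (one lookup per row of df;
    # only the first matching bid.points per Auction_ID is used):
    df$bid.points <- df5$bid.points[match(df$Auction_ID, df5$Auction_ID)]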
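
As a minimal sketch of how the match() lookup behaves, the toy data below (values made up for illustration, column names borrowed from the question) shows that the first argument of match() is the key column of the target data frame, so the result has one entry per target row and no rows are added:

```r
# Toy data; the values are made up for illustration.
df  <- data.frame(Auction_ID = c("a", "b", "c"),
                  spline1    = c(1.5, 2.5, 3.5))
df5 <- data.frame(Auction_ID = c("b", "a", "b", "c"),
                  bid.points = c(10, 20, 30, 40))

# For each row of df5, find the position of its Auction_ID in df.
idx <- match(df5$Auction_ID, df$Auction_ID)

# Index df$spline1 by those positions: one value per row of df5.
df5$spline1 <- df$spline1[idx]
```

df5 keeps its four rows. An Auction_ID that is missing from df would produce NA in idx and therefore NA in df5$spline1.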

Another answer

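Consider using Map(), which walks over two lists in parallel and returns a list of the values returned by predict(). Note that Map() iterates over the elements of its arguments, and the elements of a data frame are its columns. So df and df5 below should be lists with one element per Auction_ID (for example, the result of split()), with each element of df carrying its fitted spline in $spline, rather than the raw data frames:

List return

    Map(function(t, t1) predict(t$spline, t1$bid.points, deriv = 0)$y, df, df5)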

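A similar call passes the second data frame as an extra argument to lapply(). This is not equivalent to Map(): every call receives all of df5 as t1, rather than the element that matches t:

    lapply(df, function(t, t1) {
        predict(t$spline, t1$bid.points, deriv = 0)$y
    }, df5)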
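
To make the difference between pairing elements with Map() and passing a fixed extra argument to lapply() concrete, here is a small sketch on made-up lists (xs and ys are hypothetical names, not objects from the question):

```r
xs <- list(a = 1:2, b = 3:4)
ys <- list(a = 10,  b = 100)

# Map() pairs elements: f(xs$a, ys$a), then f(xs$b, ys$b).
paired <- Map(function(x, y) x * y, xs, ys)

# lapply() hands the same extra argument to every call:
# f(xs$a, 10), then f(xs$b, 10).
fixed <- lapply(xs, function(x, y) x * y, 10)
```

Here paired$b is c(300, 400) while fixed$b is c(30, 40), so the two forms agree only when the second input really is the same for every element.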

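Matrix return

Alternatively, use sapply(), which simplifies the result to a matrix when every call returns a vector of the same length (otherwise it returns a list):

    sapply(df, function(t, t1) {
        predict(t$spline, t1$bid.points, deriv = 0)$y
    }, df5)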
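
A short sketch of that simplification rule, using made-up inputs: sapply() only produces a matrix when all results have equal length, which may not hold if auctions have different numbers of bid points.

```r
# Every call returns 3 values: the result is a 3 x 2 matrix.
same_len <- sapply(list(a = 1:3, b = 4:6), function(x) x * 2)

# Calls return 3 and 2 values: the result stays a list.
diff_len <- sapply(list(a = 1:3, b = 4:5), function(x) x * 2)
```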

Or use mapply(), the base function behind Map(); Map() is a wrapper that calls mapply() with SIMPLIFY = FALSE:

    mapply(function(t, t1) predict(t$spline, t1$bid.points, deriv = 0)$y, df, df5)
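
Putting the pieces together, the following is a sketch of one possible per-auction workflow, not necessarily what the asker's data requires. It assumes one spline should be fitted per Auction_ID with num.bidders as the predictor, that every auction has at least four distinct num.bidders values (smooth.spline() needs that), and that every Auction_ID in df5 also appears in df. The toy data and the names fits, pts, and preds are hypothetical.

```r
set.seed(1)

# Hypothetical toy data: two auctions, six observations each.
df <- data.frame(
  Auction_ID  = rep(c("A1", "A2"), each = 6),
  num.bidders = rep(2:7, times = 2),
  res.bid     = c(10 + 2 * (2:7), 5 + 3 * (2:7)) + rnorm(12, sd = 0.1)
)
df5 <- data.frame(
  Auction_ID = c("A1", "A1", "A1", "A2", "A2"),
  bid.points = c(2.5, 4, 6.5, 3, 5.5)
)

# One fitted spline per auction, stored in a named list.
fits <- lapply(split(df, df$Auction_ID), function(d) {
  smooth.spline(d$num.bidders, d$res.bid)
})

# Evaluation points per auction, in the same order as `fits`.
pts <- split(df5$bid.points, df5$Auction_ID)[names(fits)]

# Pair each fit with its own points; the result is a list keyed by Auction_ID.
preds <- Map(function(fit, x) predict(fit, x, deriv = 0)$y, fits, pts)

# Optionally write the predictions back to df5 in its original row order.
df5$pred <- unsplit(preds, df5$Auction_ID)
```

Keeping the fits in a list avoids both the merge and the need to store a model object in a data frame column, and df5 keeps its original number of rows.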