Proper usage of the colon-equals (:=) operator


I used the := operator in R to manipulate my data set, but the way I am using it throws an error.

I tried other functions such as c() to create subsets, but I need something efficient, and apparently := should do the job. With the subset function I end up with a lot of intermediate data frames, which are of course unnecessary.

# Preprocessing: set non-positive values to NA, then drop the rows that contain NA
df_data[Quantity<=0,Quantity:=NA]
df_data[UnitPrice<=0,UnitPrice:=NA]
df_data <- na.omit(df_data)

(from the console):

> df_data[Quantity<=0,Quantity:=NA]
Error in `:=`(Quantity, NA) : 
 Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

There are 2 best solutions below

James B

:= only works on data.table objects, not on plain data frames.

This should work:

library(data.table)

df_data <- data.table(Quantity = -5:5)
df_data[Quantity<=0,Quantity:=NA]  # df_data is a data.table, so := is valid in j
na.omit(df_data)

This will produce the error, because df_data is a plain data.frame:

df_data <- data.frame(Quantity = -5:5)
df_data[Quantity<=0,Quantity:=NA]
na.omit(df_data)
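
If df_data already exists as a plain data.frame, one way to make := available is to convert it first. The following is a minimal sketch, assuming the data.table package is installed; the sample values are made up for illustration and are not the asker's data.

```r
library(data.table)

# Hypothetical sample data standing in for the asker's df_data
df_data <- data.frame(Quantity  = c(-2, 0, 3, 5),
                      UnitPrice = c(1.5, 2.0, -1.0, 4.0))

# Convert the data.frame to a data.table by reference
setDT(df_data)
stopifnot(is.data.table(df_data))

# := is now valid: mark non-positive values as NA
df_data[Quantity <= 0, Quantity := NA]
df_data[UnitPrice <= 0, UnitPrice := NA]

# Drop every row that contains an NA
df_clean <- na.omit(df_data)
print(df_clean)
```

as.data.table(df_data) would also work if you prefer to keep the original data.frame and get a converted copy instead.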

That said, if you're just filtering out values less than or equal to 0, you could do this in one step (df_data still needs to be a data.table):

df_data <- df_data[Quantity > 0 & UnitPrice > 0]
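
The one-step filter is not always identical to the two-step approach from the question: na.omit drops a row if any column is NA, while the filter only looks at Quantity and UnitPrice. A small sketch with made-up values (the Description column is hypothetical and only there to show the difference):

```r
library(data.table)

dt <- data.table(Quantity    = c(-1, 2, 3, NA),
                 UnitPrice   = c(1.0, 2.5, 4.0, 3.0),
                 Description = c("a", NA, "c", "d"))

# One-step filter: keeps rows where both conditions are TRUE.
# data.table treats an NA in the logical i expression as FALSE,
# so the row with a missing Quantity is dropped as well.
filtered <- dt[Quantity > 0 & UnitPrice > 0]

# Two-step approach from the question, applied to a copy so dt stays untouched
two_step <- copy(dt)
two_step[Quantity <= 0, Quantity := NA]
two_step[UnitPrice <= 0, UnitPrice := NA]
two_step <- na.omit(two_step)

print(filtered)  # still contains the row with a missing Description
print(two_step)  # that row is gone, because na.omit checks every column
```

If the data has no missing values in other columns, both approaches should give the same rows.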

Maleeha

I fixed the problem by loading the dataset with fread instead of read.csv. fread returns a data.table, whereas read.csv returns a plain data.frame, so := now works.

Here is also a useful link for understanding the difference between fread and read.csv:
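
To see the difference between the two loaders, here is a short sketch that writes a made-up CSV to a temporary file and reads it back both ways. The file contents are an assumption for illustration only.

```r
library(data.table)

# Write a small made-up CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Quantity = c(-1, 2), UnitPrice = c(3.0, 4.5)),
          tmp, row.names = FALSE)

df <- read.csv(tmp)  # plain data.frame
dt <- fread(tmp)     # data.table (which also inherits from data.frame)

print(class(df))
print(class(dt))

# := works on the fread result without any conversion
dt[Quantity <= 0, Quantity := NA]
print(dt)

unlink(tmp)  # clean up the temporary file
```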

Reason behind speed of fread in data.table package in R