Error using a GAM to analyse phytoplankton abundances and environmental parameters (mgcv package)


I am quite new to R and new to this forum. I have been trying to use a GAM to model phytoplankton species count data against environmental predictors, but I am stuck with an error.
My code is the following:

library(mgcv)      # gam()
library(caret)     # createDataPartition()
library(magrittr)  # %>% pipe
file <- read.csv("sg1.csv", header = TRUE, sep = ";", dec = ".", check.names = FALSE, na.strings = c("", "NA")) # my dataset contains empty cells that I replace with NA
data.selected <- file[, c(5, 6, 14:19)] # I select only the columns I am interested in
data.no_na <- na.omit(data.selected)
colnames(data.no_na) <- c("T", "S", "P", "Si", "DIN", "DIN_P", "Si_DIN", "Diato")
set.seed(123)
training.samples <- data.no_na$Diato %>% createDataPartition(p = 0.8, list = FALSE) # to use Diato (diatoms) as the outcome variable
train.data <- data.no_na[training.samples, ]
test.data <- data.no_na[-training.samples, ]
model <- gam(Diato ~ s(T) + s(S) + s(P) + s(Si), data = train.data)

When I run the code, I get this error:

Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors

I saw that this happens only when I include the T term, s(T), in the model formula. If I print the values with data.no_na$T, I get the list of values followed by '3011 Levels: 1.25321 10 10.001 10.0043 10.0094 10.025 10.0304 ... S' at the end.

Can someone help me understand what is going on and what I am doing wrong? Thank you in advance! Please let me know if you need any further information.
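
The "Levels:" line and the Ops.factor warning both indicate that T is stored as a factor rather than as a number, and the trailing "S" level suggests that at least one cell in that column holds text. The following is a minimal sketch of how this could be checked, using made-up toy values in place of the real column; the names temp, temp_chr, temp_num and bad are hypothetical, and whether the real file contains exactly this kind of stray entry is an assumption.

```r
# Toy stand-in for the T column (assumed values, not the real data):
# a single text entry ("S") makes the whole column non-numeric, and
# read.csv() in R versions before 4.0 then stores it as a factor.
temp <- factor(c("10.001", "1.25321", "S", "10.025"))

is.numeric(temp)   # FALSE: s() in gam() needs a numeric covariate
levels(temp)       # the stray "S" appears among the levels

# Convert via character first; as.numeric() applied directly to a factor
# returns the internal level codes, not the original values.
temp_chr <- as.character(temp)
temp_num <- suppressWarnings(as.numeric(temp_chr))

# Entries that could not be parsed as numbers (candidates for cleaning)
bad <- temp_chr[is.na(temp_num) & !is.na(temp_chr)]
bad
```

Running the same checks on data.no_na$T (and str(data.no_na) for all columns) would show which entries prevent the column from being read as numeric; those rows would need to be corrected or dropped before fitting the model.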
