What ML algorithm should I use to determine fish age-at-maturity?


Fish age-at-maturity is the age at which the slope of the growth curve (length against age), i.e. the growth rate, changes. Here is an example of a simulated individual fish and its two growth rates.

I want to create an algorithm that will determine age-at-maturity from age and length data like those in the picture I attached. I have very little idea of what kind of algorithm would be useful or how to apply it to my sample data set:

> head(data)
  age     length
1   0 0.01479779
2   1 0.05439856
3   2 0.18308919
4   3 0.24380771
5   4 0.37759992
6   5 0.44871502

It was suggested that I try the Cox proportional hazards model. To do that, I would treat age-at-maturity as a time to event (maturity is the event, and age is the time at which maturity is reached). I tried fitting that model but got this error:

> cox.model <- coxph(Surv(age ~ length), data = data)
Error in Surv(age ~ length) : Time variable is not numeric

I tried converting both variables to numeric with as.numeric(), but that did not help.

Please let me know if I am using this model wrong or if I should be using a different model.


There are 2 best solutions below


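As far as I know, time-to-event data must include an event indicator, i.e. a binary variable recording whether the event occurred. If maturity is the event, it should be included in the dataset as such a binary variable. Also, Surv() takes the time and the event as two separate arguments, and the covariates go after the tilde, so the call would be cox.model <- coxph(Surv(age, maturity) ~ length, data = data). Your call Surv(age ~ length) passes a whole formula to Surv() as its time argument, which is why it reports that the time variable is not numeric.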
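A minimal sketch of that call on a small hypothetical dataset follows. The `fish` data frame, its `maturity` column and all of its values are made up for illustration: one row per fish, with the age at which it matured (or was last seen immature) and a 0/1 event indicator.

```r
library(survival)

set.seed(1)

# Hypothetical data, one row per fish:
#   age      - age (years) at which the fish matured, or was last seen immature
#   maturity - event indicator: 1 = observed to mature at `age`, 0 = censored
#   length   - a covariate (illustrative random values only)
fish <- data.frame(
  age      = c(3, 4, 4, 5, 5, 6, 6, 7, 8, 9),
  maturity = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
  length   = round(runif(10, min = 0.3, max = 0.6), 3)
)

# Time and event go inside Surv(); covariates go after the tilde.
cox.model <- coxph(Surv(age, maturity) ~ length, data = fish)
summary(cox.model)
```

Note that this needs a different data layout (one row per fish plus an event column) from the age/length growth series in the question.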

Please check the manuals for more details:

  1. Survival package
  2. Cox model

By the way, the figure looks like it was made with something like segmented regression and ggplot, so I think you may want to use that technique. Here is an example.
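As a sketch of how such a figure could be produced (assuming the segmented and ggplot2 packages are installed; the growth data below are simulated, not the asker's), fit an ordinary linear model, let segmented() estimate one break point in age, and draw the fitted broken line plus a vertical line at the break point:

```r
library(segmented)
library(ggplot2)

set.seed(42)

# Simulated growth data for one fish: fast growth up to age 4, slow afterwards.
df <- data.frame(age = 0:20)
df$length <- ifelse(df$age <= 4, 0.11 * df$age, 0.44 + 0.01 * (df$age - 4)) +
  rnorm(nrow(df), sd = 0.01)

# Ordinary linear fit, then segmented() estimates a single break point in age.
lm_fit <- lm(length ~ age, data = df)
seg_fit <- segmented(lm_fit, seg.Z = ~ age)
df$pred <- fitted(seg_fit)
break_age <- seg_fit$psi[1, "Est."]

p <- ggplot(df, aes(x = age, y = length)) +
  geom_point() +
  geom_line(aes(y = pred), colour = "blue") +
  geom_vline(xintercept = break_age, colour = "red", linetype = "dashed") +
  labs(x = "Age", y = "Length",
       title = sprintf("Estimated change in growth rate at age %.1f", break_age))
print(p)
```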


I agree with @C.C. that 1) a survival model is not applicable to the provided dataset, and 2) a simple piecewise (segmented) linear regression would be more appropriate.
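For reference, a piecewise linear ("broken-stick") model with a single break point can be written as below, where ψ is the break point, i.e. the estimated age-at-maturity, and (x)_+ = max(x, 0):

$$
\mathrm{length}_i = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,(\mathrm{age}_i - \psi)_+ + \varepsilon_i
$$

The growth rate is β₁ before ψ and β₁ + β₂ after it, so a change in slope at ψ is exactly the change in growth rate described in the question. In the segmented package this break point parameter is called psi.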

Please see the proposed R code below; it plots the data points, the segmented fit and the estimated break point.

library(segmented)

# create dummy data set, extended from the provided one, with added noise
set.seed(1)  # make the simulated noise reproducible
df <- data.frame(
  age = seq(from = 0, to = 20, by = 1),
  length = c(
    seq(from = 0, to = 0.45, length.out = 5) + rnorm(5, mean = 1e-3, sd = 1e-2),
    seq(from = 0.48, to = 0.6, length.out = 16) + rnorm(16, mean = 1e-3, sd = 1e-2)
    )
)

# fit normal linear regression and segmented regression
lm1 <- lm(length ~ age, data = df)
seg_lm <- segmented(lm1, ~ age)

# estimated age break point (the "Est." column of the psi matrix)
age_break_point <- seg_lm$psi[1, "Est."]

# plot raw data points, segmented fit and age break point
plot(seg_lm, res = TRUE,
     main = paste0('Growth rate change @ ', round(age_break_point, 1), ' years old'),
     xlab = 'Age', ylab = 'Length')
abline(v = age_break_point, col = 'red')
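If you would rather not depend on the segmented package, a cruder alternative is a grid search: for each candidate break point ψ, fit length ~ age + max(age − ψ, 0) by ordinary least squares and keep the ψ with the smallest residual sum of squares. This is a swapped-in technique, not what segmented() does internally (it uses its own iterative algorithm); the helper name `fit_broken_stick` and the 0.05-year grid step are hypothetical choices for this sketch.

```r
# Hypothetical helper: grid search over candidate break points (psi).
# For each psi, fit len ~ age + pmax(age - psi, 0) and record the residual
# sum of squares; return the best psi and the slopes on either side of it.
fit_broken_stick <- function(age, len, step = 0.05) {
  # Candidates strictly inside the age range so both segments have data.
  candidates <- seq(min(age) + step, max(age) - step, by = step)
  rss <- vapply(candidates, function(psi) {
    hinge <- pmax(age - psi, 0)
    sum(resid(lm(len ~ age + hinge))^2)
  }, numeric(1))
  best <- candidates[which.min(rss)]
  hinge <- pmax(age - best, 0)
  model <- lm(len ~ age + hinge)
  list(
    psi          = best,
    model        = model,
    slope_before = unname(coef(model)["age"]),
    slope_after  = unname(coef(model)["age"] + coef(model)["hinge"])
  )
}

# Usage on simulated data: fast growth up to age 4, slow growth afterwards.
set.seed(123)
age <- 0:20
len <- ifelse(age <= 4, 0.11 * age, 0.44 + 0.01 * (age - 4)) +
  rnorm(length(age), sd = 0.01)
res <- fit_broken_stick(age, len)
res$psi           # estimated age-at-maturity
res$slope_before  # growth rate before the break
res$slope_after   # growth rate after the break
```

The grid search is easy to follow and has no convergence issues, but its resolution is limited by `step`, and it gives no standard error for ψ, which segmented() does report.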