uniroot() function in source code does not work with modification; Could not figure out the error

419 Views Asked by At

I was trying to find out coordinates of the intersection of two curves in R. The input data are coordinates of empirical points from the two curves. My solution is to use the function curve_intersect(). I need to do this for 2000 replications (i.e., 2000 pairs of curves). So I put the data in two lists. Each list contains 1000 data frames with x & y coordinates of one curve in each data frame.

Here is my data: data

Below are the code that I used.

threshold_or1 <- map2_df(recall_or1_4, precision_or1_4,
                         ~curve_intersect(.x, .y, empirical = TRUE, domain = NULL))

# recall_or_4 is a list of 2000 data frames. Each data frame 
# |contains coordinates from curve #1. 

# precision_or_4 is a list of 2000 data frames. Each data frame 
# |contains coordinates from curve #2.

I got this error message below.

Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : f() values at end points not of opposite sign

Since the function curve_intersect() can be successfully applied to some individual data frames from the two lists. I ran the following code in order to see exactly which pair of data frames made the process fail.

test <- for (i in 1:2000){
            curve_intersect(recall_or1_4[[i]], precision_or1_4[[i]], empirical = TRUE, domain = NULL)
            print(paste("i=",i))}

Then, I got the following message, which means that the process ran successfully until it reaches the data pair #460. So I checked that individual data pair.

[1] "i= 457"
[1] "i= 458"
[1] "i= 459"
Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : f() values at end points not of opposite sign

I plotted data pairs #460.

test1 <- precision_or1_4[[460]] %>% mutate(statistics = 'precision')
test2 <- recall_or1_4[[460]] %>% mutate(statistics = 'recall')
test3 <- rbind(test1, test2)
test3 <- test3 %>% mutate(statistics = as.factor(statistics))
curve_test3 <- ggplot(test3, aes(x = x, y = y))+
        geom_line(aes(colour = statistics))
curve_test3

Find coordinates of the intersection point

I then went to modify the source code of curve_intersect(). The original source code is

    curve_intersect <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
        if (!empirical & missing(domain)) {
                stop("'domain' must be provided with non-empirical curves")
        }
        
        if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
                stop("'domain' must be a two-value numeric vector, like c(0, 10)")
        }
        
        if (empirical) {
                # Approximate the functional form of both curves
                curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
                curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
                
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                point_x <- uniroot(function(x) curve1_f(x) - curve2_f(x),
                                   c(min(curve1$x), max(curve1$x)))$root
                
                # Find where point_x is in curve 2
                point_y <- curve2_f(point_x)
        } else {
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                # within the given domain
                point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2(point_x)
        }
        
        return(list(x = point_x, y = point_y))
}

I modified the uniroot() part from the third if statement. Instead of using c(min(curve1$x), max(curve1$x)) as an argument of uniroot(), I used lower = -100000000, upper = 100000000. The modified function is

curve_intersect_tq <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
        if (!empirical & missing(domain)) {
                stop("'domain' must be provided with non-empirical curves")
        }
        
        if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
                stop("'domain' must be a two-value numeric vector, like c(0, 10)")
        }
        
        if (empirical) {
                # Approximate the functional form of both curves
                curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
                curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
                
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                point_x <- uniroot(function(x) curve1_f(x) - curve2_f(x),
                                   lower = -100000000, upper = 100000000)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2_f(point_x)
        } else {
                # Calculate the intersection of curve 1 and curve 2 along the x-axis
                # within the given domain
                point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
                
                # Find where point_x is in curve 2
                point_y <- curve2(point_x)
        }
        
        return(list(x = point_x, y = point_y))
}

I tried to change the values of lower =, upper = arguments. It did not work. I got the same error message as shown below.

curve_intersect_tq(recall_or1_4[[460]], precision_or1_4[[460]], empirical = TRUE, domain = NULL)

Error in uniroot(function(x) curve1_f(x) - curve2_f(x), c(min(curve1$x),  : 
  f() values at end points not of opposite sign

I also tried to use possibly(fun, NA) from the tidyverse package hoping that the process can run even with an error message. It did not work when I used

(1) possibly(curve_intersect(), NA) or (2) possibly(uniroot(), NA)

The same error message appeared.

Why do I have the error message? What could be possible solutions? Thanks in advance.

1

There are 1 best solutions below

0
On BEST ANSWER

Might be a little late to the party, but here's why your code still fails and what you could do, depending on what you want to get out of your analysis:

First of all, the reason why your code fails, even after adaptation, is that you are merely telling uniroot to search a wider window in x. However, the underlying curves will never intersect - there just isn't any curve1_f(x) - curve2_f(x) == 0 to be found.

From the doc of uniroot:

"The function values at the endpoints must be of opposite signs (or zero), for extendInt="no", the default."

In the original curve_intersect implementation, uniroot is searching the x-interval that is defined in your data (that's the c(min(curve1$x), max(curve1$x))). In your alteration, you're telling it to search in the x interval [-100000000, 100000000]. You could as well have set extendInt = "yes", but it wouldn't change anything.
The problem doesn't lie in the search interval, it lies with approxfun!

approxfun merely helps you by interpolating empirical data between points. Outside of the data you pass in, the returned function wouldn't know what to do.
approxfun allows you to specify explicit values for y which should be returned outside the empirically defined window (with its params yleft/yright) or lets you set a rule for each side.
In the code you posted above, rule = 2 decides that "the value at the closest data extreme is used". So, approxfun does not extrapolate the data you pass in. It only extends the known.

We can plot how curve1_f and curve2_f will extend outside the empirically defined x-interval into infinity:

tibble(
    x = seq(0, 1, by = 0.001),
    curve1_approxed = curve1_f(x),
    curve2_approxed = curve2_f(x)
  ) %>%
  pivot_longer(starts_with("curve"), names_to = "curve", values_to = "y") %>%
  ggplot(aes(x = x, y = y, color = curve)) +
  geom_line() +
  geom_vline(xintercept = c(min(curve1$x), max(curve1$x)), color = "grey75")

approxfun outside empirically defined x interval


So, now to what you can do to get your code to not crash:
(spoiler: it pretty much depends on what you're trying to accomplish with your project)

  1. accept that there is no intersection in the observed limits of your data.
    If you don't want to make any assumptions, I'd suggest you wrap your mapped function in a tryCatch statement and let it fail where the out-of-the-box solution doesn't give you any results. Let's run this for the part of your list that previously made the whole thing crash:
threshold_or1.fix1 <- map2_df(
  recall_or1_4, precision_or1_4,
  ~tryCatch({
    curve_intersect(.x, .y, empirical = TRUE, domain = NULL)
  }, error = function(e){
    return(tibble(.rows = 1))
  }),
  .id = "i"
)

Now, there is just a NA row when curve_intersect isn't able to give you a result.

threshold_or1.fix1[459:461,]
# A tibble: 3 x 3
  i          x      y
  <chr>  <dbl>  <dbl>
1 459    0.116  0.809
2 460   NA     NA    
3 461    0.264  0.773
  1. try to extrapolate your data with a linear model
    In this case, we'll use a custom curve_intersect-function. Let's wrap the problematic uniroot call in a tryCatch and if no root can be found, we'll fit a lm for each curve and let uniroot find an intersection on the fitted linears.
    That might or might not make sense in the light of your experiment, so I'll let you be the judge here. And obviously you can use other models than the simplistic lm if your data is more complex than that...
    Just to visualize this approach vs the default:
tibble(
    x = seq(-1, 2, by = 0.001),
    curve1_approxed = curve1_f(x),
    curve2_approxed = curve2_f(x),
    curve1_lm = predict(lm(y ~ x, data = curve1), newdata = tibble(x = x)),
    curve2_lm = predict(lm(y ~ x, data = curve2), newdata = tibble(x = x))
  ) %>%
  pivot_longer(starts_with("curve"), names_to = "curve", values_to = "y") %>%
  ggplot(aes(x = x, y = y, color = curve)) +
  geom_line() +
  geom_vline(xintercept = c(min(curve1$x), max(curve1$x)), color = "grey75")

approxfun vs lm outside empirically defined x interval
You see, where approxfun "fails", with lm we make that assumption that we can extrapolate linearly and find an intersection around x = 1.27 outside of your observed frame.

To go for that second approach and include an extrapolation with lm in our search, you could throw together something like this:
(here, too, only the third if is edited.)

curve_intersect_custom <- function(curve1, curve2, empirical=TRUE, domain=NULL) {
  if (!empirical & missing(domain)) {
    stop("'domain' must be provided with non-empirical curves")
  }
  
  if (!empirical & (length(domain) != 2 | !is.numeric(domain))) {
    stop("'domain' must be a two-value numeric vector, like c(0, 10)")
  }
  
  if (empirical) {
    
    return(
      tryCatch({
        # Approximate the functional form of both curves
        curve1_f <- approxfun(curve1$x, curve1$y, rule = 2)
        curve2_f <- approxfun(curve2$x, curve2$y, rule = 2)
        
        # Calculate the intersection of curve 1 and curve 2 along the x-axis
        point_x <- uniroot(
          f = function(x) curve1_f(x) - curve2_f(x),
          interval = c(min(curve1$x), max(curve1$x))
        )$root
        
        # Find where point_x is in curve 2
        point_y <- curve2_f(point_x)
        
        return(list(x = point_x, y = point_y, method = "approxfun"))
        
      }, error = function(e) {
        tryCatch({
          curve1_lm_f <- function(x) predict(lm(y ~ x, data = curve1), newdata = tibble(x = x))
          curve2_lm_f <- function(x) predict(lm(y ~ x, data = curve2), newdata = tibble(x = x))
          
          point_x <- uniroot(
            f = function(x) curve1_lm_f(x) - curve2_lm_f(x),
            interval = c(min(curve1$x), max(curve1$x)),
            extendInt = "yes"
          )$root
          
          point_y <- curve2_lm_f(point_x)
          
          return(list(x = point_x, y = point_y, method = "lm"))
          
        }, error = function(e) {
          return(list(x = NA_real_, y = NA_real_, method = NA_character_))
        })
      })
    )
    
    
  } else {
    # Calculate the intersection of curve 1 and curve 2 along the x-axis
    # within the given domain
    point_x <- uniroot(function(x) curve1(x) - curve2(x), domain)$root
    
    # Find where point_x is in curve 2
    point_y <- curve2(point_x)
  }
  
  return(list(x = point_x, y = point_y))
}

For your problematic list elements, this now tries to extrapolate with the naively fitted lm model:

threshold_or1.fix2 <- map2_df(
    recall_or1_4, precision_or1_4,
    ~curve_intersect_custom(.x, .y, empirical = TRUE, domain = NULL),
    .id = "i"
)

threshold_or1.fix2[459:461,]
# A tibble: 3 x 4
  i         x     y method   
  <chr> <dbl> <dbl> <chr>    
1 459   0.116 0.809 approxfun
2 460   1.27  0.813 lm       
3 461   0.264 0.773 approxfun

Hope this helps a little in understanding and fixing your issue :)