I was running the h2o.automl() example from http://h2o-release.s3.amazonaws.com/h2o/master/3888/docs-website/h2o-docs/automl.html. Everything went fine except for NaN values in the leaderboard. Predictions also work fine. Is it a bug, or am I doing something wrong?
library(h2o)
localH2O <- h2o.init(ip = "localhost",
                     port = 54321,
                     nthreads = -1,
                     min_mem_size = "20g")
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
y <- "response"
x <- setdiff(names(train), y)
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])
aml <- h2o.automl(x = x, y = y,
                  training_frame = train,
                  leaderboard_frame = test,
                  max_runtime_secs = 30)
lb <- aml@leaderboard
lb
model_id auc logloss
1 StackedEnsemble_0_AutoML_20170908_094736 NaN NaN
2 StackedEnsemble_0_AutoML_20170908_094407 NaN NaN
3 GBM_grid_0_AutoML_20170908_094736_model_1 NaN NaN
4 GBM_grid_0_AutoML_20170908_094407_model_0 NaN NaN
5 GBM_grid_0_AutoML_20170908_094407_model_1 NaN NaN
6 GBM_grid_0_AutoML_20170908_094736_model_0 NaN NaN
I've checked, and H2O Flow on localhost:54321 shows normal values. I also get normal values using h2o.getFrame():
h2o.getFrame("leaderboard")
model_id auc logloss
1 StackedEnsemble_0_AutoML_20170908_094736 0,787145 0,554983
2 StackedEnsemble_0_AutoML_20170908_094407 0,785154 0,556897
3 GBM_grid_0_AutoML_20170908_094736_model_1 0,778587 0,563741
4 GBM_grid_0_AutoML_20170908_094407_model_0 0,776755 0,564247
5 GBM_grid_0_AutoML_20170908_094407_model_1 0,776640 0,564436
6 GBM_grid_0_AutoML_20170908_094736_model_0 0,774611 0,566920
I'm using h2o version 3.15.0.4018:
h2o.clusterInfo()
R is connected to the H2O cluster:
H2O cluster uptime: 2 hours 8 minutes
H2O cluster version: 3.15.0.4018
H2O cluster version age: 15 hours and 47 minutes
H2O cluster name: H2O_started_from_R_maju116_ozj558
H2O cluster total nodes: 1
H2O cluster total memory: 19.03 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.4.1 (2017-06-30)
Session info:
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=pl_PL.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8
[5] LC_MONETARY=pl_PL.UTF-8 LC_MESSAGES=pl_PL.UTF-8
[7] LC_PAPER=pl_PL.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.7.2 purrr_0.2.3 readr_1.1.1 tidyr_0.7.1
[5] tibble_1.3.4 ggplot2_2.2.1 tidyverse_1.1.1 h2oEnsemble_0.2.1
[9] h2o_3.15.0.4018
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 cellranger_1.1.0 compiler_3.4.1 plyr_1.8.4
[5] bindr_0.1 forcats_0.2.0 bitops_1.0-6 tools_3.4.1
[9] lubridate_1.6.0 jsonlite_1.5 nlme_3.1-131 gtable_0.2.0
[13] lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.2 psych_1.7.5
[17] parallel_3.4.1 haven_1.1.0 bindrcpp_0.2 xml2_1.1.1
[21] httr_1.3.1 stringr_1.2.0 hms_0.3 grid_3.4.1
[25] glue_1.1.1 R6_2.2.2 readxl_1.0.0 foreign_0.8-69
[29] modelr_0.1.1 reshape2_1.4.2 magrittr_1.5 scales_0.5.0
[33] rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[37] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 RCurl_1.95-4.8
[41] broom_0.4.2
Just a hunch, but try running R in the en_US locale.
If that fixes it, I imagine what is happening is that either aml@leaderboard or h2o.getFrame("leaderboard") is choking on the comma in the floating-point numbers, and that is where the NaN is coming from. I.e. a display bug, not a data bug.
(If that does fix it, it might also be useful to know what happens if you run both H2O and R in the same pl_PL.UTF-8 locale.)
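To make the hunch concrete, here is a minimal base R sketch of how a decimal comma can end up as a missing value when text is parsed as a number, followed by one way to try the en_US locale within a session. It does not reproduce H2O's internal parsing; the variable names are illustrative, the two strings are copied from the h2o.getFrame("leaderboard") output in the question, and the locale name "en_US.UTF-8" is an assumption about what is installed on the machine.

```r
# Strings copied from the h2o.getFrame("leaderboard") output in the question.
auc_text <- c("0,787145", "0,785154")

# as.numeric() expects a decimal point, so values written with a
# decimal comma cannot be parsed and come back as NA.
parsed_raw <- suppressWarnings(as.numeric(auc_text))

# Replacing the comma with a point before parsing recovers the numbers.
parsed_fixed <- as.numeric(sub(",", ".", auc_text, fixed = TRUE))

print(parsed_raw)
print(parsed_fixed)

# Show the locale settings R is currently using.
print(Sys.getlocale())

# Try switching this session to en_US. Assumption: "en_US.UTF-8" is
# installed; if it is not, Sys.setlocale() returns "" with a warning.
new_locale <- suppressWarnings(Sys.setlocale("LC_ALL", "en_US.UTF-8"))
if (identical(new_locale, "")) {
  message("en_US.UTF-8 is not available on this machine")
}
```

Note that Sys.setlocale("LC_ALL", ...) does not change every locale category in R (LC_NUMERIC, for instance, is left alone), so starting R from a shell with the LANG environment variable set, for example LANG=en_US.UTF-8 R, may be the more thorough way to test this.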