Hotfix of issue with solve(tol = 0) in systems with no large double support (noLD). This one wasn’t fun.
Added argument “smoothing” to target_encoding_mean()
function to implement original target encoding method.
Added alias f_rf_rsquared()
to the function f_rf_deviance()
.
Added column “vi_binary” to vi
as a binary version of “vi_mean”.
Added function auc_score()
to compute the area under the curve of predictions from binary models.
Added function case_weights()
to compute case weights when binary responses are unbalanced.
Added function f_rf_auc_balanced()
to be used as input for the f
argument of preference_order()
when the response is binary and balanced.
Added function f_rf_auc_unbalanced()
to be used as input for the f
argument of preference_order()
when the response is binary and unbalanced.
Added function f_gam_auc_balanced()
to be used as input for the f
argument of preference_order()
when the response is binary and balanced.
Added function f_gam_auc_unbalanced()
to be used as input for the f
argument of preference_order()
when the response is binary and unbalanced.
Added function f_logistic_auc_balanced()
to be used as input for the f
argument of preference_order()
when the response is binary and balanced.
Added function f_logistic_auc_unbalanced()
to be used as input for the f
argument of preference_order()
when the response is binary and unbalanced.
Fixed issue with perfect correlations in vif_df()
. Now perfect correlations are replaced with 0.99 (for correlation == 1) and -0.99 (for correlation == -1) in the correlation matrix to avoid errors in solve()
.
Added the example dataset toy
, derived from vi
, but with known relationships between all variables.
Fixed issue in function cor_df() where many cases would be lost because the logic to remove diagonals was flawed, as all pairs with correlation == 1 were being removed.
Fixed issue in functions cor_select() and vif_select() where ignoring predictors
and using only df
would lead to empty selections.
This version fixes bugs in two functions: cor_select()
and cor_df()
cor_select()
When only one variable was left in the correlation matrix, the one column matrix became a vector with no colnames, which yielded an error. Now, to avoid this issue, drop = FALSE is used in the matrix subsetting.
The previous version started removing predictors on a backwards fashion, from the last predictor in preference order, moving up one by one to the top. Under the wrong circumstances (low number of predictors, low cor_max, and high correlation between first and second predictors in preference order) this configuration would lead to keep only the first predictor, even when having others comply with the max_cor restriction lower down in the preference order. The new version produces smaller subsets of predictors with a higher diversity.
cor_df()
Re-submission after minor CRAN comments.
Changes:
version number bumped up to 1.0.1
removed if(interactive()){} from all @examples.
removed plot() call from the @examples of the function target_encoding_lab() because it was messing up pkgdown’s build. This piece of code triggered the comment “Please always make sure to reset to user’s options()” by the reviewer, so this should solve the issue.
made sure that all examples run in less than 5 seconds.
fixed a bug in which all functions would include the response as a predictor when ‘predictors = NULL’ and ‘response’ was a valid column of the input data frame.
First functional version of the package submitted to CRAN.