This is a wrapper around GLM-Net [glmnet-api] that trains a regularized linear model via ElasticNet regression and
finds the optimal penalty values through N-fold cross-validation. In principle, GLMNet (also known
as ElasticNet) can also be used for feature selection and dimensionality reduction through the LASSO
(Least Absolute Shrinkage and Selection Operator) regression part of the algorithm, while the Ridge
regression part of the algorithm keeps the solution stable.
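The way the LASSO and Ridge parts are blended can be illustrated with a small sketch of the penalty term. It assumes the usual glmnet parameterization, lam * (alpha * L1 + 0.5 * (1 - alpha) * L2^2); the function name `elastic_net_penalty` and the sample coefficients are hypothetical and not part of this API.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic-net penalty for a coefficient vector (intercept excluded).

    Assumed form: lam * (alpha * sum|b| + 0.5 * (1 - alpha) * sum(b^2)).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in coefs)      # LASSO part
    l2_sq = sum(b * b for b in coefs)    # Ridge part
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


coefs = [0.5, -2.0, 0.0]  # illustrative coefficients
ridge_only = elastic_net_penalty(coefs, lam=1.0, alpha=0.0)  # pure Ridge
lasso_only = elastic_net_penalty(coefs, lam=1.0, alpha=1.0)  # pure LASSO
mixed = elastic_net_penalty(coefs, lam=1.0, alpha=0.5)       # equal blend
print(ridge_only, lasso_only, mixed)
```

With alpha=0.5 the result sits between the two extremes, which is the trade-off the alpha parameter controls.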
Parameters:
alpha (float, optional) – The stability parameter with possible values of 0<=alpha<=1, where alpha=0.0
and alpha=1.0 lead to classic Ridge and LASSO regression models, respectively, by
default 0.5
n_lambda (int, optional) – Maximum number of penalty values to compute, by default 100
n_splits (int, optional) – Number of cross-validation folds for computing performance metrics and determining
lambda_best_ and lambda_max_. If non-zero, must be at least 3, by default 3
metric (str, optional) – Metric used for model selection during cross validation. Valid options are "r2",
"mean_squared_error", "mean_absolute_error", and "median_absolute_error".
The metric affects the selection of lambda_best_ and lambda_max_. Thus, fitting the
same data with different metric methods will result in the selection of different models, by
default “r2”
scale (bool, optional) – Whether to standardize the input features to have a mean value of 0.0 and standard deviation
of 1 prior to fitting. The final coefficients will be on the scale of the original data regardless
of this step. Therefore, there is no need to pre-process the data when using scale=True,
by default True
sparse_matrix (bool, optional) – Whether to convert the input features to a sparse matrix in CSR format. This can increase
the speed of feature selection for relatively large sparse datasets. This
parameter cannot be used along with scale=True, because standardizing the feature matrix
to have a mean value of zero would turn it into a dense matrix, by default False
fit_intercept (bool, optional) – Include an intercept term in the model, by default True
cut_point (float, optional) – The cut point to use for selecting lambda_best_. lambda_best_ is the largest lambda whose
cross-validation score stays within cut_point standard errors of the score at lambda_max_, i.e.
arg_max(lambda) for cv_score(lambda) >= cv_score(lambda_max_) - cut_point * standard_error(lambda_max_),
by default 1.0
min_lambda_ratio (float, optional) – In combination with n_lambda, the ratio of the smallest and largest values of lambda
computed (min_lambda/max_lambda>=min_lambda_ratio), by default 1e-4
tolerance (float, optional) – Convergence criteria tolerance, by default 1e-7
max_iter (int, optional) – Maximum passes over the data, by default 100000
random_state (int, optional) – Seed for the random number generator. The glmnet solver is not
deterministic; this seed is used for determining the CV folds.
lambda_path (Union[List[float], np.ndarray, pd.Series], optional) – In place of supplying n_lambda, provide an array of specific values to compute. The
specified values must be in decreasing order. When None, the path of lambda values will be
determined automatically. A maximum of n_lambda values will be computed, by default None
max_features (int, optional) – Optional maximum number of features with nonzero coefficients after regularization. If not
set, defaults to the number of features (X_train.shape[1]) during fit. Note that this will be
ignored if the user specifies lambda_path, by default None
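The cut_point rule for choosing lambda_best_ can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: higher CV scores are taken to be better (as with "r2"), and the function name `select_lambdas` and the sample numbers are hypothetical.

```python
def select_lambdas(lambda_path, cv_mean, cv_se, cut_point=1.0):
    """Return (lambda_max, lambda_best) from per-lambda CV results.

    lambda_max is the lambda with the highest mean CV score.
    lambda_best is the largest lambda whose mean score is within
    cut_point standard errors of the score at lambda_max.
    """
    top = max(range(len(cv_mean)), key=lambda i: cv_mean[i])
    threshold = cv_mean[top] - cut_point * cv_se[top]
    candidates = [lam for lam, score in zip(lambda_path, cv_mean) if score >= threshold]
    return lambda_path[top], max(candidates)


lambda_path = [1.0, 0.5, 0.1, 0.01]   # decreasing, as lambda_path requires
cv_mean = [0.60, 0.78, 0.80, 0.79]    # illustrative mean CV scores
cv_se = [0.03, 0.03, 0.03, 0.03]      # illustrative standard errors

lambda_max, lambda_best = select_lambdas(lambda_path, cv_mean, cv_se, cut_point=1.0)
strict_max, strict_best = select_lambdas(lambda_path, cv_mean, cv_se, cut_point=0.0)
print(lambda_max, lambda_best, strict_best)
```

A larger cut_point tolerates a bigger drop in score in exchange for a stronger penalty (and usually a sparser model); with cut_point=0.0 the two lambdas coincide.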
Fits a glmnet.ElasticNet to the input training data. The proper X_train matrix is created from
the passed X_train and y_train based on the chosen options, i.e. sparse_matrix and scale.
This uses PEP-487 [1]_ to set the set_{method}_request methods. It
looks for the information available in the set default values which are
set using __metadata_request__* class attributes, or inferred
from method signatures.
The __metadata_request__* class attributes are used when a method
does not explicitly accept a metadata through its arguments or if the
developer would like to specify a request value for those metadata
which are different from the default None.
figsize (tuple, optional) – Figure size, by default (8, 5)
linestyle (str, optional) – Linestyle of paths, by default “-”
fontsize (Union[int, float], optional) – Fontsize of the title. The fontsizes of xlabel, ylabel, tick_params, and legend are resized
with 0.85, 0.85, 0.75, and 0.85 fraction of title fontsize, respectively, by default 12
grid (bool, optional) – Whether to show (x,y) grid on the plot or not, by default True
legend (bool, optional) – Whether to show legend on the plot or not, by default True
legendloc (Union[int, str], optional) – Location of legend, by default “center”
xlabel (str, optional) – Xlabel of the plot, by default “-Log(Lambda)”
ylabel (str, optional) – Ylabel of the plot, by default “Coefficients”
title (str, optional) – Title of the plot, by default “Best {lambda_best} with {n} Features”
yscale (str, optional) – Scale for y-axis (coefficients). Possible options are "linear", "log", "symlog",
"logit" [yscale], by default “linear”
bbox_to_anchor (Tuple[float, float], optional) – Relative coordinates for legend location outside of the plot, by default (1.1, 0.5)
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
return_fig (bool, optional) – Whether to return figure object, by default False
**kwargs (Dict[str, Any]) – Key-value pairs of results. results_ attribute can be used
This plotting function can be used along with the results_ attribute of either the
GLMNetCVClassifier or GLMNetCVRegressor class as kwargs.
Parameters:
figsize (tuple, optional) – Figure size, by default (8, 5)
marker (str, optional) – Marker style of the metric to distinguish the error bars. More valid marker styles can be
found at [markers-api], by default “o”
markersize (Union[int, float], optional) – Markersize, by default 5
color (str, optional) – Line and marker color, by default “red”
errorbarcolor (str, optional) – Error bar color, by default “black”
maxlambdacolor (str, optional) – Color of vertical line for lambda_max_, by default “purple”
bestlambdacolor (str, optional) – Color of vertical line for lambda_best_, by default “navy”
linestyle (str, optional) – Linestyle of vertical lambda lines, by default “--”
fontsize (Union[int, float], optional) – Fontsize of the title. The fontsizes of xlabel, ylabel, tick_params, and legend are resized
with 0.85, 0.85, 0.75, and 0.85 fraction of title fontsize, respectively, by default 12
grid (bool, optional) – Whether to show (x,y) grid on the plot or not, by default True
legend (bool, optional) – Whether to show legend on the plot or not, by default True
legendloc (Union[int, str], optional) – Location of legend, by default “best”
xlabel (str, optional) – Xlabel of the plot, by default “-Log(Lambda)”
ylabel (str, optional) – Ylabel of the plot, by default “{n_splits}-Folds CV Mean {metric}”
title (str, optional) – Title of the plot, by default “Best {lambda_best} with {n} Features”
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
return_fig (bool, optional) – Whether to return figure object, by default False
**kwargs (Dict[str, Any]) – Key-value pairs of results. results_ attribute can be used
Visualizes shap beeswarm plot as summary of shapley values.
Notes
This is a helper function to plot the shap summary plot based on all types of
shap.Explainer including shap.LinearExplainer for linear models, shap.TreeExplainer
for tree-based models, and shap.DeepExplainer for deep neural network models. More details
are available at [shap-api]. Note that this function should be run after predict()
so that X_test is instantiated; otherwise, set validation=False.
Parameters:
validation (bool, optional) – Whether to calculate SHAP values using the validation data X_test. When
validation=False, SHAP values are calculated using X_train, by default True
plot_type (str, optional) – The type of summary plot where possible options are “bar”, “dot”, “violin”, “layered_violin”,
and “compact_dot”. Recommendations are “dot” for single-output such as binary classifications,
“bar” for multi-output problems, “compact_dot” for Shap interactions, by default “dot”
figsize (tuple, optional) – Figure size where “auto” is auto-scaled figure size based on the number of features that are
being displayed. Passing a single float will cause each row to be that many inches high.
Passing a pair of floats will scale the plot by that number of inches. If None is passed
then the size of the current figure will be left unchanged, by default “auto”
color (str, optional) – Color of the plots. When plot_type="violin" or plot_type="layered_violin", the “RdBl”
color map is used, while the horizontal lines for plot_type="bar" use “#D0AAF3”, by
default None
cmap (LinearSegmentedColormap, optional) – Color map when plot_type="violin" or plot_type="layered_violin", by default “RdBl”
max_display (int, optional) – Limit to show the number of features in the plot, by default 20
feature_names (List[str], optional) – List of feature names to pass. It should follow the order of features, by default None
layered_violin_max_num_bins (int, optional) – The number of bins for calculating the violin plots ranges and outliers, by default 10
title (str, optional) – Title of the plot, by default None
sort (bool, optional) – Flag to plot sorted shap values in descending order, by default True
color_bar (bool, optional) – Flag to show a color bar when plot_type="dot" or plot_type="violin"
class_names (List[str], optional) – List of class names for multi-output problems, by default None
class_inds (List[int], optional) – List of class indices for multi-output problems, by default None
color_bar_label (str, optional) – Label for color bar, by default “Feature Value”
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
Visualizes the Shapley values as a waterfall plot.
Notes
Waterfall is defined as the cumulative/composite ratios of shap values per feature.
Therefore, it is easy to see how much explainability each additional feature contributes.
Note that this function should be run after predict() so that
X_test is instantiated; otherwise, set validation=False.
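The cumulative ratios behind the waterfall view can be sketched as follows. This assumes each feature's contribution is summarized by its mean absolute SHAP value, which is an assumption about the aggregation rather than something stated here; `cumulative_shap_ratios`, the feature names, and the numbers are hypothetical.

```python
def cumulative_shap_ratios(shap_values, feature_names):
    """Return [(feature, cumulative_ratio), ...], most important feature first.

    shap_values: list of rows (one per sample), each a list of per-feature values.
    """
    n_samples = len(shap_values)
    n_features = len(feature_names)
    # Mean absolute SHAP value per feature (assumed importance measure).
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(n_features)
    ]
    total = sum(mean_abs)
    order = sorted(range(n_features), key=lambda j: mean_abs[j], reverse=True)
    ratios, running = [], 0.0
    for j in order:
        running += mean_abs[j]
        ratios.append((feature_names[j], running / total))
    return ratios


shap_values = [[0.25, -1.0, 0.5],
               [-0.25, 1.0, -0.5]]           # illustrative values, two samples
feature_names = ["age", "income", "tenure"]  # hypothetical feature names
ratios = cumulative_shap_ratios(shap_values, feature_names)
print(ratios)
```

Reading the list from the top shows how many features are needed to reach a given share of the total attribution.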
Parameters:
validation (bool, optional) – Whether to calculate SHAP values using the validation data X_test. When
validation=False, SHAP values are calculated using X_train, by default True
max_display (int, optional) – Limit to show the number of features in the plot, by default 20
title (str, optional) – Title of the plot, by default None
fontsize (Union[int, float], optional) – Fontsize for xlabel and ylabel, and ticks parameters, by default 12
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
return_fig (bool, optional) – Whether to return figure object, by default False
X_test (Union[pd.DataFrame, np.ndarray]) – Input data for testing (features)
y_test (Union[List[float], np.ndarray, pd.Series], optional) – Input ground truth for testing (targets)
lamb (np.ndarray, optional) – Values with shape (n_lambda,) of lambda from lambda_path_ from which to make
predictions. If no values are provided (None), the returned predictions will be those
corresponding to lambda_best_. The values of lamb must also be in the range of
lambda_path_; values greater than max(lambda_path_) or less than
min(lambda_path_) will be clipped
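The clipping of lamb to the range of lambda_path_ amounts to the following. This is a minimal sketch of the behavior described above; `clip_to_path` is a hypothetical helper, not a function of this API.

```python
def clip_to_path(lamb, lambda_path):
    """Clip each requested lambda into [min(lambda_path), max(lambda_path)]."""
    lo, hi = min(lambda_path), max(lambda_path)
    return [min(max(value, lo), hi) for value in lamb]


lambda_path = [1.0, 0.5, 0.1]   # illustrative fitted path
requested = [2.0, 0.3, 0.01]    # first is too large, last is too small
clipped = clip_to_path(requested, lambda_path)
print(clipped)
```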
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as
\((1 - \frac{u}{v})\), where \(u\) is the residual
sum of squares ((y_true-y_pred)**2).sum() and \(v\)
is the total sum of squares ((y_true-y_true.mean())**2).sum().
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of y, disregarding the input features, would get
a \(R^2\) score of 0.0.
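The definition above can be checked by hand with a short sketch. The function name `r2_manual` and the sample targets are illustrative only.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination, 1 - u / v, as defined above."""
    mean_y = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_manual(y_true, [1.0, 2.0, 3.0, 4.0])   # exact predictions
constant = r2_manual(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
reversed_ = r2_manual(y_true, [4.0, 3.0, 2.0, 1.0]) # worse than the mean
print(perfect, constant, reversed_)
```

The three cases cover the best possible score, the constant-model baseline, and a negative score.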
Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
(n_samples,n_samples_fitted), where n_samples_fitted
is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:
score (float) – \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses
multioutput='uniform_average' from version 0.23 to keep consistent
with default value of r2_score().
This influences the score method of all the multioutput
regressors (except for
MultiOutputRegressor).
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
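The `<component>__<parameter>` convention can be illustrated with a toy container. The `Component` class below is hypothetical and only mimics the routing idea; it is not scikit-learn's implementation.

```python
class Component:
    """Toy object whose parameters may themselves be Component instances."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "model__alpha" into ("model", "__", "alpha").
            name, sep, sub_key = key.partition("__")
            if sep:
                # Forward the remainder of the key to the nested component.
                self.params[name].set_params(**{sub_key: value})
            else:
                self.params[name] = value
        return self


model = Component(alpha=0.5)
pipeline = Component(scaler=Component(with_mean=True), model=model)

pipeline.set_params(model__alpha=0.1)       # updates the nested component
pipeline.set_params(scaler__with_mean=False)
print(model.params, pipeline.params["scaler"].params)
```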
This is a wrapper around XGBoostRegressor that trains an XGBoost [xgboost-api] model using the optimum
number of boosting rounds derived from the inputs. It uses xgboost.cv() with n-fold
cross-validation and trains the model with the best number of boosting rounds to avoid over-fitting.
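The idea of picking the number of boosting rounds from cross-validation can be sketched without xgboost. The example assumes a per-round list of mean test metrics where lower is better (as with "rmse") and a simple patience rule; `best_boosting_round` and the metric values are hypothetical, and the actual stopping logic inside xgboost.cv() may differ in detail.

```python
def best_boosting_round(test_metric_per_round, early_stopping_rounds):
    """Return the number of rounds to train, given per-round CV test metrics.

    Stops scanning once the metric has not improved for
    `early_stopping_rounds` consecutive rounds.
    """
    best_round, best_value = 0, float("inf")
    for i, value in enumerate(test_metric_per_round):
        if value < best_value:
            best_round, best_value = i, value
        elif i - best_round >= early_stopping_rounds:
            break  # patience exhausted
    return best_round + 1  # rounds are counted from 1


cv_rmse = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]  # illustrative values
short_patience = best_boosting_round(cv_rmse, early_stopping_rounds=3)
long_patience = best_boosting_round(cv_rmse, early_stopping_rounds=10)
print(short_patience, long_patience)
```

With a patience of 3 the scan stops before the late improvement is seen, which shows why early_stopping_rounds and num_boost_round should be chosen together.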
Parameters:
num_boost_round (int, optional) – Number of boosting rounds to fit a model, by default 200
n_splits (int, optional) – Number of folds for cross-validation, by default 4
metrics (str, optional) – Metrics to be tracked at cross-validation fitting time with possible values of "rmse",
"rmsle", "mae". Note this is different than eval_metric that needs to be passed to
params dict, by default “rmse”
early_stopping_rounds (int, optional) – The criterion for aborting the xgboost.cv() phase early if the test metric has not improved,
by default 20
random_state (int, optional) – Random seed number, by default 1367
shuffle (bool, optional) – Whether to shuffle data to have the ability of building stratified folds in xgboost.cv(),
by default True
sparse_matrix (bool, optional) – Whether to convert the input features to a sparse matrix in CSR format. This can
increase the speed of feature selection for relatively large/sparse datasets, but
it acts as an unoptimized solution for a dense feature matrix. Additionally,
this feature cannot be used along with scale_mean=True, because standardizing the feature matrix
to have a mean value of zero would turn it into a dense matrix. Therefore,
the API does not allow this combination, by default False
scale_mean (bool, optional) – Whether to standardize the feature matrix to have a mean value of zero per feature (center
the features before scaling). As laid out in sparse_matrix, scale_mean must be False when
using sparse_matrix=True, since centering the feature matrix would decrease the sparsity
and defeat the purpose of using a sparse matrix. The StandardScaler object can be accessed
via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None, by default False
scale_std (bool, optional) – Whether to scale the feature matrix to have unit variance (or equivalently, unit standard
deviation) per feature. The StandardScaler object can be accessed via cls.scaler_
if scale_mean or scale_std is used; otherwise it is None, by default False
importance_type (str, optional) – Importance type of xgboost.train() with possible values "weight", "gain",
"total_gain", "cover", "total_cover", by default “total_gain”
params (Dict[str, Union[str, float, int]], optional) – Set of parameters required for fitting a Booster, by default {“eval_metric”: “rmse”,
“tree_method”: “hist”, “objective”: “reg:squarederror”, “learning_rate”: 0.05,
“max_depth”: 2, “min_child_weight”: 1, “gamma”: 0.0, “reg_alpha”: 0.0, “reg_lambda”: 1.0,
“subsample”: 0.9, “max_delta_step”: 1, “verbosity”: 0, “nthread”: 4}
Other options for objective: "reg:logistic", "reg:squaredlogerror"
verbose (bool, optional) – Whether to log the final results of xgboost.cv(), by default True
callbacks (bool, optional) – Whether to log the standard deviation of metrics on train data and track the early stopping
criterion, by default False
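The reason sparse_matrix and scale_mean exclude each other can be seen on a tiny matrix: subtracting the column mean turns the zeros of any non-empty column into nonzero values. The helpers `center_columns` and `count_nonzero` and the data are hypothetical, written in plain Python for illustration.

```python
def center_columns(rows):
    """Subtract each column's mean from its entries."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def count_nonzero(rows):
    """Number of stored values a sparse format would need."""
    return sum(1 for row in rows for x in row if x != 0.0)


X = [[0.0, 0.0, 3.0],
     [0.0, 0.0, 0.0],
     [6.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]  # 2 nonzeros out of 12 entries

before = count_nonzero(X)
after = count_nonzero(center_columns(X))
print(before, after)
```

Only the all-zero column stays sparse after centering; every other entry becomes nonzero, so a CSR representation no longer saves anything.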
Fits a XGBoost.Booster to the input training data. The proper dtrain_ matrix is created from the
passed X_train and y_train based on the chosen options, i.e. sparse_matrix, scale_mean, and scale_std.
Transformed features when scale_mean=True or scale_std=True using clf.scaler_ that
has been fitted on X_train and y_train data. Otherwise, it will be the same as the
passed X_train features
This uses PEP-487 [1]_ to set the set_{method}_request methods. It
looks for the information available in the set default values which are
set using __metadata_request__* class attributes, or inferred
from method signatures.
The __metadata_request__* class attributes are used when a method
does not explicitly accept a metadata through its arguments or if the
developer would like to specify a request value for those metadata
which are different from the default None.
linestyle (str, optional) – Style of lines [linestyles-api], by default “--”
train_label (str, optional) – Label in the figure legend for the train line, by default “Train”
test_label (str, optional) – Label in the figure legend for the test line, by default “Test”
train_color (str, optional) – Color of the training line, by default “navy”
train_std_color (str, optional) – Edge color of the training std bars, by default “#B3C3F3”
test_color (str, optional) – Color of the testing line, by default “purple”
test_std_color (str, optional) – Edge color of the testing std bars, by default “#D0AAF3”
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default False
return_fig (bool, optional) – Whether to return figure object, by default False
fontsize (Union[int, float], optional) – Fontsize for xlabel and ylabel, and ticks parameters, by default 12
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
return_fig (bool, optional) – Whether to return figure object, by default False
Visualizes shap beeswarm plot as summary of shapley values.
Notes
This is a helper function to plot the shap summary plot based on all types of
shap.Explainer including shap.LinearExplainer for linear models, shap.TreeExplainer
for tree-based models, and shap.DeepExplainer for deep neural network models. More details
are available at [shap-api]. Note that this function should be run after predict()
so that X_test is instantiated; otherwise, set validation=False.
Parameters:
validation (bool, optional) – Whether to calculate SHAP values using the validation data X_test. When
validation=False, SHAP values are calculated using X_train, by default True
plot_type (str, optional) – The type of summary plot where possible options are “bar”, “dot”, “violin”, “layered_violin”,
and “compact_dot”. Recommendations are “dot” for single-output such as binary classifications,
“bar” for multi-output problems, “compact_dot” for Shap interactions, by default “dot”
figsize (tuple, optional) – Figure size where “auto” is auto-scaled figure size based on the number of features that are
being displayed. Passing a single float will cause each row to be that many inches high.
Passing a pair of floats will scale the plot by that number of inches. If None is passed
then the size of the current figure will be left unchanged, by default “auto”
color (str, optional) – Color of the plots. When plot_type="violin" or plot_type="layered_violin", the “RdBl”
color map is used, while the horizontal lines for plot_type="bar" use “#D0AAF3”, by
default None
cmap (LinearSegmentedColormap, optional) – Color map when plot_type="violin" or plot_type="layered_violin", by default “RdBl”
max_display (int, optional) – Limit to show the number of features in the plot, by default 20
feature_names (List[str], optional) – List of feature names to pass. It should follow the order of features, by default None
layered_violin_max_num_bins (int, optional) – The number of bins for calculating the violin plots ranges and outliers, by default 10
title (str, optional) – Title of the plot, by default None
sort (bool, optional) – Flag to plot sorted shap values in descending order, by default True
color_bar (bool, optional) – Flag to show a color bar when plot_type="dot" or plot_type="violin"
class_names (List[str], optional) – List of class names for multi-output problems, by default None
class_inds (List[int], optional) – List of class indices for multi-output problems, by default None
color_bar_label (str, optional) – Label for color bar, by default “Feature Value”
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
Visualizes the Shapley values as a waterfall plot.
Notes
Waterfall is defined as the cumulative/composite ratios of shap values per feature.
Therefore, it is easy to see how much explainability each additional feature contributes.
Note that this function should be run after predict() so that
X_test is instantiated; otherwise, set validation=False.
Parameters:
validation (bool, optional) – Whether to calculate SHAP values using the validation data X_test. When
validation=False, SHAP values are calculated using X_train, by default True
max_display (int, optional) – Limit to show the number of features in the plot, by default 20
title (str, optional) – Title of the plot, by default None
fontsize (Union[int, float], optional) – Fontsize for xlabel and ylabel, and ticks parameters, by default 12
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
return_fig (bool, optional) – Whether to return figure object, by default False
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as
\((1 - \frac{u}{v})\), where \(u\) is the residual
sum of squares ((y_true-y_pred)**2).sum() and \(v\)
is the total sum of squares ((y_true-y_true.mean())**2).sum().
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of y, disregarding the input features, would get
a \(R^2\) score of 0.0.
Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
(n_samples,n_samples_fitted), where n_samples_fitted
is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:
score (float) – \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses
multioutput='uniform_average' from version 0.23 to keep consistent
with default value of r2_score().
This influences the score method of all the multioutput
regressors (except for
MultiOutputRegressor).
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
This is a wrapper around the XGBoost regressor that trains an XGBoost [xgboost-api] model using the number of
boosting rounds from the inputs. This is also the base class for XGBoostCVRegressor.
Parameters:
num_boost_round (int, optional) – Number of boosting rounds to fit a model, by default 200
sparse_matrix (bool, optional) – Whether to convert the input features to a sparse matrix in CSR format. This can
increase the speed of feature selection for relatively large/sparse datasets, but
it acts as an unoptimized solution for a dense feature matrix. Additionally,
this feature cannot be used along with scale_mean=True, because standardizing the feature matrix
to have a mean value of zero would turn it into a dense matrix. Therefore,
the API does not allow this combination, by default False
scale_mean (bool, optional) – Whether to standardize the feature matrix to have a mean value of zero per feature (center
the features before scaling). As laid out in sparse_matrix, scale_mean must be False when
using sparse_matrix=True, since centering the feature matrix would decrease the sparsity
and defeat the purpose of using a sparse matrix. The StandardScaler object can be accessed
via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None, by default False
scale_std (bool, optional) – Whether to scale the feature matrix to have unit variance (or equivalently, unit standard
deviation) per feature. The StandardScaler object can be accessed via cls.scaler_
if scale_mean or scale_std is used; otherwise it is None, by default False
importance_type (str, optional) – Importance type of xgboost.train() with possible values "weight", "gain",
"total_gain", "cover", "total_cover", by default “total_gain”
params (Dict[str, Union[str, float, int]], optional) – Set of parameters required for fitting a Booster, by default {“eval_metric”: “rmse”,
“tree_method”: “hist”, “objective”: “reg:squarederror”, “learning_rate”: 0.05,
“max_depth”: 2, “min_child_weight”: 1, “gamma”: 0.0, “reg_alpha”: 0.0, “reg_lambda”: 1.0,
“subsample”: 0.9, “max_delta_step”: 1, “verbosity”: 0, “nthread”: 4}
Other options for objective: "reg:logistic", "reg:squaredlogerror"
Fits a XGBoost.Booster to the input training data. The proper dtrain_ matrix is created from the
passed X_train and y_train based on the chosen options, i.e. sparse_matrix, scale_mean, and scale_std.
Transformed features when scale_mean=True or scale_std=True using clf.scaler_ that
has been fitted on X_train and y_train data. Otherwise, it will be the same as the
passed X_train features
This uses PEP-487 [1]_ to set the set_{method}_request methods. It
looks for the information available in the set default values which are
set using __metadata_request__* class attributes, or inferred
from method signatures.
The __metadata_request__* class attributes are used when a method
does not explicitly accept a metadata through its arguments or if the
developer would like to specify a request value for those metadata
which are different from the default None.
fontsize (Union[int, float], optional) – Fontsize for xlabel and ylabel, and ticks parameters, by default 12
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
return_fig (bool, optional) – Whether to return figure object, by default False
Visualizes shap beeswarm plot as summary of shapley values.
Notes
This is a helper function to plot the shap summary plot based on all types of
shap.Explainer including shap.LinearExplainer for linear models, shap.TreeExplainer
for tree-based models, and shap.DeepExplainer for deep neural network models. More details
are available at [shap-api]. Note that this function should be run after predict()
so that X_test is instantiated; otherwise, set validation=False.
Parameters:
validation (bool, optional) – Whether to calculate SHAP values using the validation data X_test. When
validation=False, SHAP values are calculated using X_train, by default True
plot_type (str, optional) – The type of summary plot where possible options are “bar”, “dot”, “violin”, “layered_violin”,
and “compact_dot”. Recommendations are “dot” for single-output such as binary classifications,
“bar” for multi-output problems, “compact_dot” for Shap interactions, by default “dot”
figsize (tuple, optional) – Figure size where “auto” is auto-scaled figure size based on the number of features that are
being displayed. Passing a single float will cause each row to be that many inches high.
Passing a pair of floats will scale the plot by that number of inches. If None is passed
then the size of the current figure will be left unchanged, by default “auto”
color (str, optional) – Color of the plots. When plot_type="violin" or plot_type="layered_violin", the “RdBl”
color map is used, while the horizontal lines for plot_type="bar" use “#D0AAF3”, by
default None
cmap (LinearSegmentedColormap, optional) – Color map when plot_type="violin" or plot_type="layered_violin", by default “RdBl”
max_display (int, optional) – Limit to show the number of features in the plot, by default 20
feature_names (List[str], optional) – List of feature names to pass. It should follow the order of features, by default None
layered_violin_max_num_bins (int, optional) – The number of bins for calculating the violin plots ranges and outliers, by default 10
title (str, optional) – Title of the plot, by default None
sort (bool, optional) – Flag to plot sorted shap values in descending order, by default True
color_bar (bool, optional) – Flag to show a color bar when plot_type="dot" or plot_type="violin"
class_names (List[str], optional) – List of class names for multi-output problems, by default None
class_inds (List[int], optional) – List of class indices for multi-output problems, by default None
color_bar_label (str, optional) – Label for color bar, by default “Feature Value”
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
Visualizes the Shapley values as a waterfall plot.
Notes
Waterfall is defined as the cumulative/composite ratios of shap values per feature.
Therefore, it is easy to see how much explainability each additional feature contributes.
Note that this function should be run after predict() so that
X_test is instantiated; otherwise, set validation=False.
Parameters:
validation (bool, optional) – Whether to calculate SHAP values using the validation data X_test. When
validation=False, SHAP values are calculated using X_train, by default True
max_display (int, optional) – Limit to show the number of features in the plot, by default 20
title (str, optional) – Title of the plot, by default None
fontsize (Union[int, float], optional) – Fontsize for xlabel and ylabel, and ticks parameters, by default 12
save_path (str, optional) – The full or relative path to save the plot including the image format such as
“myplot.png” or “../../myplot.pdf”, by default None
display_plot (bool, optional) – Whether to show the plot, by default True
return_fig (bool, optional) – Whether to return figure object, by default False
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as
\((1 - \frac{u}{v})\), where \(u\) is the residual
sum of squares ((y_true-y_pred)**2).sum() and \(v\)
is the total sum of squares ((y_true-y_true.mean())**2).sum().
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of y, disregarding the input features, would get
a \(R^2\) score of 0.0.
Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
(n_samples,n_samples_fitted), where n_samples_fitted
is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:
score (float) – \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses
multioutput='uniform_average' from version 0.23 to keep consistent
with default value of r2_score().
This influences the score method of all the multioutput
regressors (except for
MultiOutputRegressor).
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.