XGBoost Hyper-Parameters Tuner using Bayesian Optimization.
This is a wrapper around the Bayesian Optimization algorithm [bayesian-optimization] to tune the
hyper-parameters of XGBoost [xgboost-api] using the xgboost.cv() functionality with n-folds
cross-validation iteratively. This feature can be used to find the optimized set of
hyper-parameters for both classification and regression tasks.
Notes
The optimizer objective is always to maximize the target values. Therefore, when using a
metric such as logloss, error, mae, rmse, or rmsle, the negative value of the
metric will be maximized. One of the big pitfalls of the current implementation is the way we
sample hyper-parameters from params_bounds: the optimizer proposes continuous values, so it
cannot sample an integer directly. Therefore, for some cases, e.g. max_depth, we must cast the
sampled value, which is mathematically wrong (i.e. f(1.1) != f(1)).
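To make these two notes concrete, the following is a minimal sketch (not the documented
implementation) of the kind of target function that could be handed to the optimizer, showing both
the negation of minimization metrics and the integer cast for max_depth; the function name, the
chosen metric, and the toy dataset are illustrative assumptions::

    import xgboost as xgb
    from sklearn.datasets import load_breast_cancer

    # toy data only for illustration; the tuner receives X, y through fit()
    X, y = load_breast_cancer(return_X_y=True)
    dtrain = xgb.DMatrix(data=X, label=y)
    metrics = "logloss"  # a metric that should be minimized

    def _xgb_eval(max_depth, learning_rate, min_child_weight, colsample_bytree,
                  subsample, gamma, reg_alpha, reg_lambda):
        params = {
            "objective": "binary:logistic",
            "eval_metric": metrics,
            # the optimizer proposes floats, so integer hyper-parameters must be cast
            "max_depth": int(max_depth),
            "learning_rate": learning_rate,
            "min_child_weight": min_child_weight,
            "colsample_bytree": colsample_bytree,
            "subsample": subsample,
            "gamma": gamma,
            "reg_alpha": reg_alpha,
            "reg_lambda": reg_lambda,
        }
        cv_results = xgb.cv(
            params=params,
            dtrain=dtrain,
            num_boost_round=200,
            nfold=4,
            stratified=True,
            metrics=metrics,
            early_stopping_rounds=20,
            seed=1367,
            shuffle=True,
        )
        score = cv_results[f"test-{metrics}-mean"].iloc[-1]
        # the optimizer only maximizes, so minimization metrics are negated
        return -score if metrics in ("logloss", "error", "mae", "rmse", "rmsle") else score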
Parameters:
n_iter (int, optional) – Number of iteration rounds for hyper-parameters tuning after initialization, by default 10
n_init_iter (int, optional) – Number of initial iterations to initialize the optimizer, by default 5
n_splits (int, optional) – Number of folds for cross-validation, by default 4
metrics (str, optional) – Metric to be tracked at cross-validation fitting time, depending on the task
(classification vs regression), with possible values of “auc”, “aucpr”, “error”, “logloss”,
“rmse”, “rmsle”, “mae”. Note this is different from the eval_metric that needs to be passed to
the params dict, by default “auc”
objective (str, optional) – Objective function depending on whether the task is regression or classification. Possible
objectives are "binary:logistic" for classification, and "reg:logistic",
"reg:squarederror", and "reg:squaredlogerror" for regression, by default “binary:logistic”
acquisition_criterion (str, optional) – Acquisition criterion method with possible options of "ei" (Expected Improvement),
"ucb" (Upper Confidence Bounds), and "poi" (Probability Of Improvement), by default “ei”
params_bounds (Dict[str, Tuple[Union[int, float], Union[int, float]]], optional) – Set of hyper-parameters boundaries for Bayesian Optimization where all fields are required,
by default {“max_depth” : (2, 7), “learning_rate” : (0, 1), “min_child_weight” : (1, 20),
“colsample_bytree”: (0.1, 1.0), “subsample” : (0.1, 1), “gamma” : (0, 1),
“reg_alpha” : (0, 1), “reg_lambda” : (0, 1)}
num_boost_round (int, optional) – Number of boosting rounds to fit a model, by default 200
early_stopping_rounds (int, optional) – The criterion to abort the xgboost.cv() phase early if the test metric is not improving,
by default 20
random_state (int, optional) – Random seed number, by default 1367
stratified (bool, optional) – Whether to use stratification of the targets (only available for classification tasks) when running
xgboost.cv() to find the best number of boosting rounds at each fold of each iteration,
by default True
shuffle (bool, optional) – Whether to shuffle the data in order to build stratified folds in xgboost.cv(),
by default True
sparse_matrix (bool, optional) – Whether to convert the input features to a sparse matrix in CSR format. This would
increase the speed of feature selection for relatively large/sparse datasets. Conversely, it
would act as an un-optimized solution for a dense feature matrix. Additionally, this parameter
cannot be used along with scale_mean=True, since standardizing the feature matrix to have a
mean value of zero would turn it into a dense matrix. Therefore, this combination is
disallowed by the API, by default False
scale_mean (bool, optional) – Whether to standardize the feature matrix to have a mean value of zero per feature (center
the features before scaling). As laid out in sparse_matrix, scale_mean must be False when
using sparse_matrix=True, since centering the feature matrix would decrease its sparsity and,
in practice, defeat the purpose of using a sparse matrix in the first place. The
StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used;
otherwise, it is None, by default False
scale_std (bool, optional) – Whether to scale the feature matrix to have unit variance (or, equivalently, unit standard
deviation) per feature. The StandardScaler object can be accessed via cls.scaler_
if scale_mean or scale_std is used; otherwise, it is None, by default False
importance_type (str, optional) – Importance type of xgboost.train() with possible values "weight", "gain",
"total_gain", "cover", "total_cover", by default “total_gain”
verbose (bool, optional) – Whether to show the Bayesian Optimization progress at each iteration, by default True
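Putting the parameters above together, a hypothetical usage sketch could look as follows; the
class name XGBoostBayesianOptimizer and its import path are assumptions made for illustration,
not a confirmed API::

    from sklearn.datasets import load_breast_cancer
    from slickml.optimization import XGBoostBayesianOptimizer  # assumed import path

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    xbo = XGBoostBayesianOptimizer(
        n_iter=10,
        n_init_iter=5,
        n_splits=4,
        metrics="auc",
        objective="binary:logistic",
        acquisition_criterion="ei",
        random_state=1367,
        verbose=True,
    )
    xbo.fit(X, y)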
This uses PEP-487 [1]_ to set the set_{method}_request methods. It
looks for the information available in the set default values which are
set using __metadata_request__* class attributes, or inferred
from method signatures.
The __metadata_request__* class attributes are used when a method
does not explicitly accept a metadata through its arguments or if the
developer would like to specify a request value for those metadata
which are different from the default None.
At each iteration, one set of parameters gets sampled from the params_bounds and the
evaluation occurs based on the cross-validation results. The Bayesian optimizer always
maximizes its objective. Therefore, we should be careful when using values of self.metrics
that are supposed to be minimized, e.g. error; for those, we maximize (-1) * metric.
One of the big pitfalls of the current implementation is the way we sample hyper-parameters
from params_bounds: the optimizer proposes continuous values, so it cannot sample an integer
directly. Therefore, for some cases, e.g. max_depth, we must cast the sampled value, which is
mathematically wrong (i.e. f(1.1) != f(1)).
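Continuing the earlier sketch, such an iteration loop could be driven with the
bayesian-optimization package roughly as follows; _xgb_eval refers to the illustrative target
function sketched in the class notes, and the bounds mirror the params_bounds default::

    from bayes_opt import BayesianOptimization

    params_bounds = {
        "max_depth": (2, 7),
        "learning_rate": (0, 1),
        "min_child_weight": (1, 20),
        "colsample_bytree": (0.1, 1.0),
        "subsample": (0.1, 1),
        "gamma": (0, 1),
        "reg_alpha": (0, 1),
        "reg_lambda": (0, 1),
    }
    optimizer = BayesianOptimization(
        f=_xgb_eval,
        pbounds=params_bounds,
        random_state=1367,
    )
    # 5 random initialization points followed by 10 Bayesian iterations
    optimizer.maximize(init_points=5, n_iter=10)
    print(optimizer.max)  # best target value and the (float-valued) params that produced it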
Parameters:
X (Union[pd.DataFrame, np.ndarray]) – Input data for training (features)
y (Union[List[float], np.ndarray, pd.Series]) – Input ground truth for training (targets)
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
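As a generic scikit-learn illustration of the <component>__<parameter> pattern described above
(the pipeline and step names here are made up and not part of this tuner's API)::

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
    # update parameters of nested components via <component>__<parameter>
    pipe.set_params(scaler__with_mean=False, clf__C=0.1)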
XGBoost Hyper-Parameters Tuner using HyperOpt Optimization.
This is a wrapper around HyperOpt [hyperopt], a Python library for serial and parallel optimization
over search spaces that may include real-valued, discrete, and conditional dimensions, to tune the
hyper-parameters of XGBoost [xgboost-api] using the xgboost.cv() functionality with n-folds
cross-validation iteratively. This feature can be used to find the optimized set of
hyper-parameters for both classification and regression tasks.
Notes
The optimizer objective is always to minimize the target values. Therefore, when using a
metric such as auc or aucpr, the negative value of the metric will be minimized.
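To make this concrete, a minimal sketch (not the documented implementation) of the kind of
objective that could be handed to HyperOpt, showing the sign flip for maximization metrics and the
integer cast for quniform-sampled values; the function name, metric choice, and toy dataset are
illustrative assumptions::

    import xgboost as xgb
    from hyperopt import STATUS_OK
    from sklearn.datasets import load_breast_cancer

    # toy data only for illustration; the tuner receives X, y through fit()
    X, y = load_breast_cancer(return_X_y=True)
    dtrain = xgb.DMatrix(data=X, label=y)
    metrics = "auc"  # a metric that should be maximized

    def _xgb_objective(space):
        params = {
            "objective": "binary:logistic",
            "eval_metric": metrics,
            "max_depth": int(space["max_depth"]),  # quniform samples come back as floats
            "learning_rate": space["learning_rate"],
            "min_child_weight": space["min_child_weight"],
        }
        cv_results = xgb.cv(
            params=params,
            dtrain=dtrain,
            num_boost_round=200,
            nfold=4,
            stratified=True,
            metrics=metrics,
            early_stopping_rounds=20,
            seed=1367,
            shuffle=True,
        )
        score = cv_results[f"test-{metrics}-mean"].iloc[-1]
        # fmin() always minimizes, so maximization metrics (auc, aucpr) are negated
        loss = -score if metrics in ("auc", "aucpr") else score
        return {"loss": loss, "status": STATUS_OK}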
Parameters:
n_iter (int, optional) – Maximum number of iteration rounds for hyper-parameters tuning before convergence, by default 100
n_splits (int, optional) – Number of folds for cross-validation, by default 4
metrics (str, optional) – Metric to be tracked at cross-validation fitting time, depending on the task
(classification vs regression), with possible values of “auc”, “aucpr”, “error”, “logloss”,
“rmse”, “rmsle”, “mae”. Note this is different from the eval_metric that needs to be passed to
the params dict, by default “auc”
objective (str, optional) – Objective function depending on whether the task is regression or classification. Possible
objectives are "binary:logistic" for classification, and "reg:logistic",
"reg:squarederror", and "reg:squaredlogerror" for regression, by default “binary:logistic”
params_bounds (Dict[str, Any], optional) – Set of hyper-parameters boundaries for HyperOpt, expressed using hyperopt.hp and hyperopt.pyll_utils,
by default {“max_depth” : (2, 7), “learning_rate” : (0, 1), “min_child_weight” : (1, 20),
“colsample_bytree”: (0.1, 1.0), “subsample” : (0.1, 1), “gamma” : (0, 1),
“reg_alpha” : (0, 1), “reg_lambda” : (0, 1)}
num_boost_round (int, optional) – Number of boosting rounds to fit a model, by default 200
early_stopping_rounds (int, optional) – The criterion to abort the xgboost.cv() phase early if the test metric is not improving,
by default 20
random_state (int, optional) – Random seed number, by default 1367
stratified (bool, optional) – Whether to use stratification of the targets (only available for classification tasks) when running
xgboost.cv() to find the best number of boosting rounds at each fold of each iteration,
by default True
shuffle (bool, optional) – Whether to shuffle the data in order to build stratified folds in xgboost.cv(),
by default True
sparse_matrix (bool, optional) – Whether to convert the input features to a sparse matrix in CSR format. This would
increase the speed of feature selection for relatively large/sparse datasets. Conversely, it
would act as an un-optimized solution for a dense feature matrix. Additionally, this parameter
cannot be used along with scale_mean=True, since standardizing the feature matrix to have a
mean value of zero would turn it into a dense matrix. Therefore, this combination is
disallowed by the API, by default False
scale_mean (bool, optional) – Whether to standardize the feature matrix to have a mean value of zero per feature (center
the features before scaling). As laid out in sparse_matrix, scale_mean must be False when
using sparse_matrix=True, since centering the feature matrix would decrease its sparsity and,
in practice, defeat the purpose of using a sparse matrix in the first place. The
StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used;
otherwise, it is None, by default False
scale_std (bool, optional) – Whether to scale the feature matrix to have unit variance (or, equivalently, unit standard
deviation) per feature. The StandardScaler object can be accessed via cls.scaler_
if scale_mean or scale_std is used; otherwise, it is None, by default False
importance_type (str, optional) – Importance type of xgboost.train() with possible values "weight", "gain",
"total_gain", "cover", "total_cover", by default “total_gain”
verbose (bool, optional) – Whether to show the HyperOpt Optimization progress at each iteration, by default True
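Putting the parameters above together, a hypothetical usage sketch could look as follows; the
class name XGBoostHyperOptimizer and its import path are assumptions made for illustration, and
the params_bounds mapping shows one way to express the search space with hyperopt.hp::

    from hyperopt import hp
    from sklearn.datasets import load_breast_cancer
    from slickml.optimization import XGBoostHyperOptimizer  # assumed import path

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    # search space expressed with hyperopt.hp primitives
    params_bounds = {
        "max_depth": hp.quniform("max_depth", 2, 7, 1),
        "learning_rate": hp.uniform("learning_rate", 0, 1),
        "min_child_weight": hp.quniform("min_child_weight", 1, 20, 1),
        "colsample_bytree": hp.uniform("colsample_bytree", 0.1, 1.0),
        "subsample": hp.uniform("subsample", 0.1, 1),
        "gamma": hp.uniform("gamma", 0, 1),
        "reg_alpha": hp.uniform("reg_alpha", 0, 1),
        "reg_lambda": hp.uniform("reg_lambda", 0, 1),
    }

    xho = XGBoostHyperOptimizer(
        n_iter=100,
        n_splits=4,
        metrics="auc",
        objective="binary:logistic",
        params_bounds=params_bounds,
        random_state=1367,
        verbose=True,
    )
    xho.fit(X, y)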
This uses PEP-487 [1]_ to set the set_{method}_request methods. It
looks for the information available in the set default values which are
set using __metadata_request__* class attributes, or inferred
from method signatures.
The __metadata_request__* class attributes are used when a method
does not explicitly accept a metadata through its arguments or if the
developer would like to specify a request value for those metadata
which are different from the default None.
At each iteration, one set of parameters gets sampled from the params_bounds and the
evaluation occurs based on the cross-validation results. The HyperOpt optimizer always
minimizes its objective. Therefore, we should be careful when using values of self.metrics
that are supposed to be maximized, e.g. auc; for those, we minimize (-1) * metric.
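Continuing the earlier sketch, such a minimization loop could be driven with hyperopt.fmin()
roughly as follows; _xgb_objective refers to the illustrative objective sketched in the class
notes, and the search space here is a trimmed-down example::

    import numpy as np
    from hyperopt import Trials, fmin, hp, tpe

    space = {
        "max_depth": hp.quniform("max_depth", 2, 7, 1),
        "learning_rate": hp.uniform("learning_rate", 0, 1),
        "min_child_weight": hp.quniform("min_child_weight", 1, 20, 1),
    }
    trials = Trials()
    best_params = fmin(
        fn=_xgb_objective,
        space=space,
        algo=tpe.suggest,
        max_evals=100,
        trials=trials,
        rstate=np.random.default_rng(1367),  # recent hyperopt versions expect a NumPy Generator
    )
    print(best_params)  # e.g. {"learning_rate": ..., "max_depth": 4.0, "min_child_weight": ...}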
Parameters:
X (Union[pd.DataFrame, np.ndarray]) – Input data for training (features)
y (Union[List[float], np.ndarray, pd.Series]) – Input ground truth for training (targets)
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.