Module Contents



Base Estimator for XGBoost.

class slickml.base._estimator.BaseXGBoostEstimator

Bases: abc.ABC, sklearn.base.BaseEstimator

Base Estimator for XGBoost.


This is an abstract base class built on XGBoost [xgboost-api] that can serve as the base for any estimator that uses XGBoost, such as XGBoostCVClassifier, XGBoostRegressor, XGBoostFeatureSelector, XGBoostBayesianOptimizer, and so on. It provides the common validation utilities, which reduces the amount of copy/pasted code in the downstream classes.

  • num_boost_round (int) – Number of boosting rounds to fit a model

  • sparse_matrix (bool) – Whether to convert the input features to a sparse matrix in CSR format. This can speed up feature selection for relatively large, sparse datasets; for a dense feature matrix it is counterproductive. This parameter cannot be combined with scale_mean=True, because centering the features to a mean of zero would turn the sparse matrix into a dense one. The API therefore disallows this combination

  • scale_mean (bool) – Whether to standardize the feature matrix to have a mean value of zero per feature (center the features before scaling). As noted under sparse_matrix, scale_mean must be False when sparse_matrix=True, since centering the feature matrix would reduce its sparsity and defeat the purpose of the sparse representation. The StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None

  • scale_std (bool) – Whether to scale the feature matrix to have unit variance (or equivalently, unit standard deviation) per feature. The StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None

  • importance_type (str) – Importance type of xgboost.train() with possible values "weight", "gain", "total_gain", "cover", "total_cover"

  • params (Dict[str, Union[str, float, int]], optional) – Set of parameters required for fitting a Booster
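The interaction between sparse_matrix and scale_mean can be illustrated with a minimal, standard-library-only sketch. The helper names below (check_options, center, count_nonzero) are hypothetical and are not part of SlickML; the check only mirrors the rule described in the parameter list and does not reproduce the library's actual validation code.

```python
from typing import List

# The importance types listed in the parameter description above.
VALID_IMPORTANCE_TYPES = {"weight", "gain", "total_gain", "cover", "total_cover"}


def check_options(sparse_matrix: bool, scale_mean: bool, importance_type: str) -> None:
    """Hypothetical validation helper (an assumption, not SlickML's code)."""
    if sparse_matrix and scale_mean:
        raise ValueError("scale_mean=True cannot be combined with sparse_matrix=True.")
    if importance_type not in VALID_IMPORTANCE_TYPES:
        raise ValueError(f"Unknown importance_type: {importance_type!r}")


def center(column: List[float]) -> List[float]:
    """Subtract the column mean from every entry."""
    mean = sum(column) / len(column)
    return [value - mean for value in column]


def count_nonzero(column: List[float]) -> int:
    """Count the entries a sparse format would have to store explicitly."""
    return sum(1 for value in column if value != 0.0)


# A mostly-zero feature column: only one entry needs to be stored.
sparse_column = [0.0, 0.0, 0.0, 4.0]

# After centering, every zero becomes -mean, so all entries are nonzero.
centered_column = center(sparse_column)

print(count_nonzero(sparse_column), count_nonzero(centered_column))
```

Centering shifts every zero entry to the negative column mean, which is why a centered matrix no longer benefits from CSR storage.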

fit(X, y)

Abstract method to fit a model to the features/targets, depending on the task


__slots__ = []
importance_type: Optional[str]
num_boost_round: Optional[int]
params: Optional[Dict[str, Union[str, float, int]]]
scale_mean: Optional[bool]
scale_std: Optional[bool]
sparse_matrix: Optional[bool]
__post_init__() → None

Post instantiation validations and assignments.


__repr__()

Return repr(self).

abstract fit(X: Union[pandas.DataFrame, numpy.ndarray], y: Union[List[float], numpy.ndarray, pandas.Series]) → None

Abstract method to fit a model to the features/targets, depending on the task.

  • X (Union[pd.DataFrame, np.ndarray]) – Input data for training (features)

  • y (Union[List[float], np.ndarray, pd.Series]) – Input ground truth for training (targets)
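How an abstract fit constrains downstream classes can be sketched with the standard library alone. BaseSketch and MeanTargetEstimator below are hypothetical stand-ins, not SlickML classes, and plain lists replace the DataFrame/ndarray inputs so the example has no third-party dependencies.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseSketch(ABC):
    """Hypothetical stand-in for an abstract base estimator."""

    @abstractmethod
    def fit(self, X: List[List[float]], y: List[float]) -> None:
        """Fit a model to the features/targets; subclasses must implement this."""


class MeanTargetEstimator(BaseSketch):
    """Toy subclass (an assumption for illustration) that stores the target mean."""

    def __init__(self) -> None:
        self.mean_: Optional[float] = None

    def fit(self, X: List[List[float]], y: List[float]) -> None:
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows.")
        self.mean_ = sum(y) / len(y)


estimator = MeanTargetEstimator()
estimator.fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
print(estimator.mean_)

# The base class itself cannot be instantiated because fit is abstract.
try:
    BaseSketch()
except TypeError as error:
    print(f"TypeError: {error}")
```

Declaring fit as abstract means a subclass that forgets to implement it fails at instantiation time rather than at the first call.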




get_params(deep=True)

Get parameters for this estimator.

  • deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns: params (dict) – Parameter names mapped to their values.


set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

  • **params (dict) – Estimator parameters.

Returns: self (estimator instance) – Estimator instance.
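The <component>__<parameter> naming convention can be illustrated with a simplified, standard-library-only sketch. The function set_nested_params and the dictionary layout are assumptions made for illustration; this is not scikit-learn's implementation, only a demonstration of how a key is split on the first double underscore into a component name and a parameter name.

```python
from typing import Any, Dict


def set_nested_params(components: Dict[str, Dict[str, Any]], **params: Any) -> None:
    """Hypothetical helper: route 'component__parameter' keys to the right component."""
    for key, value in params.items():
        # Split on the first double underscore only.
        name, separator, sub_key = key.partition("__")
        if not separator:
            raise ValueError(f"Expected a key of the form component__parameter, got {key!r}")
        if name not in components:
            raise ValueError(f"Unknown component: {name!r}")
        components[name][sub_key] = value


# Two named components, each with its own flat parameters.
components = {
    "scaler": {"with_mean": True, "with_std": True},
    "model": {"max_depth": 3},
}

set_nested_params(components, scaler__with_mean=False, model__max_depth=5)
print(components)
```

Each keyword updates exactly one component, and parameters that are not named keep their previous values.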