slickml.base#

Package Contents#

Classes#

BaseXGBoostEstimator

Base Estimator for XGBoost.

ExtendedEnum

Base Enum type with compatible string functionalities.

Metrics

Protocol for Metrics.

class slickml.base.BaseXGBoostEstimator[source]#

Bases: abc.ABC, sklearn.base.BaseEstimator

Base Estimator for XGBoost.

Notes

This is an abstract base class built on XGBoost [xgboost-api] that can serve as the parent of any XGBoost-based estimator, such as XGBoostCVClassifier, XGBoostRegressor, XGBoostFeatureSelector, and XGBoostBayesianOptimizer. It ships with the shared validation utilities, which reduces the amount of copy/pasted code in the downstream classes.

Parameters:
  • num_boost_round (int) – Number of boosting rounds to fit a model

  • sparse_matrix (bool) – Whether to convert the input features to a sparse matrix in CSR format. This can speed up feature selection for relatively large, sparse datasets, but it is counterproductive for a dense feature matrix. This parameter cannot be combined with scale_mean=True, because standardizing the features to zero mean would turn the feature matrix into a dense matrix; the API therefore disallows that combination

  • scale_mean (bool) – Whether to standardize the feature matrix to have a mean value of zero per feature (center the features before scaling). As noted under sparse_matrix, scale_mean must be False when sparse_matrix=True, since centering the feature matrix would reduce its sparsity and defeat the purpose of the sparse representation. The StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None

  • scale_std (bool) – Whether to scale the feature matrix to have unit variance (or equivalently, unit standard deviation) per feature. The StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None

  • importance_type (str) – Importance type of xgboost.train() with possible values "weight", "gain", "total_gain", "cover", "total_cover"

  • params (Dict[str, Union[str, float, int]], optional) – Set of parameters required for fitting a Booster
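The incompatibility between sparse_matrix=True and scale_mean=True described above can be seen on a toy matrix. The following is a minimal standard-library sketch, not slickml code; the helper names density and center_columns are hypothetical and exist only for this illustration.

```python
from typing import List


def density(matrix: List[List[float]]) -> float:
    """Return the fraction of non-zero entries in a list-of-rows matrix."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0.0)
    return nonzero / total


def center_columns(matrix: List[List[float]]) -> List[List[float]]:
    """Subtract each column's mean from every entry of that column."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


# Toy feature matrix that is mostly zeros (a good candidate for CSR storage).
X = [
    [0.0, 0.0, 3.0],
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

density_before = density(X)                  # 3 of 12 entries are non-zero
density_after = density(center_columns(X))   # every zero is shifted by a non-zero mean

print(f"density before centering: {density_before:.2f}")
print(f"density after centering:  {density_after:.2f}")
```

Because every column here has a non-zero mean, centering turns every stored zero into a non-zero value, so a sparse representation no longer saves anything.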

fit(X, y)[source]#

Abstract method to fit a model to the features/targets, depending on the task

References

__slots__ = []#
importance_type :Optional[str]#
num_boost_round :Optional[int]#
params :Optional[Dict[str, Union[str, float, int]]]#
scale_mean :Optional[bool]#
scale_std :Optional[bool]#
sparse_matrix :Optional[bool]#
__getstate__()#
__post_init__() None[source]#

Post instantiation validations and assignments.
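As a rough illustration of what post-instantiation validation can look like, here is a minimal dataclass sketch. It is an assumption-laden stand-in and not slickml's actual implementation: the class name EstimatorConfigSketch, the default values, and the error messages are all hypothetical. Only the parameter names, the allowed importance types, and the sparse_matrix/scale_mean restriction come from the documentation above.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the parameter documentation.
_ALLOWED_IMPORTANCE_TYPES = ("weight", "gain", "total_gain", "cover", "total_cover")


@dataclass
class EstimatorConfigSketch:
    """Hypothetical stand-in that only mirrors the documented parameters."""

    # The defaults below are arbitrary choices for this sketch.
    num_boost_round: Optional[int] = 100
    sparse_matrix: Optional[bool] = False
    scale_mean: Optional[bool] = False
    scale_std: Optional[bool] = False
    importance_type: Optional[str] = "total_gain"

    def __post_init__(self) -> None:
        # Runs automatically right after the generated __init__.
        if self.importance_type not in _ALLOWED_IMPORTANCE_TYPES:
            raise ValueError(
                f"importance_type must be one of {_ALLOWED_IMPORTANCE_TYPES}"
            )
        if self.sparse_matrix and self.scale_mean:
            # Centering would densify the matrix, so reject the combination.
            raise ValueError("scale_mean=True cannot be used with sparse_matrix=True")


# Scaling to unit variance keeps zeros at zero, so this combination passes.
config = EstimatorConfigSketch(sparse_matrix=True, scale_std=True)

try:
    EstimatorConfigSketch(sparse_matrix=True, scale_mean=True)
except ValueError as error:
    print(f"rejected: {error}")
```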

__repr__(N_CHAR_MAX=700)#

Return repr(self).

__setstate__(state)#
abstract fit(X: Union[pandas.DataFrame, numpy.ndarray], y: Union[List[float], numpy.ndarray, pandas.Series]) None[source]#

Abstract method to fit a model to the features/targets, depending on the task.

Parameters:
  • X (Union[pd.DataFrame, np.ndarray]) – Input data for training (features)

  • y (Union[List[float], np.ndarray, pd.Series]) – Input ground truth for training (targets)

Returns:

None
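The abstract fit contract can be sketched with the standard library alone. The classes below are hypothetical (BaseEstimatorSketch and MeanRegressorSketch are not part of slickml) and use plain sequences instead of pandas or NumPy inputs so the example stays self-contained. They only show how an abstract fit forces each downstream class to supply its own task-specific training logic.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class BaseEstimatorSketch(ABC):
    """Hypothetical base class that declares fit() without implementing it."""

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        """Fit a model to the features X and targets y."""


class MeanRegressorSketch(BaseEstimatorSketch):
    """Toy subclass: 'training' just stores the mean of the targets."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        self.mean_ = sum(y) / len(y)


# The abstract base cannot be instantiated directly.
try:
    BaseEstimatorSketch()
except TypeError as error:
    print(f"cannot instantiate: {error}")

model = MeanRegressorSketch()
model.fit([[1.0], [2.0]], [1.0, 3.0])
print(model.mean_)
```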

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
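The <component>__<parameter> naming convention can be illustrated without scikit-learn. The helpers below are a toy sketch, not scikit-learn's implementation; split_param_name and group_params are hypothetical names, and the parameter keys are made up for the example.

```python
from typing import Any, Dict, Tuple


def split_param_name(name: str) -> Tuple[str, str]:
    """Split 'component__parameter' at the first double underscore."""
    component, _, parameter = name.partition("__")
    return component, parameter


def group_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group parameters by the component they address.

    Names without a double underscore belong to the top-level object and
    are collected under the empty-string key.
    """
    grouped: Dict[str, Dict[str, Any]] = {}
    for name, value in params.items():
        component, parameter = split_param_name(name)
        if parameter:
            grouped.setdefault(component, {})[parameter] = value
        else:
            grouped.setdefault("", {})[component] = value
    return grouped


grouped = group_params({"clf__max_depth": 3, "clf__eta": 0.1, "verbose": True})
print(grouped)
```

Splitting only at the first double underscore means a name such as a__b__c addresses parameter b__c of component a, which is what lets the convention reach into nested objects.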

class slickml.base.ExtendedEnum[source]#

Bases: enum.Enum

Base Enum type with compatible string functionalities.

names()[source]#

Returns a list of Enum names as strings

values()[source]#

Returns a list of Enum values as strings

to_dict()[source]#

Returns a dictionary of all Enum name-value pairs

Examples

>>> from slickml.base import ExtendedEnum
>>> class FooBar(ExtendedEnum):
...    FOO = "foo"
...    BAR = "bar"
>>> FooBar.FOO
>>> FooBar.names()
>>> FooBar.values()
>>> FooBar.to_dict()
__dir__()#

Returns all members and all public methods

__format__(format_spec)#

Returns format using actual value type unless __str__ has been overridden.

__hash__()#

Return hash(self).

__reduce_ex__(proto)#

Helper for pickle.

__repr__() str[source]#

Returns the Enum str representation value.

Returns:

str

__str__() str[source]#

Returns the Enum str value.

Returns:

str

name()#

The name of the Enum member.

classmethod names() List[str][source]#

Returns a list of Enum names as strings.

Returns:

List[str]

classmethod to_dict() Dict[str, str][source]#

Returns a dictionary of all Enum name-value pairs as strings.

Returns:

Dict[str, str]

value()#

The value of the Enum member.

classmethod values() List[str][source]#

Returns a list of Enum values as strings.

Returns:

List[str]
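The names(), values(), and to_dict() classmethods documented above can be approximated on top of the standard enum.Enum. The sketch below is a hypothetical re-implementation (StrEnumSketch is not slickml's ExtendedEnum), written under the assumption that the helpers simply iterate over the members; the real class may differ in details such as __repr__.

```python
from enum import Enum
from typing import Dict, List


class StrEnumSketch(Enum):
    """Hypothetical Enum base with string-friendly helper methods."""

    @classmethod
    def names(cls) -> List[str]:
        # Member names in definition order.
        return [member.name for member in cls]

    @classmethod
    def values(cls) -> List[str]:
        # Member values, converted to strings.
        return [str(member.value) for member in cls]

    @classmethod
    def to_dict(cls) -> Dict[str, str]:
        # Mapping of member name to stringified value.
        return {member.name: str(member.value) for member in cls}

    def __str__(self) -> str:
        # Printing a member shows its value instead of 'Class.NAME'.
        return str(self.value)


class FooBar(StrEnumSketch):
    FOO = "foo"
    BAR = "bar"


print(FooBar.names())
print(FooBar.values())
print(FooBar.to_dict())
print(str(FooBar.FOO))
```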

class slickml.base.Metrics[source]#

Bases: Protocol

Protocol for Metrics.

Notes

The main reason for this protocol is to enable proper duck typing (structural subtyping, PEP 544) [1] when using metrics such as RegressionMetrics or ClassificationMetrics in pipelines.

References

__slots__ = []#
classmethod __class_getitem__(params)#
classmethod __init_subclass__(*args, **kwargs)#
get_metrics(dtype: Optional[str]) Union[pandas.DataFrame, Dict[str, Optional[float]]][source]#

Returns calculated metrics in a desired output dtype.

Parameters:

dtype (Optional[str]) – Metrics output dtype

Returns:

Union[pd.DataFrame, Dict[str, Optional[float]]]

plot() matplotlib.figure.Figure[source]#

Plots calculated metrics visualization.

Returns:

Figure
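To show how a protocol of this shape supports duck typing, here is a minimal sketch using typing.Protocol. Everything in it is an assumption made for illustration: MetricsLike, AccuracySketch, and report are hypothetical names, the return types are loosened to Any so that neither pandas nor matplotlib is required, and plot() returns a placeholder instead of a real Figure.

```python
from typing import Any, Dict, Optional, Protocol, Sequence, runtime_checkable


@runtime_checkable
class MetricsLike(Protocol):
    """Hypothetical protocol mirroring the two documented methods."""

    def get_metrics(self, dtype: Optional[str]) -> Any: ...

    def plot(self) -> Any: ...


class AccuracySketch:
    """Toy metrics class; note that it does not inherit from MetricsLike."""

    def __init__(self, y_true: Sequence[int], y_pred: Sequence[int]) -> None:
        self.y_true = y_true
        self.y_pred = y_pred

    def get_metrics(self, dtype: Optional[str] = "dict") -> Dict[str, Optional[float]]:
        # Only a dictionary output is implemented in this sketch.
        correct = sum(t == p for t, p in zip(self.y_true, self.y_pred))
        return {"accuracy": correct / len(self.y_true)}

    def plot(self) -> Any:
        # Placeholder: a real implementation would return a matplotlib Figure.
        return None


def report(metrics: MetricsLike) -> Dict[str, Optional[float]]:
    """Accept any object that structurally satisfies MetricsLike."""
    return metrics.get_metrics(dtype="dict")


accuracy = AccuracySketch(y_true=[1, 0, 1, 1], y_pred=[1, 0, 0, 1])
print(isinstance(accuracy, MetricsLike))
print(report(accuracy))
```

The isinstance check only verifies that the methods exist; matching signatures and return types are checked by a static type checker, not at runtime.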