slickml.base¶

Classes¶

`BaseXGBoostEstimator`	Base Estimator for XGBoost.
`ExtendedEnum`	Base Enum type with compatible string functionalities.
`Metrics`	Protocol for Metrics.

Package Contents¶

class slickml.base.BaseXGBoostEstimator[source]¶

Bases: abc.ABC, sklearn.base.BaseEstimator

Base Estimator for XGBoost.

Notes

This is an abstract base class built on XGBoost [xgboost-api] that can serve as the base for any XGBoost-based estimator, such as XGBoostCVClassifier, XGBoostRegressor, XGBoostFeatureSelector, and XGBoostBayesianOptimizer. It provides the shared validation utilities, which reduces the amount of copy/pasted code in the downstream classes.

Parameters:

num_boost_round (int) – Number of boosting rounds to fit a model
sparse_matrix (bool) – Whether to convert the input features to a sparse matrix in CSR format. This can speed up feature selection for relatively large, sparse datasets, but it is a suboptimal choice for a dense feature matrix. This parameter cannot be combined with scale_mean=True, because standardizing the feature matrix to have zero mean would turn it into a dense matrix; the API therefore disallows this combination
scale_mean (bool) – Whether to standardize the feature matrix to have a mean value of zero per feature (center the features before scaling). As noted under sparse_matrix, scale_mean must be False when sparse_matrix=True, since centering the feature matrix would reduce its sparsity and defeat the purpose of the sparse representation. The StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None
scale_std (bool) – Whether to scale the feature matrix to have unit variance (or equivalently, unit standard deviation) per feature. The StandardScaler object can be accessed via cls.scaler_ if scale_mean or scale_std is used; otherwise it is None
importance_type (str) – Importance type of xgboost.train() with possible values "weight", "gain", "total_gain", "cover", "total_cover"
params (Dict[str, Union[str, float, int]], optional) – Set of parameters required for fitting a Booster
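The interaction between sparse_matrix and scale_mean described above can be seen in a toy example. The following is a minimal sketch using only the standard library; the matrix X and the helpers count_zeros and center_columns are made up for illustration and are not part of slickml. It shows that subtracting the per-feature mean turns most zero entries into non-zero ones, which is why centering and a sparse representation do not mix.

```python
from statistics import mean

# A tiny, mostly-zero feature matrix (rows are samples, columns are features).
# Illustrative data only; not taken from slickml.
X = [
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]


def count_zeros(matrix):
    """Count the entries a sparse format would not need to store."""
    return sum(1 for row in matrix for value in row if value == 0.0)


def center_columns(matrix):
    """Subtract each column's mean from every entry in that column."""
    column_means = [mean(column) for column in zip(*matrix)]
    return [[value - m for value, m in zip(row, column_means)] for row in matrix]


X_centered = center_columns(X)

zeros_before = count_zeros(X)          # 10 of 12 entries are zero
zeros_after = count_zeros(X_centered)  # only the all-zero column stays zero
print(zeros_before, zeros_after)
```

Scaling to unit variance alone (scale_std) only multiplies each column by a constant, so zeros stay zeros; it is the mean subtraction that densifies the matrix.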

fit(X, y)[source]¶: Abstract method to fit a model to the features/targets, depending on the task

References

[xgboost-api] https://xgboost.readthedocs.io/en/latest/python/python_api.html

__getstate__()¶

classmethod __init_subclass__(**kwargs)¶

Set the set_{method}_request methods.

This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the information available in the set default values, which are set using __metadata_request__* class attributes or inferred from method signatures.

The __metadata_request__* class attributes are used when a method does not explicitly accept metadata through its arguments, or if the developer would like to specify a request value for that metadata which is different from the default None.
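The PEP-487 hook mentioned above can be illustrated with a small standalone sketch. This is not scikit-learn's implementation: RequestMixin, _request_methods, and _make_setter are hypothetical names, and the real mechanism also inspects method signatures and __metadata_request__* attributes. The sketch only shows how __init_subclass__ can attach set_{method}_request helpers to each subclass at class-creation time.

```python
class RequestMixin:
    """Hypothetical mixin for illustration; NOT scikit-learn's implementation."""

    # Methods for which a `set_{method}_request` helper may be generated.
    _request_methods = ("fit", "predict")

    def __init_subclass__(cls, **kwargs):
        # Runs once for every subclass, right after the class body executes.
        super().__init_subclass__(**kwargs)
        for method in cls._request_methods:
            # Only generate a helper for methods the subclass actually has.
            if not callable(getattr(cls, method, None)):
                continue
            setattr(cls, f"set_{method}_request", _make_setter(method))


def _make_setter(method):
    """Build a setter that records requested metadata for one method."""

    def setter(self, **requests):
        stored = self.__dict__.setdefault("_requests", {})
        stored.setdefault(method, {}).update(requests)
        return self  # returning self allows chaining

    setter.__name__ = f"set_{method}_request"
    return setter


class OnlyFit(RequestMixin):
    def fit(self, X, y, sample_weight=None):
        return self


est = OnlyFit().set_fit_request(sample_weight=True)
print(est._requests)
```

Because OnlyFit defines fit but not predict, only set_fit_request is attached to it.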

References

[1] https://peps.python.org/pep-0487/

__post_init__() → None[source]¶: Post instantiation validations and assignments.
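As a rough illustration of what post-instantiation validation can look like, here is a minimal dataclass sketch. SketchEstimator, its default values, and the exact error types are assumptions made for this example and are not slickml's actual code; only the parameter names and the sparse_matrix/scale_mean restriction come from the parameter descriptions above.

```python
from dataclasses import dataclass
from typing import Optional

# Importance types listed in the parameter description.
ALLOWED_IMPORTANCE_TYPES = {"weight", "gain", "total_gain", "cover", "total_cover"}


@dataclass
class SketchEstimator:
    """Hypothetical stand-in for illustration; not slickml's actual class."""

    # Default values below are assumptions made for this sketch.
    num_boost_round: Optional[int] = 200
    sparse_matrix: Optional[bool] = False
    scale_mean: Optional[bool] = False
    scale_std: Optional[bool] = False
    importance_type: Optional[str] = "total_gain"

    def __post_init__(self) -> None:
        # dataclasses call this automatically right after __init__.
        if isinstance(self.num_boost_round, bool) or not isinstance(
            self.num_boost_round, int
        ):
            raise TypeError("num_boost_round must be an integer.")
        if self.importance_type not in ALLOWED_IMPORTANCE_TYPES:
            raise ValueError(f"Unknown importance_type: {self.importance_type!r}")
        if self.sparse_matrix and self.scale_mean:
            # Centering would densify the matrix, so the combination is rejected.
            raise ValueError("scale_mean=True cannot be used with sparse_matrix=True.")


valid = SketchEstimator(sparse_matrix=True, scale_std=True)

try:
    SketchEstimator(sparse_matrix=True, scale_mean=True)
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```

Validating in __post_init__ means an invalid configuration fails at construction time rather than later, inside fit.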

__repr__(N_CHAR_MAX=700)¶: Return repr(self).

__setstate__(state)¶

__sklearn_clone__()¶

__slots__ = ()¶

abstract fit(X: pandas.DataFrame | numpy.ndarray, y: List[float] | numpy.ndarray | pandas.Series) → None[source]¶

Abstract method to fit a model to the features/targets, depending on the task.

Parameters:

X (Union[pd.DataFrame, np.ndarray]) – Input data for training (features)
y (Union[List[float], np.ndarray, pd.Series]) – Input ground truth for training (targets)

Returns:

None
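The abstract fit contract can be sketched without slickml or XGBoost. In the example below, SketchBase and MeanRegressor are hypothetical classes that use plain Python sequences instead of DataFrames or arrays. It shows the two consequences of declaring fit as abstract: the base class cannot be instantiated, and a subclass becomes usable once it implements fit.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class SketchBase(ABC):
    """Hypothetical base class for illustration; not slickml's actual class."""

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        """Fit a model to the features/targets; subclasses must implement this."""


class MeanRegressor(SketchBase):
    """Toy subclass that just memorizes the mean of the targets."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        self.mean_ = sum(y) / len(y)


# The abstract base cannot be instantiated directly.
try:
    SketchBase()
    base_instantiable = True
except TypeError:
    base_instantiable = False

model = MeanRegressor()
result = model.fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
print(base_instantiable, model.mean_, result)
```

As in the signature above, fit returns None and stores what it learned on the instance.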

get_metadata_routing()¶

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)¶

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

importance_type: str | None¶

num_boost_round: int | None¶

params: Dict[str, str | float | int] | None = None¶

scale_mean: bool | None¶

scale_std: bool | None¶

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
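The <component>__<parameter> naming convention can be illustrated with a small standalone sketch. This is not scikit-learn's implementation: Component, Wrapper, and set_nested_params are hypothetical, and the real set_params validates names against get_params. The sketch only shows how a double underscore routes a value to a nested object.

```python
class Component:
    """Hypothetical inner object with a single parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class Wrapper:
    """Hypothetical outer object that holds a Component under `step`."""

    def __init__(self, step=None, verbose=False):
        self.step = step
        self.verbose = verbose


def set_nested_params(obj, **params):
    """Set attributes on obj, routing 'name__sub' keys to obj.name."""
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delimiter, sub_key = key.partition("__")
        if not hasattr(obj, name):
            raise ValueError(f"Invalid parameter {name!r} for {type(obj).__name__}")
        if delimiter:
            # 'step__alpha' -> recurse into obj.step with key 'alpha'.
            set_nested_params(getattr(obj, name), **{sub_key: value})
        else:
            setattr(obj, name, value)
    return obj


wrapper = set_nested_params(
    Wrapper(step=Component()), verbose=True, step__alpha=0.1
)
print(wrapper.verbose, wrapper.step.alpha)
```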

sparse_matrix: bool | None¶

class slickml.base.ExtendedEnum[source]¶

Bases: enum.Enum

Base Enum type with compatible string functionalities.

names()[source]¶: Returns a list of Enum names as strings

values()[source]¶: Returns a list of Enum values as strings

to_dict()[source]¶: Returns a dictionary of all Enum name-value pairs

Examples

>>> from slickml.base import ExtendedEnum
>>> class FooBar(ExtendedEnum):
...    FOO = "foo"
...    BAR = "bar"
>>> FooBar.FOO
>>> FooBar.names()
>>> FooBar.values()
>>> FooBar.to_dict()

__dir__()¶: Returns all members and all public methods

__format__(format_spec)¶: Returns format using actual value type unless __str__ has been overridden.

__hash__()¶: Return hash(self).

__reduce_ex__(proto)¶: Helper for pickle.

__repr__() → str[source]¶

Returns the Enum str representation value.

Returns:

str

__str__() → str[source]¶

Returns the Enum str value.

Returns:

str

name()¶: The name of the Enum member.

classmethod names() → List[str][source]¶

Returns a list of Enum names as strings.

Returns:

List[str]

classmethod to_dict() → Dict[str, str][source]¶

Returns a dictionary of all Enum name-value pairs as strings.

Returns:

Dict[str, str]

value()¶: The value of the Enum member.

classmethod values() → List[str][source]¶

Returns a list of Enum values as strings.

Returns:

List[str]
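For readers who want to see how helpers like names(), values(), and to_dict() can be layered on top of enum.Enum, here is a minimal standard-library sketch. SketchEnum is a hypothetical class written for this example; it mirrors the documented method names but is not slickml's source code.

```python
from enum import Enum
from typing import Dict, List


class SketchEnum(Enum):
    """Hypothetical Enum base for illustration; not slickml's ExtendedEnum."""

    def __str__(self) -> str:
        # Make str(member) return the underlying value.
        return str(self.value)

    @classmethod
    def names(cls) -> List[str]:
        # Iterating an Enum class yields members in definition order.
        return [member.name for member in cls]

    @classmethod
    def values(cls) -> List[str]:
        return [str(member.value) for member in cls]

    @classmethod
    def to_dict(cls) -> Dict[str, str]:
        return dict(zip(cls.names(), cls.values()))


class FooBar(SketchEnum):
    FOO = "foo"
    BAR = "bar"


print(FooBar.names())    # ['FOO', 'BAR']
print(FooBar.values())   # ['foo', 'bar']
print(FooBar.to_dict())  # {'FOO': 'foo', 'BAR': 'bar'}
print(str(FooBar.FOO))   # foo
```

An Enum class without members, such as SketchEnum here, can be subclassed, which is what makes this shared-base pattern possible.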

class slickml.base.Metrics[source]¶

Bases: Protocol

Protocol for Metrics.

Notes

The main purpose of this protocol is to enable proper duck typing (PEP-544) [1] when using metrics such as RegressionMetrics or ClassificationMetrics in pipelines.

References

[1] https://peps.python.org/pep-0544/

classmethod __class_getitem__(params)¶

classmethod __init_subclass__(*args, **kwargs)¶

__slots__ = ()¶

get_metrics(dtype: str | None) → pandas.DataFrame | Dict[str, float | None][source]¶

Returns calculated metrics in a desired output dtype.

Parameters:

dtype (Optional[str]) – Metrics output dtype

Returns:

Union[pd.DataFrame, Dict[str, Optional[float]]]

plot() → matplotlib.figure.Figure[source]¶

Plots calculated metrics visualization.

Returns:

Figure
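The structural (duck) typing that a Protocol provides can be sketched with the standard library alone. In the example below, SketchMetrics, ToyRegressionMetrics, and report are hypothetical names. To avoid depending on pandas and matplotlib, the sketch assumes get_metrics returns only a dict and plot returns a placeholder object, which is narrower than the signatures documented above.

```python
from typing import Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class SketchMetrics(Protocol):
    """Hypothetical protocol mirroring the documented method names."""

    def get_metrics(self, dtype: Optional[str]) -> Dict[str, Optional[float]]: ...

    def plot(self) -> object: ...


class ToyRegressionMetrics:
    """Toy metrics class; note that it does NOT inherit from SketchMetrics."""

    def __init__(self, y_true, y_pred):
        errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
        self._mae = sum(errors) / len(errors)

    def get_metrics(self, dtype: Optional[str] = "dict") -> Dict[str, Optional[float]]:
        return {"mae": self._mae}

    def plot(self) -> object:
        return None  # placeholder; a real implementation would return a Figure


def report(metrics: SketchMetrics) -> Dict[str, Optional[float]]:
    """Accept any object that structurally satisfies the protocol."""
    return metrics.get_metrics(dtype="dict")


toy = ToyRegressionMetrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 2.5, 4.0])
print(isinstance(toy, SketchMetrics))  # True: only method presence is checked
print(report(toy))                     # {'mae': 0.25}
```

A runtime isinstance check against a runtime_checkable protocol only verifies that the methods exist, not their signatures; static type checkers perform the fuller check.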