slickml.utils

Classes

Colors

Colors for foreground and background.

Functions

`add_noisy_features`(→ pandas.DataFrame)	Creates a new feature matrix augmented with noisy features via permutation.
`array_to_df`(→ pandas.DataFrame)	Transforms a numpy array into a pandas DataFrame.
`check_var`(→ None)	Validates the variable's dtype and possible value.
`deprecated`(→ Callable[[Callable[P, R]], Callable[P, R]])	Annotation decorator for marking APIs as deprecated in docstrings and raising a warning if called.
`df_to_csr`(→ scipy.sparse.csr_matrix)	Transforms a pandas DataFrame into a Compressed Sparse Row (CSR) matrix [csr-api].
`memory_use_csr`(→ int)	Memory use of a Compressed Sparse Row (CSR) matrix in bytes.

Package Contents

class slickml.utils.Colors

Bases: slickml.base.ExtendedEnum

Colors for foreground and background.

names(): Returns a list of color names as strings

values(): Returns a list of color values as strings

to_dict(): Returns a dictionary of all color name-value pairs as strings

Examples

>>> from slickml.utils import Colors
>>> Colors.RED
>>> str(Colors.BLUE)
>>> Colors.names()
>>> Colors.values()
>>> Colors.to_dict()

BLUE = '\x1b[94m'
BOLD = '\x1b[1m'
B_Black = '\x1b[40m'
B_Blue = '\x1b[44m'
B_Cyan = '\x1b[46m'
B_DarkGray = '\x1b[100m'
B_Default = '\x1b[49m'
B_Green = '\x1b[42m'
B_LightBlue = '\x1b[104m'
B_LightCyan = '\x1b[106m'
B_LightGray = '\x1b[47m'
B_LightGreen = '\x1b[102m'
B_LightMagenta = '\x1b[105m'
B_LightRed = '\x1b[101m'
B_LightYellow = '\x1b[103m'
B_Magenta = '\x1b[45m'
B_Red = '\x1b[41m'
B_White = '\x1b[107m'
B_Yellow = '\x1b[43m'
CYAN = '\x1b[96m'
DARKCYAN = '\x1b[36m'
END = '\x1b[0m'
F_Black = '\x1b[30m'
F_Blue = '\x1b[34m'
F_Cyan = '\x1b[36m'
F_DarkGray = '\x1b[90m'
F_Default = '\x1b[39m'
F_Green = '\x1b[32m'
F_LightBlue = '\x1b[94m'
F_LightCyan = '\x1b[96m'
F_LightGray = '\x1b[37m'
F_LightGreen = '\x1b[92m'
F_LightMagenta = '\x1b[95m'
F_LightRed = '\x1b[91m'
F_LightYellow = '\x1b[93m'
F_Magenta = '\x1b[35m'
F_Red = '\x1b[31m'
F_White = '\x1b[97m'
F_Yellow = '\x1b[33m'
GREEN = '\x1b[92m'
PURPLE = '\x1b[95m'
RED = '\x1b[91m'
UNDERLINE = '\x1b[4m'
YELLOW = '\x1b[93m'
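Each constant above is a raw ANSI escape sequence: `\x1b[` followed by an SGR code and `m`. Codes 30-37 and 90-97 set the foreground, 40-47 and 100-107 set the background, 39 and 49 restore the default colors, and `END` (code 0) resets all attributes. The stdlib-only sketch below copies four of those values into plain strings and composes them with a hypothetical `wrap` helper (not part of slickml), assuming a terminal that honors SGR codes.

```python
# Values copied from the Colors constants listed above.
BOLD = "\x1b[1m"
F_Red = "\x1b[31m"
B_Yellow = "\x1b[43m"
END = "\x1b[0m"


def wrap(text: str, *codes: str) -> str:
    """Hypothetical helper: prefix text with escape codes and reset afterwards."""
    return "".join(codes) + text + END


# Bold red text on a yellow background; END restores the terminal defaults.
warning = wrap("WARNING", BOLD, F_Red, B_Yellow)
print(warning)
```

With slickml itself, the `Colors` members should interpolate the same way in f-strings, since `__str__` returns the value and `__format__` defers to an overridden `__str__`, per the method descriptions on this page.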

__dir__(): Returns all members and all public methods

__format__(format_spec): Returns format using actual value type unless __str__ has been overridden.

__hash__(): Return hash(self).

__reduce_ex__(proto): Helper for pickle.

__repr__() → str

Returns the string representation of the Enum value.

Returns: str

__str__() → str

Returns the Enum value as a string.

Returns: str

name(): The name of the Enum member.

classmethod names() → List[str]

Returns a list of Enum names as strings.

Returns: List[str]

classmethod to_dict() → Dict[str, str]

Returns a dictionary of all Enum name-value pairs as strings.

Returns: Dict[str, str]

value(): The value of the Enum member.

classmethod values() → List[str]

Returns a list of Enum values as strings.

Returns: List[str]
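The `names()`, `values()` and `to_dict()` classmethods can be reproduced on a plain `enum.Enum`. The sketch below is a hypothetical stand-in for `slickml.base.ExtendedEnum`, written from the behavior documented above rather than from the library source; `ExtendedEnumSketch` and `MiniColors` are illustrative names.

```python
from enum import Enum
from typing import Dict, List


class ExtendedEnumSketch(Enum):
    """Hypothetical stand-in for slickml.base.ExtendedEnum (not the library code)."""

    @classmethod
    def names(cls) -> List[str]:
        # Member names in definition order.
        return [member.name for member in cls]

    @classmethod
    def values(cls) -> List[str]:
        # Member values rendered as strings.
        return [str(member.value) for member in cls]

    @classmethod
    def to_dict(cls) -> Dict[str, str]:
        # Name -> value mapping, again as strings.
        return {member.name: str(member.value) for member in cls}

    def __str__(self) -> str:
        # Mirror the documented behavior: str() returns the member's value.
        return str(self.value)


class MiniColors(ExtendedEnumSketch):
    RED = "\x1b[91m"
    BLUE = "\x1b[94m"
    END = "\x1b[0m"


print(MiniColors.names())
print(MiniColors.to_dict())
```

One design consequence worth knowing: iterating an `Enum` skips aliases, so in this sketch two members sharing a value (as `BLUE` and `F_LightBlue` do in the list above) would be reported only once; `cls.__members__` includes aliases if they are needed.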

slickml.utils.add_noisy_features(X: pandas.DataFrame | numpy.ndarray, *, random_state: int | None = 1367, prefix: str | None = 'noisy') → pandas.DataFrame

Creates a new feature matrix augmented with noisy features via permutation.

The main goal of this algorithm is to augment the data with permuted records as noisy features, in order to explore the stability of any trained model. In principle, permuting the records breaks any relationship between the noisy features and the target classes. Input data with a shape of (n, m) is transformed into output data with a shape of (n, 2m).

Parameters:

X (Union[pd.DataFrame, np.ndarray]) – Input features
random_state (int, optional) – Random seed for randomizing the permutations and reproducibility, by default 1367
prefix (str, optional) – Prefix string that will be added to the noisy features’ names, by default “noisy”

Returns:

pd.DataFrame – Transformed feature matrix with noisy features and shape of (n, 2m)

Examples

>>> import pandas as pd
>>> from slickml.utils import add_noisy_features
>>> df_noisy = add_noisy_features(
...     X=pd.DataFrame({"foo": [1, 2, 3, 4, 5]}),
...     random_state=1367,
...     prefix="noisy",
... )
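A minimal pandas sketch of the idea, assuming the noisy block is a row-shuffled copy of every input column with the prefix joined by an underscore; the library's exact permutation scheme and column naming may differ, and `add_noisy_features_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def add_noisy_features_sketch(
    X: pd.DataFrame,
    *,
    random_state: int = 1367,
    prefix: str = "noisy",
) -> pd.DataFrame:
    """Hypothetical re-implementation: append a row-permuted copy of X."""
    rng = np.random.default_rng(random_state)
    X = X.reset_index(drop=True)
    # Shuffle the rows of a copy: each noisy column keeps its marginal
    # distribution but loses any relationship with the target.
    noisy = X.iloc[rng.permutation(len(X))].reset_index(drop=True)
    noisy.columns = [f"{prefix}_{col}" for col in X.columns]
    # Shape goes from (n, m) to (n, 2m).
    return pd.concat([X, noisy], axis=1)


df = pd.DataFrame({"foo": [1, 2, 3, 4, 5], "bar": [10, 20, 30, 40, 50]})
df_noisy = add_noisy_features_sketch(df)
print(df_noisy.columns.tolist())  # ['foo', 'bar', 'noisy_foo', 'noisy_bar']
```

Shuffling whole rows, rather than each column independently, is a choice made in this sketch: it keeps the correlations among the noisy features intact while still decoupling them from the target.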

slickml.utils.array_to_df(X: numpy.ndarray, *, prefix: str | None = 'F', delimiter: str | None = '_') → pandas.DataFrame

Transforms a numpy array into a pandas DataFrame.

The column names of the DataFrame are built from the prefix, the delimiter, and the 0-based index of each column of the array (with the defaults: F_0, F_1, ...).

Parameters:

X (np.ndarray) – Input array
prefix (str, optional) – Prefix string for each column name, by default “F”
delimiter (str, optional) – Delimiter to separate prefix and index number, by default “_”

Returns:

pd.DataFrame

Examples

>>> import numpy as np
>>> from slickml.utils import array_to_df
>>> df = array_to_df(
...     X=np.array([1, 2, 3]),
...     prefix="F",
...     delimiter="_",
... )
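The naming rule can be sketched in a few lines of pandas. `array_to_df_sketch` is a hypothetical helper that assumes a 2-D input; it is not the library's source, and how slickml handles 1-D arrays is not covered here.

```python
import numpy as np
import pandas as pd


def array_to_df_sketch(
    X: np.ndarray,
    *,
    prefix: str = "F",
    delimiter: str = "_",
) -> pd.DataFrame:
    """Hypothetical sketch: name columns f"{prefix}{delimiter}{index}"."""
    df = pd.DataFrame(X)
    # Column positions are 0-based: 0, 1, ... become F_0, F_1, ...
    df.columns = [f"{prefix}{delimiter}{i}" for i in range(df.shape[1])]
    return df


df = array_to_df_sketch(np.array([[1, 2, 3], [4, 5, 6]]))
print(df.columns.tolist())  # ['F_0', 'F_1', 'F_2']
```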

slickml.utils.check_var(var: Any, *, var_name: str, dtypes: Any | Tuple[Any], values: Any | Tuple[Any] | None = None) → None

Validates the variable’s dtype and possible value.

Parameters:

var (Any) – Variable
var_name (str) – Variable name
dtypes (Union[type, Tuple[type]]) – Data type classes
values (Union[Any, Tuple[Any]], optional) – Possible values, by default None

Returns:

None

Raises:

TypeError – If var is not an instance of dtypes
ValueError – If var is not one of the given values

Notes

This is the main function used across the API to check variables before any class is instantiated or any function is executed. We use it instead of pydantic's validator and root_validator because of a number of issues we observed in our investigation (e.g. silent data type casting/truncation). Hopefully, once pydantic version 2.0 is released, we can use it instead.

Examples

>>> from dataclasses import dataclass
>>> from slickml.utils import check_var
>>> @dataclass
... class Foo:
...     var_str: str
...     var_float: float = 42.0
...     var_int: int = 1367
...     def __post_init__(self):
...         check_var(self.var_str, var_name="var_str", dtypes=str)
...         check_var(self.var_float, var_name="var_float", dtypes=float, values=(41, 42))
...         check_var(self.var_int, var_name="var_int", dtypes=int, values=(1367, 1400))
>>> foo = Foo(var_str="foo")
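The checks described above can be sketched with `isinstance` and a membership test. `check_var_sketch` is a hypothetical, stdlib-only re-implementation based on the documented parameters and exceptions; the library's error messages and edge-case handling may differ.

```python
from typing import Any, Optional, Tuple, Union


def check_var_sketch(
    var: Any,
    *,
    var_name: str,
    dtypes: Union[type, Tuple[type, ...]],
    values: Optional[Union[Any, Tuple[Any, ...]]] = None,
) -> None:
    """Hypothetical dtype/value checker modeled on check_var's docs."""
    # isinstance accepts either a single type or a tuple of types.
    if not isinstance(var, dtypes):
        raise TypeError(f"The input {var_name} must have {dtypes} dtype, got {type(var)}.")
    if values is not None:
        # Treat a single allowed value like a one-element tuple.
        allowed = values if isinstance(values, tuple) else (values,)
        if var not in allowed:
            raise ValueError(f"The input {var_name} must be one of {allowed}, got {var!r}.")


# 42.0 passes: its dtype is float, and 42.0 == 42 so it is in the allowed tuple.
check_var_sketch(42.0, var_name="var_float", dtypes=float, values=(41, 42))

try:
    check_var_sketch(1367, var_name="var_int", dtypes=str)
except TypeError as exc:
    print(exc)
```

Because the sketch relies on `isinstance`, `True` passes a `dtypes=int` check (`bool` subclasses `int`); a stricter checker would compare `type(var)` explicitly.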

slickml.utils.deprecated(alternative: str | None = None, since: str | None = None) → Callable[[Callable[P, R]], Callable[P, R]]

Annotation decorator for marking APIs as deprecated in docstrings and raising a warning if called.

Parameters:

alternative (str, optional) – The name of a superseded replacement function, method, or class to use in place of the deprecated one, by default None
since (str, optional) – A version designator indicating the release in which the function, method, or class was marked as deprecated, by default None

Returns:

Callable[[Callable[P, R]], Callable[P, R]]

slickml.utils.df_to_csr(df: pandas.DataFrame, *, fillna: float | None = 0.0, verbose: bool | None = False) → scipy.sparse.csr_matrix

Transforms a pandas DataFrame into a Compressed Sparse Row (CSR) matrix [csr-api].

Parameters:

df (pd.DataFrame) – Input dataframe
fillna (float, optional) – Value to fill nulls, by default 0.0
verbose (bool, optional) – Whether to show a comparison of the memory usage of the CSR matrix and the pandas DataFrame, by default False

Returns:

csr_matrix – Transformed pandas DataFrame in CSR matrix format

Notes

This utility function is used across the API whenever sparse_matrix=True is set for any classifier or regressor. It pays off when the input is genuinely sparse (mostly zeros); for a dense input matrix, the CSR format actually uses more memory, because it stores a column index alongside every non-zero value. You can check this by passing verbose=True or by calling memory_use_csr() directly on your CSR matrix, and compare the result with the memory usage of the input pandas.DataFrame via df.memory_usage().sum().

References

[csr-api] https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html

Examples

>>> import pandas as pd
>>> from slickml.utils import df_to_csr
>>> csr = df_to_csr(
...     df=pd.DataFrame({"foo": [0, 1, 0, 1]}),
...     fillna=0.0,
...     verbose=True,
... )
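To show what the CSR format stores, here is a minimal sketch that fills nulls and hands the dense values to `scipy.sparse.csr_matrix`. `df_to_csr_sketch` is a hypothetical helper that omits the `verbose` memory report; it is not the library's source.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def df_to_csr_sketch(df: pd.DataFrame, *, fillna: float = 0.0) -> csr_matrix:
    """Hypothetical sketch: fill nulls, then let SciPy build the CSR matrix."""
    return csr_matrix(df.fillna(fillna).to_numpy())


df = pd.DataFrame({"foo": [0, 1, 0, np.nan], "bar": [0, 0, 2, 0]})
csr = df_to_csr_sketch(df)
# CSR keeps only the non-zero entries plus two index arrays:
#   data    -> the non-zero values, row by row
#   indices -> the column index of each stored value
#   indptr  -> where each row starts in `data` (length n_rows + 1)
print(csr.data, csr.indices, csr.indptr)
```

Here the filled frame has only two non-zero entries, so `data` holds `[1.0, 2.0]`, `indices` holds `[0, 1]`, and `indptr` is `[0, 0, 1, 2, 2]`: rows 0 and 3 are empty, so their start and end offsets coincide.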

slickml.utils.memory_use_csr(csr: scipy.sparse.csr_matrix) → int

Memory use of a Compressed Sparse Row (CSR) matrix in bytes.

Parameters:

csr (csr_matrix) – Compressed sparse row matrix

Returns:

int – Memory use in bytes

Examples

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> from slickml.utils import memory_use_csr
>>> csr = csr_matrix((3, 4), dtype=np.int8)
>>> mem = memory_use_csr(csr=csr)
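A CSR matrix is backed by three NumPy arrays (`data`, `indices`, `indptr`), so a reasonable estimate of its memory is the sum of their `nbytes`. The sketch below assumes that is what `memory_use_csr` measures; the source does not say which buffers it counts, and `memory_use_csr_sketch` is a hypothetical name. It compares a mostly-zero matrix with a fully dense one against `DataFrame.memory_usage`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def memory_use_csr_sketch(csr: csr_matrix) -> int:
    """Hypothetical sketch: bytes held by the three arrays backing a CSR matrix."""
    return int(csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes)


# Sparse case: 1,000 x 10 matrix with only 10 non-zero entries.
sparse_like = np.zeros((1000, 10))
sparse_like[np.arange(10), np.arange(10)] = 1.0

# Dense case: every entry is non-zero, so CSR also stores 10,000 column indices.
dense_like = np.ones((1000, 10))

for name, arr in [("sparse", sparse_like), ("dense", dense_like)]:
    csr_bytes = memory_use_csr_sketch(csr_matrix(arr))
    df_bytes = int(pd.DataFrame(arr).memory_usage().sum())
    print(f"{name}: CSR {csr_bytes} bytes vs DataFrame {df_bytes} bytes")
```

In the sparse case the CSR arrays are far smaller than the DataFrame; in the dense case the extra `indices` array makes CSR the larger of the two.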