slickml.utils._transform
Module Contents
Functions
add_noisy_features | Creates a new feature matrix augmented with noisy features via permutation.
array_to_df | Transforms a numpy array into a pandas DataFrame.
df_to_csr | Transforms a pandas DataFrame into a Compressed Sparse Row (CSR) matrix [csr-api].
memory_use_csr | Memory use of a Compressed Sparse Row (CSR) matrix in bytes.
- slickml.utils._transform.add_noisy_features(X: Union[pandas.DataFrame, numpy.ndarray], *, random_state: Optional[int] = 1367, prefix: Optional[str] = 'noisy') → pandas.DataFrame
Creates a new feature matrix augmented with noisy features via permutation.
The main goal of this algorithm is to append permuted records as noisy features in order to explore the stability of any trained model. In principle, we are permuting the target classes. Input data with a shape of (n, m) is transformed into output data with a shape of (n, 2m).
- Parameters:
X (Union[pd.DataFrame, np.ndarray]) – Input features
random_state (int, optional) – Random seed for randomizing the permutations and reproducibility, by default 1367
prefix (str, optional) – Prefix string that will be added to the noisy features’ names, by default “noisy”
- Returns:
pd.DataFrame – Transformed feature matrix with noisy features and shape of (n, 2m)
Examples
>>> import pandas as pd
>>> from slickml.utils import add_noisy_features
>>> df_noisy = add_noisy_features(
...     X=pd.DataFrame({"foo": [1, 2, 3, 4, 5]}),
...     random_state=1367,
...     prefix="noisy",
... )
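The permutation idea described above can be illustrated without slickml. The following is a minimal sketch, not the library's implementation: the helper name `add_noisy_features_sketch`, the choice to shuffle whole rows, and the `<prefix>_<column>` naming scheme are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_noisy_features_sketch(df, random_state=1367, prefix="noisy"):
    """Hypothetical stand-in for illustration; not slickml's actual code."""
    rng = np.random.default_rng(random_state)
    base = df.reset_index(drop=True)
    # Shuffle the row order, then reset the index so rows realign by position.
    shuffled = base.iloc[rng.permutation(len(base))].reset_index(drop=True)
    # Assumed naming scheme for the noisy copies: "<prefix>_<column>".
    shuffled.columns = [f"{prefix}_{col}" for col in base.columns]
    # Original columns first, permuted copies second: (n, m) becomes (n, 2m).
    return pd.concat([base, shuffled], axis=1)


df = pd.DataFrame({"foo": [1, 2, 3, 4, 5], "bar": [10, 20, 30, 40, 50]})
df_noisy = add_noisy_features_sketch(df)
print(df_noisy.shape)  # (5, 4)
```

Each noisy column holds the same values as its source column, only in a different order, so it keeps the marginal distribution while breaking the row-wise link to the target.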
- slickml.utils._transform.array_to_df(X: numpy.ndarray, *, prefix: Optional[str] = 'F', delimiter: Optional[str] = '_') → pandas.DataFrame
Transforms a numpy array into a pandas DataFrame.
The prefix and delimiter, along with the 0-based index of each column of the array, are used to create the column names of the DataFrame.
- Parameters:
X (np.ndarray) – Input array
prefix (str, optional) – Prefix string for each column name, by default “F”
delimiter (str, optional) – Delimiter to separate prefix and index number, by default “_”
- Returns:
pd.DataFrame
Examples
>>> import numpy as np
>>> from slickml.utils import array_to_df
>>> df = array_to_df(
...     X=np.array([1, 2, 3]),
...     prefix="F",
...     delimiter="_",
... )
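To make the naming rule concrete, here is a small sketch that builds the column names by hand with plain numpy and pandas. It assumes the names are formed as prefix, then delimiter, then the 0-based column index, as the description above suggests; the exact output of `array_to_df` is not verified here.

```python
import numpy as np
import pandas as pd

X = np.array([[1, 2, 3], [4, 5, 6]])
prefix, delimiter = "F", "_"

# Assumed naming rule: prefix + delimiter + 0-based column index.
columns = [f"{prefix}{delimiter}{i}" for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=columns)
print(list(df.columns))  # ['F_0', 'F_1', 'F_2']
```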
- slickml.utils._transform.df_to_csr(df: pandas.DataFrame, *, fillna: Optional[float] = 0.0, verbose: Optional[bool] = False) → scipy.sparse.csr_matrix
Transforms a pandas DataFrame into a Compressed Sparse Row (CSR) matrix [csr-api].
- Parameters:
df (pd.DataFrame) – Input dataframe
fillna (float, optional) – Value to fill nulls, by default 0.0
verbose (bool, optional) – Whether to show the memory usage comparison of csr matrix and pandas DataFrame, by default False
- Returns:
csr_matrix – Transformed pandas DataFrame in CSR matrix format
Notes
This utility function is used across the API when sparse_matrix=True is set for classifiers and regressors. In practice, it makes sense to use it when the data is actually sparse. Note that converting a dense input matrix to a sparse matrix ends up using more memory, not less. This can be checked by passing verbose=True or by calling memory_use_csr() directly on your CSR matrix. Additionally, you can compare the memory usage of the CSR matrix with that of the input pandas.DataFrame via df.memory_usage().sum().
References
Examples
>>> import pandas as pd
>>> from slickml.utils import df_to_csr
>>> csr = df_to_csr(
...     df=pd.DataFrame({"foo": [0, 1, 0, 1]}),
...     fillna=0.0,
...     verbose=True,
... )
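The note about dense inputs can be checked directly with scipy and pandas. The sketch below does not call slickml; `csr_nbytes` is a hypothetical helper that sums the bytes of the three arrays backing a CSR matrix, and the 1000 x 10 frames are made-up data.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def csr_nbytes(csr):
    # Hypothetical helper: bytes held by the three arrays backing a CSR matrix.
    return csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes


# Fully dense frame: every entry is non-zero.
dense_df = pd.DataFrame(np.ones((1000, 10)))
# Mostly empty frame: a single non-zero entry.
mostly_zero = np.zeros((1000, 10))
mostly_zero[0, 0] = 1.0
sparse_df = pd.DataFrame(mostly_zero)

for name, frame in [("dense", dense_df), ("sparse", sparse_df)]:
    csr = csr_matrix(frame.fillna(0.0).to_numpy())
    print(name, "DataFrame bytes:", frame.memory_usage().sum(), "CSR bytes:", csr_nbytes(csr))
```

For the dense frame the CSR form is larger, because it stores a column index alongside every value; for the mostly empty frame it is far smaller.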
- slickml.utils._transform.memory_use_csr(csr: scipy.sparse.csr_matrix) → int
Memory use of a Compressed Sparse Row (CSR) matrix in bytes.
- Parameters:
csr (csr_matrix) – Compressed sparse row matrix
- Returns:
int – Memory use in bytes
Examples
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> from slickml.utils import memory_use_csr
>>> csr = csr_matrix((3, 4), dtype=np.int8)
>>> mem = memory_use_csr(csr=csr)
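A CSR matrix is backed by three arrays: the non-zero values, their column indices, and the row pointers. The sketch below breaks down the bytes of each for the same empty 3 x 4 matrix used in the example. Whether `memory_use_csr` computes exactly this sum is an assumption; the code uses only scipy and numpy attributes.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Empty 3x4 matrix: no stored values, so only the row pointers take space.
csr = csr_matrix((3, 4), dtype=np.int8)

parts = {
    "data": csr.data.nbytes,        # non-zero values
    "indices": csr.indices.nbytes,  # column index of each stored value
    "indptr": csr.indptr.nbytes,    # n_rows + 1 row pointers
}
total = sum(parts.values())
print(parts, total)
```

Even with no stored values the total is non-zero, since `indptr` always holds one entry per row plus one.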