API Reference¶
Clustering¶
- class dbgsom.SomVQ.SomVQ(n_iter=500, lambda_=115.0, sigma_start=None, sigma_end=None, sigma_fine=None, vertical_growth=False, decay_function='exponential', neighborhood_function='gaussian', neighborhood_cutoff=3.0, verbose=False, coarse_training_frac=0.5, random_state=None, convergence_threshold=0.001, max_neurons=None, metric='euclidean', growth_criterion='quantization_error', min_samples_vertical_growth=100, tau_2=0.5, n_jobs=1, winner_stability_threshold=0.01, pointer_search='fine', cutgauss_phase='fine', smoothing_steps=0, smoothing_epsilon=0.5)¶
Bases: TransformerMixin, ClusterMixin, BaseSom
Directed Batch Growing SOM for unsupervised clustering and vector quantization.
See BaseSom for all parameters.
- labels_¶
Cluster index of each training sample.
- Type:
ndarray of shape (n_samples,)
- som_¶
Graph containing neurons with weight, error, and hit_count attributes.
- Type:
networkx.Graph
- weights_¶
Learned prototype weight vectors.
- Type:
ndarray of shape (n_prototypes, n_features)
- topographic_error_¶
Fraction of samples whose two nearest prototypes are not grid-adjacent.
- Type:
float
- quantization_error_¶
Mean distance from each training sample to its nearest prototype.
- Type:
float
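The two quality measures above can be illustrated without the library. The following is a minimal pure-Python sketch, not dbgsom's implementation. It assumes Euclidean distance, prototypes keyed by integer grid coordinates, and "grid-adjacent" meaning a Manhattan distance of 1 between coordinates. The function names and the toy map are hypothetical.

```python
import math


def quantization_error(samples, prototypes):
    """Mean Euclidean distance from each sample to its nearest prototype."""
    weights = list(prototypes.values())
    return sum(min(math.dist(x, w) for w in weights) for x in samples) / len(samples)


def topographic_error(samples, prototypes):
    """Fraction of samples whose two nearest prototypes are not grid-adjacent."""
    errors = 0
    for x in samples:
        # Grid positions sorted by the distance of their weight vector to x.
        ranked = sorted(prototypes, key=lambda pos: math.dist(x, prototypes[pos]))
        first, second = ranked[:2]
        # Assumption: adjacency means Manhattan distance 1 on the grid.
        if abs(first[0] - second[0]) + abs(first[1] - second[1]) != 1:
            errors += 1
    return errors / len(samples)


# Hypothetical toy map: grid position -> weight vector.
prototypes = {(0, 0): (0.0, 0.0), (1, 0): (1.0, 0.0), (2, 0): (0.0, 1.0)}
samples = [(0.1, 0.0), (0.9, 0.0), (0.0, 0.8)]

qe = quantization_error(samples, prototypes)
te = topographic_error(samples, prototypes)
```

In this toy map the third sample lies between the prototypes at grid positions (2, 0) and (0, 0), which are two steps apart on the grid, so it is the only sample counted as a topographic error.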
- calculate_quantization_error(X)¶
Return the average distance from each sample to the nearest prototype.
- Return type:
float
- Parameters:
X (array_like of shape (n_samples, n_features)) – Data to quantize.
- Returns:
error – Average distance from each sample to the nearest prototype.
- Return type:
float
- fit(X, y=None)¶
Train SOM on training data.
- Return type:
Self
- Parameters:
X (array_like of shape (n_samples, n_features)) – Training data.
y (array_like of shape (n_samples,), optional) – Class labels of the samples.
- Returns:
self – Trained estimator.
- Return type:
SomVQ
- fit_predict(X, y=None, **kwargs)¶
Perform clustering on X and return cluster labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data.
y (Ignored) – Not used, present for API consistency by convention.
**kwargs (dict) – Arguments to be passed to fit.
Added in version 1.4.
- Returns:
labels – Cluster labels.
- Return type:
ndarray of shape (n_samples,), dtype=np.int64
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns:
X_new – Transformed array.
- Return type:
ndarray of shape (n_samples, n_features_new)
- get_metadata_routing()¶
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- plot(color=None, pointsize=None, layout='grid', palette='magma_r', X=None)¶
Plot the SOM neurons and their neighbourhood edges using seaborn objects.
Edges are drawn first as grey lines; nodes are drawn on top and can be colour- and size-coded by any node attribute stored in the graph.
- Return type:
Plot
- Parameters:
color ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to colour. Numeric attributes with all identical values are cast to string to avoid a degenerate continuous scale.
pointsize ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to point size.
layout ({'grid', 'pca'}, default 'grid') – Algorithm used to compute node positions.
'grid': Neurons are placed at their integer SOM grid coordinates. Preserves the topological map structure.
'pca': Weight vectors are projected to 2-D with PCA. Node positions reflect the principal directions of variance in feature space.
palette (str, default 'magma_r') – Seaborn / Matplotlib colormap name applied to the colour mapping.
X (array-like of shape (n_samples, n_features), optional) – Training data used to fit the PCA basis when layout='pca'. When provided, PCA is fit on X and the weight vectors are projected into that space, yielding components aligned with the true data variance. When None (default), PCA is fit directly on the weight vectors.
- predict(X)¶
Predict the closest neuron each sample in X belongs to.
- Return type:
ndarray
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data to predict.
- Returns:
labels – Contiguous cluster index of the best matching prototype.
- Return type:
ndarray of shape (n_samples,)
- set_output(*, transform=None)¶
Set output container.
Refer to the scikit-learn user guide for more details, and to the scikit-learn example plot_set_output.py (miscellaneous examples) for how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) – Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
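The <component>__<parameter> naming convention can be illustrated with a small sketch. This is not scikit-learn's code, only a pure-Python illustration of how a double underscore separates the component name from the parameter it owns. The function name and the component names "som" and "scaler" are hypothetical.

```python
def split_nested_params(params):
    """Separate top-level parameters from '<component>__<parameter>' entries."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only; deeper levels stay intact
        # so that the component can resolve them itself.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": True, "som__n_iter": 200, "som__scaler__copy": False}
)
```

Here `own` holds the parameters of the outer object, while `nested["som"]` holds what would be forwarded to the component named "som", including the still-nested key "scaler__copy".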
- topographic_function(X)¶
Compute the topographic function for the SOM.
Measures topology preservation across all neighbourhood scales k. Positive k values detect fold-overs (map neighbours that are far apart in data space); negative k values detect tears (data neighbours that are far apart on the map). phi(0) = phi(-1) + phi(1).
Reference: Villmann et al., “Topology preservation in self-organizing feature maps: exact definition and measurement”, IEEE Trans. Neural Networks, 1997.
- Return type:
ndarray
- Parameters:
X (array_like of shape (n_samples, n_features)) – Data used to compute the topographic function.
- Returns:
Row 0: phi values; row 1: normalised k-axis in [-1, 1].
- Return type:
ndarray of shape (2, 2 * max_dist + 1)
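The layout of the returned array can be sketched as follows. This is an illustration under assumptions, not the library's code: it assumes k runs over the integers from -max_dist to max_dist and that the k-axis is normalised by dividing by max_dist. The phi values are made-up toy numbers, and the function name is hypothetical.

```python
def assemble_topographic_function(phi_by_k, max_dist):
    """Build the two rows described above from phi values for k != 0."""
    ks = list(range(-max_dist, max_dist + 1))
    phi = dict(phi_by_k)
    # Stated in the description above: phi(0) = phi(-1) + phi(1).
    phi[0] = phi[-1] + phi[1]
    row_phi = [phi[k] for k in ks]
    # Assumption: normalisation divides k by max_dist, giving values in [-1, 1].
    row_k = [k / max_dist for k in ks]
    return [row_phi, row_k]


# Toy phi values for max_dist = 2 (not computed from any real map).
result = assemble_topographic_function({-2: 0.0, -1: 0.25, 1: 0.5, 2: 0.125}, max_dist=2)
```

With max_dist = 2 the result has 2 rows and 2 * 2 + 1 = 5 columns, matching the documented shape.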
- transform(X, y=None)¶
Compute a non-negative least squares mixture of prototypes that approximates each sample.
- Return type:
ndarray
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Data to transform.
y (Ignored.) – Not used, present here for API consistency by convention.
- Returns:
coefficients (np.ndarray of shape (n_samples, n_prototypes)) – Non-negative coefficients of the linear mixture model.
Reference: Teuvo Kohonen, “Description of Input Patterns by Linear Mixtures of SOM Models”, Proceedings of the 6th International Workshop on Self-Organizing Maps, 2007.
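To make the idea of a non-negative mixture concrete, the sketch below finds coefficients c_j >= 0 so that the weighted sum of prototypes approximates a sample. It uses plain projected gradient descent as a simple stand-in; it is not the solver used by the library, and a dedicated routine such as scipy.optimize.nnls would be the usual choice in practice. The function name, step size, and toy data are assumptions for illustration.

```python
def nnls_projected_gradient(prototypes, sample, n_steps=500, learning_rate=0.1):
    """Approximately minimise ||sum_j c_j * w_j - x||^2 subject to c_j >= 0."""
    coefficients = [0.0] * len(prototypes)
    for _ in range(n_steps):
        # Current reconstruction of the sample from the prototypes.
        reconstruction = [
            sum(c * w[d] for c, w in zip(coefficients, prototypes))
            for d in range(len(sample))
        ]
        residual = [r - x for r, x in zip(reconstruction, sample)]
        # Gradient step per coefficient, then projection onto c_j >= 0.
        coefficients = [
            max(0.0, c - learning_rate * sum(r * w_d for r, w_d in zip(residual, w)))
            for c, w in zip(coefficients, prototypes)
        ]
    return coefficients


# Toy prototypes and samples; one row of coefficients per sample.
prototypes = [(1.0, 0.0), (0.0, 1.0)]
samples = [(0.3, 0.7), (-0.2, 0.7)]
coefficients = [nnls_projected_gradient(prototypes, x) for x in samples]
```

The second sample has a negative first component, which no non-negative weight on the first prototype can reproduce, so its first coefficient is clipped to zero. The result has one row per sample and one column per prototype, matching the documented (n_samples, n_prototypes) shape.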
Classifier¶
- class dbgsom.SomClassifier.SomClassifier(n_iter=500, lambda_=115.0, sigma_start=None, sigma_end=None, sigma_fine=None, vertical_growth=False, decay_function='exponential', neighborhood_function='gaussian', neighborhood_cutoff=3.0, verbose=False, coarse_training_frac=0.5, random_state=None, convergence_threshold=0.001, max_neurons=None, metric='euclidean', growth_criterion='quantization_error', min_samples_vertical_growth=100, tau_2=0.5, n_jobs=1, winner_stability_threshold=0.01, pointer_search='fine', cutgauss_phase='fine', smoothing_steps=0, smoothing_epsilon=0.5)¶
Bases: TransformerMixin, ClassifierMixin, BaseSom
Directed Batch Growing SOM for supervised classification.
See BaseSom for all parameters.
- labels_¶
Predicted class label of each training sample.
- Type:
ndarray of shape (n_samples,)
- classes_¶
Unique class labels seen during fit.
- Type:
ndarray of shape (n_classes,)
- som_¶
Graph containing neurons with weight, label, and probabilities attributes.
- Type:
networkx.Graph
- weights_¶
Learned prototype weight vectors.
- Type:
ndarray of shape (n_prototypes, n_features)
- topographic_error_¶
Fraction of samples whose two nearest prototypes are not grid-adjacent.
- Type:
float
- quantization_error_¶
Mean distance from each training sample to its nearest prototype.
- Type:
float
- calculate_quantization_error(X)¶
Return the average distance from each sample to the nearest prototype.
- Return type:
float
- Parameters:
X (array_like of shape (n_samples, n_features)) – Data to quantize.
- Returns:
error – Average distance from each sample to the nearest prototype.
- Return type:
float
- fit(X, y=None)¶
Train SomClassifier on labelled data.
- Return type:
SomClassifier
- Parameters:
X (array_like of shape (n_samples, n_features)) – Training data.
y (array_like of shape (n_samples,)) – Class labels. Required for the classifier.
- Returns:
self – Trained estimator.
- Return type:
SomClassifier
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns:
X_new – Transformed array.
- Return type:
ndarray of shape (n_samples, n_features_new)
- get_metadata_routing()¶
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- plot(color=None, pointsize=None, layout='grid', palette='magma_r', X=None)¶
Plot the SOM neurons and their neighbourhood edges using seaborn objects.
Edges are drawn first as grey lines; nodes are drawn on top and can be colour- and size-coded by any node attribute stored in the graph.
- Return type:
Plot
- Parameters:
color ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to colour. Numeric attributes with all identical values are cast to string to avoid a degenerate continuous scale.
pointsize ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to point size.
layout ({'grid', 'pca'}, default 'grid') – Algorithm used to compute node positions.
'grid': Neurons are placed at their integer SOM grid coordinates. Preserves the topological map structure.
'pca': Weight vectors are projected to 2-D with PCA. Node positions reflect the principal directions of variance in feature space.
palette (str, default 'magma_r') – Seaborn / Matplotlib colormap name applied to the colour mapping.
X (array-like of shape (n_samples, n_features), optional) – Training data used to fit the PCA basis when layout='pca'. When provided, PCA is fit on X and the weight vectors are projected into that space, yielding components aligned with the true data variance. When None (default), PCA is fit directly on the weight vectors.
- predict(X)¶
Predict class labels for samples in X.
- Return type:
ndarray
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data to predict.
- Returns:
labels – Predicted class labels for samples in X.
- Return type:
ndarray of shape (n_samples,)
- predict_proba(X, y=None)¶
Predict the probability of each class for each sample.
- Return type:
ndarray
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data to predict.
y (Ignored) – Only accepted for API compliance.
- Returns:
probabilities – Probability of each sample for each class in the model, where classes are ordered as they are in self.classes_.
- Return type:
ndarray of shape (n_samples, n_classes)
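One plausible way to read the relationship between the per-neuron probabilities attribute and this method is sketched below. It is an assumption, not a description of the library's internals: each sample is assigned the class probabilities stored on its nearest neuron, and the predicted label is the class with the highest probability. The neuron data, class names, and function names are hypothetical.

```python
import math

classes = ["cat", "dog"]

# Hypothetical neurons: a weight vector plus class probabilities ordered like `classes`.
neurons = [
    {"weight": (0.0, 0.0), "probabilities": (0.9, 0.1)},
    {"weight": (1.0, 1.0), "probabilities": (0.2, 0.8)},
]


def predict_proba(samples):
    """Return the class probabilities of the nearest neuron for each sample."""
    rows = []
    for x in samples:
        best = min(neurons, key=lambda neuron: math.dist(x, neuron["weight"]))
        rows.append(best["probabilities"])
    return rows


def predict(samples):
    """Pick the most probable class per sample."""
    return [
        classes[max(range(len(classes)), key=row.__getitem__)]
        for row in predict_proba(samples)
    ]


probabilities = predict_proba([(0.1, 0.2), (0.8, 0.9)])
labels = predict([(0.1, 0.2), (0.8, 0.9)])
```

Each row of `probabilities` has one entry per class, in the same order as `classes`, mirroring the documented ordering by self.classes_.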
- score(X, y, sample_weight=None)¶
Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – Mean accuracy of self.predict(X) w.r.t. y.
- Return type:
float
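The accuracy described above, including the stricter subset accuracy for multi-label targets, can be sketched in a few lines. This is an illustrative pure-Python version, not scikit-learn's implementation; the function name and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose prediction matches exactly."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    matched = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return matched / sum(sample_weight)


# Single-label case: three of four predictions are correct.
single = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])

# Multi-label case: a sample counts only if its whole label set matches.
multi = accuracy([(1, 0), (0, 1), (1, 1)], [(1, 0), (0, 0), (1, 1)])

# Weighted case: the misclassified third sample carries double weight.
weighted = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"], sample_weight=[1, 1, 2, 1])
```

In the multi-label case the second sample gets one of its two labels right but still counts as wrong, which is why subset accuracy is described as a harsh metric.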
- set_output(*, transform=None)¶
Set output container.
Refer to the scikit-learn user guide for more details, and to the scikit-learn example plot_set_output.py (miscellaneous examples) for how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) – Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SomClassifier¶
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- topographic_function(X)¶
Compute the topographic function for the SOM.
Measures topology preservation across all neighbourhood scales k. Positive k values detect fold-overs (map neighbours that are far apart in data space); negative k values detect tears (data neighbours that are far apart on the map). phi(0) = phi(-1) + phi(1).
Reference: Villmann et al., “Topology preservation in self-organizing feature maps: exact definition and measurement”, IEEE Trans. Neural Networks, 1997.
- Return type:
ndarray
- Parameters:
X (array_like of shape (n_samples, n_features)) – Data used to compute the topographic function.
- Returns:
Row 0: phi values; row 1: normalised k-axis in [-1, 1].
- Return type:
ndarray of shape (2, 2 * max_dist + 1)
- transform(X, y=None)¶
Compute a non-negative least squares mixture of prototypes that approximates each sample.
- Return type:
ndarray
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Data to transform.
y (Ignored.) – Not used, present here for API consistency by convention.
- Returns:
coefficients (np.ndarray of shape (n_samples, n_prototypes)) – Non-negative coefficients of the linear mixture model.
Reference: Teuvo Kohonen, “Description of Input Patterns by Linear Mixtures of SOM Models”, Proceedings of the 6th International Workshop on Self-Organizing Maps, 2007.