API Reference

Clustering

class dbgsom.SomVQ.SomVQ(n_iter=500, lambda_=115.0, sigma_start=None, sigma_end=None, sigma_fine=None, vertical_growth=False, decay_function='exponential', neighborhood_function='gaussian', neighborhood_cutoff=3.0, verbose=False, coarse_training_frac=0.5, random_state=None, convergence_threshold=0.001, max_neurons=None, metric='euclidean', growth_criterion='quantization_error', min_samples_vertical_growth=100, tau_2=0.5, n_jobs=1, winner_stability_threshold=0.01, pointer_search='fine', cutgauss_phase='fine', smoothing_steps=0, smoothing_epsilon=0.5)

Bases: TransformerMixin, ClusterMixin, BaseSom

Directed Batch Growing SOM for unsupervised clustering and vector quantization.

See BaseSom for all parameters.

labels_

Cluster index of each training sample.

Type:

ndarray of shape (n_samples,)

som_

Graph containing neurons with weight, error, hit_count attributes.

Type:

networkx.Graph

weights_

Learned prototype weight vectors.

Type:

ndarray of shape (n_prototypes, n_features)

topographic_error_

Fraction of samples whose two nearest prototypes are not grid-adjacent.

Type:

float
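
The definition above can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the helper name `topographic_error`, the toy data, and the rule that two neurons count as grid-adjacent when their integer grid coordinates are exactly one step apart (Manhattan distance 1) are assumptions made for illustration.

```python
import math


def topographic_error(samples, prototypes, grid_positions):
    """Fraction of samples whose two nearest prototypes are not grid-adjacent.

    Hypothetical helper: adjacency is assumed to mean a Manhattan distance
    of 1 between the integer grid coordinates of the two neurons.
    """
    errors = 0
    for x in samples:
        # Rank prototype indices by Euclidean distance to the sample.
        ranked = sorted(
            range(len(prototypes)),
            key=lambda i: math.dist(x, prototypes[i]),
        )
        first, second = ranked[0], ranked[1]
        (r1, c1), (r2, c2) = grid_positions[first], grid_positions[second]
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            errors += 1
    return errors / len(samples)


# Three neurons in one grid row. Neuron 2 sits close to neuron 0 in data
# space although it is two grid steps away, so the map is folded.
prototypes = [(0.0, 0.0), (5.0, 0.0), (1.0, 0.0)]
grid_positions = [(0, 0), (0, 1), (0, 2)]
samples = [(0.2, 0.0), (4.5, 0.0)]

te = topographic_error(samples, prototypes, grid_positions)
```

Here the first sample has neurons 0 and 2 as its two nearest prototypes, which are not adjacent on the grid, while the second sample has neurons 1 and 2, which are.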

quantization_error_

Mean distance from each training sample to its nearest prototype.

Type:

float

calculate_quantization_error(X)

Return the average distance from each sample to the nearest prototype.

Parameters:

X (array_like of shape (n_samples, n_features)) – Data to quantize.

Returns:

error – Average distance from each sample to the nearest prototype.

Return type:

float
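
As a rough illustration of the quantity described above, the following plain-Python sketch averages the distance from each sample to its nearest prototype. It assumes the Euclidean metric (the default `metric='euclidean'`); the function name `quantization_error` and the toy data are hypothetical, and the library's own computation may differ in detail.

```python
import math


def quantization_error(samples, prototypes):
    """Mean Euclidean distance from each sample to its nearest prototype."""
    total = 0.0
    for x in samples:
        # Distance to the best matching prototype for this sample.
        total += min(math.dist(x, w) for w in prototypes)
    return total / len(samples)


prototypes = [(0.0, 0.0), (10.0, 0.0)]
samples = [(3.0, 4.0), (10.0, 2.0), (0.0, 0.0)]

# Nearest-prototype distances are 5, 2 and 0, so the mean is 7 / 3.
qe = quantization_error(samples, prototypes)
```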

fit(X, y=None)

Train SOM on training data.

Parameters:
  • X (array_like of shape (n_samples, n_features)) – Training data.

  • y (array_like of shape (n_samples,), optional) – Class labels of the samples.

Returns:

self – Trained estimator.

Return type:

SomVQ

fit_predict(X, y=None, **kwargs)

Perform clustering on X and return cluster labels.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (Ignored) – Not used, present for API consistency by convention.

  • **kwargs (dict) –

    Arguments to be passed to fit.

    Added in scikit-learn version 1.4.

Returns:

labels – Cluster labels.

Return type:

ndarray of shape (n_samples,), dtype=np.int64

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

plot(color=None, pointsize=None, layout='grid', palette='magma_r', X=None)

Plot the SOM neurons and their neighbourhood edges using seaborn objects.

Edges are drawn first as grey lines; nodes are drawn on top and can be colour- and size-coded by any node attribute stored in the graph.

Return type:

Plot

Parameters:
  • color ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to colour. Numeric attributes with all identical values are cast to string to avoid a degenerate continuous scale.

  • pointsize ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to point size.

  • layout ({'grid', 'pca'}, default 'grid') –

    Algorithm used to compute node positions.

    'grid'

    Neurons are placed at their integer SOM grid coordinates. Preserves the topological map structure.

    'pca'

    Weight vectors projected to 2-D with PCA. Node positions reflect the principal directions of variance in feature space.

  • palette (str, default 'magma_r') – Seaborn / Matplotlib colormap name applied to the colour mapping.

  • X (array-like of shape (n_samples, n_features), optional) – Training data used to fit the PCA basis when layout='pca'. When provided, PCA is fit on X and the weight vectors are projected into that space, yielding components aligned with the true data variance. When None (default), PCA is fit directly on the weight vectors.

predict(X)

Predict the closest neuron each sample in X belongs to.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data to predict.

Returns:

labels – Contiguous cluster index of the best matching prototype.

Return type:

ndarray of shape (n_samples,)
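
The assignment described above amounts to a nearest-prototype lookup. A minimal plain-Python sketch, assuming Euclidean distance and that the returned label is simply the position of the best matching prototype in the weight array (the helper `predict_nearest` and the data are hypothetical):

```python
import math


def predict_nearest(samples, prototypes):
    """Index of the nearest prototype for each sample (Euclidean distance)."""
    return [
        min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))
        for x in samples
    ]


prototypes = [(0.0, 0.0), (4.0, 4.0), (8.0, 0.0)]
samples = [(0.5, -0.2), (7.0, 1.0), (3.5, 3.0)]

labels = predict_nearest(samples, prototypes)
```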

set_output(*, transform=None)

Set output container.

Refer to the scikit-learn user guide for more details, and to the scikit-learn example “Introducing the set_output API” for a demonstration of how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in scikit-learn version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
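
The `<component>__<parameter>` naming convention can be pictured as splitting each keyword at its first double underscore. The sketch below only illustrates that naming rule in plain Python; the function `split_nested_params` and the step names `som` and `scaler` are hypothetical, and the real method also validates names and applies the values.

```python
def split_nested_params(params):
    """Group names of the form <component>__<parameter> by component.

    Names without a double underscore belong to the outer object itself
    and are collected under the key None.
    """
    grouped = {}
    for name, value in params.items():
        # Split at the first "__" only; deeper nesting stays in sub_name.
        component, delimiter, sub_name = name.partition("__")
        if delimiter:
            grouped.setdefault(component, {})[sub_name] = value
        else:
            grouped.setdefault(None, {})[name] = value
    return grouped


grouped = split_nested_params(
    {
        "som__n_iter": 200,          # hypothetical pipeline step named "som"
        "som__max_neurons": 50,
        "scaler__with_mean": False,  # hypothetical step named "scaler"
        "verbose": True,             # parameter of the outer object
    }
)
```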

topographic_function(X)

Compute the topographic function for the SOM.

Measures topology preservation across all neighbourhood scales k. Positive k values detect fold-overs (map neighbours that are far apart in data space); negative k values detect tears (data neighbours that are far apart on the map). phi(0) = phi(-1) + phi(1).

Reference: Villmann et al., “Topology preservation in self-organizing feature maps: exact definition and measurement”, IEEE Trans. Neural Networks, 1997.

Parameters:

X (array_like of shape (n_samples, n_features)) – Data used to compute the topographic function.

Returns:

Row 0: phi values; row 1: normalised k-axis in [-1, 1].

Return type:

ndarray of shape (2, 2 * max_dist + 1)
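
To make the layout of the returned array easier to read, the sketch below builds a k-axis with the documented length and range. It assumes the axis is obtained by dividing the integer scales k = -max_dist, ..., max_dist by max_dist, which matches the stated shape and the [-1, 1] range but is not confirmed by the source; `normalised_k_axis` is a hypothetical helper.

```python
def normalised_k_axis(max_dist):
    """Return the 2 * max_dist + 1 values k / max_dist for k = -max_dist..max_dist."""
    return [k / max_dist for k in range(-max_dist, max_dist + 1)]


max_dist = 4
k_axis = normalised_k_axis(max_dist)

# Under this assumption the middle column corresponds to k = 0, columns to
# its left hold negative k (tears) and columns to its right positive k
# (fold-overs).
centre = len(k_axis) // 2
```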

transform(X, y=None)

Calculate a non-negative least squares mixture model of prototypes that approximates each sample.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Data to transform.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns:

coefficients – Coefficients of the linear regression model.

Return type:

ndarray of shape (n_samples, n_prototypes)

Reference: Teuvo Kohonen, “Description of Input Patterns by Linear Mixtures of SOM Models”, Proceedings of the 6th International Workshop on Self-Organizing Maps, 2007.
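
The idea of a non-negative least squares mixture can be sketched for a single sample as follows. This is an illustration under stated assumptions, not the library's solver: it uses cyclic coordinate descent with clipping at zero, written in plain Python, and the name `nnls_coefficients` and the parameter `n_sweeps` are hypothetical. A production implementation would more likely call a dedicated solver such as `scipy.optimize.nnls`.

```python
def nnls_coefficients(x, prototypes, n_sweeps=100):
    """Approximate x as a non-negative mixture of the prototypes.

    Minimises ||x - sum_j c_j * w_j||^2 subject to c_j >= 0 by cyclic
    coordinate descent: each coefficient is moved to its optimum with the
    others held fixed, then clipped at zero.
    """
    coefs = [0.0] * len(prototypes)
    residual = list(x)  # x minus the current mixture
    for _ in range(n_sweeps):
        for j, w in enumerate(prototypes):
            norm_sq = sum(v * v for v in w)
            if norm_sq == 0.0:
                continue  # a zero prototype cannot contribute
            # Unconstrained change of c_j that best reduces the residual.
            step = sum(r * v for r, v in zip(residual, w)) / norm_sq
            new_c = max(0.0, coefs[j] + step)
            delta = new_c - coefs[j]
            residual = [r - delta * v for r, v in zip(residual, w)]
            coefs[j] = new_c
    return coefs


prototypes = [(1.0, 0.0), (0.0, 2.0)]

inside = nnls_coefficients((3.0, 4.0), prototypes)    # representable exactly
clipped = nnls_coefficients((-1.0, 4.0), prototypes)  # first weight clipped to 0
```

Applying the helper to every row of X would yield one coefficient row per sample, which corresponds to the documented output shape (n_samples, n_prototypes).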

Classifier

class dbgsom.SomClassifier.SomClassifier(n_iter=500, lambda_=115.0, sigma_start=None, sigma_end=None, sigma_fine=None, vertical_growth=False, decay_function='exponential', neighborhood_function='gaussian', neighborhood_cutoff=3.0, verbose=False, coarse_training_frac=0.5, random_state=None, convergence_threshold=0.001, max_neurons=None, metric='euclidean', growth_criterion='quantization_error', min_samples_vertical_growth=100, tau_2=0.5, n_jobs=1, winner_stability_threshold=0.01, pointer_search='fine', cutgauss_phase='fine', smoothing_steps=0, smoothing_epsilon=0.5)

Bases: TransformerMixin, ClassifierMixin, BaseSom

Directed Batch Growing SOM for supervised classification.

See BaseSom for all parameters.

labels_

Predicted class label of each training sample.

Type:

ndarray of shape (n_samples,)

classes_

Unique class labels seen during fit.

Type:

ndarray of shape (n_classes,)

som_

Graph containing neurons with weight, label, probabilities attributes.

Type:

networkx.Graph

weights_

Learned prototype weight vectors.

Type:

ndarray of shape (n_prototypes, n_features)

topographic_error_

Fraction of samples whose two nearest prototypes are not grid-adjacent.

Type:

float

quantization_error_

Mean distance from each training sample to its nearest prototype.

Type:

float

calculate_quantization_error(X)

Return the average distance from each sample to the nearest prototype.

Parameters:

X (array_like of shape (n_samples, n_features)) – Data to quantize.

Returns:

error – Average distance from each sample to the nearest prototype.

Return type:

float

fit(X, y=None)

Train SomClassifier on labelled data.

Parameters:
  • X (array_like of shape (n_samples, n_features)) – Training data.

  • y (array_like of shape (n_samples,)) – Class labels. Required for the classifier.

Returns:

self – Trained estimator.

Return type:

SomClassifier

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

plot(color=None, pointsize=None, layout='grid', palette='magma_r', X=None)

Plot the SOM neurons and their neighbourhood edges using seaborn objects.

Edges are drawn first as grey lines; nodes are drawn on top and can be colour- and size-coded by any node attribute stored in the graph.

Return type:

Plot

Parameters:
  • color ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to colour. Numeric attributes with all identical values are cast to string to avoid a degenerate continuous scale.

  • pointsize ({'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'}, optional) – Node attribute mapped to point size.

  • layout ({'grid', 'pca'}, default 'grid') –

    Algorithm used to compute node positions.

    'grid'

    Neurons are placed at their integer SOM grid coordinates. Preserves the topological map structure.

    'pca'

    Weight vectors projected to 2-D with PCA. Node positions reflect the principal directions of variance in feature space.

  • palette (str, default 'magma_r') – Seaborn / Matplotlib colormap name applied to the colour mapping.

  • X (array-like of shape (n_samples, n_features), optional) – Training data used to fit the PCA basis when layout='pca'. When provided, PCA is fit on X and the weight vectors are projected into that space, yielding components aligned with the true data variance. When None (default), PCA is fit directly on the weight vectors.

predict(X)

Predict class labels for samples in X.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data to predict.

Returns:

labels – Predicted class labels for samples in X.

Return type:

ndarray of shape (n_samples,)

predict_proba(X, y=None)

Predict the probability of each class for each sample.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – New data to predict.

  • y (Ignored) – Only accepted for API compliance.

Returns:

probabilities – Probability of each sample for each class in the model, where classes are ordered as they are in self.classes_.

Return type:

ndarray of shape (n_samples, n_classes)
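
Because the columns follow the order of classes_, a probability row can be mapped back to a class label by taking the column with the largest value. The following plain-Python sketch shows that mapping; `labels_from_probabilities`, the class names, and the probability values are hypothetical stand-ins for the fitted estimator's output.

```python
def labels_from_probabilities(probabilities, classes):
    """Pick, for each row, the class whose column has the highest probability."""
    labels = []
    for row in probabilities:
        best_column = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_column])
    return labels


classes = ["cat", "dog", "fox"]  # stands in for self.classes_
probabilities = [
    [0.1, 0.7, 0.2],  # column 1 is largest
    [0.6, 0.3, 0.1],  # column 0 is largest
]

labels = labels_from_probabilities(probabilities, classes)
```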

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
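
For single-label targets, the score described above is the (optionally weighted) fraction of correct predictions. A minimal sketch in plain Python, with the hypothetical helper `accuracy` standing in for the combination of self.predict(X) and the accuracy computation:

```python
def accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of positions where prediction and truth agree."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]  # the third prediction is wrong

plain = accuracy(y_true, y_pred)
# Doubling the weight of the misclassified sample lowers the score.
weighted = accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```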

set_output(*, transform=None)

Set output container.

Refer to the scikit-learn user guide for more details, and to the scikit-learn example “Introducing the set_output API” for a demonstration of how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in scikit-learn version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SomClassifier

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

topographic_function(X)

Compute the topographic function for the SOM.

Measures topology preservation across all neighbourhood scales k. Positive k values detect fold-overs (map neighbours that are far apart in data space); negative k values detect tears (data neighbours that are far apart on the map). phi(0) = phi(-1) + phi(1).

Reference: Villmann et al., “Topology preservation in self-organizing feature maps: exact definition and measurement”, IEEE Trans. Neural Networks, 1997.

Parameters:

X (array_like of shape (n_samples, n_features)) – Data used to compute the topographic function.

Returns:

Row 0: phi values; row 1: normalised k-axis in [-1, 1].

Return type:

ndarray of shape (2, 2 * max_dist + 1)

transform(X, y=None)

Calculate a non-negative least squares mixture model of prototypes that approximates each sample.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Data to transform.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns:

coefficients – Coefficients of the linear regression model.

Return type:

ndarray of shape (n_samples, n_prototypes)

Reference: Teuvo Kohonen, “Description of Input Patterns by Linear Mixtures of SOM Models”, Proceedings of the 6th International Workshop on Self-Organizing Maps, 2007.