User Guide
=====================

Both ``SomClassifier`` and ``SomVQ`` implement the scikit-learn API and can be used as drop-in replacements for other scikit-learn estimators, including full compatibility with ``Pipeline``, ``GridSearchCV``, and ``cross_val_score``.

Key Parameters
--------------

Both estimators share the following most important parameters:

- ``spreading_factor`` (default 0.5): controls the growing threshold. Higher values produce more neurons and finer resolution; lower values produce fewer neurons.
- ``max_neurons`` (default 100): hard upper limit on the number of neurons.
- ``n_iter`` (default 500): maximum number of training epochs.
- ``metric`` (default ``"euclidean"``): distance metric; ``"cosine"`` is also supported.

Classification
--------------

.. code-block:: python

    from dbgsom import SomClassifier
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    clf = SomClassifier(spreading_factor=0.5, max_neurons=80)
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)

.. code-block:: pycon

    >>> 0.9333...

.. code-block:: python

    clf.predict(X_test)

.. code-block:: pycon

    >>> array([0, 1, 8, ..., 8, 9, 6])

.. code-block:: python

    clf.predict_proba(X_test)  # class probability per sample

Clustering / Vector Quantization
---------------------------------

.. code-block:: python

    from dbgsom import SomVQ
    from sklearn.datasets import load_digits

    X, y = load_digits(return_X_y=True)

    som = SomVQ(spreading_factor=0.5, max_neurons=80)
    labels = som.fit_predict(X)  # fit and assign cluster labels in one step

.. code-block:: python

    som.quantization_error_  # average distance from samples to their prototype
    som.topographic_error_   # fraction of samples with topographic errors
    som.n_iter_              # number of epochs actually used

Transform
---------

Both estimators implement ``transform()``, which represents each sample as a sparse non-negative linear combination of the prototype weight vectors. This yields an ``(n_samples, n_prototypes)`` coefficient matrix useful for downstream tasks.

.. code-block:: python

    coefs = som.transform(X)  # shape (n_samples, n_prototypes)

Reference: Teuvo Kohonen, *Description of Input Patterns by Linear Mixtures of SOM Models*, 2007.

scikit-learn Integration
------------------------

Because both estimators follow the scikit-learn API, they work with standard tools:

.. code-block:: python

    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([
        ("scaler", StandardScaler()),
        ("som", SomVQ(spreading_factor=0.5, max_neurons=80)),
    ])
    pipe.fit(X)
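
For intuition about the ``quantization_error_`` attribute listed under Clustering / Vector Quantization, the sketch below computes an average sample-to-prototype distance with plain NumPy. It is an illustration only, not the library's code: it assumes the Euclidean metric and that each sample is assigned to its nearest prototype, and the names ``quantization_error``, ``X_demo``, and ``prototypes_demo`` are hypothetical. The value reported by ``SomVQ`` may be computed differently.

.. code-block:: python

    import numpy as np


    def quantization_error(X, prototypes):
        """Mean Euclidean distance from each sample to its nearest prototype.

        Hypothetical helper for illustration; not part of dbgsom.
        """
        X = np.asarray(X, dtype=float)
        prototypes = np.asarray(prototypes, dtype=float)
        # Pairwise distances, shape (n_samples, n_prototypes).
        distances = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        # Assumption: each sample is represented by its closest prototype.
        return distances.min(axis=1).mean()


    # Toy data: three 2-D samples and two prototypes.
    X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0]])
    prototypes_demo = np.array([[0.0, 0.0], [4.0, 1.0]])
    error_demo = quantization_error(X_demo, prototypes_demo)

In the toy data the nearest-prototype distances are 0, 1, and 1, so the result is their mean. A lower value indicates that the prototypes sit closer to the samples they represent.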