iCLUTO

About iCLUTO

iCLUTO stands for improved CLUstering TOolkit. It provides various functions for analyzing data from break junction experiments. It clusters conductance traces, each represented as a vector.

iCLUTO uses both main machine learning approaches:

  • unsupervised machine learning
  • supervised machine learning

Supervised learning requires a labeled dataset, which is usually very hard to obtain. An unsupervised approach, on the other hand, requires only the conductance traces. Both approaches are described in more detail in Algorithms.
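To illustrate why the unsupervised approach needs only the traces themselves, here is a toy k-means sketch in plain Python. It is not iCLUTO's implementation: the function name kmeans, the synthetic traces, and the deterministic initialization are all assumptions made for this example.

```python
import math


def kmeans(vectors, k, n_iter=20):
    """Toy k-means: return one cluster label per input vector."""
    # Deterministic initialization for the sake of the example:
    # the first k vectors serve as the starting centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Synthetic "traces" (made-up numbers): two keep a plateau, two decay quickly.
traces = [
    [1.0, 0.9, 0.8, 0.8],
    [1.0, 0.2, 0.1, 0.0],
    [1.0, 0.8, 0.9, 0.7],
    [1.0, 0.1, 0.0, 0.0],
]
labels = kmeans(traces, k=2)
print(labels)
```

No labels are supplied anywhere in the input; the grouping comes from the distances between the vectors alone.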

Installing and Running iCLUTO

As of now, you can use icluto cluster to run unsupervised clustering.

How to install

Fedora

On Fedora Workstation, install the "Development Tools" group using dnf:

sudo dnf group install development-tools

This group does not include g++ and python3-devel, so another step is required:

sudo dnf install g++ libstdc++-devel python3-devel

For users

iCLUTO runs on Python 3.9 or newer.
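Before installing, it can be useful to confirm that the interpreter meets this requirement. The following is a minimal sketch; the helper name meets_minimum is illustrative and not part of iCLUTO.

```python
import sys

MINIMUM = (3, 9)  # minimum Python version stated above


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not meets_minimum():
    print("This interpreter is older than Python 3.9.")
```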

Download the .whl package HERE.

Create a virtual environment named venv:

python3 -m venv venv

Activate it:

source venv/bin/activate

Install the package using pip:

pip3 install PATH/TO/icluto-*.whl

You can verify that iCLUTO was installed successfully with:

icluto --help
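The same check can be scripted, for example in a setup or CI step. This sketch assumes that icluto --help exits with status 0 when the installation works, which is the usual convention but is not stated in this README; the function name cli_responds is hypothetical.

```python
import shutil
import subprocess


def cli_responds(command="icluto"):
    """Return True if `command --help` can be run and exits with status 0."""
    # shutil.which returns None when the command is not on PATH,
    # e.g. when the virtual environment is not activated.
    if shutil.which(command) is None:
        return False
    result = subprocess.run([command, "--help"], capture_output=True, text=True)
    return result.returncode == 0


print("icluto available:", cli_responds())
```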

Tested on:

  • Fedora, Python 3.13.9
  • Ubuntu, Python 3.12.3

iCLUTO installation for developers

We use Poetry for dependency and package management. Get Poetry by running

curl -sSL https://install.python-poetry.org | python3 -

Get iCLUTO from GitLab

git clone https://gitlab.fel.cvut.cz/klimtoli/icluto-cli.git

Go to iCLUTO's directory

cd icluto-cli

and run Poetry install

poetry install

How to run

Make sure your virtual environment is activated.

Run unsupervised clustering

Run with

icluto cluster -cfg config_file.yaml [--no-plot]

Use --no-plot to disable plotting of histograms.
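When clustering is launched from another Python script (for instance, to process several config files in a row), the command line above can be assembled programmatically. This is a sketch that uses only the flags shown above; the helper names build_cluster_command and run_clustering are hypothetical and not part of iCLUTO.

```python
import subprocess


def build_cluster_command(config_path, plot=True):
    """Build the argument list for `icluto cluster`."""
    command = ["icluto", "cluster", "-cfg", str(config_path)]
    if not plot:
        # Disable plotting of histograms.
        command.append("--no-plot")
    return command


def run_clustering(config_path, plot=True):
    """Run the command; check=True raises CalledProcessError on a nonzero exit."""
    return subprocess.run(build_cluster_command(config_path, plot), check=True)


print(build_cluster_command("config_file.yaml", plot=False))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.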

An example config file in YAML:

# Comments begin with #; everything after # is ignored
# Example on how a config file is structured.
traces:
  # If loading .txt files, one can list them using dashes
  # - /Users/oliverklimt/uni/BJM/data_raw/2204IVC.txt
  # - /Users/oliverklimt/uni/BJM/data_raw/2204IVC4.txt
  # Files .npy, .bin can be loaded directly
  - /Users/oliverklimt/uni/BJM/icluto-cli/experiment/new_iter/2204_tmp_filtered.npy
output folder: ./out

# Performs tunneling current analysis
analyze: True

# Saving only labels can save space
save:
  traces: False
  labels: True
  filtered out traces: True
  good traces: True
  format: "npy" # "bin", "txt" or "npy"

plot: True
plots:
  size x: 9 # in inches
  size y: 9 # in inches
  unify after limit: True # Conductance values after LIMIT (typically 1e-6) are replaced by the limit value.
  histograms:
    number of bins x: 30
    number of bins y: 30

features:
  - type: histogram
    histogram vector length: 350
    PCA dim: 32
  - type: traces
    max length: "auto"
    PCA dim: 64

k-means:
  run: True
  k-min: 2
  k-max: 5
  n-init: 10 # number of initializations of the k-means algorithm

BIRCH:
  run: False
  # TODO: BIRCH params

dbscan:
  run: True
  sweep epsilon start: 0.01
  sweep epsilon stop: 2
  sweep number of points: 3 # has to be an integer not float!
  min cluster size: 35

hdbscan:
  run: False
  min cluster size: 40
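To make the sweep settings in the example concrete, the sketch below expands them into the values that would be tried. It rests on two assumptions that this README does not confirm: that k-min and k-max are both inclusive, and that the dbscan epsilon sweep is linearly spaced with both endpoints included. The function names are illustrative only.

```python
def k_values(k_min, k_max):
    """Cluster counts to try, assuming both bounds are inclusive."""
    return list(range(k_min, k_max + 1))


def epsilon_values(start, stop, num_points):
    """Epsilon values to try, assuming linear spacing with both endpoints."""
    # The config comment requires an integer here; bool is rejected too
    # because it is a subclass of int in Python.
    if isinstance(num_points, bool) or not isinstance(num_points, int):
        raise TypeError("sweep number of points must be an integer")
    if num_points < 1:
        raise ValueError("sweep number of points must be at least 1")
    if num_points == 1:
        return [start]
    step = (stop - start) / (num_points - 1)
    return [start + i * step for i in range(num_points)]


# Values taken from the example config above.
print(k_values(2, 5))
print(epsilon_values(0.01, 2, 3))  # roughly 0.01, 1.005 and 2.0
```

With only three points between 0.01 and 2, the sweep is coarse; raising sweep number of points gives a finer search at the cost of more dbscan runs.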