Morphelia

Morphological single-cell analysis in Python

This is a short introduction to the basic functionality of Morphelia.

[ ]:
%load_ext autoreload
%autoreload 2

import numpy as np
import scanpy as sc
import pandas as pd
import anndata as ad

import morphelia
from morphelia.tools import LoadPlate

Step 1: Load CellProfiler output into an annotated dataset

If the CellProfiler “ExportToSpreadsheet” module was used, the output directory contains one file per mask (a .csv file, or a tab-delimited .txt file as in this example), with one row per segmented object and features as columns. We want to merge the information about nuclei, cells and cytoplasm (obtained from their respective masks) and store the morphological features along with the cell annotations (plate name, well, field, etc.).

AnnData is a popular library for single-cell analysis, typically used for single-cell RNA sequencing. It uses HDF5 as its on-disk data format. More information here: https://anndata.readthedocs.io/en/stable/index.html

[8]:
# path to plate directory
path = '../data/cp_output/'

plate = LoadPlate(path,
                  obj_sfx=".txt",   # --> suffix
                  obj_delimiter="\t",   # --> delimiter of values
                  treat_file="Treatment")   # --> name of treatment file

plate.load()    # --> merge and load to pandas
plate = plate.to_anndata()  # --> convert to anndata
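
As a rough illustration of what such a merge involves, the sketch below joins two toy per-object tables with pandas and then separates annotation columns from feature columns. The toy values, the join keys (ImageNumber, ObjectNumber), the "Nuclei" prefix and the helper add_prefix are assumptions made for illustration; the logic inside LoadPlate may differ.

```python
import pandas as pd

# Toy per-object tables standing in for two CellProfiler exports (hypothetical values).
cells = pd.DataFrame({
    "ImageNumber": [1, 1, 2],
    "ObjectNumber": [1, 2, 1],
    "Metadata_Well": ["E3", "E3", "E4"],
    "AreaShape_Area": [4926.0, 9328.0, 9462.0],
})
nuclei = pd.DataFrame({
    "ImageNumber": [1, 1, 2],
    "ObjectNumber": [1, 2, 1],
    "AreaShape_Area": [820.0, 1500.0, 1310.0],
})

keys = ["ImageNumber", "ObjectNumber"]   # assumed join keys
meta_cols = ["Metadata_Well"]            # assumed annotation columns


def add_prefix(df, prefix, keep):
    """Prefix feature columns with the object name, leaving `keep` untouched."""
    return df.rename(columns={c: f"{prefix}_{c}" for c in df.columns if c not in keep})


# One row per object, with features from both masks side by side.
merged = add_prefix(cells, "Cells", keys + meta_cols).merge(
    add_prefix(nuclei, "Nuclei", keys), on=keys, how="inner"
)

# Split annotations (identifiers, metadata) from morphological features.
annotation_cols = keys + meta_cols
annotations = merged[annotation_cols]
features = merged.drop(columns=annotation_cols)
print(features.columns.tolist())
```

In an AnnData object, a table like annotations would correspond to obs and the values of features to X.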

Let’s inspect our AnnData object. It holds 477 cells and 2961 features. The observations (obs) contain our annotations.

[4]:
plate
[4]:
AnnData object with n_obs × n_vars = 477 × 2961
    obs: 'ImageNumber', 'ObjectNumber', 'Metadata_Col', 'Metadata_Field', 'Metadata_FileLocation', 'Metadata_Frame', 'Metadata_Row', 'Metadata_Series', 'Metadata_Well', 'Cells_AreaShape_BoundingBoxArea', 'Cells_AreaShape_BoundingBoxMaximum_X', 'Cells_AreaShape_BoundingBoxMaximum_Y', 'Cells_AreaShape_BoundingBoxMinimum_X', 'Cells_AreaShape_BoundingBoxMinimum_Y', 'Cells_AreaShape_Center_X', 'Cells_AreaShape_Center_Y', 'Cells_AreaShape_EulerNumber', 'Cells_Children_Cytoplasm_Count', 'Cells_Children_Nuc_1_Count', 'Cells_Children_Nuclei_Count', 'Cells_Children_Primarieswithoutborder_Count', 'Cells_Children_Primarieswithoutborder_Count.1', 'Cells_Location_CenterMassIntensity_X_Actin', 'Cells_Location_CenterMassIntensity_X_Brightfield', 'Cells_Location_CenterMassIntensity_X_DNA', 'Cells_Location_CenterMassIntensity_X_Desmin', 'Cells_Location_CenterMassIntensity_X_PC', 'Cells_Location_CenterMassIntensity_Y_Actin', 'Cells_Location_CenterMassIntensity_Y_Brightfield', 'Cells_Location_CenterMassIntensity_Y_DNA', 'Cells_Location_CenterMassIntensity_Y_Desmin', 'Cells_Location_CenterMassIntensity_Y_PC', 'Cells_Location_CenterMassIntensity_Z_Actin', 'Cells_Location_CenterMassIntensity_Z_Brightfield', 'Cells_Location_CenterMassIntensity_Z_DNA', 'Cells_Location_CenterMassIntensity_Z_Desmin', 'Cells_Location_CenterMassIntensity_Z_PC', 'Cells_Location_Center_X', 'Cells_Location_Center_Y', 'Cells_Location_Center_Z', 'Cells_Location_MaxIntensity_X_Actin', 'Cells_Location_MaxIntensity_X_Brightfield', 'Cells_Location_MaxIntensity_X_DNA', 'Cells_Location_MaxIntensity_X_Desmin', 'Cells_Location_MaxIntensity_X_PC', 'Cells_Location_MaxIntensity_Y_Actin', 'Cells_Location_MaxIntensity_Y_Brightfield', 'Cells_Location_MaxIntensity_Y_DNA', 'Cells_Location_MaxIntensity_Y_Desmin', 'Cells_Location_MaxIntensity_Y_PC', 'Cells_Location_MaxIntensity_Z_Actin', 'Cells_Location_MaxIntensity_Z_Brightfield', 'Cells_Location_MaxIntensity_Z_DNA', 'Cells_Location_MaxIntensity_Z_Desmin', 
'Cells_Location_MaxIntensity_Z_PC', 'Cells_Neighbors_FirstClosestObjectNumber_20', 'Cells_Neighbors_FirstClosestObjectNumber_Adjacent', 'Cells_Neighbors_NumberOfNeighbors_20', 'Cells_Neighbors_NumberOfNeighbors_Adjacent', 'Cells_Neighbors_SecondClosestObjectNumber_20', 'Cells_Neighbors_SecondClosestObjectNumber_Adjacent', 'Cells_Number_Object_Number', 'Cells_Parent_Nuc_10', 'Cells_Parent_Primarieswithoutborder', 'Primarieswithoutborder_AreaShape_BoundingBoxArea', 'Primarieswithoutborder_AreaShape_BoundingBoxMaximum_X', 'Primarieswithoutborder_AreaShape_BoundingBoxMaximum_Y', 'Primarieswithoutborder_AreaShape_BoundingBoxMinimum_X', 'Primarieswithoutborder_AreaShape_BoundingBoxMinimum_Y', 'Primarieswithoutborder_AreaShape_Center_X', 'Primarieswithoutborder_AreaShape_Center_Y', 'Primarieswithoutborder_AreaShape_EulerNumber', 'Primarieswithoutborder_Children_Cells_Count', 'Primarieswithoutborder_Children_Cytoplasm_Count', 'Primarieswithoutborder_Children_Primaries_Count', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_X_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_X_PC', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_PC', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_PC', 'Primarieswithoutborder_Location_Center_X', 
'Primarieswithoutborder_Location_Center_Y', 'Primarieswithoutborder_Location_Center_Z', 'Primarieswithoutborder_Location_MaxIntensity_X_Actin', 'Primarieswithoutborder_Location_MaxIntensity_X_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_X_DNA', 'Primarieswithoutborder_Location_MaxIntensity_X_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_X_PC', 'Primarieswithoutborder_Location_MaxIntensity_Y_Actin', 'Primarieswithoutborder_Location_MaxIntensity_Y_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_Y_DNA', 'Primarieswithoutborder_Location_MaxIntensity_Y_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_Y_PC', 'Primarieswithoutborder_Location_MaxIntensity_Z_Actin', 'Primarieswithoutborder_Location_MaxIntensity_Z_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_Z_DNA', 'Primarieswithoutborder_Location_MaxIntensity_Z_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_Z_PC', 'Primarieswithoutborder_Number_Object_Number', 'Primarieswithoutborder_Parent_Cells', 'Primarieswithoutborder_Parent_Nuc_10', 'Cytoplasm_AreaShape_BoundingBoxArea', 'Cytoplasm_AreaShape_BoundingBoxMaximum_X', 'Cytoplasm_AreaShape_BoundingBoxMaximum_Y', 'Cytoplasm_AreaShape_BoundingBoxMinimum_X', 'Cytoplasm_AreaShape_BoundingBoxMinimum_Y', 'Cytoplasm_AreaShape_Center_X', 'Cytoplasm_AreaShape_Center_Y', 'Cytoplasm_AreaShape_EulerNumber', 'Cytoplasm_Location_CenterMassIntensity_X_Actin', 'Cytoplasm_Location_CenterMassIntensity_X_Brightfield', 'Cytoplasm_Location_CenterMassIntensity_X_DNA', 'Cytoplasm_Location_CenterMassIntensity_X_Desmin', 'Cytoplasm_Location_CenterMassIntensity_X_PC', 'Cytoplasm_Location_CenterMassIntensity_Y_Actin', 'Cytoplasm_Location_CenterMassIntensity_Y_Brightfield', 'Cytoplasm_Location_CenterMassIntensity_Y_DNA', 'Cytoplasm_Location_CenterMassIntensity_Y_Desmin', 'Cytoplasm_Location_CenterMassIntensity_Y_PC', 'Cytoplasm_Location_CenterMassIntensity_Z_Actin', 'Cytoplasm_Location_CenterMassIntensity_Z_Brightfield', 
'Cytoplasm_Location_CenterMassIntensity_Z_DNA', 'Cytoplasm_Location_CenterMassIntensity_Z_Desmin', 'Cytoplasm_Location_CenterMassIntensity_Z_PC', 'Cytoplasm_Location_Center_X', 'Cytoplasm_Location_Center_Y', 'Cytoplasm_Location_MaxIntensity_X_Actin', 'Cytoplasm_Location_MaxIntensity_X_Brightfield', 'Cytoplasm_Location_MaxIntensity_X_DNA', 'Cytoplasm_Location_MaxIntensity_X_Desmin', 'Cytoplasm_Location_MaxIntensity_X_PC', 'Cytoplasm_Location_MaxIntensity_Y_Actin', 'Cytoplasm_Location_MaxIntensity_Y_Brightfield', 'Cytoplasm_Location_MaxIntensity_Y_DNA', 'Cytoplasm_Location_MaxIntensity_Y_Desmin', 'Cytoplasm_Location_MaxIntensity_Y_PC', 'Cytoplasm_Location_MaxIntensity_Z_Actin', 'Cytoplasm_Location_MaxIntensity_Z_Brightfield', 'Cytoplasm_Location_MaxIntensity_Z_DNA', 'Cytoplasm_Location_MaxIntensity_Z_Desmin', 'Cytoplasm_Location_MaxIntensity_Z_PC', 'Cytoplasm_Number_Object_Number', 'Cytoplasm_Parent_Cells', 'Cytoplasm_Parent_Primarieswithoutborder', 'Metadata_Treatment', 'Metadata_Concentration', 'Metadata_Unit'

The annotations are stored as a pandas DataFrame; the feature values are stored as a NumPy array. We can access the annotations and values as follows:

[5]:
# annotations
plate.obs.head()
[5]:
ImageNumber ObjectNumber Metadata_Col Metadata_Field Metadata_FileLocation Metadata_Frame Metadata_Row Metadata_Series Metadata_Well Cells_AreaShape_BoundingBoxArea ... Cytoplasm_Location_MaxIntensity_Z_Brightfield Cytoplasm_Location_MaxIntensity_Z_DNA Cytoplasm_Location_MaxIntensity_Z_Desmin Cytoplasm_Location_MaxIntensity_Z_PC Cytoplasm_Number_Object_Number Cytoplasm_Parent_Cells Cytoplasm_Parent_Primarieswithoutborder Metadata_Treatment Metadata_Concentration Metadata_Unit
0 1 1 3 6 NaN 0 E 0 E3 9044 ... 0.0 0.0 0.0 0.0 1 1 1 Pe 0.001 mg/ml
1 1 2 3 6 NaN 0 E 0 E3 16093 ... 0.0 0.0 0.0 0.0 2 2 2 Pe 0.001 mg/ml
2 1 3 3 6 NaN 0 E 0 E3 15456 ... 0.0 0.0 0.0 0.0 3 3 3 Pe 0.001 mg/ml
3 1 4 3 6 NaN 0 E 0 E3 15232 ... 0.0 0.0 0.0 0.0 4 4 4 Pe 0.001 mg/ml
4 1 5 3 6 NaN 0 E 0 E3 14940 ... 0.0 0.0 0.0 0.0 5 5 5 Pe 0.001 mg/ml

5 rows × 157 columns

[6]:
# values
plate.X
[6]:
array([[4.92600000e+03, 2.03454280e+00, 8.00544560e-01, ...,
        8.68238322e-03, 1.29114211e-01, 6.12741292e-01],
       [9.32800000e+03, 1.90278614e+00, 7.68818259e-01, ...,
        8.51453468e-03, 9.40871313e-02, 6.02670312e-01],
       [9.46200000e+03, 2.28528094e+00, 8.81704628e-01, ...,
        8.34668521e-03, 6.98977634e-02, 5.88120878e-01],
       ...,
       [5.29400000e+03, 1.79038346e+00, 8.93598557e-01, ...,
        1.79789420e-02, 2.43095294e-01, 7.64938593e-01],
       [3.34600000e+03, 1.45021796e+00, 7.79904485e-01, ...,
        1.30769815e-02, 1.53440908e-01, 7.59849727e-01],
       [5.78600000e+03, 2.22282672e+00, 9.32412803e-01, ...,
        1.34393834e-02, 1.65487900e-01, 7.34824896e-01]], dtype=float32)
[7]:
# feature names
plate.var
[7]:
Cells_AreaShape_Area
Cells_AreaShape_Compactness
Cells_AreaShape_Eccentricity
Cells_AreaShape_EquivalentDiameter
Cells_AreaShape_Extent
...
Cytoplasm_Intensity_UpperQuartileIntensity_Actin
Cytoplasm_Intensity_UpperQuartileIntensity_Brightfield
Cytoplasm_Intensity_UpperQuartileIntensity_DNA
Cytoplasm_Intensity_UpperQuartileIntensity_Desmin
Cytoplasm_Intensity_UpperQuartileIntensity_PC

2961 rows × 0 columns

AnnData was built together with Scanpy (short for single-cell analysis in Python), another library we rely on. It contains functions for preprocessing, analysis and plotting. Although many of its functions were originally built for single-cell RNA sequencing, they can also be used for image-based profiling. More information here: https://scanpy.readthedocs.io/en/stable/index.html

Let’s use Scanpy to plot some feature distributions.

[11]:
# list feature names
plate.var_names
[11]:
Index(['Cells_AreaShape_Area', 'Cells_AreaShape_Compactness',
       'Cells_AreaShape_Eccentricity', 'Cells_AreaShape_EquivalentDiameter',
       'Cells_AreaShape_Extent', 'Cells_AreaShape_FormFactor',
       'Cells_AreaShape_MajorAxisLength', 'Cells_AreaShape_MaxFeretDiameter',
       'Cells_AreaShape_MaximumRadius', 'Cells_AreaShape_MeanRadius',
       ...
       'Cytoplasm_Intensity_StdIntensity_Actin',
       'Cytoplasm_Intensity_StdIntensity_Brightfield',
       'Cytoplasm_Intensity_StdIntensity_DNA',
       'Cytoplasm_Intensity_StdIntensity_Desmin',
       'Cytoplasm_Intensity_StdIntensity_PC',
       'Cytoplasm_Intensity_UpperQuartileIntensity_Actin',
       'Cytoplasm_Intensity_UpperQuartileIntensity_Brightfield',
       'Cytoplasm_Intensity_UpperQuartileIntensity_DNA',
       'Cytoplasm_Intensity_UpperQuartileIntensity_Desmin',
       'Cytoplasm_Intensity_UpperQuartileIntensity_PC'],
      dtype='object', length=2961)
[14]:
# violin plot
sc.pl.violin(plate, ['Cells_AreaShape_Area', 'Primarieswithoutborder_AreaShape_Area'],
             jitter=0.4)
../_images/tutorials_morphelia_intro_13_0.png
[18]:
sc.pl.scatter(plate, x='Cells_AreaShape_Area', y='Primarieswithoutborder_AreaShape_Area',
              color='Cells_Intensity_MeanIntensity_Desmin')

../_images/tutorials_morphelia_intro_14_0.png

Finally, the AnnData object can be stored in h5ad format, AnnData's own HDF5-based file format.

[3]:
plate.write('../data/plate_raw.h5ad')
C:\Users\amarx\Anaconda3\envs\morphelia\lib\site-packages\anndata\_core\anndata.py:1228: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'Metadata_Row' as categorical
... storing 'Metadata_Well' as categorical
... storing 'Metadata_Treatment' as categorical
... storing 'Metadata_Unit' as categorical

Step 2: Preprocessing

This includes the following steps:

  • filter dead cells and cells that have NaN values for important features

    • dead cells have a low cell area to nucleus area ratio

  • drop features that contain NaN values

  • drop features that are duplicates of other features

  • drop invariant features

  • normalize the data
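
The three feature-dropping steps in this list can be sketched with plain pandas on a toy table. This is only a minimal illustration under assumed column names and values; it is not the code behind morphelia.pp.drop_nan, drop_duplicates or drop_invariant, which may apply different criteria.

```python
import numpy as np
import pandas as pd

# Toy feature table (hypothetical values): one row per cell, one column per feature.
features = pd.DataFrame({
    "area": [4926.0, 9328.0, 9462.0, 5294.0],
    "area_copy": [4926.0, 9328.0, 9462.0, 5294.0],  # duplicate of "area"
    "euler": [1.0, 1.0, 1.0, 1.0],                  # invariant
    "texture": [0.2, np.nan, 0.4, 0.1],             # contains a NaN value
    "eccentricity": [0.80, 0.77, 0.88, 0.89],
})

# 1. Drop features that contain NaN values.
features = features.dropna(axis=1, how="any")

# 2. Drop features whose values duplicate an earlier feature.
is_duplicate = features.T.duplicated().to_numpy()
features = features.loc[:, ~is_duplicate]

# 3. Drop invariant features (only one unique value).
is_variable = (features.nunique() > 1).to_numpy()
features = features.loc[:, is_variable]

print(features.columns.tolist())
```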

[22]:
# filter dead cells
plate = morphelia.pp.filter_debris(plate, show=True, verbose=True,
                                   max_quot=6)
7 cells filtered
../_images/tutorials_morphelia_intro_18_1.png
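
To make the idea of an area-ratio filter concrete, here is a minimal NumPy sketch that keeps only cells whose cell-to-nucleus area ratio falls inside a band. The function area_ratio_mask and both thresholds are hypothetical; how filter_debris computes its quotient and applies max_quot is not specified in this tutorial.

```python
import numpy as np


def area_ratio_mask(cell_area, nucleus_area, min_ratio=1.5, max_ratio=6.0):
    """Return a boolean mask of cells whose cell/nucleus area ratio lies in
    [min_ratio, max_ratio]. Both bounds are illustrative assumptions."""
    cell_area = np.asarray(cell_area, dtype=float)
    nucleus_area = np.asarray(nucleus_area, dtype=float)
    # Avoid division by zero: objects without a nucleus area get NaN and are rejected.
    ratio = np.divide(
        cell_area,
        nucleus_area,
        out=np.full_like(cell_area, np.nan),
        where=nucleus_area > 0,
    )
    return (ratio >= min_ratio) & (ratio <= max_ratio)


# Three toy cells with ratios of 4.0, about 1.05 and 9.0.
mask = area_ratio_mask([4000, 1000, 9000], [1000, 950, 1000])
print(mask)
```

A mask like this could then be used to subset the rows of an annotated dataset, in the same way as the NaN filter in the next cell.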
[23]:
# variables that should not contain any nan values
not_nan = ['Cells_AreaShape_Area', 'Cytoplasm_AreaShape_Area', 'Primarieswithoutborder_AreaShape_Area',
           'Cytoplasm_AreaShape_FormFactor']

len_before = len(plate)
plate = plate[~np.isnan(plate[:, not_nan].X).any(axis=1), :].copy()
plate = plate[~np.isinf(plate[:, not_nan].X).any(axis=1), :].copy()
print(f"{len_before - len(plate)} cells with nan values dropped")
0 cells with nan values dropped
[24]:
plate = morphelia.pp.drop_nan(plate, verbose=True)
plate = morphelia.pp.drop_duplicates(plate, verbose=True)
plate = morphelia.pp.drop_invariant(plate, verbose=True)
Dropped 317 duplicated features: Index(['Cells_Granularity_10_Actin.1', 'Cells_Granularity_10_Brightfield.1',
       'Cells_Granularity_10_DNA.1', 'Cells_Granularity_10_Desmin.1',
       'Cells_Granularity_10_PC.1', 'Cells_Granularity_11_Actin.1',
       'Cells_Granularity_11_Brightfield.1', 'Cells_Granularity_11_DNA.1',
       'Cells_Granularity_11_Desmin.1', 'Cells_Granularity_11_PC.1',
       ...
       'Primarieswithoutborder_Granularity_9_Desmin.1',
       'Primarieswithoutborder_Granularity_9_PC.1',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_Actin_0_0',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_Brightfield_0_0',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_DNA_0_0',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_DNA_2_0',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_Desmin_0_0',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_PC_0_0',
       'Cytoplasm_Intensity_MinIntensityEdge_DNA',
       'Cytoplasm_Intensity_MinIntensity_DNA'],
      dtype='object', length=317)
Dropped 1 invariant features: Index(['Cells_RadialDistribution_ZernikePhase_Actin_0_0'], dtype='object')
[25]:
plate = morphelia.pp.normalize(plate, method='standard',
                               by=None,
                               verbose=True)
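
Assuming that method='standard' refers to z-score standardization (as in scikit-learn's StandardScaler) and that by=None means all cells are scaled together, the operation can be sketched in NumPy as follows. The helper standardize is illustrative and not part of Morphelia.

```python
import numpy as np


def standardize(X):
    """Scale every feature (column) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by zero
    return (X - mean) / std


# Toy matrix: three cells, two features on very different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = standardize(X)
print(Z)
```

After this step, features measured on different scales (for example areas in pixels and intensities between 0 and 1) contribute comparably to distance-based methods such as PCA or UMAP.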

Step 3: Feature extraction

[26]:
plate = morphelia.ft.drop_noise(plate, verbose=True)

Drop 2349 noisy features: Index(['Primarieswithoutborder_RadialDistribution_ZernikePhase_Brightfield_9_1',
       'Cells_RadialDistribution_ZernikePhase_DNA_8_8',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_Desmin_5_5',
       'Cells_RadialDistribution_ZernikePhase_PC_7_1',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_PC_4_2',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_Actin_1_1',
       'Cells_RadialDistribution_ZernikePhase_PC_5_3',
       'Cells_RadialDistribution_ZernikePhase_PC_8_6',
       'Primarieswithoutborder_RadialDistribution_ZernikePhase_Actin_9_9',
       'Cells_RadialDistribution_ZernikePhase_Desmin_5_3',
       ...
       'Primarieswithoutborder_Texture_SumVariance_PC_10_03_256',
       'Cytoplasm_Intensity_IntegratedIntensity_PC',
       'Primarieswithoutborder_Texture_SumVariance_PC_10_00_256',
       'Primarieswithoutborder_Texture_SumVariance_PC_3_03_256',
       'Primarieswithoutborder_Texture_SumVariance_PC_3_02_256',
       'Primarieswithoutborder_Texture_SumVariance_PC_3_01_256',
       'Cells_Intensity_IntegratedIntensity_PC',
       'Primarieswithoutborder_Texture_SumVariance_PC_3_00_256',
       'Cytoplasm_AreaShape_Area', 'Cells_AreaShape_Area'],
      dtype='object', length=2349)
[27]:
plate = morphelia.ft.drop_near_zero_variance(plate, verbose=True)

Iterating over features: 100%|██████████| 294/294 [00:02<00:00, 102.81it/s]
Drop 19 features with low variance: ['Primarieswithoutborder_Granularity_13_DNA', 'Primarieswithoutborder_Granularity_23_DNA', 'Cells_Granularity_23_Brightfield', 'Primarieswithoutborder_Granularity_28_DNA', 'Cells_Granularity_29_DNA', 'Cells_Granularity_26_PC', 'Primarieswithoutborder_RadialDistribution_ZernikePhase_Brightfield_2_0', 'Cells_RadialDistribution_ZernikePhase_Actin_2_0', 'Primarieswithoutborder_Granularity_26_DNA', 'Cells_Granularity_26_DNA', 'Cells_RadialDistribution_ZernikePhase_Desmin_2_0', 'Cells_Granularity_23_DNA', 'Primarieswithoutborder_Granularity_3_Brightfield', 'Cells_Granularity_28_DNA', 'Cells_RadialDistribution_ZernikePhase_DNA_2_0', 'Primarieswithoutborder_Granularity_23_PC', 'Primarieswithoutborder_Granularity_29_DNA', 'Primarieswithoutborder_Granularity_23_Brightfield', 'Primarieswithoutborder_Granularity_26_PC']
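
One common heuristic for near-zero variance, known from the nearZeroVar function of the R package caret, flags a feature when its most frequent value dominates the second most frequent one and only a small share of its values are unique. The sketch below implements that heuristic with caret's default cut-offs; whether drop_near_zero_variance uses the same criteria or thresholds is an assumption.

```python
import numpy as np


def near_zero_variance(x, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a feature as near-zero variance (caret-style heuristic).

    freq_cut:   minimum ratio of most common to second most common value count
    unique_cut: maximum percentage of unique values among all samples
    """
    x = np.asarray(x)
    values, counts = np.unique(x, return_counts=True)
    if len(values) == 1:
        return True  # a constant feature has zero variance
    counts = np.sort(counts)[::-1]
    freq_ratio = counts[0] / counts[1]
    pct_unique = 100.0 * len(values) / len(x)
    return bool(freq_ratio > freq_cut and pct_unique < unique_cut)


mostly_zero = np.array([0] * 98 + [1, 1])  # 98 zeros, two ones
spread_out = np.arange(100)                # every value is unique
print(near_zero_variance(mostly_zero), near_zero_variance(spread_out))
```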

[28]:
plate = morphelia.ft.drop_outlier(plate, thresh=10, verbose=True)
Drop 24 features with outlier values: Index(['Cells_Texture_AngularSecondMoment_PC_20_01_256',
       'Cells_Texture_AngularSecondMoment_Actin_20_01_256',
       'Cells_Texture_AngularSecondMoment_Desmin_20_01_256',
       'Cells_Texture_AngularSecondMoment_Brightfield_20_01_256',
       'Primarieswithoutborder_Texture_AngularSecondMoment_DNA_10_01_256',
       'Primarieswithoutborder_Texture_AngularSecondMoment_Brightfield_10_01_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_3_03_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_10_00_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_3_02_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_3_01_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_3_00_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_10_02_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_10_01_256',
       'Primarieswithoutborder_Texture_Variance_Desmin_10_03_256',
       'Primarieswithoutborder_Texture_SumVariance_Desmin_10_03_256',
       'Primarieswithoutborder_Texture_Contrast_Desmin_10_00_256',
       'Primarieswithoutborder_Texture_Contrast_Desmin_10_01_256',
       'Primarieswithoutborder_Texture_SumVariance_Desmin_10_02_256',
       'Primarieswithoutborder_Texture_Contrast_Desmin_10_02_256',
       'Primarieswithoutborder_Texture_SumVariance_Desmin_3_03_256',
       'Primarieswithoutborder_Texture_SumVariance_Desmin_3_02_256',
       'Primarieswithoutborder_Texture_SumVariance_Desmin_3_01_256',
       'Primarieswithoutborder_Texture_SumVariance_Desmin_3_00_256',
       'Primarieswithoutborder_Texture_Contrast_Desmin_10_03_256'],
      dtype='object')
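
A simple way to think about such a threshold: once the data are standardized, a feature can be flagged if any cell has an absolute value above the threshold. The sketch below shows this rule in NumPy; it is an assumed reading of thresh, not a description of how drop_outlier is implemented.

```python
import numpy as np


def outlier_features(Z, thresh=10.0):
    """Return a boolean mask over features (columns) of a standardized matrix Z
    that contain at least one value with absolute magnitude above `thresh`."""
    Z = np.asarray(Z, dtype=float)
    return np.abs(Z).max(axis=0) > thresh


# Toy standardized matrix: the second feature contains one extreme value.
Z = np.array([[0.1, -0.5], [0.3, 12.0], [-0.2, 0.4]])
flagged = outlier_features(Z, thresh=10.0)
print(flagged)
```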
[29]:
plate = morphelia.ft.drop_highly_correlated(plate, verbose=True, show=True)
Dropped 178 features: Index(['Primarieswithoutborder_Texture_InfoMeas1_Desmin_3_00_256',
       'Primarieswithoutborder_Texture_InfoMeas1_Desmin_10_02_256',
       'Primarieswithoutborder_Texture_InfoMeas1_Desmin_3_02_256',
       'Primarieswithoutborder_Texture_InfoMeas1_Desmin_3_01_256',
       'Primarieswithoutborder_Texture_InfoMeas1_Desmin_10_01_256',
       'Primarieswithoutborder_Texture_InfoMeas1_Desmin_10_00_256',
       'Cells_Texture_InfoMeas1_Desmin_3_01_256',
       'Cells_Texture_InfoMeas1_Desmin_3_03_256',
       'Primarieswithoutborder_Texture_InfoMeas1_Desmin_3_03_256',
       'Cells_Texture_InfoMeas1_Desmin_10_02_256',
       ...
       'Cells_Texture_SumVariance_Desmin_10_01_256',
       'Cells_Texture_Contrast_Desmin_20_03_256',
       'Cells_Texture_SumVariance_Desmin_10_00_256',
       'Cells_Texture_SumVariance_Desmin_10_02_256',
       'Cells_Intensity_IntegratedIntensity_Actin',
       'Cells_Texture_SumVariance_Desmin_3_03_256',
       'Cells_Texture_SumVariance_Desmin_3_01_256',
       'Cells_Texture_SumVariance_Desmin_3_02_256',
       'Cells_Texture_SumVariance_Desmin_3_00_256',
       'Cells_Intensity_IntegratedIntensity_Brightfield'],
      dtype='object', length=178)
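
Correlation-based pruning is often done by scanning the upper triangle of the absolute correlation matrix and dropping one feature from every pair above a threshold. The following pandas sketch shows that generic approach; the threshold of 0.95, the helper highly_correlated and the choice of which feature of a pair to drop are assumptions and may differ from drop_highly_correlated.

```python
import numpy as np
import pandas as pd


def highly_correlated(features, thresh=0.95):
    """List features whose absolute Pearson correlation with an earlier
    feature (in column order) exceeds `thresh`."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so every pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > thresh).any()]


features = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [3.0, 5.0, 7.0, 9.0, 11.0],   # b = 2 * a + 1, perfectly correlated with a
    "c": [2.0, -1.0, 0.0, -1.0, 2.0],  # uncorrelated with a
})
to_drop = highly_correlated(features)
reduced = features.drop(columns=to_drop)
print(to_drop)
```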
[30]:
plate
[30]:
AnnData object with n_obs × n_vars = 468 × 73
    obs: 'ImageNumber', 'ObjectNumber', 'Metadata_Col', 'Metadata_Field', 'Metadata_FileLocation', 'Metadata_Frame', 'Metadata_Row', 'Metadata_Series', 'Metadata_Well', 'Cells_AreaShape_BoundingBoxArea', 'Cells_AreaShape_BoundingBoxMaximum_X', 'Cells_AreaShape_BoundingBoxMaximum_Y', 'Cells_AreaShape_BoundingBoxMinimum_X', 'Cells_AreaShape_BoundingBoxMinimum_Y', 'Cells_AreaShape_Center_X', 'Cells_AreaShape_Center_Y', 'Cells_AreaShape_EulerNumber', 'Cells_Children_Cytoplasm_Count', 'Cells_Children_Nuc_1_Count', 'Cells_Children_Nuclei_Count', 'Cells_Children_Primarieswithoutborder_Count', 'Cells_Children_Primarieswithoutborder_Count.1', 'Cells_Location_CenterMassIntensity_X_Actin', 'Cells_Location_CenterMassIntensity_X_Brightfield', 'Cells_Location_CenterMassIntensity_X_DNA', 'Cells_Location_CenterMassIntensity_X_Desmin', 'Cells_Location_CenterMassIntensity_X_PC', 'Cells_Location_CenterMassIntensity_Y_Actin', 'Cells_Location_CenterMassIntensity_Y_Brightfield', 'Cells_Location_CenterMassIntensity_Y_DNA', 'Cells_Location_CenterMassIntensity_Y_Desmin', 'Cells_Location_CenterMassIntensity_Y_PC', 'Cells_Location_CenterMassIntensity_Z_Actin', 'Cells_Location_CenterMassIntensity_Z_Brightfield', 'Cells_Location_CenterMassIntensity_Z_DNA', 'Cells_Location_CenterMassIntensity_Z_Desmin', 'Cells_Location_CenterMassIntensity_Z_PC', 'Cells_Location_Center_X', 'Cells_Location_Center_Y', 'Cells_Location_Center_Z', 'Cells_Location_MaxIntensity_X_Actin', 'Cells_Location_MaxIntensity_X_Brightfield', 'Cells_Location_MaxIntensity_X_DNA', 'Cells_Location_MaxIntensity_X_Desmin', 'Cells_Location_MaxIntensity_X_PC', 'Cells_Location_MaxIntensity_Y_Actin', 'Cells_Location_MaxIntensity_Y_Brightfield', 'Cells_Location_MaxIntensity_Y_DNA', 'Cells_Location_MaxIntensity_Y_Desmin', 'Cells_Location_MaxIntensity_Y_PC', 'Cells_Location_MaxIntensity_Z_Actin', 'Cells_Location_MaxIntensity_Z_Brightfield', 'Cells_Location_MaxIntensity_Z_DNA', 'Cells_Location_MaxIntensity_Z_Desmin', 
'Cells_Location_MaxIntensity_Z_PC', 'Cells_Neighbors_FirstClosestObjectNumber_20', 'Cells_Neighbors_FirstClosestObjectNumber_Adjacent', 'Cells_Neighbors_NumberOfNeighbors_20', 'Cells_Neighbors_NumberOfNeighbors_Adjacent', 'Cells_Neighbors_SecondClosestObjectNumber_20', 'Cells_Neighbors_SecondClosestObjectNumber_Adjacent', 'Cells_Number_Object_Number', 'Cells_Parent_Nuc_10', 'Cells_Parent_Primarieswithoutborder', 'Primarieswithoutborder_AreaShape_BoundingBoxArea', 'Primarieswithoutborder_AreaShape_BoundingBoxMaximum_X', 'Primarieswithoutborder_AreaShape_BoundingBoxMaximum_Y', 'Primarieswithoutborder_AreaShape_BoundingBoxMinimum_X', 'Primarieswithoutborder_AreaShape_BoundingBoxMinimum_Y', 'Primarieswithoutborder_AreaShape_Center_X', 'Primarieswithoutborder_AreaShape_Center_Y', 'Primarieswithoutborder_AreaShape_EulerNumber', 'Primarieswithoutborder_Children_Cells_Count', 'Primarieswithoutborder_Children_Cytoplasm_Count', 'Primarieswithoutborder_Children_Primaries_Count', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_X_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_X_PC', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_PC', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_PC', 'Primarieswithoutborder_Location_Center_X', 
'Primarieswithoutborder_Location_Center_Y', 'Primarieswithoutborder_Location_Center_Z', 'Primarieswithoutborder_Location_MaxIntensity_X_Actin', 'Primarieswithoutborder_Location_MaxIntensity_X_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_X_DNA', 'Primarieswithoutborder_Location_MaxIntensity_X_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_X_PC', 'Primarieswithoutborder_Location_MaxIntensity_Y_Actin', 'Primarieswithoutborder_Location_MaxIntensity_Y_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_Y_DNA', 'Primarieswithoutborder_Location_MaxIntensity_Y_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_Y_PC', 'Primarieswithoutborder_Location_MaxIntensity_Z_Actin', 'Primarieswithoutborder_Location_MaxIntensity_Z_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_Z_DNA', 'Primarieswithoutborder_Location_MaxIntensity_Z_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_Z_PC', 'Primarieswithoutborder_Number_Object_Number', 'Primarieswithoutborder_Parent_Cells', 'Primarieswithoutborder_Parent_Nuc_10', 'Cytoplasm_AreaShape_BoundingBoxArea', 'Cytoplasm_AreaShape_BoundingBoxMaximum_X', 'Cytoplasm_AreaShape_BoundingBoxMaximum_Y', 'Cytoplasm_AreaShape_BoundingBoxMinimum_X', 'Cytoplasm_AreaShape_BoundingBoxMinimum_Y', 'Cytoplasm_AreaShape_Center_X', 'Cytoplasm_AreaShape_Center_Y', 'Cytoplasm_AreaShape_EulerNumber', 'Cytoplasm_Location_CenterMassIntensity_X_Actin', 'Cytoplasm_Location_CenterMassIntensity_X_Brightfield', 'Cytoplasm_Location_CenterMassIntensity_X_DNA', 'Cytoplasm_Location_CenterMassIntensity_X_Desmin', 'Cytoplasm_Location_CenterMassIntensity_X_PC', 'Cytoplasm_Location_CenterMassIntensity_Y_Actin', 'Cytoplasm_Location_CenterMassIntensity_Y_Brightfield', 'Cytoplasm_Location_CenterMassIntensity_Y_DNA', 'Cytoplasm_Location_CenterMassIntensity_Y_Desmin', 'Cytoplasm_Location_CenterMassIntensity_Y_PC', 'Cytoplasm_Location_CenterMassIntensity_Z_Actin', 'Cytoplasm_Location_CenterMassIntensity_Z_Brightfield', 
'Cytoplasm_Location_CenterMassIntensity_Z_DNA', 'Cytoplasm_Location_CenterMassIntensity_Z_Desmin', 'Cytoplasm_Location_CenterMassIntensity_Z_PC', 'Cytoplasm_Location_Center_X', 'Cytoplasm_Location_Center_Y', 'Cytoplasm_Location_MaxIntensity_X_Actin', 'Cytoplasm_Location_MaxIntensity_X_Brightfield', 'Cytoplasm_Location_MaxIntensity_X_DNA', 'Cytoplasm_Location_MaxIntensity_X_Desmin', 'Cytoplasm_Location_MaxIntensity_X_PC', 'Cytoplasm_Location_MaxIntensity_Y_Actin', 'Cytoplasm_Location_MaxIntensity_Y_Brightfield', 'Cytoplasm_Location_MaxIntensity_Y_DNA', 'Cytoplasm_Location_MaxIntensity_Y_Desmin', 'Cytoplasm_Location_MaxIntensity_Y_PC', 'Cytoplasm_Location_MaxIntensity_Z_Actin', 'Cytoplasm_Location_MaxIntensity_Z_Brightfield', 'Cytoplasm_Location_MaxIntensity_Z_DNA', 'Cytoplasm_Location_MaxIntensity_Z_Desmin', 'Cytoplasm_Location_MaxIntensity_Z_PC', 'Cytoplasm_Number_Object_Number', 'Cytoplasm_Parent_Cells', 'Cytoplasm_Parent_Primarieswithoutborder', 'Metadata_Treatment', 'Metadata_Concentration', 'Metadata_Unit'
    uns: 'duplicated_feats', 'invariant_feats', 'noisy_feats', 'near_zero_variance_feats', 'outlier_feats', 'highly_correlated'

We reduced the features from 2961 to 73.

Step 4: Downstream Analysis

This includes, for example:

  • Principal Component Analysis

  • Manifold Learning (UMAP, t-SNE)

[31]:
# calculate PCA
sc.tl.pca(plate)

[32]:
# plot the variance ratio
morphelia.pl.pca_variance_ratio(plate)

../_images/tutorials_morphelia_intro_31_0.png

The first 20 components cover ~95% of the variance.
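
A statement like this can be checked numerically from the cumulative explained-variance ratio. With Scanpy, the per-component ratios are stored in plate.uns['pca']['variance_ratio'] after sc.tl.pca. The self-contained sketch below computes the same quantity with a NumPy SVD on a toy matrix; the helper n_components_for is hypothetical.

```python
import numpy as np


def n_components_for(X, target=0.95):
    """Return the smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`, plus the cumulative ratios."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    ratio = s**2 / np.sum(s**2)              # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    n = int(np.searchsorted(cumulative, target) + 1)
    return min(n, len(cumulative)), cumulative


# Toy data: the first feature carries 90% of the total variance, the second 10%.
X = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
n, cumulative = n_components_for(X, target=0.95)
print(n, cumulative)
```

The number found this way is a natural candidate for the n_pcs argument when building the neighborhood graph.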

[33]:
# calculate k-nn graph
sc.pp.neighbors(plate, n_neighbors=8, n_pcs=20)
[34]:
# calculate UMAP
sc.tl.umap(plate)
[35]:
# Plot UMAP
sc.pl.umap(plate, color="Metadata_Treatment")

../_images/tutorials_morphelia_intro_35_0.png

Other

Aggregation

Often, only aggregated profiles, so-called consensus profiles, are used for downstream analysis. Aggregation can be done per well or treatment by mean or median aggregation.
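
Conceptually, median aggregation per well is a group-by operation. The pandas sketch below illustrates this on a toy table; the values are made up, and morphelia.pp.aggregate additionally has to take care of the annotations, which is not shown here.

```python
import pandas as pd

# Toy single-cell features (hypothetical values) and the well of each cell.
features = pd.DataFrame({
    "Cells_AreaShape_Area": [4926.0, 9328.0, 9462.0, 5294.0, 3346.0],
    "Cells_AreaShape_Eccentricity": [0.80, 0.77, 0.88, 0.89, 0.78],
})
wells = pd.Series(["E3", "E3", "E3", "E4", "E4"], name="Metadata_Well")

# One consensus profile per well: the median of every feature over its cells.
consensus = features.groupby(wells).median()
print(consensus)
```

The median is less sensitive to a few extreme cells than the mean, which is one reason it is a frequent choice for consensus profiles.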

[40]:
plate_agg = morphelia.pp.aggregate(plate, by='Metadata_Well')
plate_agg
C:\Users\amarx\Anaconda3\envs\morphelia\lib\site-packages\anndata\_core\anndata.py:120: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
[40]:
AnnData object with n_obs × n_vars = 2 × 73
    obs: 'ImageNumber', 'ObjectNumber', 'Metadata_Col', 'Metadata_Field', 'Metadata_FileLocation', 'Metadata_Frame', 'Metadata_Row', 'Metadata_Series', 'Metadata_Well', 'Cells_AreaShape_BoundingBoxArea', 'Cells_AreaShape_BoundingBoxMaximum_X', 'Cells_AreaShape_BoundingBoxMaximum_Y', 'Cells_AreaShape_BoundingBoxMinimum_X', 'Cells_AreaShape_BoundingBoxMinimum_Y', 'Cells_AreaShape_Center_X', 'Cells_AreaShape_Center_Y', 'Cells_AreaShape_EulerNumber', 'Cells_Children_Cytoplasm_Count', 'Cells_Children_Nuc_1_Count', 'Cells_Children_Nuclei_Count', 'Cells_Children_Primarieswithoutborder_Count', 'Cells_Children_Primarieswithoutborder_Count.1', 'Cells_Location_CenterMassIntensity_X_Actin', 'Cells_Location_CenterMassIntensity_X_Brightfield', 'Cells_Location_CenterMassIntensity_X_DNA', 'Cells_Location_CenterMassIntensity_X_Desmin', 'Cells_Location_CenterMassIntensity_X_PC', 'Cells_Location_CenterMassIntensity_Y_Actin', 'Cells_Location_CenterMassIntensity_Y_Brightfield', 'Cells_Location_CenterMassIntensity_Y_DNA', 'Cells_Location_CenterMassIntensity_Y_Desmin', 'Cells_Location_CenterMassIntensity_Y_PC', 'Cells_Location_CenterMassIntensity_Z_Actin', 'Cells_Location_CenterMassIntensity_Z_Brightfield', 'Cells_Location_CenterMassIntensity_Z_DNA', 'Cells_Location_CenterMassIntensity_Z_Desmin', 'Cells_Location_CenterMassIntensity_Z_PC', 'Cells_Location_Center_X', 'Cells_Location_Center_Y', 'Cells_Location_Center_Z', 'Cells_Location_MaxIntensity_X_Actin', 'Cells_Location_MaxIntensity_X_Brightfield', 'Cells_Location_MaxIntensity_X_DNA', 'Cells_Location_MaxIntensity_X_Desmin', 'Cells_Location_MaxIntensity_X_PC', 'Cells_Location_MaxIntensity_Y_Actin', 'Cells_Location_MaxIntensity_Y_Brightfield', 'Cells_Location_MaxIntensity_Y_DNA', 'Cells_Location_MaxIntensity_Y_Desmin', 'Cells_Location_MaxIntensity_Y_PC', 'Cells_Location_MaxIntensity_Z_Actin', 'Cells_Location_MaxIntensity_Z_Brightfield', 'Cells_Location_MaxIntensity_Z_DNA', 'Cells_Location_MaxIntensity_Z_Desmin', 
'Cells_Location_MaxIntensity_Z_PC', 'Cells_Neighbors_FirstClosestObjectNumber_20', 'Cells_Neighbors_FirstClosestObjectNumber_Adjacent', 'Cells_Neighbors_NumberOfNeighbors_20', 'Cells_Neighbors_NumberOfNeighbors_Adjacent', 'Cells_Neighbors_SecondClosestObjectNumber_20', 'Cells_Neighbors_SecondClosestObjectNumber_Adjacent', 'Cells_Number_Object_Number', 'Cells_Parent_Nuc_10', 'Cells_Parent_Primarieswithoutborder', 'Primarieswithoutborder_AreaShape_BoundingBoxArea', 'Primarieswithoutborder_AreaShape_BoundingBoxMaximum_X', 'Primarieswithoutborder_AreaShape_BoundingBoxMaximum_Y', 'Primarieswithoutborder_AreaShape_BoundingBoxMinimum_X', 'Primarieswithoutborder_AreaShape_BoundingBoxMinimum_Y', 'Primarieswithoutborder_AreaShape_Center_X', 'Primarieswithoutborder_AreaShape_Center_Y', 'Primarieswithoutborder_AreaShape_EulerNumber', 'Primarieswithoutborder_Children_Cells_Count', 'Primarieswithoutborder_Children_Cytoplasm_Count', 'Primarieswithoutborder_Children_Primaries_Count', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_X_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_X_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_X_PC', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_Y_PC', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Actin', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Brightfield', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_DNA', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_Desmin', 'Primarieswithoutborder_Location_CenterMassIntensity_Z_PC', 'Primarieswithoutborder_Location_Center_X', 
'Primarieswithoutborder_Location_Center_Y', 'Primarieswithoutborder_Location_Center_Z', 'Primarieswithoutborder_Location_MaxIntensity_X_Actin', 'Primarieswithoutborder_Location_MaxIntensity_X_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_X_DNA', 'Primarieswithoutborder_Location_MaxIntensity_X_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_X_PC', 'Primarieswithoutborder_Location_MaxIntensity_Y_Actin', 'Primarieswithoutborder_Location_MaxIntensity_Y_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_Y_DNA', 'Primarieswithoutborder_Location_MaxIntensity_Y_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_Y_PC', 'Primarieswithoutborder_Location_MaxIntensity_Z_Actin', 'Primarieswithoutborder_Location_MaxIntensity_Z_Brightfield', 'Primarieswithoutborder_Location_MaxIntensity_Z_DNA', 'Primarieswithoutborder_Location_MaxIntensity_Z_Desmin', 'Primarieswithoutborder_Location_MaxIntensity_Z_PC', 'Primarieswithoutborder_Number_Object_Number', 'Primarieswithoutborder_Parent_Cells', 'Primarieswithoutborder_Parent_Nuc_10', 'Cytoplasm_AreaShape_BoundingBoxArea', 'Cytoplasm_AreaShape_BoundingBoxMaximum_X', 'Cytoplasm_AreaShape_BoundingBoxMaximum_Y', 'Cytoplasm_AreaShape_BoundingBoxMinimum_X', 'Cytoplasm_AreaShape_BoundingBoxMinimum_Y', 'Cytoplasm_AreaShape_Center_X', 'Cytoplasm_AreaShape_Center_Y', 'Cytoplasm_AreaShape_EulerNumber', 'Cytoplasm_Location_CenterMassIntensity_X_Actin', 'Cytoplasm_Location_CenterMassIntensity_X_Brightfield', 'Cytoplasm_Location_CenterMassIntensity_X_DNA', 'Cytoplasm_Location_CenterMassIntensity_X_Desmin', 'Cytoplasm_Location_CenterMassIntensity_X_PC', 'Cytoplasm_Location_CenterMassIntensity_Y_Actin', 'Cytoplasm_Location_CenterMassIntensity_Y_Brightfield', 'Cytoplasm_Location_CenterMassIntensity_Y_DNA', 'Cytoplasm_Location_CenterMassIntensity_Y_Desmin', 'Cytoplasm_Location_CenterMassIntensity_Y_PC', 'Cytoplasm_Location_CenterMassIntensity_Z_Actin', 'Cytoplasm_Location_CenterMassIntensity_Z_Brightfield', 
'Cytoplasm_Location_CenterMassIntensity_Z_DNA', 'Cytoplasm_Location_CenterMassIntensity_Z_Desmin', 'Cytoplasm_Location_CenterMassIntensity_Z_PC', 'Cytoplasm_Location_Center_X', 'Cytoplasm_Location_Center_Y', 'Cytoplasm_Location_MaxIntensity_X_Actin', 'Cytoplasm_Location_MaxIntensity_X_Brightfield', 'Cytoplasm_Location_MaxIntensity_X_DNA', 'Cytoplasm_Location_MaxIntensity_X_Desmin', 'Cytoplasm_Location_MaxIntensity_X_PC', 'Cytoplasm_Location_MaxIntensity_Y_Actin', 'Cytoplasm_Location_MaxIntensity_Y_Brightfield', 'Cytoplasm_Location_MaxIntensity_Y_DNA', 'Cytoplasm_Location_MaxIntensity_Y_Desmin', 'Cytoplasm_Location_MaxIntensity_Y_PC', 'Cytoplasm_Location_MaxIntensity_Z_Actin', 'Cytoplasm_Location_MaxIntensity_Z_Brightfield', 'Cytoplasm_Location_MaxIntensity_Z_DNA', 'Cytoplasm_Location_MaxIntensity_Z_Desmin', 'Cytoplasm_Location_MaxIntensity_Z_PC', 'Cytoplasm_Number_Object_Number', 'Cytoplasm_Parent_Cells', 'Cytoplasm_Parent_Primarieswithoutborder', 'Metadata_Treatment', 'Metadata_Concentration', 'Metadata_Unit', 'Metadata_Cellnumber'
    obsm: 'X_pca', 'X_umap'

Feature clustering

[53]:
feature_tags = {'AreaShape': 1,
                'Intensity': 2,
                'Texture': 3,
                'Granularity': 4}

# assign each feature the class of the first matching tag (0 = no known tag)
feature_classes = []
for feat in plate.var_names:
    for tag, label in feature_tags.items():
        if tag in feat:
            feature_classes.append(label)
            break
    else:
        # no tag matched this feature name
        feature_classes.append(0)
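
The tagging rule above can be tried in isolation. The following is a minimal sketch, not part of Morphelia: the helper `classify_feature` and the example feature names are hypothetical and only mimic CellProfiler's naming style. It assumes the same substring matching as the cell above, where the first tag found (in dictionary order) decides the class.

```python
# Tag -> integer class, as in the cell above; 0 is reserved for "no known tag".
FEATURE_TAGS = {"AreaShape": 1, "Intensity": 2, "Texture": 3, "Granularity": 4}


def classify_feature(name, tags=FEATURE_TAGS):
    """Return the class of the first tag (in dict order) found in `name`, else 0.

    Hypothetical helper for illustration; it mirrors the loop in the tutorial.
    """
    return next((label for tag, label in tags.items() if tag in name), 0)


# Illustrative feature names (assumed, not taken from the plate).
examples = [
    "Cells_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_DNA",
    "Cytoplasm_Texture_Contrast_Actin_3_00",
    "Cells_Granularity_1_Actin",
    "Cells_Neighbors_NumberOfNeighbors_20",
]
classes = [classify_feature(name) for name in examples]
print(classes)  # [1, 2, 3, 4, 0]
```

Because the match is a plain substring test, a name such as `Cells_Location_MaxIntensity_X_Actin` would also fall into the `Intensity` class. If that is not intended, matching on the second underscore-separated token of the name would be a stricter alternative.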

[63]:
# build a transposed AnnData object: features become observations, cells become variables
obs = pd.DataFrame({'name': plate.var_names,
                    'feature_class': feature_classes})
obs['feature_class'] = obs['feature_class'].astype('category')
var = pd.DataFrame(index=np.arange(len(plate)))

feat_data = ad.AnnData(X=plate.X.T, obs=obs, var=var)
feat_data
[63]:
AnnData object with n_obs × n_vars = 73 × 468
    obs: 'name', 'feature_class'
[64]:
# calculate PCA
sc.tl.pca(feat_data)

[65]:
# plot the variance ratio
morphelia.pl.pca_variance_ratio(feat_data)

../_images/tutorials_morphelia_intro_42_0.png

The first 20 components explain ~95% of the variance, so the neighborhood graph below is built on 20 principal components.
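
Instead of reading the number of components off the plot, it can be derived from the cumulative variance ratio. The sketch below is an illustration under assumptions: the helper `n_components_for` is hypothetical, and the ratios are made-up numbers, not the values from this dataset. In the notebook, the per-component ratios should be available as `feat_data.uns['pca']['variance_ratio']` after `sc.tl.pca`.

```python
from itertools import accumulate


def n_components_for(variance_ratio, threshold=0.95):
    """Smallest number of leading components whose cumulative ratio reaches `threshold`.

    Hypothetical helper; returns None if the threshold is never reached.
    """
    for n, total in enumerate(accumulate(variance_ratio), start=1):
        if total >= threshold:
            return n
    return None


# Illustrative ratios only (assumed values, sorted from largest to smallest).
ratios = [0.50, 0.25, 0.12, 0.06, 0.04, 0.02, 0.01]
print(n_components_for(ratios, threshold=0.95))  # 5
```

The returned value could then be passed as `n_pcs` to `sc.pp.neighbors`.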

[66]:
# calculate k-nn graph
sc.pp.neighbors(feat_data, n_neighbors=8, n_pcs=20)
[67]:
# calculate UMAP
sc.tl.umap(feat_data)
[68]:
# Plot UMAP
sc.pl.umap(feat_data, color="feature_class")
../_images/tutorials_morphelia_intro_46_0.png