
CellViT

A Vision Transformer-based model for nuclei segmentation and classification

CellViT is a Vision Transformer-based deep learning method for automated instance segmentation of cell nuclei in digitized tissue samples. The project combines pre-trained Vision Transformer encoders with a U-Net architecture and achieves leading performance on the PanNuke dataset. By introducing a weighted sampling strategy, CellViT improves the recognition of challenging cell instances. It can rapidly process gigapixel whole-slide images, integrates with software such as QuPath, and provides localizable deep features for downstream analysis.




CellViT: Vision Transformers for Precise Cell Segmentation and Classification


Update 08.08.2023:

:bangbang: We fixed a severe training bug and uploaded new checkpoints. Please make sure to pull all changes and redownload your CellViT checkpoints to get the best results :bangbang:

:ballot_box_with_check: Improved reproducibility by providing config and log files for the best models (CellViT-SAM-H and CellViT-256) and adapted the PanNuke inference script for easier evaluation

:ballot_box_with_check: Inference speed improved by 100x for postprocessing; added new preprocessing with CuCIM speedup

:ballot_box_with_check: Fixed a bug in postprocessing that could insert duplicate cells during cell detection

:ballot_box_with_check: Added batch-size and mixed-precision options to the inference CLI to support GPUs with limited RAM

:ballot_box_with_check: Extended configuration and added sweep configuration


Hörst, F., Rempe, M., Heine, L., Seibold, C., Keyl, J., Baldini, G., Ugurel, S., Siveke, J., Grünwald, B., Egger, J., & Kleesiek, J. (2023). CellViT: Vision Transformers for precise cell segmentation and classification. https://doi.org/10.48550/ARXIV.2306.15350

This repository contains the code implementation of CellViT, a deep learning-based method for automated instance segmentation of cell nuclei in digitized tissue samples. CellViT utilizes a Vision Transformer architecture and achieves state-of-the-art performance on the PanNuke dataset, a challenging nuclei instance segmentation benchmark.

If you use anything from this repository, please cite the original publication given above.

Key Features

  • State-of-the-Art Performance: CellViT outperforms existing methods for nuclei instance segmentation by a substantial margin, delivering superior results on the PanNuke dataset:
    • Mean panoptic quality: 0.51
    • F1-detection score: 0.83
  • Vision Transformer Encoder: The project incorporates pre-trained Vision Transformer (ViT) encoders, which are known for their effectiveness in various computer vision tasks. This choice enhances the segmentation performance of CellViT.
  • U-Net Architecture: CellViT adopts a U-Net-shaped encoder-decoder network structure, allowing for efficient and accurate nuclei instance segmentation. The network architecture facilitates both high-level and low-level feature extraction for improved segmentation results.
  • Weighted Sampling Strategy: To enhance the performance of CellViT, a novel weighted sampling strategy is introduced. This strategy improves the representation of challenging nuclei instances, leading to more accurate segmentation results (a minimal sampling sketch follows this list).
  • Fast Inference on Gigapixel WSI: The framework provides fast inference results by utilizing a large inference patch size of $1024 \times 1024$ pixels, in contrast to the conventional $256$-pixel-sized patches. This approach enables efficient analysis of gigapixel whole-slide images (WSI) and generates localizable deep features that hold potential value for downstream tasks. We provide a fast inference pipeline with connections to current viewing software such as QuPath.
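
To illustrate the weighted sampling idea from the feature list, here is a minimal PyTorch sketch. The dataset and the weights are placeholders, not the repository's actual sampling code; in practice the weights would reflect how challenging a patch's nuclei instances are.

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in dataset: 100 patches of 3 x 256 x 256 pixels.
patches = TensorDataset(torch.randn(100, 3, 256, 256))

# Assumption for illustration: one weight per patch, higher for patches with
# hard-to-segment nuclei, so those patches are drawn more often per epoch.
weights = torch.rand(100)

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(patches, batch_size=8, sampler=sampler)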

Visualization

Example segmentation result (figure).

Installation

  1. Clone the repository: git clone https://github.com/TIO-IKIM/CellViT.git

  2. Create a conda environment with Python 3.9.7 and install the conda requirements: conda env create -f environment.yml. You can change the environment name by editing the name tag in the environment.yml file. This step is necessary, as we need to install OpenSlide with binary files, which is easier with conda. Otherwise, an installation from source needs to be performed and packages installed with pip.

  3. Activate environment: conda activate cellvit_env

  4. Install torch (>=2.0) for your system, as described in the PyTorch installation instructions. The preferred version is 2.0; see optional_dependencies for help. You can find all versions here: https://pytorch.org/get-started/previous-versions/

  5. Install the optional dependencies with pip install -r optional_dependencies.txt to get a speedup with NVIDIA Clara and CuCIM for preprocessing during inference. Please select the packages matching your CUDA version; help for installing cucim can be found online. Note: if you get the error "cannot import name CuImage from cucim", install cucim from conda to get all binary files. First remove the previous dependency with pip uninstall cupy-cuda117, then reinstall with conda install -c rapidsai cucim inside your conda environment. This process is time-consuming, so be patient, and follow the official cucim guideline.
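
After installation, the environment can be sanity-checked with a short Python snippet (a minimal sketch; the CuImage import mirrors the error case described in step 5):

import torch

print(torch.__version__)          # should be >= 2.0
print(torch.cuda.is_available())  # True on a CUDA-enabled machine

try:
    from cucim import CuImage     # optional CuCIM speedup for preprocessing
    print("CuCIM is available")
except ImportError as err:
    print(f"CuCIM not available: {err}")  # see the conda-based fix in step 5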

FAQ: Environment problems

ResolvePackageNotFound: -gcc

  • Fix: Comment out the gcc package in the environment.yml file

ResolvePackageNotFound: -libtiff==4.5.0=h6adf6a1_2, -openslide==3.4.1=h7773abc_6

  • Fix: Remove the version hash from the environment.yml file, such that:
    ...
    dependencies:
      ...
      - libtiff=4.5.0
      - openslide=3.4.1
      - pip:
        ...

Pydantic Validation Errors for the CLI

Please install the specified pydantic version (pydantic==1.10.4); otherwise validation errors could occur in the CLI.
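
For example, to install the pinned version into the active environment:

pip install pydantic==1.10.4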

Usage

Project Structure

We are currently using the following folder structure:

├── base_ml               # Basic Machine Learning Code: CLI, Trainer, Experiment, ...
├── cell_segmentation     # Cell Segmentation training and inference files
│   ├── datasets          # Datasets (PyTorch)
│   ├── experiments       # Specific Experiment Code for different experiments
│   ├── inference         # Inference code for experiment statistics and plots
│   ├── trainer           # Trainer functions to train networks
│   ├── utils             # Utils code
│   └── run_xxx.py        # Run file to start an experiment
├── configs               # Config files
│   ├── examples          # Example config files with explanations
│   └── python            # Python configuration file for global Python settings
├── datamodel             # Datamodels of WSI, Patients etc. (not ML specific)
├── docs                  # Documentation files (in addition to this main README.md)
├── models                # Machine Learning Models (PyTorch implementations)
│   ├── encoders          # Encoder networks (see ML structure below)
│   ├── pretrained        # Checkpoint of important pretrained models (needs to be downloaded from Google drive)
│   └── segmentation      # CellViT Code
├── preprocessing         # Preprocessing code
│   └── patch_extraction  # Code to extract patches from WSI

Training

The CLI for an ML experiment to train the CellViT network is as follows (here the run_cellvit.py script is used):

usage: run_cellvit.py [-h] --config CONFIG [--gpu GPU] [--sweep | --agent AGENT | --checkpoint CHECKPOINT]

Start an experiment with given configuration file.

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU             Cuda-GPU ID (default: None)
  --sweep               Starting a sweep. For this the configuration file must be structured according to WandB sweeping. Compare
                        https://docs.wandb.ai/guides/sweeps and https://community.wandb.ai/t/nested-sweep-configuration/3369/3 for further
                        information. This parameter cannot be set in the config file! (default: False)
  --agent AGENT         Add a new agent to the sweep. Please pass the sweep ID as argument in the way entity/project/sweep_id, e.g.,
                        user1/test_project/v4hwbijh. The agent configuration can be found in the WandB dashboard for the running sweep in
                        the sweep overview tab under launch agent. Just paste the entity/project/sweep_id given there. The provided config
                        file must be a sweep config file. This parameter cannot be set in the config file! (default: None)
  --checkpoint CHECKPOINT
                        Path to a PyTorch checkpoint file. The file is loaded and continued to train with the provided settings. If this is
                        passed, no sweeps are possible. This parameter cannot be set in the config file! (default: None)

required named arguments:
  --config CONFIG       Path to a config file (default: None)

The important file is the configuration file, in which all paths are set, the model configuration is given, and the hyperparameters or sweeps are defined. For each specific run file, there exists an example file in the ./configs/examples/cell_segmentation folder with the same naming, as well as a configuration file that explains how to run WandB sweeps for hyperparameter search. All metrics defined in your trainer are logged to WandB. The WandB configuration needs to be set up in the configuration file, but logging can also be turned off by the user.

An example config file with explanations is provided in the ./configs/examples/cell_segmentation folder. For sweeps, we provide the sweep example file train_cellvit_sweep.yaml.
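
A training run could then be started like this (an illustrative invocation; the exact config filename is an assumption based on the naming convention above):

python run_cellvit.py --config ./configs/examples/cell_segmentation/train_cellvit.yaml --gpu 0

To resume a crashed run, pass a checkpoint along with the config:

python run_cellvit.py --config ./configs/examples/cell_segmentation/train_cellvit.yaml --checkpoint path/to/latest_checkpoint.pth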

Pre-trained ViT models for training initialization can be downloaded from Google Drive: ViT-Models. Please check out the corresponding licenses before distribution and further usage! Note: We used only the teacher models for ViT-256.

:exclamation: If your training crashes at some point, you can continue training from a checkpoint via the --checkpoint option (see the example invocation above).

Dataset preparation

We use a customized dataset structure for the PanNuke and the MoNuSeg dataset. The dataset structures are explained in pannuke.md and monuseg.md documentation files. We also provide preparation scripts in the cell_segmentation/datasets/ folder.

Evaluation

In our paper, we did not (!) use early stopping; rather, we trained all models for 130 epochs to eliminate selection bias while still having the largest possible database for training. Therefore, evaluation needs to be performed with the latest_checkpoint.pth model and not with the best early-stopping model. We provide two scripts to create evaluation results: inference_cellvit_experiment.py for PanNuke and inference_cellvit_monuseg.py for MoNuSeg.

:exclamation: We recently adapted the evaluation code and added a tag to the config files to select which checkpoint needs to be used. Please make sure to use the right checkpoint and select the appropriate dataset magnification.

Inference

Model checkpoints can be downloaded here:

License: Apache 2.0 with Commons Clause

The provided checkpoints have been trained on 90% of the data from all folds with the settings described in the publication.

Steps

The following steps are necessary:

  1. Prepare WSI with our preprocessing pipeline
  2. Run inference with the inference/cell_detection.py script

Results are stored at the preprocessing locations.

1. Preprocessing

In our pre-processing pipeline, we are able to extract square patches from detected tissue areas, load annotation files (.json), and apply color normalizations. We make use of the popular OpenSlide library, but extended it with the RAPIDS cuCIM framework for an 8x speedup in patch extraction. The documentation for the preprocessing can be found here.

Preprocessing is necessary to extract patches for our inference pipeline. We use square patches of size 1024 pixels with an overlap of 64 px.

Please make sure that you select the following properties for our CellViT inference:

Parameter       Value
patch_size      1024
patch_overlap   6.25
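
With patch_size = 1024, the 6.25% overlap amounts to 64 px, so consecutive patch origins are 960 px apart. A minimal sketch of the resulting patch grid (the border handling is an assumption for illustration, not the pipeline's exact logic):

patch_size = 1024
overlap_px = int(patch_size * 6.25 / 100)  # 64 px, as stated above
stride = patch_size - overlap_px           # 960 px between patch origins

def patch_origins(width: int, height: int):
    # Yield top-left (x, y) coordinates of patches covering the slide.
    for y in range(0, max(height - patch_size, 0) + 1, stride):
        for x in range(0, max(width - patch_size, 0) + 1, stride):
            yield x, y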

Resulting Dataset Structure

The aim of pre-processing is to create one dataset per WSI in the following structure:

WSI_Name
├── annotation_masks      # thumbnails of extracted annotation masks
│   ├── all_overlaid.png  # all with same dimension as the thumbnail
│   ├── tumor.png
│   └── ...  
├── context               # context patches, if extracted
│   ├── 2                 # subfolder for each scale
│   │   ├── WSI_Name_row1_col1_context_2.png
│   │   ├── WSI_Name_row2_col1_context_2.png
│   │   └── ...
│   └── 4
│       ├── WSI_Name_row1_col1_context_4.png
│       ├── WSI_Name_row2_col1_context_4.png
│       └── ...
├── masks                 # Mask (numpy) files for each patch -> optional folder for segmentation
│   ├── WSI_Name_row1_col1.npy
│   ├── WSI_Name_row2_col1.npy
│   └── ...
├── metadata              # Metadata files for each patch
│   ├── WSI_Name_row1_col1.yaml
│   ├── WSI_Name_row2_col1.yaml
│   └── ...
├── patches               # Patches as .png files
│   ├── WSI_Name_row1_col1.png
│   ├── WSI_Name_row2_col1.png
│   └── ...
├── thumbnails            # Different kind of thumbnails
│   ├── thumbnail_mpp_5.png
│   ├── thumbnail_downsample_32.png
│   └── ...
├── tissue_masks          # Tissue mask images for checking
│   ├── mask.png          # all with same dimension as the thumbnail
│   ├── mask_nogrid.png
│   └── tissue_grid.png
├── mask.png              # tissue mask with green grid  
├── metadata.yaml         # WSI metadata for patch extraction
├── patch_metadata.json   # Patch metadata of WSI merged in one file
└── thumbnail.png         # WSI thumbnail

The cell detection and segmentation results are stored in a newly created cell_detection folder.
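
A preprocessed WSI folder can then be consumed patch by patch, for example like this (a hypothetical helper matching the structure above, not part of the repository; requires Pillow and PyYAML):

from pathlib import Path

import yaml            # PyYAML
from PIL import Image  # Pillow

def iter_patches(wsi_dir: str):
    # Pair each patch image with its per-patch metadata file.
    wsi_path = Path(wsi_dir)
    for patch_file in sorted((wsi_path / "patches").glob("*.png")):
        meta_file = wsi_path / "metadata" / f"{patch_file.stem}.yaml"
        with open(meta_file) as fh:
            metadata = yaml.safe_load(fh)
        yield Image.open(patch_file), metadata

for patch, meta in iter_patches("WSI_Name"):
    ...  # run inference or analysis per patch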
