awesome-single-cell
List of software packages (and the people developing these methods) for single-cell data analysis, including RNA-seq, ATAC-seq, etc. Contributions welcome...
Citation
Contents
- awesome-single-cell
- Citation
- Contents
- Software packages
- RNA-seq
- Quality control
- Gene regulatory network identification
- Immune receptor profiling
- Marker and differential gene expression identification
- Cell clustering
- Dimension reduction
- Archetypal analysis
- Count modelling and normalization
- Batch-effect removal
- Cell projection and unimodal integration
- Simulation
- Pseudotime and trajectory inference
- Cell type identification and classification
- Doublet Identification
- Cell subsampling
- Feature (Gene) imputation
- Copy number analysis
- Variant calling
- Epigenomics
- Multi-assay data integration
- Rare cell detection
- Cellular interactions
- Other applications
- Spatial transcriptomics
- Tutorials and workflows
- Web portals, apps, and databases
- Journal articles of general interest
- Similar lists and collections
- People
Software packages
RNA-seq
- alevin-fry - [Rust] - 🐟 Rapid, accurate and memory-frugal preprocessing of single-cell and single-nucleus RNA-seq data.
- anchor - [Python] - ⚓ Find bimodal, unimodal, and multimodal features in your data
- ascend - [R] - ascend is an R package composed of fast, streamlined analysis functions optimized to address the statistical challenges of single-cell RNA-seq. The package incorporates novel and established methods to provide a flexible framework to perform filtering, quality control, normalization, dimension reduction, clustering and differential expression, and to generate a wide range of plots.
- bigSCale - [matlab] - An analytical framework for big-scale single cell data.
- bonvoyage - [Python] - 📐 Transform percentage-based units into a 2d space to evaluate changes in distribution with both magnitude and direction.
- bustools - [C++] - A suite of tools for manipulating BUS files for single cell RNA-Seq pre-processing. bustools can be used to error correct barcodes, collapse UMIs, produce gene count or transcript compatibility count matrices, and is useful for many other tasks.
- ccRemover - [R] - Removes the Cell-Cycle Effect from Single-Cell RNA-Sequencing Data. Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data.
- celda - [R] - A suite of Bayesian hierarchical models and supporting functions to perform clustering of cells and genes for count data generated by scRNA-seq. Celda: a Bayesian model to perform co-clustering of genes into modules and cells into subpopulations using single-cell RNA-seq data. The package also includes DecontX.
- Cell_BLAST - [Python] - A BLAST-like toolkit for scRNA-seq data querying and automated annotation.
- CellCNN - [Python] - Representation Learning for detection of phenotype-associated cell subsets
- CellRanger - [Linux Binary] - Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis. Software requires registration with 10xgenomics.
- cellTree - [R] - Cell population analysis and visualization from single cell RNA-seq data using a Latent Dirichlet Allocation model.
- clusterExperiment - [R] - Functions for running and comparing many different clusterings of single-cell sequencing data. Meant to work with SCONE and slingshot.
- Clustergrammer - [Python, JavaScript] - Interactive web-based heatmap for visualizing and analyzing high-dimensional biological data, including single-cell RNA-seq. Clustergrammer can be used within a Jupyter notebook as an interactive widget that can be shared using GitHub and NBviewer; see example notebook.
- Clustergrammer2 - [Python, JavaScript] - Interactive WebGL web-based heatmap for visualizing and analyzing single-cell high-dimensional and location-based biological data. Clustergrammer2 can be used within a Jupyter notebook as an interactive widget that can be shared using GitHub and NBviewer; see case studies.
- CountClust - [R] - Functions for fitting Grade-of-Membership models, also known as "Topic models", to RNA-seq counts. These models generalize clustering methods to allow that each cell may belong to more than one cluster/topic.
- countsimQC - [R] - Compare characteristics of one or more synthetic (e.g., RNA-seq) count matrices to a real count matrix, possibly the one based on which the synthetic data sets were generated.
- cyclum - [python] - Cyclum is a novel AutoEncoder approach that characterizes circular trajectories in the high-dimensional gene expression space. Applying Cyclum to remove cell-cycle effects leads to substantially improved delineations of cell subpopulations, which is useful for establishing various cell atlases and studying tumor heterogeneity. bioRxiv
- CytoGuide - [C++,D3] - CyteGuide: Visual Guidance for Hierarchical Single-Cell Analysis
- DecontX - [R] - DecontX is a Bayesian method to automatically estimate and remove read contamination in individual cells from scRNA-seq experiments even without learning any information from empty cell barcodes (identified by cell calling for droplet-based methods). Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Included in package celda.
- DESCEND - [R] - DESCEND deconvolves the true gene expression distribution across cells for UMI scRNA-seq counts. It provides estimates of several distribution-based statistics: five distribution measurements and the coefficients of covariates such as batches or cell size.
- DeLorean - [R] - Bayesian pseudotime estimation algorithm that uses Gaussian processes to model gene expression profiles and provides a full posterior for the pseudotimes.
- dittoSeq - [R] - Bioconductor package offering user-friendly visualization tools for single-cell and bulk RNA sequencing. Color-blindness friendly by default; novice-coder friendly; highly customizable and powerful enough to build publication-ready figures; universal in that it works directly with Seurat, SingleCellExperiment, and SummarizedExperiment objects and can import edgeR DGELists.
- dropkick - [Python] - Automated cell filtering for single-cell RNA sequencing data.
- dynamo - [Python] - Inclusive model of expression dynamics with scSLAM-seq and multiomics, vector field reconstruction and potential landscape mapping.
- embeddr - [R] - Embeddr creates a reduced-dimensional representation of the gene space using a high-variance gene correlation graph and Laplacian eigenmaps. It then fits a smooth pseudotime trajectory using principal curves.
- Falco - [AWS cloud] - Falco: A quick and flexible single-cell RNA-seq processing framework on the cloud.
- FastProject - [Python] - Signature analysis on low-dimensional projections of single-cell expression data.
- flotilla - [Python] - Reproducible machine learning analysis of gene expression and alternative splicing data
- GPfates - [Python] - Model transcriptional cell fates as mixtures of Gaussian Processes
- GSEApy - [Python] - GSEApy: Gene Set Enrichment Analysis in Python. GSEApy is a Python/Rust implementation of GSEA and a wrapper for Enrichr. GSEApy can be used for RNA-seq, ChIP-seq and microarray data. It can be used for convenient GO enrichment and to produce publication-quality figures in Python.
- HocusPocus - [R] - Basic PCA-based workflow for analysis and plotting of single cell RNA-seq data.
- HTSeq - [Python] - A Python library to facilitate programmatic analysis of data from high-throughput sequencing (HTS) experiments. A popular component of HTSeq is htseq-count, a script to quantify gene expression in bulk and single-cell RNA-Seq and similar experiments.
- IA-SVA - [R] - Iteratively Adjusted Surrogate Variable Analysis (IA-SVA) is a statistical framework to uncover hidden sources of variation even when these sources are correlated with the biological variable of interest. IA-SVA provides a flexible methodology to i) identify a hidden factor for unwanted heterogeneity while adjusting for all known factors; ii) test the significance of the putative hidden factor for explaining the variation in the data; and iii), if significant, use the estimated factor as an additional known factor in the next iteration to uncover further hidden factors.
- ICGS - [Python] - Iterative Clustering and Guide-gene Selection (Olsson et al. Nature 2016). Identify discrete, transitional and mixed-lineage states from diverse single-cell transcriptomics platforms. Integrated FASTQ pseudoalignment/quantification (Kallisto), differential expression, cell-type prediction and optional cell cycle exclusion analyses. Specialized methods for processing BAM and 10X Genomics sparse matrix files. Associated single-cell splicing PSI methods (MultIPath-PSI). Part of the AltAnalyze toolkit, with accompanying visualization methods (e.g., heatmap, t-SNE, SashimiPlots, network graphs). Easy-to-use graphical user and command-line interfaces.
- ivis - [Python or R] - Structure-preserving dimensionality reduction in single-cell datasets.
- kallisto - [C++] - kallisto is a program for quantifying abundances of transcripts or genes from bulk or single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
- kb-python - [Python] - kb-python is a Python package for processing single-cell RNA-sequencing data. It wraps the kallisto | bustools single-cell RNA-seq command line tools in order to unify multiple processing workflows.
- knn-smoothing - [python or R or matlab] - The algorithm is based on the observation that across protocols, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on variance-stabilized and partially smoothed expression profiles, and then aggregating their transcript counts.
- mfa - [R] - Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers
- M3Drop - [R] - Michaelis-Menten Modelling of Dropouts for scRNASeq.
- MetaCell - [R, C++] - Analysis of single cell RNA-seq data by computing partitions of a cell similarity graph into small homogeneous groups of cells called metacells.
- MIMOSCA - [python] - A repository for the design and analysis of pooled single cell RNA-seq perturbation experiments (Perturb-seq).
- Monocle - [R] - Differential expression and time-series analysis for single-cell RNA-Seq.
- Muscat - [R] - muscat (Multi-sample multi-group scRNA-seq analysis tools) provides various methods for Differential State (DS) analyses in multi-sample, multi-group, multi-(cell-)subpopulation scRNA-seq data.
- netSmooth - [R] - netSmooth is a network-diffusion based method that uses priors for the covariance structure of gene expression profiles from scRNA-seq experiments in order to smooth expression values. We demonstrate that netSmooth improves clustering results of scRNA-seq experiments from distinct cell populations, time-course experiments, and cancer genomics.
- NetworkInference - [Julia] - Fast implementation of single-cell network inference algorithms: Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures
- nimfa - [Python] - Nimfa is a Python scripting library which includes a number of published matrix factorization algorithms, initialization methods, quality and performance measures and facilitates the combination of these to produce new strategies. The library represents a unified and efficient interface to matrix factorization algorithms and methods.
- novoSpaRc - [Python] - Predict locations of single cells in space by solely using single-cell RNA sequencing data.
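The kallisto entry mentions pseudoalignment, i.e. determining which transcripts a read is compatible with without a base-level alignment. The toy sketch below illustrates only that core idea: a read's compatibility class is taken as the intersection of the transcript sets of its k-mers. It is not kallisto's implementation (kallisto builds a transcriptome de Bruijn graph and handles many details omitted here). The helper names `build_index` and `pseudoalign`, the tiny example transcripts, skipping k-mers absent from the index, and ignoring reverse complements are all assumptions made for illustration.

```python
def kmers(seq, k):
    # All overlapping substrings of length k
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts, k):
    # Hypothetical helper: map each k-mer to the set of transcript names containing it
    index = {}
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def pseudoalign(read, index, k):
    # Compatibility class = intersection of the transcript sets of the read's k-mers.
    # Simplifications: k-mers not in the index are skipped; no reverse-complement handling.
    sets = [index[km] for km in kmers(read, k) if km in index]
    if not sets:
        return set()
    return set.intersection(*sets)


transcripts = {"tA": "ACGTACGGA", "tB": "ACGTACGTT"}
index = build_index(transcripts, k=3)
print(pseudoalign("TACGGA", index, k=3))
```

A read whose k-mers are shared by both transcripts (e.g. `"ACGTAC"`) stays compatible with both, which is why quantifiers built on this idea then resolve ambiguous reads with a statistical model rather than at the read level.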
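The knn-smoothing entry describes its algorithm in a single sentence. The sketch below is a simplified, hypothetical re-implementation of that description for illustration only, not the package's own code. Assumptions: a genes × cells count matrix in which every cell has a nonzero total; median-total normalization followed by the Freeman-Tukey transform (sqrt(x) + sqrt(x + 1)) as the variance-stabilizing step; Euclidean distances on the full transformed profiles (no PCA step); and a neighbor count that grows as 1, 3, 7, ... until it reaches `k`.

```python
import numpy as np


def freeman_tukey(X):
    # Variance-stabilizing transform for Poisson-like counts
    return np.sqrt(X) + np.sqrt(X + 1.0)


def median_normalize(X):
    # Scale each cell (column) to the median total count; assumes no all-zero cells
    totals = X.sum(axis=0)
    return X * (np.median(totals) / totals)


def knn_smooth(counts, k):
    """Simplified step-wise kNN smoothing of a genes x cells count matrix.

    Hypothetical sketch: Euclidean distances on transformed profiles, no PCA.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    k = min(k, n_cells - 1)
    smoothed = counts.copy()
    step, k_step = 0, 0
    while k_step < k:
        step += 1
        # Neighbor count grows as 1, 3, 7, ... capped at k
        k_step = min(2 ** step - 1, k)
        # Neighbors are found on the partially smoothed, variance-stabilized profiles
        Y = freeman_tukey(median_normalize(smoothed))
        # Pairwise Euclidean distances between cells (genes * cells^2 memory; toy scale only)
        diff = Y[:, :, None] - Y[:, None, :]
        D = np.sqrt((diff ** 2).sum(axis=0))
        np.fill_diagonal(D, -1.0)  # guarantee each cell ranks itself first
        nn = np.argsort(D, axis=1, kind="stable")[:, : k_step + 1]
        # Aggregate the ORIGINAL counts of each cell and its k_step neighbors
        smoothed = counts[:, nn].sum(axis=2)
    return smoothed


counts = np.array([[1, 2, 10, 11],
                   [5, 4, 0, 1]])
print(knn_smooth(counts, k=1))
```

Summing raw counts (rather than averaging transformed values) keeps the smoothed profiles on a count scale, so the Poisson-noise argument in the description still applies to the aggregated data.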
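The M3Drop entry refers to Michaelis-Menten modelling of dropouts: a gene's dropout rate P is modelled as a function of its mean expression S via P = 1 - S / (S + K). The sketch below computes per-gene S and P from a genes × cells count matrix and fits K with a simple grid-search least-squares fit. The helper names and the grid-based fit are assumptions for illustration; this is not M3Drop's own fitting or testing procedure.

```python
import numpy as np


def dropout_stats(counts):
    """Per-gene mean expression S and dropout rate P for a genes x cells matrix."""
    counts = np.asarray(counts, dtype=float)
    S = counts.mean(axis=1)
    P = (counts == 0).mean(axis=1)
    return S, P


def mm_dropout(S, K):
    # Michaelis-Menten dropout curve: P = 1 - S / (S + K), equivalently K / (S + K)
    return 1.0 - S / (S + K)


def fit_k(S, P, grid=None):
    # Hypothetical helper: least-squares fit of K over a log-spaced grid
    if grid is None:
        grid = np.logspace(-3, 3, 2001)
    sse = np.array([np.sum((P - mm_dropout(S, k)) ** 2) for k in grid])
    return float(grid[np.argmin(sse)])
```

With `S, P = dropout_stats(counts)` and `K = fit_k(S, P)`, the residual `P - mm_dropout(S, K)` is positive for genes that drop out more often than the fitted curve predicts; such genes are the kind this modelling approach is meant to highlight.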