A general Python framework for visual object tracking and video object segmentation, based on PyTorch.

:fire: One tracking paper accepted at WACV 2024! 👇

:fire: One tracking paper accepted at WACV 2023! 👇

:fire: One tracking paper accepted at ECCV 2022! 👇


TaMOs, RTS, ToMP, KeepTrack, LWL, KYS, PrDiMP, DiMP and ATOM Trackers

Official implementation of the TaMOs (WACV 2024), RTS (ECCV 2022), ToMP (CVPR 2022), KeepTrack (ICCV 2021), LWL (ECCV 2020), KYS (ECCV 2020), PrDiMP (CVPR 2020), DiMP (ICCV 2019), and ATOM (CVPR 2019) trackers, including complete training code and trained models.

Tracking Libraries

Libraries for implementing and evaluating visual trackers. They include:

  • All common tracking and video object segmentation datasets.
  • Scripts to analyse tracker performance and obtain standard performance scores.
  • General building blocks, including deep networks, optimization, feature extraction and utilities for correlation filter tracking.
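As a rough illustration of what a "standard performance score" can look like, the sketch below computes a success curve and its area under the curve (AUC) from per-frame overlaps. It is a minimal example under stated assumptions, not the repository's analysis code: the function names `success_curve` and `success_auc`, the 21 evenly spaced thresholds, and the strict `>` comparison are all choices made here for illustration, and conventions differ between benchmarks.

```python
def success_curve(overlaps, num_thresholds=21):
    """Return (thresholds, rates): the fraction of frames whose overlap
    exceeds each threshold in [0, 1].

    `overlaps` is a list of per-frame IoU values between the predicted
    and the annotated box (hypothetical input format).
    """
    if not overlaps:
        raise ValueError("overlaps must contain at least one frame")
    # Evenly spaced thresholds 0.0, 0.05, ..., 1.0 when num_thresholds=21.
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return thresholds, rates


def success_auc(overlaps, num_thresholds=21):
    """Area under the success curve, taken as the mean success rate."""
    _, rates = success_curve(overlaps, num_thresholds)
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Three frames: a perfect prediction, a half overlap, and a miss.
    print(success_auc([1.0, 0.5, 0.0]))
```

A higher AUC means the tracker keeps a high overlap on more frames; a tracker that loses the target early is penalised at every threshold.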

Training Framework: LTR

LTR (Learning Tracking Representations) is a general framework for training your visual tracking networks. It is equipped with

  • All common training datasets for visual object tracking and segmentation.
  • Functions for data sampling, processing etc.
  • Network modules for visual tracking.
  • And much more...

Model Zoo

The tracker models trained using PyTracking, along with their results on standard tracking benchmarks, are provided in the model zoo.


The toolkit contains the implementation of the following trackers.

TaMOs (WACV 2024)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of TaMOs. TaMOs is the first generic object tracker to tackle the problem of tracking multiple generic objects at once. It uses a shared model predictor consisting of a Transformer to produce multiple target models (one for each specified target). It achieves sub-linear run-time when tracking multiple objects and outperforms existing single object trackers when running one instance for each target separately. TaMOs serves as the baseline tracker for the new large-scale generic object tracking benchmark LaGOT (see here), which contains multiple annotated target objects per sequence.


RTS (ECCV 2022)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of RTS. RTS is a robust, end-to-end trainable, segmentation-centric pipeline that internally works with segmentation masks instead of bounding boxes. Thus, it can learn a better target representation that clearly differentiates the target from the background. To achieve the necessary robustness for challenging tracking scenarios, a separate instance localization component is used to condition the segmentation decoder when producing the output mask.


ToMP (CVPR 2022)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of ToMP. ToMP employs a Transformer-based model prediction module in order to localize the target. The model predictor is further extended to estimate a second set of weights that are applied for accurate bounding box regression. The resulting tracker ToMP relies on training and on test frame information in order to predict all weights transductively.


KeepTrack (ICCV 2021)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of KeepTrack. KeepTrack actively handles distractor objects to continue tracking the target. It employs a learned target candidate association network that propagates the identities of all target candidates from frame to frame. To tackle the lack of ground-truth correspondences between distractor objects in visual tracking, it uses a training strategy that combines partial annotations with self-supervision.


LWL (ECCV 2020)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the LWL tracker. LWL is an end-to-end trainable video object segmentation architecture which captures the current target object information in a compact parametric model. It integrates a differentiable few-shot learner module, which predicts the target model parameters using the first frame annotation. The learner is designed to explicitly optimize an error between target model prediction and a ground truth label. LWL further learns the ground-truth labels used by the few-shot learner to train the target model. All modules in the architecture are trained end-to-end by maximizing segmentation accuracy on annotated VOS videos.

LWL overview figure

KYS (ECCV 2020)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the KYS tracker. Unlike conventional frame-by-frame detection based tracking, KYS propagates valuable scene information through the sequence. This information is used to achieve an improved scene-aware target prediction in each frame. The scene information is represented using a dense set of localized state vectors. These state vectors are propagated through the sequence and combined with the appearance model output to localize the target. The network is trained to effectively utilize the scene information by directly maximizing tracking performance on video segments.

KYS overview figure

PrDiMP (CVPR 2020)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the PrDiMP tracker. This work proposes a general formulation for probabilistic regression, which is then applied to visual tracking in the DiMP framework. The network predicts the conditional probability density of the target state given an input image. The probability density is flexibly parametrized by the neural network itself. The regression network is trained by directly minimizing the Kullback-Leibler divergence.
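The training objective described above can be illustrated on a toy 1-D grid. The sketch below assumes the predicted density is a softmax over network scores and the label is a Gaussian centred on the annotation; the grid, `sigma`, and all function names are hypothetical and chosen only for this example. It is not PrDiMP's actual parametrization or integration scheme.

```python
import math


def softmax(scores):
    """Turn raw scores into a discrete probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def gaussian_label(grid, center, sigma):
    """Discrete Gaussian label distribution centred on the annotation."""
    weights = [math.exp(-0.5 * ((y - center) / sigma) ** 2) for y in grid]
    z = sum(weights)
    return [w / z for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same grid."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    grid = list(range(11))                      # candidate target positions
    label = gaussian_label(grid, center=5, sigma=1.0)
    good = softmax([-0.5 * (y - 5) ** 2 for y in grid])  # peak at annotation
    bad = softmax([-0.5 * (y - 2) ** 2 for y in grid])   # peak elsewhere
    print(kl_divergence(label, good), kl_divergence(label, bad))
```

Minimizing this quantity with respect to the scores pulls the predicted density towards the label distribution, rather than regressing a single point estimate.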

DiMP (ICCV 2019)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the DiMP tracker. DiMP is an end-to-end tracking architecture, capable of fully exploiting both target and background appearance information for target model prediction. It is based on a target model prediction network, which is derived from a discriminative learning loss by applying an iterative optimization procedure. The model prediction network employs a steepest descent based methodology that computes an optimal step length in each iteration to provide fast convergence. The model predictor also includes an initializer network that efficiently provides an initial estimate of the model weights.

DiMP overview figure
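The idea of steepest descent with an optimal step length can be shown on a toy problem. The sketch below applies it to a small ridge-regression loss, L(w) = 0.5 * ||Xw - y||^2 + 0.5 * lam * ||w||^2, for which the exact line-search step along the gradient g is alpha = (g.g) / (||Xg||^2 + lam * g.g). This quadratic loss, the data, and all names are assumptions made for illustration; DiMP's discriminative loss and its learned components are not reproduced here.

```python
def matvec(X, w):
    """Compute X @ w for a list-of-rows matrix X."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


def rmatvec(X, r):
    """Compute X^T @ r."""
    return [sum(X[i][j] * r[i] for i in range(len(X)))
            for j in range(len(X[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def loss(X, y, w, lam):
    """Ridge loss 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return 0.5 * dot(r, r) + 0.5 * lam * dot(w, w)


def steepest_descent(X, y, w, lam=0.1, num_iter=5):
    """Gradient descent where each step length minimizes the loss exactly
    along the current gradient direction."""
    for _ in range(num_iter):
        residual = [p - t for p, t in zip(matvec(X, w), y)]
        grad = [g + lam * wi for g, wi in zip(rmatvec(X, residual), w)]
        gg = dot(grad, grad)
        if gg == 0.0:  # already at the minimum
            break
        Xg = matvec(X, grad)
        alpha = gg / (dot(Xg, Xg) + lam * gg)  # optimal step for a quadratic
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    y = [1.0, 2.0, 3.0]
    w0 = [0.0, 0.0]
    for n in (0, 1, 5):
        print(n, loss(X, y, steepest_descent(X, y, w0, num_iter=n), 0.1))
```

Because the step length is computed rather than hand-tuned, a handful of iterations already reduces the loss substantially, which is the property that makes this kind of update attractive inside a network that must predict a model quickly.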

ATOM (CVPR 2019)

[Paper] [Raw results] [Models] [Training Code] [Tracker Code]

Official implementation of the ATOM tracker. ATOM is based on (i) a target estimation module that is trained offline, and (ii) a target classification module that is trained online. The target estimation module is trained to predict the intersection-over-union (IoU) overlap between the target and a bounding box estimate. The target classification module is learned online using dedicated optimization techniques to discriminate between the target object and background.

ATOM overview figure
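For reference, the IoU quantity that the target estimation module is trained to predict can be computed directly when both boxes are known. The sketch below assumes axis-aligned boxes in an (x, y, w, h) format with (x, y) as the top-left corner; the function name and box format are illustrative choices, and the network that predicts this value is not shown.

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes.

    Each box is (x, y, w, h), with (x, y) the top-left corner
    (hypothetical format chosen for this example).
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region, clamped at zero.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    target = (0.0, 0.0, 2.0, 2.0)
    estimate = (1.0, 1.0, 2.0, 2.0)
    print(box_iou(target, estimate))
```

An IoU of 1 means the estimate coincides with the target box, and 0 means the two boxes do not overlap at all.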

ECO/UPDT (CVPR 2017/ECCV 2018)

[Paper] [Models] [Tracker Code]

An unofficial implementation of the ECO tracker. It is implemented based on an extensive and general library for complex operations and Fourier tools. The implementation differs from the version used in the original paper in a few important aspects.

  1. This implementation uses features from vgg-m layer 1 and resnet18 residual block 3.
  2. As in our later UPDT tracker, separate filters are trained for shallow and deep features, and extensive data augmentation is employed in the first frame.
  3. The GMM memory module is not implemented; instead, the raw projected samples are stored.

Please refer to the official implementation of ECO if you are looking to reproduce the results in the ECO paper or download the raw results.

Associated trackers

We list associated trackers that can be found in external repositories.

E.T.Track (WACV 2023)

[Paper] [Code]

Official implementation of E.T.Track. E.T.Track utilizes our proposed Exemplar Transformer, a transformer module with a single instance-level attention layer for real-time visual object tracking. E.T.Track is up to 8x faster than other transformer-based models, and consistently outperforms competing lightweight trackers that can operate in real time on standard CPUs.



Clone the Git repository.

git clone

Clone the submodules.

In the repository directory, run the command:

git submodule update --init  

Install dependencies

Run the installation script to install all the dependencies. You need to provide the conda install path (e.g. ~/anaconda3) and the name for the created conda environment (here pytracking).

bash conda_install_path pytracking

This script will also download the default networks and set-up the environment.

Note: The install script has been tested on an Ubuntu 18.04 system. In case of issues, check the detailed installation instructions.

Windows: (NOT Recommended!) Check these installation instructions.

Let's test it!

Activate the conda environment and run the script pytracking/ to run a tracker using the webcam input. The command below selects the dimp tracker with the dimp50 parameter setting.

conda activate pytracking
cd pytracking
python dimp dimp50    

What's next?

pytracking - for implementing your tracker

ltr - for training your tracker


Main Contributors

Guest Contributors

