AnimationGPT
AnimationGPT is a project focused on generating combat-style character animations from text. Its model is trained with MotionGPT, and the project has produced CombatMotion, the first character animation dataset dedicated to combat styles, which comes with textual descriptions.
Comparison with current text-to-motion datasets
Dataset | Motions | Texts | Style | Source |
---|---|---|---|---|
KIT-ML | 3,911 | 6,278 | Daily Life | Motion Capture |
HumanML3D | 14,616 | 44,970 | Daily Life | Motion Capture |
Motion-X | 81,084 | 81,084 | Daily Life | Video Reconstruction |
CMP | 8,700 | 26,100 | Combat | Game |
CMR | 14,883 | 14,883 | Combat | Game |
Compared to the current text-to-motion datasets, CombatMotion has the following characteristics:
- Derived from game assets.
- Features a combat style: animation styles in action games tend to be concentrated, and the distribution of action types is skewed.
- More detailed textual annotations.
CombatMotion Dataset
Pipeline
- Obtain game assets in FBX format, retarget them to SMPL, and read the coordinates of the human body joints (see Fbx2SMPL).
- Add textual annotations. Each animation is manually annotated along seven aspects: action type, weapon type, attack type, locative words, power descriptors, speed descriptors, and fuzzy descriptors. A partial list of terms is shown below:

  Action type | Weapon type | Attack type | Locative words | Power | Speed | Fuzzy |
  ---|---|---|---|---|---|---|
  Idle | Bare Hand | Left-Handed | In-Place | Light-Weighted | Swift | Piercing |
  Get Hit | Sacred Seal | Right-Handed | Towards Left | Steady | Relative Fast | Slash |
  Death | Fist | One-Handed | Towards Right | Heavy-Weighted | Uniform Speed | Blunt |
  … | … | … | … | … | … | … |

  Then, use GPT-4 to combine these annotations into sentences.
The diagram above outlines our annotation process. Initially, we fill in seven key descriptive words based on the characteristics of the animation, followed by writing posture description sentences. Subsequently, we use a large language model to integrate these elements into several complete natural language sentences. Finally, we select the sentence that best meets our requirements as the annotation result.
- Process the animation and annotation data into a format compatible with HumanML3D.
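For reference, HumanML3D stores the captions of each motion in texts/<id>.txt, one caption per line, in the form caption#tokens#start#end: tokens are space-separated word/POS pairs, and start and end of 0.0 mean the caption covers the whole clip. The sketch below formats one such line. The helper name and the hand-written tags are illustrative assumptions, not AnimationGPT's processing code; HumanML3D's own scripts produce the tags with spaCy.

```python
from typing import List, Tuple


def format_humanml3d_line(
    caption: str,
    tagged_tokens: List[Tuple[str, str]],
    start: float = 0.0,
    end: float = 0.0,
) -> str:
    """Return one 'caption#word/POS ...#start#end' line (hypothetical helper).

    tagged_tokens is assumed to come from a part-of-speech tagger.
    start = end = 0.0 marks a caption that describes the whole clip.
    """
    if "#" in caption:
        # '#' separates the fields, so it must not appear inside the caption.
        raise ValueError("caption must not contain '#'")
    tokens = " ".join(f"{word}/{pos}" for word, pos in tagged_tokens)
    return f"{caption}#{tokens}#{start}#{end}"


line = format_humanml3d_line(
    "a man swings a katana.",
    [("a", "DET"), ("man", "NOUN"), ("swing", "VERB"), ("a", "DET"), ("katana", "NOUN")],
)
print(line)  # a man swings a katana.#a/DET man/NOUN swing/VERB a/DET katana/NOUN#0.0#0.0
```

Rejecting '#' early matters because a stray separator silently shifts the tokens and time fields when the file is parsed later.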
CombatMotionProcessed Dataset (CMP)
Download: Google Drive
CombatMotionProcessed (CMP) is a refined dataset. For character animation, it retains 8,700 high-quality animations with a strong combat style. For textual annotations, each animation has three text annotations: a concise description, a concise description with sensory details, and a detailed description.
For example, the text annotations for CMP008388 are:
- Concise description: weapon attack a man holding a Katana,executing a Charged Heavy Attack,Dual Wielding,root motion get Forward, Steady,Powerful and Relative Slow,First slow then fast,Cleanly.
- Concise description with sensory details: weapon attack a man holding a Katana,executing a Charged Heavy Attack,Dual Wielding,root motion get Forward, Steady,Powerful and Relative Slow,First slow then fast,Cleanly,which make a sense of Piercing,Wide Open,Charged,Accumulating strength.
- Detailed description: The character grips the wedge with both hands and charges for a powerful strike. They firmly lower their body, twist to the left, lunge forward with a bow step, and stab with the sword held in both hands.
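To inspect a downloaded sample such as CMP008388, a small loader like the sketch below may help. It assumes CMP follows HumanML3D conventions: the new_joint_vecs, new_joints, and texts folders shown in the tutorial, files named by motion ID, and captions stored as caption#tokens#start#end lines. For HumanML3D, new_joints arrays have shape (frames, 22, 3) and new_joint_vecs arrays hold 263 features per frame. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

import numpy as np

EXPECTED_DIRS = ("new_joint_vecs", "new_joints", "texts")


def missing_dirs(root: Path) -> List[str]:
    """Return the expected sub-directories that are missing under root."""
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def load_sample(root: Path, motion_id: str) -> Tuple[np.ndarray, np.ndarray, List[str]]:
    """Load joints, feature vectors, and captions for one motion ID.

    Assumes files are named <motion_id>.npy and <motion_id>.txt, as in HumanML3D.
    """
    joints = np.load(root / "new_joints" / f"{motion_id}.npy")
    features = np.load(root / "new_joint_vecs" / f"{motion_id}.npy")
    text = (root / "texts" / f"{motion_id}.txt").read_text(encoding="utf-8")
    # Keep only the caption field of each non-empty line.
    captions = [line.split("#", 1)[0] for line in text.splitlines() if line.strip()]
    return joints, features, captions
```

From the MotionGPT directory, missing_dirs(Path("datasets/humanml3d")) should return an empty list once the dataset is unzipped; load_sample(Path("datasets/humanml3d"), "CMP008388") would then return the two arrays and, if the assumptions hold, the three captions listed above.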
CombatMotionRaw Dataset (CMR)
Download: Google Drive
CombatMotionRaw (CMR) is an unrefined dataset containing 14,883 animations (CMP is a subset of CMR), but each animation has only one textual annotation, which is a simple concatenation of the annotated words. During development, we found that models trained on this type of annotation performed poorly, so this format was ultimately not adopted.
Example of textual annotation:
weapon attack curved sword curved greatsword right-handed one-handed charged heavy attack forward steady powerful charged accumulating strength cleanly first slow then fast slash smooth and coherent wide open featherlike roundabout lean over and twist your waist to the left step forward with your right leg store your right hand from the left back swing it diagonally downward and swing two circles.
CMR contains more animation data, but its annotations are not detailed enough. You can read the textual annotations from the dataset and refine them yourself.
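One way to start refining CMR annotations is to mirror the GPT-4 step of the pipeline: take the concatenated keywords and wrap them in a rewriting prompt for a language model. This sketch assumes CMR's text files use the HumanML3D caption#tokens#start#end layout, so it keeps only the text before the first '#'. The prompt wording and function name are hypothetical, not the prompt the authors used.

```python
# Hypothetical prompt wording; adapt it to your own annotation guidelines.
PROMPT_TEMPLATE = (
    "The following keywords describe a combat animation:\n"
    "{keywords}\n"
    "Rewrite them as one fluent English sentence describing the motion, "
    "using only ASCII characters and English punctuation."
)


def build_refinement_prompt(raw_line: str) -> str:
    """Wrap the caption field of a CMR-style annotation line in a rewriting prompt."""
    keywords = raw_line.split("#", 1)[0].strip()
    return PROMPT_TEMPLATE.format(keywords=keywords)
```

Asking for ASCII output up front helps avoid the character problems listed under Suggestions when the refined text is fed back through the HumanML3D pipeline.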
Model and Evaluation
Here are models trained on the CMP dataset using different algorithms:
- MotionGPT model: Google Drive
- MLD model: Google Drive
- MDM model: Google Drive
Evaluation on CMP
Metric | MotionGPT | MLD | MDM |
---|---|---|---|
Matching Score↓ | 5.426 ± 0.017 | 5.753 ± 0.019 | 7.220 ± 0.018 |
Matching Score (Ground Truth)↓ | 5.166 ± 0.012 | 5.177 ± 0.018 | 5.179 ± 0.013 |
R_precision (top 1)↑ | 0.044 ± 0.002 | 0.048 ± 0.002 | 0.030 ± 0.001 |
R_precision (top 2)↑ | 0.084 ± 0.003 | 0.089 ± 0.003 | 0.063 ± 0.002 |
R_precision (top 3)↑ | 0.122 ± 0.003 | 0.126 ± 0.003 | 0.096 ± 0.002 |
R_precision (top 1)(Ground Truth)↑ | 0.050 ± 0.002 | 0.051 ± 0.002 | 0.053 ± 0.002 |
R_precision (top 2)(Ground Truth)↑ | 0.094 ± 0.002 | 0.095 ± 0.003 | 0.097 ± 0.003 |
R_precision (top 3)(Ground Truth)↑ | 0.133 ± 0.003 | 0.134 ± 0.004 | 0.136 ± 0.004 |
FID↓ | 0.531 ± 0.018 | 1.240 ± 0.036 | 40.395 ± 0.424 |
Diversity→ | 5.143 ± 0.052 | 5.269 ± 0.044 | 3.364 ± 0.080 |
Diversity (Ground Truth)→ | 5.188 ± 0.070 | 5.200 ± 0.049 | 5.191 ± 0.036 |
MultiModality ↑ | 1.793 ± 0.094 | 2.618 ± 0.115 | 2.463 ± 0.102 |
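Matching Score and R-precision come from the text-to-motion evaluation protocol used with HumanML3D: a pretrained evaluator (such as the one fetched by prepare/download_t2m_evaluators.sh) embeds texts and motions, and the metrics are computed over batches, commonly 32 text-motion pairs each. Below is a minimal NumPy sketch of how these two metrics are typically computed for one batch; the embeddings are placeholders, and this is not the evaluation code that produced the table.

```python
from typing import List, Tuple

import numpy as np


def euclidean_distance_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows of a (N, D) and b (M, D)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def r_precision_and_matching_score(
    text_emb: np.ndarray, motion_emb: np.ndarray, top_k: int = 3
) -> Tuple[List[float], float]:
    """Compute R-precision (top 1..top_k) and the matching score for one batch.

    Row i of text_emb is assumed to describe row i of motion_emb.
    """
    dist = euclidean_distance_matrix(text_emb, motion_emb)
    # Matching score: mean distance between each text and its paired motion.
    matching_score = float(np.trace(dist) / len(dist))
    # For each text, rank all motions in the batch from nearest to farthest.
    ranking = np.argsort(dist, axis=1)
    is_match = ranking == np.arange(len(dist))[:, None]
    # Top-k hit: the paired motion is among the k nearest motions.
    r_precision = [float(is_match[:, :k].any(axis=1).mean()) for k in range(1, top_k + 1)]
    return r_precision, matching_score
```

Lower matching scores and higher R-precision mean generated motions sit closer to their own descriptions in the evaluator's embedding space, which is why the table marks them ↓ and ↑.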
Tutorial
- If you need to train a model, download the CMP dataset, then follow the tutorial for MotionGPT or another text-to-motion algorithm to set up the environment and train your model.
- If you only need to use the AGPT model trained on the CMP dataset, follow these steps:
  - Set up the environment

    Our experimental environment: Ubuntu 22.04, NVIDIA GeForce RTX 4090, and CUDA 11.8.

    git clone https://github.com/OpenMotionLab/MotionGPT.git
    cd MotionGPT
    conda create python=3.10 --name mgpt
    conda activate mgpt
    pip install -r requirements.txt
    python -m spacy download en_core_web_sm
    mkdir deps
    bash prepare/prepare_t5.sh
    bash prepare/download_t2m_evaluators.sh

    Run the prepare scripts from the MotionGPT root directory, where the prepare folder is located.
  - Download the CMP dataset

    Unzip the dataset into the datasets/humanml3d directory:

    .
    └── humanml3d
        ├── new_joint_vecs
        ├── new_joints
        └── texts
  - Generate animations using the model
    - Clone AnimationGPT: git clone https://github.com/fyyakaxyy/AnimationGPT.git
    - Copy the tools folder and config_AGPT.yaml into the MotionGPT directory.
    - Download the AGPT model and place it in the MotionGPT directory.
    - Save the prompt in input.txt.
    - Run:

      python demo.py --cfg ./config_AGPT.yaml --example ./input.txt

      The generated result, id_out.npy, is stored in results/mgpt/debug--AGPT/.
  - File format conversion
    - Convert the generated npy files to mp4 files: modify the file path in tools/animation.py, then run: python animation.py
    - Convert the generated npy files to bvh files: modify the file path in tools/npy2bvh/joints2bvh.py, then run: python joints2bvh.py

      Note: The code for npy2bvh is sourced from MoMask.
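The prompt-file and result-collection steps above can be scripted. This sketch assumes demo.py treats each non-empty line of input.txt as a separate prompt; it keeps only the *_out.npy files, since stray _in.npy files break the bvh conversion (see the Windows notes below). The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def format_prompts(prompts: Iterable[str]) -> str:
    """Return input.txt content: one stripped, non-empty prompt per line."""
    cleaned = [p.strip() for p in prompts if p.strip()]
    return "\n".join(cleaned) + "\n"


def select_outputs(paths: Iterable[Path]) -> List[Path]:
    """Keep only generated motions (*_out.npy), dropping *_in.npy and other files."""
    return sorted(p for p in paths if p.name.endswith("_out.npy"))
```

For example, write Path("input.txt").write_text(format_prompts(prompts), encoding="utf-8") before running demo.py, and call select_outputs(Path("results/mgpt/debug--AGPT").iterdir()) afterwards to get the files to pass to animation.py or joints2bvh.py.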
Windows 10 Tutorial
To use the AGPT model trained on the CMP dataset on Windows 10:
- When setting up the MotionGPT environment (step 1), some packages may still be missing even after using python=3.10.6 and installing requirements.txt; install them manually as the error messages indicate.
- Windows and Linux use different file path separators, so some paths that cause errors need to be changed to the Windows separator. For example, in config.py, replace the '/' separator with os.sep.
- Converting the generated npy files to mp4 files under Python 3.10 may raise errors. The conversion code requires matplotlib 3.3.3, but the earliest matplotlib version supported for cp310 (Python 3.10) is 3.5.0. With matplotlib 3.5.0 or later, you will encounter the following errors:
  - ax.lines = [] AttributeError: can't set attribute
  - ax.collections = [] AttributeError: can't set attribute
  - ani.save ValueError: unknown file extension: .mp4
  If you encounter only the first two errors with matplotlib>=3.5.0, refer to this issue: https://github.com/GuyTevet/motion-diffusion-model/issues/6.
  If the .mp4 extension is also not recognized, download ffmpeg, unzip it, and modify tools/animation.py as follows:
import matplotlib.pyplot as plt
plt.rcParams['animation.ffmpeg_path'] = r'D:\ffmpeg\bin\ffmpeg.exe'  # path to ffmpeg.exe inside the unzipped ffmpeg folder
from mpl_toolkits.mplot3d import Axes3D
  If a video file is generated after resolving these errors but shows only a white screen, try a different Python version for the npy conversion; tools/requirements.txt lists the dependencies needed for it to work under python=3.9.19.
- The following problems may occur when converting the generated npy files to bvh files:
  - Missing packages or numpy errors: prefer python=3.9.19 and install the dependencies in tools/requirements.txt.
  - tools/npy2bvh/joints2bvh.py is missing some package imports. Add this code:

    import matplotlib
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d.art3d import Poly3DCollection
    import mpl_toolkits.mplot3d.axes3d as p3
  - No such file or directory: './visualization/data/template.bvh'. Switch the following line to the commented-out version:

    self.template = BVH.load('./visualization/data/template.bvh', need_quater=True)
    #self.template = BVH.load(os.path.dirname(__file__) + '\\visualization\\data\\template.bvh', need_quater=True)
  - index 1 is out of bounds for axis 1 with size 1: make sure the directory containing the files you want to convert has no _in.npy files; keep only the _out.npy files.
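The commented-out template.bvh line works because it builds the path from the script's own directory instead of the current working directory. A more portable variant of the same idea uses os.path.join, which inserts the correct separator on both Windows and Linux, the same concern behind replacing '/' with os.sep in config.py. This is a sketch; template_bvh_path is a hypothetical helper, not part of the MoMask code.

```python
import os


def template_bvh_path(script_file: str) -> str:
    """Build the path to visualization/data/template.bvh next to script_file.

    Anchoring on the script's own directory makes the lookup independent of
    the current working directory, and os.path.join uses the separator of
    the running OS.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, "visualization", "data", "template.bvh")
```

Inside the class, the load would then read self.template = BVH.load(template_bvh_path(__file__), need_quater=True), assuming the template sits under visualization/data next to that file, as the commented-out line implies.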
Suggestions
During the process of dataset creation and model training/tuning, you might encounter some issues in aspects like textual annotations, model training, and data augmentation. Based on our experience, we offer the following suggestions:
Model Training Crashes Due to Errors in Textual Annotations
If you process data using the HumanML3D pipeline, you might encounter the following issues, which can lead to model training crashes:
- The textual description contains Chinese characters or Chinese punctuation.
- Some words cannot be assigned part-of-speech tags.
- Certain mathematical symbols, such as the degree symbol "°", are recognized as abnormal characters.
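Before running the HumanML3D processing scripts, it can help to scan annotations for the characters listed above. This sketch treats every non-ASCII character as suspicious, which is stricter than necessary but catches Chinese characters, full-width punctuation, and symbols such as '°'. The function name is hypothetical.

```python
import unicodedata
from typing import List


def find_problem_chars(text: str) -> List[str]:
    """Describe each non-ASCII character in an annotation.

    Flags CJK characters, full-width punctuation such as '，' and '。',
    and symbols such as the degree sign.
    """
    problems = []
    for index, char in enumerate(text):
        if ord(char) > 127:
            name = unicodedata.name(char, "UNKNOWN")
            problems.append(f"position {index}: {char!r} ({name})")
    return problems


print(find_problem_chars("turns 90°"))  # ["position 8: '°' (DEGREE SIGN)"]
```

Reporting the position and Unicode name makes it easy to decide whether to replace a character (for example, '°' with "degrees") or rewrite the sentence.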
Exploration of Textual Annotations
- Adding descriptions of root motion direction in the annotated text can help the model learn directional words.
- Adding frame-count information to the annotated text does not teach the model to control the duration (number of frames) of the generated motion.
- The more detailed the textual annotations and the greater the number of different annotations for the same animation, the better the performance of the model.
Mixed Training
Mixing the HumanML3D, KIT-ML, and CMP datasets for model training can result in significant improvements in evaluation metrics.