Contents Menu Expand Light mode Dark mode Auto light/dark mode
OFASys documentation
OFASys documentation

Getting Started

  • OFASys
  • Installation
  • Usage in 15 minutes

How-To Guides

  • Define a Task
  • Train a Task
  • Add a Custom Module

Conceptual Guides

  • Philosophy
  • Core Concepts

Task Gallery

  • Text-Only Tasks
  • Image-Related Tasks
  • Box-Related Tasks
  • Video-Related Tasks
  • Audio-Related Tasks
  • Structural Language Tasks
  • Motion-Related Tasks

API Reference

  • Model
  • Adaptor
  • Preprocessor
  • Criterion
  • Task
  • Instruction
  • Generator
  • Utils
  • Hub Interface
Back to top
Edit this page

Motion-Related Tasks#

Text to Motion Generation#

Task Introduction#

OFASys is compatible with training objectives other than autoregressive language modeling. We now describe how OFASys implements denoising diffusion probabilistic modeling (DDPM), for the task of text-to-motion synthesis.

Here we assume the dataset is a table, where each sample record contains two fields. Field “mocap” of modality “MOTION” is a BVH file containing motion capture data, while field “text” of modality “TEXT” is a text sentence describing the captured motion, e.g., “a person walks four steps backward”. Similarly, we can replace “text” with other modalities such as “[AUDIO:…]”, “[IMAGE:…]”, “[VIDEO:…]” to implement various kinds of conditional synthesis tasks. And we can simply replace the text with an empty string to implement the task of unconditional motion synthesis, aka., motion prediction.

Default Template#

motion capture: [TEXT:text] -> [MOTION:bvh_frames,preprocess=motion_6d,adaptor=motion_6d]

Usage#

Prepare the model, instruction, and text prompts for text-to-motion generation.

import torch
from ofasys import OFASys
# This checkpoint is for demonstration purposes only, and does not represent the final quality of any project or product.
# The checkpoint is for research only, and commercial use is prohibited.
model = OFASys.from_pretrained('http://ofasys.oss-cn-zhangjiakou.aliyuncs.com/model_hub/single_task_motion.pt')
if torch.cuda.is_available():
    model = model.cuda()
instruction = 'motion capture: [TEXT:text] -> [MOTION:bvh_frames,preprocess=motion_6d,adaptor=motion_6d]'
prompts = [
    {'text': 'run then jump'},
    {'text': 'run then jump like dancing'},
]

Example 1: Inference without classifier-free guidance. This usage is much simpler and more concise than the classifier-free guided approach described later (see Example 2). However, the generated results tend to correlate poorly with the text prompts. It is thus NOT recommended.

output = model.inference(instruction, data=prompts)
output[0].save_as_gif('run_then_jump__no_guide.gif')

Example 2 (recommended): Inference with classifier-free guidance enabled. It uses an experimental API for negative prompting and classifier-free guidance. Classifier-free guidance is implemented by providing the NULL condition (i.e., an empty text) as the negative prompt.

guided_prompts = []
for p in prompts:
    guided_prompts.append(p)
    guided_prompts.append({'text': ''})  # The negative prompt, or an empty string for classifier-free guidance.
# This API requires the positive and negative prompts be in the same batch, so assert batch_size % 2 == 0.
output = model.inference(instruction, data=guided_prompts, guidance_weight=3.0, batch_size=2)
output = output[::2]
output[0].save_as_gif('run_then_jump__guided.gif')
output[1].save_as_gif('run_then_jump_like_dancing__guided.gif')
output[0].save_as_bvh('run_then_jump__guided.bvh')  # Export the result in the BVH format for Blender.

CASE#

The saved result “run_then_jump__guided.gif” should look like below:

https://ofasys.oss-cn-zhangjiakou.aliyuncs.com/examples/run_then_jump_guided.gif

The saved result “run_then_jump_like_dancing__guided.gif” should look like below:

https://ofasys.oss-cn-zhangjiakou.aliyuncs.com/examples/run_then_jump_like_dancing_guided.gif

The saved result in the BVH file format, “run_then_jump__guided.bvh”, can be imported into a 3D animation software such as Blender for rendering:

Video
Next
Model
Previous
Structural Language Tasks
Copyright © 2022, OFA-Sys
Made with Sphinx and @pradyunsg's Furo
On this page
  • Motion-Related Tasks
    • Text to Motion Generation
      • Task Introduction
      • Default Template
      • Usage
      • CASE