awesome-gesture_generation

Awesome Gesture Generation

Work in progress (not finished yet)

This project focuses on audio-driven gesture generation whose output is 3D keypoint gestures.
Input: Audio, Text, Gesture, etc. -> Output: Gesture Motion

Gesture Generation is the process of generating gestures from speech or text. The goal of Gesture Generation is to generate gestures that are natural, realistic, and appropriate for the given context. The generated gestures can be used to animate virtual characters, robots, or embodied conversational agents.

ACM CCS: • Human-centered computing → Human computer interaction (HCI).

Papers by folder: 📁/survey   📁/approach   📁/papers   📁/dataset   📁/books

Table of Contents


Main resource


1. Survey

Comprehensive preview

Survey review

Evaluation survey

GENEA Challenge


2. Papers

2.1 Relation of speech and gesture

2.2 GENEA

GENEA 2024

GENEA Workshop 2024 - ICMI 2024 Accepted papers [Homepage]
Paper 🏆
Gesture Area Coverage to Assess Gesture Expressiveness and Human-Likeness 🏆

GENEA 2023

GENEA Challenge 2023 [Homepage]
Method (Team*) Paper Video 🏆
FineMotion 【ICMI 2023】The FineMotion entry to the GENEA Challenge 2023: DeepPhase for conversational gestures generation [paper] [youtube]  
Gesture Motion Graphs 【ICMI 2023】Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment [paper] [youtube]  
Diffusion-based 【ICMI 2023】(SG) Diffusion-based co-speech gesture generation using joint text and audio representation [paper] [youtube]
UEA Digital Humans 【ICMI 2023】The UEA Digital Humans entry to the GENEA Challenge 2023 [paper] ; [JonathanPWindle/UEA-DH-GENEA23] [youtube]  
FEIN-Z 【ICMI 2023】FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation [paper] [youtube]  
DiffuseStyleGesture+ 【ICMI 2023】(SF) The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 [paper] [youtube] 🏆
Discrete Diffusion 【ICMI 2023】Discrete Diffusion for Co-Speech Gesture Synthesis [paper] [youtube]  
KCL-SAIR 【ICMI 2023】The KCL-SAIR team’s entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker [paper] [youtube]  
Gesture Generation 【ICMI 2023】Gesture Generation with Diffusion Models Aided by Speech Activity Information [paper] [youtube]  
Co-Speech Gesture Generation 【ICMI 2023】Co-Speech Gesture Generation via Audio and Text Feature Engineering [paper] [youtube]  
DiffuGesture 【ICMI 2023】DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models [paper] [youtube]  
KU-ISPL 【ICMI 2023】The KU-ISPL entry to the GENEA Challenge 2023-A Diffusion Model for Co-speech Gesture generation [paper] [youtube]  
GENEA Workshop 2023 - ICMI 2023 Accepted papers [Homepage]
Papers Video 🏆
【ICMI 2023】 MultiFacet: A Multi-Tasking Framework for Speech-to-Sign Language Generation [paper]
【ICMI 2023】 Look What I Made It Do - The ModelIT Method for Manually Modeling Nonverbal Behavior of Socially Interactive Agents [paper]    
【ICMI 2023】 A Methodology for Evaluating Multimodal Referring Expression Generation for Embodied Virtual Agents [paper]    
【ICMI 2023】 Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent [paper]; [aldelb/non_verbal_facial_animation]   🏆

GENEA 2022

GENEA Challenge 2022 - Accepted papers [Homepage]
Team (Method) Paper Video 🏆
DeepMotion 【ICMI 2022】The DeepMotion entry to the GENEA Challenge 2022 [paper] [youtube]  
DSI 【ICMI 2022】Hybrid Seq2Seq Architecture for 3D Co-Speech Gesture Generation [paper] [youtube]  
FineMotion 【ICMI 2022】ReCell: replicating recurrent cell for auto-regressive pose generation [paper] [FineMotion/GENEA_2022] [youtube]  
Forgerons 【ICMI 2022】Ubisoft Exemplar-based Stylized Gesture Generation from Speech: An Entry to the GENEA Challenge 2022 [paper] [youtube]  
GestureMaster 【ICMI 2022】GestureMaster: Graph-based Speech-driven Gesture Generation [paper] [youtube]  
IVI Lab 【ICMI 2022】The IVI Lab entry to the GENEA Challenge 2022 – A Tacotron2 Based Method for Co-Speech Gesture Generation With Locality-Constraint Attention Mechanism [paper] [Tacotron2-SpeechGesture] [youtube] 🏆
ReprGesture 【ICMI 2022】The ReprGesture entry to the GENEA Challenge 2022 [paper] [YoungSeng/ReprGesture] [youtube]  
TransGesture 【ICMI 2022】TransGesture: Autoregressive Gesture Generation with RNN-Transducer [paper] [youtube]  
UEA Digital Humans 【ICMI 2022】UEA Digital Humans entry to the GENEA Challenge 2022 [paper] [UEA/GENEA22] [youtube]  
GENEA Workshop 2022 - ICMI 2022 Accepted papers [Homepage]
Papers Video 🏆
【ICMI 2022】 Understanding Interviewees’ Perceptions and Behaviour towards Verbally and Non-verbally Expressive Virtual Interviewing Agents [paper] [youtube]  
【ICMI 2022】 Emotional Respiration Speech Dataset [paper] [youtube]  
【ICMI 2022】 Automatic facial expressions, gaze direction and head movements generation of a virtual agent [paper] [youtube] 🏆
【ICMI 2022】 Can you tell that I’m confused? An overhearer study for German backchannels by an embodied agent [paper] [youtube]  

GENEA 2021

GENEA Challenge 2021 - ICMI 2021 Accepted papers [Homepage]
Papers Video 🏆
【ICMI 2021】 Probabilistic Human-like Gesture Synthesis from Speech using GRU-based WGAN [paper] [wubowen416/gesture-generation-using-WGAN] [youtube] 🏆
【ICMI 2021】 Influence of Movement Energy and Affect Priming on the Perception of Virtual Characters Extroversion and Mood [paper]  
【ICMI 2021】 Crossmodal clustered contrastive learning: Grounding of spoken language to gesture [paper] [dondongwon/CC_NCE_GENEA] [youtube]  

GENEA 2020

GENEA Challenge 2020 - Accepted papers [Homepage]
Papers Video
【IVA 2020】 The StyleGestures entry to the GENEA Challenge 2020 [paper] ; [simonalexanderson/StyleGestures] [youtube]
【IVA 2020】 The FineMotion entry to the GENEA Challenge 2020 [paper] ; [FineMotion/GENEA_2020] [youtube]
【IVA 2020】 Double-DCCCAE: Estimation of Sequential Body Motion Using Wave-Form - AlltheSmooth [paper] [youtube]
【IVA 2020】 CGVU: Semantics-guided 3D Body Gesture Synthesis [paper] [youtube]
【IVA 2020】 Interpreting and Generating Gestures with Embodied Human Computer Interactions [paper] [youtube]
【IVA 2020】 The Nectec Gesture Generation System entry to the GENEA Challenge 2020 [paper] [youtube]

2024

2023


2022


2021


2020


2019


2018


Before 2017


Others


3. Approaches

3.1 Rule-based approach


3.2 Selected data-driven approach

3.2.a Statistical approach

3.2.b Deep learning approach

This section is not yet accurate; editing continues.


4. Pipelines



5. Learning Objective

Full name Description
Adversarial Loss (Adv) Used in Generative Adversarial Networks (GANs), this loss function pits a generator network against a discriminator network, with the goal of the generator producing samples that can fool the discriminator into thinking they are real.
Categorical Cross Entropy (CCE) A common loss function used in multi-class classification tasks, where the goal is to minimize the difference between the predicted and true class labels.
Cross-modal Cluster Noise Contrastive Estimation (CC-NCE) Used in multimodal learning to learn joint representations across different modalities, this loss function maximizes the similarity between matching pairs from different modalities while minimizing the similarity between non-matching pairs.
Edge Transition Cost (ETC) Used in graph-based motion synthesis (motion graphs), this cost measures how smoothly one motion segment can transition into another, so that low-cost paths through the graph yield coherent, continuous motion.
Expectation Maximization (EM) An iterative algorithm for maximum likelihood estimation with incomplete data, missing data, or latent variables. It alternates between computing the expected log-likelihood of the complete data under the current parameters and updating the parameters to maximize that expectation.
Geodesic Distance (GeoD) Measures the shortest-path distance between two points on a curved space (manifold). For joint rotations, it is the angle of the relative rotation between the predicted and ground-truth rotations.
Wasserstein-GAN Gradient Penalty (WGAN-GP) An extension of the Wasserstein GAN algorithm that adds a gradient penalty term to the loss function, used to enforce the Lipschitz continuity constraint and ensure stability during training.
Hamming Distance (Hamm) Used in information theory, this metric counts the number of positions at which two equal-length strings differ.
Huber Loss (Huber) A robust loss function used in regression tasks that is less sensitive to outliers than the Mean Squared Error (MSE) loss.
Imitation Reward (IR) Used in imitation learning to train a model to mimic the behavior of an expert agent, by providing a reward signal based on how closely the model’s behavior matches that of the expert.
Kullback–Leibler Divergence (KL) Used to measure the difference between two probability distributions, this loss function is commonly used in probabilistic models and deep learning for regularization and training.
L2 Distance (L2) Measures the Euclidean distance between two points in space, commonly used in regression tasks.
Mean Absolute Error (MAE) A loss function used in regression tasks that measures the average absolute difference between the predicted and true values.
Maximum Likelihood Estimation (MLE) A statistical method used to estimate the parameters of a probability distribution that maximize the likelihood of observing the data.
Mean Squared Error (MSE) A common loss function used in regression tasks that measures the average squared difference between the predicted and true values.
Negative Log-likelihood (NLL) Used in probabilistic models to maximize the likelihood of the observed data by minimizing the negative log-likelihood.
Structural Similarity Index Measure (SSIM) Used in image processing to measure the similarity between two images based on their luminance, contrast, and structural content.
Task Reward (TR) Used in reinforcement learning to provide a reward signal to an agent based on its performance in completing a given task.
Variance (Var) A statistical metric used to measure the variability of a set of data points around their mean.
Within-cluster Sum of Squares (WCSS) Used in cluster analysis to measure the variability of data points within a cluster by computing the sum of squared distances between each data point and the mean of the cluster.
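Several of the regression and distribution losses in the table above have simple closed forms. The following is a minimal pure-Python sketch for illustration only; the function names (`mse`, `mae`, `huber`, `kl_divergence`) and the `delta` default are assumptions made here, not code from any of the listed papers. In practice, a deep learning framework's built-in losses would normally be used instead.

```python
import math


def mse(pred, true):
    """Mean Squared Error: average squared difference."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def mae(pred, true):
    """Mean Absolute Error: average absolute difference."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def huber(pred, true, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, true):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(pred)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists.

    Terms with p_i == 0 contribute zero; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# A single large error (an outlier) dominates MSE far more than MAE or Huber.
pred = [0.0, 0.0, 0.0, 0.0]
true = [0.0, 0.0, 0.0, 4.0]
print(mse(pred, true), mae(pred, true), huber(pred, true))
```

The example values show why Huber loss is described as robust: the outlier contributes quadratically to MSE but only linearly to Huber and MAE.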

6. Metric Evaluation

Evaluation aspects

Metric (Description) Body tier Type 2020 2021 2022 2023
FNA (Full-body Natural Motion ) 🧍 🧍‍♂️        
FBT (Full-body Text-based ) 🧍 📃        
FSA (Full-body Custom by Teams ) 🧍        
FSB (Full-body Custom by Teams ) 🧍 ⚙️        
FSC (Full-body Custom by Teams ) 🧍 ⚙️        
FSD (Full-body Custom by Teams ) 🧍 ⚙️        
FSF (Full-body Custom by Teams ) 🧍 ⚙️        
FSG (Full-body Custom by Teams ) 🧍 ⚙️        
FSH (Full-body Custom by Teams ) 🧍 ⚙️        
FSI (Full-body Custom by Teams ) 🧍 ⚙️        
UNA (Upper-body Natural Motion ) 🧑‍🦲 🧍‍♂️        
UBA (Upper-body Audio-based ) 🧑‍🦲 🔊        
UBT (Upper-body Text-based ) 🧑‍🦲 📃        
USJ (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        
USK (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        
USL (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        
USM (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        
USN (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        
USO (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        
USP (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        
USQ (Upper-body Custom by Teams) 🧑‍🦲 ⚙️        

Objective metrics

3.1 Average acceleration and jerk

3.2 Comparing speed histograms

3.3 Canonical correlation analysis

3.4 Fréchet gesture distance

3.5 System ranking comparison
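The first metric in the list above, average acceleration and jerk, can be sketched with finite differences: acceleration is the second time derivative of joint position and jerk is the third. The code below is a minimal illustration, assuming motion is stored as a list of frames, each frame a list of (x, y, z) joint positions, sampled at a fixed frame rate. The function names and data layout are hypothetical, and published evaluations may differ in details such as joint selection or normalization.

```python
import math


def _diff(frames, fps):
    """Finite-difference time derivative between consecutive frames."""
    return [
        [tuple((b - a) * fps for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]


def _mean_norm(frames):
    """Mean Euclidean norm over all frames and joints."""
    norms = [
        math.sqrt(sum(c * c for c in joint))
        for frame in frames
        for joint in frame
    ]
    return sum(norms) / len(norms)


def average_acceleration_and_jerk(positions, fps):
    """Return (mean |acceleration|, mean |jerk|); needs at least 4 frames."""
    velocity = _diff(positions, fps)
    acceleration = _diff(velocity, fps)
    jerk = _diff(acceleration, fps)
    return _mean_norm(acceleration), _mean_norm(jerk)


# One joint moving along x as x(t) = t^3, sampled at 1 frame per second.
motion = [[(float(t ** 3), 0.0, 0.0)] for t in range(6)]
print(average_acceleration_and_jerk(motion, fps=1))
```

Lower average jerk indicates smoother motion, although very low values can also mean the motion is too damped compared with natural gestures.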


7. Datasets

Dataset Modalities Type Download Paper
IEMOCAP 🚶, 🔊, 📃, 🤯 👥 sail.usc.edu/iemocap [paper]
Creative-IT 🚶, 🔊, 📃, 🤯 👥 sail.usc.edu/CreativeIT  
Gesture-Speech Dataset 🚶, 🔊 👤 dropbox  
CMU Panoptic 🚶, 🔊, 📃 👥 domedb.perception.cmu [paper]
Speech-Gesture 🚶, 🔊 👤 amirbar/speech2gesture [paper]
TED Dataset [homepage] 🚶, 🔊 👤 youtube-gesture-dataset  
Talking With Hands ([github]) 🚶, 🔊 👥 facebookresearch/TalkingWithHands32M [paper]
PATS ([homepage], [github]) 🚶, 🔊, 📃 👤 chahuja.com/pats [paper]
Trinity Speech-Gesture I 🚶, 🔊, 📃 👤 Trinity Speech-Gesture I  
Trinity Speech-Gesture II 🚶, 🔊, 🎞️ 👤 Trinity Speech-Gesture II
Speech-Gesture 3D extension 🚶, 🔊 👤 nextcloud.mpi-klsb  
Talking With Hands GENEA Extension 🚶, 🔊, 📃 👥 zenodo/6998231 [paper]
SaGA 🚶, 🔊, ℹ️ 👥 phonetik.uni-muenchen [paper]
SaGA++ 🚶, 🔊, ℹ️ 👥 zenodo/6546229  
ZEGGS Dataset [youtube] 🚶, 🔊 👤 ubisoft-laforge-ZeroEGGS [paper]
BEAT Dataset ([homepage], [github]) 🚶, 🔊, 📃, ℹ️, 🤯 👥, 👤 github.io/BEAT [paper]
InterAct [homepage] 🚶, 🔊, 📃 👥 hku-cg.github.io [paper]

2022 GENEA Challenge


8. Toolkit

9. Playlist & Talks

GENEA

GENEA 2023 Playlist

GENEA 2022 Playlist

GENEA 2021 Playlist

GENEA 2020 Playlist

SIGGRAPH

ACM SIGGRAPH MIG 2019 Playlist

10. Code

11. Books


PapersWithCode Ranking

Contributing

Your contributions are always welcome! Please take a look at the contribution guidelines first.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Created by OpenHuman

OpenHuman.ai - Open Store for Realistic Digital Human