Semantics-aware human motion generation from audio instructions

Recent advances in interactive technologies have highlighted the prominence of audio signals for semantic encoding. This paper explores a new task, where audio signals are used as conditioning inputs to generate motions that align with the semantics of the audio. Unlike text-based interactions, audi...

Bibliographic Details
Main Authors:	Zi-An Wang, Shihao Zou, Shiyao Yu, Mingyuan Zhang, Chao Dong
Format:	Article
Language:	English
Published:	Elsevier 2025-06-01
Series:	Graphical Models
Subjects:	Human motion generation; Multimodal learning; Masked generative model; Audio-conditioned generation
Online Access:	http://www.sciencedirect.com/science/article/pii/S1524070325000153

Similar Items

Multimodal diffusion framework for collaborative text image audio generation and applications
by: Junhua Wang, et al.
Published: (2025-07-01)

Authenticity at Risk: Key Factors in the Generation and Detection of Audio Deepfakes
by: Alba Martínez-Serrano, et al.
Published: (2025-01-01)

Survey of deep fake audio generation and detection techniques
by: ZENG Zhiping, et al.
Published: (2025-01-01)

Multi-Level Feature Dynamic Fusion Neural Radiance Fields for Audio-Driven Talking Head Generation
by: Wenchao Song, et al.
Published: (2025-01-01)

Multi-channel neural audio decorrelation using generative adversarial networks
by: Carlotta Anemüller, et al.
Published: (2024-11-01)

Audio-Driven Facial Animation with Deep Learning: A Survey
by: Diqiong Jiang, et al.
Published: (2024-10-01)

Motion Tactics Model Combining Multiple Tactics to Represent Individual Differences in Human Running Motion
by: Masaki Kitagawa, et al.
Published: (2025-03-01)

DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
by: Zeeshan Ahmad, et al.
Published: (2025-01-01)

UniMotion-DM: Uniform Text-Motion Generation and Editing via Diffusion Model
by: Song Lin, et al.
Published: (2024-01-01)

Lipsynthesis incorporating audio-visual synchronisation
by: Cong JIN, et al.
Published: (2023-09-01)

Automatic recognition and representation of text in the form of audio stream
by: L. V. Serebryanaya, et al.
Published: (2021-10-01)

Anuran call synthesis with diffusion models for enhanced bioacoustic classification under data scarcity
by: José Sebastián Ñungo Manrique, et al.
Published: (2025-12-01)

Automatic text generation system for endangered languages based on conditional generative adversarial networks
by: Zhong Luo
Published: (2025-12-01)

Conditional Generation of Building Bubble Diagrams Based on Stochastic Differential Equations
by: Zhiwen Wei, et al.
Published: (2025-01-01)

KeyMPs: One-Shot Vision-Language Guided Motion Generation by Sequencing DMPs for Occlusion-Rich Tasks
by: Edgar Anarossi, et al.
Published: (2025-01-01)

Analysis of Tax Morale and Tax Awareness in the Context of Generations Theory
by: Mustafa Topsakal, et al.
Published: (2023-11-01)

FakeMusicCaps: A Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
by: Luca Comanducci, et al.
Published: (2025-07-01)

RADAR: Reasoning AI-Generated Image Detection for Semantic Fakes
by: Haochen Wang, et al.
Published: (2025-07-01)

Probe-Assisted Fine-Grained Control for Non-Differentiable Features in Symbolic Music Generation
by: Rafik Hachana, et al.
Published: (2025-01-01)

Advanced deep learning for masked individual surveillance
by: Mohamed Elhoseny, et al.
Published: (2024-01-01)

Semantic Series as a Discourse Analysis Tool (Discourse about Generations)
by: N. V. Orlova
Published: (2021-07-01)

AVCaps: An Audio-Visual Dataset With Modality-Specific Captions
by: Parthasaarathy Sudarsanam, et al.
Published: (2025-01-01)

Continuous Satellite Image Generation from Standard Layer Maps Using Conditional Generative Adversarial Networks
by: Arminas Šidlauskas, et al.
Published: (2024-12-01)

Adaptive Context-Aware Generative Adversarial Network for Low-quality Image Enhancement
by: Xingyu Pan, et al.
Published: (2025-06-01)

Simultaneous text and gesture generation for social robots with small language models
by: Alessio Galatolo, et al.
Published: (2025-05-01)

Evaluation of Organizational Cynicism on the Basis of Generations: According to Generations What is the Target of Cynicism?
by: Yasemin Torun, et al.
Published: (2015-10-01)

GENERATIONAL CLASSIFIER OF MODERN RUSSIAN SOCIETY
by: A. V. Milekhin, et al.
Published: (2021-02-01)

Generational Values of Generation Y: Survey of Ukrainian Senior School Pupils and Students
by: Tetyana Blyznyuk
Published: (2017-10-01)

Incorporating Multimodal Directional Interpersonal Synchrony into Empathetic Response Generation
by: Jingyu Quan, et al.
Published: (2025-01-01)

ProT-GFDM: A generative fractional diffusion model for protein generation
by: Xiao Liang, et al.
Published: (2025-01-01)

Depression detection based on dual path DCGAN data generation and classification-regression network
by: LU Jingxue, et al.
Published: (2025-01-01)

Caption Alignment and Structure-Aware Attention for Scientific Table-to-Text Generation
by: Jian Wu, et al.
Published: (2024-01-01)

Theoretical and methodological approaches to the study of generations: a comparative analysis
by: V. S. Novikova
Published: (2025-01-01)

Modeling and Motion Control of Underwater Snake Robot
by: Peijuan LI, et al.
Published: (2024-12-01)

From text to motion: grounding GPT-4 in a humanoid robot “Alter3”
by: Takahide Yoshida, et al.
Published: (2025-05-01)

GC4MRec: Generative-Contrastive for Multimodal Recommendation
by: Lei Wang, et al.
Published: (2025-03-01)

Automatic Timbre Transformation Using Enhanced Diffusion Model
by: Cheng-Han Wu, et al.
Published: (2025-01-01)

Analysis of the influence of selected audio pre-processing stages on accuracy of speaker language recognition
by: Олеся Барковська, et al.
Published: (2023-12-01)

Analysis of the influence of selected audio pre-processing stages on accuracy of speaker language recognition
by: Olesia Barkovska, et al.
Published: (2023-12-01)

Investigating the Use of Generative Adversarial Networks-Based Deep Learning for Reducing Motion Artifacts in Cardiac Magnetic Resonance
by: Ma ZP, et al.
Published: (2025-02-01)