Sai Shashank Kalakonda, a dual-degree student working with Dr. Ravi Kiran Sarvadevabhatla, gave an oral presentation on "Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation" at the IEEE International Conference on Multimedia and Expo (ICME), held in Brisbane, Australia, from 10 to 14 July.
The research, as explained by the authors Sai Shashank Kalakonda, Shubh Maheshwari (TCS Innovation Labs), and Ravi Kiran Sarvadevabhatla:
We introduce Action-GPT:
- A plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models.
- By carefully crafting prompts for LLMs, we generate richer, fine-grained descriptions of each action.
- We show that utilizing these detailed descriptions instead of the original action phrases leads to better alignment of text and motion spaces.
- Our experiments show qualitative and quantitative improvement in the quality of synthesized motions produced by recent text-to-motion models.
- Code, pretrained models, and sample videos will be made available.
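The core idea above, replacing a terse action phrase with an LLM-expanded description before it reaches the text-to-motion model, can be sketched as follows. This is an illustrative outline, not the authors' implementation: the prompt wording, the `llm` callable, and the `fake_llm` stub are all assumptions for demonstration; the actual prompts and models used are described in the paper.

```python
def build_prompt(action_phrase: str) -> str:
    # Illustrative prompt template (assumption; the exact wording used by
    # Action-GPT may differ -- see the paper for the actual prompts).
    return (
        "Describe the body movements of a person performing the action "
        f'"{action_phrase}" in detail.'
    )

def expand_action(action_phrase: str, llm) -> str:
    # `llm` is any callable mapping a prompt string to generated text
    # (e.g. a wrapper around an LLM API). The returned rich description
    # is fed to the text-to-motion model in place of the original phrase.
    return llm(build_prompt(action_phrase))

# Stand-in for a real LLM call, so the sketch is runnable offline.
def fake_llm(prompt: str) -> str:
    return ("The person bends their knees, swings both arms backward, "
            "and pushes off the ground to leap forward.")

if __name__ == "__main__":
    print(expand_action("jump", fake_llm))
```

Because the framework only changes the text fed into the motion model, it can wrap existing text-to-motion pipelines without retraining their architectures, which is what makes it plug-and-play.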
Full paper: https://actiongpt.github.io/