Pranav Manu, supervised by Prof. Narayanan P J, received his Master of Science – Dual Degree in Electronics and Communication Engineering (ECD). Here’s a summary of his thesis, “On the Democratization of Realistic 3D Head Avatar Generation and Reconstruction”:
The increasing interest in Augmented Reality (AR) and Virtual Reality (VR) media formats has significantly driven the demand for photorealistic human head avatars. An accurate digital representation of the head is becoming essential for immersive communication and telepresence, a need particularly underscored by the recent global shift to remote interactions [15]. Beyond telepresence, highly realistic facial avatars play a vital role in the entertainment industry, from modifying actors’ appearances to driving virtual characters in movies and games.

Creating a truly realistic head avatar remains a formidable challenge. Traditional capture methods often require expensive setups, such as multi-camera light stages [3, 4], and demand significant expertise for capture and post-processing. Accurately modeling the complex interaction of light with facial materials such as skin, hair, and eyes is also non-trivial [2, 5, 6, 12]. For artificially generated avatars, the process typically involves laborious manual modeling by skilled artists using specialized software. The “uncanny valley” phenomenon underscores human sensitivity to even minor inaccuracies in facial appearance, making photorealism critical for believable digital replicas.

To democratize the creation of realistic head avatars for widespread adoption in telepresence and AR/VR applications, there is a pressing need for methods that are both photorealistic and accessible, moving beyond expensive capture rigs and time-consuming manual artistry. This thesis addresses this challenge through two distinct but complementary avenues: efficient digital replica generation from text, and accessible capture-based reconstruction.

Our first approach focuses on the efficient creation of digital replicas via a text-conditioned generation method for textured neural parametric head avatars [10]. Leveraging advances in 2D diffusion models [13], we generate realistic head avatars from text descriptions in a fast, feed-forward manner, circumventing the iterative optimization bottleneck and the “Janus problem” often associated with Score Distillation Sampling-based methods [11]. While existing text-to-3D methods built on parametric models [7, 14, 16] mitigate such structural issues, they remain slow; our single feed-forward pipeline significantly speeds up asset generation. However, traditional texture-based approaches can suffer from baked-in lighting effects, limiting their realism in novel environments and highlighting the need to capture accurate material properties.

Our second work addresses the accessible, capture-based creation of relightable avatars using only a smartphone. While recent methods explore smartphone-based avatar capture [1, 8, 17], they typically either do not yield relightable results or require extensive pre-training data. We introduce LightHeadEd [9], a novel smartphone-based capture strategy that uses cheap polarizing filters to help disentangle diffuse and specular reflections, acting as a scalable alternative to traditional light stages. Complementing this, we propose a novel and efficient head representation based on 2D Gaussian splats, designed for real-time rendering and relightability and thus well suited to applications such as telepresence.
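For context on the optimization bottleneck mentioned above: Score Distillation Sampling [11] optimizes a 3D representation with parameters θ by repeatedly rendering it, noising the render, and scoring the result with a frozen 2D diffusion model. A standard statement of its gradient (our paraphrase of the published formulation, not a formula from the thesis) is:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  \;=\; \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
  \qquad x = g(\theta),
```

where x_t is the noised render, y the text prompt, ε̂_φ the frozen diffusion model’s noise prediction, and w(t) a timestep weighting. Every gradient step requires a render plus a diffusion forward pass, and each asset needs thousands of such steps; this per-asset cost is precisely what a single feed-forward pass avoids.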
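On the capture side, the polarization idea admits a short sketch. Under the ideal model, diffuse reflection depolarizes light while specular reflection preserves its polarization, so a cross-polarized stream is specular-free and a parallel-polarized stream adds the specular lobe on top. The following is a minimal illustration of that intuition under these assumptions, not the thesis pipeline; the names and the clipping choice are ours:

```python
import numpy as np

def separate_polarized_pair(parallel: np.ndarray, cross: np.ndarray):
    """Split a parallel/cross-polarized image pair into diffuse and specular.

    Ideal model: the camera-side polarizer passes half of the (depolarized)
    diffuse radiance D in both orientations, while the (polarized) specular
    radiance S passes only in the parallel orientation:
        I_cross    = D / 2
        I_parallel = D / 2 + S
    """
    diffuse = 2.0 * cross                            # recover D
    specular = np.clip(parallel - cross, 0.0, None)  # recover S; clip noise
    return diffuse, specular

# Toy usage on synthetic H x W x 3 radiance images.
rng = np.random.default_rng(0)
half_diffuse = rng.random((4, 4, 3)) * 0.5  # D / 2
spec = rng.random((4, 4, 3)) * 0.2          # S
d, s = separate_polarized_pair(half_diffuse + spec, half_diffuse)
assert np.allclose(s, spec)
```

Real captures deviate from this ideal model (imperfect filters, Fresnel effects, misalignment between the two streams), which is part of what a learned reconstruction must absorb.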
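Finally, on the head representation: a 2D Gaussian splat, as commonly formulated, is an oriented elliptical disk rather than a volumetric blob, which gives it a well-defined surface normal, convenient for shading. The sketch below lists typical per-splat parameters; the material fields (albedo, roughness) are our guess at what a relightable variant might carry, not the thesis’s exact parameterization:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Splat2D:
    """One 2D Gaussian splat: a flat, oriented elliptical disk in 3D."""
    center: np.ndarray    # (3,) disk center in world space
    t_u: np.ndarray       # (3,) unit tangent axis u
    t_v: np.ndarray       # (3,) unit tangent axis v
    scale: np.ndarray     # (2,) standard deviations along t_u and t_v
    opacity: float        # blending weight in [0, 1]
    albedo: np.ndarray    # (3,) assumed per-splat diffuse color
    roughness: float      # assumed per-splat specular roughness

    @property
    def normal(self) -> np.ndarray:
        """Geometric normal of the disk: t_u x t_v, normalized."""
        n = np.cross(self.t_u, self.t_v)
        return n / np.linalg.norm(n)

splat = Splat2D(
    center=np.zeros(3), t_u=np.array([1.0, 0.0, 0.0]),
    t_v=np.array([0.0, 1.0, 0.0]), scale=np.array([0.01, 0.01]),
    opacity=0.9, albedo=np.array([0.8, 0.6, 0.5]), roughness=0.4,
)
print(splat.normal)  # -> [0. 0. 1.]
```

Because the primitive is planar, its normal comes directly from geometry rather than from a separately learned normal field, one reason 2D splats pair naturally with relighting.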
We conduct thorough evaluations of both the text-based generation and the capture-based reconstruction method on public datasets as well as a novel dataset recorded with our DuoPolo capture setup. The results show that both methods achieve state-of-the-art performance in their respective domains. We also analyze the limitations of our approaches and outline promising directions for future research, with the aim of accelerating progress in accessible, high-fidelity 3D head generation and reconstruction.
June 2025