Multi-Face Speaker Isolation & Idle State for Non-Speakers | Voters

Las Venturas
Context: We are developing a narrative video generation platform using the HeyGen API. Our content frequently involves photos with two subjects in a single frame (e.g., partners, colleagues, or interview scenarios).
The Problem: Currently, when we upload a photo with two faces, the API (v2 / Avatar 4.0) automatically selects a "dominant" face to animate. We have no control over this selection.
Wrong Speaker: Sometimes the wrong person speaks, or the animation switches unpredictably between faces.
Unnatural Static State: If we crop the image to isolate one face, we lose the context. If we keep the full image, the non-speaking person freezes like a static JPEG, which creates an "uncanny valley" effect next to the animated character.
Our Requirement: We need a native API parameter to control "Who Speaks" and "Who Listens" within a single multi-face image.
Proposed Solution:
Selection Parameter: Please add a face_index or roi (region of interest) parameter so that we can designate the Active Speaker programmatically.
Idle/Listening Mode (Critical): The non-selected face(s) should NOT be completely static. They need "Idle Animation" (blinking, slight head movement) to look alive, but their lips must remain closed, with lip-sync to the audio disabled, while the other person speaks.
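To make the request concrete, here is a rough sketch of the per-face settings we would like to send. None of this exists in the HeyGen API today: face_index, role, lip_sync and idle_animation are hypothetical names we made up for this proposal, and the helper only builds a plain data structure without calling any endpoint. We also assume faces are numbered from 0 in whatever order the API detects them.

```python
from dataclasses import dataclass, asdict


@dataclass
class FaceDirective:
    """Hypothetical per-face settings; not part of any existing API."""
    face_index: int        # assumed 0-based index of the detected face
    role: str              # "speaker" or "listener"
    lip_sync: bool         # True only for the face that should speak
    idle_animation: bool   # blinking / slight head movement for listeners


def build_face_directives(num_faces: int, speaker_index: int) -> list[dict]:
    """Mark one face as the speaker and every other face as an idle listener."""
    if num_faces < 1:
        raise ValueError("num_faces must be at least 1")
    if not 0 <= speaker_index < num_faces:
        raise ValueError("speaker_index must refer to one of the detected faces")

    directives = []
    for i in range(num_faces):
        is_speaker = i == speaker_index
        directives.append(asdict(FaceDirective(
            face_index=i,
            role="speaker" if is_speaker else "listener",
            lip_sync=is_speaker,            # listeners keep their lips closed
            idle_animation=not is_speaker,  # listeners still blink and move slightly
        )))
    return directives


if __name__ == "__main__":
    # Two-person photo where the second detected face should speak.
    for directive in build_face_directives(num_faces=2, speaker_index=1):
        print(directive)
```

An roi variant could carry a bounding box instead of face_index. Either would work for us, as long as the choice of speaker is explicit and stays fixed for the whole clip.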
Why is this important? This feature is essential for realistic multi-character storytelling. Without "Speaker Isolation" and "Active Listening" states, we cannot produce professional-grade dialogue videos from single photos.
Thank you.
A day ago