Avatar IV API Enhancements (audio_url and elevenlabs model)
Huu Binh Nguyen
Hi HeyGen team! 👋
I've been working extensively with the Avatar IV API and would love to see two specific enhancements that would greatly improve the developer experience and functionality:
🎵 External Audio File Support (Like Talking Photo API)
Current State:
The Talking Photo API already supports external audio files beautifully with this structure:
```json
{
  "voice": {
    "type": "audio",
    "audio_url": "https://example.com/audio.mp3"
  }
}
```
Request:
Please add the same external audio file support to the Avatar IV API. Currently, Avatar IV still requires a script field even when an audio configuration is provided, which is inconsistent with the Talking Photo behavior.
Use Cases:
- Using pre-recorded professional voiceovers
- Multi-language content with native speakers
- Custom audio processing/effects
- Consistent audio branding across videos
Suggested Implementation:
Allow Avatar IV to accept the same audio voice configuration as Talking Photo:
```json
{
  "voice": {
    "type": "audio",
    "audio_url": "https://example.com/audio.mp3"
  }
}
```
Or, with an uploaded asset instead of a URL:
```json
{
  "voice": {
    "type": "audio",
    "audio_asset_id": "your_uploaded_asset_id"
  }
}
```
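To illustrate how a client could target this shape, here is a minimal Python sketch that builds the proposed voice object from either source. It is only a sketch under assumptions: `build_audio_voice` is a hypothetical helper name, and the field names are the ones suggested in this request, not a confirmed Avatar IV schema.

```python
import json
from typing import Optional


def build_audio_voice(audio_url: Optional[str] = None,
                      audio_asset_id: Optional[str] = None) -> dict:
    """Build the proposed "audio" voice object (hypothetical helper).

    Exactly one of audio_url or audio_asset_id must be given, matching
    the two alternatives in the suggested structure.
    """
    if (audio_url is None) == (audio_asset_id is None):
        raise ValueError("provide exactly one of audio_url or audio_asset_id")

    voice = {"type": "audio"}
    if audio_url is not None:
        voice["audio_url"] = audio_url
    else:
        voice["audio_asset_id"] = audio_asset_id
    return {"voice": voice}


if __name__ == "__main__":
    # Print the payload fragment; no request is sent anywhere.
    payload = build_audio_voice(audio_url="https://example.com/audio.mp3")
    print(json.dumps(payload, indent=2))
```

Rejecting the "both" and "neither" cases on the client side keeps the request unambiguous, whichever precedence rule the API might otherwise apply.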
🌍 Enhanced ElevenLabs Voice Configuration
Current State:
ElevenLabs voices in Avatar IV seem to have limited configuration options compared to what the ElevenLabs API offers natively.
Request:
Please add support for additional ElevenLabs parameters:
- Language/Model Selection: Allow specifying ElevenLabs models (e.g., eleven_multilingual_v2, eleven_turbo_v2)
- Language Code: Enable language-specific optimization for non-English content
- Voice Settings: Expose stability, similarity_boost, and style parameters
Suggested API Structure:
```json
{
  "voice": {
    "type": "text",
    "voice_id": "elevenlabs_voice_id",
    "provider": "elevenlabs",
    "language": "es",
    "model": "eleven_multilingual_v2",
    "voice_settings": {
      "stability": 0.75,
      "similarity_boost": 0.75,
      "style": 0.5
    }
  }
}
```
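For anyone who wants to prototype against this structure, here is a rough Python sketch that assembles the suggested voice object. `build_elevenlabs_voice` is a hypothetical helper, the field names come from the structure proposed in this request rather than from an existing HeyGen schema, and the 0.0 to 1.0 range check is an assumption based on how ElevenLabs documents these settings.

```python
import json


def build_elevenlabs_voice(voice_id: str,
                           language: str,
                           model: str,
                           stability: float = 0.75,
                           similarity_boost: float = 0.75,
                           style: float = 0.5) -> dict:
    """Build the proposed ElevenLabs voice object (hypothetical helper)."""
    voice_settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    # Assumption: each setting is a float between 0.0 and 1.0.
    for name, value in voice_settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    return {
        "voice": {
            "type": "text",
            "voice_id": voice_id,
            "provider": "elevenlabs",
            "language": language,
            "model": model,
            "voice_settings": voice_settings,
        }
    }


if __name__ == "__main__":
    # Print the payload fragment; no request is sent anywhere.
    payload = build_elevenlabs_voice(
        voice_id="elevenlabs_voice_id",
        language="es",
        model="eleven_multilingual_v2",
    )
    print(json.dumps(payload, indent=2))
```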
Benefits:
- Better multilingual support
- Improved voice quality control
- Consistency with ElevenLabs best practices
- More granular voice customization
Why These Features Matter
Both of these enhancements would:
✅ Improve API consistency across HeyGen products
✅ Enable more professional use cases
✅ Better serve international/multilingual developers
✅ Align with industry standards (ElevenLabs native capabilities)
Community Impact
I believe these features would benefit many developers in the community who are building:
- International content platforms
- Professional video production tools
- Educational applications
- Marketing automation systems
Would love to hear the team's thoughts on this! Is this something that could be considered for a future release?
Thanks for all the amazing work on HeyGen! 🚀
September 5, 2025