Avatar IV API Enhancements (audio_url and elevenlabs model)
Huu Binh Nguyen
Hi HeyGen team! 👋
I've been working extensively with the Avatar IV API and would love to see two specific enhancements that would greatly improve the developer experience and functionality:
- 🎵 External Audio File Support (Like Talking Photo API)
Current State:
The Talking Photo API already supports external audio files beautifully with this structure:
```json
{
  "voice": {
    "type": "audio",
    "audio_url": "https://example.com/audio.mp3"
  }
}
```
Request:
Please add the same external audio file support to the Avatar IV API. Currently, an Avatar IV request still requires a script field even when an audio voice configuration is provided, which is inconsistent with how Talking Photo behaves.
Use Cases:
Using pre-recorded professional voiceovers
Multi-language content with native speakers
Custom audio processing/effects
Consistent audio branding across videos
Suggested Implementation:
Allow Avatar IV to accept the same audio voice configuration as Talking Photo:
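```json
{
  "voice": {
    "type": "audio",
    "audio_url": "https://example.com/audio.mp3"
  }
}
```
or, referencing an uploaded asset instead of a URL:
```json
{
  "voice": {
    "type": "audio",
    "audio_asset_id": "your_uploaded_asset_id"
  }
}
```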
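To make the request concrete, here is a minimal Python sketch of how a client might assemble this payload if Avatar IV accepted it. The helper name `build_audio_voice` is hypothetical, the field names are the ones proposed above rather than a confirmed Avatar IV schema, and the rule that exactly one audio source is supplied is my assumption.

```python
import json
from typing import Optional


def build_audio_voice(audio_url: Optional[str] = None,
                      audio_asset_id: Optional[str] = None) -> dict:
    """Build the proposed `voice` object for an audio-driven request.

    Hypothetical helper: it mirrors the Talking Photo structure quoted
    above and assumes exactly one audio source must be given.
    """
    # Reject both "neither given" and "both given".
    if (audio_url is None) == (audio_asset_id is None):
        raise ValueError("provide exactly one of audio_url or audio_asset_id")

    voice = {"type": "audio"}
    if audio_url is not None:
        voice["audio_url"] = audio_url
    else:
        voice["audio_asset_id"] = audio_asset_id
    return voice


# No script field is included: that is the behavior being requested.
payload = {"voice": build_audio_voice(audio_url="https://example.com/audio.mp3")}
print(json.dumps(payload, indent=2))
```

The point of the sketch is that no script text appears anywhere in the request body, which is what would bring Avatar IV in line with Talking Photo.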
- 🌍 Enhanced ElevenLabs Voice Configuration
Current State:
ElevenLabs voices in Avatar IV seem to have limited configuration options compared to what the ElevenLabs API offers natively.
Request:
Please add support for additional ElevenLabs parameters:
Model Selection: Allow specifying the ElevenLabs model (e.g., eleven_multilingual_v2, eleven_turbo_v2)
Language Code: Enable language-specific optimization for non-English content
Voice Settings: Expose stability, similarity_boost, and style parameters
Suggested API Structure:
```json
{
  "voice": {
    "type": "text",
    "voice_id": "elevenlabs_voice_id",
    "provider": "elevenlabs",
    "language": "es",
    "model": "eleven_multilingual_v2",
    "voice_settings": {
      "stability": 0.75,
      "similarity_boost": 0.75,
      "style": 0.5
    }
  }
}
```
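As a rough illustration of how this structure could be built and sanity-checked on the client side, here is a short Python sketch. The helper name `build_elevenlabs_voice` is hypothetical and the fields are the proposed ones, not an existing HeyGen schema. The 0.0 to 1.0 range check reflects my understanding of how ElevenLabs defines these settings, so treat it as an assumption.

```python
import json


def build_elevenlabs_voice(voice_id: str, language: str, model: str,
                           stability: float = 0.75,
                           similarity_boost: float = 0.75,
                           style: float = 0.5) -> dict:
    """Build the proposed ElevenLabs `voice` object (hypothetical helper)."""
    settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    # Assumption: each setting is a fraction between 0.0 and 1.0 inclusive.
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    return {
        "type": "text",
        "voice_id": voice_id,
        "provider": "elevenlabs",
        "language": language,
        "model": model,
        "voice_settings": settings,
    }


payload = {
    "voice": build_elevenlabs_voice(
        voice_id="elevenlabs_voice_id",
        language="es",
        model="eleven_multilingual_v2",
    )
}
print(json.dumps(payload, indent=2))
```

With the default arguments this produces the same JSON as the suggested structure above. Validating the settings before sending would catch out-of-range values early instead of relying on a server-side error.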
Benefits:
Better multilingual support
Improved voice quality control
Consistency with ElevenLabs best practices
More granular voice customization
Why These Features Matter
Both of these enhancements would:
✅ Improve API consistency across HeyGen products
✅ Enable more professional use cases
✅ Better serve international/multilingual developers
✅ Align with industry standards (ElevenLabs native capabilities)
Community Impact
I believe these features would benefit many developers in the community who are building:
International content platforms
Professional video production tools
Educational applications
Marketing automation systems
Would love to hear the team's thoughts on this! Is this something that could be considered for a future release?
Thanks for all the amazing work on HeyGen! 🚀