uncloseai.
XTTS v2
Self-host only. This engine is not available on the public endpoint; clone the repo to use it.
What It Does
This is the high-fidelity voice cloning engine. Give it a 6-second audio sample of anyone's voice, and it will speak in that voice across 16 languages. It automatically detects the language of your input text.
Built on the Coqui TTS project, one of the most important open-source TTS efforts, abandoned when the company behind it shut down. We rescued it. The raccoons dug deep for this one.
Best for audiobooks, character voices, personalized assistants, or any time you want the output to sound like a specific person. Requires a GPU with about 4GB of VRAM.
Example
Once self-hosted and enabled, it works through the same OpenAI-compatible API:
from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:8000/v1"
)

# HD cloning uses the tts-1-hd model
client.audio.speech.create(
    model="tts-1-hd",
    voice="my_voice",
    input="This voice was cloned from a six-second recording. The original Coqui project may be gone, but the raccoons kept it alive."
).stream_to_file("hd-cloned.mp3")
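For non-Python clients or debugging, the same call is just an HTTP POST. Here is a minimal stdlib sketch; the /v1/audio/speech route and JSON field names follow the OpenAI speech API, and build_speech_request is an illustrative helper, not part of the project:

```python
import json
import urllib.request

def build_speech_request(base_url, model, voice, text):
    """Build an OpenAI-compatible /audio/speech request (not yet sent)."""
    payload = json.dumps({"model": model, "voice": voice, "input": text}).encode()
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer not-needed",  # the server ignores the key
        },
    )

req = build_speech_request(
    "http://localhost:8000/v1", "tts-1-hd", "my_voice",
    "Hello from a cloned voice.",
)
# urllib.request.urlopen(req) would return the audio bytes to write to disk;
# it is left unsent here so the sketch runs without a live server.
```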
Clone Your Own Voice
# Record 10 seconds of clean audio
ffmpeg -f alsa -i default -ac 1 -ar 22050 -t 10 -y my_voice.wav
# Clean up background noise
ffmpeg -i my_voice.wav \
-af "highpass=f=200, lowpass=f=3000, afftdn=nf=25" \
-ac 1 -ar 22050 my_voice_clean.wav
# Copy to voices directory
cp my_voice_clean.wav ~/uncloseai-speech/voices/me.wav
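XTTS expects a mono 22050 Hz reference clip with at least roughly six seconds of audio. A quick sanity check of the recording, using only the Python standard library (check_reference_wav is an illustrative helper, and the thresholds mirror the numbers above):

```python
import wave

def check_reference_wav(path, min_seconds=6.0):
    """Report problems that would make a clip unusable as a cloning reference."""
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        frames = w.getnframes()
    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if rate != 22050:
        problems.append(f"expected 22050 Hz, got {rate} Hz")
    if frames / rate < min_seconds:
        problems.append(f"only {frames / rate:.1f}s of audio, need {min_seconds}s")
    return problems  # an empty list means the clip looks usable

# check_reference_wav("my_voice_clean.wav") returns [] when the clip is usable
```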
Then add it to config/voice_to_speaker.yaml:
tts-1-hd:
  my_voice:
    model: xtts
    speaker: voices/me.wav
    language: en
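A typo in this file (a wrong path or an unsupported language code) only surfaces at request time, so it can be worth checking an entry before restarting. A hedged sketch: validate_voice_entry is an illustrative helper, and the language set below is the one commonly listed for XTTS v2; adjust it if your build differs.

```python
from pathlib import Path

# Language codes commonly listed for XTTS v2 (assumption; verify against your build)
XTTS_LANGS = {"en", "es", "fr", "de", "it", "pt", "pl", "tr",
              "ru", "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja"}

def validate_voice_entry(entry, voices_root="."):
    """Sanity-check one voice entry from voice_to_speaker.yaml."""
    errors = []
    if entry.get("model") != "xtts":
        errors.append("model should be 'xtts' for this engine")
    if entry.get("language") not in XTTS_LANGS:
        errors.append(f"unrecognized language code: {entry.get('language')!r}")
    if not Path(voices_root, entry.get("speaker", "")).is_file():
        errors.append(f"speaker file not found: {entry.get('speaker')!r}")
    return errors  # an empty list means the entry looks usable

entry = {"model": "xtts", "speaker": "voices/me.wav", "language": "en"}
```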
Technical Details
- Languages: 16 with automatic detection
- Voice cloning: From 6-second audio samples
- Hardware: GPU required (~4GB VRAM)
- Upstream: XTTS v2 / Coqui TTS (community-maintained)
Self-Hosting
git clone https://git.unturf.com/engineering/unturf/uncloseai-speech.git
cd uncloseai-speech
make deploy
make voices-xtts
Uncomment the tts-1-hd section in voice_to_speaker.default.yaml and restart.