uncloseai.
Open-Source Text-to-Speech
Raccoon Mission: Rescuing Abandoned TTS Models
uncloseai-speech is our community-driven initiative to rescue, preserve, and unify abandoned text-to-speech models into a single, resilient, self-hostable API.
"Why raccoons?" Because like raccoons, we dig through the digital dumpsters of abandoned GitHub repos and archived projects, rescuing valuable open-source TTS models that organizations have left behind. We give them a new home, maintain them, and make them accessible to everyone.
- Zero API Keys: No registration, no tracking, no rate limits on your own infrastructure
- OpenAI-Compatible: Drop-in replacement — change one URL and you're running
- Six TTS Engines: Each with different strengths, all behind the same API
- 42 Built-In Cloned Voices: 41 cloned from the LibriSpeech public domain corpus plus 1 self-recorded, with support for 10 languages
- Self-Hostable: Docker Compose, Makefile-driven, runs on your hardware
- AGPL v3 Licensed: Keeps TTS libre forever
What's Live Right Now
Our public endpoint runs F5-TTS, a flow-matching voice-cloning engine that in our benchmarks ran faster and produced cleaner clones than Qwen3-TTS on the same reference audio. It offers 42 distinct human voices: 41 cloned from the LibriSpeech public domain corpus plus the operator's own self-recorded voice (foxhop, voice 42). It speaks 10 languages natively, and the first audio arrives in under 100 milliseconds.
# Female voice
curl https://speech.ai.unturf.com/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "We rescue abandoned text-to-speech models and give them a new home. No API keys, no tracking, just open source voices for everyone.",
"voice": "aria"
}' > aria.mp3
# Male voice
curl https://speech.ai.unturf.com/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "Like raccoons digging through digital dumpsters, we find the best open source TTS models that big companies left behind, and we make them accessible to everyone.",
"voice": "atlas"
}' > atlas.mp3
# Voice 42 — the operator's own self-recorded voice
# (slot 41 intentionally empty, in honor of Douglas Adams)
curl https://speech.ai.unturf.com/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"input": "This voice is self-recorded. I am number forty-two, and slot forty-one is intentionally empty, in honor of Douglas Adams.",
"voice": "foxhop"
}' > foxhop.mp3
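from openai import OpenAI

# The server checks no key, but the SDK refuses to start without an api_key value
client = OpenAI(
    api_key="not-needed",
    base_url="https://speech.ai.unturf.com/v1",
)

# model is a required argument in the SDK; "tts-1-f5" selects F5-TTS on the public endpoint.
# with_streaming_response writes the audio to disk as it arrives.
with client.audio.speech.with_streaming_response.create(
    model="tts-1-f5",
    voice="aria",
    input="Raccoon mission. We dig through abandoned repos and rescue the best open source speech models before they disappear.",
) as response:
    response.stream_to_file("aria.mp3")

with client.audio.speech.with_streaming_response.create(
    model="tts-1-f5",
    voice="atlas",
    input="Six engines, one API. Self-host it, clone any voice, speak ten languages. No vendor lock-in, no API keys, no limits.",
) as response:
    response.stream_to_file("atlas.mp3")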
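If you would rather not install the OpenAI SDK, the same request can be sent with Python's standard library. The sketch below is a minimal illustration, not part of the project: it assumes the endpoint streams MP3 bytes back, as the curl examples suggest, and the helper names `build_speech_request` and `synthesize_to_file` are hypothetical. It also reports how long the first chunk of audio took to reach your machine. That figure includes connection setup and network time, so expect it to read higher than the server-side latency quoted for F5-TTS.

```python
import json
import time
import urllib.request

# Public endpoint from this page; swap in your own host when self-hosting.
BASE_URL = "https://speech.ai.unturf.com/v1"


def build_speech_request(text: str, voice: str, model: str = "tts-1-f5",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style POST /audio/speech request (no API key header needed)."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, voice: str, path: str) -> float:
    """Stream audio into `path`; return seconds until the first chunk arrived (client-side)."""
    request = build_speech_request(text, voice)
    start = time.perf_counter()
    first_chunk_at = None
    with urllib.request.urlopen(request, timeout=60) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(8192)
            if not chunk:
                break
            if first_chunk_at is None:
                first_chunk_at = time.perf_counter() - start
            out.write(chunk)
    # NaN signals that the server returned an empty body
    return first_chunk_at if first_chunk_at is not None else float("nan")
```

For example, `synthesize_to_file("Hello from the raccoons.", "aria", "hello.mp3")` writes `hello.mp3` and returns the elapsed seconds to the first byte of audio.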
Browse Voices & Models
Every voice has a name, a gender, and a personality. The API tells you exactly what's available.
→ See all voices → See all engines
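Both listings are plain JSON served over HTTP GET at `/v1/voices` and `/v1/models` (see Resources). The following is a minimal sketch, assuming `/v1/models` follows the OpenAI list shape `{"object": "list", "data": [{"id": ...}]}`. This page does not document the shape of `/v1/voices`, so the sketch only pretty-prints it. `fetch_json` and `model_ids` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://speech.ai.unturf.com/v1"


def fetch_json(path: str, base_url: str = BASE_URL):
    """GET a JSON document from the API, for example fetch_json("/models")."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=30) as response:
        return json.load(response)


def model_ids(payload: dict) -> list:
    """Pull model ids out of an OpenAI-style list response (assumed shape)."""
    return [item["id"] for item in payload.get("data", [])]
```

Usage: `print(model_ids(fetch_json("/models")))` lists the engines, and `print(json.dumps(fetch_json("/voices"), indent=2))` shows the voice catalog as the server returns it.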
The Six Engines
Six TTS engines, each rescued from a different corner of open source, all running behind the same OpenAI-compatible API. Our public endpoint runs F5-TTS. The other five are ready for anyone who clones the repo.
F5-TTS — Live
Flow-matching voice cloning, 42 voices, 10 languages. The model is F5-TTS_v1 plus the Vocos vocoder (~1.5GB), sourced from SWivid/F5-TTS on Hugging Face. The F5-TTS code is MIT-licensed; the pretrained weights are licensed separately, so see the model card. In our head-to-head tests it ran inference faster and produced cleaner clones than Qwen3-TTS on the same reference clips. This is what's running on our public endpoint right now, requested as "model":"tts-1-f5".
Qwen3-TTS — Self-host
42 cloned voices, 10 languages, voice cloning from 3-second samples. 1.7 billion parameters, 97ms first-packet latency. Still wired up in the repo ("model":"tts-1-qwen"); F5-TTS replaced it as our default after head-to-head benchmarking.
Piper TTS — Self-host
100+ English voices, CPU-only, ONNX runtime. The fastest engine in the dumpster — built for high-volume batch jobs and real-time applications where latency matters most.
XTTS v2 — Self-host
Clone any voice from a 6-second sample across 16 languages. The highest fidelity option, rescued from the Coqui TTS project. Needs a GPU with ~4GB VRAM.
Silero TTS — Self-host
148 voices across 5 languages (English, Russian, German, Spanish, and French), all running on CPU. Clean 48kHz output, no GPU required.
Kokoro TTS — Self-host
82 million parameters, 34 voices. Small enough for a Raspberry Pi or an edge device. Apache 2.0 licensed, 24kHz output.
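F5-TTS replaced Qwen3-TTS as the default after head-to-head benchmarking. On a self-hosted instance with both engines enabled, you can time them yourself by changing only the `model` field between the two ids documented above, `tts-1-f5` and `tts-1-qwen`. The sketch below measures wall-clock speed only; judging clone quality still takes your ears. It assumes a hypothetical local base URL supplied through a `TTS_BASE_URL` environment variable, and that the voice `aria` exists for both engines.

```python
import json
import os
import statistics
import time
import urllib.request
from typing import Callable, Sequence

# Hypothetical self-hosted base URL; override with the TTS_BASE_URL environment variable.
BASE_URL = os.environ.get("TTS_BASE_URL", "http://localhost:8000/v1")


def http_synth(model: str, voice: str = "aria") -> Callable[[str], bytes]:
    """Return a function that POSTs text to /audio/speech using the given engine."""
    def synth(text: str) -> bytes:
        body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
        request = urllib.request.Request(
            f"{BASE_URL}/audio/speech",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=120) as response:
            return response.read()
    return synth


def benchmark(synth: Callable[[str], bytes], texts: Sequence[str]) -> dict:
    """Wall-clock each synthesis call; report the median time and total audio bytes."""
    timings, total_bytes = [], 0
    for text in texts:
        start = time.perf_counter()
        audio = synth(text)
        timings.append(time.perf_counter() - start)
        total_bytes += len(audio)
    return {"median_s": statistics.median(timings), "audio_bytes": total_bytes}
```

For example: `for model in ("tts-1-f5", "tts-1-qwen"): print(model, benchmark(http_synth(model), ["Hello there.", "A second, slightly longer test sentence."]))`. Call each engine once before timing so that model downloads and warm-up do not skew the results.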
Self-Hosting
Clone the repo, deploy, and you have your own production TTS API. All six engines are included — enable whichever ones you need.
git clone https://git.unturf.com/engineering/unturf/uncloseai-speech.git
cd uncloseai-speech
# Deploy with GPU (F5-TTS, default)
make deploy
# Or CPU-only (works anywhere, slower)
make deploy-cpu
# Download 42 cloned voice samples (LibriSpeech + foxhop self-recorded)
make voices-f5
# Test it (F5-TTS model ~1.5GB downloads automatically on first use)
make test-f5
# Enable additional engines
make voices-piper # Piper TTS
make voices-xtts # XTTS v2
make voices-silero # Silero TTS
make voices-kokoro # Kokoro TTS
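`make test-f5` is the project's own check. If you also want a client-side check, for example from another machine or a CI job, here is a minimal sketch. It assumes your deployment serves the same OpenAI-style `/audio/speech` route as the public endpoint and returns MP3 by default. The `http://localhost:8000/v1` default and the `TTS_BASE_URL` variable are hypothetical, because this page does not say which port the deployment listens on.

```python
import json
import os
import urllib.request

# Hypothetical default; point TTS_BASE_URL at wherever your deployment listens.
BASE_URL = os.environ.get("TTS_BASE_URL", "http://localhost:8000/v1")


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic MP3 check: an ID3v2 tag or an MPEG frame sync (11 set bits)."""
    if data[:3] == b"ID3":
        return True
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def smoke_test(model: str = "tts-1-f5", voice: str = "aria") -> None:
    """POST one short sentence and fail loudly if the reply is not MP3 audio."""
    body = json.dumps({"model": model, "input": "Smoke test.", "voice": voice}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: the first request may wait on the ~1.5GB model download
    with urllib.request.urlopen(request, timeout=600) as response:
        audio = response.read()
    if not looks_like_mp3(audio):
        raise RuntimeError(f"expected MP3 audio, got {audio[:60]!r}")
    print(f"ok: {len(audio)} bytes of MP3 from {BASE_URL} ({model}, {voice})")
```

HTTP errors such as 404 or 500 already raise inside `urlopen`. The MP3 check catches the subtler case where a proxy or misrouted request answers 200 with HTML or JSON instead of audio.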
For full documentation, see the uncloseai-speech repository.
Get Involved
We run a free public endpoint at https://speech.ai.unturf.com/v1, but we need help scaling:
- Donate GPU Time: Run an instance and we'll load-balance community traffic to it
- Host Regional Mirrors: Reduce latency for users in your region
- Integrate New Engines: StyleTTS2, Fish Speech, and Chatterbox are on the roadmap
- Add Voice Samples: Expand the voice library with diverse accents
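Once regional mirrors exist, a client can pick the nearest one itself. The sketch below is a client-side illustration only, not the project's load balancer. It times a GET of `/models` on each candidate, assuming every mirror exposes that route as the public endpoint does, and returns the fastest reachable base URL. No mirrors are listed on this page, so the URLs you pass in are your own; `probe_latency` and `pick_fastest` are hypothetical names.

```python
import time
import urllib.request
from typing import Callable, Sequence


def probe_latency(base_url: str, timeout: float = 5.0) -> float:
    """Seconds for one GET /models round trip; raises OSError if unreachable."""
    start = time.perf_counter()
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def pick_fastest(base_urls: Sequence[str],
                 probe: Callable[[str], float] = probe_latency) -> str:
    """Return the reachable base URL with the lowest probed latency."""
    results = {}
    for url in base_urls:
        try:
            results[url] = probe(url)
        except OSError:
            continue  # URLError, HTTPError and timeouts are all OSError subclasses
    if not results:
        raise RuntimeError("no endpoint reachable")
    return min(results, key=results.get)
```

A single probe is noisy. For a long-running client, re-probe periodically or average a few samples rather than trusting one measurement.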
Resources
- Repository: git.unturf.com/engineering/unturf/uncloseai-speech
- API Endpoint: https://speech.ai.unturf.com/v1
- Voices: speech.ai.unturf.com/v1/voices
- Models: speech.ai.unturf.com/v1/models