uncloseai.
Open-Source Text-to-Speech Infrastructure
🦝 Raccoon Mission: Rescuing Abandoned TTS Models
uncloseai-speech is our community-driven initiative to rescue, preserve, and unify abandoned text-to-speech models into a single, resilient, self-hostable API.
"Why raccoons?" Because like raccoons, we dig through the digital dumpsters of abandoned GitHub repos and archived projects, rescuing valuable open-source TTS models that orgs have left behind. We give them a new home, maintain them, and make them accessible to everyone.
Why This Matters
Proprietary TTS APIs lock you into vendor ecosystems, charge per character, and can vanish overnight. We're building something different:
- Zero API Keys: No registration, no tracking, no rate limits on your own infrastructure
- OpenAI-Compatible: Drop-in replacement - change one URL and you're running
- Four TTS Engines: Piper (fast), XTTS (quality), Silero (CPU-friendly), Kokoro (lightweight)
- 227+ Voices: Including multilingual support across 5 languages
- Self-Hostable: Docker compose, Makefile-driven, runs on your hardware
- AGPL v3 Licensed: Keeps TTS libre forever - even network service users get source code
Try It Now (Free API)
Test our community-hosted endpoint before self-hosting:
```shell
# Auto-detect model from voice name
curl https://speech.ai.unturf.com/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Welcome to uncloseai-speech. Four TTS engines, zero API keys.",
    "voice": "alloy"
  }' > output.mp3

# Explicitly specify model (tts-1, tts-1-hd, tts-1-silero, tts-1-kokoro)
curl https://speech.ai.unturf.com/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "High quality XTTS voice cloning.",
    "voice": "alloy"
  }' > output-hd.mp3
```
Auto-Detection Magic
Our voice routing is smart - no need to specify the model:
```python
from openai import OpenAI

client = OpenAI(
    api_key="not-needed",  # Seriously, any string works
    base_url="https://speech.ai.unturf.com/v1"
)

# Auto-detects the right engine for each voice
# (the SDK requires a model argument, but the server routes by voice name)
client.audio.speech.create(
    model="tts-1",
    voice="alloy",      # → Piper (fast CPU)
    input="Fast synthesis on CPU"
).stream_to_file("piper.mp3")

client.audio.speech.create(
    model="tts-1",
    voice="en_50",      # → Silero (native voice name)
    input="CPU-friendly multilingual"
).stream_to_file("silero.mp3")

client.audio.speech.create(
    model="tts-1",
    voice="af_heart",   # → Kokoro (lightweight)
    input="Decoder-only architecture"
).stream_to_file("kokoro.mp3")
```
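Under the hood, that routing amounts to matching voice names against per-engine naming patterns. Here is a minimal client-side sketch of the idea, inferred only from the examples above; the real routing lives in the server and its rules may differ, and the Kokoro prefixes are assumptions:

```python
# Illustrative guess at the voice-to-engine routing, based on the examples
# above. This is NOT the server's implementation.
def guess_model(voice: str) -> str:
    """Map a voice name to the model it implies."""
    if voice.startswith("en_"):
        # Silero exposes native names like en_50
        return "tts-1-silero"
    if voice.startswith(("af_", "am_", "bf_", "bm_")):
        # Kokoro-style prefixes (assumed pattern)
        return "tts-1-kokoro"
    # OpenAI-style names (alloy, ...) fall through to Piper
    return "tts-1"

print(guess_model("alloy"))     # tts-1
print(guess_model("en_50"))     # tts-1-silero
print(guess_model("af_heart"))  # tts-1-kokoro
```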
Discover Voices
The extended /v1/voices endpoint lists all 227+ voices with engine metadata:
```shell
curl https://speech.ai.unturf.com/v1/voices | jq .
```
Discover Models
Standard OpenAI /v1/models endpoint shows available TTS models:
```shell
curl https://speech.ai.unturf.com/v1/models | jq .
```
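Because the response follows OpenAI's standard list schema ({"object": "list", "data": [...]}), any OpenAI-compatible tooling can parse it. A small sketch, using an illustrative payload rather than output captured from the live endpoint:

```python
# Parse a /v1/models-style response. The sample payload below is
# illustrative; it mirrors the OpenAI list schema, not live output.
import json

sample = json.loads("""
{"object": "list",
 "data": [{"id": "tts-1", "object": "model"},
          {"id": "tts-1-hd", "object": "model"},
          {"id": "tts-1-silero", "object": "model"},
          {"id": "tts-1-kokoro", "object": "model"}]}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)
```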
Self-Hosting
For complete installation instructions and documentation, see the uncloseai-speech repository.
Quick Start
```shell
# Clone the repository
git clone https://git.unturf.com/engineering/unturf/uncloseai-speech.git
cd uncloseai-speech

# See all available commands
make help

# Deploy locally with Docker
make local-deploy

# Download voice models (Piper + XTTS samples)
make voices

# Test it
make test

# Watch logs
make logs
```
That's it. You now have a production TTS API running locally.
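If you want to script a smoke test against the local stack without the OpenAI SDK, the request is plain JSON over HTTP. A minimal sketch with only the standard library; note that localhost:8000 is an assumption here, so use whatever port your compose file actually maps:

```python
# Build an OpenAI-style /v1/audio/speech request for a local deployment.
# The base URL (and port 8000) is an assumption -- adjust to your setup.
import json
import urllib.request

def build_speech_request(base_url, text, voice="alloy", model=None):
    """Construct the POST request; model is optional because the server
    auto-detects the engine from the voice name."""
    body = {"input": text, "voice": voice}
    if model:
        body["model"] = model
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Against a running stack you would then do:
#   with urllib.request.urlopen(build_speech_request("http://localhost:8000",
#                                                    "Local deploy works.")) as resp:
#       open("smoke.mp3", "wb").write(resp.read())
```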
Remote Deployment (Production)
Deploy to a GPU server for XTTS voice cloning:
```shell
# Configure deployment target
cp vars.sh.example vars.sh
# Edit vars.sh with your server details

# Deploy to remote server (syncs, builds, restarts)
make deploy

# Download all voice models
make voices

# Test remote endpoint
make test
```
The Four Engines
🚀 Piper TTS (tts-1)
- Speed: Real-time on CPU (fastest)
- Voices: 100+ English voices via LibriTTS
- Use Case: High-volume, low-latency synthesis
- Tech: ONNX runtime, 22.05kHz output
🎙️ XTTS v2 (tts-1-hd)
- Quality: Voice cloning from 6-second samples
- Languages: 16 languages with auto-detection
- Use Case: Custom voices, audiobooks, high-quality synthesis
- Tech: Coqui TTS, requires ~4GB GPU VRAM
⚡ Silero TTS (tts-1-silero)
- Speed: Real-time CPU inference
- Voices: 148 voices across 5 languages (English, Russian, German, Spanish, French)
- Use Case: CPU-only servers, multilingual applications
- Tech: PyTorch, 48kHz output, actively maintained
🪶 Kokoro TTS (tts-1-kokoro)
- Architecture: Lightweight decoder-only (82M params)
- Voices: 34 intentionally OpenAI-themed voices (American/British English)
- Use Case: Edge devices, low-resource environments
- Tech: Apache 2.0 license, 24kHz output
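Which model to request follows from the constraints above. As a rough rule of thumb (our own simplification for illustration, not logic from the project):

```python
def pick_model(voice_cloning=False, has_gpu=False,
               non_english=False, edge_device=False) -> str:
    """Rule-of-thumb engine choice distilled from the engine notes above.
    Illustrative heuristic only, not uncloseai-speech logic."""
    if voice_cloning and has_gpu:
        return "tts-1-hd"      # XTTS: cloning, 16 languages, ~4GB VRAM
    if edge_device:
        return "tts-1-kokoro"  # 82M params, lightweight
    if non_english:
        return "tts-1-silero"  # 5 languages, CPU real-time
    return "tts-1"             # Piper: fastest CPU synthesis

print(pick_model())  # tts-1
```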
Get Involved
We run a free public endpoint at https://speech.ai.unturf.com/v1, but we need help scaling:
Contribute Infrastructure
- Donate GPU Time: Run an instance, we'll load-balance community traffic
- Host Regional Mirrors: Reduce latency for users in your region
- Sponsor Server Costs: Help keep the public endpoint free for everyone
Contribute Code
- Integrate New Engines: StyleTTS2, Fish Speech, and Chatterbox are on the roadmap
- Add Voice Samples: Expand XTTS voice library with diverse accents
- Improve Documentation: Write tutorials, create video walkthroughs
- Build Tools: Voice editor UI, quality benchmarking, automated testing
Resources
- Repository: git.unturf.com/engineering/unturf/uncloseai-speech
- API Endpoint: https://speech.ai.unturf.com/v1
- Documentation: See the docs/ directory in the repository
- Community: Comments section below, or open an issue on GitLab
Complete Example: Voice Cloning
Let's clone your voice and use it via the API:
Step 1: Record Your Voice Sample
```shell
# Record 10 seconds of clean audio (22050 Hz, mono)
ffmpeg -f alsa -i default -ac 1 -ar 22050 -t 10 -y my_voice.wav

# Clean up background noise
ffmpeg -i my_voice.wav \
  -af "highpass=f=200, lowpass=f=3000, afftdn=nf=25" \
  -ac 1 -ar 22050 my_voice_clean.wav
```
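Before wiring the sample in, it's worth verifying it matches what XTTS expects: mono, 22050 Hz, and at least several seconds of audio (the 6-second figure comes from the XTTS notes above; the exact constraints your build enforces may differ). A quick check with the standard library:

```python
import wave

def check_voice_sample(path, min_seconds=6.0):
    """Return a list of problems with a WAV voice sample; empty means OK.
    The mono/22050 Hz/6 s expectations come from the XTTS workflow above."""
    with wave.open(path, "rb") as w:
        seconds = w.getnframes() / w.getframerate()
        problems = []
        if w.getnchannels() != 1:
            problems.append(f"expected mono, got {w.getnchannels()} channels")
        if w.getframerate() != 22050:
            problems.append(f"expected 22050 Hz, got {w.getframerate()} Hz")
        if seconds < min_seconds:
            problems.append(f"only {seconds:.1f}s of audio, need >= {min_seconds}s")
        return problems

# Usage: check_voice_sample("my_voice_clean.wav") -> [] if the sample is usable
```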
Step 2: Add to Voice Configuration
```shell
# Copy sample to voices directory
cp my_voice_clean.wav ~/uncloseai-speech/voices/me.wav

# Add to config/voice_to_speaker.yaml
cat >> config/voice_to_speaker.yaml << 'EOF'
tts-1-hd:
  my_voice:
    model: xtts
    speaker: voices/me.wav
    language: en
EOF

# Restart server
make deploy
```
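The YAML entry is what lets the server resolve your new voice name to an engine and a speaker sample. In Python terms the lookup amounts to something like this (the dict mirrors the YAML above; the server's actual loading code may differ):

```python
# Python mirror of the voice_to_speaker.yaml entry added above.
voice_to_speaker = {
    "tts-1-hd": {
        "my_voice": {
            "model": "xtts",
            "speaker": "voices/me.wav",
            "language": "en",
        }
    }
}

def resolve_voice(model: str, voice: str) -> dict:
    """Look up the engine settings for a (model, voice) pair."""
    try:
        return voice_to_speaker[model][voice]
    except KeyError:
        raise ValueError(f"unknown voice {voice!r} for model {model!r}")

print(resolve_voice("tts-1-hd", "my_voice")["speaker"])  # voices/me.wav
```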
Step 3: Use Your Voice
```python
from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="https://speech.ai.unturf.com/v1"  # point this at your own deployment
)

# Generate speech with YOUR voice
client.audio.speech.create(
    model="tts-1-hd",
    voice="my_voice",
    input="This is my cloned voice speaking!"
).stream_to_file("cloned_output.mp3")
```
That's the power of open source TTS. No API keys, no usage limits, your voice, your infrastructure.
Questions & Community
Ask questions, share your deployments, or discuss TTS research below!