Luganda TTS Audio Samples

This comparison showcases outputs from a VITS model that was pre-trained on female Luganda speech from the Common Voice dataset and then fine-tuned on 2.8 hours of professionally recorded studio data from a single female Luganda speaker. The samples below show how different checkpoints perform on the same input sentences.

Sentence	Model 1	Model 2	Model 3

Checkpoint Comparison