Checkpoints Comparison

This comparison presents outputs from a VITS model that was pre-trained on female Luganda speech from the Common Voice dataset and then fine-tuned on 2.8 hours of professionally recorded studio data from a single female Luganda speaker. The samples below show how different checkpoints of this model perform on the same input sentences.

Sentence | Model 1 | Model 2 | Model 3