Voiced Greetings

Posted: Tue Feb 18, 2020 12:51 pm
by jefetienne
Different greetings are randomly played when the player is near an NPC (as in the later TES games), chosen based on gender, race, reputation, and class. Support for static NPCs could also be added in the future; the only problem is that they don't provide data on certain attributes (such as race).
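To make the selection idea concrete, here is a rough sketch of filtering a greeting pool by those attributes and then picking one at random. It is written in Python only for brevity; the mod itself would be C# under Unity. Every name in it (`Greeting`, `matching_greetings`, `pick_greeting`, the clip names, the -100 to 100 reputation range) is made up for illustration and is not taken from the linked code. An unknown attribute, such as race on a static NPC, is passed as `None` and only matches greetings that don't filter on that attribute.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Greeting:
    clip: str                        # hypothetical .wav file name
    gender: Optional[str] = None     # None means "any NPC may say this"
    race: Optional[str] = None
    npc_class: Optional[str] = None
    min_reputation: int = -100       # assumed reputation range, not from the game data
    max_reputation: int = 100


def matching_greetings(pool, gender, race, npc_class, reputation):
    """Return every greeting whose filters all accept this NPC.

    An NPC attribute of None means "unknown" and only matches
    greetings that do not filter on that attribute.
    """
    result = []
    for g in pool:
        if g.gender is not None and g.gender != gender:
            continue
        if g.race is not None and g.race != race:
            continue
        if g.npc_class is not None and g.npc_class != npc_class:
            continue
        if not (g.min_reputation <= reputation <= g.max_reputation):
            continue
        result.append(g)
    return result


def pick_greeting(pool, gender, race, npc_class, reputation, rng=random):
    """Pick one matching greeting at random, or None if nothing matches."""
    candidates = matching_greetings(pool, gender, race, npc_class, reputation)
    return rng.choice(candidates) if candidates else None
```

Keeping a few greetings with no filters at all means there is always something to play, even for NPCs that report almost nothing about themselves.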

I have a rudimentary implementation on GitHub that uses .wav files, but it doesn't yet factor in any of those attributes when selecting a greeting:

https://github.com/jefetienne/daggerfal ... be0d0d3b02

There can be a number of different voice actors who say a number of different phrases. It would probably come to hundreds of lines if we were serious about it, but if we could reach greeting parity with the later TES games, that would be amazing and the game would feel much more like them. However, for the sake of convenience, we would like to find a cross-platform C#/.NET-compatible library for text-to-speech (which we still need to find).

I might not get to working on this mod in the immediate future, but you guys are more than welcome to brainstorm on this post or fork my code and expand it even further :)

Re: Voiced Greetings

Posted: Tue Feb 18, 2020 2:00 pm
by MasonFace
However, for the sake of convenience, we would like to find a cross-platform C#/.NET-compatible library for text-to-speech (which we still need to find).
Maybe you've already come across this, but take a look at this project: Real-Time Voice Cloning

Like most other AI applications, it is written in Python, and is pretty computationally heavy, but the demo shows very good results.

They also have a commercial version that is likely much more advanced here: Resemble.AI, but it does look pretty expensive.

The goal with this would be to synthesize many unique greetings in many different voices, then save the output .wav file and use it traditionally, but I think it would also be really neat to see it used to vocalize any arbitrary written dialog in real-time.

Re: Voiced Greetings

Posted: Tue Feb 18, 2020 4:19 pm
by jefetienne
This looks incredible. Thanks for sharing.

If we do real-time voicing, I might look into using IronPython (a Python implementation for .NET) to run the software from within Unity.

Re: Voiced Greetings

Posted: Tue Feb 18, 2020 5:01 pm
by MasonFace
I should add that it runs off of CUDA, so it requires an Nvidia GPU. I'm not sure but maybe the commercial version can run on CPU instead?

Also, there is a place to sign up for beta access for the commercial version on that Resemble.AI website for games. I signed up for it, hoping to use it for a side project of mine.

Re: Voiced Greetings

Posted: Sat Feb 22, 2020 8:21 am
by communityus
I will try to play around with this - this month or early March. Great links!

Re: Voiced Greetings

Posted: Sun Feb 23, 2020 3:01 am
by MasonFace
I was able to get it to run this morning and started tinkering with it this afternoon.

To test it, I used the intro monologue:
Four hundred years after Tiber Septim's reign, the beginning will meet the end, and the bloody circle will close in the Empire of Tamriel. The unworthy heirs of the Septim Dynasty have allowed the bonds of the Empire to weaken and crack. Uriel Septim the Seventh cannot repair what his ancestors ignored. The provinces fight among themselves like neglected children, drunk with rebellion, and one indomitable power hides itself, but not forever.
You can hear the results here.

I recorded seven utterances using six voices from the LibriSpeech dataset provided on the GitHub project page. The seventh utterance is a clone of my voice. I can tell that it sounds very similar to my voice, but it's not quite close enough to give me that cringe I usually get when I hear my own voice played back. You know what I mean.

All in all, I'm pretty impressed with how human the voices sound, even though their speech patterns are still pretty robotic. It is by far the best text-to-speech synthesis I've ever used, but it definitely doesn't compete with a human performance.

There are some odd quirks about it that I found interesting. For the first five utterances, I kept the text identical. Each voice put strange pauses in different places and pronounced "Uriel" differently. For the last two utterances, I changed the spelling to "Yurial" instead of "Uriel" and the pronunciation sounded closer to correct. Some of them pronounced "Tamriel" with an "s" at the end for some reason... not sure why that happened.
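If the respelling trick holds up, it could be applied automatically before the text goes to the synthesizer. Below is a minimal Python sketch of that idea. The `respell` helper and the `RESPELLINGS` table are hypothetical names, and the only entry is the "Uriel" to "Yurial" swap tried above; any other entries would have to be found by ear.

```python
import re

# Phonetic respellings to apply before synthesis.
# Only the "Uriel" entry comes from the experiment above.
RESPELLINGS = {
    "Uriel": "Yurial",
}


def respell(text, respellings=RESPELLINGS):
    """Replace whole-word matches of each key with its phonetic respelling."""
    if not respellings:
        return text
    # Longest keys first so a longer name wins over a shorter prefix of it.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: respellings[m.group(1)], text)
```

For example, `respell("Uriel Septim the Seventh")` hands the synthesizer "Yurial Septim the Seventh" while the original text stays untouched for subtitles.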

Anyhow, I think this program is really neat and seems practical enough for synthesizing all the text of Daggerfall into audio recordings.

Let me know if you need help getting set up. It's a typical Python/Conda/Pytorch/Cuda/Tensorflow setup that's common with machine learning projects on GitHub... which means it's a headache when any one of these volatile platforms updates its code and deprecates some method that one of the others depends on....

Or if you want me to try to generate some of the dialog, just send me a text file to synthesize.

Re: Voiced Greetings

Posted: Mon Feb 24, 2020 2:23 am
by Azteca
Wow, I am impressed. Thanks for trying it, MasonFace! They are very fast. Could you try them maybe 20 or 30% slower?
The line break thing is weird. I hope we can figure out why it’s there.

Sample from the Madness of Pelagius from UESP:
It was said that when the Argonian ambassador from Blackrose came to court, Pelagius insisted on speaking in all grunts and squeaks, as that was the Argonian's natural language.
It is known that Pelagius was obsessed with cleanliness, and many guests reported waking to the noise of an early-morning scrubdown of the Imperial Palace. The legend of Pelagius while inspecting the servants' work, suddenly defecating on the floor to give them something to do, is probably apocryphal.
When Pelagius began actually biting and attacking visitors to the Imperial Palace, it was decided to send him to a private asylum. Katariah was proclaimed regent two years after Pelagius took the throne. For the next six years, the Emperor stayed in a series of institutions and asylums
If you need more material: http://www.uesp.net/wiki/Daggerfall:Books

Re: Voiced Greetings

Posted: Mon Feb 24, 2020 4:18 pm
by MasonFace
They are very fast. Could you try them maybe 20 or 30% slower?
Yeah, I noticed that too. Unfortunately, the program doesn't allow you to control anything about the cadence of the speech. It even ignores punctuation, so commas and periods don't cause it to pause naturally. The only pause it recognizes is a line break, and it doesn't seem very consistent in how long it decides to pause.
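Since a line break is the only thing that produces a pause, one possible workaround is to turn punctuation into line breaks before feeding the text in. The Python sketch below does just that. The `add_pause_breaks` name and its `marks` parameter are made up for illustration, and whether the resulting pauses sound natural is untested, given how inconsistent the pause length is.

```python
import re


def add_pause_breaks(text, marks=".!?;"):
    """Put a line break after each run of pause punctuation.

    `marks` lists the characters that should trigger a pause;
    add "," to it to also break at commas.
    """
    # A run of pause marks followed by whitespace becomes marks + newline.
    pattern = re.compile(r"([" + re.escape(marks) + r"]+)\s+")
    return pattern.sub(r"\1\n", text.strip())
```

Note that this splits naively, so abbreviations that end in a period would also get a pause after them.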

I could slow the speed down afterwards in Audacity and add some minor pauses to make it sound more natural, but as I've found with the AI upscaling of the graphics, making manual touch-ups gets impractical as the number of files grows. :cry:
But, it certainly isn't impossible for a small team of dedicated people to divide up the work to process a couple hundred raw files.

I may try that out on the passage you provided and see how it comes out.

Re: Voiced Greetings

Posted: Tue Feb 25, 2020 3:09 am
by MasonFace
You can hear the above excerpt from the Madness of Pelagius here.

The first utterance is unedited. The second one is slowed down 20% and the levels are tweaked slightly. I lengthened pauses in a few places to make it sound just a bit more natural. I think I may have overshot on the slow down - the speaker sounds a little drunk. Maybe I should have stuck to about 10-15% speed reduction instead of 20%.
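For what it's worth, the slow-down step could be scripted so it doesn't have to be done by hand on every file. Here is a rough Python sketch using only the standard `wave` module. It keeps the samples as they are and lowers the sample rate stored in the header, so pitch drops along with speed. It is not how Audacity does it internally, and the function names and the 0.85 default (the middle of the 10-15% range) are just assumptions for illustration.

```python
import wave
from pathlib import Path


def slow_down_wav(src_path, dst_path, speed=0.85):
    """Write a copy of src_path that plays at `speed` times the original rate.

    The samples are untouched; only the sample rate in the header changes,
    so speed=0.8 makes the clip 25% longer and lowers its pitch.
    Returns the new sample rate.
    """
    if not 0 < speed <= 1:
        raise ValueError("speed must be in the range (0, 1]")
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    new_rate = max(1, round(params.framerate * speed))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(new_rate)
        dst.writeframes(frames)
    return new_rate


def slow_down_folder(src_dir, dst_dir, speed=0.85):
    """Apply slow_down_wav to every .wav file in src_dir; return the output paths."""
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).glob("*.wav")):
        target = out_dir / src.name
        slow_down_wav(str(src), str(target), speed)
        written.append(target)
    return written
```

Some players and engines may not like the unusual sample rates this produces, so a resampling pass back to a standard rate might still be needed. Lengthening individual pauses would remain a manual job.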

Re: Voiced Greetings

Posted: Tue Feb 25, 2020 5:38 pm
by communityus
MasonFace wrote: Tue Feb 25, 2020 3:09 am You can hear the above excerpt from the Madness of Pelagius here.

The first utterance is unedited. The second one is slowed down 20% and the levels are tweaked slightly. I lengthened pauses in a few places to make it sound just a bit more natural. I think I may have overshot on the slow down - the speaker sounds a little drunk. Maybe I should have stuck to about 10-15% speed reduction instead of 20%.
Could be passable, certainly for a first pass followed by touch-ups. Especially since DF's background is such that one can kind of get away with non-AAA quality and then keep improving on the base, rather than starting at AAA, where there's only one way to go: down, lol. Better to be perceived as improving.