I also wanted to try AI upscaling and now this...
Take a look at my
AI Upscaling Tutorial. Somewhere in there are instructions on getting set up in a Windows 10 environment.
it often becomes a matter of finding the standard practices for getting an environment like this set up.
Unfortunately, that's part of the problem. There is no standard that I'm aware of. I haven't actually used Docker as you mentioned, but I think it would be the closest thing to standardization there is. The only reason I haven't tried it is that at one point it required Windows 10 Enterprise edition, and I only have Home edition. I just checked the Docker Hub website and I don't see anything about that requirement now, so I may give it another shot sometime. I also haven't really ever used Anaconda, even though I have it installed...
From my understanding, Docker and Anaconda help to simplify a lot of the ML environment setup, but I've been lucky in getting some of the GitHub AI/ML projects running on my PC without them. All this is also supposed to be easier in Linux, and maybe it is, but I tried with Ubuntu and didn't have much success, so I switched back to Win10.
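For what it's worth, Python's built-in venv module gives you a little of that isolation without Docker or Anaconda. Here's a minimal sketch for a Windows command prompt (the environment name voice-clone-env is just an example I made up):

```shell
:: Create an isolated Python environment next to the project
python -m venv voice-clone-env

:: Activate it (your prompt will change to show the environment name)
voice-clone-env\Scripts\activate

:: Anything you pip install now stays inside the environment
pip install -r requirements.txt
```

This keeps one project's packages from stepping on another's, which is a big chunk of what Docker and Anaconda buy you.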
So anyway, my instructions assume that you are using Windows 10 and have a mid-to-high-tier Nvidia graphics card (I have a GeForce GTX 1060 6GB and it works just fine).
I pretty much just followed the instructions in the
Quick Start section of the Project page, but I'll add some helpful hints.
You can also follow my ESRGAN installation instructions
here for more information. It will be a lot of the same steps, but some variables may be different.
Hint: You may want to look ahead at some of the downloads and get them started now. Some of them are pretty big (> 1 GB). CUDA is 2.6 GB, the pretrained models are about 350 MB, and the LibriSpeech dataset (which is optional) is really big too, though I don't remember exactly how big.
Step 1: Install CUDA 10 (this is omitted from the GitHub project page for some reason. CUDA is what lets all the computationally expensive machine learning "tensor" math run on the GPU, which is MUCH faster than doing it on the CPU)
Install CUDA 10 for Windows 10
here.
The installer is a whopping 2.6 GB for the CUDA toolkit. Just use the default settings and it should work okay. It will insist on installing video card drivers along with it; it's supposed to ship with recent drivers, so it shouldn't harm anything.
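Once the installer finishes, you can sanity-check it. This little Python snippet is just a quick check I use (it's not part of the project): it looks for nvcc, the CUDA compiler that ships with the toolkit, on your PATH.

```python
import shutil
import subprocess

# nvcc ships with the CUDA toolkit; if the install worked, it should be on PATH.
nvcc = shutil.which("nvcc")
if nvcc is None:
    print("nvcc not found - the CUDA toolkit may not be installed or on your PATH")
else:
    # Prints the toolkit version banner (look for "release 10.0")
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    print(result.stdout)
```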
Step 2: Install Python 3.7 for Windows (Python is the language that shoe-strings all the ML code together)
Here is the link to download
Python 3.7.6 (Windows Executable).
HINT: When installing, one of the first prompts will have a checkbox somewhere to "add python to system path" or "add python to environment variables".
Make sure it is selected!
The above image is from my ESRGAN installation tutorial using Python 3.6, so it may look a little different in 3.7, but the concept is the same.
By default it should also install pip along with it, which lets you install lots of other packages very easily from the command prompt (terminal).
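To make sure the interpreter on your PATH is the one you just installed (and not some older Python already on your machine), here's a quick check of my own:

```python
import sys

# Show exactly which python.exe is being run and its version.
print(sys.executable)
print(sys.version)

# This tutorial targets Python 3.7, so anything older is a red flag.
assert sys.version_info >= (3, 7), "Expected Python 3.7 or newer"
```

If the version printed isn't the one you installed, the PATH checkbox from the hint above probably wasn't ticked.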
Step 3: Install PyTorch (A machine learning library)
Open a command prompt (link
here for help) and paste the following command:
then press enter to run the command. If successful, it should say that it's downloading and installing.
This is the Windows build of PyTorch with CUDA 10 support, installed via pip.
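The command itself comes from the selector on pytorch.org (pick Stable, Windows, Pip, your Python version, and CUDA 10). The version numbers below are only an example of what the selector generated around the time I set this up, so treat them as assumptions and copy the current line from the site instead:

```shell
:: Example only - get the exact line from the pytorch.org selector for your setup
pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
```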
Step 4: Clone the GitHub project.
You can just download the ZIP and extract the whole project to somewhere on your computer.
Step 5: Install other package requirements (there's a bunch of other little programs that this GitHub project relies on. Instead of installing them one by one manually, you can point pip at a txt file that lists all the packages the project needs, and it installs them all in one batch.)
Using your command prompt (again, they call this a "terminal" for some reason) navigate to where you extracted the GitHub project.
For example, if you extracted the project to a directory on your hard drive "C:\ML Projects\Real-Time-Voice-Cloning", then enter the following commands:
C: (then press enter)
cd "C:\ML Projects\Real-Time-Voice-Cloning"
The "cd" command changes the current directory.
If successful, it should put you in the directory that you extracted the project to.
Now when you run python commands, it will be in context to this project directory.
So now you can use the command:
pip install -r requirements.txt
If successful, it should install a bunch of packages.
I had it tell me that a package didn't install, but it all still worked anyhow... I got lucky.
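If you're curious what's in that batch, requirements.txt is just a plain text file with one package spec per line. Here's a tiny helper of my own (not part of the project) that shows what pip is going to install:

```python
def list_requirements(path="requirements.txt"):
    """Return the package specs pip would install from a requirements file,
    skipping blank lines and comment lines."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.strip().startswith("#")]

# Example (run from the project root):
# print(list_requirements())
```

Handy if one package fails like it did for me: you can see its exact name and try installing just that one by hand.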
Step 6: Download the pretrained models (the models are the trained "weights" between neurons that give the ML algorithm that "thinky" behavior)
Download link
here from either Google Drive or MEGA. The MEGA link is probably faster depending on where you live.
Extract it into your project folder. The archive has the same folder structure as the project, so the models should land in the right directories on their own.
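If you want to double-check the extraction, a small script like this can verify the files landed where the code expects them. The paths below are from my memory of the repo layout at the time, so treat them as assumptions and adjust them if the project has reorganized:

```python
from pathlib import Path

# Assumed locations of the pretrained models inside the project folder;
# verify against the repo's README if anything looks off.
EXPECTED_MODELS = [
    "encoder/saved_models/pretrained.pt",
    "vocoder/saved_models/pretrained/pretrained.pt",
]

def missing_models(project_root):
    """Return the expected pretrained model files not found under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_MODELS if not (root / p).is_file()]

# Example:
# print(missing_models(r"C:\ML Projects\Real-Time-Voice-Cloning"))
```

An empty list means everything it looked for is in place.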
Step 7: (Optional) Download the LibriSpeech dataset. You can load these voices into the toolbox instead of recording your own voice.
Download
here.
Extract the contents of the compressed file anywhere in your project folder; just take note of which folder you extracted it to. That will be the <datasets_root> folder you need to pass when you launch the toolbox in order to get the voices loaded into the program.
Step 8: Test your configuration (the last stretch!)
In terminal, type the following command (assuming you're still in your project root folder):
python demo_cli.py
If successful, then you're good to go!
Step 9: Run the toolbox and have some fun.
In terminal, run the following command:
python demo_toolbox.py
or if you have the LibriSpeech dataset downloaded, use:
python demo_toolbox.py -d <datasets_root>
where <datasets_root> is the directory where you extracted it.
Hope that all helps.