Conda installation and training test using conda

This forum is dedicated to users of the system installed at the ISASI institute of the CNR, consisting of:
- an Nvidia DGX-A100 server
- a Dell MX-7000 blade enclosure with an MX-750c server
- a SAN storage unit of about 200 TB serving the compute nodes

Post by mdelcoco »

Install conda in your home

Download the conda release of interest (here the miniconda for python 3.10 has been chosen)

Code: Select all

wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
The use of Miniconda is preferable in order to save disk space: Miniconda ships with a restricted, essential set of Python packages, and you can install whatever else you need later.

Then run the installer:

Code: Select all

bash Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
Now simply follow the installer prompts. More precisely:

Accept the licence

Code: Select all

Do you accept the license terms? [yes|no]
[no] >>> yes
Press ENTER to keep the default location (conda will be installed in your home folder)

Code: Select all

Miniconda3 will now be installed into this location:
/home/mdelcoco/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below
  
Accept the following option

Code: Select all

Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes

When the installation is complete, log out and log back in to your dgx account.
The base conda environment (base) should then be active and the conda command should be available.
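As a quick sanity check (not part of the original instructions), you can ask Python itself which interpreter the shell now picks up; the path should point inside your miniconda3 directory:

```python
import sys

# After conda init, the interpreter path should point inside the
# miniconda3 directory, e.g. /home/<user>/miniconda3/bin/python
print(sys.executable)
print(sys.version)
```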

Create a Conda Environment

To create a new environment named testenv with Python 3.9, type:

Code: Select all

conda create -n testenv python=3.9
Then run the following command to activate the environment:

Code: Select all

conda activate testenv
To deactivate the running environment:

Code: Select all

conda deactivate

Simple training test with PyTorch
1) create and activate a conda environment
2) run:

Code: Select all

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
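Before launching a training run, it can be worth verifying that PyTorch was installed with working CUDA support (a small sketch: on the DGX node `torch.cuda.is_available()` should print True, while on a CPU-only machine it prints False):

```python
import torch

print(torch.__version__)               # installed PyTorch version
print(torch.cuda.is_available())       # True if the CUDA runtime is usable
if torch.cuda.is_available():
    print(torch.cuda.device_count())       # number of visible GPUs
    print(torch.cuda.get_device_name(0))   # e.g. the A100 on the DGX node
```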

3) copy the following code in a file named train.py

Code: Select all

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
           
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

net = Net()
net.to(device)

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
4) In the same folder, run:

Code: Select all

python train.py
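The script above defines testloader but never uses it. As an optional extension (a sketch, not part of the original post; the `evaluate` helper is a name introduced here), the trained network can be scored on the test set by appending something like this to train.py:

```python
import torch

def evaluate(net, loader, device):
    """Return the classification accuracy (%) of net over loader."""
    net.eval()                      # switch layers to inference mode
    correct = total = 0
    with torch.no_grad():           # no gradients needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            predicted = outputs.argmax(dim=1)   # index of the highest logit
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total

# e.g. print(f'Test accuracy: {evaluate(net, testloader, device):.1f} %')
```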