Bird Call Audio Classification with a Convolutional Neural Network, Part 2

CNN_resnet50_2

Import needed packages

This block loads the needed packages and sets up the "device" object to move computations to the GPU when one is available.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os
%matplotlib inline
from PIL import Image
In [2]:
import torch
from torchvision import datasets, models, transforms
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim
In [3]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Helper function

This helper function gets bird call labels from the training file folder structure. The files are organized into folders named by bird call label.

In [4]:
def find_classes(dir):
    # Each subdirectory of dir is one bird call label.
    classes = os.listdir(dir)
    classes.sort()
    class_to_idx = {classes[i]: i for i in range(len(classes))}
    idx_to_class = {value: key for key, value in class_to_idx.items()}
    return classes, class_to_idx, idx_to_class
classes_list, _, labels = find_classes('trn/')
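As a quick sanity check, the mapping logic can be exercised on a throwaway directory that mimics the `trn/<label>/` layout. The label names below are made up for illustration; the function definition is repeated so the snippet runs standalone.

```python
import os
import tempfile

def find_classes(dir):
    # Same helper as above, repeated so this snippet is self-contained.
    classes = os.listdir(dir)
    classes.sort()
    class_to_idx = {classes[i]: i for i in range(len(classes))}
    idx_to_class = {value: key for key, value in class_to_idx.items()}
    return classes, class_to_idx, idx_to_class

# Build a temporary folder layout with three hypothetical label folders.
with tempfile.TemporaryDirectory() as root:
    for name in ["amecro", "bkcchi", "norcar"]:
        os.mkdir(os.path.join(root, name))
    classes, class_to_idx, idx_to_class = find_classes(root)

print(classes)          # ['amecro', 'bkcchi', 'norcar']
print(class_to_idx)     # {'amecro': 0, 'bkcchi': 1, 'norcar': 2}
print(idx_to_class[2])  # 'norcar'
```

Because the folder names are sorted first, the index assignment is deterministic and matches the ordering torchvision's `ImageFolder` uses for its targets.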

Load data

This block loads the image data with the torchvision package, transforms each image to a normalized tensor for training, and then creates training and validation data loaders from a 90/10 random split.

In [5]:
transform = transforms.Compose([transforms.Resize((224,224)),
                               #transforms.Grayscale(),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5),(0.25))
                               ])

data_path = 'trn/'
dataset = datasets.ImageFolder(root=data_path, transform = transform)

training_dataset, validation_dataset = torch.utils.data.random_split(dataset, [(len(dataset)-round(len(dataset)*.1)), round(len(dataset)*.1)], 
                                                                     generator=torch.Generator().manual_seed(42))

training_loader = torch.utils.data.DataLoader(training_dataset, batch_size=300, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size = 300, shuffle=False)
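The split arithmetic above reserves `round(len(dataset) * .1)` samples for validation and the rest for training, with a fixed seed so the split is reproducible. A minimal sketch of the same pattern on a stand-in dataset (fake tensors instead of the real spectrogram images):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Stand-in dataset: 100 fake 3x8x8 "images" with dummy labels.
dummy = TensorDataset(torch.zeros(100, 3, 8, 8), torch.zeros(100, dtype=torch.long))

# Same 90/10 split arithmetic as the notebook cell above.
val_size = round(len(dummy) * 0.1)   # 10
train_size = len(dummy) - val_size   # 90
train_ds, val_ds = random_split(dummy, [train_size, val_size],
                                generator=torch.Generator().manual_seed(42))
print(len(train_ds), len(val_ds))    # 90 10

# The loader yields (inputs, labels) batches of at most batch_size items.
loader = DataLoader(train_ds, batch_size=32, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape)                      # torch.Size([32, 3, 8, 8])
```

Fixing the generator seed means the same images land in the validation set on every run, which keeps the training and validation curves comparable across experiments.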

Tensor to image

The function below converts tensors back to images for inspection. The following block displays 20 sample images and their labels.

In [6]:
def im_convert(tensor):
    image = tensor.cpu().clone().detach().numpy()
    image = image.transpose(1, 2, 0)  # CHW -> HWC for matplotlib
    # Invert Normalize((0.5), (0.25)): multiply by the std, add the mean.
    image = image * 0.25 + 0.5
    image = image.clip(0, 1)
    return image
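The un-normalization step has to invert the `Normalize((0.5), (0.25))` transform applied at load time: subtracting the mean and dividing by the std on the way in means multiplying by the std and adding the mean on the way out. A quick round-trip check on a fake tensor:

```python
import numpy as np
import torch

# Fake 3x4x4 pixel data in [0, 1], then apply the same normalization
# as the transform above: subtract mean 0.5, divide by std 0.25.
pixels = torch.rand(3, 4, 4)
normalized = (pixels - 0.5) / 0.25

# Inverting: transpose CHW -> HWC, multiply by std, add mean.
recovered = normalized.numpy().transpose(1, 2, 0) * 0.25 + 0.5
print(np.allclose(recovered, pixels.numpy().transpose(1, 2, 0)))  # True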
In [7]:
dataiter = iter(training_loader)
images, label = next(dataiter)  # the .next() method was removed from DataLoader iterators
fig = plt.figure(figsize=(25, 4))

for idx in np.arange(20):
  ax = fig.add_subplot(2, 10, idx+1, xticks=[], yticks=[])
  plt.imshow(im_convert(images[idx]))
  ax.set_title(labels[label[idx].item()])  # look up the class name for this image

Define model

The following block defines the model. It uses a pretrained ResNet-50 with the backbone frozen, replacing only the final fully connected layer to match my 265 class labels. I used cross entropy loss and Adam, an adaptive gradient-based optimizer, with a learning rate of 0.001.

I did not expect this pretrained model to do very well on my set because it was trained on color images of objects, while my set has grayscale images of signals. Training took 6.5 hours of wall time with a lower validation accuracy than my custom-trained model shown in Part 3, which took 2.25 hours.

In [8]:
model = models.resnet50(pretrained=True).to(device)

# Freeze the pretrained backbone; only the new head below will train.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a two-layer head
# sized for the 265 bird call classes.
model.fc = nn.Sequential(
               nn.Linear(2048, 500),
               nn.ReLU(inplace=True),
               nn.Linear(500, 265)).to(device)
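One way to confirm that only the new head will train is to count trainable parameters. Downloading ResNet-50 weights isn't needed to illustrate the pattern; the sketch below uses a small stand-in backbone, frozen the same way, with the same head dimensions as above:

```python
import torch.nn as nn

# Stand-in "backbone" to illustrate freeze-then-replace-head
# without downloading ResNet-50 weights.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(),
                         nn.Linear(8 * 6 * 6, 2048))
for p in backbone.parameters():
    p.requires_grad = False  # freeze everything, as done for the ResNet above

# Same head dimensions as the notebook: 2048 -> 500 -> 265.
head = nn.Sequential(nn.Linear(2048, 500),
                     nn.ReLU(inplace=True),
                     nn.Linear(500, 265))
model = nn.Sequential(backbone, head)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
# Only the head counts as trainable: 2048*500 + 500 + 500*265 + 265
print(trainable)  # 1157265
```

Freezing the backbone also means `optimizer.step()` only updates the head's 1.16M parameters, even though `model.parameters()` is passed to the optimizer, since frozen parameters never receive gradients.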
In [9]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.001)
In [10]:
%%time 
epochs = 15
running_loss_history = []
running_corrects_history = []
val_running_loss_history = []
val_running_corrects_history = []
batch_size = 300

for e in range(epochs):
    running_loss = 0.0
    running_corrects = 0.0
    val_running_loss = 0.0
    val_running_corrects = 0.0
  
    for inputs, labels in training_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model(inputs)
        loss = criterion(outputs, labels)
    
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
        _, preds = torch.max(outputs, 1)
        running_loss += loss.item()
        running_corrects += torch.sum(preds == labels.data)

    else:
        # for/else: validation runs once per epoch, after all training batches
        with torch.no_grad():
            for val_inputs, val_labels in validation_loader:
                val_inputs = val_inputs.to(device)
                val_labels = val_labels.to(device)
                val_outputs = model(val_inputs)
                val_loss = criterion(val_outputs, val_labels)
        
                _, val_preds = torch.max(val_outputs, 1)
                val_running_loss += val_loss.item()
                val_running_corrects += torch.sum(val_preds == val_labels.data)
      
    epoch_loss = running_loss / len(training_dataset)
    epoch_acc = running_corrects.float() / len(training_dataset)
    running_loss_history.append(epoch_loss)
    running_corrects_history.append(epoch_acc)
    
    val_epoch_loss = val_running_loss / len(validation_dataset)
    val_epoch_acc = val_running_corrects.float() / len(validation_dataset)
    val_running_loss_history.append(val_epoch_loss)
    val_running_corrects_history.append(val_epoch_acc)
    print('epoch :', (e+1))
    print('training loss: {:.4f}, acc {:.4f} '.format(epoch_loss, epoch_acc.item()))
    print('validation loss: {:.4f}, validation acc {:.4f} '.format(val_epoch_loss, val_epoch_acc.item()))
epoch : 1
training loss: 0.0132, acc 0.1976 
validation loss: 0.0116, validation acc 0.2732 
epoch : 2
training loss: 0.0108, acc 0.3098 
validation loss: 0.0106, validation acc 0.3212 
epoch : 3
training loss: 0.0100, acc 0.3541 
validation loss: 0.0101, validation acc 0.3529 
epoch : 4
training loss: 0.0094, acc 0.3812 
validation loss: 0.0097, validation acc 0.3707 
epoch : 5
training loss: 0.0090, acc 0.4022 
validation loss: 0.0094, validation acc 0.3887 
epoch : 6
training loss: 0.0087, acc 0.4192 
validation loss: 0.0092, validation acc 0.3967 
epoch : 7
training loss: 0.0085, acc 0.4340 
validation loss: 0.0090, validation acc 0.4031 
epoch : 8
training loss: 0.0082, acc 0.4468 
validation loss: 0.0089, validation acc 0.4136 
epoch : 9
training loss: 0.0080, acc 0.4573 
validation loss: 0.0088, validation acc 0.4187 
epoch : 10
training loss: 0.0079, acc 0.4665 
validation loss: 0.0087, validation acc 0.4240 
epoch : 11
training loss: 0.0077, acc 0.4749 
validation loss: 0.0085, validation acc 0.4386 
epoch : 12
training loss: 0.0076, acc 0.4823 
validation loss: 0.0084, validation acc 0.4431 
epoch : 13
training loss: 0.0074, acc 0.4911 
validation loss: 0.0083, validation acc 0.4467 
epoch : 14
training loss: 0.0073, acc 0.4964 
validation loss: 0.0082, validation acc 0.4572 
epoch : 15
training loss: 0.0072, acc 0.5042 
validation loss: 0.0084, validation acc 0.4406 
Wall time: 6h 31min 44s
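The four history lists collected during training can be plotted to compare training and validation curves across epochs. A minimal sketch, here using the first few loss values printed above as stand-ins for the full lists:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt

# Stand-ins for the history lists, taken from the first three epochs above.
running_loss_history = [0.0132, 0.0108, 0.0100]
val_running_loss_history = [0.0116, 0.0106, 0.0101]

fig, ax = plt.subplots()
ax.plot(running_loss_history, label='training loss')
ax.plot(val_running_loss_history, label='validation loss')
ax.set_xlabel('epoch')
ax.set_ylabel('loss')
ax.legend()
print(len(ax.lines))  # 2
```

Note that `running_corrects_history` holds tensors (since `epoch_acc` is a tensor); calling `.item()` on each entry before plotting avoids surprises with some matplotlib versions.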
In [ ]:
torch.save(model.state_dict(), "resnet_50_birdcall")
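Since only the `state_dict` is saved, reloading later requires rebuilding the same architecture first (the ResNet-50 with the replaced head) and then loading the weights into it. The round-trip pattern, sketched on a small stand-in model to avoid the ResNet download:

```python
import io
import torch
import torch.nn as nn

# Small stand-in model; the same pattern applies to the ResNet-50 above.
net = nn.Linear(4, 2)
buf = io.BytesIO()
torch.save(net.state_dict(), buf)  # the notebook uses a file path instead

# To reload: rebuild the architecture, then load the saved weights into it.
buf.seek(0)
net2 = nn.Linear(4, 2)
net2.load_state_dict(torch.load(buf))
net2.eval()  # switch to eval mode before running inference

print(torch.equal(net.weight, net2.weight))  # True
```

Saving the `state_dict` rather than the whole model object keeps the checkpoint portable across code refactors, at the cost of needing the architecture definition at load time.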
In [12]:
torch.cuda.empty_cache()

Description