2.4. Training of Convolutional Neural Networks Model
For the model, I’ve used a very simple pre-trained ResNet-34 model. Since we have two tasks to accomplish here, there are two final layers: the bounding box regressor and the image classifier.
Prediction on Test Images
Now that we’re done with training, we can pick a random image and test our model on it. Even though we had a fairly small number of training images, we end up getting a pretty decent prediction on our test image.
It’ll be a fun exercise to take a real photo using your phone and test out the model. Another interesting experiment would be to train the model without any data augmentation and compare the two models.
Computer-Aided Diagnosis Scheme for Determining Histological Classification of Breast Lesions on Ultrasonographic Images Using Convolutional Neural Network
Department of Electronic and Computer Engineering, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan
Received 2018 May 28; Accepted 2018 Jul 23.
Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
2.3. Architecture of Convolutional Neural Networks Model
The architecture of the CNN model used in this study is shown in the figure captioned below. Our CNN model was constructed from four convolutional layers, three batch-normalization layers, four pooling layers, and two fully connected layers. Each convolutional layer was followed by a rectified linear unit (ReLU). ROIs with lesions were first resized to 224 × 224 pixels and then given to the input layer of our CNN model. The first convolutional layer generated 64 feature maps with 112 × 112 pixels, using 64 filters with 7 × 7 kernels at stride 2. The generated feature maps passed through the first batch-normalization layer and then through the first max pooling layer, which had a window size of 3 × 3 at stride 2. The second convolutional layer used 192 filters with 5 × 5 kernels at stride 2, and generated 192 feature maps with 55 × 55 pixels. These feature maps passed through the second batch-normalization layer and then the second max pooling layer, which had a window size of 3 × 3 at stride 2. Subsequently, the third and fourth convolutional layers, each with 256 filters of 3 × 3 kernels at stride 2, generated 256 feature maps with 27 × 27 pixels and 13 × 13 pixels, respectively. A batch-normalization layer was applied only after the third convolutional layer. Max pooling layers with window sizes of 3 × 3 at stride 2 were employed after both the third and fourth convolutional layers. The feature maps generated from the fourth convolutional layer were merged at two fully connected layers. Finally, the output layer, using the softmax function, output the likelihoods of the four histological classifications (invasive carcinoma, noninvasive carcinoma, fibroadenoma, and cyst).
Figure caption: Architecture of our convolutional neural networks (CNN) model.
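The feature-map sizes quoted above can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, s the stride, and p the padding. The sketch below is only an arithmetic check, not the authors' code. The padding values are assumptions, since the text does not give them. Convolutions 2 to 4 are also assumed to use stride 1 with size-preserving padding, because that is the setting under which this formula reproduces the reported 55, 27, and 13 pixel maps; the text itself states stride 2 for those layers.

```python
def out_size(n, kernel, stride, padding=0):
    """Output width/height of a conv or pooling layer (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size=224):
    """Return the spatial size after each conv layer under the stated assumptions."""
    sizes = {}
    n = out_size(input_size, kernel=7, stride=2, padding=3)  # conv1 (padding assumed)
    sizes["conv1"] = n
    n = out_size(n, kernel=3, stride=2)                      # max pool 1
    n = out_size(n, kernel=5, stride=1, padding=2)           # conv2 (stride 1 assumed)
    sizes["conv2"] = n
    n = out_size(n, kernel=3, stride=2)                      # max pool 2
    n = out_size(n, kernel=3, stride=1, padding=1)           # conv3 (stride 1 assumed)
    sizes["conv3"] = n
    n = out_size(n, kernel=3, stride=2)                      # max pool 3
    n = out_size(n, kernel=3, stride=1, padding=1)           # conv4 (stride 1 assumed)
    sizes["conv4"] = n
    sizes["pool4"] = out_size(n, kernel=3, stride=2)         # max pool 4
    return sizes


print(trace_sizes())
```

Changing `stride` or `padding` in any call shows how sensitive the final map size is to those choices.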
2.4. Training of Convolutional Neural Networks Model
Our CNN model was developed based on the open-source library Keras on Windows 7 Professional (Intel Core i7-6700k processor with RAM 32 GB) and accelerated by a graphics processing unit (NVIDIA GeForce 1070 with 8 GB of memory).
A k-fold cross validation method with k = 3 was used for the training and testing of our CNN model. In the validation method, the 566 patients were randomly divided into three groups so that the number of each histological classification was approximately equal in each group (73 patients for invasive carcinoma, 24 patients for noninvasive carcinoma, 60 patients for fibroadenoma, and 37 patients for cyst). One group was used as a test dataset. To assess the possibility of overfitting of the parameters in our CNN model, the remaining two groups were divided into a training dataset and a validation dataset at a 90%:10% ratio. This process was repeated three times until every group had been used as the test dataset. In this study, the number of ROIs for each histological classification in each training dataset was unified to about 2000 by using data augmentation. The number of training images before and after augmentation in each dataset is shown separately.
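The splitting procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the seed, and the toy labels are hypothetical, and the 90%:10% training/validation split here is a simple random split of the two remaining groups.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Assign sample indices to k folds so each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices out round-robin
    return folds


def cross_validation_splits(labels, k=3, val_fraction=0.1, seed=0):
    """Yield (train, validation, test) index lists, one triple per fold."""
    folds = stratified_folds(labels, k, seed)
    rng = random.Random(seed + 1)
    for test_fold in range(k):
        test = folds[test_fold]
        rest = [i for f in range(k) if f != test_fold for i in folds[f]]
        rng.shuffle(rest)
        n_val = round(len(rest) * val_fraction)
        yield rest[n_val:], rest[:n_val], test


# Toy labels (hypothetical counts, not the study's data).
toy_labels = ["invasive"] * 30 + ["noninvasive"] * 12 + ["fibroadenoma"] * 18
for train, val, test in cross_validation_splits(toy_labels):
    print(len(train), len(val), len(test))
```

Each group serves as the test set exactly once, which is what lets the three test results be averaged afterwards.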
2.5. Evaluation of Classification Performance
The classification accuracy of our CNN model was evaluated by using the ensemble average from the testing datasets over the 3-fold cross validation method. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were defined as:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
Here, TP (true positive) was the number of malignant lesions (invasive and noninvasive carcinomas) correctly identified as positive, whereas TN (true negative) was the number of benign lesions (cysts and fibroadenomas) correctly identified as negative. FP (false positive) was the number of benign lesions incorrectly identified as positive, and FN (false negative) was the number of malignant lesions incorrectly identified as negative. It is noted that the denominators in sensitivity and PPV were coincidentally the same (TP + FN = TP + FP), and the denominators for specificity and NPV (TN + FP = TN + FN) were also the same.
Receiver operating characteristic (ROC) analysis was used for analysis of classification performance. In the ROC analysis, the likelihood of malignancy for each lesion was determined by adding the output values for the probabilities of invasive and noninvasive carcinomas in a computerized method. We also calculated the area under the curve (AUC) value. The statistical significance of the difference in the AUC value between two computerized methods was tested by using the Dorfman–Berbaum–Metz method.
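As a small worked sketch of these measures, the code below computes the four ratios from confusion counts, forms a malignancy likelihood by summing the two carcinoma outputs, and estimates the AUC as the fraction of malignant/benign pairs that are ranked correctly. The function names, the class ordering of the softmax output, and all numbers are assumptions for illustration, not values from the study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, PPV and NPV from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def malignancy_score(probs):
    """Sum the two carcinoma outputs.

    Assumes probs is ordered (invasive, noninvasive, fibroadenoma, cyst).
    """
    return probs[0] + probs[1]


def auc(scores, labels):
    """AUC as the share of (malignant, benign) pairs ranked correctly.

    labels: 1 for malignant, 0 for benign. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy counts and scores, made up for the example.
print(binary_metrics(tp=8, tn=6, fp=2, fn=4))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise AUC is quadratic in the number of lesions, which is fine for a sketch but not for large datasets.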
An important feature of AlexNet is its use of the ReLU (Rectified Linear Unit) nonlinearity. Tanh or sigmoid activation functions used to be the usual choice when training a neural network model. AlexNet showed that, with the ReLU nonlinearity, deep CNNs could be trained much faster than with saturating activation functions such as tanh or sigmoid. A figure in the paper shows that with ReLUs (solid curve), AlexNet could achieve a 25% training error rate six times faster than an equivalent network using tanh (dotted curve). This was tested on the CIFAR-10 dataset.
Let's see why it trains faster with ReLUs. The ReLU function is given by
f(x) = max(0, x)
Consider the plots of the two functions, tanh and ReLU. The tanh function saturates at very high or very low values of x. In these regions, the slope of the function is very close to zero, which can slow down gradient descent. The slope of the ReLU function, on the other hand, does not approach zero for large positive values of x, which helps the optimization converge faster. For negative values of x the slope is zero, but most of the neurons in a neural network usually end up having positive values. ReLU wins over the sigmoid function for the same reason.
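The saturation argument can be seen numerically by evaluating the two derivatives at a few inputs. This is a minimal sketch; the sample points are arbitrary, and the ReLU derivative at exactly zero is taken as 0 by convention.

```python
import math


def relu_grad(x):
    """Derivative of max(0, x): 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh(x), which is 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


for x in (-5.0, -1.0, 0.5, 5.0):
    print(f"x={x:+.1f}  tanh'={tanh_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

For large positive inputs the tanh derivative shrinks toward zero while the ReLU derivative stays at 1.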
Data Augmentation by Random Crops
In addition, cropping the original image randomly also produces additional data that is just a shifted version of the original data.
The authors of AlexNet extracted random crops of size 227×227 (the paper itself reports 224×224) from inside the 256×256 image boundary to use as the network’s inputs. Combined with horizontal reflections, this increased the size of the data by a factor of 2048.
Randomly cropped versions of an image look very similar, but they are not exactly the same. This teaches the neural network that a minor shift of pixels does not change the fact that the image is still that of a cat. Without data augmentation, the authors would not have been able to use such a large network because it would have suffered from substantial overfitting.
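Random cropping itself is only a few lines. The sketch below works on an image stored as a list of pixel rows so that it needs no external library; the function name and the tiny 8×8 example are hypothetical, and a real pipeline would normally crop array or tensor data instead.

```python
import random


def random_crop(image, crop_h, crop_w, rng=random):
    """Return a crop_h x crop_w window at a random offset, plus that offset.

    image is a list of rows; each row is a list of pixels.
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # randint includes both ends
    left = rng.randint(0, width - crop_w)
    crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    return crop, (top, left)


# Toy 8x8 "image" whose pixels record their own (row, column) position.
image = [[(r, c) for c in range(8)] for r in range(8)]
crop, offset = random_crop(image, 5, 5, random.Random(0))
print(offset, crop[0][0])
```

Calling the function repeatedly on the same image yields shifted copies, which is the augmentation effect described above.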
Applying Deep Learning to Logo Detection
Our client had recently set up an internal Innovation Team to champion the adoption of new technologies and spark an innovation culture within the conglomerate. They came to us with a core problem: monitoring the visibility of the company's 350 brands across multiple marketing and sales channels. A key metric they track for each brand is its share of shelf space in retail stores, which today is measured by sending a small army of people to physically go to stores and count items on shelves.
Why not create a more efficient process with AI? Computer vision models are everywhere! They're embedded in your phone, your doorbell, even your marketing materials. Together, we worked to train one of these models to recognize one of their brand logos.
THE SOLUTION
An AI-powered mobile app allows the brand to quickly and easily assess the brand's visibility in thousands of product photos.
Gathering Training Data
We started out with an initial training data set of only 732 images of the product with logo. Through a series of data augmentation techniques, in which we cropped every image that had the product with logo and performed transformations such as horizontal flip, vertical flip, decolorization, edge enhancement, and blurring, we managed to create 10,000 examples from the original 732 images. To add negative examples, we captured several tens of thousands of images of similar products from different brands.
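A few of the transformations listed above can be sketched without any imaging library. This is an illustrative sketch only: the function names are hypothetical, images are represented as lists of rows of (r, g, b) tuples, decolorization is approximated by a plain channel average, and edge enhancement and blurring are left out because they need convolution filters that an imaging library would normally provide.

```python
def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in image[::-1]]


def decolorize(image):
    """Replace each (r, g, b) pixel with its channel mean on all three channels."""
    return [[(sum(px) // 3,) * 3 for px in row] for row in image]


def augment(image):
    """Return the original image plus three transformed variants."""
    return [image, horizontal_flip(image), vertical_flip(image), decolorize(image)]


# Toy 2x2 image, made up for the example.
toy = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (30, 60, 90)]]
print(len(augment(toy)))
```

Chaining such transformations on cropped copies is how a small image set can be multiplied into a much larger one.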
Method 2: YOLOv3 for Object Detection
We also ran a single-shot detection model using the YOLOv3 framework (shorthand for "You Only Look Once"; who said data scientists don't have a sense of humor?) with pre-trained weights from the Darknet-53 architecture. This model doesn't just check whether the image contains the product; it can also locate the product logo's precise position within the photo.
The model is able to detect objects of multiple classes within the photo and identify their locations using bounding boxes.
Results… to classify or to recognize?
We started with image classification models, assuming that the logo would be the dominant part of the images that were going to be analyzed. TensorFlow Hub has pre-trained checkpoints, which we used for transfer learning with our training data. Initial training yielded good accuracy but was very biased towards the data we used. Our training data consisted mostly of close-up, posed shots of the product with the logo, so our model learned to detect those scenarios only. The model performed poorly when we tested it with real-world photos taken at different and awkward angles or in different lighting.
F1 SCORE: 0.615
We tried to tweak our training data and parameters but soon realized that the problem required a different approach.
The second method we tried, Object Detection, requires training data with bounding boxes labeled manually. The tedious prep work was part of the reason why we chose to try Image Classification before this method. We bit the bullet and labeled a couple hundred photos using an open source tool that could output to the format we needed. We trained the model using a GPU and manually tested the checkpoints. The model performed really well even with the limited set of labeled training images we gave it.
F1 SCORE: 0.875
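For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the calculation on made-up counts; it does not reproduce the scores reported above, since the underlying counts are not given.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 logos found correctly, 2 false alarms, 2 missed.
print(f1_score(tp=8, fp=2, fn=2))
```

Because it combines both error types, F1 penalizes a model that finds logos only in easy photos (low recall) as well as one that raises many false alarms (low precision).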