Coding

Real-World Python Neural Nets Tutorial (Image Classification w/ CNN) | TensorFlow & Keras

  • 00:00:00 hey how's it going everyone and welcome
  • 00:00:01 back to another video in this video
  • 00:00:04 we're gonna walk through a real-world
  • 00:00:05 example of using neural nets and so in
  • 00:00:07 it we're going to classify images of
  • 00:00:09 rock paper and scissors using a
  • 00:00:12 convolutional neural net written in
  • 00:00:13 tensorflow and Keras this is kind of a
  • 00:00:16 follow-up to my previous video on neural
  • 00:00:17 nets so if you haven't watched that I'd
  • 00:00:19 definitely recommend checking it out and
  • 00:00:20 we walk through a lot of the kind of
  • 00:00:22 background information on how neural
  • 00:00:23 nets work and writing some simple ones
  • 00:00:26 from scratch this video also can work as
  • 00:00:28 a standalone video though so don't feel
  • 00:00:30 like you have to watch that one but we
  • 00:00:32 might move through some of the things a
  • 00:00:33 little bit quickly alright to get
  • 00:00:35 started I recommend opening up a Google
  • 00:00:37 Colab file so if you go to
  • 00:00:39 colab.research.google.com
  • 00:00:41 you can open up a Jupyter notebook there
  • 00:00:43 and it's kind of nice that we can do
  • 00:00:45 this in browser because we can make sure
  • 00:00:47 that we all are running the same kind of
  • 00:00:49 system all right the first thing we're
  • 00:00:51 going to do is install some libraries
  • 00:00:53 that we need so let's go ahead and in
  • 00:00:58 our Google Colab file do a pip install
  • 00:01:02 of (let me make this slightly bigger) tensorflow
  • 00:01:06 and tensorflow-datasets and I'm gonna
  • 00:01:13 run this with the -q flag so I don't see
  • 00:01:15 as many logs as we normally would to
  • 00:01:17 kind of keep this notebook a little bit
  • 00:01:18 cleaner
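As a Colab cell, that install step looks something like this (the -q flag just quiets pip's output):

```python
# Run in a Colab cell; -q suppresses most of pip's install logs
!pip install -q tensorflow tensorflow-datasets
```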
  • 00:01:21 The next thing we're going to do while that runs is
  • 00:01:23 import the libraries that we're
  • 00:01:26 going to need in this tutorial so
  • 00:01:28 "import necessary libraries" and so
  • 00:01:32 we're gonna be using tensorflow and
  • 00:01:34 Keras specifically but we'll also be
  • 00:01:36 using some other helper libraries like
  • 00:01:38 numpy and matplotlib so let's import all
  • 00:01:40 those so import matplotlib.pyplot
  • 00:01:47 as plt import numpy as np then we'll
  • 00:01:54 import tensorflow as tf and we'll import
  • 00:02:00 tensorflow datasets
  • 00:02:06 as tfds and then finally
  • 00:02:12 with tensorflow one thing that will be
  • 00:02:14 helpful is just to specifically import
  • 00:02:19 the Keras library so from tensorflow
  • 00:02:22 import keras
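Collected in one cell, those imports are:

```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
```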
  • 00:02:25 The next thing we're going to do is I'm going to show how we can
  • 00:02:29 find datasets for whatever sort of
  • 00:02:32 neural net project we want to do or
  • 00:02:34 maybe any sort of general kind of
  • 00:02:35 machine learning project and I think
  • 00:02:44 there's a couple I mean there's all
  • 00:02:46 sorts of place you can look for good
  • 00:02:47 data but one thing that I find very
  • 00:02:49 convenient is that tensorflow has a
  • 00:02:52 bunch of datasets built into their
  • 00:02:55 library so if you go to either of these
  • 00:02:57 two locations you'll be able to find
  • 00:02:59 some datasets that you can use but we
  • 00:03:04 can also see these datasets very easily
  • 00:03:07 with tfds.list_builders if we run that
  • 00:03:12 method you'll see all of these different
  • 00:03:15 dataset names listed here
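That call is just:

```python
# Prints the name of every dataset bundled with tensorflow-datasets
tfds.list_builders()
```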
  • 00:03:20 And so when I was prepping for this tutorial I ultimately did this
  • 00:03:22 and I kind of looked through what they
  • 00:03:24 offered by default here and I kind of
  • 00:03:28 kept scrolling
  • 00:03:28 the thing I didn't want to do here is I
  • 00:03:31 feel like everyone does tutorials on
  • 00:03:32 MNIST and Fashion-MNIST so I wanted
  • 00:03:36 to do something a little bit different
  • 00:03:37 than MNIST
  • 00:03:38 which is classifying images of digits
  • 00:03:41 with a CNN or, yeah, a neural net
  • 00:03:45 so I didn't want to do that
  • 00:03:47 so I kept looking and ultimately I found
  • 00:03:49 this dataset called
  • 00:03:52 rock-paper-scissors and I thought that
  • 00:03:54 sounded pretty cool let's see what it's
  • 00:03:55 about and so let's load that data set in
  • 00:04:01 so the first thing we can do is we can
  • 00:04:05 get some information on the data
  • 00:04:13 and to do this we can use the tensor
  • 00:04:17 flow datasets builder to load in some
  • 00:04:19 information about a specific dataset so
  • 00:04:22 we're going to do builder equals tfds
  • 00:04:25 dot builder and then we need to take
  • 00:04:29 that name that we saw up here so that
  • 00:04:32 was rock_paper_scissors with underscores
  • 00:04:33 so I'm gonna just copy that real quick
  • 00:04:35 paste it in and then this is nice
  • 00:04:38 convenient attribute on builder which is
  • 00:04:41 builder.info it will load in all
  • 00:04:44 sorts of information about the dataset
  • 00:04:46 and then if we just go ahead and print
  • 00:04:48 that out we'll see some information
  • 00:04:53 about the dataset we're working with
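As code, that step is roughly:

```python
# Fetch dataset metadata (features, splits, description) without downloading the data
builder = tfds.builder('rock_paper_scissors')
info = builder.info
print(info)
```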
  • 00:04:56 So: rock-paper-scissors, images of hands
  • 00:04:59 playing rock paper scissor game that
  • 00:05:01 sounds pretty cool and it's sounds very
  • 00:05:03 like appropriate for a neural net
  • 00:05:05 tutorial so what do we have here well we
  • 00:05:09 have images that have a width of 300, a
  • 00:05:12 height of 300, and 3 color
  • 00:05:15 channels so that means it's an RGB image
  • 00:05:18 we can learn the data type here which is
  • 00:05:21 a uint8 so what that ultimately is
  • 00:05:24 telling me is that these are numbers
  • 00:05:27 each of these images are numbers from 0
  • 00:05:30 to 255 and then we also have the class
  • 00:05:33 label which is just an integer that's
  • 00:05:37 basically representing
  • 00:05:39 rock-paper-scissors then finally another
  • 00:05:41 thing that's helpful here is that gives
  • 00:05:43 us some information about the splits so
  • 00:05:45 we have 2520 images that we can use for
  • 00:05:49 training and then 372 that we'll use for
  • 00:05:53 our test set and our validation and
  • 00:05:55 there's also a URL to learn more about
  • 00:05:58 this data set so that's useful we can
  • 00:06:01 definitely utilize this but let's
  • 00:06:03 continue onward and start prepping the
  • 00:06:08 data so
  • 00:06:18 okay and to do this, this step is kind of
  • 00:06:22 specific because we're using the tensor
  • 00:06:24 flow datasets library; really
  • 00:06:27 the same process is done wherever
  • 00:06:29 you're loading your dataset from but
  • 00:06:31 specifically because we're working with
  • 00:06:32 tensorflow datasets the way we're
  • 00:06:34 going to do this is tfds
  • 00:06:35 dot load and now the name is
  • 00:06:40 going to be equal to rock paper scissors
  • 00:06:43 that same file that we saw before and
  • 00:06:47 the other thing we're going to access
  • 00:06:49 here is, if I actually look at this
  • 00:06:52 method's signature you'll see some information:
  • 00:06:54 split is the next keyword
  • 00:06:57 that's what we're going to grab here we
  • 00:06:59 want to separate our training data and
  • 00:07:01 our test data so we're gonna say split
  • 00:07:02 equals train here and then for the other
  • 00:07:10 one for our test data we will do the
  • 00:07:13 same thing but this time our split
  • 00:07:22 will be equal to test and that matches
  • 00:07:27 up with what we found when we looked up
  • 00:07:29 the information on this dataset so we
  • 00:07:31 can go ahead and run that.
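The two load calls end up looking like this (ds_train and ds_test are the names used through the rest of the tutorial):

```python
# Each split comes back as a tf.data.Dataset of {'image', 'label'} examples
ds_train = tfds.load(name='rock_paper_scissors', split='train')
ds_test = tfds.load(name='rock_paper_scissors', split='test')
```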
  • 00:07:36 One thing I will note is that there's an
  • 00:07:38 additional option we can add to one of
  • 00:07:40 our previous cells if you don't want all
  • 00:07:43 this information to pop up and kind of
  • 00:07:45 mess up the neatness of our notebook
  • 00:07:48 and we can go ahead, back up where
  • 00:07:53 we imported the library, and call
  • 00:08:00 disable_progress_bar and what we'll see
  • 00:08:04 when we run that again, and I can show
  • 00:08:08 you that: tfds.disable_progress_bar()
  • 00:08:12 and when we run this command again instead
  • 00:08:15 of seeing all these bars and all
  • 00:08:17 this extra information it'll just run
  • 00:08:19 neatly
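That single call is:

```python
# Silence tfds download/extraction progress bars for cleaner cell output
tfds.disable_progress_bar()
```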
  • 00:08:23 Okay, so now we've loaded in our data. I think the next thing that's
  • 00:08:25 gonna be helpful for us to do is
  • 00:08:27 actually see some examples of what we're
  • 00:08:29 working with because it's really hard to
  • 00:08:31 start writing a neural net unless you've
  • 00:08:33 actually seen some images so let's go
  • 00:08:36 ahead and show some examples and so
  • 00:08:42 with the tensorflow datasets library we
  • 00:08:47 can do this as follows we can do tensor
  • 00:08:51 flow datasets dot show_examples and
  • 00:08:55 this is going to be using two things
  • 00:08:57 you're going to have the dataset info as
  • 00:08:59 well as the specific data set so the
  • 00:09:04 info we loaded up here with our builder
  • 00:09:07 that's the info and then we can either
  • 00:09:09 use ds_train or ds_test as the
  • 00:09:12 actual dataset for that method so we have
  • 00:09:14 info and then we have ds_train.
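In code (note: the (info, dataset) argument order matches the video; newer tfds versions expect tfds.show_examples(ds, info) instead):

```python
# Draw a grid of sample images with their class labels
fig = tfds.show_examples(info, ds_train)
```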
  • 00:09:18 And really, again, we're doing this in a
  • 00:09:21 specific way here with tensorflow
  • 00:09:24 datasets, but oftentimes it's the same process:
  • 00:09:26 no matter how you're working with your
  • 00:09:28 data it's good to kind of follow and
  • 00:09:29 just get a feel for your data before you
  • 00:09:31 start trying to build a model around it
  • 00:09:33 okay so this is what we got you see that
  • 00:09:35 their hands doing rock paper and
  • 00:09:38 scissors and one thing you might notice
  • 00:09:40 about these hands is that they're a
  • 00:09:42 little bit some of them are a little bit
  • 00:09:43 funky looking like this one right here
  • 00:09:45 I'm looking at and that is I remember
  • 00:09:48 looking into the data set a little bit I
  • 00:09:49 believe these are all artificially
  • 00:09:51 generated images of rock paper and
  • 00:09:52 scissors so they're not actual images as
  • 00:09:58 far as I know but they work the same
  • 00:10:00 when we're kind of going ahead and
  • 00:10:03 training our network we can kind of
  • 00:10:06 utilize them as we'd use a real image
  • 00:10:08 but here yeah here's what we're working
  • 00:10:10 with and they have the labels 0 for rock
  • 00:10:12 paper is a 1, and scissors is a 2. All right,
  • 00:10:16 we have the examples now let's do a
  • 00:10:20 little bit of additional data prep
  • 00:10:27 This step is really kind of for
  • 00:10:30 any sort of image classification task
  • 00:10:32 I'd say this is a pretty similar type of
  • 00:10:35 processing I would probably do but we're
  • 00:10:37 basically converting the tensorflow
  • 00:10:40 datasets format into a numpy format
  • 00:10:43 that's a little bit more Universal a
  • 00:10:45 little bit more easy to work with in my
  • 00:10:47 opinion so that's what we're gonna do in
  • 00:10:49 this next step I thought the easiest way
  • 00:10:51 to figure out how to do this was to
  • 00:10:53 first look at the documentation for the
  • 00:10:56 data sets library within tensorflow so
  • 00:11:00 into tensorflow org slash data sets
  • 00:11:03 slash overview and in there it told me
  • 00:11:06 how to iterate over a data set and
  • 00:11:08 that's what we have and the key insight
  • 00:11:10 I saw here was that in the data set
  • 00:11:13 there's images and labels so those are
  • 00:11:18 the two items that we really want when
  • 00:11:19 we're breaking up our data so if we
  • 00:11:22 iterate over the examples in our data
  • 00:11:24 set we can grab those things and to make
  • 00:11:26 it a little bit more concise let's do
  • 00:11:28 that in the form of a list comprehension
  • 00:11:31 so we're going to say our train images
  • 00:11:36 are going to be equal to example and
  • 00:11:41 then we're going to use get the image
  • 00:11:43 for example in the data set and that's
  • 00:11:49 basically all we have to do. I guess
  • 00:11:52 we could run this real quick and see
  • 00:11:55 what happens...
  • 00:11:55 I don't know... oh, it's ds_train, sorry
  • 00:12:03 this might take a second to run
  • 00:12:10 okay so let's just look at the first
  • 00:12:13 image in there and let's just get the
  • 00:12:17 type of that so this is a tensor flow
  • 00:12:24 tensor so what I want to do instead is
  • 00:12:28 grab it as a numpy array and so this dot
  • 00:12:34 numpy is a shorthand thing we can do
  • 00:12:38 with tensor flow to get it in numpy
  • 00:12:40 format so rerun these two cells again
  • 00:12:42 let's see what happens okay so now we
  • 00:12:45 have it in the form of a numpy array the
  • 00:12:49 one thing i do want to know though is
  • 00:12:50 what is the shape of the overall images
  • 00:12:55 and oh i guess we have one other issue
  • 00:12:59 is that we did this at a list
  • 00:13:00 comprehension but the surrounding data
  • 00:13:03 type is just a list right now and if we
  • 00:13:05 want to make it also a numpy array we
  • 00:13:07 need to wrap it in np.array okay so
  • 00:13:10 run those two things again okay we have
  • 00:13:14 two thousand five hundred twenty by 300
  • 00:13:18 by 300 by three so it's two thousand
  • 00:13:21 five hundred twenty images with this
  • 00:13:24 shape the one thing I would say and this
  • 00:13:26 is really gonna be problem
  • 00:13:28 dependent, is that we are working with
  • 00:13:31 you know these hands and in my opinion
  • 00:13:35 the colors don't really matter too much
  • 00:13:37 so right now we have these three colour
  • 00:13:40 channels RGB but in my opinion we only
  • 00:13:43 really need one color Channel because
  • 00:13:45 we're really looking at trying to find
  • 00:13:47 edges and whatnot so I think we have
  • 00:13:49 three color channels we might kind of
  • 00:13:52 have more data than we actually need so
  • 00:13:55 we're trying to reduce the number of
  • 00:13:57 things that the network has to learn so
  • 00:14:00 ultimately I think it would be nice to
  • 00:14:02 have this in grayscale so we're gonna keep
  • 00:14:04 everything for the
  • 00:14:08 first two dimensions so we need to take
  • 00:14:10 all these same points still but for the
  • 00:14:13 last dimension the color Channel we'll
  • 00:14:15 just take the first color Channel we'll
  • 00:14:17 just take that red color Channel and I
  • 00:14:19 think that's good
  • 00:14:21 and so now let's see what we have now we
  • 00:14:23 don't have a color channel that's cool
  • 00:14:26 we'll have to do a little bit of
  • 00:14:27 additional processing but let's now in
  • 00:14:30 addition to train images let's get train
  • 00:14:31 labels and that will be np.array; this will
  • 00:14:35 be pretty similar
  • 00:14:36 it'll be example['label'].numpy();
  • 00:14:44 we can just take whatever label it is
  • 00:14:46 because it's just going to be an integer,
  • 00:14:48 as we looked at the data a little bit
  • 00:14:50 earlier in this tutorial for example in
  • 00:14:53 ds_train. OK, I'm going to now just copy
  • 00:14:58 this code and also do the same thing for
  • 00:15:03 the test images so I'm just going to
  • 00:15:05 change this out to test change this out
  • 00:15:07 to test and change this to test labels
  • 00:15:11 and this to test. All right, let's load
  • 00:15:18 all this.
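Put together, the conversion cell looks something like this (taking only index 0 on the channel axis gives the red-channel grayscale version described above):

```python
# Convert each tfds example to numpy; keep just one color channel (grayscale)
train_images = np.array([example['image'].numpy()[:, :, 0] for example in ds_train])
train_labels = np.array([example['label'].numpy() for example in ds_train])

test_images = np.array([example['image'].numpy()[:, :, 0] for example in ds_test])
test_labels = np.array([example['label'].numpy() for example in ds_test])
```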
  • 00:15:22 So with this cell we've now loaded everything in numpy format;
  • 00:15:25 we need to do a little bit of
  • 00:15:27 processing before we're ready to pass it
  • 00:15:29 on to our network and there's a couple
  • 00:15:30 items here so once again we can see the
  • 00:15:33 shape and if we wanted to check out the
  • 00:15:36 test images shape it'll be pretty
  • 00:15:37 similar except lower number of examples
  • 00:15:40 right first thing is we need to reshape
  • 00:15:43 these so I want to just say train images
  • 00:15:47 equals train images dot reshape and if
  • 00:15:52 you remember what it was I'm just going
  • 00:15:54 to comment this out just to see this
  • 00:15:56 again
  • 00:16:01 we had 2520 by 300 by 300 so we're
  • 00:16:05 gonna reshape this one to
  • 00:16:09 2520 by 300 by 300 by 1 and the reason
  • 00:16:15 we have this one here is that whenever
  • 00:16:17 we use convolutional neural networks in
  • 00:16:19 Keras we always have to have some sort
  • 00:16:21 of semblance of a color channel and so
  • 00:16:24 by changing this to one we don't change
  • 00:16:27 the number of values we have at all we
  • 00:16:29 just are basically letting Keras know
  • 00:16:31 hey this is just a grayscale these are
  • 00:16:34 just grayscale images so do that and
  • 00:16:38 also now I need to just figure out my
  • 00:16:40 shape of the test images 372 so I'll do
  • 00:16:47 the same line basically but now we'll do
  • 00:16:51 372 by 300 by 300 by 1 and this is 372
  • 00:17:02 okay that now reshapes them and we didn't
  • 00:17:08 get any errors so that's cool
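The two reshape calls, for reference (the counts come from the split info we looked up earlier):

```python
# Add a trailing channel axis so Keras treats these as grayscale images
train_images = train_images.reshape(2520, 300, 300, 1)
test_images = test_images.reshape(372, 300, 300, 1)
```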
  • 00:17:11 Some other things we'll want to do here: so we've
  • 00:17:13 got them reshaped, and I think, let's see if
  • 00:17:17 I can check that info... is that something... I
  • 00:17:21 think it's dtype. Okay, this is
  • 00:17:28 good, this is uint8, so it's all
  • 00:17:31 integer values, and if they're
  • 00:17:33 RGB I guess they're just our channel
  • 00:17:36 values from 0 to 255. When we're doing
  • 00:17:40 image classification we like to work
  • 00:17:42 with float values specifically between
  • 00:17:44 the range of 0 to 1 so one thing we're
  • 00:17:47 gonna do is we're gonna say,
  • 00:17:51 train images equals
  • 00:17:54 train images dot astype, and now it's
  • 00:17:59 going to be float32 instead of the
  • 00:18:02 uint8 that we had before so this is just
  • 00:18:05 basically getting us ready to be able to
  • 00:18:07 convert it from a scale of 0 to 1
  • 00:18:09 instead of 0 to 255
  • 00:18:16 and so the last step we're gonna do is
  • 00:18:20 say train_images /= 255
  • 00:18:26 so the max value we can have is 255
  • 00:18:29 because RGB values are between 0 and 255
  • 00:18:32 so by doing this we're scaling every
  • 00:18:34 value to be between 0 & 1 and this is
  • 00:18:38 just a good common practice that helps
  • 00:18:41 you classify... it basically helps the
  • 00:18:44 network learn better than if you use the
  • 00:18:47 0 to 255 values you could leave it 0 to
  • 00:18:50 255 but it's just ultimately it's gonna
  • 00:18:53 probably decrease your performance a bit
  • 00:18:55 so it's a common step to normalize
  • 00:18:57 between 0 and 1 and that's what we're
  • 00:18:59 doing right here.
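The conversion and scaling together (the video only shows the train side on screen, but the same steps presumably have to happen to test_images for the later evaluate call to work):

```python
train_images = train_images.astype('float32')
test_images = test_images.astype('float32')

# Channel values span 0..255, so dividing rescales everything to 0..1
train_images /= 255
test_images /= 255
```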
  • 00:19:01 I think that's all we need to do; the labels are all fine from
  • 00:19:04 before so we don't need to do any
  • 00:19:05 additional processing to the label so
  • 00:19:07 we'll just rerun this so and now if we
  • 00:19:11 look at training images data type we'll
  • 00:19:15 see it's float32 and if we actually
  • 00:19:18 wanted to like look at one of the values
  • 00:19:21 here we'll see that all the values are
  • 00:19:25 between 0 & 1 and we can also even look
  • 00:19:28 at the shape here of a single example
  • 00:19:33 300 by 300 by one that looks all good
  • 00:19:36 so I think that's good for the
  • 00:19:37 processing steps alright now let's go
  • 00:19:40 ahead and train our first neural network
  • 00:19:42 for this task and I'm gonna go through
  • 00:19:46 this pretty quickly because I've already
  • 00:19:49 covered kind of a lot of this in my
  • 00:19:51 previous neural network video I had to
  • 00:19:53 start off let's define a model and that
  • 00:19:55 will be a sequential model and so in
  • 00:19:59 that we're gonna have to pass in some
  • 00:20:01 layers so the first layer we'll pass in
  • 00:20:05 is keras.layers.Dense so as the kind of
  • 00:20:09 basic approach we'll make it a fully
  • 00:20:11 connected network so let's say that
  • 00:20:13 the first layer of the dense network is
  • 00:20:16 512 when we're defining our first layer
  • 00:20:20 we're gonna want to define the input shape
  • 00:20:22 so in our case now it's 300
  • 00:20:24 by three hundred by one and then we'll
  • 00:20:28 also want to define the activation and a
  • 00:20:30 good one to use is ReLU okay that's
  • 00:20:33 our first layer our next layer is going
  • 00:20:35 to be another dense layer so we'll do
  • 00:20:38 keras.layers.Dense and we'll make this
  • 00:20:42 one a little bit smaller so how about we
  • 00:20:43 do 256 here and then make the activation
  • 00:20:48 another ReLU and then finally let's
  • 00:20:52 define our output layer and so the
  • 00:20:54 output layer is going to be the same
  • 00:20:56 size as you have labels that you're
  • 00:20:58 trying to classify it between so in our
  • 00:21:01 case this will be a dense layer of three
  • 00:21:03 because we have rock paper and scissors
  • 00:21:05 and let's also define an activation layer
  • 00:21:09 for our output and when we're doing
  • 00:21:12 classification, just identifying
  • 00:21:15 one label softmax is a good choice here
  • 00:21:17 alright that looks good
  • 00:21:19 and then we're gonna have to setup the
  • 00:21:21 loss function for this so I'll do model
  • 00:21:23 compile we'll pass in the atom optimizer
  • 00:21:28 we will use for a loss function we will
  • 00:21:32 use keras losses dot sparse categorical
  • 00:21:38 cross entropy and then finally we will
  • 00:21:43 pass in our metrics and that will just
  • 00:21:47 be accuracy okay so we have our loss
  • 00:21:51 function set up and finally we need to
  • 00:21:55 fit our data to the model so model dot
  • 00:21:58 fit will do train or train images train
  • 00:22:03 labels and then we'll do epochs, let's
  • 00:22:07 say, equal to five, and batch size equals 32...
  • 00:22:14 ah, input shape is not defined;
  • 00:22:19 it equals 300 by 300 by one
  • 00:22:24 okay, run. All right, so we got an issue:
  • 00:22:28 incompatible shapes, [32, 1]
  • 00:22:31 versus [32, 300, 300],
  • 00:22:35 and ultimately I think our issue here is
  • 00:22:37 because we're passing in an input shape
  • 00:22:39 and this is kind of a complex you know
  • 00:22:41 this is an actual image and it has a
  • 00:22:42 complex dimensionality so instead of
  • 00:22:45 doing this what we're gonna do and this
  • 00:22:47 is usually what you're going to do if
  • 00:22:49 you're trying to use a fully connected
  • 00:22:51 layer for images: we're gonna use keras
  • 00:22:53 dot layers, we're going to add one more
  • 00:22:55 layer and this is gonna be called a
  • 00:22:56 flatten layer and basically what this
  • 00:22:59 does is it transforms that 300 by 300
  • 00:23:01 image into just a single column which
  • 00:23:05 would probably be of dimension 90,000 so
  • 00:23:09 that's what we're doing with that layer
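Assembled from the steps above, with the Flatten fix in place (where exactly input_shape ended up isn't shown on screen, so putting it on the Flatten layer is an assumption):

```python
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(300, 300, 1)),  # 300x300 -> 90,000 values
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(3, activation='softmax'),      # one output per class
])

model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5, batch_size=32)
```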
  • 00:23:10 so now I think we should be able to run
  • 00:23:12 this training step and it's running and
  • 00:23:16 because this takes a little while I'm
  • 00:23:17 going to just kind of speed through this
  • 00:23:19 section and I'll do that probably in
  • 00:23:21 general all right it finished and
  • 00:23:24 looking at our results you see that
  • 00:23:27 after the last epoch we had about 90%
  • 00:23:31 accuracy so you might be like wow like
  • 00:23:33 this was so easy we've already
  • 00:23:35 classified these hands these
  • 00:23:37 rock-paper-scissors hands like very well
  • 00:23:40 well I think though it's pretty telling
  • 00:23:42 if we try to do a model dot evaluate on
  • 00:23:45 the test images and the test labels this
  • 00:23:51 will tell us how well our model
  • 00:23:52 generalizes to unseen data and so if
  • 00:23:56 this is also high then like we're golden
  • 00:23:57 but I don't think it will be yeah and as
  • 00:24:02 you see here it got 50% on the test data
  • 00:24:07 so there's a big disconnect you know we
  • 00:24:09 were able to classify 90% of the
  • 00:24:11 training examples properly but only 50%
  • 00:24:14 of the test, so it's not doing a very
  • 00:24:17 good job on the test data and the big
  • 00:24:20 disconnect here is that we are using
  • 00:24:23 this fully connected layer for these 300
  • 00:24:25 by 300 images and so that's a total of a
  • 00:24:28 90 thousand like single I guess pixels
  • 00:24:32 that we are connecting to the next layer
  • 00:24:34 of our network, and ultimately we're
  • 00:24:37 putting
  • 00:24:38 too much importance in
  • 00:24:40 those little single pixels so what's
  • 00:24:42 happening here is that we're basically
  • 00:24:44 we're overfitting to our training data
  • 00:24:46 and we're not really learning good
  • 00:24:49 patterns because as you see in our test
  • 00:24:52 data we got 50 percent so ultimately
  • 00:24:54 this is what's going to lead us to want
  • 00:24:55 to use convolutional neural networks to
  • 00:24:58 kind of get a more sense of
  • 00:25:01 generalizable features within our
  • 00:25:03 rock-paper-scissors images so that's
  • 00:25:06 what we're going to do for our next
  • 00:25:07 network I'm not going to go through the
  • 00:25:09 details of a convolutional network in
  • 00:25:11 great depth but basically we can think
  • 00:25:13 of all of our images as a grid and
  • 00:25:14 before we are using each pixel to feed
  • 00:25:17 into our network now within our grid
  • 00:25:20 image we are passing over a smaller grid
  • 00:25:23 across the entire image and basically
  • 00:25:27 with these smaller grids that we're
  • 00:25:28 passing on the in the entire image these
  • 00:25:31 convolutions were performing we're
  • 00:25:33 learning features that occupy a little
  • 00:25:36 bit more space and as we think about it
  • 00:25:38 like images there's so much variance in
  • 00:25:40 how they are so we need to like at a
  • 00:25:42 higher level be able to pick up on those
  • 00:25:44 features so that's why we're gonna pass
  • 00:25:46 these smaller grids over our bigger grid
  • 00:25:49 to hopefully pull out general features
  • 00:25:52 in our images all right and so we're
  • 00:25:55 gonna again
  • 00:25:56 start this off by defining a sequential
  • 00:25:58 model and this time what we're gonna do
  • 00:26:03 is our first layer is going to be
  • 00:26:06 keras dot layers dot Conv... Conv for
  • 00:26:11 convolution, and 2D because this is a
  • 00:26:13 2d image and then we have to pass in the
  • 00:26:18 appropriate parameters so if we look at
  • 00:26:22 our Google Colab we need to first pass
  • 00:26:24 in filters so this is basically how many
  • 00:26:27 different smaller grids we're gonna pass
  • 00:26:30 on top of our image and each of these
  • 00:26:33 smaller grids is going to have a
  • 00:26:34 different kind of shape and try to find
  • 00:26:36 a different pattern I'll link to some
  • 00:26:38 resources that can explain this in more
  • 00:26:40 depth it's basically how many times
  • 00:26:42 we're passing over a smaller grid on our
  • 00:26:44 image so we're gonna say this is 64
  • 00:26:47 that's just kind of a safe first
  • 00:26:48 answer. Our kernel size:
  • 00:26:51 this is how big our smaller grid is so
  • 00:26:53 if I said three and I didn't pass in any
  • 00:26:56 other parameters this would assume three
  • 00:26:57 by three I think you could also pass in
  • 00:27:00 if you wanted different sizes in
  • 00:27:02 the x and y directions you could do like
  • 00:27:04 three by six, and
  • 00:27:06 that'd be a rectangular
  • 00:27:08 kernel size so we'll say let's just say
  • 00:27:11 three to start off and we'll leave the
  • 00:27:15 strides at (1, 1)
  • 00:27:17 that just means they'll move one every
  • 00:27:18 time so it's gonna be a sliding window
  • 00:27:20 of three by three all right
  • 00:27:22 and how about we give that an
  • 00:27:27 activation of ReLU just like we have
  • 00:27:31 been. And now we actually do need
  • 00:27:33 to define the input shape and this time
  • 00:27:35 it will expect a multi dimensional shape
  • 00:27:38 so we can do 300 by 300 by one that's
  • 00:27:41 totally fine now I'm going to just pass
  • 00:27:45 in another convolutional layer so we're
  • 00:27:49 gonna do another 2d convolution this
  • 00:27:51 time we're gonna make it a little bit
  • 00:27:52 smaller
  • 00:27:53 we'll say thirty-two filters with kernel
  • 00:27:56 size of three, activation equals ReLU.
  • 00:28:00 And all right, I think that's good;
  • 00:28:04 we have two convolutional layers
  • 00:28:06 and now we're gonna want to pass this to
  • 00:28:08 a kind of the output layer and to do
  • 00:28:11 that we first need to flatten the output
  • 00:28:14 of all these convolutions so we're gonna
  • 00:28:16 pass another flatten layer
  • 00:28:18 keras dot layers dot Flatten and then
  • 00:28:24 we'll just like in the last case we'll
  • 00:28:25 do keras.layers.Dense and this is our
  • 00:28:29 output layer, so we'll do three with an
  • 00:28:31 activation of softmax. Cool, it looks good,
  • 00:28:37 and we're just going to go ahead and
  • 00:28:39 copy our loss function from above.
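The CNN as described so far, assembled (a sketch):

```python
model = keras.Sequential([
    keras.layers.Conv2D(64, 3, activation='relu', input_shape=(300, 300, 1)),
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(3, activation='softmax'),
])

model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
```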
  • 00:28:48 Great. Let's go ahead and try to train this...
  • 00:28:52 it's working but it seems to be going
  • 00:28:57 really really slow like too slow to
  • 00:29:01 let's see yeah really really slow
  • 00:29:05 one insight here and this is something
  • 00:29:07 it might have been annoying that I
  • 00:29:08 haven't told you already
  • 00:29:10 but within google collab we can actually
  • 00:29:11 run on a GPU if we want to so like right
  • 00:29:14 now it shows that like this single epoch
  • 00:29:17 will take ten minutes and it's not ideal
  • 00:29:19 so what we're gonna do is we're gonna
  • 00:29:22 stop this real quick if I can stop stop
  • 00:29:27 okay no no stop I stopped it but then I
  • 00:29:29 tried again all right so we're gonna
  • 00:29:33 real quick just change our run time so
  • 00:29:36 you go to run time and then change run
  • 00:29:38 time type we're going to use a GPU
  • 00:29:40 hardware accelerator and really only use
  • 00:29:44 a GPU if you need one, as they kind of say,
  • 00:29:46 so I'm gonna save that we're gonna go
  • 00:29:50 back and now basically we need to
  • 00:29:52 rewrite everything because it's using a
  • 00:29:53 different hardware so I'm just
  • 00:29:56 going to run all the cells that are
  • 00:29:57 relevant here
  • 00:30:13 and we don't have to run this one again
  • 00:30:15 because that's all focused on right now
  • 00:30:18 we'll run this one again once everything
  • 00:30:20 else is done let's just let these
  • 00:30:22 execute all right and now let's run our
  • 00:30:27 convolutional neural net last time it
  • 00:30:29 said it was gonna take like ten minutes
  • 00:30:30 for a single epoch; let's see how
  • 00:30:32 well it runs with the GPU and look at
  • 00:30:37 that from what it was taking about ten
  • 00:30:40 minutes we've now run a single epoch in
  • 00:30:42 about ten seconds so that's really cool
  • 00:30:46 look at this it's like really performing
  • 00:30:49 well now that we've used that
  • 00:30:50 convolutional layer on top and wow it's
  • 00:30:54 at the last layer here it's predicting
  • 00:30:56 all the training examples properly but
  • 00:31:00 you know the ultimate test is it doesn't
  • 00:31:02 matter if we can do really well in
  • 00:31:03 training data; that has no
  • 00:31:05 meaning unless we can do well on test
  • 00:31:08 data so we're gonna do model evaluate
  • 00:31:10 test images and test labels and then see
  • 00:31:13 how we do geez we actually did worse
  • 00:31:18 than we did before with the fully
  • 00:31:21 connected approach so obviously we over
  • 00:31:25 fit again so let's see how we can
  • 00:31:28 improve this convolutional approach and
  • 00:31:30 hopefully avoid our overfitting problem
  • 00:31:32 all right, so now we're going to
  • 00:31:35 define another cell;
  • 00:31:37 I'll just call this a better
  • 00:31:39 convolutional Network all right how can
  • 00:31:44 we improve this network that obviously
  • 00:31:45 overfit I'm gonna just go ahead and copy
  • 00:31:49 this into our new cell no need to start
  • 00:31:53 from scratch but let's look at this what
  • 00:31:56 is the main issue that we're having and
  • 00:31:58 why are we overfitting and this is
  • 00:32:00 definitely not obvious unless you've
  • 00:32:02 kind of started to work with image
  • 00:32:04 classification a bit I would say the the
  • 00:32:07 problem kind of boils down to the fact
  • 00:32:09 that we don't have that many training
  • 00:32:11 examples; you know, we only have under
  • 00:32:14 3000 training examples, or, yeah, I
  • 00:32:18 forget the exact number... yeah, I think we
  • 00:32:21 have 2520 training examples so that's
  • 00:32:24 not that many in the grand scheme of
  • 00:32:26 things
  • 00:32:26 and also the other real big issue that I
  • 00:32:29 see here is that our images are 300 by
  • 00:32:32 300 by one but our kernel size here the
  • 00:32:37 small grid that we're passing on top of
  • 00:32:39 the image when we're doing the
  • 00:32:40 convolution is just three by three so
  • 00:32:45 it's still like a really small rectangle
  • 00:32:48 we're passing on top of the big
  • 00:32:50 rectangle, and
  • 00:32:53 obviously, if we're thinking
  • 00:32:55 about our image we don't need
  • 00:32:57 high-resolution images as long as like
  • 00:33:00 the base features are there we should be
  • 00:33:02 good so ultimately what we want to do
  • 00:33:04 here is reduce the size of this input
  • 00:33:07 image before we start performing the
  • 00:33:10 convolutions on it so what we're gonna
  • 00:33:12 do is an average pool; we're gonna add an
  • 00:33:15 average pooling layer so keras.layers
  • 00:33:19 dot Average... I think it's
  • 00:33:24 AveragePooling2D, that sounds good to
  • 00:33:26 me. And what are our options here? Pool
  • 00:33:30 size; we'll say our pool size is going to
  • 00:33:34 be 6. If it's square
  • 00:33:39 we can just pass in one number, but
  • 00:33:42 we could also pass in multiple
  • 00:33:44 coordinates. And then our stride: our
  • 00:33:47 stride is how far if we have a six by
  • 00:33:49 six box our stride is how much we move
  • 00:33:52 that box each time we do an average so
  • 00:33:55 I'm going to say that this is three so
  • 00:33:57 basically what we're doing here in this
  • 00:33:59 layer, as we pass over the image, we're
  • 00:34:02 gonna also have to pass in the input
  • 00:34:03 shape here so what we're basically doing
  • 00:34:10 is that we have this 300 by 300 input
  • 00:34:13 image and what we're doing is basically
  • 00:34:16 passing a six by six box on top of that
  • 00:34:22 image, and every time we do that we're
  • 00:34:25 averaging all the pixels and that's our
  • 00:34:25 kind of new representation of that box
  • 00:34:28 and instead of moving the box entirely
  • 00:34:30 over, we're moving the box three, so every
  • 00:34:34 two times we move the box we're at an
  • 00:34:37 entirely new location, so you
  • 00:34:39 basically have
  • 00:34:39 this, and this, and this; the second
  • 00:34:44 time you move that box it's kind of
  • 00:34:45 covering new squares from the
  • 00:34:48 first time you did it. So I guess the
  • 00:34:50 better way to say this is that each
  • 00:34:52 pixel is counted twice in this
  • 00:34:53 representation but basically we're gonna
  • 00:34:55 boil this down: if we have a six-by-six box
  • 00:35:01 moving on top of it, it's going to boil
  • 00:35:05 this down from 300 by 300 to, I believe, if
  • 00:35:09 I'm not mistaken, 100 by 100 as the new
  • 00:35:13 kind of input, and you could reduce this
  • 00:35:15 further if you wanted to; if we
  • 00:35:18 did six by six here with a stride of six I
  • 00:35:21 think this would reduce it down to 50 by
  • 00:35:23 50 so that's another thing to try but I
  • 00:35:25 want to count each pixel twice here and
  • 00:35:29 then we'll feed this now into our
  • 00:35:32 convolutional Network so this is
  • 00:35:34 basically just the image with instead of
  • 00:35:36 size 300 by 300 after this layer it's
  • 00:35:39 100 by 100 and let's see what happens to
  • 00:35:42 our network when we just do that
  • 00:35:44 reduction in image size so still like
  • 00:35:48 basically classifies the training data very well;
  • 00:35:50 let's see what happens when we evaluate
  • 00:35:54 it: so, test images, test labels; we want to do
  • 00:36:01 better than last time okay
  • 00:36:03 56% I mean that's definitely better I
  • 00:36:06 think we can still do better here and I
  • 00:36:10 would say what our issues here are now
  • 00:36:12 and this kind of comes with practice
  • 00:36:14 and you know I wouldn't notice these
  • 00:36:16 things unless I really looked at this
  • 00:36:18 for a while and I have looked at this
  • 00:36:20 example before doing this so I have more
  • 00:36:23 of an intuition than I would coming into this
  • 00:36:25 exercise cold. But we have a lot here;
  • 00:36:28 another very common thing in
  • 00:36:29 convolutional neural nets is to do some
  • 00:36:34 max pooling
  • 00:36:37 and this is basically for the same
  • 00:36:39 reason is we don't want to take too many
  • 00:36:42 dimensions when we're doing these things
  • 00:36:44 so max pool 2d is going to basically do
  • 00:36:49 the similar thing as an average pool but
  • 00:36:51 now we're gonna pass over our output
  • 00:36:53 with two by two grids and just take the
  • 00:36:56 max pixel in every 2×2 grid and this
  • 00:36:59 time we're moving the 2×2 grid by two each
  • 00:37:02 time, so each pixel is only counted once,
  • 00:37:04 but we're just taking the max pixel
  • 00:37:06 value from each of those so let's see
  • 00:37:09 what happens if we add that and I
  • 00:37:11 screwed something up now
  • 00:37:15 keras dot layers...
  • 00:37:23 all right how we doing all right
  • 00:37:26 66% accuracy on the test data that's a
  • 00:37:29 lot better and it's ultimately because
  • 00:37:30 we're reducing the size of things and
  • 00:37:33 basically individual pixels don't have
  • 00:37:35 as much weight anymore so it's smaller
  • 00:37:37 input space smaller parameter size so
  • 00:37:40 that our model should generalize more
  • 00:37:43 but there's one major thing that I'm
  • 00:37:44 thinking of that we can do to probably
  • 00:37:48 help things out and that is to use
  • 00:37:50 dropout so one of our main issues with
  • 00:37:54 our training examples is that we don't
  • 00:37:57 have many
  • 00:37:57 and so what dropout will do is that
  • 00:38:00 after we're done with the max pool 2d
  • 00:38:02 layer we're going to basically cut out
  • 00:38:04 50% of the connections so as we train
  • 00:38:09 things, 50% of the connections at this
  • 00:38:11 layer are going to be dropped out and
  • 00:38:12 ultimately the epochs will be more
  • 00:38:15 effective because the data will be kind
  • 00:38:17 of more interesting because these things
  • 00:38:20 are getting dropped out randomly so each
  • 00:38:22 of the individual connections will kind
  • 00:38:24 of have to generalize more and be able
  • 00:38:26 to handle more variation so this is
  • 00:38:29 going to kind of help us simulate having
  • 00:38:32 more training examples by using this
  • 00:38:34 dropout layer, and 50% is good; it's
  • 00:38:37 kind of in research the common value to
  • 00:38:40 go to with this dropout layer so let's
  • 00:38:42 try this and thankfully we're on the GPU
  • 00:38:45 so everything's going really fast so we
  • 00:38:50 went from 66 and... okay, and
  • 00:38:53 we're about the same, 65 percent; it didn't
  • 00:38:56 seem to help too much the last thing
  • 00:38:58 that I'm gonna recommend doing and this
  • 00:39:02 is something that you just really play
  • 00:39:04 around with but we're gonna add another
  • 00:39:06 dense layer here so instead of going
  • 00:39:09 from everything to three we're going to
  • 00:39:11 try to learn an
  • 00:39:12 intermediate representation before we
  • 00:39:16 actually do the output so we're hoping
  • 00:39:17 that maybe this layer can capture some
  • 00:39:20 higher-level information and boil it
  • 00:39:22 down to the three better than just
  • 00:39:23 passing all the outputs of the
  • 00:39:25 convolution and the dropout and
  • 00:39:27 everything straight to our
  • 00:39:28 classification layer so make this 128
  • 00:39:31 we'll use... let's see what happens if we use
  • 00:39:34 ReLU activation; add a comment here. All right.
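Piecing together everything described in this section (average pooling, the two conv layers, max pooling, dropout, and the new intermediate dense layer), the improved model might look like this:

```python
model = keras.Sequential([
    # Shrink 300x300 inputs to roughly 100x100 by averaging 6x6 patches, stride 3
    keras.layers.AveragePooling2D(pool_size=6, strides=3, input_shape=(300, 300, 1)),
    keras.layers.Conv2D(64, 3, activation='relu'),
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.MaxPool2D(pool_size=2, strides=2),  # keep the strongest value per 2x2 patch
    keras.layers.Dropout(0.5),                       # drop half the activations while training
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),      # intermediate representation
    keras.layers.Dense(3, activation='softmax'),
])

model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
```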
  • 00:39:47 moment of truth all right look at that
  • 00:39:50 70 percent accuracy that's pretty good
  • 00:39:53 on the test data, on unseen data; I'm
  • 00:39:54 pretty happy with that so I think as far
  • 00:39:57 as on your own you could tune these
  • 00:40:00 values and play around with adding more
  • 00:40:02 values and whatnot
  • 00:40:03 maybe you'll get better results but our
  • 00:40:05 next and final basically step of this
  • 00:40:07 example will be to look at automatically
  • 00:40:10 tuning the values in this network and
  • 00:40:12 maybe adding more layers to see if we
  • 00:40:14 can get even better test accuracy than
  • 00:40:16 69% to automatically tune our network
  • 00:40:20 we're gonna use a tool called Keras
  • 00:40:24 Tuner and it just does a lot of the work
  • 00:40:28 for us
  • 00:40:29 and it makes it very easy to find the
  • 00:40:30 best parameters for a network so I think
  • 00:40:33 the easiest way to learn about Keras
  • 00:40:35 Tuner is to go to the Keras Tuner
  • 00:40:37 website I'll link this in the
  • 00:40:40 description but here we are so to start
  • 00:40:45 off we need to install it so we can use
  • 00:40:47 pip install -U keras-tuner so I'll
  • 00:40:50 just copy that command and I'll run that
  • 00:40:53 in my notebook.
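That install cell is just:

```python
# The package name on PyPI is keras-tuner; -U upgrades if already installed
!pip install -U keras-tuner
```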
  • 00:40:56 And while that's going, I think we can look back at that
  • 00:40:58 documentation and just read through
  • 00:41:02 usage the basics and I think it already
  • 00:41:04 kind of starts making things clear so
  • 00:41:07 here we have this build model with this
  • 00:41:11 hyper parameter variable that we're
  • 00:41:14 passing through and as we see this is
  • 00:41:17 used was basically we're defining a
  • 00:41:19 sequential model and this is a slightly
  • 00:41:21 different format than we defined ours,
  • 00:41:22 so we'll have to kind of refactor our current
  • 00:41:26 network a little bit, but we are, you know,
  • 00:41:29 basically in this step they're adding a
  • 00:41:31 dense layer a fully connected layer and
  • 00:41:33 this time the big difference is that the
  • 00:41:37 number of units for that single layer is
  • 00:41:40 defined as a variable like it can be
  • 00:41:43 different values and this is what we're
  • 00:41:45 testing and playing around with so it's
  • 00:41:47 saying
  • 00:41:47 basically that we want to try the min
  • 00:41:49 value of 30 to a max value of 512 for
  • 00:41:53 this layer and a step size of 32 so
  • 00:41:56 behind the scenes this build model and
  • 00:41:59 the Charis tuner will test the numbers
  • 00:42:02 from 32 to 512 skipping every 32 units
  • 00:42:07 for this dense layer and we can do this
  • 00:42:10 for a bunch of things and we don't just
  • 00:42:12 have to use this int command there's
  • 00:42:13 other commands so as you see down here
  • 00:42:15 they're playing around with what
  • 00:42:17 learning rate should we use so it's in
  • 00:42:20 the atom optimizer it's using a choice
  • 00:42:24 of the learning rate so it's trying
  • 00:42:26 these three values at random with the
  • 00:42:29 variability up here as well so it's
  • 00:42:31 allowing us to test different
  • 00:42:33 combinations of things very easily so
  • 00:42:35 let's go ahead and just start writing
  • 00:42:37 this into our code so the first step is
  • 00:42:40 going to be to take our network from up
  • 00:42:42 here and rewrite it kind of in that
  • 00:42:44 format that they were using so if we
  • 00:42:47 look back at that so the first step will
  • 00:42:50 be to import the random search we've
  • 00:42:54 already imported the Charis and the
  • 00:42:55 layers we want to import random search
  • 00:43:04 and then we're going to define that
  • 00:43:06 build model with that hyper parameter
  • 00:43:10 variable being passed in and I've
  • 00:43:13 actually already done this just to speed
  • 00:43:15 it up a little bit the conversion to
  • 00:43:17 this different format so our new model
  • 00:43:19 will look something like this
  • 00:43:24 so this is the same exact model from
  • 00:43:26 above but now I just wrote it with this
  • 00:43:31 different syntax where we define the
  • 00:43:32 model it's just the sequential and we
  • 00:43:34 add to that but same exact model and
  • 00:43:37 we're gonna have to also add in the loss
  • 00:43:41 function again to our function and then
  • 00:43:46 finally we will return the model from
  • 00:43:48 the function. That's really all we have
  • 00:43:51 to do.
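A sketch of that refactored build_model, based on the model we ended with above (at the time of the video the import came from the kerastuner package; in newer releases it's keras_tuner):

```python
from kerastuner.tuners import RandomSearch

def build_model(hp):
    # hp is the hyperparameter object the tuner passes in
    model = keras.Sequential()
    model.add(keras.layers.AveragePooling2D(6, 3, input_shape=(300, 300, 1)))
    model.add(keras.layers.Conv2D(64, 3, activation='relu'))
    model.add(keras.layers.Conv2D(32, 3, activation='relu'))
    model.add(keras.layers.MaxPool2D(2, 2))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dense(3, activation='softmax'))
    model.compile(optimizer='adam',
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    return model
```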
  • 00:43:52 But now we can decide which variables we want to play around with, so let's
  • 00:43:55 go back to the documentation we have
  • 00:43:59 next step looks like is defining this
  • 00:44:01 tuner so I'll just copy this in so it's
  • 00:44:09 calling the build model function the
  • 00:44:11 objective is the validation accuracy so
  • 00:44:14 how well does it do on our test set max
  • 00:44:16 trials right now is set to 5; I'm not quite
  • 00:44:19 sure, but executions per trial is, I think,
  • 00:44:21 how many times it tries each
  • 00:44:24 combination of parameters,
  • 00:44:26 because there's some randomness involved
  • 00:44:27 I'm not going to keep that for now and
  • 00:44:30 then you can also choose where you save
  • 00:44:31 these results by default it will choose
  • 00:44:34 something so I'm going to just probably
  • 00:44:37 get rid of these directory and project
  • 00:44:39 names for now but know that you could
  • 00:44:41 set those if you want to save it in a
  • 00:44:42 certain spot all right that's good and
  • 00:44:45 then finally once we have now to find
  • 00:44:47 our tuner we need to do a tuner dot
  • 00:44:50 search and we need to pass in some
  • 00:44:53 parameters so we need to pass in first
  • 00:44:56 our x and y, so our train images, our train
  • 00:44:59 labels and then we need to pass in what
  • 00:45:02 we're trying to optimize this network
  • 00:45:05 for and so that's going to be our
  • 00:45:06 validation data and that's going to be
  • 00:45:09 equal to the test images and the test
  • 00:45:16 labels. Then we can also set stuff like
  • 00:45:18 epochs; I'm going to say 10 epochs now and
  • 00:45:18 batch size of 32 cool and now the big
  • 00:45:26 difference is we need to now choose some
  • 00:45:28 of these values to play around and try
  • 00:45:31 to optimize so I'm going to just as an
  • 00:45:34 easy example play around with this dense
  • 00:45:37 layer right here at the end so let's see
  • 00:45:41 what our options are we can use this HP
  • 00:45:44 dot choice I think that will be helpful
  • 00:45:46 or we could probably use this HP dot int
  • 00:45:48 which chooses random int values either
  • 00:45:50 of these options is fine I think it's
  • 00:45:52 sometimes easier to use this HP dot
  • 00:45:54 choice just to kind of limit the number
  • 00:45:57 of things I try so I'm going to go ahead
  • 00:45:58 and do that so I'm going to set this
  • 00:46:02 instead of just 128 it's going to be now
  • 00:46:04 hp dot...
  • 00:46:08 hp.Choice, and we now need to pass in a
  • 00:46:11 name whenever using one of these so this
  • 00:46:12 is going to be I'm just gonna call it
  • 00:46:15 dense layer and now what do we want for
  • 00:46:19 our choices well our initial was 128 so
  • 00:46:21 we can keep that in there but how about
  • 00:46:23 we also try 64 maybe 256 maybe 512 maybe
  • 00:46:31 1024 so now we have five different
  • 00:46:34 options here and what we can do is go
  • 00:46:38 ahead and now run this search and we
  • 00:46:42 have five options so max trials 5 is
  • 00:46:44 good but if we had more combinations of
  • 00:46:46 things you might want to bump up this
  • 00:46:48 max trials value to something like 32 it
  • 00:46:51 will run as it'll run random
  • 00:46:54 combinations of the parameters you're
  • 00:46:56 trying to use until it hits 32 or until
  • 00:47:00 all the trials are exhausted so in our
  • 00:47:03 case we only have 5 different things
  • 00:47:05 that can choose from so it should really
  • 00:47:06 just run 5 trials so this is a very
  • 00:47:10 simple example of using the tuner and
  • 00:47:12 let's see how things work and notice
  • 00:47:14 note the tuner can take a while to run
  • 00:47:16 so just be wary of that because we're
  • 00:47:19 only testing 5 different things it
  • 00:47:21 shouldn't be too too bad in our case but
  • 00:47:23 that is something to note all right
  • 00:47:29 awesome
  • 00:47:29 everything executed, so everything
  • 00:47:34 ran; we've now played around with a
  • 00:47:36 different number of parameters here so
  • 00:47:40 let's see some results one of the best
  • 00:47:43 things to do first I guess is you can
  • 00:47:44 actually get the best model that was found
  • 00:47:47 during that by doing tuner dot get best
  • 00:47:50 models, and then grab the 0th
  • 00:47:57 index so that's gonna be the top model
  • 00:47:58 and now best model is actually equal to
  • 00:48:01 tuner dot get...
  • 00:48:03 oh, it's get_best_models(), like this; it's a
  • 00:48:10 function so now best model is set to the
  • 00:48:15 top thing that the tuner found and we
  • 00:48:17 can actually do best model not evaluate
  • 00:48:20 train images or sorry test images test
  • 00:48:25 labels and see how well it does on that data.
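In code:

```python
# Retrieve the top-ranked model (index 0) found during the search
best_model = tuner.get_best_models()[0]
best_model.evaluate(test_images, test_labels)
```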
  • 00:48:27 And look at that, it got 73%
  • 00:48:35 accuracy so that was like a 4% increase
  • 00:48:37 just by figuring out a better value to
  • 00:48:40 use for this last layer here and one
  • 00:48:48 thing to note too is that this get
  • 00:48:51 best models will not only select the
  • 00:48:53 best value that it found here but it
  • 00:48:56 will also select the parameter weights
  • 00:48:58 from the best-performing epoch during
  • 00:49:02 the training so it did give us a boost
  • 00:49:06 in accuracy there we can also do some
  • 00:49:08 other stuff here one thing we could do
  • 00:49:10 is see what actual parameters that best
  • 00:49:12 model is using so summary you can do
  • 00:49:14 best model dot summary and that shows us
  • 00:49:17 that it found that 1024
  • 00:49:19 for that last layer did the best
  • 00:49:21 there so that's something to note it
  • 00:49:24 also gives us the kind of info on the
  • 00:49:26 other layers we're using in the
  • 00:49:27 network that's pretty cool you can also
  • 00:49:30 do tuner dot results summary to just get
  • 00:49:35 some high level results on how those
  • 00:49:37 trials went so you can see kind of
  • 00:49:39 scores from each one of those
  • 00:49:40 combinations of layers that we used
  • 00:49:43 alright so that's the basics of using
  • 00:49:45 this search but I think really playing
  • 00:49:48 around with this I was able to I think
  • 00:49:51 get up to like 81 percent accuracy on
  • 00:49:54 the test data and to do that I added
  • 00:49:57 some additional things so I like played
  • 00:49:59 around with how many convolutional
  • 00:50:01 layers were used and how many filters so
  • 00:50:02 that would look something like this so
  • 00:50:05 for i in range hp.Int, and now this
  • 00:50:10 is going to be for the convolutions; I'm going
  • 00:50:13 to name it conv layers: how many convolutional
  • 00:50:17 layers we have and that's going to be
  • 00:50:20 let's say between 0 the min value is 0
  • 00:50:27 max value let's say is equal to three
  • 00:50:31 and then you can also with this one set
  • 00:50:34 a step size how many times it skips but
  • 00:50:37 it's default is one so I'm going to
  • 00:50:39 leave that and basically what we could
  • 00:50:41 do here is... we're going to delete
  • 00:50:45 this one; basically in our tuner we can
  • 00:50:47 decide how many of these convolutional
  • 00:50:49 layers to add and additionally we could
  • 00:50:52 even like play around with making this a
  • 00:50:57 variable parameter, so I'm going to use an f-string,
  • 00:51:00 so just bear with me for a sec,
  • 00:51:06 for the filters. Basically, in the
  • 00:51:11 convolutional layer I'm going to play
  • 00:51:13 around with how many filters we use so
  • 00:51:17 hp.Choice and I'm going to say we
  • 00:51:22 could use maybe 16 32 or 64 and complete
  • 00:51:28 that and then we'll still use a kernel
  • 00:51:31 size of 3 and we'll still use activation
  • 00:51:34 ReLU, but basically now what we're
  • 00:51:37 doing if we were to rerun the search
  • 00:51:39 with this is we're deciding whether we
  • 00:51:43 add 0 convolutional layers or up to 3
  • 00:51:47 convolutional layers and within each
  • 00:51:49 convolutional layers we're also playing
  • 00:51:51 around with the number of filters that
  • 00:51:54 we use by using this hp.Choice.
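A sketch of that loop inside build_model (the name 'conv_layers' and the f-string filter names are just illustrative):

```python
# Add between 0 and 3 extra conv layers, each with a tunable filter count
for i in range(hp.Int('conv_layers', min_value=0, max_value=3)):
    model.add(keras.layers.Conv2D(
        hp.Choice(f'layer_{i}_filters', [16, 32, 64]),
        kernel_size=3,
        activation='relu'))
```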
  • 00:51:56 So you can get really creative with this, and as
  • 00:51:58 I mentioned, if you play around with this
  • 00:52:00 enough, I think I got up to like 82
  • 00:52:02 percent accuracy on this test set by
  • 00:52:04 just playing around with these values so
  • 00:52:06 you could rerun this and play around with
  • 00:52:08 things; it might take a little bit of
  • 00:52:10 time to really run this and also one
  • 00:52:13 thing to note is the tuner I think by
  • 00:52:15 default if it sees you have results
  • 00:52:17 already saved you might have to either
  • 00:52:19 delete your results or save it in a
  • 00:52:21 different directory here just because it
  • 00:52:27 doesn't want to run this time intensive
  • 00:52:29 search process again if you've already
  • 00:52:32 gotten results so it's kind of a safety
  • 00:52:34 thing that Keras Tuner does
  • 00:52:37 and also I'll link to some other videos
  • 00:52:39 I saw one by sentdex and one by Krish
  • 00:52:42 Naik that did a good job explaining this
  • 00:52:45 Keras Tuner as well so I'll link to
  • 00:52:47 those but that's all we're going to
  • 00:52:48 cover for the Keras Tuner. Before I end this
  • 00:52:52 video we're gonna rapid-fire go through
  • 00:52:53 some things that will be helpful as you
  • 00:52:55 try to transition the knowledge you
  • 00:52:56 might have gained in this video to other
  • 00:52:58 projects so the first thing that's
  • 00:53:00 useful is how do we save and load our
  • 00:53:06 models so we can do that by just doing
  • 00:53:09 let's grab the best model from the
  • 00:53:12 tuning step; we can do best model dot
  • 00:53:14 save and we could just call this like in
  • 00:53:19 our current directory my model and so
  • 00:53:22 this would save all the parameters in
  • 00:53:23 that model and then you could reload it
  • 00:53:27 with load so like I'll just call this
  • 00:53:32 loaded model equals keras dot models
  • 00:53:38 dot load... sorry, load_model, and then
  • 00:53:43 we just need to do that same path so my
  • 00:53:45 model and one thing to note is that with
  • 00:53:48 Google Colab I don't think that
  • 00:53:49 anything will persist so anything you
  • 00:53:51 save here won't stay saved next time you
  • 00:53:54 run this but if you were to do this
  • 00:53:56 locally on your machine you would have
  • 00:53:59 this model saved there so as you can see
  • 00:54:01 I could do loaded model dot evaluate
  • 00:54:11 and get the same results as before.
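Those three steps as a cell (the './my_model' path matches the "current directory, my model" description above):

```python
best_model.save('./my_model')                        # writes architecture + weights
loaded_model = keras.models.load_model('./my_model')
loaded_model.evaluate(test_images, test_labels)      # same accuracy as before saving
```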
  • 00:54:14 So that's something useful to know. Another
  • 00:54:16 thing that is useful as you try to carry
  • 00:54:19 this into your own projects is how do we
  • 00:54:21 plot numpy arrays as images how do we
  • 00:54:24 see those images from the numpy arrays. So
  • 00:54:27 I'm going to say plot image from numpy
  • 00:54:32 array this is something useful want to
  • 00:54:35 do and it's actually pretty easy to do
  • 00:54:38 this so let's say we want to plot one of
  • 00:54:40 our trained images so train images 0
  • 00:54:42 dot shape I'm going to just look at real
  • 00:54:45 quick, so it's 300 by 300 by 1. What we
  • 00:54:50 can do is just go ahead and... do we need
  • 00:54:54 to import matplotlib which we might have
  • 00:54:57 already done maybe not
  • 00:54:58 let's see, plt.show... yeah I think we
  • 00:55:05 already did import matplotlib earlier in
  • 00:55:08 this video so using matplotlib we can
  • 00:55:10 do plt.imshow of train images 0
  • 00:55:16 and let's see what happens
  • 00:55:20 invalid shape for image data so it
  • 00:55:22 doesn't like the 300 by 300 by 1 but if
  • 00:55:25 we pass in the color map and just let it
  • 00:55:29 know that we're only using grays here I
  • 00:55:32 think it should be fine
  • 00:55:35 okay it still doesn't like that I think
  • 00:55:37 it's the weirdness of having this extra
  • 00:55:39 one here, so what we can do: we just
  • 00:55:44 need it to be 300 by 300 and not have
  • 00:55:46 any color channel so we're gonna go
  • 00:55:49 ahead and do image equals train images 0
  • 00:55:58 dot reshape 300 300 and now we'll try to
  • 00:56:03 just plot that image
  • 00:56:07 and as you can see we've now plotted the
  • 00:56:10 image in greyscale what we can also do
  • 00:56:13 if let's say we wanted to plot it in RGB
  • 00:56:19 scale we could go ahead and just kind of
  • 00:56:23 looking back from the initial thing we
  • 00:56:26 did, so this is just going to be an RGB
  • 00:56:28 test. I'm going to say our RGB image
  • 00:56:32 equals np.array and this is just
  • 00:56:36 copying from earlier example image dot
  • 00:56:43 numpy, for example in ds_train; I'm going
  • 00:56:53 to say take(1); this will only
  • 00:56:56 run it for one image so it'll just go a
  • 00:56:58 lot quicker, and this is RGB images, so
  • 00:57:02 that's going to be an array so RGB image
  • 00:57:05 equals RGB images 0
  • 00:57:10 and now I can also do plt.imshow of
  • 00:57:14 RGB image and now I don't need a color
  • 00:57:18 map because this is RGB scale and look
  • 00:57:22 at that now I got the image in with
  • 00:57:26 color. And just to show you what rgb
  • 00:57:30 image's shape is: that is 300 by 300
  • 00:57:36 by 3.
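The plotting steps from this section, collected (reusing ds_train from earlier; take(1) just limits the loop to a single example):

```python
# Grayscale: drop the trailing channel axis so imshow accepts the array
image = train_images[0].reshape(300, 300)
plt.imshow(image, cmap='gray')
plt.show()

# RGB: grab one unprocessed example straight from the dataset
rgb_images = np.array([example['image'].numpy() for example in ds_train.take(1)])
plt.imshow(rgb_images[0])   # 3-channel data needs no cmap
plt.show()
```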
  • 00:57:39 So that's plotting images from a numpy array. The other thing you might
  • 00:57:40 want to do is how do we load in a PNG
  • 00:57:44 image or a JPEG image and actually
  • 00:57:46 convert that to numpy so that's another
  • 00:57:48 useful thing so convert PNG / JPEG
  • 00:57:54 images to numpy format that's also
  • 00:57:58 useful we can do this very easily with a
  • 00:58:03 library called imageio, so import
  • 00:58:06 imageio, and then what we're going to do is
  • 00:58:11 image equals imageio.imread and now
  • 00:58:15 you would pass in the path so this would
  • 00:58:17 be path
  • 00:58:18 to image dot like PNG but if you don't
  • 00:58:24 have any images stored or you could also
  • 00:58:26 just read images directly from the
  • 00:58:28 Internet
  • 00:58:29 so if I just went to Wikipedia let's say
  • 00:58:35 and looked up Boston that's where I'm
  • 00:58:38 from
  • 00:58:39 I could just take a picture from like
  • 00:58:42 Wikipedia so you know here is our
  • 00:58:45 capital building so I'm gonna copy the
  • 00:58:49 image address so if I go to that address
  • 00:58:52 you see we have that image so I'm going
  • 00:58:55 to just copy this address and one thing
  • 00:58:57 that's pretty cool is that in the image
  • 00:58:58 read method you can actually just pass
  • 00:59:01 in URLs and it works, and now we have that
  • 00:59:10 Boston image loaded in through the
  • 00:59:14 command here I can go ahead and plot the
  • 00:59:19 image show the image just to show you
  • 00:59:21 that we've read it in, and as you see, it's
  • 00:59:26 that Boston image. And then the final
  • 00:59:29 thing is we want to convert it to numpy
  • 00:59:31 so what we would do there is just let's
  • 00:59:35 just print out the type of image it's an
  • 00:59:40 image i/o file but to convert it to
  • 00:59:42 numpy we can just do image underscore np
  • 00:59:45 equals np.asarray and just pass in
  • 00:59:50 the image and now if we print image NP
  • 00:59:56 dot shape, or something like that,
  • 01:00:00 we'll see the dimensions of that image
  • 01:00:02 and basically all of our image
  • 01:00:05 information is now stored in this image
  • 01:00:07 np variable.
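The whole PNG/JPEG-to-numpy flow, sketched ('path/to/image.png' is a placeholder; as noted above, a URL works in imread too):

```python
import imageio

# Read an image file (or URL) into an imageio image object
image = imageio.imread('path/to/image.png')
plt.imshow(image)
plt.show()

image_np = np.asarray(image)   # convert to a plain numpy array
print(image_np.shape)          # e.g. (height, width, 3) for an RGB image
```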
  • 01:00:10 So you could imagine using this on your own set of images to build
  • 01:00:16 a neural network from scratch so it's
  • 01:00:17 another useful thing to know all right
  • 01:00:20 with that we're gonna end this video
  • 01:00:21 here hopefully you guys enjoyed this one
  • 01:00:23 and hopefully you learned something. In the
  • 01:00:25 description of this video I'll link to
  • 01:00:27 some resources to learn more about
  • 01:00:29 convolutional neural net
  • 01:00:30 because I think particularly with
  • 01:00:31 convolutional neural networks it's very
  • 01:00:34 useful to kind of read some
  • 01:00:35 literature about them to kind of
  • 01:00:37 ultimately use them as effectively as
  • 01:00:39 possible so check the description for
  • 01:00:41 that. If you did enjoy this video it'd mean
  • 01:00:43 a lot to me if you throw it a thumbs up
  • 01:00:44 and if you haven't already please
  • 01:00:46 subscribe to the channel also check me
  • 01:00:49 out on the other socials Instagram and
  • 01:00:52 Twitter I post probably more frequently
  • 01:00:55 in those places so if you want to stay
  • 01:00:56 up to date with everything I'm doing
  • 01:00:57 check there
  • 01:00:58 anyway guys, this has been fun; thank you
  • 01:01:01 for watching till next time peace out