Coding

Introduction to Neural Networks in Python (what you need to know) | Tensorflow/Keras

  • 00:00:00 hey how's it going everyone and welcome
  • 00:00:02 back to another video in this video
  • 00:00:04 we're going to talk all about neural
  • 00:00:05 networks in Python so we're gonna start
  • 00:00:07 with an overview of the important
  • 00:00:09 concepts and with neural networks I
  • 00:00:11 really think it's important to
  • 00:00:12 understand how they work at a high level
  • 00:00:13 so we'll walk through the basics of how
  • 00:00:15 they work and also we'll go through some
  • 00:00:17 information on network architecture
  • 00:00:20 hyper parameters and activation
  • 00:00:22 functions once we're done with that
  • 00:00:23 we're gonna jump into some code and so
  • 00:00:25 the first coding section we'll walk
  • 00:00:27 through the basics of writing neural
  • 00:00:29 networks with the Keras library and
  • 00:00:31 we'll kind of go through some rapid-fire
  • 00:00:32 examples of actually building models
  • 00:00:35 with that and then the second section
  • 00:00:37 will be a real-world problem and so in
  • 00:00:39 that section we're gonna go through
  • 00:00:41 building a neural network to
  • 00:00:42 automatically classify images as rock
  • 00:00:45 paper or scissors before we get started
  • 00:00:48 though I want to give a quick shout-out
  • 00:00:50 to this video's sponsor and that is Kite
  • 00:00:53 Kite is a code completion tool for
  • 00:00:56 python that uses machine learning to
  • 00:00:58 find the best suggestions
  • 00:01:00 Kite's completions are sorted by
  • 00:01:01 relevance instead of popularity or the
  • 00:01:04 alphabet and it can significantly speed
  • 00:01:07 up your development time by completing
  • 00:01:09 up to full lines of code kite integrates
  • 00:01:11 with the most popular Python editors
  • 00:01:13 like Atom, VS Code, Sublime, Vim, PyCharm
  • 00:01:19 and Spyder and the best part about Kite
  • 00:01:22 is that it's completely free to download
  • 00:01:23 to get started I've left a link in the
  • 00:01:26 description I've been using kite for
  • 00:01:27 about three or four months at this point
  • 00:01:30 and it's been a lot of fun to use so I
  • 00:01:32 definitely recommend giving it a shot
  • 00:01:34 alright to get started let's talk a
  • 00:01:36 little bit about why we use neural
  • 00:01:37 networks in the first place and I think
  • 00:01:40 that this can be pretty well explained
  • 00:01:41 through a couple visual examples so
  • 00:01:45 imagine you have the graph that looks
  • 00:01:46 like this and you know you have these
  • 00:01:48 red dots and these blue points and we're
  • 00:01:50 trying to build a classifier to
  • 00:01:52 automatically classify red dots and blue
  • 00:01:55 dots correctly so this first example is
  • 00:01:57 pretty straightforward we could simply
  • 00:01:59 draw a line between them and get perfect
  • 00:02:02 classification so a more slightly more
  • 00:02:04 complicated example imagine we have now
  • 00:02:06 these two sets of curved points in this
  • 00:02:11 case you know it's definitely not as
  • 00:02:13 trivial as the line but if we use a
  • 00:02:16 quadratic equation here we can once
  • 00:02:19 again pretty easily perfectly classify
  • 00:02:21 the red dots and the blue dots but this
  • 00:02:24 leads us to the real significant use
  • 00:02:27 case of neural networks and in reality
  • 00:02:30 oftentimes our data is not as nicely
  • 00:02:33 separated as this often times our data
  • 00:02:36 looks a little bit more like something
  • 00:02:38 like this where you know there's red
  • 00:02:40 dots all over the place seemingly kind
  • 00:02:43 of random with blue dots scattered
  • 00:02:46 between them visually looking at this
  • 00:02:48 graph we can draw some lines and
  • 00:02:50 separate the red from the blue but
  • 00:02:53 training a classifier to automatically
  • 00:02:55 do that that is not so trivial of a task
  • 00:02:58 but neural networks can do this neural
  • 00:03:01 networks can find patterns and find
  • 00:03:03 groups within the data and kind of pull
  • 00:03:06 out
  • 00:03:07 and so that's why they're so powerful so
  • 00:03:09 let's get into kind of what a neural
  • 00:03:11 network looks like and some of the
  • 00:03:13 basics
  • 00:03:13 so using that last example that last
  • 00:03:16 graph is an example imagine we're trying
  • 00:03:19 to you know classify blue and red points
  • 00:03:20 properly well all neural networks are
  • 00:03:22 going to start with a input layer and in
  • 00:03:25 this case the input layer would be
  • 00:03:27 two-dimensional it would just be the x
  • 00:03:30 coordinate and the y coordinate of the
  • 00:03:32 point that we're trying to classify as
  • 00:03:34 red or blue next all of our neural
  • 00:03:36 networks are going to have some hidden
  • 00:03:38 layers and so in this example we have two
  • 00:03:41 hidden layers each with four neurons
  • 00:03:44 all these neurons here are communicating
  • 00:03:47 with the input layer the values from the
  • 00:03:49 input layer are being passed through
  • 00:03:51 weights in these connections and then
  • 00:03:54 you know red on this side and then
  • 00:03:56 further more pass to the next layer and
  • 00:03:58 that finally leads them to being passed
  • 00:04:01 to a output layer and in this case our
  • 00:04:04 output layer would be determining
  • 00:04:06 whether a dot is red or blue
  • 00:04:09 with a certain degree of confidence
  • 00:04:11 really what a neural network is doing is
  • 00:04:13 it's updating these weights between the
  • 00:04:16 connections to hopefully be able to
  • 00:04:20 properly classify a graph like this so
  • 00:04:23 what's going to start out is you know
  • 00:04:25 the neural network is gonna have no idea
  • 00:04:27 about how to classify it it might kind
  • 00:04:28 of just draw a line and say everything
  • 00:04:31 to the left of the line is blue and
  • 00:04:32 everything to the right is red and
  • 00:04:34 that's not gonna be very accurate but it
  • 00:04:36 gives us a starting point and so values
  • 00:04:38 is gonna be coming through this network
  • 00:04:40 and they're gonna have a certain degree
  • 00:04:41 of confidence so let's say we were
  • 00:04:43 looking at a red example here our neural
  • 00:04:45 network might give us kind of
  • 00:04:46 predictions like with fifty five percent
  • 00:04:48 chance it's red forty five percent
  • 00:04:50 chance it's blue and what we're trying to
  • 00:04:53 get is that that percentage value that
  • 00:04:55 confidence value here is as close to the
  • 00:04:57 actual value so in this case if we're
  • 00:04:59 looking at a red example this would be a
  • 00:05:01 one and this would be a zero based on
  • 00:05:03 this calculation and we're going to see
  • 00:05:05 that okay we weren't 1 and 0 as we
  • 00:05:08 were supposed to be there was some loss
  • 00:05:09 involved and we're going to tell the
  • 00:05:11 weights to update accordingly and that's
  • 00:05:14 going to lead us to kind of a new
  • 00:05:16 separation in our graph
  • 00:05:17 and once again our you know our values
  • 00:05:21 as we get more and more examples are
  • 00:05:22 gonna be updating we're gonna start to
  • 00:05:23 get more confident about what's coming
  • 00:05:25 in I'm going to keep updating this graph
  • 00:05:27 and the value is going to get kind of
  • 00:05:28 more and more confident if there is
  • 00:05:32 actually you know a separation in the
  • 00:05:34 data they'll kind of converge to better
  • 00:05:36 and better values and ultimately we hope
  • 00:05:39 that if we train it enough we get
  • 00:05:41 something that looks like this where the
  • 00:05:44 data seems to be pretty well fit by our
  • 00:05:46 network and values that come through it
  • 00:05:48 will be pretty confidently predicted as
  • 00:05:51 red or blue correctly if you want a
  • 00:05:53 really good visual explanation of how
  • 00:05:55 these neural networks work I definitely
  • 00:05:57 recommend checking out the video series
  • 00:05:59 that 3Blue1Brown did on the
  • 00:06:01 topic he animates it beautifully and it
  • 00:06:05 definitely can help drill down some of
  • 00:06:06 these high level concepts before we move
  • 00:06:08 into actually coding networks next let's
  • 00:06:12 talk about the hyper parameters of this
  • 00:06:14 network there's several different
  • 00:06:15 aspects that we're kind of able to adjust
  • 00:06:18 within our network I think the the most
  • 00:06:20 obvious one is going to be the number of
  • 00:06:23 hidden layers and the number of neurons
  • 00:06:24 per layer some of the other ones are
  • 00:06:26 going to be the batch size so how many
  • 00:06:28 data points are we passing through the
  • 00:06:30 network each update step so we're not
  • 00:06:33 gonna be passing in a single point
  • 00:06:35 usually we're gonna maybe be passing in
  • 00:06:36 sixteen points into our network or 32 or
  • 00:06:39 64
  • 00:06:40 so the batch size determines that
  • 00:06:42 the optimizer so how does our network learn
  • 00:06:44 it's an algorithm to update the
  • 00:06:46 neural network again one thing I wanted to
  • 00:06:48 note is that you can usually use Adam as
  • 00:06:50 a pretty safe bet for your optimizer and
  • 00:06:52 that also leads us to the learning rate
  • 00:06:54 so how much do these weights update each
  • 00:06:58 time we see a batch of inputs and so if
  • 00:07:02 you adjust that higher it's going to
  • 00:07:04 update with the greater magnitude if
  • 00:07:06 it's a lower learning rate updates are
  • 00:07:08 gonna be smaller so we can play around
  • 00:07:10 with that as a hyper parameter another
  • 00:07:12 hyper parameter that's important to note
  • 00:07:14 is dropout so one thing that we find
  • 00:07:16 helps our networks generalize better is
  • 00:07:18 if we randomly basically disconnect
  • 00:07:20 nodes with a certain probability
  • 00:07:22 basically what this is doing is that if
  • 00:07:25 we're dropping out nodes randomly the
  • 00:07:27 rest of the network has to step up do
  • 00:07:29 more
  • 00:07:29 or influence more it can't rely on a
  • 00:07:31 single node to learn everything and
  • 00:07:34 this helps because we're not going to
  • 00:07:35 see all the data that's in the wild and
  • 00:07:37 drop out kind of helps us simulate some
  • 00:07:39 of those conditions that we can actually
  • 00:07:41 see on our own
  • 00:07:43 another important hyper parameter you should
  • 00:07:45 know is epochs and that is how many times
  • 00:07:47 we are going through our data while we
  • 00:07:49 train it so this is another parameter
  • 00:07:51 that you can adjust so a question
  • 00:07:54 that's asked a lot is you know how do we
  • 00:07:56 choose these layers the neurons the
  • 00:07:57 hyper parameters well the biggest thing
  • 00:08:01 I would say is use your training
  • 00:08:02 performance to guide your decisions if
  • 00:08:05 you are getting a high accuracy on
  • 00:08:08 training data but not on a validation
  • 00:08:10 set then you're kind of overfitting to
  • 00:08:13 your training
  • 00:08:15 data and you probably should reduce the
  • 00:08:16 number of parameters if you're getting a
  • 00:08:20 low accuracy on training like a
  • 00:08:21 relatively low accuracy and you think
  • 00:08:22 you can boost that up you might be
  • 00:08:24 underfitting the data and maybe you
  • 00:08:26 should increase the number of parameters
  • 00:08:27 and what I will mention is that there's
  • 00:08:29 no exact science to this it's going to
  • 00:08:31 be a lot of kind of tweaking numbers and
  • 00:08:33 tweaking values and that's just the
  • 00:08:35 nature of building neural networks and
  • 00:08:38 don't worry if you don't feel like you
  • 00:08:39 are confident about everything you're
  • 00:08:41 doing a lot of it is just playing around
  • 00:08:43 and testing things and then another way
  • 00:08:46 we can choose hyper parameters is using
  • 00:08:49 some automatic search methods to kind of
  • 00:08:53 help us test a lot of
  • 00:08:55 different values at once
  • 00:08:56 and ultimately choose the best combo of
  • 00:08:59 things and so with like scikit-learn you
  • 00:09:02 could use GridSearchCV to help us do
  • 00:09:04 that and I think we'll get to that in
  • 00:09:06 the actual coding section of the
  • 00:09:07 tutorial one thing I haven't mentioned
  • 00:09:09 yet but is very important to how our
  • 00:09:11 neural networks function is activation
  • 00:09:14 functions activation functions introduce
  • 00:09:17 non-linearity into our network
  • 00:09:19 calculations and that might not make
  • 00:09:22 mean anything to you but what that
  • 00:09:24 really comes down to is in like our
  • 00:09:27 example here what we were doing if we
  • 00:09:29 were trying to build a network around
  • 00:09:30 this data is ultimately we're taking our
  • 00:09:34 node values so our kind of input values
  • 00:09:37 multiplying them by some weight and
  • 00:09:39 adding all those values together
  • 00:09:42 to get kind of the output of the node
  • 00:09:44 and what an activation function does is
  • 00:09:47 it allows this node value instead
  • 00:09:50 of just adding input values times
  • 00:09:53 weights to add this non-linearity and
  • 00:09:57 basically add complexity into the
  • 00:10:00 values that we output out of that node
  • 00:10:01 so to kind of sum that up the
  • 00:10:06 activation function is what allows us to
  • 00:10:08 fit our neural networks to more complex
  • 00:10:10 data and do some really exciting things
  • 00:10:12 so it allows us basically to fit to data
  • 00:10:15 that looks you know more complex like
  • 00:10:17 this more easily and another question I
  • 00:10:21 hear asked a lot is what activation
  • 00:10:24 functions should I use well in general I
  • 00:10:28 would say it's a pretty safe bet to go
  • 00:10:30 with ReLU in your hidden layers
  • 00:10:33 there's this concept in neural networks
  • 00:10:36 called vanishing gradients basically
  • 00:10:38 when you're updating those weights and
  • 00:10:41 how far back you can update the weights
  • 00:10:43 and learn from your training data the ReLU
  • 00:10:46 activation function helps avoid that
  • 00:10:49 vanishing gradient problem so in general
  • 00:10:51 I would say for hidden layers ReLU is
  • 00:10:53 a safe bet and then what I would say
  • 00:10:55 though is for your output layer if
  • 00:10:58 you're classifying a single label you're
  • 00:11:01 doing like you have you know red blue
  • 00:11:04 yellow green and you just need to
  • 00:11:05 classify each point as one of those softmax
  • 00:11:08 is a good bet but if you wanted to
  • 00:11:12 maybe classify things that could be red and
  • 00:11:15 blue at the same time like there could be
  • 00:11:17 both red and blue for multi-label
  • 00:11:21 classification the sigmoid activation
  • 00:11:22 function is a pretty good bet
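A rough sketch of that advice in tf.keras (the layer sizes here are just placeholders, not values from the video):

```python
from tensorflow import keras

# ReLU is a reasonable default for hidden layers
hidden = keras.layers.Dense(16, activation='relu')

# single-label output (exactly one class is correct): softmax
single_label_output = keras.layers.Dense(4, activation='softmax')

# multi-label output (several classes can be "on" at once): sigmoid
multi_label_output = keras.layers.Dense(4, activation='sigmoid')
```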
  • 00:11:27 and then the last thing I want to get into before
  • 00:11:29 we jump into code is a quick overview of
  • 00:11:31 TensorFlow/Keras versus PyTorch and in
  • 00:11:35 this tutorial we're gonna be using
  • 00:11:37 Keras and it's great for getting
  • 00:11:39 started quickly and really rapid
  • 00:11:41 experimentation with neural networks
  • 00:11:43 you're gonna find as you get more and
  • 00:11:45 more advanced that it lacks the complete
  • 00:11:47 control and customization that PyTorch
  • 00:11:50 and kind of the more full version of
  • 00:11:52 tensorflow has TensorFlow has
  • 00:11:55 historically been the most popular framework
  • 00:11:57 for industry but i would say that it can
  • 00:11:59 get pretty complicated and the
  • 00:12:02 documentation for it isn't always
  • 00:12:03 consistent myself personally I'm not the
  • 00:12:05 most experienced with TensorFlow what
  • 00:12:07 I'm usually using if I'm doing more
  • 00:12:10 complex neural network stuff and what
  • 00:12:11 I'd use for my master's thesis
  • 00:12:13 ultimately when I was working on
  • 00:12:15 different types of networks is PyTorch
  • 00:12:17 and this has kind of been for a while
  • 00:12:20 the favorite of the research and
  • 00:12:21 academic community it has a very
  • 00:12:23 pythonic syntax and you can easily
  • 00:12:25 access values at any point in the
  • 00:12:29 network to start off the coding section
  • 00:12:31 of this tutorial I would say the best
  • 00:12:33 way to learn is by doing so we're gonna
  • 00:12:34 jump straight into some examples of
  • 00:12:37 building neural networks and through
  • 00:12:39 that you should kind of build up the
  • 00:12:40 fundamentals of what you need to know
  • 00:12:43 with TensorFlow and Keras so if you go
  • 00:12:46 to my GitHub page github.com slash
  • 00:12:48 KeithGalli slash neural-nets and
  • 00:12:50 this is linked to in the description
  • 00:12:52 I've left some examples there so we're
  • 00:12:55 gonna build a neural nets for each one
  • 00:12:56 of these examples in this folder and
  • 00:12:59 what the task is is just like what I
  • 00:13:01 introduced at the start of the video so
  • 00:13:03 we'll start with this linear example and
  • 00:13:05 just basically write a simple neural net
  • 00:13:07 to properly classify the red and blue
  • 00:13:09 points so let's download this github
  • 00:13:12 repo and there's two ways to do this I
  • 00:13:14 would recommend forking it and cloning
  • 00:13:17 it locally and I have instructions on
  • 00:13:19 how to do that right here but the other
  • 00:13:21 option you can go with is just simply
  • 00:13:24 downloading the zip and then take this
  • 00:13:26 and extract it to wherever you want to
  • 00:13:28 write the code the last thing I'll say
  • 00:13:30 before we start writing code is make
  • 00:13:32 sure you have tensorflow installed and
  • 00:13:33 probably the easiest way to do that is
  • 00:13:35 by installing the anaconda distribution
  • 00:13:38 and so I'll have a link in the
  • 00:13:39 description on how to install that
  • 00:13:41 alright and I'll be using Sublime Text
  • 00:13:44 as my editor here but we want to go
  • 00:13:46 into that folder that we just created
  • 00:13:49 wherever you extracted your files or
  • 00:13:51 cloned your files and I'll start out by
  • 00:13:56 going into the examples going into
  • 00:13:59 linear and just saving a file a file
  • 00:14:03 called network_linear.py
  • 00:14:06 so we're writing a neural network the
  • 00:14:08 first thing almost always is going to be to
  • 00:14:10 import the TensorFlow library or the PyTorch
  • 00:14:12 library if you're using that and
  • 00:14:14 specifically for this example we're
  • 00:14:16 using Keras so we're gonna import Keras
  • 00:14:18 from TensorFlow so from tensorflow
  • 00:14:21 import keras so that is good then we'll
  • 00:14:27 probably want to also import some helper
  • 00:14:30 libraries so I think the ones that will be
  • 00:14:34 important right now are to import pandas
  • 00:14:36 as pd and import numpy as np
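The imports described so far would look roughly like this:

```python
from tensorflow import keras

import pandas as pd
import numpy as np
```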
  • 00:14:41 and one thing I want to quickly mention is I'm using
  • 00:14:43 Kite Copilot over here so it's a really
  • 00:14:46 nice feature okay and basically as I
  • 00:14:49 type things out this will follow my
  • 00:14:52 cursor and pull up documentation on the
  • 00:14:56 Associated code all right and the other
  • 00:15:00 thing to note real quick is that in that
  • 00:15:03 same linear folder that has the picture
  • 00:15:07 of the example and where we just saved
  • 00:15:08 our file there's also two data files
  • 00:15:10 there's a training data set and a test
  • 00:15:13 data set that is producing the graph
  • 00:15:15 that you see here so we need to load
  • 00:15:19 that data so our file's right here so
  • 00:15:22 we have to go into data and then load in
  • 00:15:25 training to start and then we'll also do
  • 00:15:27 the same with the test so we'll load in
  • 00:15:30 the CSV file with pandas so I'll do
  • 00:15:32 pandas read CSV and then the path I just
  • 00:15:36 showed you was the data folder and then
  • 00:15:39 train dot CSV file and I'll make all
  • 00:15:42 this code slightly bigger so you can see
  • 00:15:44 it more clearly and we'll probably want
  • 00:15:47 to save that as something so we'll just
  • 00:15:49 call this the training data frame equals
  • 00:15:52 that and we can confirm that it's loaded
  • 00:15:55 by doing train_df dot head oh and we
  • 00:15:58 should probably print that
  • 00:16:04 okay cool so yeah we have an x point
  • 00:16:06 y point and then the color and the color
  • 00:16:08 is just a zero or one argument here so
  • 00:16:11 we have it loaded in that's good
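A minimal sketch of that loading step, assuming the repo layout with a data/train.csv next to the script:

```python
import pandas as pd

train_df = pd.read_csv('./data/train.csv')
print(train_df.head())  # columns: x, y, color (0 or 1)
```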
  • 00:16:14 so now we can go ahead and actually build a
  • 00:16:16 neural network around this training data
  • 00:16:18 and to do that we'll want to start out
  • 00:16:20 by defining a Keras Sequential type and
  • 00:16:24 what this sequential is saying is that
  • 00:16:27 we have a certain number of layers in
  • 00:16:29 our neural network so this sequential
  • 00:16:31 here is allowing us to list the
  • 00:16:34 different layers that we have in our
  • 00:16:35 network so we're gonna go ahead and
  • 00:16:37 start defining layers and so to access
  • 00:16:42 layers in Keras you can do keras
  • 00:16:43 layers dot and then all the types you
  • 00:16:47 see here are different types of layers
  • 00:16:49 you can add we're focused on a fully
  • 00:16:51 connected feed-forward network and
  • 00:16:53 that's going to be defined by this dense
  • 00:16:56 keyword next here are a couple different
  • 00:17:00 things we can pass into dense and you
  • 00:17:02 can see these different keyword
  • 00:17:06 arguments over here on the right in the
  • 00:17:08 kite copilot window but the first thing
  • 00:17:11 that the only thing that's required is
  • 00:17:12 the number of units we want to use and
  • 00:17:15 with this what you're gonna want to do
  • 00:17:18 is actually define your first layer as
  • 00:17:20 your first hidden layer and you'll see
  • 00:17:23 why in a second but I can say for let's
  • 00:17:26 say we just want four neurons in our
  • 00:17:28 first hidden layer next we're going to
  • 00:17:30 pass in and I'm looking at this
  • 00:17:32 documentation to kind of guide me a
  • 00:17:34 little bit to is the input shape so
  • 00:17:37 we're actually passing our input shape
  • 00:17:39 into this layer and so our input if we
  • 00:17:43 remember is x and y so that's going to
  • 00:17:45 have a single dimension of two then
  • 00:17:49 another thing we can pass into this
  • 00:17:51 dense layer is our activation function
  • 00:17:54 and as I mentioned at the start of this
  • 00:17:55 video usually a safe bet for the
  • 00:17:57 activation is a ReLU unit and there we
  • 00:18:02 go we've now defined an input layer of
  • 00:18:06 two neurons that feeds into a hidden
  • 00:18:10 layer of four neurons and that hidden
  • 00:18:12 layer of four neurons has a ReLU
  • 00:18:14 activation
  • 00:18:16 and then let's say let's make this first
  • 00:18:20 example because the data is very simple
  • 00:18:22 let's make our first example very simple
  • 00:18:25 and we're just going to feed this in
  • 00:18:26 this one hidden layer into our output
  • 00:18:30 layer and our output layer is two
  • 00:18:32 because colors can either be red or blue
  • 00:18:34 and on our data this looks like 0 or 1
  • 00:18:37 but right here that's our first neural
  • 00:18:40 network it's two neurons to four
  • 00:18:43 neurons to two neurons
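Putting that together, the first model sketched here looks roughly like this (an output activation gets added a bit later in the walkthrough), assuming a TF 2.x-style Keras:

```python
from tensorflow import keras

model = keras.Sequential([
    # 2 inputs (the x and y coordinates) feeding a hidden layer of 4 ReLU neurons
    keras.layers.Dense(4, input_shape=(2,), activation='relu'),
    # output layer: 2 units, one per color label
    keras.layers.Dense(2),
])
```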
  • 00:18:46 so let's actually fit our model to that and we're gonna
  • 00:18:48 first want to compile our model and this
  • 00:18:51 is going to tell us how to train it so
  • 00:18:54 we're gonna want to use the Adam
  • 00:18:56 optimizer and you'll see again with this
  • 00:19:01 compile often times I forget the exact
  • 00:19:03 syntax of how to write these Keras
  • 00:19:06 networks but if I go to compile and then
  • 00:19:10 I look at my notes over here we see we
  • 00:19:12 need to pass in an optimizer the safe
  • 00:19:14 bet here is to use Adam and then next we
  • 00:19:17 can go ahead and define a loss function
  • 00:19:19 for our network and to do that we're
  • 00:19:22 gonna want to go keras dot losses and
  • 00:19:26 then we'll see we have a couple
  • 00:19:27 different options here but with losses
  • 00:19:30 specifically I think sometimes it's it's
  • 00:19:32 nice to get a little bit more
  • 00:19:33 information and if I click on these
  • 00:19:35 doesn't tell us too too much about the
  • 00:19:38 type of loss here so what we're going to
  • 00:19:41 do is actually go to the TensorFlow
  • 00:19:43 documentation and I want the losses page
  • 00:19:46 and as you can see there's a bunch of
  • 00:19:48 different options and the two popular
  • 00:19:51 ones we saw here were categorical
  • 00:19:52 cross entropy and sparse categorical
  • 00:19:54 cross entropy it's unclear to me just
  • 00:19:57 reading that what the difference is here
  • 00:19:59 so that's a good thing we can kind of
  • 00:20:01 check on the losses page in the TensorFlow
  • 00:20:05 documentation so first I'll
  • 00:20:08 click on categorical cross entropy so it
  • 00:20:12 computes the cross entropy loss between
  • 00:20:13 the labels and predictions so as you can
  • 00:20:18 see here this was 0.9 this was point
  • 00:20:20 zero five and point zero five the actual
  • 00:20:22 was one zero zero so we compute the loss
  • 00:20:25 on that
  • 00:20:26 you know the difference is between that
  • 00:20:28 that sounds pretty good but the one
  • 00:20:30 issue we have here is the way that these
  • 00:20:33 are encoded is a little bit strange
  • 00:20:35 they're called they're encoded in what's
  • 00:20:37 called one-hot representations so
  • 00:20:40 basically this is saying label 1 this is
  • 00:20:42 saying label 2 this is saying label 3
  • 00:20:44 and in our data that we are looking at
  • 00:20:46 this was a float value of just 0 or 1 so
  • 00:20:50 what's going to be good for us to do
  • 00:20:52 here is actually check what that sparse
  • 00:20:55 categorical cross entropy was and the
  • 00:20:58 difference between the two is it says
  • 00:21:00 the same exact thing as the last loss
  • 00:21:02 but the key difference here is that it
  • 00:21:06 says use this cross entropy loss when
  • 00:21:08 there are two or more labels and that's
  • 00:21:10 good for us and we expect the labels to
  • 00:21:13 be provided as integers instead of one
  • 00:21:16 hot representation so we can pass in
  • 00:21:20 integers here and we don't have to
  • 00:21:22 encode them in one hot representations
  • 00:21:25 so that's good for us so we'll define
  • 00:21:28 that sparse categorical cross entropy so
  • 00:21:31 I'll do dot SparseCategoricalCrossentropy
  • 00:21:37 and let's see define it like this
  • 00:21:49 and to be honest if you saw the
  • 00:21:52 autocomplete it was suggesting I format
  • 00:21:54 it like this and I don't quite know the
  • 00:21:57 difference between this I just know in
  • 00:21:59 all the examples that I've worked
  • 00:22:00 through I define it with this
  • 00:22:02 representation and then the last thing
  • 00:22:05 with this that we will want to say is
  • 00:22:07 that from_logits equals True so if I go
  • 00:22:12 back to the losses that was one of the
  • 00:22:16 options we had with the special
  • 00:22:17 categorical cross entropy and if you are
  • 00:22:20 curious what from logits equals true
  • 00:22:22 means I always recommend you know go
  • 00:22:26 ahead and do a Google search on this
  • 00:22:30 question and as you see I was going
  • 00:22:34 through the TensorFlow API docs here
  • 00:22:36 and TensorFlow documentation they
  • 00:22:37 use a keyword called logits what is it
  • 00:22:39 well we found a nice little answer here
  • 00:22:42 and it simply means that the function
  • 00:22:44 operates on the unscaled output of
  • 00:22:46 earlier layers and in particular that
  • 00:22:49 the sum of the inputs may not equal 1
  • 00:22:51 and that's what we want because we're
  • 00:22:54 using values that aren't necessarily one
  • 00:22:56 like if you look at our input values
  • 00:22:59 here you know these aren't going to be
  • 00:23:01 between 0 and 1 so we want to use the
  • 00:23:03 from logits equals true keyword and then
  • 00:23:07 the last thing we want to keep track of
  • 00:23:09 is a metric and we're going to use the
  • 00:23:12 metric of accuracy to see how our
  • 00:23:15 network does when we evaluate it so
  • 00:23:19 there we have compiled how the network's
  • 00:23:23 going to be trained and learn so it's
  • 00:23:25 using the Adam optimizer that is
  • 00:23:28 figuring out how it is updating oh shoot I see I
  • 00:23:31 left an i out of optimizer that updates
  • 00:23:36 the network based on the sparse
  • 00:23:39 categorical cross entropy loss function
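That compile step, roughly as described (Adam optimizer, sparse categorical cross entropy with from_logits=True, and accuracy as the metric):

```python
model.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
```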
  • 00:23:45 now that we've done that we can actually
  • 00:23:46 go ahead and fit the training data to
  • 00:23:49 our network so as you can see here it's
  • 00:23:53 expecting an X Y and a batch size so our
  • 00:23:58 X values here are going to be the x and
  • 00:24:02 y coordinates and the values associated
  • 00:24:05 with them the Y value here is going to
  • 00:24:07 be the color 0 or 1 and the batch size
  • 00:24:11 we can kind of set how big we wanted
  • 00:24:13 that to be so let's just say batch size
  • 00:24:16 equals 16 to start so now what are our x
  • 00:24:21 and y well looking over here at the
  • 00:24:23 documentation we see that the argument X
  • 00:24:26 is the input data and it's expecting a
  • 00:24:30 type of a numpy array or a list of
  • 00:24:34 arrays
  • 00:24:34 you also could pass in a tensor flow
  • 00:24:37 tensor or a list of tensors and then
  • 00:24:41 there's a couple other options we're
  • 00:24:43 going to focus on that numpy array and
  • 00:24:46 so right now we have our data in a data
  • 00:24:48 frame which obviously isn't a numpy
  • 00:24:50 array but what's really nice is that we
  • 00:24:52 can easily convert the data frame from
  • 00:24:56 pandas into a numpy array by just doing
  • 00:25:00 train DF dot the column that we want to
  • 00:25:04 access dot values and so this will get
  • 00:25:09 the data in numpy form and I can show
  • 00:25:12 you that by doing 0 to 5 and train_df
  • 00:25:16 dot x dot values we can surround this
  • 00:25:20 with a type and then we can print both
  • 00:25:24 of these things out and print this and
  • 00:25:33 there's going to quickly comment all
  • 00:25:34 this out so it doesn't run see these are
  • 00:25:38 our X values in numpy form and the type
  • 00:25:41 is numpy so we see that that is correct
  • 00:25:44 so let's go ahead and start passing
  • 00:25:46 things into our network and we note we
  • 00:25:49 could have done the same exact thing
  • 00:25:50 with the color label and also the Y
  • 00:25:55 values
  • 00:25:58 and just to help us out I'm going to
  • 00:26:00 also print out that data frame again so
  • 00:26:08 for our y-value that is the color so we
  • 00:26:12 can go ahead and fill in train D F dot
  • 00:26:15 color dot values to get that in numpy
  • 00:26:18 form X is a little bit tricky because
  • 00:26:20 it's not only the x value it's the X and
  • 00:26:24 the y value because both of them are
  • 00:26:26 important in influencing whether or not
  • 00:26:28 graph something is red or blue dot so
  • 00:26:31 what we're gonna have to do here is
  • 00:26:33 actually stack those columns together so
  • 00:26:37 that they're paired up so I'll say x
  • 00:26:39 equals numpy column_stack and we want to
  • 00:26:44 stack the values that are in the x
  • 00:26:48 column x dot values to get it in numpy
  • 00:26:51 form and the values that are in the y
  • 00:26:53 column train_df dot y dot values so now
  • 00:26:58 what this is doing is its pairing up
  • 00:27:00 this as the kind of first input this is
  • 00:27:04 the second input this is the third input
  • 00:27:06 all these columns the X and the y column
  • 00:27:09 are now paired together with this column
  • 00:27:11 stack command and I can just pass in x
  • 00:27:14 here that's cool
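A sketch of preparing the inputs and labels from the data frame, as described:

```python
import numpy as np

# pair each row's x and y into one two-dimensional input point
x = np.column_stack((train_df.x.values, train_df.y.values))
# the color column (0 or 1) is the label
y = train_df.color.values
```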
  • 00:27:18 all right before we run this let's do one last sanity check on
  • 00:27:20 our code to make sure that we didn't
  • 00:27:21 miss anything so we load in the data we
  • 00:27:24 build a model around the data we've set
  • 00:27:27 up loss for our network and then we
  • 00:27:29 prepare our x values and one thing I do
  • 00:27:33 notice that we forgot to do and this is
  • 00:27:35 very very important whenever we're
  • 00:27:36 training a deep neural network and that
  • 00:27:39 is to shuffle our training
  • 00:27:42 data so this is important because as we
  • 00:27:45 have it now all of these zero labeled
  • 00:27:47 colors are right in a row and basically
  • 00:27:49 when we're updating our network we'll
  • 00:27:51 have highly correlated examples right
  • 00:27:53 next to each other and our network is
  • 00:27:56 not going to have a real idea of what
  • 00:27:59 the data looks like in the wild so it's
  • 00:28:01 very important that we shuffle our data
  • 00:28:03 and the easiest way to do that is going
  • 00:28:05 to be to do NP dot random dot shuffle
  • 00:28:08 and we can pass directly in train DF
  • 00:28:11 values to have it shuffle everything
  • 00:28:14 that's in the Train DF data frame and
  • 00:28:17 this shuffle method works in place which
  • 00:28:20 means we don't need a reset train DF to
  • 00:28:24 be the results of that it will update
  • 00:28:26 behind the scenes alright so now we've
  • 00:28:28 shuffled it and we can confirm that it's
  • 00:28:30 been shuffled by rerunning train DF head
  • 00:28:33 and notice before we had all zeros and
  • 00:28:36 now we have ones mixed with zeros so we
  • 00:28:39 know that the order is now mixed up
  • 00:28:41 that's good
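The shuffle step sketched here (this relies on train_df.values being a view into the frame's numeric data, which holds for these all-numeric CSVs):

```python
np.random.shuffle(train_df.values)  # shuffles the rows in place
print(train_df.head())              # labels should now appear in mixed order
```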
  • 00:28:44 anything else did we miss and one other thing I noticed is that let's
  • 00:28:46 add an activation to our output layer
  • 00:28:49 and with this what I mentioned was good
  • 00:28:52 is for the binary
  • 00:28:55 classification task we can either use
  • 00:28:58 sigmoid or softmax so let's use sigmoid
  • 00:29:01 in this case it won't really matter for
  • 00:29:05 this specific example and with that I
  • 00:29:08 think we're good oh ok it just went
  • 00:29:11 through one time on the data as you can
  • 00:29:14 see I classified 70% of the examples
  • 00:29:18 correctly let's see what happens if we
  • 00:29:21 decrease the batch size okay it does a
  • 00:29:25 lot better if it's learning on a smaller
  • 00:29:28 number of examples at a time another
  • 00:29:32 thing we could try doing is maybe make
  • 00:29:34 this a little bit bigger and as you can
  • 00:29:37 see it really can learn quickly when
  • 00:29:41 we made it bigger let's reset it to 4
  • 00:29:43 and these are the types of hyper
  • 00:29:45 parameter testing we're gonna do
  • 00:29:47 in addition to batch size let's specify
  • 00:29:49 a number of epochs let's say 5
  • 00:29:51 epochs and look at that
  • 00:29:56 with the current settings we have here
  • 00:29:58 the network can classify a hundred
  • 00:30:00 percent of our training examples
  • 00:30:02 correctly and that makes sense given
  • 00:30:05 that our data looks like this it should
  • 00:30:09 be pretty easy to classify that
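The fit call with the settings landed on here would look roughly like:

```python
model.fit(x, y, batch_size=4, epochs=5)
```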
  • 00:30:12 the last step we want to do here though is
  • 00:30:14 actually evaluate on the test data and
  • 00:30:17 we can do this very similar to how we
  • 00:30:20 loaded in the
  • 00:30:21 training data we'll say test_df equals pd
  • 00:30:25 read_csv of dot slash data slash test
  • 00:30:29 dot csv and then we can say that test_x
  • 00:30:34 equals np column_stack of the test_df
  • 00:30:41 dot x dot values and test_df dot y dot
  • 00:30:46 values it doesn't actually matter that
  • 00:30:48 the test data is shuffled because when
  • 00:30:50 we're evaluating a model after it's been
  • 00:30:51 trained you know we're not updating the
  • 00:30:53 network anymore so the order doesn't
  • 00:30:56 matter anymore because it's just
  • 00:30:57 performing classification on the
  • 00:30:59 weights that have been set in the
  • 00:31:00 network so test X and then to evaluate
  • 00:31:04 we can use model dot evaluate and we'll
  • 00:31:09 want to evaluate test_x on the test
  • 00:31:13 labels so that's test_df dot
  • 00:31:15 values er sorry dot color dot values so
  • 00:31:22 it's training again and then the final
  • 00:31:26 step here is the evaluation and as you
  • 00:31:29 can see this bottom part here is the
  • 00:31:33 evaluation run and as you can see I
  • 00:31:36 could just confirm that by doing print
  • 00:31:40 evaluation just training again
  • 00:31:48 and as you can see evaluation the stuff
  • 00:31:51 below that it does 100% accuracy on the
  • 00:31:54 test examples cool so that was example
  • 00:31:56 one
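A sketch of that evaluation step on the held-out test CSV:

```python
test_df = pd.read_csv('./data/test.csv')
test_x = np.column_stack((test_df.x.values, test_df.y.values))

print('EVALUATION')
model.evaluate(test_x, test_df.color.values)
```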
  • 00:31:57 let's rapid-fire go through some more
  • 00:31:59 examples the easiest way for us to test
  • 00:32:01 the next example is just to just save
  • 00:32:03 this file somewhere else so instead of
  • 00:32:06 the linear example we'll go to how about
  • 00:32:09 the quadratic example and we'll save
  • 00:32:12 network_quadratic dot py here so this is
  • 00:32:21 the same exact network as before but now
  • 00:32:24 instead of the linear data we are
  • 00:32:28 training it on the quadratic data that
  • 00:32:31 looks like this
  • 00:32:35 alright so right off the bat one thing
  • 00:32:38 that's nice is how well would it do
  • 00:32:41 given the current network setup so now
  • 00:32:45 this is running on different data
  • 00:32:46 because we're running this function in
  • 00:32:48 that different folder and as we see here
  • 00:32:53 okay it only got 78 percent accuracy on
  • 00:32:56 this new data and with this graph you
  • 00:33:00 know we've added more complexity into
  • 00:33:06 the classification process so what we're
  • 00:33:09 probably going to do here is add more
  • 00:33:13 neurons into that hidden layer the hidden
  • 00:33:18 layer of the neural network so what
  • 00:33:20 happens if we bump from it from 4 to 16
  • 00:33:28 and as you can see when we bumped it
  • 00:33:31 from four to sixteen it did ninety
  • 00:33:35 percent accuracy so that was
  • 00:33:36 significantly better than before what
  • 00:33:38 else could we do here I mean we could
  • 00:33:41 honestly keep going up with this we
  • 00:33:44 could just say thirty-two here
  • 00:33:55 and now it's classifying 95% of the
  • 00:33:59 training examples correctly and maybe we
  • 00:34:01 would even learn more if we bump this up
  • 00:34:05 to ten epochs
  • 00:34:14 so now we're running through the
  • 00:34:15 training data more and more times and as
  • 00:34:20 you can see we're near perfect we're at
  • 00:34:23 98% accuracy this is a type of task we
  • 00:34:26 should be able to get 100% accuracy on
  • 00:34:27 because there's such a clear separation
  • 00:34:28 in the data so as one additional thing
  • 00:34:33 maybe we want to do is add a dropout
  • 00:34:36 layer and what we need to put
  • 00:34:45 in here is a percentage of nodes that
  • 00:34:48 are randomly dropped out so let's say
  • 00:34:51 0.2 20% is a common value you'll use for
  • 00:34:54 dropout we need to add a comma so now
  • 00:34:57 it goes from the hidden layer it drops
  • 00:35:00 out 20 percent of that
  • 00:35:02 32 node hidden layer okay so that
  • 00:35:12 actually didn't improve our model
  • 00:35:14 another option we have would be to add
  • 00:35:17 another fully connected layer so layers
  • 00:35:21 dense let's say maybe we add another
  • 00:35:24 layer of 32 nodes also with an
  • 00:35:27 activation of ReLU maybe that will
  • 00:35:33 give us 100% classification but honestly
  • 00:35:36 we're very close to what we're looking
  • 00:35:39 for
  • 00:35:53 and look at that yeah it didn't get all
  • 00:35:55 the training examples correct but on the
  • 00:35:58 thousand test examples it got all of them
  • 00:36:01 so yeah that's a pretty good setup it
  • 00:36:04 seems like for the quadratic example and
  • 00:36:06 let's just keep this going next I
  • 00:36:08 recommend saving this file as we'll go to
  • 00:36:13 the clusters example so network_clusters
  • 00:36:16 dot py we'll save this as and let's just
  • 00:36:23 look at the clusters data or the graph
  • 00:36:26 for the clusters it looks like this and
  • 00:36:29 so I think the one big thing to note
  • 00:36:33 here is that instead of now just red and
  • 00:36:35 blue dots were classifying six different
  • 00:36:38 colors so go ahead I would recommend
  • 00:36:41 trying this one on your own and seeing
  • 00:36:43 if you can tweak the network to get it
  • 00:36:44 to work for all six colors and note you
  • 00:36:47 might have to actually dig into the data
  • 00:36:49 a little bit because there's a slight
  • 00:36:50 nuance in this example on how we do that
  • 00:36:54 so maybe pause here first and try to do
  • 00:36:56 this on your own
  • 00:36:56 alright so I think right from the get-go
  • 00:36:59 we've saved this file in that correct
  • 00:37:01 directory last time we could just
  • 00:37:03 immediately run our file in the new spot
  • 00:37:06 and it just would work pretty well and
  • 00:37:10 here we have an issue it looks like
  • 00:37:17 let's see what our issue is now what
  • 00:37:17 could this issue be well let's actually
  • 00:37:18 look at the data real quick and I think
  • 00:37:21 it's safe enough that we could actually
  • 00:37:23 just look at what we printed out started
  • 00:37:25 this and here's the big difference with
  • 00:37:27 the last one right now we have colors
  • 00:37:31 that are written out of strings and so
  • 00:37:34 when it gets down to the fitting over
  • 00:37:36 here it's not going to know how to
  • 00:37:38 handle that so we're going to need to
  • 00:37:41 convert these strings that we have in
  • 00:37:43 our training data that we see printed
  • 00:37:45 here into a numpy array so probably just
  • 00:37:49 convert strings to colors to do some
  • 00:37:51 sort of mapping so that's not too hard
  • 00:37:53 so we can just do print train_df dot color
  • 00:37:58 dot unique to get all the different
  • 00:38:01 colors that we have
  • 00:38:04 I'm gonna comment all this out real
  • 00:38:06 quick this is just color okay
  • 00:38:14 red blue green teal orange purple so all
  • 00:38:18 we're gonna do is just do a dictionary
  • 00:38:20 mapping from the string to a integer
  • 00:38:23 number so we can just do something like
  • 00:38:26 color_dict equals and we'll say red
  • 00:38:33 maps to zero
  • 00:38:34 blue maps to one green maps to two teal
  • 00:38:43 maps to three orange maps to 4 and
  • 00:38:49 finally purple maps to 5 and so once we
  • 00:39:00 have this dict now the next thing is to
  • 00:39:02 actually apply it to our data frame so
  • 00:39:04 we can do Train DF color we're changing
  • 00:39:08 the color column equals Train
  • 00:39:11 DF color apply and this is going to
  • 00:39:15 apply a lambda function on this so I'm
  • 00:39:17 gonna say lambda X which is going to say
  • 00:39:19 everything for every X cell in the color
  • 00:39:23 column we want to change the color
  • 00:39:26 column to the color dict of that string
  • 00:39:31 and as we will see when we print this
  • 00:39:34 out again now the colors are zeros and
  • 00:39:38 the unique colors are 0 1 2 3 4 5 that
  • 00:39:41 looks good so with that we can literally
  • 00:39:44 uncomment our model and we should be
  • 00:39:50 able to run this just not
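The string-to-integer mapping being described, roughly:

```python
color_dict = {'red': 0, 'blue': 1, 'green': 2,
              'teal': 3, 'orange': 4, 'purple': 5}
train_df.color = train_df.color.apply(lambda c: color_dict[c])
print(train_df.color.unique())  # now integers 0..5 instead of strings
```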
  • 00:39:55 Oh interesting we have one more issue it
  • 00:40:01 looks like received a label of five
  • 00:40:04 which is outside the valid range of 0 to
  • 00:40:07 2 so the other thing that we have to
  • 00:40:10 actually change is before we were just
  • 00:40:12 classifying two labels so now we need
  • 00:40:15 to change this last layer to be 6
  • 00:40:18 because there's now six different labels
  • 00:40:20 we could have look at it go getting up
  • 00:40:24 there
  • 00:40:32 and then we also had the same issue here
  • 00:40:35 with the test data frame so we'd also
  • 00:40:38 want to do that same processing that we did
  • 00:40:42 on the test data frame or rather the train data
  • 00:40:44 frame we'll also want to do that on the
  • 00:40:45 test and now we'll see that it should
  • 00:40:53 give us some test performance here okay
  • 00:41:00 97% accuracy on the test data that's
  • 00:41:02 pretty dang good and the last thing I
  • 00:41:04 really do want to say real quick is in
  • 00:41:06 addition to evaluate sometimes it's nice
  • 00:41:08 for us to just predict what the output
  • 00:41:11 will be for a single point so I could
  • 00:41:14 use the model dot predict function here
  • 00:41:17 and I could pass in well it's our type
  • 00:41:20 is a numpy array which is a list of
  • 00:41:30 two-dimensional values so let's just
  • 00:41:33 pass in a value like 2 comma 2 if we
  • 00:41:36 look at our chart 2 comma 2 right maybe
  • 00:41:40 not – coming – let's do 0 comma 3 that
  • 00:41:43 should be a purple point so it should
  • 00:41:46 map to the number 5 I believe from that
  • 00:41:51 color dick we were just talking about
  • 00:41:53 yeah purple is 5 so let's see if the
  • 00:41:59 prediction gives us 5
  • 00:42:13 and ultimately like when we're using
  • 00:42:14 these neural networks out in the wild
  • 00:42:16 this is what we're going to be using
  • 00:42:18 like incoming data we would have to
  • 00:42:20 predict the value and use that as like
  • 00:42:22 kind of the the truth as long and as
  • 00:42:25 long as our models train well you would
  • 00:42:28 think that it would hopefully be giving
  • 00:42:31 you the correct prediction and you can
  • 00:42:33 use that however you want in your
  • 00:42:34 applications
  • 00:42:42 okay well it gives us all this
  • 00:42:47 information might be a little bit hard
  • 00:42:49 to read this what you could do is do NP
  • 00:42:54 dot round and then just have it round
  • 00:42:57 up to the nearest integer and so
  • 00:43:01 whatever one's closest to one would be
  • 00:43:04 ultimately your prediction here yet as
  • 00:43:10 we can see the prediction is not one not
  • 00:43:18 two not three not four not five not it's
  • 00:43:20 the sixth value so that's what we are
  • 00:43:23 looking for it was purple cool that
  • 00:43:26 looks good let's move on to the next one
  • 00:43:28 okay for the next example let's go ahead
  • 00:43:30 and save this file as network clusters
  • 00:43:35 will just say underscore two meaning two
  • 00:43:38 categories and when I say two categories
  • 00:43:41 I think it's nice to look at the data
  • 00:43:45 for this so first off let's look at the
  • 00:43:47 figure it looks like this and as you can
  • 00:43:50 see in this case in addition to having a
  • 00:43:52 color they also have a marker so what if
  • 00:43:56 we wanted to not only in our neural net
  • 00:43:59 predict the color but also predict the
  • 00:44:00 marker so like a plus sign here a star
  • 00:44:03 here triangles here in here how could we
  • 00:44:07 do that so now we're predicting two
  • 00:44:08 labels instead of one and that's going
  • 00:44:10 to change up what our network looks like
  • 00:44:11 so again feel free to try this on your
  • 00:44:14 own but this one is definitely gonna be
  • 00:44:15 trickier so it might not be as
  • 00:44:16 straightforward and just to show you the
  • 00:44:21 data would look like this we're now with
  • 00:44:25 XY red and some sort of marker so we're
  • 00:44:28 gonna have to convert all these into
  • 00:44:31 kind of more numerical we'll have to
  • 00:44:33 convert the color and the marker to some
  • 00:44:36 sort of vector representation and then
  • 00:44:38 be able to predict two things at once
  • 00:44:41 instead of just one so this will
  • 00:44:42 definitely change things up all right so
  • 00:44:46 now it might not make sense to do this
  • 00:44:49 color deck because now we need to
  • 00:44:50 predict two things at once
  • 00:44:51 what I recommend we do
  • 00:44:54 is go ahead and now instead of passing
  • 00:44:56 in like just integer numbers we're gonna
  • 00:45:01 ultimately stop using this sparse
  • 00:45:03 categorical cross-entropy and use just
  • 00:45:05 the categorical cross entropy and the
  • 00:45:07 difference here is that if you see what
  • 00:45:09 I type on the screen before it was
  • 00:45:11 expecting things like three to be passed
  • 00:45:14 in now we're gonna use labels that are
  • 00:45:18 vectors so just to show you what we want
  • 00:45:25 to get is to have the first six labels
  • 00:45:29 represent the six different colors
  • 00:45:35 that we can output and then
  • 00:45:38 have the last three labels
  • 00:45:40 represent what the marker is
  • 00:45:44 in the graph so
  • 00:45:47 what we can do here is pandas actually
  • 00:45:50 has this function called get dummies
  • 00:45:52 that can convert labels unique labels
  • 00:45:56 into this kind of one hot encoding
  • 00:45:59 representation so we're gonna utilize
  • 00:46:01 that so I'm going to delete the color
  • 00:46:03 dict for now delete this and now we're
  • 00:46:09 gonna do is get our labels so I'm going
  • 00:46:13 to say one hot color equals PD get
  • 00:46:20 dummies and we're going to pass in train
  • 00:46:23 D F dot color and then so this will
  • 00:46:29 get it in a data frame form but we
  • 00:46:31 want it in numpy form so we're gonna do
  • 00:46:33 dot values and just to show you what
  • 00:46:36 that looks like
  • 00:46:37 one hot color and one hot is the
  • 00:46:41 encoding it's what you call like
  • 00:46:43 encoding like this where you have a one
  • 00:46:46 with the truth of where the label is and
  • 00:46:49 I'll comment the rest of this stuff out
  • 00:46:50 quick oh I also have to delete this now
  • 00:46:56 I guess I could
  • 00:46:59 it would be helpful comment that out
  • 00:47:01 keep it on the screen oh no I don't want
  • 00:47:04 to put the whole thing as you can see
  • 00:47:06 though now its each of these values for
  • 00:47:11 the colors is encoded as can be seen
  • 00:47:14 here so we can do the same thing for the
  • 00:47:18 marker it's gonna be pd get dummies train D F
  • 00:47:26 dot marker dot values and then what
  • 00:47:31 we're gonna have to do is concatenate
  • 00:47:33 the two so we want to append this with
  • 00:47:37 the three values that are going to be
  • 00:47:39 found for marker so we'll do np dot
  • 00:47:43 concatenate then we'll pass in one hot
  • 00:47:46 color and one hot marker on the first
  • 00:47:56 axis and this will be our labels
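A sketch of building those combined one-hot labels with pandas and numpy:

```python
one_hot_color = pd.get_dummies(train_df.color).values    # shape (n, 6) for six colors
one_hot_marker = pd.get_dummies(train_df.marker).values  # shape (n, 3) for three markers

# first six positions encode the color, last three the marker
labels = np.concatenate((one_hot_color, one_hot_marker), axis=1)
```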
  • 00:48:03 that's good and now it's a little bit trickier
  • 00:48:08 to do the shuffling like we were doing
  • 00:48:10 before just doing train DF dot values because
  • 00:48:13 our labels are now separate from our
  • 00:48:15 data frame so I'm actually not going to
  • 00:48:17 do this right now we're gonna shuffle
  • 00:48:19 later let's uncomment everything
  • 00:48:27 and let's see I mean it will keep the
  • 00:48:33 data frame
  • 00:48:35 now let's shuffle down here where we
  • 00:48:37 actually get our x-values and this is
  • 00:48:39 now going to be labels instead of X so
  • 00:48:43 what we're going to do is we're going to
  • 00:48:44 do np random dot shuffle of x and np dot
  • 00:48:50 random dot shuffle of the labels and
  • 00:48:56 the one thing to be careful here is if
  • 00:48:59 we separate out how we shuffle these we
  • 00:49:01 need to make sure that they're shuffled
  • 00:49:02 in the same order so we can do – this is
  • 00:49:05 because it would be terrible if we
  • 00:49:07 shuffled our input like our X values and
  • 00:49:10 shuffled our labels in a different order
  • 00:49:12 then they wouldn't actually match up
  • 00:49:13 with the truth so we would have a
  • 00:49:15 really hard time building a neural
  • 00:49:16 network around that so what we can do is
  • 00:49:18 set a random seed equal to 42 let's say it
  • 00:49:23 doesn't matter what we set this seed to
  • 00:49:24 but basically this is just ensuring that
  • 00:49:28 in our random shuffle that the same
  • 00:49:31 shuffling is happening and happening in
  • 00:49:33 both places
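One way to keep the inputs and labels shuffled in the same order, as described (re-seeding before each in-place shuffle produces the same permutation for arrays of equal length):

```python
np.random.seed(42)
np.random.shuffle(x)
np.random.seed(42)
np.random.shuffle(labels)
```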
  • 00:49:39 and now we can go ahead and fit our data comment all this out
  • 00:49:44 temporarily
  • 00:49:48 a target array of shape (6000, 9) was
  • 00:49:51 passed for an output of shape (None, 6)
  • 00:49:54 huh while using loss categorical cross-entropy
  • 00:49:58 okay that looks like it's pointing to
  • 00:50:02 this as an error and now that we're
  • 00:50:04 concatenated not only our color but also
  • 00:50:07 with our marker value we need to make
  • 00:50:10 this nine so now we're predicting you
  • 00:50:14 know a color in the first six cells of
  • 00:50:17 that neural network and a marker in the
  • 00:50:19 last three brutal and look at this this
  • 00:50:26 is really not good we got twelve percent
  • 00:50:31 accuracy on the task obviously we want
  • 00:50:34 to do better and when you redo something
  • 00:50:37 this bad you know something's wrong with
  • 00:50:39 their network so it's a matter of
  • 00:50:40 figuring out what going through and
  • 00:50:42 making sure that like the labels
  • 00:50:44 properly look like they should might be
  • 00:50:50 one thing to do like this let's just
  • 00:50:56 confirm that labels zero looks kind of
  • 00:50:59 yeah this looks good so that's good what
  • 00:51:03 else could be going wrong another thing
  • 00:51:05 I might look at as being a possibility
  • 00:51:07 of what's going wrong would be this
  • 00:51:12 shuffling here making sure that they're
  • 00:51:14 shuffled together if they're not
  • 00:51:15 shuffled right it's gonna be really hard
  • 00:51:16 to learn anything because it's not like
  • 00:51:18 truthful data it's just kind of
  • 00:51:19 random pairings of things but we seeded
  • 00:51:23 everything here so that looks good and
  • 00:51:25 here is the issue is that we're using
  • 00:51:28 the categorical cross entropy loss
  • 00:51:30 and this is expecting only one thing to
  • 00:51:34 be the label instead of the possibility
  • 00:51:37 of multiple things being the label as a
  • 00:51:39 result of that all we have to do to
  • 00:51:41 really fix our network is change this to
  • 00:51:43 binary cross entropy and what binary
  • 00:51:46 cross entropy is going to do is that in
  • 00:51:48 the output layer of our network now
  • 00:51:50 we're kind of going to be predicting
  • 00:51:51 each of these positions independently of
  • 00:51:56 the other positions so we can have
  • 00:51:58 multiple things be one or zero like we
  • 00:52:01 can have all
  • 00:52:02 this be 1 and this be 1 so let's see how
  • 00:52:05 that fixes our model and look at that
  • 00:52:07 yeah way better accuracy 90% that's yeah
  • 00:52:11 significantly better
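The compile change being described, roughly (binary cross entropy scores each of the 9 output positions independently; this sketch assumes the output layer keeps its sigmoid activation, so the loss sees probabilities rather than logits):

```python
model.compile(
    optimizer='adam',
    loss=keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'],
)
```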
  • 00:52:14 if we wanted to do the test we would go ahead and just
  • 00:52:19 basically we don't need the color dict
  • 00:52:24 anymore we do the same processing that
  • 00:52:27 we did up here just copy all this
  • 00:52:39 we'll say test one hot color test one
  • 00:52:45 hot marker this is test DF this is test
  • 00:52:50 DF and test labels and now we're
  • 00:53:00 evaluating on test labels and because
  • 00:53:04 And because we've only got 90% accuracy, I think we
  • 00:53:06 could maybe use some more parameters, so
  • 00:53:08 I'm gonna bump these up to 64 neurons
  • 00:53:11 per layer and see if maybe that helps
  • 00:53:13 things too, because there are more
  • 00:53:16 things we need to learn here, so I might
  • 00:53:18 need a little bit more parameters, and
  • 00:53:21 we'll also just do a fun prediction. So
  • 00:53:24 let's look at what we should expect: (0,
  • 00:53:30 3) would be purple and a star, and we
  • 00:53:34 would probably, if we wanted
  • 00:53:35 to really utilize this in the wild,
  • 00:53:37 have to do one additional step of
  • 00:53:39 basically converting our one-hot
  • 00:53:42 encodings back to a string, but I'm going
  • 00:53:46 to kind of skip over that for now. Yeah,
  • 00:53:51 that's pretty good, 93 percent
  • 00:53:52 classification, and I'm sure if we wanted
  • 00:53:55 to, maybe bumping this up
  • 00:53:57 would help. Notice too that we got this
  • 00:54:02 prediction down here, and if we wanted to
  • 00:54:04 kind of just sanity check that things
  • 00:54:06 were working well, we could maybe
  • 00:54:08 classify things in three different spots.
  • 00:54:11 So (0, 1) would be an arrow and red, (-2,
  • 00:54:18 1) would be a plus and green, so let's
  • 00:54:21 pass in those two, (0, 1) and (-2, 1). I
  • 00:54:31 think the last one, (-2, 1),
  • 00:54:34 would be a plus; let's see what happens
  • 00:54:36 with these predictions... or actually, I'll
  • 00:54:40 make this 4.
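(The sanity-check predictions would look something like this sketch; whether the raw coordinates need any extra scaling first depends on how the training data was prepared.)

```python
import numpy as np

# Points picked to land in regions with known color/marker combinations.
sample_points = np.array([[0, 3], [0, 1], [-2, 1]])

# `model` is the trained network from above; each prediction row has
# 9 scores: the first 6 are the color slots, the last 3 are the markers.
predictions = model.predict(sample_points)
print(np.round(predictions, 2))
```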
  • 00:54:46 And look at that: so we don't actually
  • 00:54:48 have the string representations, but we
  • 00:54:50 can see that it classified them all as a
  • 00:54:53 different color, and in the last
  • 00:54:56 three positions they all were a different
  • 00:54:58 marker, so that looks like things are
  • 00:55:00 good, and also our accuracy tells us that
  • 00:55:02 things are pretty good. And for the sake
  • 00:55:05 of time, the next step would be, I would
  • 00:55:07 say, to be able to convert these back
  • 00:55:10 into their string representations, but
  • 00:55:11 that's more of a Python task than a
  • 00:55:14 specific neural net task, so I will leave
  • 00:55:17 that for you guys to try to do.
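(If you want a starting point for that exercise, a minimal decoding sketch could look like this; the color and marker name lists are placeholders and have to match however the labels were actually encoded.)

```python
import numpy as np

# Hypothetical orderings; use whatever mapping was applied before one-hot encoding.
colors = ['red', 'blue', 'green', 'purple', 'orange', 'yellow']
markers = ['+', '^', '*']

def decode(prediction):
    # First 6 positions score the color, last 3 score the marker.
    color = colors[int(np.argmax(prediction[:6]))]
    marker = markers[int(np.argmax(prediction[6:]))]
    return color, marker
```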
  • 00:55:20 As a final example, and kind of to bring
  • 00:55:22 this video full circle, the last thing
  • 00:55:25 we're gonna do is build a neural net to
  • 00:55:27 classify the data that was introduced at
  • 00:55:30 the start of this video, and to do that
  • 00:55:33 let's again open up our quadratic
  • 00:55:36 example and just save that now, just
  • 00:55:40 because the actual problem is very
  • 00:55:43 similar to the quadratic problem.
  • 00:55:44 I just saved
  • 00:55:47 that as network_complex.py. Okay, and
  • 00:55:57 so right off the bat, let's see how our
  • 00:55:58 quadratic network does on the more
  • 00:56:01 complex data; that's it here, one last
  • 00:56:06 reminder. Okay, so right off the bat it
  • 00:56:11 actually classifies about 80% of those
  • 00:56:13 points correctly, 79% to be exact. So how
  • 00:56:17 can we do better? Well, I think it's as
  • 00:56:19 easy sometimes as: let's increase the
  • 00:56:21 number of parameters, let's maybe add a
  • 00:56:26 dropout layer,
  • 00:56:35 let's maybe add another one of these so
  • 00:56:38 we'll have three hidden layers, and we're
  • 00:56:42 gonna increase our batch size just so it
  • 00:56:43 goes a bit quicker. All right,
  • 00:56:49 maybe we'll make this like 256; we'll
  • 00:56:53 really increase the parameters here.
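(Approximately the kind of change being described: three bigger hidden layers with dropout in between. The 0.2 dropout rate and the two-unit softmax output are my assumptions about the reused quadratic network, and whether the 256 refers to the hidden-layer width, the batch size, or both isn't entirely clear from the audio, so both are shown with that value purely as an illustration.)

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(256, input_shape=(2,), activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(2, activation='softmax'),  # assumed red/blue output
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Larger batches mainly make each epoch run faster, e.g.:
# model.fit(train_x, train_labels, batch_size=256, epochs=10)
```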
  • 00:56:55 All right, let's see how that now does. Oh,
  • 00:57:00 Keras... all right, come on, come on... okay,
  • 00:57:06 eighty-one percent, that's better.
  • 00:57:08 I wonder if, you know, increasing this
  • 00:57:10 even more would help, like how many
  • 00:57:12 parameters do you really need? And I
  • 00:57:14 guess here's the balance: you know, we
  • 00:57:17 don't want to overfit, too, so we want to
  • 00:57:20 make sure that our test accuracy stays
  • 00:57:22 pretty high, and you know, the test
  • 00:57:23 accuracy here was eighty-one percent and
  • 00:57:27 the training was eighty, so we
  • 00:57:30 didn't lose any generalizability there.
  • 00:57:34 Hmm, that actually decreased performance;
  • 00:57:37 maybe we drop this back down, drop
  • 00:57:42 these down, and maybe we try making the
  • 00:57:47 dropout a bit higher, so let's say 0.4 and 0.4,
  • 00:57:54 and maybe we just go through the data
  • 00:57:57 more times with this higher rate of
  • 00:57:59 dropout, and maybe that will help us
  • 00:58:02 generalize more. And look at that, that time I
  • 00:58:09 think with this added dropout here we
  • 00:58:11 did the best that we have done: we didn't
  • 00:58:13 overfit it too, too much, and we also
  • 00:58:14 learned more because we dropped out
  • 00:58:17 nodes at random and forced the nodes
  • 00:58:18 in this network to do more on their own.
  • 00:58:23 So that's pretty good; I'm pretty happy
  • 00:58:26 with that, with 84 percent on the test
  • 00:58:28 set. I might just, as a last thing, add the
  • 00:58:33 dropout to that final hidden layer as
  • 00:58:35 well and just see if that does anything.
  • 00:58:42 It doesn't seem like that improved
  • 00:58:44 anything, so we'll just keep it at this. I
  • 00:58:46 think this is a pretty good solution; we
  • 00:58:48 might not get a hundred percent on this
  • 00:58:49 last test. I do recommend, if you want,
  • 00:58:51 to try to keep, you know, tuning these
  • 00:58:55 parameters, see if you can do better and
  • 00:58:56 better, but I think that's going to be
  • 00:58:58 the ending point of this video. A couple
  • 00:59:00 things we're going to say before we
  • 00:59:02 conclude: one is that this took longer than I
  • 00:59:04 expected, so we're not gonna have time to
  • 00:59:05 do the rock-paper-scissors example in
  • 00:59:07 this video, but I'll break that out into
  • 00:59:09 a part 2 video and kind of make that its
  • 00:59:12 own real-world example video. Another
  • 00:59:15 thing we'll do in that next video is
  • 00:59:16 look at how we can automatically select
  • 00:59:18 some of these parameters instead of
  • 00:59:20 manually setting them. And then also I
  • 00:59:22 just want to say, you know, what are our next
  • 00:59:24 steps, where can you kind of take your
  • 00:59:25 neural network skills to the next level?
  • 00:59:27 Well, I recommend looking at, in addition
  • 00:59:29 to these fully connected layers that we
  • 00:59:31 were working with, other types of
  • 00:59:33 networks: look at, you know, RNNs, look
  • 00:59:37 at convolutional neural networks, and
  • 00:59:40 building networks around those types of
  • 00:59:43 things, that's one good thing. And then
  • 00:59:45 another thing you could do is, for all
  • 00:59:47 the examples we went through today, maybe
  • 00:59:50 if you want to learn PyTorch you could
  • 00:59:52 try implementing these
  • 00:59:55 solutions with PyTorch instead of Keras.
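(If you do go the PyTorch route, the fully connected networks from this video translate fairly directly; here is a rough sketch of what the 9-output multi-label model could look like, with the layer sizes as placeholders.)

```python
import torch
import torch.nn as nn

# Rough PyTorch counterpart of the 9-output multi-label Keras model.
model = nn.Sequential(
    nn.Linear(2, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 9),  # 6 color slots + 3 marker slots (raw logits)
)

# BCEWithLogitsLoss applies the sigmoid internally, which matches the
# binary cross-entropy setup used on the Keras side.
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
```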
  • 00:59:57 Alright, that concludes the video. Thank
  • 01:00:00 you all for watching, hopefully you
  • 01:00:01 enjoyed; if you have any questions let me
  • 01:00:03 know in the comments, and if you haven't
  • 01:00:05 already, make sure to throw this video a
  • 01:00:07 thumbs up and subscribe to the channel.
  • 01:00:09 Also, I want to mention real quick, if you
  • 01:00:12 enjoyed watching me use Kite, definitely
  • 01:00:14 check out Kite and download Kite from the
  • 01:00:17 link in the description. Feel free to
  • 01:00:18 also check me out on my socials,
  • 01:00:20 Instagram and Twitter. All right, that's
  • 01:00:22 all that we have; until next time,
  • 01:00:24 everyone!