Coding

TensorFlow 2.0 Crash Course

  • 00:00:00 hey guys and welcome to a brand new
  • 00:00:02 tutorial series on neural networks with
  • 00:00:05 Python and tensorflow 2.0 now tensorflow
  • 00:00:08 2.0 is the brand-new version of
  • 00:00:10 tensorflow still actually in the alpha
  • 00:00:12 stages right now but it should be
  • 00:00:14 released within the next few weeks but
  • 00:00:17 because it's an alpha tensorflow has
  • 00:00:18 been kind enough to release that
  • 00:00:20 alpha version to us so that's what we're gonna
  • 00:00:21 be working with in this tutorial series
  • 00:00:23 and this will work for all future
  • 00:00:26 versions of tensorflow
  • 00:00:28 2.0 so don't be worried about that now
  • 00:00:30 before I get too far into this first
  • 00:00:32 video I just want to quickly give you an
  • 00:00:33 overview of exactly what I'm gonna be
  • 00:00:35 doing throughout this series so you guys
  • 00:00:36 have an idea of what to expect and what
  • 00:00:39 you're going to learn now the beginning
  • 00:00:40 videos and especially this one are going
  • 00:00:42 to be dedicated to understanding how a
  • 00:00:44 neural network works and I think this is
  • 00:00:47 absolutely fundamental and you have to
  • 00:00:49 have some kind of basis on the math
  • 00:00:51 behind a neural network before you're
  • 00:00:53 really able to actually properly
  • 00:00:55 implement one now tensorflow does a
  • 00:00:58 really nice job of making it super easy
  • 00:01:00 to implement neural networks and use
  • 00:01:02 them but to actually have a successful
  • 00:01:04 and complex neural network you have to
  • 00:01:06 understand how they work on the lower
  • 00:01:08 level so that's what we're gonna be
  • 00:01:09 doing for the first few videos after
  • 00:01:11 that what we'll do is we'll start
  • 00:01:12 designing our own neural networks that
  • 00:01:14 can solve the very basic MNIST data sets
  • 00:01:17 that tensorflow provides to us now these
  • 00:01:20 are pretty straightforward and pretty
  • 00:01:21 simple but they give us a really good
  • 00:01:23 building block on understanding how the
  • 00:01:25 architecture of a neural network works
  • 00:01:27 what are some of the different
  • 00:01:28 activation functions how you can connect
  • 00:01:30 layers and all of that which will
  • 00:01:32 transition us nicely into creating our
  • 00:01:33 own neural networks using our own data
  • 00:01:36 for something like playing a game now
  • 00:01:39 personally I'm really interested with
  • 00:01:40 neural networks playing games and I'm
  • 00:01:42 sure a lot of you are as well and that's
  • 00:01:44 what I'm gonna be aiming to do near the
  • 00:01:45 end of the series on kind of our larger
  • 00:01:47 project I'll be designing a neural
  • 00:01:49 network and tweaking it so they can play
  • 00:01:51 a very basic game that I've personally
  • 00:01:53 designed in Python with Pygame now with
  • 00:01:57 that being said that's kind of the plan
  • 00:01:59 for what we're gonna be doing in this series
  • 00:02:00 I may continue this further in later
  • 00:02:03 videos and do like very specific neural
  • 00:02:05 network series maybe a chatbot or
  • 00:02:06 something like that but I need you guys
  • 00:02:08 to let me know what you'd like to see in
  • 00:02:10 the comments down below
  • 00:02:11 with that being said if you're excited
  • 00:02:12 about the series
  • 00:02:13 make sure you drop a like on this video
  • 00:02:15 and subscribe to the channel to be
  • 00:02:17 notified when I post the new videos and
  • 00:02:19 with that being said let's get into this
  • 00:02:20 first video on how a neural network
  • 00:02:23 works and what a neural network is so
  • 00:02:25 let's start talking about what a neural
  • 00:02:27 network is and how they work now when
  • 00:02:29 you hear neural network you usually
  • 00:02:31 think of neurons
  • 00:02:33 now neurons are what compose our brain
  • 00:02:35 and I believe don't quote me on this we
  • 00:02:37 have billions of them in our body or in
  • 00:02:39 our brain now the way that neurons work
  • 00:02:41 on a very simple and high level is you
  • 00:02:44 have a bunch of them that are connected
  • 00:02:46 in some kind of way so let's say these
  • 00:02:48 are four neurons and they're connected
  • 00:02:50 in some kind of pattern now in this case
  • 00:02:53 our pattern is completely
  • 00:02:55 random we're just arbitrarily
  • 00:02:57 picking a connection but this is the way
  • 00:02:59 that they're connected
  • 00:03:00 okay now neurons can either fire or not
  • 00:03:03 fire so they can be on or off just
  • 00:03:05 like a 1 or 0 ok so let's say that for
  • 00:03:08 some reason this neuron decides to fire
  • 00:03:10 maybe you touch something maybe you
  • 00:03:13 smelt something something fires in your
  • 00:03:16 brain and this neuron decides to fire
  • 00:03:18 now it's connected to in this case all
  • 00:03:20 of the other neurons so what it will do
  • 00:03:22 is it will look at its other neurons and
  • 00:03:24 the connection and it will possibly
  • 00:03:26 cause it's connected neurons to fire or
  • 00:03:29 to not fire so in this case let's say
  • 00:03:31 maybe this one firing causes
  • 00:03:33 this connected neuron to fire this one
  • 00:03:36 to fire and maybe this one was already
  • 00:03:37 firing and now it's decided to turn it
  • 00:03:40 off or something like that ok so that's
  • 00:03:42 what happened now when this neuron fires
  • 00:03:45 well it's connected to this neuron and
  • 00:03:47 it's connected to this neuron well it's
  • 00:03:48 already got that connection but let's
  • 00:03:50 say that maybe when this one fires it
  • 00:03:52 causes this one to stop firing because it
  • 00:03:54 had just fired something like that right
  • 00:03:56 and then this one now that it's off it
  • 00:03:59 causes this one to fire back up and then
  • 00:04:00 it goes it's just a chain of firing and
  • 00:04:02 not firing
  • 00:04:03 and that's just kind of how it works
  • 00:04:06 right firing and not firing now that's as
  • 00:04:09 far as i'm going to go into explaining
  • 00:04:10 neurons but this kind of gives us a
  • 00:04:12 little bit of a basis for a neural
  • 00:04:13 network now a neural network essentially
  • 00:04:16 is a connected layer of neurons or
  • 00:04:19 connected layers so multiple layers of neurons
  • 00:04:22 so in this case let's say that we have a
  • 00:04:24 first layer we're going to call this our
  • 00:04:26 input
  • 00:04:26 that has four neurons and we have one
  • 00:04:30 more layer that only contains one neuron
  • 00:04:33 now these neurons are connected now in
  • 00:04:37 our neural network we can have our
  • 00:04:39 connections happening in different ways
  • 00:04:40 we can have each what do you call it neuron
  • 00:04:43 connected to each other neuron so from
  • 00:04:46 layer to layer or we can have like some
  • 00:04:48 connected to others some not connected
  • 00:04:50 or some connected multiple times it
  • 00:04:51 really depends on the type of neural
  • 00:04:54 network we're doing now in most cases
  • 00:04:56 what we do is we have what's called a
  • 00:04:57 fully connected neural network which
  • 00:05:00 means that each neuron in one layer is
  • 00:05:03 connected to each neuron in the next
  • 00:05:05 layer exactly one time so if I were to
  • 00:05:08 add another neuron here then what would
  • 00:05:11 happen is each of these neurons would
  • 00:05:13 also connect to this neuron one time so
  • 00:05:16 we would have a total of eight
  • 00:05:18 connections because four times two is
  • 00:05:19 eight right and that's how that would
  • 00:05:21 work now for simplicity sake we're just
  • 00:05:24 gonna use one neuron in the next layer
  • 00:05:27 just to make things a little bit easier
  • 00:05:28 to understand now all of these
  • 00:05:31 connections have what is known as a
  • 00:05:34 weight now this is in a neural network
  • 00:05:36 specifically okay so we're gonna say
  • 00:05:37 this is known as weight one this is
  • 00:05:39 known as weight two this is weight three and
  • 00:05:42 this is weight 4 and again just to
  • 00:05:46 re-emphasize this is known as our input layer
  • 00:05:47 because it is the first layer in our
  • 00:05:50 connected layers of neurons okay and
  • 00:05:53 going with that the last layer in our
  • 00:05:56 connected layer of neurons is known as
  • 00:05:59 our output layer now these are the only
  • 00:06:02 two layers that we really concern
  • 00:06:04 ourselves with when we look and use a
  • 00:06:07 neural network now obviously when we
  • 00:06:09 create them we have to determine what
  • 00:06:11 layers we're gonna have in the
  • 00:06:12 connection type but when we're actually
  • 00:06:14 using the neural network to make
  • 00:06:15 predictions or to Train it
  • 00:06:17 we are only concerning ourselves with
  • 00:06:19 the input layer and the output layer now
  • 00:06:21 what does this do and how do these
  • 00:06:24 neural networks work well essentially
  • 00:06:25 given some kind of input we want to do
  • 00:06:29 something with it and get some kind of
  • 00:06:31 output right in most instances that's
  • 00:06:33 what you want input results in the
  • 00:06:35 output in this case we have four inputs
  • 00:06:37 and we have
  • 00:06:38 output but we could have a case where we
  • 00:06:41 have four inputs and we have 25 outputs
  • 00:06:43 right it really depends on the kind of
  • 00:06:45 problem we're trying to solve so this is
  • 00:06:48 a very simple example but what I'm going
  • 00:06:49 to do is show you how we would or how a
  • 00:06:52 neural network would work to train a
  • 00:06:54 very basic snake game so let's look at a
  • 00:06:59 very basic snake game so let's say this
  • 00:07:01 is our snake okay and this is his head
  • 00:07:05 actually yeah let's say this is his head
  • 00:07:08 but like this is what the position the
  • 00:07:09 snake looks like where this is the tail
  • 00:07:11 okay we'll circle the tail now what I
  • 00:07:14 want to do is I want to train a neural
  • 00:07:15 network that will allow this snake to
  • 00:07:18 stay alive so essentially its output
  • 00:07:20 will be what direction to go in or like
  • 00:07:23 to follow a certain direction or not
  • 00:07:25 okay essentially just keep this snake
  • 00:07:26 alive that's what I want it to do now how
  • 00:07:29 am I gonna do this well the first step
  • 00:07:30 is to decide what our input is gonna be
  • 00:07:32 and then to decide what our output is
  • 00:07:34 going to be so in this case I think a
  • 00:07:36 clever input is gonna be do we have
  • 00:07:38 something in front of the snake do we
  • 00:07:40 have something to the left of the snake
  • 00:07:41 and do we have something to the right of
  • 00:07:44 the snake because in this case all
  • 00:07:45 that's here is just the snake and he
  • 00:07:46 just needs to be able to survive so what
  • 00:07:49 we'll do is we'll say okay is there
  • 00:07:51 something to the left yes no something
  • 00:07:53 in front yes no so 0 1 something to the
  • 00:07:55 right yes no and then our last input
  • 00:07:57 will be a recommended direction for the
  • 00:08:00 snake to go in so the recommended
  • 00:08:02 direction could be anything so in this
  • 00:08:04 case maybe we'll say the recommended
  • 00:08:05 direction is left and what our output
  • 00:08:07 will be is whether or not to follow that
  • 00:08:09 recommended direction or not or to try
  • 00:08:12 to do a different recommendation
  • 00:08:14 essentially or go to a different
  • 00:08:16 direction so let's do one case on how we
  • 00:08:19 would expect this neural network to
  • 00:08:21 perform like once it's
  • 00:08:23 trained right based on some given input
  • 00:08:25 so let's say there's not something to
  • 00:08:28 the left so we're gonna put a 0 here
  • 00:08:29 because this one will represent if
  • 00:08:30 there's anything to the left the next
  • 00:08:33 one will be front so we'll say well
  • 00:08:36 there's nothing in front the next one
  • 00:08:38 will be to the right so we'll say right
  • 00:08:40 and we'll say yes there is something to
  • 00:08:42 the right of the snake and our
  • 00:08:43 recommended direction what can be
  • 00:08:45 anything we'd like so in this case we
  • 00:08:47 say the recommended direction is left
  • 00:08:48 and we'll say we'll encode the recommended
  • 00:08:50 direction as negative
  • 00:08:52 1 0 or 1 where negative one is left zero is
  • 00:08:56 in front and one is to the right okay so
  • 00:09:01 we'll say in this case our recommended
  • 00:09:03 direction is negative one and we'll just
  • 00:09:05 denote this by direction now our output
  • 00:09:08 in this instance should either be a zero
  • 00:09:12 or one representing do we follow the
  • 00:09:15 recommended direction or do we not so
  • 00:09:17 let's see in this case following the
  • 00:09:20 recommended direction would keep our
  • 00:09:21 snake alive
  • 00:09:22 so we'll say 1 yes we will follow the
  • 00:09:24 recommended direction that is acceptable
  • 00:09:26 that is fine we're gonna stay alive when
  • 00:09:28 we do that now let's see what happens
  • 00:09:30 when we change the recommended direction
  • 00:09:32 to be right so let's say that we say 1
  • 00:09:35 as a recommended direction again this is
  • 00:09:37 our direction here then what should our output
  • 00:09:40 be well if we decide to go right we're
  • 00:09:42 gonna crash into our tail which means
  • 00:09:45 that we should not follow that direction
  • 00:09:47 so our output should be 0 so I hope
  • 00:09:49 you're understanding how we would expect
  • 00:09:51 this neural network to perform all right
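(As a rough sketch of the encoding just described: the array layout and names below are illustrative assumptions, not code from the video.)

```python
import numpy as np

# [left_blocked, front_blocked, right_blocked, recommended_direction]
# where the direction is -1 (left), 0 (straight ahead), or 1 (right)
state = np.array([0, 0, 1, -1])

# a trained network would map this state to a single 0-or-1 output:
# 1 = follow the recommended direction, 0 = don't
# for this state the expected output is 1, since going left keeps the snake alive
```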
  • 00:09:54 so now how do we actually design this
  • 00:09:57 neural network how do we get this work
  • 00:09:59 how do we train this right well that is
  • 00:10:01 a very good question and that is what
  • 00:10:02 I'm gonna talk about now so let me
  • 00:10:04 actually just erase some of this stuff
  • 00:10:06 so we have a little bit more room to
  • 00:10:07 work with some math stuff right here but
  • 00:10:10 right now what we start by doing is we
  • 00:10:12 start by designing what's known as the
  • 00:10:13 architecture of our neural network so
  • 00:10:15 we've already done this we have the
  • 00:10:16 input and we have the output now each of
  • 00:10:19 our inputs is connected to our outputs
  • 00:10:21 and each of these connections has what's
  • 00:10:23 known as a weight now another thing that
  • 00:10:26 we have is each of our input neurons has
  • 00:10:28 a value right we had in this case we
  • 00:10:30 either had 0 or we had 1 now these
  • 00:10:34 values can be different right these
  • 00:10:36 values can either be decimal values or
  • 00:10:38 they can be like between 0 and 100 they
  • 00:10:40 don't have to be just between 0 and 1
  • 00:10:41 but the point is that we have some kind
  • 00:10:43 of value right so what we're gonna do in
  • 00:10:46 this output layer to determine what way
  • 00:10:48 we should go is essentially we are going
  • 00:10:50 to take the weighted sum of the values
  • 00:10:53 multiplied by the weights I'm gonna talk
  • 00:10:55 about how this works more in depth in a
  • 00:10:57 second but just just follow me now so
  • 00:10:59 what this symbol means is take the sum
  • 00:11:01 and what we do is I'm gonna say in this
  • 00:11:03 case I which is gonna be our variable
  • 00:11:05 and
  • 00:11:06 talk about how this kind of thing works
  • 00:11:07 in a second we'll say I equals one and
  • 00:11:09 I'm going to say we'll take the weighted
  • 00:11:11 sum of in this case value I multiplied
  • 00:11:15 by weight I so what this means
  • 00:11:18 essentially is we're going to start at I
  • 00:11:20 equals 1 we're gonna use I as our
  • 00:11:22 variable for looping and we're gonna say
  • 00:11:24 in this case we're gonna do v I
  • 00:11:27 times w I and then we're gonna add
  • 00:11:30 all this so what this will return to us
  • 00:11:33 actually will be v1 w1 plus v2 w2
  • 00:11:39 plus v3 w3 plus v4 w4 and this will
  • 00:11:46 be our output that's that's what our
  • 00:11:49 output layer is going to have as a value
  • 00:11:52 now this doesn't really make much sense
  • 00:11:55 right now right like why why we doing
  • 00:11:57 this weights what is this multiplication
  • 00:11:58 we'll just follow with me for one second
  • 00:12:01 so this is what our output layer is
  • 00:12:03 going to do now there's one thing that
  • 00:12:05 we have to add to this as well and this
  • 00:12:08 is what is known as our biases okay so
  • 00:12:11 what we're gonna do is we're going to
  • 00:12:12 take this weighted sum but we're also
  • 00:12:14 going to have some kind of bias on each
  • 00:12:17 of these weights okay
  • 00:12:18 and what this bias is known as it's
  • 00:12:20 denoted by C typically but essentially
  • 00:12:23 it is some value that we just
  • 00:12:25 automatically add or subtract it's a
  • 00:12:27 constant value for each of these weights
  • 00:12:29 so we're gonna say all of these these
  • 00:12:31 connections have weight but they also
  • 00:12:32 have a bias so we're gonna have B 1 B 2
  • 00:12:35 B 3 and B 4 is what we'll call it B
  • 00:12:40 instead of C so what I'll do here is
  • 00:12:42 what I'm also going to do is I'm also
  • 00:12:44 gonna add these biases in when I do
  • 00:12:46 these weights so we're going to say B I
  • 00:12:47 as well so now what we'll have is we'll
  • 00:12:50 have at the end here plus b I or plus b1
  • 00:12:54 plus b2 plus b3 plus b4 now again I know
  • 00:12:59 you guys like what the heck am I doing
  • 00:13:00 with this this makes no sense it's about
  • 00:13:02 to make sense in one second so now what
  • 00:13:06 we need to do is we need to train the
  • 00:13:07 network so we've understood now this is
  • 00:13:09 essentially what this output layer is
  • 00:13:10 doing we're taking all of these weights
  • 00:13:13 and these values we're multiplying them
  • 00:13:15 together and we're adding them and we're
  • 00:13:16 taking what's known as the weighted sum
  • 00:13:18 okay
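(Here is a minimal sketch of that weighted sum in plain Python; the numbers are made up. Note that in practice a bias is usually a single constant per neuron rather than one per connection, but this follows the per-connection description above.)

```python
values  = [0, 1, 0, -1]          # input neuron values v1..v4
weights = [0.5, -0.2, 0.8, 0.1]  # connection weights w1..w4 (arbitrary)
biases  = [0.1, 0.0, -0.3, 0.2]  # per-connection biases b1..b4 (arbitrary)

# output = v1*w1 + v2*w2 + v3*w3 + v4*w4 + b1 + b2 + b3 + b4
output = sum(v * w for v, w in zip(values, weights)) + sum(biases)
print(output)  # -0.3 with these numbers
```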
  • 00:13:19 but how do we like what are these values
  • 00:13:21 how do we get these values and how is
  • 00:13:22 this gonna give us a valid output well
  • 00:13:25 what we're going to do is we're gonna
  • 00:13:26 train the network on a ton of different
  • 00:13:28 information so let's say we play 1,000
  • 00:13:32 games of snake and we get all of the
  • 00:13:34 different inputs and all the different
  • 00:13:36 outputs so what we'll do is we'll
  • 00:13:38 randomly decide like a recommended
  • 00:13:39 direction and we'll just take the state
  • 00:13:41 of the snake which will be either
  • 00:13:43 there's something left to the right
  • 00:13:45 we're in front of it and then we'll take
  • 00:13:46 the output which will be like did the
  • 00:13:49 snake survive or did the snake not
  • 00:13:50 survive so well what we'll do is we'll
  • 00:13:54 train the network using that information
  • 00:13:56 so we'll generate all of this different
  • 00:13:59 information and then train the network
  • 00:14:00 and what the network will do is it will
  • 00:14:03 look at all of this information and it
  • 00:14:05 will start adjusting these biases and
  • 00:14:07 these weights to properly get a correct
  • 00:14:11 output because what we'll do is we'll
  • 00:14:12 give it all this input right so let's
  • 00:14:14 say we give it the input again of zero
  • 00:14:16 one zero and maybe one like this random
  • 00:14:19 input and let's say the output for this
  • 00:14:21 case is what do you call it so one is go
  • 00:14:25 to the right the output is one which is
  • 00:14:26 correct well what the network could do
  • 00:14:29 is say okay I got that correct so what
  • 00:14:30 I'm gonna do is I'm not gonna bother
  • 00:14:32 adjusting the network because this is fine
  • 00:14:34 so I don't have to change any of these
  • 00:14:36 biases I don't have to change any of
  • 00:14:37 these weights everything is working fine
  • 00:14:39 but let's say that we get the answer
  • 00:14:42 wrong so maybe the output was zero but
  • 00:14:43 the answer should have been one because
  • 00:14:45 we know the answer obviously because
  • 00:14:47 we've generated all the input and the
  • 00:14:48 output so now what the network will do
  • 00:14:50 is it will start adjusting these weights
  • 00:14:52 and adjusting these biases they'll say
  • 00:14:55 all right so I got this one wrong and
  • 00:14:57 I've gotten like five or six wrong
  • 00:14:59 before and this is what was familiar
  • 00:15:01 when I got something wrong so let's add
  • 00:15:03 one to this bias or let's multiply this
  • 00:15:05 weight by two and what it will do is
  • 00:15:07 it'll start adjusting these weights and
  • 00:15:09 these biases so that it gets more things
  • 00:15:12 correct so obviously that's why neural
  • 00:15:15 networks typically take a massive amount
  • 00:15:17 of information to Train because what you
  • 00:15:19 do is you pass it all of this
  • 00:15:21 information and then it keeps going
  • 00:15:23 through the network and at the beginning
  • 00:15:24 it sucks right because this
  • 00:15:26 network just starts with random weights
  • 00:15:28 and random biases but as it goes through
  • 00:15:31 and it learns it says okay
  • 00:15:33 well I got this one correct so let's
  • 00:15:36 leave the weights and the bias is the
  • 00:15:37 same but let's remember that this is
  • 00:15:39 what the weight and the bias was when this
  • 00:15:41 was correct and then maybe it gets
  • 00:15:42 something wrong and it says okay so
  • 00:15:44 let's adjust bias one a little bit
  • 00:15:46 let's adjust weight one let's mess with
  • 00:15:48 these and then let's try another example
  • 00:15:50 and then it says okay I got this example
  • 00:15:52 right maybe we're moving in the right
  • 00:15:53 direction maybe we'll adjust another
  • 00:15:55 weight maybe we'll adjust another bias and
  • 00:15:56 eventually your goal is that you
  • 00:15:59 get to a point where your network is
  • 00:16:01 very accurate because you've given it a
  • 00:16:03 ton of data and it's adjusted the
  • 00:16:04 weights and the biases correctly so that
  • 00:16:06 this kind of formula here of this
  • 00:16:08 weighted sum will just always give
  • 00:16:10 you the correct answer or has a very
  • 00:16:13 high accuracy or high chance of giving
  • 00:16:15 you the correct answer so I hope that
  • 00:16:18 kind of makes sense I'm definitely over
  • 00:16:19 simplifying things in how the adjustment
  • 00:16:22 of these weights and these biases work
  • 00:16:23 but it's not crazy important and we're
  • 00:16:25 not going to be doing any of the
  • 00:16:27 adjustment ourselves we are just
  • 00:16:29 gonna be kind of tweaking a few things
  • 00:16:31 with the network so as long as you
  • 00:16:32 understand that when you feed
  • 00:16:34 information what happens is it checks
  • 00:16:36 whether the network got it correct or it
  • 00:16:38 got it incorrect and then it adjusts the
  • 00:16:40 network accordingly and that is how the
  • 00:16:41 learning process works for a neural
  • 00:16:43 network alright so now it's time to
  • 00:16:45 discuss a little bit about activation
  • 00:16:48 functions so right now what I've
  • 00:16:50 actually just described to you is
  • 00:16:52 essentially a form of linear regression
  • 00:16:54 so essentially I was saying we are
  • 00:16:57 adjusting weights we're adjusting biases
  • 00:16:59 and essentially we are creating a
  • 00:17:00 function that given the inputs of like X
  • 00:17:03 Y Z W or like left front right we are
  • 00:17:06 giving some kind of output but all we've
  • 00:17:08 been doing to do that essentially is
  • 00:17:10 just adjusting a linear function because
  • 00:17:13 our degree is only 1 right we have
  • 00:17:15 weights of degree one multiplying by
  • 00:17:17 values of degree one and we're adding
  • 00:17:19 some kind of bias and that kind of
  • 00:17:21 reminds you of the form MX plus B we're
  • 00:17:24 literally just adding a bunch of MX plus
  • 00:17:26 B is together which gives us like a
  • 00:17:28 fairly complex linear function but this
  • 00:17:32 is really not a great way to do things
  • 00:17:35 because it limits the degree of
  • 00:17:37 complexity that our network can actually
  • 00:17:40 have to be linear and that's not what we
  • 00:17:43 want so now we have to talk about
  • 00:17:44 activation functions
  • 00:17:46 so if you understand everything that
  • 00:17:48 I've talked about so far you're doing
  • 00:17:49 amazing this is great you understand
  • 00:17:51 that essentially the way that the
  • 00:17:53 network works is you feed information in
  • 00:17:54 and it adjusts these weights and biases
  • 00:17:57 there's a specific way it does that
  • 00:17:58 which we'll talk about later and then
  • 00:18:00 you get some kind of output and based on
  • 00:18:03 that output you're trying to adjust the
  • 00:18:05 weights and biases and and all that
  • 00:18:06 right so now what we need to do is talk
  • 00:18:09 about activation functions and what an
  • 00:18:10 activation function does is it's
  • 00:18:12 essentially a nonlinear function that
  • 00:18:15 will allow you to add a degree of
  • 00:18:17 complexity to your network so that you
  • 00:18:19 can have more of a function that's like
  • 00:18:21 this as opposed to a function that is a
  • 00:18:24 straight line so an example of an
  • 00:18:26 activation function is something like a
  • 00:18:28 sigmoid function now a sigmoid function
  • 00:18:31 what it does is it'll map any value you
  • 00:18:35 give it in between the value of
  • 00:18:37 0 and 1
  • 00:18:39 so for example when we create this
  • 00:18:41 Network our output might be like the
  • 00:18:44 number 7 now this number 7 well it is
  • 00:18:48 closer to 1 that is to 0 so we might
  • 00:18:50 deem that a correct answer or we might
  • 00:18:53 say that this is actually way off
  • 00:18:55 because it's way above 1 right but what
  • 00:18:57 we want to do essentially is in our
  • 00:18:59 output layer we only want our values to
  • 00:19:02 be within a certain range we want them
  • 00:19:04 to be in this case between 0 and 1 or
  • 00:19:06 maybe we want them to be between
  • 00:19:07 negative 1 and 1 I'm saying like how
  • 00:19:11 close we are to 0 making that decision
  • 00:19:13 how close we are to 1 something like
  • 00:19:14 that right so what the sigmoid
  • 00:19:16 activation function does it's a
  • 00:19:17 nonlinear function and it takes any
  • 00:19:20 value and essentially the closer that
  • 00:19:22 value is to infinity the closer the
  • 00:19:25 output is to 1 and the closer that value
  • 00:19:28 is to negative infinity the closer that
  • 00:19:30 output is to 0 so what it does
  • 00:19:33 is it adds a degree of complexity to our
  • 00:19:35 network now if you don't if you're not a
  • 00:19:37 high level like math student or you only
  • 00:19:39 know like very basic high school math
  • 00:19:41 this might not really make sense to you
  • 00:19:42 but essentially the degree of something
  • 00:19:44 right is honestly how complex you can
  • 00:19:46 get if you have like a degree 9 function
  • 00:19:49 then what you could do is you can have
  • 00:19:51 some crazy kind of curve and stuff going
  • 00:19:55 on especially in multiple dimensions
  • 00:19:56 that will just make things like much
  • 00:19:59 more complex
  • 00:20:00 so for example if you have like a degree
  • 00:20:02 nine function you can have curves that
  • 00:20:04 are going like like this like all around
  • 00:20:07 here that are mapping your different
  • 00:20:08 values and if you only have a linear
  • 00:20:11 function well you can only have a
  • 00:20:12 straight line which limits your degree
  • 00:20:14 of complexity by a significant amount
  • 00:20:17 now what these activation functions also
  • 00:20:20 do is they shrink down your data so that
  • 00:20:22 it is not as large so for example right
  • 00:20:24 like say we're looking with data that is
  • 00:20:26 like hundreds of thousands of like
  • 00:20:27 characters long or digits we'd want to
  • 00:20:30 shrink that into like normalize that
  • 00:20:32 data so that it's easier to actually
  • 00:20:34 work with so let me give you a more
  • 00:20:36 practical example of how to use the
  • 00:20:38 activation function I talked about what
  • 00:20:39 sigmoid does what we would do is we
  • 00:20:41 would take this weighted sum so we did
  • 00:20:43 the sum of v I times w I plus b I right and we
  • 00:20:51 would apply an activation function to
  • 00:20:53 this so we'd say maybe our activation
  • 00:20:55 function is f of X and we would say F of
  • 00:20:58 this and this gives us some value which
  • 00:21:00 is now gonna be our output neuron and
  • 00:21:02 the reason we do that again is so that
  • 00:21:05 when we are adjusting our weights and
  • 00:21:07 biases and we add that activation
  • 00:21:09 function in now we can have a way more
  • 00:21:11 complex function as opposed to just
  • 00:21:14 having the kind of linear regression
  • 00:21:15 straight line which is what I've
  • 00:21:18 talked about in my other machine
  • 00:21:19 learning courses so if this is kind of
  • 00:21:21 going a little bit over your head
  • 00:21:23 it may be my lack of explaining it I'd
  • 00:21:25 love to hear in the comments below what
  • 00:21:26 you think of this explanation but
  • 00:21:28 essentially that's what the activation
  • 00:21:29 function does now another activation
  • 00:21:31 function that is very popular and is
  • 00:21:33 actually used way more than sigmoid
  • 00:21:34 nowadays is known as rectified linear
  • 00:21:37 unit and what this does is and I'm
  • 00:21:40 drawing in red actually so we can see
  • 00:21:42 it better is it takes all of the values
  • 00:21:44 that are negative and automatically puts
  • 00:21:46 them to zero and takes all of the values
  • 00:21:49 that are positive and just leaves them
  • 00:21:51 as they are essentially
  • 00:21:54 unchanged right
  • 00:21:55 and what this again is gonna do is it's
  • 00:21:58 a nonlinear function so it's going to
  • 00:21:59 enhance the complexity of our model and
  • 00:22:02 just make our data points in between the
  • 00:22:05 range zero and positive infinity which
  • 00:22:07 is better than having between negative
  • 00:22:08 infinity and positive infinity for when
  • 00:22:10 we're calculating error all right
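(A minimal sketch of the two activation functions just discussed, using numpy:)

```python
import numpy as np

def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):
    # zeroes out negatives, leaves positive values unchanged
    return np.maximum(0, x)

print(sigmoid(7))   # ~0.999, very close to 1
print(relu(-3.2))   # 0.0
print(relu(3.2))    # 3.2
```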
  • 00:22:14 last thing to talk about for neural
  • 00:22:15 networks in this video I'm trying to
  • 00:22:16 kind of get everything like briefly into
  • 00:22:18 one long video is a loss function so
  • 00:22:23 this is again gonna help us understand
  • 00:22:25 how these weights and these biases are
  • 00:22:27 actually adjusted so we know that
  • 00:22:29 they're adjusted and we know that what
  • 00:22:30 we do is we look at the output and we
  • 00:22:32 compare it to what the output should be
  • 00:22:35 from our test data and then we say okay
  • 00:22:38 well it's adjust the weights and the
  • 00:22:39 biases accordingly but how do we adjust
  • 00:22:41 that and how do we know how far off we
  • 00:22:44 are how much to tune by if an adjustment
  • 00:22:47 even needs to be made well we use what's
  • 00:22:49 known as a loss function so a loss
  • 00:22:52 function essentially is a way of
  • 00:22:54 calculating error now there's a ton of
  • 00:22:56 different loss functions some of
  • 00:22:58 them are like mean squared error that's
  • 00:23:00 the name of one of them I think one is
  • 00:23:01 like I can't even remember the name of
  • 00:23:05 this one but there's there's a bunch of
  • 00:23:06 very popular ones if you know some leave
  • 00:23:08 them in the comments love to hear all
  • 00:23:09 the different ones but anyways what the
  • 00:23:12 loss function will do is tell you how
  • 00:23:14 wrong your answer is because like let's
  • 00:23:18 think about this right if you get an
  • 00:23:19 answer of let's say maybe our output is
  • 00:23:22 like zero point seven nine and the
  • 00:23:24 actual answer was one well that's pretty
  • 00:23:27 close
  • 00:23:27 like that's pretty close to one but
  • 00:23:29 right now all we're gonna get is the
  • 00:23:30 fact that we were zero point two one off
  • 00:23:33 okay so zero point two one off so
  • 00:23:35 adjust the weights a certain degree
  • 00:23:36 based on zero point two one but the
  • 00:23:38 thing is what if we get like zero point
  • 00:23:42 eight five
  • 00:23:44 well is this like this is significantly
  • 00:23:46 better than zero point seven nine but
  • 00:23:47 this is only gonna say that we were
  • 00:23:49 better by what is this zero point one
  • 00:23:52 five so we're still gonna do a significant
  • 00:23:54 amount of adjusting to the weights and the
  • 00:23:55 biases so what we need to do is need to
  • 00:23:57 apply a loss function to this that will
  • 00:23:59 give us a better kind of degree of like
  • 00:24:03 how wrong or how right we were now these
  • 00:24:05 loss functions are again not linear loss
  • 00:24:09 functions which means that we're gonna
  • 00:24:10 add a higher degree of complexity to our
  • 00:24:12 model which will allow us to create way
  • 00:24:15 more complex models and neural networks
  • 00:24:17 that can solve better problems I don't
  • 00:24:19 really want to talk about loss functions
  • 00:24:20 too much because I'm definitely no
  • 00:24:22 expert on how they work but essentially
  • 00:24:25 what you do is you're comparing the
  • 00:24:26 output to the
  • 00:24:28 what the output should be so like
  • 00:24:29 whatever the model generated based what
  • 00:24:31 it should be and then you're gonna get
  • 00:24:32 some value and based on that value you
  • 00:24:34 are going to adjust the biases and the
  • 00:24:36 weights accordingly the reason we use a
  • 00:24:38 loss function again is because we want a
  • 00:24:40 higher degree of complexity they're
  • 00:24:41 nonlinear and you know if you get zero
  • 00:24:43 if you're 99% there like say you're point one
  • 00:24:46 away from the correct answer we probably
  • 00:24:48 want to adjust the weights very very
  • 00:24:50 little but if you're like way off the
  • 00:24:53 answer like two whole points maybe our
  • 00:24:55 answer is negative one and we want it to be
  • 00:24:56 one
  • 00:24:57 well we want to adjust the model like
  • 00:24:59 crazy right because that model was
  • 00:25:00 horribly wrong it wasn't even close so
  • 00:25:02 we would adjust it way more than just
  • 00:25:04 like two points of adjustment right we
  • 00:25:07 adjust it based on whatever that loss
  • 00:25:09 function gave to us
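(Mean squared error, the one loss function named above, can be sketched like this; it shows why being slightly off leads to a much smaller adjustment than being way off:)

```python
def mse(predicted, actual):
    # squaring the difference penalizes large errors far more than small ones
    return (predicted - actual) ** 2

print(mse(0.79, 1.0))   # ~0.0441 -> close, small adjustment
print(mse(-1.0, 1.0))   # 4.0     -> way off, much bigger adjustment
```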
  • 00:25:12 so anyways this has kind of been my explanation of a neural
  • 00:25:14 network and I want to state
  • 00:25:16 right here for everyone that I am no pro
  • 00:25:18 on neural networks this is my
  • 00:25:20 understanding there might be some stuff
  • 00:25:21 that's a little bit flawed or some areas
  • 00:25:23 that I skipped over and quickly actually
  • 00:25:27 because I know some people are probably
  • 00:25:28 gonna ask about this when you're creating
  • 00:25:29 neural networks as well you have another
  • 00:25:32 thing that is called hidden layers so
  • 00:25:34 right now we've only been using two
  • 00:25:36 layers but in most neural networks what
  • 00:25:38 you have is a ton of different input
  • 00:25:40 neurons that connect to what's known as
  • 00:25:42 a hidden layer or multiple hidden layers
  • 00:25:44 of neurons so let's say we have like an
  • 00:25:46 architecture maybe that looks something
  • 00:25:47 like this so all these connections and
  • 00:25:50 then these ones connect to this and what
  • 00:25:52 this allows you to do is have way more
  • 00:25:54 complex models that can solve way more
  • 00:25:56 difficult problems because you can
  • 00:25:58 generate different combinations of
  • 00:26:00 inputs in what are known as
  • 00:26:02 hidden layer neurons to solve your
  • 00:26:05 problem and have more weights and more
  • 00:26:06 biases to adjust which means you can on
  • 00:26:09 average be more accurate to produce
  • 00:26:12 certain models so you can have crazy
  • 00:26:14 neural networks that look something like
  • 00:26:16 this but with way more neurons and way
  • 00:26:18 more layers and all this kind of stuff I
  • 00:26:20 just wanted to show a very basic network
  • 00:26:23 today because I didn't want to go in and
  • 00:26:25 talk about like a ton of stuff
  • 00:26:27 especially cuz I know a lot of people
  • 00:26:28 that watch my videos are not pro math
  • 00:26:31 guys are just trying to get a basic
  • 00:26:32 understanding and be able to implement
  • 00:26:34 some of this stuff
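(As a hedged sketch only: roughly what a small fully connected network with one hidden layer looks like in Keras. The layer sizes here are arbitrary; the video builds its actual model in a later part.)

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(4, input_shape=(4,), activation='relu'),  # hidden layer
    keras.layers.Dense(1, activation='sigmoid')                  # output layer
])
model.summary()  # prints the layers and how many weights and biases they hold
```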
  • 00:26:39 now in today's video what we're gonna be
  • 00:26:40 doing is actually getting our hands
  • 00:26:42 dirty and working with a bit of code and
  • 00:26:43 loading in our first data set so we're
  • 00:26:46 not actually gonna do anything with the
  • 00:26:47 model right now we're gonna do that in
  • 00:26:49 the next video
  • 00:26:49 this video is gonna be dedicated to
  • 00:26:51 understanding data the importance of
  • 00:26:53 data how we can scale that data look at
  • 00:26:55 it and understand how that's going to
  • 00:26:57 affect our model when training the most
  • 00:27:00 important part of machine learning at
  • 00:27:01 least in my opinion is the data and it's
  • 00:27:04 also one of the hardest things to
  • 00:27:05 actually get done correctly training the
  • 00:27:08 model and testing the model and using it
  • 00:27:10 is actually very easy and you guys will
  • 00:27:11 see that as we go through but getting
  • 00:27:13 the right information to our model and
  • 00:27:15 having it in the correct form is
  • 00:27:17 something that is way more challenging
  • 00:27:19 than it may seem with these initial data
  • 00:27:21 sets that we're gonna work with things
  • 00:27:23 are gonna be very easy because the
  • 00:27:24 datasets are gonna be given to us but
  • 00:27:26 when we move on in future videos to
  • 00:27:27 using our own data we're gonna have to
  • 00:27:29 pre-process it we're gonna have to put
  • 00:27:31 it in its correct form we're gonna have
  • 00:27:32 to get it into an array we're gonna have
  • 00:27:35 to make sure that the data makes sense
  • 00:27:36 so we're not adding things that
  • 00:27:37 shouldn't be there or we're not omitting
  • 00:27:39 things that need to be there so anyways
  • 00:27:41 I'm just gonna quickly say here that I
  • 00:27:43 am kind of working off of this
  • 00:27:45 tensorflow 2.0 tutorial that is on
  • 00:27:47 tensor flows website now I'm kind of
  • 00:27:50 gonna stray from it quite a bit to be
  • 00:27:52 honest but I'm just using the data sets
  • 00:27:54 that they have and a little bit of the
  • 00:27:55 code that they have here because it's a
  • 00:27:57 very nice introduction to machine
  • 00:27:59 learning and neural networks but there's
  • 00:28:01 a lot of stuff in here that they don't
  • 00:28:02 talk about and it's not very in-depth so
  • 00:28:05 that's what I'm gonna be adding
  • 00:28:06 and the reason why maybe you'd want to
  • 00:28:07 watch my version of this as opposed to
  • 00:28:09 just reading this off the website
  • 00:28:10 because if you have no experience with
  • 00:28:12 neural networks it is kind of confusing
  • 00:28:15 some of the stuff they do here and they
  • 00:28:16 don't really talk about why they use
  • 00:28:18 certain things or whatnot so anyways the
  • 00:28:20 data set we're gonna be working with
  • 00:28:21 today is known as the fashion MNIST
  • 00:28:24 data set so you may have heard of the
  • 00:28:26 old MNIST which is image
  • 00:28:29 classification but it was like digits so
  • 00:28:31 like you had digits from 0 to 9 and the
  • 00:28:33 neural network classified the digits
  • 00:28:35 this one's very similar principle except
  • 00:28:37 we're gonna be doing it with like
  • 00:28:38 t-shirts and pants and
  • 00:28:41 what-do-you-call-it sandals and all that
  • 00:28:43 so these are kind of some examples what
  • 00:28:44 the images look like and we'll be
  • 00:28:46 showing them as well in in the code so
  • 00:28:49 that's enough about it I felt like I
  • 00:28:50 should tell you guys that the first
  • 00:28:52 thing that we
  • 00:28:52 doing before we can actually start
  • 00:28:55 working with tensorflow is we obviously
  • 00:28:56 need to install it now actually maybe
  • 00:28:58 I'll grab the install command here so
  • 00:29:00 I'll have to copy it but this is the
  • 00:29:02 install command for tensorflow 2.0 so
  • 00:29:05 I'm just gonna copy it here link will be
  • 00:29:06 in the description as well as on my
  • 00:29:07 website and you can see pip install
  • 00:29:10 -q tensorflow==2.0.0-alpha0
  • 00:29:13 now I already have this
  • 00:29:16 installed but I'm gonna go ahead and hit
  • 00:29:17 enter anyways now the -q I believe just
  • 00:29:20 means don't give any output when you're
  • 00:29:22 installing so if this runs and you don't
  • 00:29:25 see any output whatsoever then you have
  • 00:29:27 successfully installed tensorflow 2.0
  • 00:29:29 now I ran into an issue where I couldn't
  • 00:29:31 install it because I had a previous
  • 00:29:33 version of numpy installed in my system
  • 00:29:34 so if for some reason this doesn't work
  • 00:29:37 and there's something with numpy I would
  • 00:29:39 just PIP uninstall numpy and reinstall
  • 00:29:42 so do pip uninstall numpy like that I'm
  • 00:29:45 obviously not gonna run that but if you did
  • 00:29:47 that and then you tried to reinstall
  • 00:29:49 tensorflow 2.0 that should work for you
  • 00:29:51 and it should actually install its own
  • 00:29:52 version of the most updated version of
  • 00:29:54 numpy now another thing we're going to
  • 00:29:56 install here is going to be matplotlib
  • 00:29:59 now matplotlib is a nice library for
  • 00:30:02 just graphing and showing images and
  • 00:30:04 different information that we'll use a
  • 00:30:05 lot through this series so let's install
  • 00:30:07 that I already have it installed but go
  • 00:30:09 ahead and do that and then finally we
  • 00:30:11 will install pandas which we may be
  • 00:30:14 using in later videos in the series so I
  • 00:30:16 figured we might as well install it now
  • 00:30:17 so pip install pandas and once you've
  • 00:30:20 done that you should be ready to
  • 00:30:21 actually go here and start getting our
  • 00:30:23 data loaded in and looking at the data
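(For reference, the three install commands from this section in one place:)

```
pip install -q tensorflow==2.0.0-alpha0
pip install matplotlib
pip install pandas
```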
  • 00:30:26 so I'm just gonna be working in Sublime
  • 00:30:28 Text and executing my Python files from
  • 00:30:31 the command line just because this is
  • 00:30:33 something that will work for everyone no
  • 00:30:35 matter what but feel free to work in an
  • 00:30:36 IDE feel free to work in PyCharm
  • 00:30:38 as long as you understand how to set up
  • 00:30:40 your environment so that you have the
  • 00:30:42 necessary packages like tensorflow and
  • 00:30:44 all that then you should be good to go
  • 00:30:46 so let's start by importing tensorflow
  • 00:30:48 so import tensorflow as TF like that I
  • 00:30:52 don't know why it always short forms
  • 00:30:55 when I try to do this but anyways we're
  • 00:30:56 gonna import or actually sorry from
  • 00:30:59 tensorflow
  • 00:31:00 we'll import keras now Keras is an API
  • 00:31:05 for tensorflow which essentially
  • 00:31:06 allows us to write less code it does a
  • 00:31:10 lot of stuff for us like you'll see when
  • 00:31:12 we set up the model we use Keras and
  • 00:31:14 it'll be really nice and simple and just
  • 00:31:16 like a high-level API that's the way
  • 00:31:18 that they describe it that makes things
  • 00:31:19 a lot easier for people like us that
  • 00:31:21 aren't going to be defining our own
  • 00:31:22 tensors and writing our own code from
  • 00:31:25 scratch essentially now another thing we
  • 00:31:27 need to import is numpy so we're going
  • 00:31:29 to say import if I could get this here
  • 00:31:33 import numpy as np and finally we will
  • 00:31:37 import matplotlib so matplotlib in
  • 00:31:41 this case dot pyplot as plt and
  • 00:31:45 this again is just going to allow us to
  • 00:31:47 graph some things here
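(Those imports together, exactly as described:)

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
```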
  • 00:31:49 all right so now what we're gonna do is we're actually
  • 00:31:50 gonna get our data set loaded in so the
  • 00:31:52 way that we can load in our data set is
  • 00:31:54 using Keras so to do this I'm just
  • 00:31:56 gonna say data equals in this case
  • 00:31:57 keras dot datasets dot fashion
  • 00:32:01 underscore mnist and this is just the
  • 00:32:05 name of the data set there's a bunch of
  • 00:32:06 other data sets inside of Keras that we
  • 00:32:08 will be using in the future now whenever
  • 00:32:11 we have data it's very important that we
  • 00:32:13 split our data into testing and training
  • 00:32:16 data now you may have heard this you
  • 00:32:18 talked about this in the previous
  • 00:32:20 machine learning tutorials I did but
  • 00:32:22 essentially what you want to do with any
  • 00:32:23 kind of machine learning algorithm
  • 00:32:25 especially a neural network is you don't
  • 00:32:27 want to pass all of your data into the
  • 00:32:29 network when you train it you want to
  • 00:32:31 pass about 80 to 90 percent of your data to
  • 00:32:34 the network to train it and then you
  • 00:32:36 want to test the network for accuracy
  • 00:32:38 and making sure that it works properly
  • 00:32:39 on the rest of your data that it hasn't
  • 00:32:41 seen yet
  • 00:32:42 now the reason you'd want to do this and
  • 00:32:45 a lot of people would say why don't I
  • 00:32:46 just give all my data to the network and
  • 00:32:48 wouldn't it make it better not necessarily and
  • 00:32:50 that's because
  • 00:32:53 if you test your network on data it's
  • 00:32:55 already seen then you can't be sure that
  • 00:32:58 it's not just simply memorizing the data
  • 00:33:00 it's seen right for example if you show
  • 00:33:02 me five images and then like you tell me
  • 00:33:05 the classes of all of them and then you
  • 00:33:06 show me the same image again you
  • 00:33:08 say what's the class and I get it right
  • 00:33:10 well did I get it right because I
  • 00:33:12 figured out how to analyze the images
  • 00:33:13 properly or because I'd already seen it
  • 00:33:15 and I knew what it was right I just
  • 00:33:17 memorized what it was so that's
  • 00:33:19 something we want to
  • 00:33:20 try to avoid with our models so whenever
  • 00:33:22 we have our data we're gonna split it up
  • 00:33:23 into testing and training data and
  • 00:33:25 that's what we're gonna do right here so
  • 00:33:26 to do this I'm going to say train in
  • 00:33:29 this case train underscore images and
  • 00:33:31 train underscore labels comma in
  • 00:33:35 this case test underscore images
  • 00:33:38 comma test underscore labels and
  • 00:33:41 then we say this is equal to data dot
  • 00:33:43 load underscore data so data dot
  • 00:33:46 load underscore data and we're done
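(In code, that line looks like this; load_data() on the Keras datasets returns the training and testing splits as two tuples:)

```python
data = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = data.load_data()
```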
  • 00:33:49 now the reason we can do this is just because this load data
  • 00:33:50 method is gonna return information in a
  • 00:33:52 way where we can kind of split it up
  • 00:33:54 like this in most cases when you're
  • 00:33:55 writing your own models for your own
  • 00:33:57 data you're gonna have to write your own
  • 00:33:59 arrays and for loops and load and data
  • 00:34:01 and do all this fancy stuff but Chara's
  • 00:34:03 makes it nice and easy for us just by
  • 00:34:05 allowing us to write this line here
  • 00:34:06 which will get us our training and
  • 00:34:08 testing data in the four
  • 00:34:11 variables that we need so quickly let me
  • 00:34:14 talk about what labels are now so for
  • 00:34:16 this specific data set there are ten
  • 00:34:18 labels and that means each image that we
  • 00:34:20 have will have a specific label assigned
  • 00:34:22 to it now I'll actually show you by
  • 00:34:25 just printing out if I print for example
  • 00:34:27 train underscore labels and let's just
  • 00:34:29 print index zero I guess the
  • 00:34:31 first training label so let me just run
  • 00:34:33 this file so python tutorial 1 you can
  • 00:34:38 see that we simply get the number 9
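(A quick sketch of that check; the printed values are the ones shown in the video:)

```python
print(train_labels[0])   # 9
print(train_labels[6])   # 7
```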
  • 00:34:41 now this is just the numeric
  • 00:34:43 label representation so obviously
  • 00:34:46 it's not giving us a string but let's
  • 00:34:47 say if I pick for example 6 and I hit
  • 00:34:50 enter here you can see that the label is
  • 00:34:52 7 so the labels are between 0 & 9 so 10
  • 00:35:00 labels in total now the thing is that's
  • 00:35:00 not very useful to us because we don't
  • 00:35:03 really know what label 0 is or what label 9
  • 00:35:03 is so what I'm gonna do is create a list
  • 00:35:05 that will actually define what those
  • 00:35:07 labels are so I'm gonna have to copy it
  • 00:35:10 from here because I actually don't
  • 00:35:11 remember the labels but you can see it
  • 00:35:13 says here what they are so for example
  • 00:35:16 the label 0 is a t-shirt label 1 is a
  • 00:35:19 trouser 9 is an ankle boot and you can
  • 00:35:21 see what they all are
  • 00:35:22 so we just need to define exactly this
  • 00:35:24 list here so class names so that we can
  • 00:35:26 simply take whatever value is returned
  • 00:35:28 to us from the model of what label it
  • 00:35:30 thinks it is and then just throw that as
  • 00:35:32 an index to this list so we
  • 00:35:34 get what label it is
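(That list, copied from the label names on the tutorial page:)

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
```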
  • 00:35:37 all right sweet so that is how we're getting the data now
  • 00:35:40 so now I want to show you what some of
  • 00:35:42 these images look like and talk about
  • 00:35:44 the architecture of the neural network
  • 00:35:46 we might use in the next video
  • 00:35:48 so I'm gonna use pyplot just to show
  • 00:35:51 you some of these images and explain
  • 00:35:53 kind of the input and the output and all
  • 00:35:55 that so if you want to show an
  • 00:35:56 image using matplotlib you can do this
  • 00:35:58 by just doing plt dot imshow and then in
  • 00:36:02 here simply putting the image so for
  • 00:36:03 example if I do train underscore images
  • 00:36:07 and let's say we do the seventh image
  • 00:36:09 and then I do plt dot show if I run this
  • 00:36:13 now you guys will see what this image is
  • 00:36:16 so let's run this and you can see that
  • 00:36:19 we get this is actually I believe like a
  • 00:36:21 pullover or a hoodie now I know it looks
  • 00:36:23 weird and you've got all this like green
  • 00:36:26 and purple that's just because of the
  • 00:36:28 way that kind of matplotlib shows these
  • 00:36:30 images if you want to see it properly
  • 00:36:31 what you do is I believe you do cmap
  • 00:36:33 equals in this case
  • 00:36:35 plt dot cm dot I think it's like cm dot
  • 00:36:39 binary or something I gotta have a look
  • 00:36:41 here because I forget yeah cm dot binary so
  • 00:36:44 if we do this and now we decide to
  • 00:36:46 display the image it should look a
  • 00:36:47 little bit better let's see here
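(Putting that together, this is roughly the display snippet being described:)

```python
plt.imshow(train_images[7], cmap=plt.cm.binary)  # grayscale rendering
plt.show()
```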
  • 00:36:50 and there you go we can see now we're
  • 00:36:52 actually getting this like black and
  • 00:36:53 white kind of image now this is great
  • 00:36:56 and all but let me show you actually
  • 00:36:57 what our image looks like so like how
  • 00:36:59 was I just able to show like how was I
  • 00:37:01 just able to do this image well the
  • 00:37:03 reason I'm able to do that is because
  • 00:37:04 all of our images are actually arrays
  • 00:37:06 of 28 by 28 pixels so let me print one
  • 00:37:09 out for you here so if I do train
  • 00:37:11 underscore images let's do seven the
  • 00:37:13 same example here and print that to the
  • 00:37:14 screen I'll show you what the data
  • 00:37:16 actually looks like give it a second and
  • 00:37:20 there we go so you can see this is
  • 00:37:21 obviously what our data looks like it's
  • 00:37:23 just a bunch of lists
  • 00:37:25 so one list for each row and it just has
  • 00:37:27 pixel values and these pixel values are
  • 00:37:30 simply representative of I believe like
  • 00:37:32 how much I don't actually know the scale
  • 00:37:35 that they're on but I think it's like an
  • 00:37:37 RGB value but in greyscale right so for
  • 00:37:40 example we have like 0 to 255 where 255
  • 00:37:43 is black and 0 is white and I'm pretty
  • 00:37:46 sure that's how we're getting
  • 00:37:47 the information in someone can correct
  • 00:37:49 me if I'm wrong but I'm almost certain
  • 00:37:50 that that's how this actually works so
  • 00:37:52 this is grayscale but these are
  • 00:37:55 large numbers and remember I was saying
  • 00:37:57 before in the previous video that's
  • 00:37:59 typically a good idea to shrink our data
  • 00:38:01 down so that it's within a
  • 00:38:03 certain range that is a bit smaller so
  • 00:38:05 in this case what I'm actually going to
  • 00:38:07 do is I'm gonna modify this information
  • 00:38:09 a little bit so that we only have each
  • 00:38:11 value out of one so instead of having
  • 00:38:14 it out of 255 we have it out of one so the way
  • 00:38:16 to do that is to divide every single
  • 00:38:18 pixel value by 255 now because these
  • 00:38:21 trained images are actually stored in
  • 00:38:24 what's known as a numpy array we can
  • 00:38:26 simply just divide it by 255 to achieve
  • 00:38:31 that so we'll say train images equals
  • 00:38:32 train images / 255 and we'll do the same
  • 00:38:35 thing here with our test images as well
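(That scaling step in code; the division works element-wise because these are numpy arrays:)

```python
train_images = train_images / 255.0
test_images = test_images / 255.0
```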
  • 00:38:37 now obviously we don't have to modify
  • 00:38:39 the labels as well also because they're
  • 00:38:41 just between 0 & 9 and that's how the
  • 00:38:43 labels work but for our images we're
  • 00:38:45 going to divide those values so that
  • 00:38:47 it's a bit nicer so now let me show you
  • 00:38:49 what it looks like so if I go python
  • 00:38:50 tutorial 1 dot py and now you can see
  • 00:38:54 that we're getting these decimal values
  • 00:38:55 and that our shirt looks well the same
  • 00:38:57 but exactly like we've just shrunk down
  • 00:39:00 our data so it's gonna be easier to work
  • 00:39:01 with in the future with our model now
  • 00:39:04 that's about it I think that I'm gonna
  • 00:39:05 show you guys in terms of this data now
  • 00:39:08 we have our data loaded in and we're
  • 00:39:09 pretty much ready to go in terms of
  • 00:39:11 making a model now if you have any
  • 00:39:13 questions about the data please don't
  • 00:39:14 hesitate to leave a comment down below
  • 00:39:16 but essentially again the way it works
  • 00:39:18 is we're gonna have 28 by 28 pixel
  • 00:39:20 images and they're gonna come in as an
  • 00:39:22 array just as I've showed you here so
  • 00:39:23 these are all the values that we're
  • 00:39:25 gonna have we're gonna pass that to our
  • 00:39:26 model and then our model is gonna spit
  • 00:39:28 out what class it thinks it is and those
  • 00:39:30 classes are going to be between 0 & 9
  • 00:39:32 obviously 0 is going to represent a
  • 00:39:34 t-shirt where 9 is going to represent
  • 00:39:35 ankle boots and we will deal with that
  • 00:39:38 all in the next video
  • 00:39:43 now in today's video we're actually
  • 00:39:45 gonna be working with the neural network
  • 00:39:47 so we're gonna be setting up a model
  • 00:39:48 we're gonna be training that model we're
  • 00:39:49 gonna be testing that model to see how
  • 00:39:51 well it performed we will also use it to
  • 00:39:53 predict on individual images and all of
  • 00:39:56 that fun stuff so without further ado
  • 00:39:58 let's get started now the first thing
  • 00:40:00 that I want to do before we really get
  • 00:40:02 into actually writing any code is talk
  • 00:40:04 about the architecture of the neural
  • 00:40:05 network we're going to create now I
  • 00:40:07 always found in tutorials that I watch
  • 00:40:09 they never really explained exactly what
  • 00:40:11 the layers were doing what they looked
  • 00:40:13 like and why we chose such layers and
  • 00:40:16 that's what I'm hoping to give to you
  • 00:40:18 guys right now so if you remember before
  • 00:40:20 we know now that our images they come in
  • 00:40:22 essentially as like 28 by 28 pixels and
  • 00:40:25 the way that we have them is we have an array with another array inside — it's like a two-dimensional array — and it holds pixel values, so maybe it's like 0.1, 0.3, which are the grayscale values, and each row of these pixels goes across 28 times. Now there are 28 rows, obviously, because, well, 28 by 28 pixels, so in here again we have the same thing, more pixel values, and we go down 28 times, right? And that's what we have
  • 00:40:54 and that's what our array looks like now
  • 00:40:56 that's what our input data is that's
  • 00:40:57 fine but this isn't really gonna work
  • 00:40:59 well for our neural network what are we
  • 00:41:02 gonna do we're gonna have one neuron and
  • 00:41:03 we're just gonna pass this whole thing
  • 00:41:05 to it I don't think so that's not gonna
  • 00:41:06 work very well so what we need to
  • 00:41:08 actually do before we can even like
  • 00:41:10 start talking about the neural network
  • 00:41:12 is figure out a way that we can change
  • 00:41:15 this information into a way that we can
  • 00:41:16 give it to the neural network. So what I'm actually gonna do — and what most people do — is what's called flattening the data. So actually, maybe we'll go — I can't even go back once I clear it — but flattening the data
  • 00:41:28 essentially is taking any like interior
  • 00:41:31 list so let's say we have like lists
  • 00:41:32 like this and just like squishing them
  • 00:41:34 all together so rather than so let's say
  • 00:41:37 this is like 1 2 3 if we were to flatten
  • 00:41:40 this what we would do is while we remove
  • 00:41:43 all of these interior arrays or lists or
  • 00:41:46 whatever it is so we would just end up
  • 00:41:47 getting data it looks like 1 2 3 and
  • 00:41:50 this actually turns out to work just
  • 00:41:53 fine for us. So in this instance we only had like one element in each array, but when we're dealing with 28 elements in each — sorry, list / array, they're interchangeable, just in case I keep switching between those — what we'll essentially have is we'll flatten the data so we get a list of length 784, and that is because, well, 28 times 28 equals 784. So when we flatten that data — 28 rows of 28 pixels — we end up getting 784 pixels just one after each other, and that's what we're gonna
  • 00:42:27 feed in as the input to our neural
  • 00:42:29 network so that means that our initial
  • 00:42:32 input layer is gonna look something like
  • 00:42:33 this we're gonna have a bunch of neurons
  • 00:42:35 and they're gonna go all the way down so
  • 00:42:37 we're gonna have 784 neurons so let's
  • 00:42:40 say this is 7 8 4 I know you could
  • 00:42:44 probably hardly read that but you get
  • 00:42:45 the point and this is our input layer
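To make the flattening idea concrete, here's a tiny standalone NumPy sketch (a hypothetical example, not code from the tutorial file):

```python
import numpy as np

# A 28x28 "image" flattens into one 784-long vector -- exactly the
# 784 input neurons described above (28 * 28 = 784).
image = np.zeros((28, 28))
flat = image.flatten()   # equivalently: image.reshape(-1)
print(flat.shape)        # -> (784,)
```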
  • 00:42:47 now before we even talk about any kind
  • 00:42:50 of hidden layers let's talk about our
  • 00:42:52 output layer so what is our output
  • 00:42:54 well our output is gonna be a number
  • 00:42:56 between 0 and 9 ideally that's what we
  • 00:42:59 want so what we're actually gonna do for
  • 00:43:01 our output layer is, rather than just having one neuron like we used as an example two videos ago, we're actually gonna have 10
  • 00:43:08 neurons each one representing one of
  • 00:43:11 these different classes right so we have
  • 00:43:13 0 to 9
  • 00:43:14 so obviously 10 neurons or 10 classes so
  • 00:43:17 let's have 10 neurons so 1 2 3 4 5 6 7 8
  • 00:43:21 9 10 now what's gonna happen with these
  • 00:43:25 neurons is each one of them is going to
  • 00:43:28 have a value and that value is gonna
  • 00:43:30 represent how much the network thinks
  • 00:43:32 that it is each neuron so for example
  • 00:43:35 say we're classifying the image that
  • 00:43:38 looks like a t-shirt or maybe like a
  • 00:43:40 pair of pants so those are pretty easy
  • 00:43:42 to draw so let's say this is the image
  • 00:43:43 we're given a little pair of pants
  • 00:43:44 what's gonna happen is let's say pants
  • 00:43:47 is like this one like this is the one it
  • 00:43:49 actually should be all of these will be
  • 00:43:51 lit up a certain amount. So essentially maybe we'll say we think it's 0.05 this one, we have a degree of certainty of like 10 percent that it's this one, and then we think it's 75 percent pants. So what we'll do when we
  • 00:44:06 are looking at this output layer is
  • 00:44:08 essentially we'll just find whatever
  • 00:44:10 is the greatest so whatever probability
  • 00:44:12 is the greatest and then say that's the
  • 00:44:13 one that the network predicts is the
  • 00:44:16 class of the given object right so when
  • 00:44:19 we're training the network what we'll do
  • 00:44:20 essentially is we'll say okay well we're
  • 00:44:23 giving the pants so we know that this
  • 00:44:26 one should be one right this should be a
  • 00:44:28 hundred percent it should be one that's
  • 00:44:30 what it should be and all these other
  • 00:44:31 ones should be zero right because it
  • 00:44:33 should be a zero percent chance it's
  • 00:44:35 anything else because we know that it is
  • 00:44:36 pants, and then the network will look at
  • 00:44:38 all this and adjust all the weights and
  • 00:44:40 biases accordingly so that we get it so
  • 00:44:42 that it lights this one up directly as
  • 00:44:44 one at least that's our goal right so
  • 00:44:46 once we do that so now we've talked with
  • 00:44:49 the input layer and the output layer now
  • 00:44:51 it's time to talk about our hidden
  • 00:44:52 layers so we could technically train a
  • 00:44:55 network that would just be two layers
  • 00:44:57 right and we just have all these inputs
  • 00:44:58 that go to some kind of outputs but that
  • 00:45:00 wouldn't really do much for us because
  • 00:45:03 essentially that would just mean we're
  • 00:45:04 just gonna look at all the pixels and
  • 00:45:05 based on that configuration of pixels
  • 00:45:08 will point to you know these output
  • 00:45:10 layers, and that means we're only gonna have — which I know doesn't sound like 'only' — 784 times 10 weights, so 784 × 10 = 7,840 weights (and biases) to adjust. So what we're
  • 00:45:26 actually gonna do is we're gonna add a
  • 00:45:27 hidden layer inside of here now you can
  • 00:45:31 kind of arbitrarily pick how
  • 00:45:33 many neurons you're gonna have in your
  • 00:45:35 hidden layer it's a good idea to kind of
  • 00:45:37 go off based on percentages from your
  • 00:45:38 input layer but what we're gonna have is
  • 00:45:40 we're gonna have a hidden layer and in
  • 00:45:43 this case this hidden layer is gonna
  • 00:45:44 have a hundred and twenty eight neurons
  • 00:45:46 so we'll say this is 128 and this is
  • 00:45:49 known as our hidden layer so what will
  • 00:45:52 happen now is we're gonna have our
  • 00:45:53 inputs connecting to the hidden layer so
  • 00:45:56 fully connected and then the hidden
  • 00:45:58 layer will be connected to all of our
  • 00:46:00 output neurons which will allow for much
  • 00:46:02 more complexity of our network because
  • 00:46:04 we're gonna have a ton more biases and a
  • 00:46:07 ton more weights connecting to this
  • 00:46:08 middle layer which maybe we'll be able
  • 00:46:10 to figure out some patterns, like maybe to look for a straight line that looks like a pant leg or an arm sleeve, or maybe it'll look for a concentration of pixels in a certain area of the picture, right? And that's what we're hoping our hidden layer will maybe be able to do for us — maybe
  • 00:46:26 pick up on some kind of patterns and
  • 00:46:28 then maybe with these combination of
  • 00:46:30 patterns we can pick out what specific
  • 00:46:33 image it actually is now we don't really
  • 00:46:36 know what the hidden network or hidden
  • 00:46:38 layer is gonna do we just kind of have
  • 00:46:40 some hopes for it and by picking 128
  • 00:46:42 neurons we're saying okay we're going to
  • 00:46:44 allow this hidden layer to kind of
  • 00:46:46 figure its own way out and figure out
  • 00:46:48 some way of analyzing this image and
  • 00:46:50 then that's essentially what we're gonna
  • 00:46:52 do so if you have any questions about
  • 00:46:54 that please do not hesitate to ask but
  • 00:46:57 the hidden layers are pretty arbitrary
  • 00:46:59 (sorry, I just dropped my pen), which means that, you know, you can kind of experiment with them, kind of tweak them. There are some setups that are known to do well, but typically when you're picking a hidden layer you pick one and go at maybe 15-20 percent of the input size. But again, it really depends on the application that you're using. So let's now actually just
  • 00:47:18 start working with our data and creating
  • 00:47:22 a model so if we want to create a model
  • 00:47:24 the first thing that we need to do is
  • 00:47:26 define the architecture or the layers
  • 00:47:28 for a model and that's what we've just
  • 00:47:29 done so I'm gonna type it out fairly
  • 00:47:31 quickly here and again you guys will see
  • 00:47:33 how this works. So I'm gonna say model equals, in this case, keras.Sequential — I believe that's how you spell it — and then what we're gonna do is put a list inside here and start defining our different layers. So we're gonna say keras.layers, and our first layer is gonna be an input layer, but it's gonna be a flattened input layer, and the input_shape is gonna be equal to (28, 28). So
  • 00:47:59 remember I talked about that initially
  • 00:48:01 what we need to do is well we need to
  • 00:48:03 flatten our data so that it is passable
  • 00:48:06 to all those different neurons, right? So — I've gotta spell shape correctly — essentially, whenever
  • 00:48:13 you're passing in information that's in
  • 00:48:14 like a 2d or 3d array you need to
  • 00:48:16 flatten that information so that you're
  • 00:48:18 gonna be able to pass it to an
  • 00:48:19 individual neuron as opposed to like
  • 00:48:21 sending a whole list into one neuron
  • 00:48:23 right now the next layer that we're
  • 00:48:25 gonna have is going to be what's known
  • 00:48:27 as a dense layer now a dense layer
  • 00:48:29 essentially just means a fully connected
  • 00:48:31 layer, which means that what we've shown so far — only fully connected neural networks — is what we're gonna have, so each node or neuron is connected to every neuron in the next layer. So I'm gonna say layers dot
  • 00:48:42 dense and in this case we're gonna give
  • 00:48:44 it a hundred twenty-eight neurons that's
  • 00:48:46 what we've talked about and we're gonna
  • 00:48:47 set the activation function which we
  • 00:48:49 talked about before as well to be
  • 00:48:51 rectified linear unit. Now again, this activation function is somewhat arbitrary in the fact that you can pick different ones, but rectified linear unit
  • 00:49:00 is a very fast activation function and
  • 00:49:03 it works well for a variety of
  • 00:49:04 applications and that is why we are
  • 00:49:06 picking that now the next layer is gonna
  • 00:49:08 be another dense layer which means
  • 00:49:09 essentially another fully connected
  • 00:49:12 layer sorry and we're gonna have ten
  • 00:49:14 neurons and this is gonna be our output
  • 00:49:16 layer and we're gonna have an activation
  • 00:49:18 of softmax. Now what softmax does is exactly what I explained when showing you that architecture picture: it will pick values for each neuron so that all of those values add up to one. So essentially it's like the probability of the network thinking it's a certain value — like 'I believe it's 80 percent this, 2 percent this, 5 percent this' — but across all of the neurons those values will add up to one, and that's what the softmax function does. So that actually means
  • 00:49:51 that we can look at the last layer and
  • 00:49:54 we can see the probability or what the
  • 00:49:56 network thinks for each given class and
  • 00:49:59 say maybe those are two classes that are
  • 00:50:01 like 45% each, we can maybe tweak the output of the network to say 'I am not sure' rather than predicting a specific value, right?
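Put together, the architecture described above looks roughly like this (a sketch following the transcript, with keras coming from tensorflow):

```python
import tensorflow as tf
from tensorflow import keras

# Flatten the 28x28 input to 784 values, feed a 128-neuron relu
# hidden layer, and output 10 softmax probabilities (one per class).
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
```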
  • 00:50:08 All right, so now what we're gonna
  • 00:50:10 do is we're gonna just set up some
  • 00:50:12 parameters for our model so I'm gonna
  • 00:50:14 say model.compile, and in this case we're gonna use an optimizer of adam. Now I'm not really going to talk about the optimizer — Adam is typically pretty standard, especially for something like this. We're gonna use the loss function of sparse_categorical_crossentropy (I believe I spelled that correctly).
  • 00:50:34 now if you're interested in what these
  • 00:50:36 do and how they work in terms like the
  • 00:50:38 math kind of side of them just look them
  • 00:50:39 up there's their very famous and popular
  • 00:50:41 and there again are somewhat arbitrary
  • 00:50:44 in terms of how you pick them now when I
  • 00:50:47 do metrics I'm gonna say metrics
  • 00:50:49 accuracy and again this is gonna define
  • 00:50:51 what we're looking at when we're testing
  • 00:50:53 the model in this case we care about the
  • 00:50:55 accuracy or how low we can get this loss
  • 00:50:57 function to be so yeah you guys can look
  • 00:51:00 these up there's tons of different loss
  • 00:51:01 functions some of them have different
  • 00:51:03 applications and typically when you're
  • 00:51:05 making a neural network you mess around
  • 00:51:06 with different loss functions different
  • 00:51:08 optimizers and in some cases different
  • 00:51:10 metrics.
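A sketch of that compile call with the settings just described:

```python
# Adam optimizer, sparse categorical cross-entropy loss, and
# accuracy as the metric we watch while training and testing.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```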
  • 00:51:13 So now it is actually time to train our model. To train the model,
  • 00:51:16 what we're gonna do is model dot fit and
  • 00:51:17 when we fit it all we're gonna do is
  • 00:51:19 give it our train images and our train
  • 00:51:22 labels now we're gonna set the amount of
  • 00:51:26 epochs so now it's time to talk about
  • 00:51:29 epochs now epochs are actually fairly
  • 00:51:31 straightforward you've probably heard of
  • 00:51:32 the word epoch before but essentially it
  • 00:51:34 means how many times the model is gonna
  • 00:51:36 see this information so what an epoch is
  • 00:51:40 gonna do is it's gonna kind of randomly
  • 00:51:42 pick images and labels obviously
  • 00:51:45 correspond to each other and it's gonna
  • 00:51:47 feed that through the neural network so
  • 00:51:49 how many epochs you decide is how many
  • 00:51:52 times you're gonna see the same image so
  • 00:51:54 the reason we do this is because the
  • 00:51:57 order in which images come in will
  • 00:51:59 influence how parameters and things are
  • 00:52:01 tweaked with the network maybe seeing
  • 00:52:03 like 10 images that are pants is gonna
  • 00:52:06 tweak it differently than if it sees
  • 00:52:08 like a few that are pants and a few that
  • 00:52:09 are a shirt and some that are sandals so
  • 00:52:12 this is a very simple explanation of how
  • 00:52:14 the epochs work but essentially it just
  • 00:52:16 is giving the same images in a different
  • 00:52:19 order and then maybe if it got one image
  • 00:52:21 wrong it's gonna see it again and be
  • 00:52:22 able to tweak and it's just a way to
  • 00:52:24 increase hopefully the accuracy of our
  • 00:52:27 model that being said giving more epochs
  • 00:52:29 does not always necessarily increase the
  • 00:52:31 accuracy of your model it's something
  • 00:52:33 that you kind of have to play with and
  • 00:52:34 anyone that does any machine learning or
  • 00:52:36 neural networks will tell you that they
  • 00:52:38 can't really like they don't know the
  • 00:52:39 exact number epoch they have to play
  • 00:52:41 with it and tweak it and see what gives
  • 00:52:43 them the best accuracy so anyways now it
  • 00:52:46 is time to actually well we can run this
  • 00:52:48 but let's first get some kind of output
  • 00:52:50 here so I'm gonna actually evaluate this
  • 00:52:52 model directly after we run it so that
  • 00:52:55 we can see how it works on our test data
  • 00:52:57 so right now what this is doing is
  • 00:52:58 actually just training the model on our
  • 00:53:02 training data, which means we're tweaking
  • 00:53:02 all the weights and biases we're
  • 00:53:04 applying all those activation functions
  • 00:53:06 and we're defining like a mean function
  • 00:53:08 for the model but if we actually want to
  • 00:53:10 see how this works we can't really just
  • 00:53:13 test it on the training images and
  • 00:53:15 labels for the same reason I talked
  • 00:53:16 about before so we have to test it on
  • 00:53:18 the test images and the test labels and
  • 00:53:20 essentially see how many it gets correct
  • 00:53:23 so the way we do this is we're gonna say test_loss, test_acc (which stands for accuracy) equals model.evaluate — is that how you spell it? maybe — and then we're gonna pass in test_images, test_labels, and I believe that is the last parameter. Yes,
  • 00:53:44 it is so now if we want to see the
  • 00:53:45 accuracy of our model we can simply
  • 00:53:48 print out test_acc, and we'll label it like 'Tested Acc' just so we know which number it is, because there are gonna be some other metrics printed out when we run this.
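And the training plus evaluation steps from this section, sketched out (epochs=5 matches the first run shown below):

```python
# Train on the training set, then check accuracy on the held-out
# test set; evaluate returns the loss and the accuracy metric.
model.fit(train_images, train_labels, epochs=5)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Tested Acc:", test_acc)
```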
  • 00:54:02 All right, so now that we've done that, let's actually run our file and see how this works. So this
  • 00:54:07 is it this whole part here is all we
  • 00:54:09 actually need to do to create a neural
  • 00:54:10 network and a model. Now actually, let me just quickly say that this keras.Sequential — what this does is it means
  • 00:54:17 like a sequence of layers so you're just
  • 00:54:20 defining them in order where you say the
  • 00:54:21 first layer obviously is gonna be your
  • 00:54:23 input layer we're flattening the data
  • 00:54:25 then we're adding two dense layers which
  • 00:54:27 are fully connected to the input layer
  • 00:54:29 as well and that's what our model looks
  • 00:54:31 like and this is typically how you go
  • 00:54:34 about creating a neural network all
  • 00:54:36 right so let's run this now and see what
  • 00:54:38 we get so this will take a second or two
  • 00:54:42 to run just because obviously there is
  • 00:54:45 what we have 60,000 images in this data
  • 00:54:48 set so you know it's got a run through
  • 00:54:49 them it's doing all the epochs and you
  • 00:54:51 can see that we're getting metrics here
  • 00:54:53 on our accuracy and our loss
  • 00:54:56 now our test accuracy was 87 percent so
  • 00:54:58 you can see that it's actually slightly
  • 00:55:00 lower than, what do you call it, the accuracy here — oh, it's the exact same — oh, it actually auto-tested on some data sets. But anyways, so essentially that is
  • 00:55:11 how this works you can see that the
  • 00:55:13 first five epochs which are these ones
  • 00:55:15 here
  • 00:55:16 ran and they increase typically with
  • 00:55:19 each epoch. Now again, we could try like 10 epochs or 20 epochs and see what it does,
  • 00:55:23 but there is a point where the more
  • 00:55:25 epochs you do the actual like the less
  • 00:55:27 reliable your model becomes and you can
  • 00:55:32 see that our accuracy started at 88.9 essentially — that's what it said our model accuracy was when we were training the model — but then once we actually tested it, which is these two lines here, it was lower than the trained accuracy, which shows you that you
  • 00:55:49 obviously have to be testing on
  • 00:55:50 different images because when we tested
  • 00:55:51 it here it said well it was 89% but then
  • 00:55:54 here we only got 87 percent right so
  • 00:55:56 let's do a quick tweak here and just see
  • 00:55:58 what we get maybe if we add like 10
  • 00:55:59 epochs I don't think this will take a
  • 00:56:01 crazy long amount of time so we'll run
  • 00:56:03 this and see maybe if it makes a massive
  • 00:56:05 difference or if it starts leveling out
  • 00:56:07 or it starts going lower or whatnot
  • 00:56:10 let me let this run here for a second
  • 00:56:12 and obviously you can see the tweaked
  • 00:56:14 accuracy as we continue to go I'm
  • 00:56:16 interested to see here we're gonna
  • 00:56:17 increase by much or if it's just kind of
  • 00:56:19 gonna stay at the same level all right
  • 00:56:22 so we're hitting about 90% and let's see
  • 00:56:25 here 91 okay so we got up to 91% but you
  • 00:56:30 can see that it was kind of diminishing
  • 00:56:32 returns as soon as we ended up getting to about 7 epochs — yeah, even in the epochs after this we only increased by a
  • 00:56:39 marginal amount and our accuracy on the
  • 00:56:41 testing data was slightly better but
  • 00:56:44 again for the amount of epochs five
  • 00:56:46 extra epochs it did not give us a five
  • 00:56:48 times better result right so it's
  • 00:56:49 something you got to play with and see
  • 00:56:55 now in today's video what we're gonna be
  • 00:56:57 doing is just simply using our model to
  • 00:56:59 actually predict information on specific
  • 00:57:01 images and see how you actually use the
  • 00:57:03 model I find a lot of tutorials series
  • 00:57:05 don't show you how to actually
  • 00:57:07 practically use the model but what's the
  • 00:57:09 point of creating a model if you can't
  • 00:57:10 use it now quickly before I get too far
  • 00:57:12 into the video I would just like to show
  • 00:57:14 you guys something that I'm super
  • 00:57:15 excited to announce because I've been
  • 00:57:16 waiting for them to come for a long time
  • 00:57:18 and it is the official tech with Tim
  • 00:57:20 mugs so you guys can see them here I
  • 00:57:22 just wanted to quickly show them to you
  • 00:57:24 guys if you'd like to support the
  • 00:57:25 channel and get an awesome looking mug I
  • 00:57:27 actually really like them then you guys
  • 00:57:29 can purchase them just by I believe
  • 00:57:30 underneath the video it shows like the
  • 00:57:32 teespring link um but yeah they're
  • 00:57:34 awesome they look really good and the
  • 00:57:35 reason I've been holding out on showing
  • 00:57:36 them to you guys is cuz I wanted to wait
  • 00:57:38 till I received mine
  • 00:57:39 to make sure that it was up to quality
  • 00:57:41 and that it looked good enough to sell
  • 00:57:43 to you guys essentially so if you'd like
  • 00:57:44 to support the channel
  • 00:57:45 um you can get one of those if not
  • 00:57:47 that's fine but if you do decide to buy
  • 00:57:49 one please send me like a DM on Twitter
  • 00:57:51 Instagram or something and let me know
  • 00:57:53 so I can say thank you to you guys so
  • 00:57:54 anyways let's get into the video um so
  • 00:57:58 what I'm gonna do actually is I'm gonna
  • 00:58:00 we need to continually train the model
  • 00:58:03 every time we run the program which I
  • 00:58:05 know seems like a pain but unless we
  • 00:58:08 want to save the model which I guess I
  • 00:58:09 could actually show in this video later
  • 00:58:10 as well we just have to train it and
  • 00:58:13 then we can use it directly after so
  • 00:58:15 after we've you know tested this we
  • 00:58:17 don't need to do this evaluate anymore — we've trained the model and we can use it. To use it we actually just need a method called predict, but I'm gonna talk about how this works because it is a little finicky — well, not even finicky, just not
  • 00:58:29 intuitive so essentially when you want
  • 00:58:32 to make a prediction using the model I'm
  • 00:58:34 gonna set up just a variable prediction
  • 00:58:36 here you simply use model dot predict
  • 00:58:39 and then you pass it a list now what you
  • 00:58:43 would think you would do is just pass it
  • 00:58:45 like the input right so in this case we
  • 00:58:48 just pass it some input that's in the
  • 00:58:49 form twenty eight twenty eight and it
  • 00:58:51 would predict but that's not actually
  • 00:58:52 how it works and when you want to make a
  • 00:58:54 prediction what you need to do is put
  • 00:58:57 whatever your input shape is inside of a
  • 00:59:00 list. Or actually, well, you can do it inside of a list, but you can also do it inside an np array as well — like a numpy array, right — and the reason you have to do that is because what predict does
  • 00:59:10 is it gives you a group of predictions
  • 00:59:12 so it's expecting you to pass in a bunch
  • 00:59:14 of different things and it predicts all
  • 00:59:17 of them using the model so for example
  • 00:59:19 if I want to do the predictions on all
  • 00:59:21 of my test images to see what they are I
  • 00:59:23 can do prediction = model.predict(test_images), and if I print out like
  • 00:59:27 prediction you guys will see what this
  • 00:59:30 looks like so let's run this here and
  • 00:59:32 see what we get so obviously we have to
  • 00:59:35 train them all each time which is a
  • 00:59:37 little bit annoying but we can save it
  • 00:59:39 later on and obviously this one runs
  • 00:59:40 pretty quickly so it's not a huge deal
  • 00:59:44 all right so there we go so now you can
  • 00:59:46 see this is actually what our
  • 00:59:47 predictions look like now this is a
  • 00:59:49 really weird kind of like looking
  • 00:59:51 prediction thing I mean we're getting a
  • 00:59:54 bunch of different lists now that's
  • 00:59:56 because right our output layer is ten
  • 00:59:58 neurons so we're actually getting an
  • 01:00:00 output of ten different values and these
  • 01:00:02 different values are representing how
  • 01:00:05 much the model thinks that each picture
  • 01:00:07 is a certain class right so you can see
  • 01:00:10 we're getting values like 2.6e-06, which is obviously a very small number, so it doesn't think whatsoever that it's that one. I'm trying to find ones that aren't in that e notation, but apparently we didn't get lucky enough with what's showing, because it just cuts some of them off here. But if I print out, let's
  • 01:00:28 say like prediction zero and I guess
  • 01:00:32 we're gonna have to run this again I
  • 01:00:33 probably should have thought of that
  • 01:00:34 then you guys will see exactly what the
  • 01:00:36 prediction list looks like and I'm gonna
  • 01:00:38 show you how we can actually interpret
  • 01:00:39 this to determine what class it is
  • 01:00:41 because this means nothing to us we want
  • 01:00:43 to know is it a sandal is it a shoe is
  • 01:00:45 it a shirt like what is it right so
  • 01:00:47 there you go so this is what the list
  • 01:00:48 looks like so if we look through the
  • 01:00:50 list here we can see these are all the
  • 01:00:51 different probabilities that are our
  • 01:00:54 network is predicting so what we're
  • 01:00:55 actually gonna do essentially is we're
  • 01:00:57 gonna take whatever the highest number
  • 01:00:59 is there we're gonna say that is the
  • 01:01:01 predicted value so to do that what we do
  • 01:01:04 is we say np.argmax, okay, and we
  • 01:01:09 just put it around this list now what
  • 01:01:12 this does is it just gets the largest
  • 01:01:14 value and finds like the index of that
  • 01:01:17 so in this case, since we have ten neurons — the first one representing, obviously, t-shirt and the last one representing ankle boot — it'll find whatever neuron has the largest value and give us the index of that neuron. So if it's, like, the third neuron, then it's gonna give us pullover, right? And that's how that works. So
  • 01:01:34 if we want to see the actual like name
  • 01:01:36 though rather than just the index then
  • 01:01:39 what we need to do is just take this
  • 01:01:41 value and pass it into class_names. So we'll say class_names and index whatever value np.argmax(prediction[0]) gives us, right? So let's run this and see what we get.
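Putting those pieces together, interpreting a prediction looks roughly like this (a sketch; class_names is the label list defined earlier in the series):

```python
import numpy as np

# predict returns one row of 10 probabilities per input image;
# argmax finds the index of the largest one, which maps to a name.
prediction = model.predict(test_images)
print(class_names[np.argmax(prediction[0])])   # e.g. "Ankle boot"
```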
  • 01:01:53 All right, so there we go:
  • 01:01:56 we can see that now we're actually
  • 01:01:57 getting ankle boot as our prediction
  • 01:01:58 which makes a lot more sense for us
  • 01:02:00 right rather than just giving us like
  • 01:02:03 that prediction array or whatever it was
  • 01:02:05 okay so that's great but the thing is
  • 01:02:06 how do we actually validate this is working? Well, what we need to do now — or not what we need to do, but what
  • 01:02:12 we should do now is show the input and
  • 01:02:15 then show what the predicted value is
  • 01:02:16 and that way we as the humans which know
  • 01:02:18 obviously which is which can validate
  • 01:02:20 that so what I'm gonna do actually just
  • 01:02:22 set up a very basic for loop and what
  • 01:02:24 this for loop is gonna do is loop
  • 01:02:26 through a few different images in our
  • 01:02:27 test images and show them on the screen
  • 01:02:30 and then also show the prediction so
  • 01:02:33 show what they actually are and then
  • 01:02:34 show the prediction as well so to do
  • 01:02:36 this I'm just gonna say for I guess in
  • 01:02:38 this case I in range five and what we'll
  • 01:02:44 do is I'm gonna say plt.grid — I'm just gonna set up a very basic plot to show the image — I'm gonna imshow our test_images[i], right, I'm gonna do the cmap thing, so I say cmap equals, in this case, plt.cm.binary, which is just gonna give us the grayscale, and then I'm gonna say plt.xlabel, which just means the label underneath, and I'm gonna say is equal to
  • 01:03:09 actual and in this case I'm gonna say
  • 01:03:12 plus and what do we want to do we need
  • 01:03:15 to get the actual label of our test
  • 01:03:17 image which would be in test underscore
  • 01:03:19 labels I and then what I'm gonna do is
  • 01:03:22 add a header and say this is what the
  • 01:03:25 model predicted so to do this I'm gonna
  • 01:03:27 say plt. — I believe it's, sorry, not header, it's title — and the title will simply be 'Prediction' plus, in this case, the prediction for i. Now to do this we're — sorry, we're gonna have to literally copy this whole class_names / argmax argument and put that here, except instead of zero we're gonna put i, and that way it will show all of the different images, right? So now what
  • 01:03:56 I'm going to do is for each loop here
  • 01:03:58 I'm gonna plt.show(), which means
  • 01:03:59 I'm gonna show those images so we can
  • 01:04:01 see exactly what they look like so quick
  • 01:04:03 recap in case I kind of skimmed over
  • 01:04:05 some stuff all we're doing is setting up
  • 01:04:07 a way to see the image as well as what
  • 01:04:09 it actually is versus what the model
  • 01:04:11 predicted so we as the humans can kind
  • 01:04:13 of validate this is actually working and
  • 01:04:15 we see okay this is what the image and
  • 01:04:16 the input is and this is what the output
  • 01:04:18 was from the model so let's run this and
  • 01:04:22 wait for it to train I'll fast forward
  • 01:04:25 through this and then we will show the
  • 01:04:26 images. Okay, so a quick fix here: I just ran this and I got an error — I meant to do class_names[test_labels[i]], and that's obviously because the test labels are gonna have the index of all these, so I can't just put the number value, I have to put it through class_names so that we get the correct thing. Anyways,
  • 01:04:43 I hope that makes sense to you guys
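For reference, the corrected loop from this section looks roughly like this (a sketch, assuming matplotlib was imported as plt earlier in the file and prediction came from model.predict above):

```python
import matplotlib.pyplot as plt
import numpy as np

# Show a few test images with the actual label underneath and the
# model's prediction as the title, so we can eyeball the results.
for i in range(5):
    plt.grid(False)
    plt.imshow(test_images[i], cmap=plt.cm.binary)
    plt.xlabel("Actual: " + class_names[test_labels[i]])
    plt.title("Prediction: " + class_names[np.argmax(prediction[i])])
    plt.show()
```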
  • 01:04:44 Let's run this now — you can see that was the error I ran into. Again, I'll fast forward and then be back. All right, so I am
  • 01:04:52 back now this is a little bit butchered
  • 01:04:53 and how I'm actually showing it but you
  • 01:04:55 can see that it's saying the prediction
  • 01:04:57 for this was the ankle boot and it
  • 01:04:58 actually is an ankle boot now if I close
  • 01:05:01 this it'll just show four more because
  • 01:05:03 that's the way I've set it up so now you
  • 01:05:04 can see that prediction pullover it
  • 01:05:06 actually was a pullover all right we see
  • 01:05:09 we get prediction trouser it actually
  • 01:05:11 was a trouser and prediction trouser
  • 01:05:14 actual trouser prediction shirt actual
  • 01:05:17 shirt and obviously if you want to see
  • 01:05:19 more you could keep looping through all
  • 01:05:21 of these and doing that now say you just
  • 01:05:24 want to predict on one image well what
  • 01:05:26 you could do for example is alright and
  • 01:05:28 this is kind of a weird way what I'm
  • 01:05:29 about to do but you'll see let's say we
  • 01:05:32 wanted to just predict like what the
  • 01:05:33 seventh image was well then what I would
  • 01:05:35 do is just say test images 7 which is
  • 01:05:37 gonna give us that 28 by 28 array and
  • 01:05:41 then I would just put it inside of a
  • 01:05:42 list so that that way it gets it's given
  • 01:05:42 list so that, that way, it gets given in the shape that it's supposed to look,
  • 01:05:49 list right we're gonna get is equal to
  • 01:05:52 this it's gonna look like prediction and
  • 01:05:54 then it's gonna have this and then
  • 01:05:55 inside it's gonna have all those
  • 01:05:57 different values so it's gonna have like
  • 01:05:59 0.001 0.9 but it's gonna be a list
  • 01:06:03 inside of a list so that's just
  • 01:06:04 something to keep in mind when you're
  • 01:06:05 working with these predictions because
  • 01:06:07 that is really the only way to do it and
  • 01:06:09 that this is exactly what tensorflow
  • 01:06:11 recommends on their website as well if
  • 01:06:13 you're just predicting for one item just
  • 01:06:15 put it inside of a list, so that it's gonna work fine.
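A sketch of that single-item case, wrapping the one 28x28 array so predict still receives a batch (np.array also works here, as mentioned):

```python
import numpy as np

# Wrap the single image so the input shape becomes (1, 28, 28);
# prediction then holds one row of 10 probabilities.
prediction = model.predict(np.array([test_images[7]]))
print(class_names[np.argmax(prediction[0])])
```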
  • 01:06:18 So anyways, that has kind of been it on using the model to predict stuff. In future videos we'll get into a
  • 01:06:23 little bit more advanced stuff this was
  • 01:06:24 a very easy classification problem just
  • 01:06:27 really meant to give you an introduction
  • 01:06:29 and personally I think if you never
  • 01:06:31 worked with any machine learning stuff
  • 01:06:32 this is pretty cool that in a few
  • 01:06:34 minutes of just kind of writing a little
  • 01:06:35 bit of code whether you understand it or
  • 01:06:37 not you can create a simple model that
  • 01:06:39 can classify fashion items like a shirt
  • 01:06:42 and a t-shirt and I don't know that's
  • 01:06:44 pretty cool to me and in future videos
  • 01:06:46 obviously we're gonna be doing a lot
  • 01:06:47 cooler stuff it's gonna be a little bit
  • 01:06:48 more advanced but hopefully you guys can
  • 01:06:50 stick with it I'd love to know what you
  • 01:06:51 guys think of this series so far so
  • 01:06:53 please leave a comment down below it
  • 01:06:55 helps me to kind of tweak my lessons and
  • 01:06:57 all that as we go forward if you guys
  • 01:06:58 enjoyed the video please leave a like
  • 01:07:00 and subscribe and I will see you again
  • 01:07:02 [Music]
  • 01:07:18 now in today's video what we're gonna be
  • 01:07:20 doing is talking about text
  • 01:07:21 classification with tensorflow 2.0 now
  • 01:07:25 what I'm gonna be doing just to be full
  • 01:07:26 fully transparent with you guys here is
  • 01:07:28 following along with the actual official
  • 01:07:30 tutorials on the tensorflow 2.0 tutorial
  • 01:07:32 now I find that these are actually the
  • 01:07:35 best in terms of like kind of a
  • 01:07:37 structure to start with to understand
  • 01:07:39 very basic neural networks for some
  • 01:07:41 pretty simple tasks I would say and then
  • 01:07:43 we're gonna stray away from those we're
  • 01:07:45 gonna start using our own data our own
  • 01:07:46 networks our own architecture and we'll
  • 01:07:48 start talking about kind of some of the
  • 01:07:49 issues you have when you actually start
  • 01:07:51 applying these to real data so so far
  • 01:07:54 you guys have noticed, and I've seen some comments already, that the data is
  • 01:07:57 really easy to load in and even
  • 01:07:59 pre-processing it like in the last one
  • 01:08:02 we just divided everything by 255 like
  • 01:08:04 that's really simple in the real world
  • 01:08:05 your data is definitely not that nice
  • 01:08:08 and there's a lot of stuff that you need
  • 01:08:09 to play with and modify to make it
  • 01:08:11 actually usable so anyways we'll follow
  • 01:08:13 along this one for today and essentially
  • 01:08:15 the way that it works is we're gonna
  • 01:08:17 have movie reviews and we're just gonna
  • 01:08:19 classify them as either
  • 01:08:20 positive or negative now what we'll do
  • 01:08:23 is we'll just look at some of the movie
  • 01:08:25 reviews and then we'll talk about the
  • 01:08:27 data we'll talk about the architecture
  • 01:08:28 using stuff to predict some issues might
  • 01:08:30 run into and all of that now I don't
  • 01:08:32 know how many video parts this is gonna
  • 01:08:34 be I'm gonna try to record it all at
  • 01:08:35 once and just split it up based on how
  • 01:08:37 long it takes but with that being said
  • 01:08:38 enough talking let's get started so what
  • 01:08:42 we're gonna do obviously is start in our
  • 01:08:44 file here and again this is gonna be
  • 01:08:46 really nice because we can just steal
  • 01:08:47 kind of the data from keras. What we'll start by doing is just importing tensorflow as tf; we're gonna say from tensorflow import keras, and then we're going to say import numpy as np. Now
  • 01:09:03 before I start I ran into a quick issue
  • 01:09:06 when I was actually trying to do this
  • 01:09:08 just following along with the official
  • 01:09:09 tutorial and that was that the data that
  • 01:09:11 I want to grab here actually doesn't
  • 01:09:14 work with the current version at numpy
  • 01:09:15 that comes with tensorflow
  • 01:09:16 it's on their github as an issue but
  • 01:09:19 anyways to fix this what we need to do
  • 01:09:21 is install the previous version of numpy
  • 01:09:23 so to do this what I'm actually gonna do
  • 01:09:23 so to do this, what I'm actually gonna do is just check with pip — something like pip show numpy — because I want to see what version it is. I want to find what version it is and then just go
  • 01:09:41 version of numpy what we're gonna do now
  • 01:09:43 is actually just install the correct
  • 01:09:45 version an UMP ID to make it work for
  • 01:09:47 this tutorial now this should be fine
  • 01:09:49 for everything going forward and if you
  • 01:09:50 want to install the most recent version
  • 01:09:51 of numpy after doing this go feel free
  • 01:09:53 but to do this all I'm gonna do is just
  • 01:09:55 say pip install and then numpy== — in this case 1.16.1. I believe the version we're on right now is a later one, at least at the time of
  • 01:10:04 recording this but just change it to
  • 01:10:05 this version and hopefully in the future
  • 01:10:07 they'll fix that issue so that we don't
  • 01:10:09 have to do this but anyways I'm going to
  • 01:10:10 install that yeah you're gonna have to
  • 01:10:13 add two equal signs and I already have
  • 01:10:15 this installed so that should just not
  • 01:10:19 you do that — I'll leave the command in the description.
  • 01:10:22 Now after we do that, what I'm gonna do is just load in the data, so I'll say data equals, in this case, keras.datasets — what is it — imdb. Now I believe this stands for the Internet Movie Database, and anyways, that's what the database is, and we're gonna do
  • 01:10:38 what the database is and we're gonna do
  • 01:10:39 the same thing we did in the previous
  • 01:10:40 tutorial which is just split this into
  • 01:10:43 training and testing data so to do that
  • 01:10:44 I'm gonna say train underscore data
  • 01:10:45 train underscore labels comma and then
  • 01:10:48 in this case we'll say test underscore
  • 01:10:50 at data and then test underscore labels
  • 01:10:53 equals in this case data load underscore
  • 01:10:56 data now we're just gonna add one thing
  • 01:10:58 in here which is num underscore words
  • 01:11:01 equals in this case 10,000 now the
  • 01:11:04 reason I'm doing this is because this
  • 01:11:06 data set contains like a ton of
  • 01:11:07 different words and what we're gonna
  • 01:11:09 actually do by saying num words equals
  • 01:11:11 10,000 is only take the words that are
  • 01:11:13 the 10,000 most frequent which means
  • 01:11:16 we're gonna leave out words that usually
  • 01:11:18 are only occurring like once or twice do
  • 01:11:19 it the entire data set because we don't
  • 01:11:21 want to throw those into our model and
  • 01:11:23 have things like be more difficult than
  • 01:11:25 they have to be and just have data
  • 01:11:26 that's kind of irrelevant because we're
  • 01:11:28 gonna be comparing obviously movie
  • 01:11:31 reviews and there's some words that are
  • 01:11:33 only in like one review we should
  • 01:11:35 probably just omit them because there's
  • 01:11:37 nothing to really compare them to in
  • 01:11:38 other data sets. Anyways, I hope that kind of makes sense, but that's not super important — it's just gonna be num_words=10000. It also shrinks our data a little bit, which makes it a bit nicer.
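A sketch of the setup so far for this part — the imports plus the split-and-load step, keeping only the 10,000 most frequent words:

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Load the integer-encoded IMDB movie-review dataset, keeping only
# the 10,000 most frequently occurring words.
data = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = data.load_data(num_words=10000)
```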
  • 01:11:49 Now what we're gonna do next is we're actually gonna
  • 01:11:51 show how we can display this data now if
  • 01:11:54 I start by actually just showing you
  • 01:11:56 like the train_data — and let's pick, like, the zeroth one, so I guess the first one — and print this out to the screen. So I'll just go python and, I guess in this case (I probably should have typed this to start), tutorial2.py. When this actually runs — it's probably
  • 01:12:16 going to take a second here just to
  • 01:12:17 download the data set you can see that
  • 01:12:19 what we have is actually just a bunch of
  • 01:12:21 numbers now this doesn't really look
  • 01:12:23 like a movie review to me does it well
  • 01:12:26 what this actually is is integer encoded
  • 01:12:28 words so essentially each of these
  • 01:12:31 integers point to a certain word and
  • 01:12:33 what we've done, just to make it way easier for our model to actually classify these and work with these, is we've given each word one integer. So in this case maybe the integer 1 stands for the word 'the' or something, the integer 14 stands for something else, and all
  • 01:12:47 we've done is just added those integers
  • 01:12:48 into a list that represents where these
  • 01:12:51 words are located in the movie review
  • 01:12:53 now this is nice for the computer but
  • 01:12:55 it's not very nice for us if we actually
  • 01:12:57 want to read these words so we have to
  • 01:12:59 do is find the mappings for these words
  • 01:13:01 and then find some way to actually
  • 01:13:02 display this so that you know we can
  • 01:13:04 have a look at it now I'll be honest
  • 01:13:06 here I'm just gonna take this from what
  • 01:13:07 they have on the tensorflow website on
  • 01:13:09 how to do this typically you would
  • 01:13:11 create your own mappings for words with
  • 01:13:13 your own dictionary and you just already
  • 01:13:14 have that information but fortunately
  • 01:13:16 for us tensorflow already does that so
  • 01:13:18 to do that I'm gonna say word underscore
  • 01:13:19 index equals, in this case, imdb.get_word_index(), like
  • 01:13:25 this now what this does it's actually
  • 01:13:27 going to give us a dictionary that has
  • 01:13:29 those keys and those mappings so that
  • 01:13:32 what we can do is well figure out what
  • 01:13:34 you know what these integers actually
  • 01:13:36 mean so when we want to print it out
  • 01:13:37 later we can have a look at them so I'm
  • 01:13:40 gonna say now is word_index = {k: (v + 3) for k, v in word_index.items()}. So I might have been incorrect here — this doesn't actually give us the dictionary directly, it gives us tuples that have the string and the integer in them, I believe, and what we're doing here is breaking each tuple up into k and v, which stand for key and value: the key will be the word, and the value will obviously be the integer — yes, that's what it will be. So for each word/integer item in the index we break it up, and then we're just gonna add a
  • 01:14:25 bunch of different keys into our data
  • 01:14:26 set now the reason we're gonna start at
  • 01:14:28 plus three is because we're gonna have
  • 01:14:30 actually one key or three keys that are
  • 01:14:33 gonna be like special characters for our
  • 01:14:35 word mapping and you guys will see how
  • 01:14:37 those work in a second so I'm gonna
  • 01:14:38 start by just saying word_index["<PAD>"] = 0 — we're gonna talk about this in a second, so don't worry if you guys are kind of like 'what are you doing right now'. I'm gonna say word_index["<START>"] = 1, then word_index — in this case I believe it's like UNK, yeah that's correct — word_index["<UNK>"] = 2. Now UNK just stands for unknown, and I'm gonna explain all this in a second, but it's easier just to type it out first. And we're gonna say word_index, in this case with the tag "<UNUSED>", equals three.
  • 01:15:15 so what I'm doing essentially is all of
  • 01:15:17 the words in our training and testing
  • 01:15:19 data set have like keys and values
  • 01:15:23 associated with them starting at one so
  • 01:15:25 what I'm doing is I'm just gonna add
  • 01:15:26 three to all of those values so that
  • 01:15:29 what I can actually do is assign my own
  • 01:15:31 kind of values that are gonna stand for
  • 01:15:34 padding start unknown and unused so that
  • 01:15:37 if we get values that are not valid we
  • 01:15:40 can just assign them to this essentially
  • 01:15:42 in the dictionary now what I'm gonna use
  • 01:15:43 for padding you guys will see in just a
  • 01:15:45 second essentially it's just so we can
  • 01:15:47 make our all our movie sets the same
  • 01:15:49 length so we'll add this what's known as
  • 01:15:51 pad tag and we'll do that by adding 0
  • 01:15:53 into our actual movie review list so
  • 01:15:56 that we're gonna make each movie review
  • 01:15:58 the same length and the way we do that
  • 01:15:59 essentially is if they're not the same
  • 01:16:01 length — so maybe one's 100, maybe one's 200, and we want all of them to be 200 — then for the 100-length list, what we'll do is we'll
  • 01:16:09 just add a bunch of padding to the end
  • 01:16:11 of it to make it length 200 and then
  • 01:16:14 obviously our model will hopefully be
  • 01:16:16 able to differentiate the fact that that
  • 01:16:17 is padding and then we don't care about
  • 01:16:19 the padding and that we shouldn't even
  • 01:16:21 bother they're really like looking at
  • 01:16:22 that, right? All right, so now what I'm gonna do is add this kind of complicated line here — I don't even know why they have this, to be quite honest, it's just the way TensorFlow has decided to do their word mappings — but apparently you need to add this reverse_word_index, which is equal to dict([(value, key) for (key, value) in word_index]) — I believe that's correct; actually, sorry, not word_index, word_index.items(). What this is gonna do is — okay,
  • 01:17:05 I understand now now that I've typed it
  • 01:17:07 out you just swap all the values in the
  • 01:17:09 keys so that right now we actually have
  • 01:17:11 a dictionary that has all of the like
  • 01:17:15 the keys first which is gonna be the
  • 01:17:16 word and then the values where we
  • 01:17:19 actually want it the other way around so
  • 01:17:20 we have like the integer pointing to the
  • 01:17:22 word because we're gonna have our data
  • 01:17:24 set that is gonna contain just integers
  • 01:17:26 like we've seen here and we want these
  • 01:17:28 integers to be able to point to a word
  • 01:17:29 as opposed to the other way around so
  • 01:17:31 what we're doing is just reversing this with a reverse word index list — or dictionary, sorry. Essentially, that's what this is doing here.
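Collected together, the word-mapping setup from this section looks roughly like this (a sketch; the tag names follow the transcript):

```python
# The raw mapping starts at 1, so shift everything up by 3 to make
# room for the special tags, then flip it so integers map to words.
word_index = data.get_word_index()
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0      # padding to make reviews the same length
word_index["<START>"] = 1    # marks the beginning of each review
word_index["<UNK>"] = 2      # unknown words
word_index["<UNUSED>"] = 3

reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
```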
  • 01:17:39 all right now that we've done that the
  • 01:17:42 last step is just to add a function and
  • 01:17:44 what this function will do is actually
  • 01:17:47 decode essentially all of this training
  • 01:17:49 and testing data into human readable
  • 01:17:52 words so there's different ways to do
  • 01:17:54 this again I'm just gonna take this
  • 01:17:55 right from the tensorflow website
  • 01:17:57 because this part's not super important
  • 01:17:59 and I'd rather just you know do it
  • 01:18:00 quickly than spend too much time on it
  • 01:18:02 so we're just gonna say return " ".join(), and in this case we're gonna say reverse_word_index.get(i, "?"). Now what this does essentially, if you don't know how .get works, is we're gonna try to get index i, which we're gonna define in a second; if we can't find a value for that, then what we'll do is just put a question mark, which is a default value — which means we won't crash by having, like, a KeyError in our dictionary. And we're gonna say for, in this case, i in text — I don't know why I have text typed, I think I might have messed something up here, so one second — oh, text is the parameter, my apologies.
  • 01:18:48 So anyways, what this is gonna do is just return to us, essentially, the human-readable words.
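The decode function as described (a sketch; text is the integer-encoded review passed in):

```python
def decode_review(text):
    # Look up each integer; unknown indices fall back to "?".
    return " ".join([reverse_word_index.get(i, "?") for i in text])

print(decode_review(test_data[0]))
```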
  • 01:18:55 So now what we'll do is simply just print out decode_review, and I'm just gonna give it some test data — it doesn't really matter if you use train or test data — but let's just have a look at test_data[0] and see what that actually looks
  • 01:19:10 like. Let's run that — assuming I didn't make any mistakes, we should actually get some
  • 01:19:14 valid output in just a second this
  • 01:19:16 usually takes a minute to run — oop, 'imdb is not defined'. What did I type here? I typed that as data, my apologies; so where we say imdb, which is right here, we just need to replace that with data — in my other file I called it imdb, so that's why I made a mistake there. But let's run that again and hopefully now we will get some better-looking output, so let's wait for this and see — 'dict object has no attribute items', this needs to be .items() — classic typos by Tim. One more time, third time's the charm hopefully. Let's see — and
  • 01:19:50 there we go so now we can see that we're
  • 01:19:52 actually getting all of this decoded
  • 01:19:55 into well this text now I'll allow you
  • 01:19:57 guys to read through it but you can see
  • 01:19:59 that we have these kind of keys that
  • 01:20:01 we've added so start which is one which
  • 01:20:03 will automatically be added at the
  • 01:20:05 beginning of all of our text and then we
  • 01:20:07 have these UNK tags which stand for
  • 01:20:08 unknown character essentially and then
  • 01:20:11 we don't have any other keys in here but
  • 01:20:12 say for example we had like some padding
  • 01:20:15 we had added to this we would see those
  • 01:20:16 PAD tags as well in here so that's
  • 01:20:19 essentially how that works if you'd like
  • 01:20:21 to look at some other reviews just mess
  • 01:20:23 around with kind of the values and the
  • 01:20:25 index here throw them into decode
  • 01:20:26 review and then we can actually see what
  • 01:20:28 they look like
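For reference, here is a minimal sketch of the reverse mapping and decode function being typed out here, following the TensorFlow text-classification example; it assumes the `word_index` dictionary (with its special keys) and the keras import from earlier in the video:

```python
# Flip the word -> integer mapping into integer -> word
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    # get() falls back to "?" so a missing index never raises a KeyError
    return " ".join([reverse_word_index.get(i, "?") for i in text])

print(decode_review(test_data[0]))  # first test review as human-readable words
```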
  • 01:20:28 now something to note quickly is that
  • 01:20:30 our reviews
  • 01:20:31 are different lengths now I've talked
  • 01:20:33 about this already but let's just
  • 01:20:35 compare two reviews to really test that
  • 01:20:36 I'm not just making this up so I'm gonna
  • 01:20:38 say test underscore data why do I have a
  • 01:20:42 capital here test underscore data at
  • 01:20:44 zero so the length of test underscore
  • 01:20:45 data at zero and the length of let's try
  • 01:20:48 test underscore data one just to prove
  • 01:20:51 to you guys that these are actually
  • 01:20:52 different lengths which means there's
  • 01:20:54 something kind of fancy we're gonna have
  • 01:20:55 to do with that padding tag which I was
  • 01:20:58 talking about there so let's go into
  • 01:20:59 text classification
  • 01:21:00 let's go CMD and then Python in this
  • 01:21:03 case tutorial 2 dot py now I guess we're
  • 01:21:07 gonna get that output again which is
  • 01:21:08 probably what's causing this to just
  • 01:21:09 take a second to run you can see that we
  • 01:21:11 have length 68 and we have length 260
  • 01:21:14 now this is not gonna work for our model
  • 01:21:17 and the reason this doesn't work is
  • 01:21:18 because we need to know what our input's
  • 01:21:22 shape sorry and size is gonna be
  • 01:21:24 just like I talked about before we
  • 01:21:26 define the input nodes or the input
  • 01:21:28 neurons and the output neurons so we
  • 01:21:30 have to determine how many input neurons
  • 01:21:32 there's going to be and how many output
  • 01:21:34 neurons there's gonna be now if we're
  • 01:21:36 like we don't know how large our data is
  • 01:21:38 gonna be and it's different for each
  • 01:21:39 what-do-you-call-it entry then that's an
  • 01:21:43 issue so we need to do something to fix
  • 01:21:45 that so what we're gonna do is we're
  • 01:21:47 gonna use this padding tag to
  • 01:21:49 essentially set a definite length for
  • 01:21:52 all of our data now we could go ahead
  • 01:21:54 and pick the longest review and say that
  • 01:21:56 will make all of the reviews that length
  • 01:21:58 but what I'm gonna do is just pick an
  • 01:22:00 arbitrary number in this case we'll just
  • 01:22:01 do like 250 and say that that's the
  • 01:22:03 maximum amount of words we're gonna
  • 01:22:05 allow in one review which means that if
  • 01:22:07 you have more than 250 words in your
  • 01:22:09 review we're just gonna get rid of all
  • 01:22:10 those and if you don't have 256 words or
  • 01:22:14 250 words or whatever it is we're just
  • 01:22:16 gonna add these padding tags to the end
  • 01:22:18 of it until eventually we reach that
  • 01:22:21 value so the way to do this is again
  • 01:22:25 using those fancy tensorflow functions
  • 01:22:27 now if you don't like these functions
  • 01:22:30 and like what these do for you and how
  • 01:22:32 they just kind of save you some time go
  • 01:22:33 ahead and try to write them yourself and
  • 01:22:35 if you want help on how to do that feel
  • 01:22:37 free to reach out to me on Discord or
  • 01:22:39 in the comments or whatever but I
  • 01:22:41 personally just use them because it
  • 01:22:42 saves me quite a bit of time in
  • 01:22:44 terms of like typing out
  • 01:22:45 functions and I already know how to do a
  • 01:22:47 lot of what these functions do so for me
  • 01:22:49 it doesn't really make sense to just
  • 01:22:50 retype them out when I can just use
  • 01:22:52 these kind of fancy tools so what we're
  • 01:22:55 going to say is we're gonna redefine our
  • 01:22:57 training and testing data and what we're
  • 01:22:59 gonna do is just trim that data so that
  • 01:23:00 it's only at or kind of normalized that
  • 01:23:04 data so it's at 250 words so to do that
  • 01:23:07 I'm gonna say train underscore data
  • 01:23:08 equals in this case keras dot
  • 01:23:12 pre-processing no idea if that's how you
  • 01:23:16 spell it we'll have to check that in a
  • 01:23:17 second dot sequence dot pad underscore
  • 01:23:21 sequence so pre-processing I think
  • 01:23:25 that's correct I guess we'll see and
  • 01:23:27 then in here we have to define a few
  • 01:23:29 different parameters so what we'll first
  • 01:23:30 do is we'll give that train underscore
  • 01:23:32 data we're gonna say value equals which
  • 01:23:35 will be the pad value so what we add to
  • 01:23:37 the end of in this case our numpy array
  • 01:23:39 to pad it per se and in this case we'll
  • 01:23:42 just use this pad tag so we'll say
  • 01:23:44 literally word index pad so let's copy
  • 01:23:48 that and put that there we're gonna say
  • 01:23:51 our padding equals in this case post which
  • 01:23:54 just means we're gonna pad after as
  • 01:23:56 opposed to before we also could pad
  • 01:23:59 before but that doesn't really make too
  • 01:24:00 much sense for this and then what we'll
  • 01:24:02 say is max in this case Len equals and
  • 01:24:05 then you pick your number that you want
  • 01:24:06 to make all of the values equal to now
  • 01:24:10 tensorflow did like 256 I'm just gonna
  • 01:24:12 do 250 and see if this makes a
  • 01:24:14 difference in terms of our accuracy for
  • 01:24:15 the model and I'm literally just gonna
  • 01:24:17 copy this and change these values now to
  • 01:24:19 test underscore data instead of Train
  • 01:24:21 underscore data and this will do the
  • 01:24:23 same thing on our other data set oops
  • 01:24:27 didn't mean to do that so test
  • 01:24:28 underscore data like that
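Here is a sketch of the two padding calls as described, assuming the `"<PAD>"` key that was added to `word_index` earlier in the video:

```python
# Trim every review to at most 250 tokens; shorter reviews get the <PAD>
# token appended after the text (padding="post") until they reach 250
train_data = keras.preprocessing.sequence.pad_sequences(
    train_data, value=word_index["<PAD>"], padding="post", maxlen=250)
test_data = keras.preprocessing.sequence.pad_sequences(
    test_data, value=word_index["<PAD>"], padding="post", maxlen=250)
```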
  • 01:24:32 so quick recap here because we are at 17 minutes
  • 01:24:34 now essentially what we've done is we've
  • 01:24:36 loaded in our data we've looked at our
  • 01:24:37 data we've created the word mappings
  • 01:24:40 essentially for our data so that we can
  • 01:24:42 actually figure out what at least
  • 01:24:43 integers mean we've created a little
  • 01:24:45 function here that will decode the
  • 01:24:47 mappings for us so we just pass it a
  • 01:24:50 word review that's integer encoded it
  • 01:24:52 decodes it and then if we can print that
  • 01:24:54 information out to the screen to have a
  • 01:24:55 look at it what we've just done now is
  • 01:24:57 we've done what's called
  • 01:24:58 pre-processing our data which means just
  • 01:25:00 making it into a form that our model can
  • 01:25:03 actually accept and that's consistent
  • 01:25:04 and that's what you're always gonna want
  • 01:25:06 to do with any data that you have
  • 01:25:07 typically it's gonna take you a bit more
  • 01:25:09 work than what we have because it's only
  • 01:25:11 two lines to pre-process our data
  • 01:25:12 because Keras kind of does it for us
  • 01:25:14 but for the purpose of this example
  • 01:25:17 that's fine
  • 01:25:17 all right so now that we've done that
  • 01:25:20 it's actually time to define our model
  • 01:25:22 now I'll show you quickly just to make
  • 01:25:25 sure you know you guys believe me here
  • 01:25:27 that this is working in terms of
  • 01:25:29 pre-processing pre-processing our data
  • 01:25:31 so it's actually gonna make things the
  • 01:25:32 same length so we'll say train
  • 01:25:33 underscore data test underscore data
  • 01:25:35 let me just print this out to the screen
  • 01:25:37 so python tutorial 2 again we're gonna
  • 01:25:39 get these integer mappings but we'll get
  • 01:25:41 the length at the end as well and
  • 01:25:43 another error of course we need to add
  • 01:25:46 an S to these sequences again my
  • 01:25:48 apologies guys on that classic typos
  • 01:25:52 here so anyways I had pad underscore
  • 01:25:54 sequence we need pad underscore sequences
  • 01:25:56 and now if I run this you can see that
  • 01:25:59 we have a length of 250 and 250 so we've
  • 01:26:02 kept that consistent now for some reason I'm
  • 01:26:04 printing I don't know why this is
  • 01:26:06 printing it twice oh it's because I'm
  • 01:26:08 printing it here and then I'm printing
  • 01:26:09 it here but you guys get the idea in
  • 01:26:11 that we've now made them actually the
  • 01:26:13 same size so let me remove these print
  • 01:26:15 statements all of them so we can stop
  • 01:26:18 printing train data 0 up here as
  • 01:26:19 well and now let's start defining our
  • 01:26:21 model
  • 01:26:22 so I'll just say model down here is a
  • 01:26:25 little comment just to help us out so
  • 01:26:27 what I'm gonna do now is similar to what
  • 01:26:28 I've done before except in the last one
  • 01:26:30 you might have noticed that the way I
  • 01:26:31 define my model was okay I'll show you
  • 01:26:34 in a second once I've finished typing
  • 01:26:36 this so we did Kara's dot sequential and
  • 01:26:38 then what we actually did was just had a
  • 01:26:40 list in here that had all the layers
  • 01:26:41 that's fine you can do that but in this
  • 01:26:43 case we're gonna have a few more layers
  • 01:26:44 so what we're gonna do actually is add
  • 01:26:47 these layers just by doing the model dot
  • 01:26:50 add it's precisely the same thing as
  • 01:26:52 before except instead of adding them in
  • 01:26:53 this list we're just gonna do it using
  • 01:26:55 this method so now we're going to say
  • 01:26:57 keras dot layers dot in this case
  • 01:27:00 Embedding and I'll talk about what these
  • 01:27:01 layers do in a second we're gonna do 10000 and 16
  • 01:27:04 and then we're just gonna actually copy
  • 01:27:07 this four times and just change these
  • 01:27:10 layers and the
  • 01:27:12 parameters as well so now we're gonna
  • 01:27:14 say global average pooling 1d and then
  • 01:27:23 do that and then we're gonna add a dense
  • 01:27:25 layer here and another dense layer and
  • 01:27:29 change these parameters so we'll say
  • 01:27:33 dense and we'll say in this case 16 we'll
  • 01:27:36 say activation equals relu our rectified
  • 01:27:42 linear unit whatever you guys want to
  • 01:27:44 call it and then we'll do down here 1
  • 01:27:45 and activation equals rectified linear
  • 01:27:48 unit as well actually sorry not relu
  • 01:27:51 we're gonna do sigmoid my apologies
  • 01:27:53 so now we'll actually talk about the
  • 01:27:55 architecture of this model and how I
  • 01:27:57 came up with picking these layers and
  • 01:27:59 well what these layers are well what we
  • 01:28:01 want essentially is we want the final
  • 01:28:03 output to be whether the review is good
  • 01:28:05 or whether the review is bad I think I
  • 01:28:07 mentioned that at the beginning of the
  • 01:28:09 video so what we're actually gonna do is
  • 01:28:11 just have either that like what one
  • 01:28:13 output neuron and that neuron should be
  • 01:28:15 either 0 or 1 or somewhere in between
  • 01:28:18 there to give us kind of a probability
  • 01:28:20 of like we think it's like 20% 1 80% 0
  • 01:28:24 something along those lines now we can
  • 01:28:26 accomplish that by using sigmoid because
  • 01:28:28 what it will do again we've talked about
  • 01:28:30 the sigmoid function is it'll squish
  • 01:28:31 everything so whatever our value is in
  • 01:28:34 between 0 & 1 which will give us a nice
  • 01:28:36 way to test if our models actually
  • 01:28:38 working properly and to get it the value
  • 01:28:41 that we want
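Put together, the model being typed out here looks like this, with the sizes used in the video (10,000-word vocabulary, 16-dimensional embeddings, 16 hidden neurons, one sigmoid output):

```python
model = keras.Sequential()
model.add(keras.layers.Embedding(10000, 16))            # integer-encoded words -> 16-dim vectors
model.add(keras.layers.GlobalAveragePooling1D())        # average the word vectors per review
model.add(keras.layers.Dense(16, activation="relu"))    # hidden layer that looks for patterns
model.add(keras.layers.Dense(1, activation="sigmoid"))  # squishes the output to 0..1
```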
  • 01:28:46 hey guys so now it's time to talk about
  • 01:28:48 word embeddings and this embedding layer
  • 01:28:51 and then what the global average pooling
  • 01:28:53 one D layer is doing now we already have
  • 01:28:55 an idea of what these dense layers are
  • 01:28:57 with these activation functions like
  • 01:28:58 relu and sigmoid but what we're
  • 01:29:00 actually gonna do today or I guess just
  • 01:29:02 in this video is talking about the
  • 01:29:04 architecture of this network kind of how
  • 01:29:06 it works on a high-level understanding
  • 01:29:08 and then in the next video we'll do is
  • 01:29:10 actually get into training and using the
  • 01:29:12 network so what I'm gonna do first is
  • 01:29:14 just start by talking about these first
  • 01:29:16 two layers and specifically what this
  • 01:29:18 embedding layer is because it's very
  • 01:29:19 important and then we will draw the
  • 01:29:22 whole network or the whole I guess
  • 01:29:24 network isn't the right way to put it the
  • 01:29:26 whole architecture and talk about how it
  • 01:29:28 fits together and what it's actually
  • 01:29:29 doing so let's get started now the
  • 01:29:33 easiest way to kind of explain this is
  • 01:29:35 to use an example of two very similar
  • 01:29:37 sentences so I'm just gonna say the
  • 01:29:39 first sentence is have a great day and
  • 01:29:45 the next sentence will be have a good
  • 01:29:49 day now I know my handwriting is
  • 01:29:52 horrible so just give me a break on that
  • 01:29:54 it's also hard to kind of write with
  • 01:29:56 this tablet so that's my excuse but
  • 01:29:58 anyways these two sentences looking at
  • 01:30:00 them as human beings we can tell pretty
  • 01:30:02 quickly that they're very similar now
  • 01:30:05 yes great and good maybe one has more
  • 01:30:07 emphasis on having an amazing day
  • 01:30:09 whatever it is but they're very similar
  • 01:30:11 and they pretty well have the same
  • 01:30:12 meaning right maybe we know when we
  • 01:30:14 would use the sentence in kind of the
  • 01:30:16 context in which like these words great
  • 01:30:18 and good are used and day and day and
  • 01:30:20 all this right it just we understand
  • 01:30:22 what they are now the computer doesn't
  • 01:30:24 have that same understanding at least
  • 01:30:26 right off the bat when looking at these
  • 01:30:28 two sentences now in our case we've
  • 01:30:30 actually integer encoded all of our
  • 01:30:33 different values so what we end up
  • 01:30:34 having are all of our different words
  • 01:30:36 sorry is our sentences end up looking
  • 01:30:38 something like this so we're gonna have
  • 01:30:39 this first word will represent a zero a
  • 01:30:41 will be one great will be two and day
  • 01:30:43 will be 3 so then down here we'll have 0
  • 01:30:45 1 in this case we're gonna say good is 4
  • 01:30:48 and day is 3 as well so this means if we
  • 01:30:51 integer encode these sentences we have
  • 01:30:53 some lists that look something like this
  • 01:30:55 now this one clearly is the first
  • 01:30:57 sentence and this one down here will
  • 01:30:59 the second sentence now if we just look
  • 01:31:03 at this and we pretend that you know we
  • 01:31:05 don't even know what these words
  • 01:31:06 actually are all we can really tell is
  • 01:31:09 the fact that two is different from four
  • 01:31:12 now notice what I just said there two is
  • 01:31:15 different from four when in reality if
  • 01:31:17 we look at these two words we know that
  • 01:31:20 they're pretty similar yes they're
  • 01:31:22 different words yes they're different
  • 01:31:23 lengths whatever it is but we know that
  • 01:31:25 they have a similar meaning and the
  • 01:31:27 context in which they're used in this
  • 01:31:29 sentence is the same now our computer
  • 01:31:32 obviously doesn't know that because all
  • 01:31:33 it gets to see is this so what we want
  • 01:31:36 to do is try to get it to have an
  • 01:31:37 understanding of words that have similar
  • 01:31:40 meanings and to kind of group those
  • 01:31:42 together in a similar form or in a
  • 01:31:44 similar way because obviously in our
  • 01:31:46 application here of classifying movie
  • 01:31:48 reviews the types of words that are used
  • 01:31:50 in the context in which they are used
  • 01:31:52 really makes a massive difference to
  • 01:31:54 trying to classify that as either a
  • 01:31:56 positive or a negative review and if we
  • 01:31:58 look at great and good and we say that
  • 01:32:00 these are two completely different words
  • 01:32:02 well that's gonna be a bit of an issue
  • 01:32:04 when we're trying to do some
  • 01:32:05 classifications so this is where our
  • 01:32:06 embedding layer comes in now again just
  • 01:32:09 to say here one more time like we know
  • 01:32:11 these are different but we also would
  • 01:32:13 know for example say if we replace this
  • 01:32:15 four with a three well all our computer
  • 01:32:17 again would know is that two is
  • 01:32:18 different from three just like four is
  • 01:32:20 different from two it doesn't know how
  • 01:32:21 different they are and that's what I'm
  • 01:32:23 trying to get at here is our embedding
  • 01:32:25 layer is going to try to group words in
  • 01:32:28 a similar kind of way so that we know
  • 01:32:30 which ones are similar to each other so
  • 01:32:32 let me now talk about specifically the
  • 01:32:35 embedding layer so let me just draw a
  • 01:32:37 little grid here now what our embedding
  • 01:32:39 layer actually does kind of like I don't
  • 01:32:41 want to say the formal definition but
  • 01:32:43 the more mathy definition is it finds
  • 01:32:45 word vectors for each word that we pass
  • 01:32:48 it or generates word vectors and uses
  • 01:32:51 those word vectors to pass to the future
  • 01:32:54 layers now a word vector can be in any
  • 01:32:57 kind of dimensional space now in this
  • 01:32:59 case we've picked 16 dimensions for each
  • 01:33:02 word vector which means that we're gonna
  • 01:33:04 have vectors maybe something like this
  • 01:33:06 and a vector again it's just a straight
  • 01:33:07 line with a bunch of different
  • 01:33:08 coefficients in some kind of space that
  • 01:33:12 is
  • 01:33:12 in this case 16 dimensions so let's
  • 01:33:15 pretend that this is a 16 dimensional
  • 01:33:16 vector and this is the word vector for
  • 01:33:19 the word have now in our computer it
  • 01:33:22 wouldn't actually be have it would be
  • 01:33:24 zero because again we have integer
  • 01:33:26 encoded stuff but you kind of you get
  • 01:33:28 the point so we'll say this is the word
  • 01:33:30 vector for half now what we're gonna do
  • 01:33:33 immediately when we create this
  • 01:33:35 embedding layer is let me actually get
  • 01:33:37 out of this quickly for one second is we
  • 01:33:39 initially create 10,000 word vectors for
  • 01:33:42 every single word and in this case every
  • 01:33:44 single number that represents a word so
  • 01:33:46 what we're gonna do is when we start
  • 01:33:48 creating this embedding layer once we
  • 01:33:51 see that we have an embedding layer is
  • 01:33:52 we're gonna draw 10,000 word vectors and
  • 01:33:55 just kind of some random way that are
  • 01:33:58 just there and each one represents one
  • 01:34:00 word and what happens when we call the
  • 01:34:02 embedding layer is it's gonna grab all
  • 01:34:04 of those word vectors for whatever input
  • 01:34:07 we have and use that as the data that we
  • 01:34:10 pass on to the next layer now how do we
  • 01:34:13 create these word vectors and how do we
  • 01:34:15 group words well this is where it gets
  • 01:34:17 into a bit complicated math I'm not
  • 01:34:19 really gonna go through any equations or
  • 01:34:21 anything like that but I'll kind of give
  • 01:34:22 you an idea of how we do it now we want
  • 01:34:25 to so let me get rid of this word have
  • 01:34:26 because this is not the best word vector
  • 01:34:29 example and let's say that this word
  • 01:34:31 vector is great
  • 01:34:32 now upon creating our word vector our
  • 01:34:35 embedding layer we have two vectors we
  • 01:34:37 have great and we have good and we can
  • 01:34:39 see that these vectors are kind of far
  • 01:34:41 apart from each other and we determine
  • 01:34:43 that by looking at the angle between
  • 01:34:44 them and we say that this angle maybe
  • 01:34:46 it's like I don't know 70 degrees or
  • 01:34:48 something like that and we can kind of
  • 01:34:50 determine that great and good are not
  • 01:34:51 that close to each other but in reality
  • 01:34:54 we want them to be pretty close to each
  • 01:34:56 other we want the computer to look at
  • 01:34:57 great and good and be like these are
  • 01:34:59 similar words let's treat them similarly
  • 01:35:01 in our neural network so what we want to
  • 01:35:04 do hopefully is have these words and
  • 01:35:06 these vectors kind of move closer
  • 01:35:08 together whether it's good going all the
  • 01:35:10 way to great or great going all the way
  • 01:35:12 to good or vice versa right we just want
  • 01:35:14 them to get close together and kind of
  • 01:35:16 be in some form of a group so what we do
  • 01:35:19 is we try to look at the context in
  • 01:35:21 which these words are used rather than
  • 01:35:23 just the content of the words which
  • 01:35:25 would just be what this
  • 01:35:26 looks like we want to figure out how
  • 01:35:28 they how they're used so we'll look at
  • 01:35:30 the words around it and determine that
  • 01:35:32 you know when we have a and day and a
  • 01:35:35 and day maybe that means that these are
  • 01:35:37 like related in some way and then we'll
  • 01:35:39 try to group these words now it's way
  • 01:35:41 more complicated than that don't get me
  • 01:35:43 wrong but it's kind of like a very basic
  • 01:35:46 way of how they group together is we
  • 01:35:48 look at the words that surround it and
  • 01:35:51 just different properties of the
  • 01:35:52 sentence involving that word and then we
  • 01:35:55 can kind of get an idea of where these
  • 01:35:56 words go and which ones are close to
  • 01:35:58 each other so maybe after we've done
  • 01:36:00 some training what happens is our word
  • 01:36:03 embeddings are what is known as learned
  • 01:36:04 just like we're learning and teaching
  • 01:36:07 our neural network and we get we end up
  • 01:36:09 getting great and good very close
  • 01:36:11 together and these are what their word
  • 01:36:12 vector representations are we can tell
  • 01:36:15 that they're close again by looking at
  • 01:36:16 the angle in between here maybe it's
  • 01:36:17 like 0.2 degrees and what that means is
  • 01:36:20 these two vectors which are just a bunch
  • 01:36:22 of numbers essentially are very close
  • 01:36:24 together so when we feed them into our
  • 01:36:26 neural network they should hopefully
  • 01:36:27 give us a similar output at least for
  • 01:36:30 that specific neuron that we give it to
  • 01:36:32 now I know this might be a little bit
  • 01:36:34 confusing but I'm gonna go we're gonna
  • 01:36:36 talk about this a bit more with another
  • 01:36:37 drawing of the whole network but I hope
  • 01:36:39 you're getting the idea the whole point
  • 01:36:40 of the embedding layer is to make word
  • 01:36:42 vectors and then group those word
  • 01:36:45 vectors or kind of like make them close
  • 01:36:46 together based on words that are similar
  • 01:36:48 and that are different so again just
  • 01:36:51 like we would have great and good here
  • 01:36:53 would hope that a word vector like bad
  • 01:36:55 would be down here where it has a big
  • 01:36:58 difference from great and good so that
  • 01:37:00 we can tell that these words are not
  • 01:37:01 related whatsoever all right so that's
  • 01:37:03 how the embedding layer works now what
  • 01:37:06 ends up happening when we have this
  • 01:37:07 embedding layer is we get an output
  • 01:37:09 dimension of what's known as 16
  • 01:37:10 dimensions and that's just how many
  • 01:37:12 coefficients essentially we have for our
  • 01:37:15 vector so just like if you have a 2d
  • 01:37:17 line so like if this our grid in 2d and
  • 01:37:19 we say that this is X and this is y we
  • 01:37:22 can represent any line by just having
  • 01:37:24 like some values like ax plus B y equals
  • 01:37:30 C now this is the exact same thing that
  • 01:37:32 we can do in n dimensions which means
  • 01:37:35 like any amount of dimensions so for a
  • 01:37:37 16 dimensional line I'm not gonna draw
  • 01:37:39 them all but we would start with
  • 01:37:40 like ax plus B y plus cz plus DW and so
  • 01:37:47 on and we would just have again 16 of
  • 01:37:49 these coefficients and then some kind of
  • 01:37:51 constant value
  • 01:37:53 maybe we call it lambda that is like
  • 01:37:56 what it's what it equals to what the
  • 01:37:58 equation equals to and that's how we
  • 01:37:59 define a line I'm pretty sure I'm doing
  • 01:38:02 this correctly in n dimensions
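For reference, the equation being written out here, carried to all 16 coefficients, would look something like

$$a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_{16} x_{16} = \lambda$$

with one coefficient per dimension and a constant on the right.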
  • 01:38:05 so anyways once we create that line what we
  • 01:38:07 actually want to do is we want to scale
  • 01:38:09 the dimension down a little bit now
  • 01:38:11 that's just because 16 dimensions is a
  • 01:38:13 lot of data especially when we have like
  • 01:38:15 a ton of different words coming into our
  • 01:38:17 network we want to scale it down to make
  • 01:38:19 it a little bit easier to actually
  • 01:38:21 compute and to train our network so
  • 01:38:23 that's where this global average pooling
  • 01:38:26 1d layer comes in now I'm not gonna talk
  • 01:38:28 about this in too much
  • 01:38:29 depth but essentially the way to think
  • 01:38:31 of the global average pooling 1d is that
  • 01:38:34 it just takes whatever dimension our
  • 01:38:36 data is in and just puts it in a lower
  • 01:38:38 dimension now there's a specific way
  • 01:38:39 that it does that but again I'm not
  • 01:38:41 gonna talk about that it's not super
  • 01:38:43 important if you care about that a lot
  • 01:38:44 just look it up and it's not like crazy
  • 01:38:46 hard but I just I don't feel the need to
  • 01:38:48 go into it in this video
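A quick way to see what these two layers do to the data, reusing the sizes from the model above (the batch of zeros is a stand-in purely for shape inspection):

```python
import numpy as np

emb = keras.layers.Embedding(10000, 16)
pool = keras.layers.GlobalAveragePooling1D()

batch = np.zeros((32, 250), dtype="int32")  # 32 fake integer-encoded, padded reviews
vectors = emb(batch)      # shape (32, 250, 16): one 16-dim vector per word
averaged = pool(vectors)  # shape (32, 16): scaled down to one vector per review
print(vectors.shape, averaged.shape)
```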
  • 01:38:50 so anyways let's now start drawing what our network
  • 01:38:52 actually looks like after understanding
  • 01:38:54 how this embedding layer works so we're
  • 01:38:57 gonna initially feed in a sequence and
  • 01:38:59 we'll just say that this is like our
  • 01:39:00 sequence of encoded words okay so say
  • 01:39:03 this is our input and maybe it's
  • 01:39:06 something like zero seven nine like a
  • 01:39:08 thousand two hundred a thousand twenty
  • 01:39:11 we have like nine again maybe we have
  • 01:39:14 eight just a bunch of different
  • 01:39:16 essentially numbers right so we're gonna
  • 01:39:18 pass this into our embedding layer and
  • 01:39:20 all this is gonna do is it's gonna find
  • 01:39:22 the representation of these words in our
  • 01:39:25 embedding layer so maybe our embedding
  • 01:39:28 layer well it's gonna have the same
  • 01:39:29 amount of words in our vocabulary so to
  • 01:39:31 look up say zero it'll say maybe zero
  • 01:39:34 means zero's vector is like 0.2 0.3 and
  • 01:39:39 it goes to 16 dimensions but I'm just
  • 01:39:41 gonna do like 2 for this example here
  • 01:39:42 maybe 7 its vector is like 7 and 9.0 and
  • 01:39:48 it just keeps going like this and it
  • 01:39:50 looks up all these vectors so it takes
  • 01:39:53 our input data and it just turns them
  • 01:39:55 into a bunch of vectors and just spits
  • 01:39:57 those out into our next layer now our
  • 01:39:59 next layer what this does is it just
  • 01:40:03 takes these vectors and just averages
  • 01:40:04 them out and it just means it kind of
  • 01:40:06 shrinks them down so
  • 01:40:07 we'll do like a little smaller thing
  • 01:40:09 here and we'll just say like average ok
  • 01:40:12 so we'll call this one embedding and that
  • 01:40:16 one is average
  • 01:40:17 now this average layer now is where we
  • 01:40:20 go into the actual neural network well
  • 01:40:22 obviously this is a neural network but
  • 01:40:23 we go into the dense layers which will
  • 01:40:25 actively perform our classification so
  • 01:40:27 what we're gonna do is we're gonna start
  • 01:40:28 with 16 neurons and this is just again
  • 01:40:30 an arbitrary number that we picked for
  • 01:40:33 our network you can mess around with
  • 01:40:35 different values for this and I
  • 01:40:36 encourage you to do that but 16 is what
  • 01:40:38 tensorflow decided to use and what I'm
  • 01:40:39 just following along with so we're gonna
  • 01:40:41 have 16 neurons and we're gonna pass all
  • 01:40:44 of our now 16 dimensional data or
  • 01:40:46 whatever dimensional data it is into
  • 01:40:48 these neurons like this now this is
  • 01:40:51 where we start doing the dense layer
  • 01:40:53 so we have this dense layer and this is
  • 01:40:54 connected to one output neuron like this
  • 01:40:58 so what we end up having is this
  • 01:41:01 embedding layer which is gonna have all
  • 01:41:04 these word vectors that represent
  • 01:41:05 different words we average them out we
  • 01:41:08 pass them into this 16 neuron layer that
  • 01:41:11 then goes into an output layer which
  • 01:41:13 will spit out a value between 0 and 1
  • 01:41:16 using the sigmoid function which I
  • 01:41:18 believe I have to correct myself because
  • 01:41:20 in other videos I said it did between
  • 01:41:21 negative 1 and 1
  • 01:41:22 it just takes any value we have and puts
  • 01:41:24 it in between 0 & 1
  • 01:41:26 like that all right
  • 01:41:29 so that is kind of how our network works
  • 01:41:31 so let me talk about what this dense
  • 01:41:34 layer is doing just a little bit before
  • 01:41:35 we move on to the next video so what
  • 01:41:37 this dense layer is going to attempt to
  • 01:41:39 do essentially is look for patterns of
  • 01:41:42 words and try to classify them using the
  • 01:41:45 same methods we talked about before into
  • 01:41:48 either a positive review or a negative
  • 01:41:50 review it's going to take all these word
  • 01:41:52 vectors which again are gonna be like
  • 01:41:54 similarly grouped words like great good
  • 01:41:56 are gonna be similar input to this dense
  • 01:41:58 layer right because we've averaged them
  • 01:42:01 out and embedded them in all this and
  • 01:42:02 then what we're gonna do is we're gonna
  • 01:42:04 try to determine based on
  • 01:42:06 what words we have and what order they
  • 01:42:08 come in what our text is and we hope
  • 01:42:11 that this layer of 16 neurons is able to
  • 01:42:14 pick up on patterns of certain words and
  • 01:42:16 where they occur in the sentence and
  • 01:42:18 give us an accurate classification again
  • 01:42:21 it's gonna do that by tweaking and
  • 01:42:22 modifying these weights and all of the
  • 01:42:25 biases that are on you know all of these
  • 01:42:27 different what-do-you-call-it
  • 01:42:29 layers or all of these connections or
  • 01:42:30 whatever they are and then it's going to
  • 01:42:32 give us some output and some level of
  • 01:42:33 accuracy for our network
  • 01:42:39 all right so now it's time to compile
  • 01:42:41 and train our model now the first thing
  • 01:42:44 we have to do is just define the model
  • 01:42:45 give it an optimizer give it a loss
  • 01:42:47 function and then I think we have to
  • 01:42:49 define the metrics as well so we're
  • 01:42:51 gonna do is gonna say model equals in
  • 01:42:53 this case sorry not model equals model
  • 01:42:56 dot compile if I spell compile correctly
  • 01:42:59 and then here we're gonna say optimizer
  • 01:43:02 we're gonna use the atom optimizer again
  • 01:43:05 I'm not really going to talk about what
  • 01:43:06 these are that much if you're interested
  • 01:43:08 in the optimizers just look them up and
  • 01:43:09 then for the loss function we're
  • 01:43:12 going to use the binary underscore cross
  • 01:43:15 entropy
  • 01:43:17 now what this one essentially is is well
  • 01:43:19 binary means like two options right and
  • 01:43:21 our case we want to have two options for
  • 01:43:24 the output neuron which is 0 or 1 so
  • 01:43:27 what's actually happening here is we
  • 01:43:29 have the sigmoid function which means
  • 01:43:30 our numbers gonna be between 0 & 1 but
  • 01:43:32 what the loss function will do is pretty
  • 01:43:34 well calculate the difference between
  • 01:43:36 for example say our output neuron is
  • 01:43:39 like 0.2 and the actual answer was 0
  • 01:43:42 well it will give us a certain function
  • 01:43:44 that can calculate the loss so how much
  • 01:43:46 of a difference is 0.2 is from 0 and
  • 01:43:50 that's kind of how that works again I'm
  • 01:43:53 not gonna talk about them too much and
  • 01:43:55 they're not like I mean they are
  • 01:43:56 important but not really like memorize
  • 01:43:59 per se like you kind of just mess with
  • 01:44:00 different ones but in this case binary
  • 01:44:03 cross-entropy works well because we have
  • 01:44:04 two possible values 0 1 so rather than
  • 01:44:07 using the other one that we used before
  • 01:44:09 which I don't even remember what it was
  • 01:44:11 called something cross-entropy we're
  • 01:44:12 using binary cross-entropy
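The compile call as described, with accuracy as the metric reported while training:

```python
model.compile(optimizer="adam",
              loss="binary_crossentropy",  # two possible answers, 0 or 1
              metrics=["accuracy"])
```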
  • 01:44:14 okay so now what we're gonna do is we're
  • 01:44:16 actually gonna split our training data
  • 01:44:17 into two sets and the first set of our
  • 01:44:20 training data is gonna be called
  • 01:44:21 validation data or really I guess you
  • 01:44:24 can think of it as a second set the word
  • 01:44:25 doesn't really matter but what we're
  • 01:44:26 gonna do is just get some validation
  • 01:44:27 data and what validation data is is
  • 01:44:30 essentially we can check how well our
  • 01:44:33 model is performing based on the tunes
  • 01:44:35 and tweaks we're doing on the training
  • 01:44:37 data on new data now the reason we do
  • 01:44:39 that is so that we can get a more
  • 01:44:40 accurate sense of how well our model is
  • 01:44:43 because we're gonna be testing new data
  • 01:44:46 to get the accuracy each time rather
  • 01:44:48 than testing it on data that we've already
  • 01:44:50 seen before which again means that it
  • 01:44:52 can't simply just memorize each review
  • 01:44:55 and give us either a zero or one for
  • 01:44:56 that it has to actually have some degree
  • 01:44:58 of I don't know like thinking or
  • 01:45:01 operation so that it can work on new
  • 01:45:02 data so we're gonna do is gonna say X
  • 01:45:04 underscore Val equals and all we're
  • 01:45:06 gonna do is just grab the train data and
  • 01:45:09 we're just gonna cut it to a thousand or
  • 01:45:11 ten thousand entries so there's actually
  • 01:45:12 twenty five thousand entries or I guess
  • 01:45:15 reviews in our training data so we're
  • 01:45:17 just gonna take ten thousand of it and
  • 01:45:19 say we're gonna use that as validation
  • 01:45:21 data now in terms of the size of
  • 01:45:22 validation data it doesn't really matter
  • 01:45:24 that much this is what tensorflow is
  • 01:45:27 using so I'm just kind of going with
  • 01:45:29 that but again mess with these numbers
  • 01:45:30 and see what happens to your model
  • 01:45:32 everything with our neural networks and
  • 01:45:33 machine learning really is gonna come
  • 01:45:35 down to refining what's known as hyper
  • 01:45:38 parameters or like hyperparameter tuning which
  • 01:45:40 means just changing individual
  • 01:45:42 parameters each time until we get a
  • 01:45:44 model that is well just better and more
  • 01:45:46 accurate so we're gonna say that x val
  • 01:45:49 is that but then we're also gonna have
  • 01:45:50 to modify our x train data to be train
  • 01:45:53 underscore data and in this case
  • 01:45:55 we're just gonna do the other way around
  • 01:45:56 so 10,000 colon now I'll just copy this and
  • 01:45:59 we're just gonna replace this again with
  • 01:46:01 instead of test actually oh we have to
  • 01:46:05 do this with labels sorry what am I
  • 01:46:06 thinking so we're just gonna
  • 01:46:08 change this to be labels and then
  • 01:46:11 instead of x val it's gonna be y val and
  • 01:46:14 then y train so yeah we're not
  • 01:46:16 touching the test data because we're
  • 01:46:17 gonna use all that test data to test our
  • 01:46:20 model and then we're just gonna use the
  • 01:46:22 the training stuff or the validation
  • 01:46:24 data to validate the model
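A sketch of the split, assuming the labels array from loading the dataset is named `train_labels`:

```python
# First 10,000 reviews and labels become validation data; the rest train
x_val = train_data[:10000]
x_train = train_data[10000:]
y_val = train_labels[:10000]
y_train = train_labels[10000:]
```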
  • 01:46:26 all right so now that we've done that it is actually
  • 01:46:28 time to fit the model so I'm just gonna
  • 01:46:29 say like fitModel and you'll see why
  • 01:46:34 I named this something different in a
  • 01:46:35 second it's gonna be equal to model dot
  • 01:46:37 fit and in this case what we're gonna do
  • 01:46:39 is gonna say X underscore train Y
  • 01:46:42 underscore train we're gonna say epochs
  • 01:46:45 is equal to if I can spell it 40
  • 01:46:49 and again you can mess with this number
  • 01:46:51 and see what we get based on that I'm
  • 01:46:52 going to say batch underscore size
  • 01:46:54 equals 512 which I'll talk about in a
  • 01:46:56 second and then finally we're gonna say
  • 01:46:58 validation underscore data equals and in
  • 01:47:02 here we're gonna say X underscore Val
  • 01:47:05 y underscore val and I think that's
  • 01:47:07 it let me just check here quickly a one
  • 01:47:10 last thing that I forgot to do we're
  • 01:47:11 gonna say verbose equals one verbose
  • 01:47:17 equals one now I'm not gonna lie I
  • 01:47:19 honestly don't know what verbose is I
  • 01:47:21 probably should've looked it up before
  • 01:47:22 the video but I have no idea what that
  • 01:47:23 is so someone knows please let me know
  • 01:47:24 but the batch size is essentially how
  • 01:47:27 many what do you call it movie reviews
  • 01:47:31 we're gonna do each time or how many
  • 01:47:34 we're gonna load in at once because this
  • 01:47:36 thing is it's kind of I mean we're
  • 01:47:38 loading all of our reviews into memory
  • 01:47:40 but in some cases we won't be able to do
  • 01:47:44 that and we won't be able to like feed
  • 01:47:45 the model all of our reviews on each
  • 01:47:47 single cycle so we just set up a batch
  • 01:47:50 size that's gonna define essentially how
  • 01:47:52 many at once we're gonna give and I know
  • 01:47:55 I'm kind of horribly explaining what a
  • 01:47:57 batch sizes but we'll get into more on
  • 01:47:59 batch sizes and how we can kind of do
  • 01:48:02 like buffering through our data and like
  • 01:48:04 going taking some from a text file and
  • 01:48:06 reading into memory in later videos when
  • 01:48:08 we have like hundreds of gigabytes of
  • 01:48:10 data that we're gonna be working with
  • 01:48:11 okay so finally we're gonna say results
  • 01:48:13 equals and in this case I believe it is
  • 01:48:16 model dot evaluate and then we're gonna
  • 01:48:19 evaluate this obviously on our test data
  • 01:48:22 so we're gonna give it test data and
  • 01:48:24 test labels so test underscore data test
  • 01:48:27 underscore labels like that and then
  • 01:48:31 finally what I'm gonna do is just
  • 01:48:33 actually print out the results so we can
  • 01:48:36 see what our accuracy is so say print
  • 01:48:38 results and then get that value
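The fit, evaluate and print steps as typed out here. (On `verbose`: it only controls how much progress output Keras prints during training; 1 gives a progress bar per epoch.)

```python
fitModel = model.fit(x_train, y_train, epochs=40, batch_size=512,
                     validation_data=(x_val, y_val), verbose=1)

results = model.evaluate(test_data, test_labels)
print(results)  # [loss, accuracy] on the test set
```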
  • 01:48:42 so let me run this quickly neural networks text
  • 01:48:45 classification let's go CMD and then
  • 01:48:49 python text oh that's not even what
  • 01:48:53 we're using we're using tutorial 2 sorry
  • 01:48:54 and let's see what we get with this this
  • 01:48:56 will take a second to run through the
  • 01:48:57 epoch so I'll fast-forward through that
  • 01:48:59 so you guys don't have to wait alright
  • 01:49:02 so we just finished doing the epochs now
  • 01:49:04 and essentially our accuracy was 87
  • 01:49:08 percent and this first number I believe
  • 01:49:11 is the loss which is 0.33 and then you
  • 01:49:14 can see that actually here we get the
  • 01:49:17 accuracy values and
  • 01:49:18 notice that the accuracy from our last
  • 01:49:20 epoch was actually greater than the
  • 01:49:22 accuracy on the test data which again
  • 01:49:24 shows you that sometimes you know when
  • 01:49:27 you test it on new data you're gonna be
  • 01:49:29 getting a less accurate model or in some
  • 01:49:31 cases you might even get a more accurate
  • 01:49:33 model it really just you can't strictly
  • 01:49:35 go based off what you're getting on your
  • 01:49:36 training data you really do need to have
  • 01:49:38 some test and validation data to make
  • 01:49:41 sure that the models correctly working
  • 01:49:43 so that's essentially what we've done
  • 01:49:45 there and yeah I mean that that's the
  • 01:49:49 model we tested it's 87 percent accurate
  • 01:49:52 so now let's actually have let's
  • 01:49:54 interpret some of these results a little
  • 01:49:55 bit better and let's show some reviews
  • 01:49:58 let's do a prediction on some of the
  • 01:50:00 reviews and then see like if this our
  • 01:50:02 model kind of makes sense for what's
  • 01:50:04 going on here so what I'm gonna do is
  • 01:50:05 I'm just going to actually just copy
  • 01:50:06 some output that I have here just save
  • 01:50:09 us a bit of time because I am gonna wrap
  • 01:50:10 up the video in a minute here but
  • 01:50:12 essentially what this does it just takes
  • 01:50:13 the first review from test data gets the
  • 01:50:17 model to predict that because we
  • 01:50:19 obviously we didn't train it on the test
  • 01:50:20 data so we can do that fine we're gonna
  • 01:50:22 say review and then we print out the
  • 01:50:25 decoded review we're gonna print out
  • 01:50:27 what the model predicted and then we're
  • 01:50:29 gonna print out what the actual label of
  • 01:50:31 that was
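A sketch of the prediction printout being described, using the same names as above; `model.predict` expects a batch, hence the `test_data[0:1]` slice:

```python
test_review = test_data[0]
predict = model.predict(test_data[0:1])
print("Review:")
print(decode_review(test_review))
print("Prediction: " + str(predict[0]))
print("Actual: " + str(test_labels[0]))
```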
  • 01:50:34 so if I run this now I'll fast
  • 01:50:35 forward through the training process and we will see the output all
  • 01:50:37 right
  • 01:50:37 so this is what essentially our review
  • 01:50:40 looks like so at least the one that we
  • 01:50:41 were testing it on and you can see that
  • 01:50:43 we have these little start tag and says
  • 01:50:44 please give this one a Miss for and then
  • 01:50:46 B R stands for like brake line or go to
  • 01:50:49 the next line so we could have actually
  • 01:50:51 added another tag for B R if we notice
  • 01:50:54 that this was used a lot in the review
  • 01:50:57 but we didn't do that so you see B R
  • 01:50:59 unless this is actually part of the
  • 01:51:01 review but I feel like that should be
  • 01:51:02 like break line in terms of HTML anyways
  • 01:51:05 and we have some unknown characters
  • 01:51:06 which could be anything that we just
  • 01:51:08 didn't know what it was and it says and
  • 01:51:09 the rest of the cast rendered terrible
  • 01:51:11 performance as the show is flat flat
  • 01:51:13 flat brbr I don't know how Michael
  • 01:51:16 Madison could have allowed this one on
  • 01:51:18 his plate he almost seemed he what does
  • 01:51:21 it seem to know this wasn't going to
  • 01:51:22 work out and his performance was quite
  • 01:51:24 unknown so all yeah so anyway so you can
  • 01:51:26 see that this probably had like some
  • 01:51:28 emojis added or something and that's why
  • 01:51:30 we have all these unknowns
  • 01:51:31 and then obviously we made this review
  • 01:51:32 which was pretty short to be the full
  • 01:51:34 length of 250 so we see all these pads
  • 01:51:37 that did that for us and then we have a
  • 01:51:39 prediction and an actual value of zero
  • 01:51:41 so we did end up getting this one
  • 01:51:43 correct now I think it'd be interesting
  • 01:51:45 actually to write your own review and
  • 01:51:48 test it on this so in the next video
  • 01:51:50 what I'm gonna do is show you how we can
  • 01:51:51 save the model to avoid doing like all
  • 01:51:54 of this every time we want to run the
  • 01:51:56 code because realistically we don't want to wait
  • 01:51:58 like a minute or two before we can
  • 01:52:01 predict a movie review every time we
  • 01:52:03 just want it to happen
  • 01:52:04 instantly and we definitely can do that
  • 01:52:06 I just haven't showed that yet in the
  • 01:52:07 series because that's kind of in like
  • 01:52:09 later what you do after you have machine
  • 01:52:12 learning and obviously like this this
  • 01:52:14 model trained pretty quickly like we
  • 01:52:16 only had about what was it like 50,000
  • 01:52:19 test data set which I mean it seems like a
  • 01:52:21 large number but it's really not
  • 01:52:22 especially when you're talking about
  • 01:52:24 string data so in future videos we're
  • 01:52:26 gonna be training models that take like
  • 01:52:29 maybe a few days to Train at least
  • 01:52:32 that's the goal or maybe a few hours or
  • 01:52:34 something like that so in that case
  • 01:52:36 you're probably not gonna want to train
  • 01:52:37 it every time before you predict some
  • 01:52:38 information so that'll be useful to know
  • 01:52:40 how to save that
  • 01:52:45 so in today's video we're gonna be doing
  • 01:52:47 is talking about saving and loading our
  • 01:52:49 models and then we're going to be doing
  • 01:52:51 a prediction on some data that doesn't
  • 01:52:53 come from this actual data set now I
  • 01:52:55 know this might seem kind of trivial we
  • 01:52:57 already know how to do predictions but
  • 01:52:58 trust me when I tell you this is a lot
  • 01:53:00 harder than it looks because if we're
  • 01:53:02 just taking in string data that means we
  • 01:53:04 have to actually do the encoding all of
  • 01:53:06 the pre-processing removing certain
  • 01:53:08 characters making sure that that data
  • 01:53:10 looks the same as the data that our
  • 01:53:12 neural network is expecting which in
  • 01:53:14 this case is a list of encoded numbers
  • 01:53:17 right or of encoded words that is
  • 01:53:19 essentially just numbers so what were
  • 01:53:21 you do to start is just save our model
  • 01:53:23 so let's talk about that now so up until
  • 01:53:25 this point every time we've wanted to
  • 01:53:26 make a prediction we've had to retrain
  • 01:53:28 the model now on small models like this
  • 01:53:31 that's fine you have to wait a minute
  • 01:53:32 two minutes but it's not very convenient
  • 01:53:34 when you have models that maybe take you
  • 01:53:36 days weeks months years to Train right
  • 01:53:38 so what you want to do is when you're
  • 01:53:40 done training the model you want to save
  • 01:53:42 it or sometimes you even want to save it
  • 01:53:44 like halfway through a training process
  • 01:53:45 this is known as checkpointing the model
  • 01:53:48 so that you can go back and continue to
  • 01:53:50 train it later now in this video we're
  • 01:53:51 just gonna talk about saving the model
  • 01:53:53 once it's completely finished but in
  • 01:53:54 future videos when we have larger
  • 01:53:56 networks we will talk about
  • 01:53:57 checkpointing and how to load
  • 01:53:59 your or train your model in like batches
  • 01:54:02 of a different size data and all that so
  • 01:54:04 what I'm gonna start by doing is just
  • 01:54:06 actually bumping the vocabulary size of
  • 01:54:08 this model up to 88000 now the reason
  • 01:54:11 I'm doing that is just because for our
  • 01:54:13 next exercise which is going to be
  • 01:54:15 making predictions on outside data we
  • 01:54:18 want to have as many words in our model
  • 01:54:20 as possible so that when it gets kind of
  • 01:54:22 some weirder words that aren't that
  • 01:54:23 common it knows what to do with them so
  • 01:54:25 I've done a few tests and I noticed that
  • 01:54:27 with the what-do-you-call-it with the
  • 01:54:29 vocabulary size bumped up it performs a
  • 01:54:31 little bit better so we're gonna do that
  • 01:54:32 so what I mean is we bump the vocabulary
  • 01:54:34 size and now after we train the model we
  • 01:54:36 need to save it now to save the model
  • 01:54:39 all we have to do is literally type the
  • 01:54:41 name of our model in this case model dot
  • 01:54:43 Save and then we give it a name so in
  • 01:54:44 this case let's call it model dot H 5
  • 01:54:47 now H 5 is just like an extension that
  • 01:54:50 means I don't know it's like I honestly
  • 01:54:54 don't know why they use H 5 but it's the
  • 01:54:57 extension for a saved model in Keras and
  • 01:54:59 tensorflow so we're just gonna work with
  • 01:55:01 that and that's as easy as this is it
  • 01:55:03 was just gonna save our model in binary
  • 01:55:05 data which means we'll be able to read
  • 01:55:07 it in really quickly and use the model
  • 01:55:09 when we want to actually make
  • 01:55:10 predictions let's go ahead and run this
  • 01:55:12 now and then we're gonna have the model
  • 01:55:14 saved and then from now on we won't have
  • 01:55:16 to continually train the model when we
  • 01:55:17 want to make predictions I'm gonna say
  • 01:55:19 python tutorial 2 and I'll be right back
  • 01:55:22 once this finishes running all
  • 01:55:25 right so the model is finished training
  • 01:55:26 notice that our accuracy is slightly
  • 01:55:28 lower than it was in the previous video
  • 01:55:31 really kind of a negligible difference
  • 01:55:33 here but anyways just notice that
  • 01:55:36 because we did bump the vocabulary size
  • 01:55:37 so anyways now that we've saved the
  • 01:55:39 model we actually don't have to go
  • 01:55:41 through this tedious process every time
  • 01:55:42 we run the code of creating and training
  • 01:55:45 and fitting the model and in fact we
  • 01:55:47 don't actually need to save it as well
  • 01:55:48 either here to load our model in now
  • 01:55:50 that it's saved and you can see the file
  • 01:55:52 right here with all this this big
  • 01:55:54 massive binary blob here all we have to
  • 01:55:56 do to load this in is just type one line
  • 01:55:58 now the line is whatever the name of
  • 01:56:00 your model is it doesn't matter I'm just
  • 01:56:02 gonna call it model is equal to in this
  • 01:56:04 case keras dot models dot load
  • 01:56:08 underscore model and then here you just
  • 01:56:10 put the name of that file so in this
  • 01:56:11 case model dot h5
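Both lines side by side; for what it's worth, the .h5 extension comes from the HDF5 file format that Keras saves models in:

```python
model.save("model.h5")                       # run once, after training finishes
model = keras.models.load_model("model.h5")  # on later runs, instead of retraining
```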
  • 01:56:14 now what's really nice about this as well is you can actually
  • 01:56:15 train a bunch of different models and
  • 01:56:17 tweak like hyper parameters of them and
  • 01:56:19 only save the best one what I mean by
  • 01:56:22 that is like maybe you mess with for
  • 01:56:23 example the amount of neurons in the
  • 01:56:25 second activation layer or something
  • 01:56:27 like that or in the second hidden layer
  • 01:56:29 and then you train a bunch of models you
  • 01:56:31 figure out which one has the highest
  • 01:56:32 accuracy and then you only save that one
  • 01:56:34 that's nice as well and that's something
  • 01:56:36 we could do like overnight you could run
  • 01:56:38 like your script for a few hours train
  • 01:56:40 about your models figure out which one
  • 01:56:41 is the best only save and then use that
  • 01:56:43 one so anyways we're gonna load in this
  • 01:56:46 model notice that I've actually just
  • 01:56:48 commented out this aspect down here
  • 01:56:50 because we are not gonna use this
  • 01:56:51 anymore and now what we're gonna start
  • 01:56:53 doing is actually training or testing
  • 01:56:56 our model on some outside data so I've gone
  • 01:56:59 ahead and picked a movie review for one
  • 01:57:01 of my favorite movies some of you guys
  • 01:57:02 can read this if you want but it's the
  • 01:57:04 Lion King absolutely love that movie so
  • 01:57:06 I've decided to go with this this review
  • 01:57:08 was a 10 out of 10 review so a positive
  • 01:57:10 review and we're going to test our model
  • 01:57:11 on this one now I actually did take
  • 01:57:13 this off like the IMDB website or whatever
  • 01:57:16 that's called but the data set that they
  • 01:57:19 use is different so this is you guys
  • 01:57:21 will see it why this works a little bit
  • 01:57:23 differently and what we have to do with
  • 01:57:24 this so this is in a text file so what
  • 01:57:27 I'm gonna do is load in the text file
  • 01:57:29 here in code and then get that big blob
  • 01:57:31 that's string and convert it into a form
  • 01:57:33 that our model can actually use so the
  • 01:57:35 first step to do this obviously is to
  • 01:57:37 get that string so we're going to say
  • 01:57:38 with open and in this case I've called
  • 01:57:40 my file test dot txt and then I'm just
  • 01:57:44 gonna set the encoding because I was
  • 01:57:45 running into some issues here you guys
  • 01:57:46 probably don't have to do this I was
  • 01:57:48 gonna say utf-8 which is just kind of
  • 01:57:50 a standard text encoding and we're gonna
  • 01:57:52 say as f now again the reason I use
  • 01:57:55 with is just because that means I don't
  • 01:57:57 have to close the file afterwards
  • 01:57:59 better practice if you want to use that
  • 01:58:00 and now I'm gonna say for line in f dot
  • 01:58:04 readlines which essentially just means
  • 01:58:07 we're gonna get each line in this case
  • 01:58:09 we only have one line but if we wanted
  • 01:58:10 to throw in a few more reviews in here
  • 01:58:13 and do some predictions on those that
  • 01:58:15 would be very easy to do by keeping this
  • 01:58:17 code structure you just throw another
  • 01:58:18 line in there and now I'm just gonna say
  • 01:58:20 we're gonna grab this line and we're
  • 01:58:22 gonna start pre processing it so that we
  • 01:58:24 can actually feed it to our model now
  • 01:58:26 notice that this when we read this in
  • 01:58:29 all we're gonna get is a large string
  • 01:58:31 but that's no good to us we actually
  • 01:58:33 need to convert this into an encoded
  • 01:58:35 list of numbers right and essentially we
  • 01:58:38 need to say okay so of that's a word
  • 01:58:40 what number represents that put that in
  • 01:58:42 a list
  • 01:58:43 same with all same with the same with
  • 01:58:45 animation right and we keep going and
  • 01:58:47 keep going pretty well for all of the
  • 01:58:50 words in here and we also have to make
  • 01:58:52 sure that the size of our text is only
  • 01:58:54 at max 250 words because that's what we
  • 01:58:58 were using when we were training the
  • 01:58:59 data so it's expecting a size of that
  • 01:59:01 and if you give it something larger
  • 01:59:03 that's not gonna work
  • 01:59:04 or it might but you're gonna get a few
  • 01:59:06 errors with that so anyways the first
  • 01:59:08 step here is I'm going to say nline is
  • 01:59:10 equal to line dot and I'm gonna remove
  • 01:59:13 a bunch of characters that I don't want
  • 01:59:15 so I'm just gonna say dot replace I
  • 01:59:17 think this is the best way to do it but
  • 01:59:19 maybe not um and I'm gonna replace all
  • 01:59:22 the commas all of the periods all of the
  • 01:59:24 brackets and all of the colons
  • 01:59:26 and I'll talk about more why we want to
  • 01:59:28 do that in just one second so we'll do
  • 01:59:30 dot replace
  • 01:59:31 I guess this dot replace should
  • 01:59:33 probably be outside the bracket and
  • 01:59:35 we'll replace the bracket with nothing
  • 01:59:39 and I know this is there probably is a
  • 01:59:42 better way to do this but for our
  • 01:59:43 purposes it's not really that important
  • 01:59:45 and finally we will replace all our
  • 01:59:47 colons with nothing as well now again
  • 01:59:50 the reason I'm doing this is because
  • 01:59:52 let's go here if you have a look for
  • 01:59:55 example when when you split this because
  • 01:59:57 we're just gonna split this data by
  • 01:59:59 spaces and to get all the words what
  • 02:00:02 will end up happening is we're gonna get
  • 02:00:04 words like company comma we're gonna get
  • 02:00:06 words like I'm trying to find something
  • 02:00:08 that has a period like art dot and then a
  • 02:00:11 quotation mark right and we don't want
  • 02:00:12 those to be words in our list because
  • 02:00:15 there's no mapping for art period
  • 02:00:17 there's only a mapping for art which
  • 02:00:19 means that I need to remove all of these
  • 02:00:21 kind of symbols so that when we split
  • 02:00:23 our data we get the correct words
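A hypothetical illustration of the problem (these fragments are made up, not the actual review text):

```python
# Without stripping punctuation, split() yields tokens like 'art.'
# that have no entry in word_index.
print("loved the art.".split(" "))  # ['loved', 'the', 'art.'] -> no mapping
print("loved the art".split(" "))   # ['loved', 'the', 'art']  -> 'art' maps fine
```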
  • 02:00:26 now there'll be a few times where the split
  • 02:00:28 doesn't work correctly but that's ok as
  • 02:00:31 long as the majority of them are working
  • 02:00:32 well same thing with brackets right I
  • 02:00:34 can't have a word and then a closing
  • 02:00:36 bracket as one of my words so I need to
  • 02:00:37 get rid of that now this reminds me I
  • 02:00:39 need to remove quotation marks as well
  • 02:00:40 because they use quite a few of those in
  • 02:00:42 there I don't know why I closed that
  • 02:00:43 document so let's do that as well with
  • 02:00:45 one last replace so just say dot
  • 02:00:47 replace in this case we'll actually just
  • 02:00:50 do backslash quotation mark and then
  • 02:00:52 again with nothing now I'm adding a dot
  • 02:00:55 strip here to get rid of that backslash
  • 02:00:56 n and now we're gonna say dot split and
  • 02:00:59 in this case we'll split on a space
  • 02:01:01 now I know this is a long line but
  • 02:01:04 that's all we need to do to remove
  • 02:01:05 everything
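Assembled from the steps just described, the preprocessing line ends up looking roughly like this:

```python
# Strip commas, periods, brackets, colons and quotation marks, drop the
# trailing newline with strip(), then split the review into words.
nline = line.replace(",", "").replace(".", "").replace("(", "") \
            .replace(")", "").replace(":", "").replace("\"", "") \
            .strip().split(" ")
```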
  • 02:01:07 now we actually need to encode and trim our data down to 250
  • 02:01:10 words so to encode our data I'm gonna
  • 02:01:12 say encode equals in this case and we're
  • 02:01:15 just literally gonna make a function
  • 02:01:17 called review underscore encode
  • 02:01:20 and we'll pass in our nline now what
  • 02:01:24 review_encode will do is look up the
  • 02:01:26 mappings for all of the words and return
  • 02:01:29 to us an encoded list and then finally
  • 02:01:31 what we're gonna do and we'll create
  • 02:01:33 this function in just a second don't
  • 02:01:34 worry it doesn't already exist is we're
  • 02:01:36 actually going to use what we've done up
  • 02:01:37 here with this test data train data
  • 02:01:39 keras preprocessing stuff we're just
  • 02:01:41 going to apply this to in this case our
  • 02:01:43 encoded data so we add those pad tags or
  • 02:01:46 we trim it down to what it needs to be
  • 02:01:48 so this case will say
  • 02:01:49 encode equals keras dot preprocessing
  • 02:01:53 instead of train data we'll just pass in
  • 02:01:55 this case actually a list and then
  • 02:01:56 encode inside it because that's what
  • 02:01:58 it's expecting to get a list of lists
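As a sketch, the encode-and-pad step looks like this; it assumes the word_index dictionary with its "&lt;PAD&gt;" tag set up earlier in the series, and review_encode is defined a little further down:

```python
# Encode the words, then pad/trim to the 250-word length the model
# was trained on; pad_sequences expects a list of lists.
encode = review_encode(nline)
encode = keras.preprocessing.sequence.pad_sequences(
    [encode],
    value=word_index["<PAD>"],
    padding="post",
    maxlen=250)
```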
  • 02:02:01 all right so now that we've done that our
  • 02:02:03 final step would be to use the model to
  • 02:02:05 actually make a prediction so we're
  • 02:02:06 gonna say model dot predict and then in
  • 02:02:09 this case we'll simply pass it this
  • 02:02:11 encode right here which will be in the
  • 02:02:12 correct form now we'll save that under
  • 02:02:14 predict and then what we'll do is just
  • 02:02:17 simply print out the model so we'll say
  • 02:02:19 print or not the model sorry we'll print
  • 02:02:21 the original text which will be the
  • 02:02:23 review so in this case we'll print line
  • 02:02:25 and then we will print out the encoded
  • 02:02:28 review just so we can have a look at what
  • 02:02:29 that is
  • 02:02:30 and then finally we will print the
  • 02:02:32 prediction so whether the model
  • 02:02:34 thinks it's positive or negative so
  • 02:02:35 we'll just say predict and in this case
  • 02:02:37 we'll just put zero because we're only
  • 02:02:38 gonna be doing like one at a time right
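A sketch of the prediction-and-printing step just described:

```python
# Run the saved model on the single padded review and show everything.
predict = model.predict(encode)
print(line)        # the original review text
print(encode)      # the encoded, padded version
print(predict[0])  # the 0-to-1 negative/positive score
```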
  • 02:02:41 okay sweet so now the last thing that we
  • 02:02:44 need to do is just simply write this
  • 02:02:46 review_encode function and it'll be
  • 02:02:48 good to go and start actually using our
  • 02:02:50 model so I'm just gonna say define
  • 02:02:51 review underscore encode this is gonna
  • 02:02:54 take a string we'll just call that s
  • 02:02:57 lowercase s and what we're gonna do in
  • 02:02:59 here is setup a new list that we're
  • 02:03:01 going to append some stuff into so I'm
  • 02:03:02 just gonna say like return let's just
  • 02:03:05 say like encoded equals and then I'm
  • 02:03:08 gonna start this with 1 now the reason I
  • 02:03:10 start one in here is because all of our
  • 02:03:12 data here where it starts has a 1 so
  • 02:03:15 we're gonna start with 1 because we
  • 02:03:17 won't have added that in from the other
  • 02:03:20 way I hope you guys understand that
  • 02:03:21 we're just setting like a starting tag to be
  • 02:03:23 consistent with the rest of them and now
  • 02:03:25 what we're gonna do is we're gonna loop
  • 02:03:26 through every single word that's in our
  • 02:03:29 s here which will be passed in as a list of
  • 02:03:31 words we'll look up the numbers
  • 02:03:33 associated with those words and add them
  • 02:03:35 into this encoded list we're gonna say
  • 02:03:37 for word and in this case we're gonna
  • 02:03:39 say for word in s now we'll say if word
  • 02:03:44 in this case word underscore index and
  • 02:03:47 again we're gonna use word underscore
  • 02:03:49 index as opposed to reverse word index
  • 02:03:51 because word index
  • 02:03:53 stores all of the words corresponding to
  • 02:03:55 the letters or not the letters the
  • 02:03:57 numbers which means that we can
  • 02:03:58 literally just throw our data into word
  • 02:04:00 index and it'll give us the number
  • 02:04:03 associated with each of those words so
  • 02:04:04 we're gonna say if word in word index
  • 02:04:06 then we'll say encoded dot append and in
  • 02:04:09 this case we'll simply append in this
  • 02:04:11 case word index of word now otherwise what
  • 02:04:16 we'll do is we'll say
  • 02:04:17 encoded dot append 2 now what will
  • 02:04:21 happen is we're gonna check here if word
  • 02:04:23 if the word is actually in our
  • 02:04:25 vocabulary which is represented by word
  • 02:04:28 index which is just a dictionary of all
  • 02:04:30 the words corresponding to all the
  • 02:04:32 numbers that represent those words now
  • 02:04:34 if it's not what we'll do is we'll add
  • 02:04:36 in that unknown tag so that the program
  • 02:04:38 knows that this is an unknown word
  • 02:04:40 otherwise we'll simply add the number
  • 02:04:42 associated with that word now one last
  • 02:04:44 thing to do is actually just do word dot
  • 02:04:46 lower here just to make sure that if we
  • 02:04:48 get any words that have some weird
  • 02:04:49 capitalization they are still found in
  • 02:04:52 our vocabulary so like words at the
  • 02:04:54 beginning of a sentence and stuff like
  • 02:04:55 that
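Putting the pieces together, review_encode as described comes out roughly like this:

```python
def review_encode(s):
    # Start with 1, the start tag, to match how the training data begins.
    encoded = [1]
    for word in s:
        if word.lower() in word_index:
            # Known word: append the number that represents it.
            encoded.append(word_index[word.lower()])
        else:
            # Unknown word: append the unknown tag (2 in this vocabulary).
            encoded.append(2)
    return encoded
```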
  • 02:04:58 and now with that being done I believe we're actually finished and
  • 02:05:00 ready to run this code so what's nice
  • 02:05:02 about this is now that we've saved the
  • 02:05:04 model we don't have to train it again so
  • 02:05:05 I can literally just run this and it
  • 02:05:06 should happen fairly quickly
  • 02:05:08 fingers crossed let's see all right must
  • 02:05:14 be a list of integers found non
  • 02:05:16 iterable object so what error is that here
  • 02:05:19 um
  • 02:05:19 encode encoded all right so
  • 02:05:24 review_encode ah well it would be
  • 02:05:26 helpful if I returned the encoded list
  • 02:05:30 and that would have been our issue there
  • 02:05:31 so let's run that one more time and
  • 02:05:32 see what we're getting there
  • 02:05:33 and there we go sweet so this is
  • 02:05:37 actually the review I know it's
  • 02:05:39 really hard to read here but if you guys
  • 02:05:40 want to go ahead and read it
  • 02:05:41 feel free since it's on The Lion King
  • 02:05:43 it's obviously a positive review and
  • 02:05:45 then you can see this is what we've
  • 02:05:46 ended up with so our review has been
  • 02:05:48 translated into this which means we've
  • 02:05:50 actually trimmed quite a bit of the
  • 02:05:51 review and you can see that wherever it
  • 02:05:53 says 2 that is actually a word that we
  • 02:05:55 didn't know or that wasn't in our
  • 02:05:56 vocabulary 4 represents 'the' that's why
  • 02:05:59 there's a lot of fours and then all the
  • 02:06:01 other words have their corresponding numbers
  • 02:06:02 right now fortunately for us we picked an
  • 02:06:06 88,000-word vocabulary which means that we can
  • 02:06:08 get indexes like 20,000 whereas before
  • 02:06:10 they would have all been under 10,000 and you can
  • 02:06:12 see that our prediction here is now 96%
  • 02:06:15 positive which means that obviously like
  • 02:06:18 we were going between 0 where 0 is a
  • 02:06:20 negative review and 1 is a positive
  • 02:06:21 review so this classified correctly as
  • 02:06:24 a very positive review and we could try
  • 02:06:26 this on all other kinds of reviews and
  • 02:06:28 see what we get but that is how you go
  • 02:06:30 about kind of transforming your data
  • 02:06:32 into the form that the network expects
  • 02:06:34 and that's where I'm trying to get you
  • 02:06:36 guys right now is to understand that
  • 02:06:38 yes it's really easy when we're doing it
  • 02:06:41 with this kind of data that just comes
  • 02:06:44 in like the IMDB data from keras load_data but
  • 02:06:47 as soon as you actually have to start
  • 02:06:48 using your own data there's quite a
  • 02:06:49 bit of manipulation that you have to do
  • 02:06:51 and things that you might not think
  • 02:06:53 about when you're actually feeding it to
  • 02:06:55 the network and in most cases you can
  • 02:06:57 probably be sure that your network is
  • 02:06:59 not actually the thing that's behaving
  • 02:07:01 incorrectly but the data that
  • 02:07:03 you're feeding it that's not in the correct
  • 02:07:04 form and it can be tricky to figure out
  • 02:07:07 what's wrong with that data so with that
  • 02:07:09 being said that has been it for this
  • 02:07:10 video I hope you guys enjoyed that's
  • 02:07:12 going to wrap up the text classification
  • 02:07:13 aspect of this neural network series
  • 02:07:20 hey guys so in today's video I'm gonna
  • 02:07:22 be showing you how to install tensorflow
  • 02:07:23 2.0 GPU version on an Ubuntu / Linux
  • 02:07:27 machine now this should work for any
  • 02:07:29 version of Linux or any Linux operating
  • 02:07:31 system although the one I am gonna be
  • 02:07:33 showing you on is Ubuntu 18.04
  • 02:07:36 now you may notice that I'm actually on
  • 02:07:38 a Windows machine right now and that
  • 02:07:39 this is actually just an Ubuntu
  • 02:07:41 terminal that's open now I'm actually
  • 02:07:43 just SSH'd into a server that I have
  • 02:07:45 that contains two 1080 graphics cards so
  • 02:07:48 GTX 1080s and that's how I'm gonna be
  • 02:07:50 showing you how to do this now quickly
  • 02:07:52 if you don't understand the difference
  • 02:07:53 between the CPU and the GPU version the
  • 02:07:56 CPU version is essentially just way
  • 02:07:57 slower and you would only really use the
  • 02:08:00 CPU version if you don't have a graphics
  • 02:08:03 card in your computer that is capable of
  • 02:08:04 running tensorflow 2.0 GPU so quickly
  • 02:08:07 before we go forward and you guys get
  • 02:08:09 frustrated with not being able to
  • 02:08:10 install this make sure that you have a
  • 02:08:12 graphics card that actually works for
  • 02:08:14 this program or for this module that
  • 02:08:16 means you have to have a graphics card
  • 02:08:18 that is a GTX 1050 Ti or higher those
  • 02:08:22 are the ones that are listed on
  • 02:08:23 tensorflow website as compatible with
  • 02:08:26 tensorflow 2.0 GPU if you want to have a
  • 02:08:28 quick check without having to go to the
  • 02:08:30 website to see if yours works if it has 4
  • 02:08:32 gigs of video RAM and is a GTX
  • 02:08:35 generation card or higher it most likely
  • 02:08:37 works with tensorflow 2.0 now I don't
  • 02:08:40 know about all the different cards but
  • 02:08:41 you have any questions leave them below
  • 02:08:42 I'll try to answer that for you but any
  • 02:08:44 1060 1070 1080 or r-tx cards they have
  • 02:08:48 CUDA cores on them will work for this
  • 02:08:51 essentially you just need a CUDA-enabled
  • 02:08:53 GPU so you can check if yours meets that
  • 02:08:55 requirement before moving forward now to
  • 02:08:57 do this I'm just gonna be following the
  • 02:08:59 steps listed on the tensorflow website
  • 02:09:01 now you may run into some issues while
  • 02:09:03 doing this but for Ubuntu this is pretty
  • 02:09:06 straightforward and I'm essentially just
  • 02:09:07 gonna be copying these commands and
  • 02:09:08 pasting them in my terminal now if you'd
  • 02:09:11 like to just try to do this without
  • 02:09:12 following along with the video go ahead but
  • 02:09:14 I will be kind of showing you some fixes
  • 02:09:16 that I ran into while I was doing this
  • 02:09:17 um so let's go ahead and get started so
  • 02:09:20 actually let me just split the screen up
  • 02:09:22 so we can have a look at both of them at
  • 02:09:23 once I'm in my Linux machine right now
  • 02:09:26 you just have to get to the terminal you
  • 02:09:27 notice that I don't even have a desktop
  • 02:09:29 and I'm literally just gonna start
  • 02:09:30 copying and pasting these commands now
  • 02:09:32 the first thing that we need
  • 02:09:33 to install is actually CUDA now CUDA is
  • 02:09:37 what allows us to use the CUDA cores on
  • 02:09:40 our GPU to actually run the code so just
  • 02:09:43 go ahead and keep copying these commands
  • 02:09:46 it will take a second and I actually
  • 02:09:47 already have this installed on my
  • 02:09:49 machine so I'm gonna go through the
  • 02:09:50 steps with you guys but again if
  • 02:09:52 anything is different on my machine
  • 02:09:53 that's probably because it's already
  • 02:09:55 installed so if you don't know how to
  • 02:09:56 copy it into a window like this you just
  • 02:09:59 right click on your mouse and it'll copy
  • 02:10:01 if you're using a server like I am but
  • 02:10:04 anyways we'll just go through all of
  • 02:10:05 these and keep going now I will have all
  • 02:10:08 these commands listed in my description
  • 02:10:11 as well and that should show you guys
  • 02:10:13 you know if the website goes down at any
  • 02:10:15 point you can just copy it from there as
  • 02:10:16 well
  • 02:10:17 so yeah literally just keep going all
  • 02:10:18 we're doing here is adding the Nvidia
  • 02:10:20 packages we're gonna make sure we have
  • 02:10:21 the Nvidia drivers for our graphics card
  • 02:10:24 that are correct and then we're gonna go
  • 02:10:26 ahead and install tensorflow 2.0 so yep
  • 02:10:29 just go through these commands there's
  • 02:10:31 not really much for me to say as I copy
  • 02:10:32 these in and eventually we will get
  • 02:10:35 through them all alright so now we're
  • 02:10:39 gonna install the Nvidia driver you can
  • 02:10:41 see that's all commented out on this
  • 02:10:42 tensorflow website here copy that and I
  • 02:10:48 can just continue on I don't really have
  • 02:10:50 any commentary for you guys here so
  • 02:10:52 we'll copy this this is gonna install
  • 02:10:54 obviously the development and runtime
  • 02:10:55 libraries which we need and it says
  • 02:10:57 minimum four gigs or approximately four
  • 02:10:59 gigabytes which means that's how
  • 02:11:00 much space it's going to
  • 02:11:02 take up on the machine so this will take a
  • 02:11:03 second and I'll fast-forward through
  • 02:11:05 this stuff if it does take a while
  • 02:11:06 finally we're going to install TensorRT
  • 02:11:09 I don't even know what this is but
  • 02:11:11 apparently it's required and then after
  • 02:11:13 we're done this we should actually be
  • 02:11:14 finished installing everything that we
  • 02:11:16 need for tensorflow 2.0 to work again if
  • 02:11:19 you guys want to go through this just go
  • 02:11:20 to the website copy all of these
  • 02:11:22 commands in order paste them into here
  • 02:11:24 and they should work properly
  • 02:11:25 now finally what we have to do is
  • 02:11:27 actually install tensorflow 2.0 so we've
  • 02:11:30 got all the dependencies
  • 02:11:32 installed and now to install tensorflow
  • 02:11:34 2.0 we're just going to say pip 3
  • 02:11:36 install tensorflow and I believe we're
  • 02:11:40 gonna say -gpu and then equals equals
  • 02:11:43 2.0 point
  • 02:11:45 I gotta find it up here
  • 02:11:46 to make sure that we do it correctly
  • 02:11:48 2.0.0-alpha0 like that
  • 02:11:53 so then we'll do that and that should
  • 02:11:55 install tensorflow 2.0 for us now I
  • 02:11:58 already have this installed but this
  • 02:11:59 will actually take a few minutes to
  • 02:12:01 install because there is quite a bit of
  • 02:12:02 stuff that it needs to download on your
  • 02:12:04 computer
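The full command being typed here, for reference:

```
pip3 install tensorflow-gpu==2.0.0-alpha0
```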
  • 02:12:05 so anyways that has been it for
  • 02:12:06 installing tensorflow 2.0 on your
  • 02:12:08 computer using the GPU version again
  • 02:12:10 throughout the rest of the neural
  • 02:12:11 network series I'm gonna be going
  • 02:12:12 forward doing this on an Ubuntu
  • 02:12:15 machine so running all of the code I'll
  • 02:12:16 do the development on Windows throw the
  • 02:12:18 files on my server train the model train
  • 02:12:21 the models excuse me and then take the
  • 02:12:23 models off and use them on my Windows
  • 02:12:25 machine so if you want to validate if
  • 02:12:27 this is working you can really quickly
  • 02:12:29 just do Python 3 in Linux and then you
  • 02:12:31 can simply do import tensorflow
  • 02:12:34 and doing that you shouldn't get any
  • 02:12:37 errors if you don't get any errors then
  • 02:12:38 you have successfully installed
  • 02:12:40 tensorflow 2.0
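One quick way to run that check from a python3 shell; is_gpu_available was the test helper in this era of TensorFlow (it was deprecated in later releases), so treat it as one possible check:

```python
# If the import succeeds the install worked; the second line reports
# whether a CUDA GPU is actually visible to TensorFlow.
import tensorflow as tf
print(tf.test.is_gpu_available())
```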
  • 02:12:43 now I got a few errors here because some stuff still wasn't
  • 02:12:44 working if for some reason when you
  • 02:12:46 install tensorflow
  • 02:12:47 and you notice that it's not using your
  • 02:12:48 GPU go ahead and uninstall the CPU
  • 02:12:51 version of tensorflow so just pip 3
  • 02:12:54 uninstall and then tensorflow and I
  • 02:12:58 guess you'd have to just do just
  • 02:13:00 tensorflow like that and that will
  • 02:13:02 uninstall the CPU version if it is
  • 02:13:03 installed on your machine
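For reference, the uninstall command would be:

```
pip3 uninstall tensorflow
```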
  • 02:13:05 so anyways that has been it for how to install
  • 02:13:06 tensorflow 2.0 GPU version on Ubuntu
  • 02:13:09 pretty straightforward just go through
  • 02:13:10 and copy these commands if you guys have any
  • 02:13:12 questions or errors please just leave
  • 02:13:13 them in the comments below and I will
  • 02:13:15 try my best to help you out