- 00:00:02 hello welcome to this video
- 00:00:04 in this video we will get started with
- 00:00:06 python for data analysis
- 00:00:08 so if you are working with excel for
- 00:00:10 example and you always wondered
- 00:00:12 how can i use something like python to
- 00:00:14 analyze my data well
- 00:00:16 then this is exactly what we will dive
- 00:00:18 into now
- 00:00:19 because we will have a look at how we
- 00:00:21 can set up python correctly to make sure
- 00:00:23 we can use it for data analysis
- 00:00:25 we'll then have a look at how we can
- 00:00:27 access a csv file a
- 00:00:29 typical excel use case for example so
- 00:00:32 how we can access the csv file
- 00:00:33 and how we can access specific columns
- 00:00:35 or rows in that file
- 00:00:37 and finally how we can also plot a quick
- 00:00:39 chart using python
- 00:00:41 that's it actually that's just the
- 00:00:44 basics of course but you have to get
- 00:00:46 started somehow so let's get started
- 00:00:48 together
- 00:00:48 in this video
- 00:00:52 to get started we have to install python
- 00:00:55 first
- 00:00:56 and we basically have two options to do
- 00:00:58 so
- 00:00:59 one option would be to directly download
- 00:01:02 python
- 00:01:03 the alternative would be to use a python
- 00:01:05 distribution
- 00:01:06 now what's the difference and what do i
- 00:01:08 recommend well
- 00:01:09 the direct download would mean that you
- 00:01:11 go to python.org
- 00:01:13 download the well python language
- 00:01:15 basically there
- 00:01:16 and follow the installation guidelines
- 00:01:18 pretty straightforward actually
- 00:01:21 there is also no problem about doing it
- 00:01:23 like that the only
- 00:01:24 problem especially for beginners is that
- 00:01:27 python is an
- 00:01:27 open source language and python is a
- 00:01:30 really powerful language
- 00:01:31 this means you can use it for data
- 00:01:33 analysis for machine learning
- 00:01:34 but also for things like web development
- 00:01:36 or even for creating desktop
- 00:01:38 applications this means that
- 00:01:41 using the plain vanilla python such as
- 00:01:43 the
- 00:01:44 bare the pure code we have can work
- 00:01:47 but it would be nicer to have some
- 00:01:49 tailored specific
- 00:01:50 functionalities which better meet your
- 00:01:53 requirements
- 00:01:54 because if you want to build a web page
- 00:01:55 or if you want to analyze data
- 00:01:57 well as you can imagine there are
- 00:01:58 different things that you would like to
- 00:02:00 do
- 00:02:00 with that language and for that python
- 00:02:03 comes with a lot of different packages
- 00:02:06 and libraries
- 00:02:07 which add such additional
- 00:02:08 functionalities and such
- 00:02:10 convenience functionalities to that
- 00:02:12 language the problem is that the
- 00:02:14 direct download python version doesn't
- 00:02:16 come with these packages
- 00:02:18 it does come with pip python's
- 00:02:21 integrated package manager though
- 00:02:23 and with that pip install command just
- 00:02:25 in case you saw it somewhere already
- 00:02:27 you can install the different packages
- 00:02:29 you need so this is working
- 00:02:31 nothing wrong about that but using a
- 00:02:33 python distribution
- 00:02:35 is a lot more convenient especially
- 00:02:37 because of these
- 00:02:38 packages because you could go to
- 00:02:40 anaconda.com for example
- 00:02:42 download the python distribution from
- 00:02:45 there we'll have a look at that in a few
- 00:02:47 seconds
- 00:02:48 and with that you install python and a
- 00:02:51 lot of the most common packages
- 00:02:53 that are used together with python and
- 00:02:56 because of that
- 00:02:57 increased convenience i definitely
- 00:03:00 recommend using the python distribution
- 00:03:02 if you just get started with python now
- 00:03:05 let's go to anaconda.com together and
- 00:03:07 see how this basically works
- 00:03:08 because it works pretty straightforward
- 00:03:10 to be honest
- 00:03:11 you just go to anaconda.com and now you
- 00:03:14 can scroll down a bit
- 00:03:15 here you can see some anaconda products
- 00:03:18 and the product of your interest
- 00:03:20 should be the anaconda distribution
- 00:03:22 right here as you can see
- 00:03:23 it's the most popular python data
- 00:03:25 science distribution so
- 00:03:26 i think that's not the worst choice in
- 00:03:28 our case just hit download now right
- 00:03:30 here
- 00:03:31 and select your operating system in my
- 00:03:33 case this is the mac
- 00:03:35 and down here you can now download the
- 00:03:37 installer choose
- 00:03:38 python 3 right here by the way that's
- 00:03:40 what i would recommend
- 00:03:41 and then you can decide if you want to
- 00:03:43 use the graphical installer
- 00:03:44 or the command line installer i would
- 00:03:46 use the graphical installer it's a nicer
- 00:03:48 interface
- 00:03:49 and then you can simply follow the
- 00:03:51 installation instructions
- 00:03:53 and with that you are already done so
- 00:03:56 you are basically ready to use python
- 00:03:57 now
- 00:03:58 after installing this python
- 00:04:00 distribution
- 00:04:01 sounds quite awesome sounds quite easy
- 00:04:03 it definitely is but before we dive into
- 00:04:05 the python code we have to think about
- 00:04:07 one more thing
- 00:04:08 what is the working environment we want
- 00:04:11 to use in python because
- 00:04:12 writing python code can be done in a lot
- 00:04:14 of different ways
- 00:04:16 one way would be the repl; repl stands
- 00:04:19 for
- 00:04:19 read, evaluate, print, loop
- 00:04:23 and basically means that we can write
- 00:04:25 python code
- 00:04:26 in our command prompt or in the terminal
- 00:04:28 on the mac
- 00:04:30 this is not a big issue in general you
- 00:04:32 can do that and it's quite nice to get
- 00:04:34 started
- 00:04:34 you only have to type python3 in
- 00:04:37 your terminal
- 00:04:38 and then you can basically get started
- 00:04:40 there is no additional installation of
- 00:04:42 an ide
- 00:04:43 or of some code editor needed but it's
- 00:04:46 rather a code
- 00:04:47 playground so as i said nice to get
- 00:04:50 started
- 00:04:50 nice to play around but not what we will
- 00:04:52 use in here
- 00:04:54 the second alternative would be to use
- 00:04:56 an ide
- 00:04:57 or a code editor pycharm or vs code for
- 00:05:00 example
- 00:05:02 these code editors are nice because they
- 00:05:05 come with some additional convenience
- 00:05:07 features
- 00:05:07 like version control or debugging and
- 00:05:10 especially if you're coming from a web
- 00:05:12 development world
- 00:05:13 you're quite used to such code editors
- 00:05:15 so nothing wrong about these by the way
- 00:05:17 you can use code editors but as we don't
- 00:05:20 want to create a web page but as we want
- 00:05:22 to analyze data
- 00:05:23 there is a third and in my case also
- 00:05:26 preferable option
- 00:05:27 especially if you want to get started
- 00:05:29 with data analysis and that is
- 00:05:31 using a jupyter notebook as our code
- 00:05:34 writing environment a jupyter notebook
- 00:05:38 in simple terms well simply means
- 00:05:40 running python code
- 00:05:41 in the browser with a tailored or
- 00:05:44 python-specific
- 00:05:45 interface the cool thing is that it runs
- 00:05:48 in the browser but it runs
- 00:05:50 locally on our machine and if this
- 00:05:53 sounds strange to you no worries we'll
- 00:05:54 have a look at that in a few seconds
- 00:05:56 but more information about it can be
- 00:05:58 found on jupyter.org
- 00:06:00 and as i said the cool thing is that
- 00:06:03 it's an interactive
- 00:06:04 browser interface for the python code we
- 00:06:08 can see
- 00:06:08 our input and our output at the same
- 00:06:11 time
- 00:06:11 so if i enter code i can see
- 00:06:15 the result of this code also in this
- 00:06:18 browser window
- 00:06:19 this by the way also includes visuals as
- 00:06:21 you will see throughout this video
- 00:06:23 and therefore it's in my opinion the
- 00:06:25 best environment to get started with
- 00:06:27 python
- 00:06:28 another question is how can we install
- 00:06:30 such a jupyter notebook
- 00:06:32 well you can go to jupyter.org and
- 00:06:34 download it
- 00:06:36 or you can simply open the terminal or
- 00:06:39 the command prompt
- 00:06:40 and now type conda list what does this
- 00:06:43 mean
- 00:06:44 well we installed anaconda and just as
- 00:06:47 python comes with this pip
- 00:06:49 this integrated package manager anaconda
- 00:06:52 also comes with an integrated package
- 00:06:54 manager this is simply called
- 00:06:55 conda right here and with this
- 00:06:57 conda list command
- 00:06:58 and with hitting enter you can find
- 00:07:02 a list of all the packages we installed
- 00:07:05 as part of our anaconda distribution so
- 00:07:08 these packages are now installed
- 00:07:10 on your system and if we scroll
- 00:07:13 up a bit to the right here
- 00:07:16 we can see that we installed jupyter
- 00:07:19 already so this is exactly this jupyter
- 00:07:21 notebook i was referring to
- 00:07:22 and what does this mean for us then well
- 00:07:24 this simply means
- 00:07:25 that we can now immediately start
- 00:07:28 writing our first python code
- 00:07:29 in such a jupyter notebook for that i
- 00:07:32 will
- 00:07:33 create a or open a new tab like this up
- 00:07:36 here with new tab
- 00:07:38 and now just enter jupyter notebook
- 00:07:41 like this if we now hit enter the
- 00:07:44 jupyter notebook gets up and running and
- 00:07:47 now you should see
- 00:07:48 your jupyter notebook so this browser
- 00:07:51 window
- 00:07:51 running with that well specific tailored
- 00:07:55 interface in the end i navigated into a
- 00:07:58 project folder already so please do the
- 00:08:00 same because well you can
- 00:08:02 create that notebook then in the folder
- 00:08:05 of your choice
- 00:08:06 and if we now go right here to new to
- 00:08:09 the right part of that page
- 00:08:11 we can either create a new notebook
- 00:08:13 which is what we will do in a few
- 00:08:14 seconds
- 00:08:15 or you can also create a text file or a
- 00:08:16 folder for example
- 00:08:18 we don't need that we just want to
- 00:08:19 create a python 3
- 00:08:21 notebook because that's the python
- 00:08:22 version we installed
- 00:08:24 so if we click onto that we see this new
- 00:08:27 window with a so-called
- 00:08:28 cell in here the cell is the part in the
- 00:08:31 jupyter notebook where we can write our
- 00:08:32 python code
- 00:08:33 and we can also rename that up here if
- 00:08:35 we click on to untitled and maybe call
- 00:08:37 it
- 00:08:38 python for data analysis
- 00:08:42 something like that no need to do that
- 00:08:43 though so this is how you can
- 00:08:45 rename such a jupyter notebook and in
- 00:08:47 here we can now write
- 00:08:49 our python code for example to start
- 00:08:51 really complicated
- 00:08:52 let's say two plus two like that
- 00:08:56 and now you can either hit run right
- 00:08:58 here or let's do
- 00:09:00 maybe two plus four maybe or
- 00:09:03 hit shift and enter this will basically
- 00:09:05 run the code
- 00:09:06 and really important show the output
- 00:09:09 right here immediately
- 00:09:11 that's also what i referred to when we
- 00:09:12 had a look at the slide
- 00:09:14 the notebook allows us to see both input
- 00:09:16 and output
- 00:09:17 immediately and on the same page and
- 00:09:19 this is quite cool actually
- 00:09:21 however there are two more things that
- 00:09:22 we have to change or understand
- 00:09:24 before we can finally dive into our data
- 00:09:26 analysis code
- 00:09:28 the first thing is we need an input file
- 00:09:31 a source file that we want to well get
- 00:09:33 access to
- 00:09:34 for that you can find a link to the
- 00:09:35 source file below the video
- 00:09:37 in the video description so simply click
- 00:09:39 onto that link and download the file
- 00:09:42 and then simply take that file it's
- 00:09:44 called revenueprofit.csv
- 00:09:47 and drag and drop it into the folder
- 00:09:50 where you created that jupyter notebook
- 00:09:53 in my case this is this
- 00:09:54 basics folder and in this folder you can
- 00:09:57 see
- 00:09:58 this python for data analysis .ipynb file
- 00:10:02 that's the file type jupyter notebook
- 00:10:05 basically creates and uses
- 00:10:07 and into this folder just drag this
- 00:10:09 revenueprofit.csv file
- 00:10:11 so this was one thing this was adding
- 00:10:13 the source file but there is also one
- 00:10:15 second thing we have to understand
- 00:10:16 before we can finally start
- 00:10:18 analyzing our csv file and this brings
- 00:10:20 us to the last slide in this video
- 00:10:23 because we talked about it python is an
- 00:10:26 open source
- 00:10:27 language made for basically any kind of
- 00:10:30 purpose
- 00:10:30 and therefore we can install additional
- 00:10:33 packages i talked about that already
- 00:10:36 python is also coming with some built-in
- 00:10:38 modules though
- 00:10:39 you can find out more about these if you
- 00:10:41 google for the python standard
- 00:10:43 library but especially if for data
- 00:10:46 science purposes or for data analysis
- 00:10:48 purposes
- 00:10:49 you would install optional or
- 00:10:52 third-party packages and libraries
- 00:10:54 now there are lots and lots of libraries
- 00:10:56 and packages available
- 00:10:58 but three of the most common ones you
- 00:11:00 will well probably use
- 00:11:02 are numpy, pandas and matplotlib
- 00:11:06 now what are these packages doing well
- 00:11:09 numpy simply adds
- 00:11:10 multi-dimensional array support so
- 00:11:12 basically being able to read columns and
- 00:11:14 rows in python
- 00:11:15 in simple words pandas allows us to
- 00:11:19 add improved or better data manipulation
- 00:11:23 and analysis
- 00:11:24 features to python and matplotlib
- 00:11:26 basically allows us to visualize
- 00:11:28 information
- 00:11:29 now if you think about what we want to
- 00:11:30 do we want to read data well analyze the
- 00:11:33 data and visualize the data
- 00:11:34 these packages don't sound too bad now
- 00:11:37 the great thing about these packages is
- 00:11:39 that we got them installed on our system
- 00:11:41 already
- 00:11:42 because remember what i said about
- 00:11:43 anaconda that anaconda comes with
- 00:11:46 well actually all of the most popular
- 00:11:48 packages
- 00:11:49 by default so if we go back to our
- 00:11:52 terminal right here so not into that
- 00:11:55 part but into that part where we had
- 00:11:57 that
- 00:11:57 conda list command and as we saw jupyter
- 00:12:00 right here
- 00:12:01 we can also see that we have for example
- 00:12:03 matplotlib
- 00:12:04 right here or pandas
- 00:12:08 right there or numpy right there not
- 00:12:10 right there
- 00:12:11 this one is numpy so this means we got
- 00:12:13 these packages installed in our system
- 00:12:15 or on our system we only need to import
- 00:12:18 these now
- 00:12:18 to our project to do that i'm back in my
- 00:12:22 jupyter notebook
- 00:12:23 and now we can simply type import numpy
- 00:12:27 as np that's the typical way how we
- 00:12:30 import
- 00:12:31 numpy to our projects and we can also
- 00:12:33 import
- 00:12:34 pandas as pd also the typical way
- 00:12:38 how we import the pandas package
- 00:12:41 hitting shift and enter will basically
- 00:12:43 well finish this import
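The import cell described above can be sketched roughly as follows. The version printout is an addition for illustration and is not shown in the video; it is just one quick way to confirm that both imports succeeded.

```python
# Conventional aliases for the two libraries imported in the video.
import numpy as np
import pandas as pd

# Optional check (not from the video): printing the versions confirms
# that both packages were found in the current environment.
print("numpy", np.__version__)
print("pandas", pd.__version__)
```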
- 00:12:44 this means we can now use this data and
- 00:12:48 specifically we will use the pandas
- 00:12:50 package right here
- 00:12:51 to access our csv data so let's maybe
- 00:12:54 call
- 00:12:55 this functionality this reading
- 00:12:56 functionality
- 00:12:58 content maybe like that and content
- 00:13:01 should be equal
- 00:13:02 to pd. and the dot basically means that
- 00:13:05 we now want to access a function
- 00:13:09 that is available in pandas and if we
- 00:13:12 now hit tab
- 00:13:13 you can find a lot of different
- 00:13:14 functionalities that are implemented
- 00:13:16 into this pandas package now if we
- 00:13:19 scroll down a bit
- 00:13:21 right here we can see this pd.read_csv
- 00:13:25 functionality
- 00:13:26 as we want to access a csv file this
- 00:13:29 doesn't sound like the worst plan
- 00:13:30 so if we click onto that we can
- 00:13:33 basically access the csv file now
- 00:13:35 we only need to tell python or pandas
- 00:13:38 the
- 00:13:38 file name or the path of our file for
- 00:13:41 that
- 00:13:42 let's add brackets right here and let's
- 00:13:44 now
- 00:13:45 insert another cell below our current
- 00:13:47 run like this
- 00:13:49 because in our case the jupyter notebook
- 00:13:52 file
- 00:13:52 and the source file are in the same
- 00:13:54 folder in the basics folder
- 00:13:55 so if you now type ls like that you can
- 00:13:59 see that we have our
- 00:14:00 python file and we have our
- 00:14:02 revenueprofit.csv file
- 00:14:04 and that's exactly the name that we can
- 00:14:06 select now so select it
- 00:14:07 and copy it and now paste it right here
- 00:14:10 into the brackets make sure to also add
- 00:14:13 single quotation marks
- 00:14:14 otherwise this doesn't work here and you
- 00:14:17 can also
- 00:14:18 select the cell right here with the ls
- 00:14:20 we don't need it anymore
- 00:14:21 hit escape and press d two times
- 00:14:24 like that this deletes a cell a nice
- 00:14:27 feature it might be helpful in some
- 00:14:29 cases
- 00:14:30 so with that we now said that
- 00:14:33 if we use content right here we want to
- 00:14:35 read the content of this
- 00:14:37 csv file so let's press shift and enter
- 00:14:40 and as you can see
- 00:14:41 nothing is displayed here as an output
- 00:14:43 as we saw it before
- 00:14:45 but if we now type content like this and
- 00:14:48 press shift and enter again
- 00:14:50 then we can see the output of our or
- 00:14:53 basically not the output
- 00:14:54 the content of our csv file
- 00:14:57 we can see that we have some problems
- 00:15:00 still though
- 00:15:00 because apparently the delimiter is not
- 00:15:03 correct in our case
- 00:15:05 now the great thing is that besides
- 00:15:06 adding the file name
- 00:15:08 or the path including the file name if
- 00:15:10 the file is not located
- 00:15:12 in the same folder as it is the case for
- 00:15:14 us you can also add other parameters
- 00:15:16 here
- 00:15:17 so if we enter or if we add a comma
- 00:15:20 and now say sap like this equals
- 00:15:24 now single quotation marks now we can
- 00:15:26 define
- 00:15:27 our separator our delimiter right here
- 00:15:30 in our case this should be a semicolon
- 00:15:33 so if we add the semicolon
- 00:15:35 and now important run this
- 00:15:38 cell right here once again that's
- 00:15:40 important you always have to
- 00:15:41 rerun the cells to make sure the changes
- 00:15:43 are applied and now run this cell once
- 00:15:46 again
- 00:15:47 you can see that our file is displayed
- 00:15:49 correctly now
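The effect of the delimiter can be reproduced with a small sketch. The actual file from the video is not available here, so the CSV text below is a hypothetical stand-in read through `io.StringIO`; the column names and most values are assumptions, chosen only to be consistent with the figures mentioned in the video.

```python
import io
import pandas as pd

# Hypothetical stand-in for the video's semicolon-separated CSV file.
# Column names and most values are assumptions for illustration.
csv_text = """No Data;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
2013;140;100;40
2014;160;115;45
2015;179;120;59
2016;190;130;60
2017;210;140;70
"""

# Default separator is a comma, so every line ends up in one single column.
wrong = pd.read_csv(io.StringIO(csv_text))
print(wrong.shape)    # (8, 1)

# Passing sep=";" splits the lines into the intended columns.
content = pd.read_csv(io.StringIO(csv_text), sep=";")
print(content.shape)  # (8, 4)
```

With a real file on disk, the first argument would simply be the file name or path in quotation marks instead of the `StringIO` object.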
- 00:15:51 now talking about files and being
- 00:15:53 displayed correctly
- 00:15:54 what do we actually see right here well
- 00:15:57 if we click into this cell
- 00:15:59 and now enter type and now enter content
- 00:16:02 so the type
- 00:16:03 of the content that we displayed right
- 00:16:05 here and again press shift and enter
- 00:16:08 then we can see that pandas created a
- 00:16:10 so-called
- 00:16:11 data frame now i don't want to dive too
- 00:16:14 much
- 00:16:14 into the details of these data frames
- 00:16:16 but the important thing is that a data
- 00:16:18 frame
- 00:16:19 simply represents this structure right
- 00:16:22 here
- 00:16:22 so we have a tabular structure and
- 00:16:25 that's important
- 00:16:26 an indexed tabular structure this means
- 00:16:28 we have an
- 00:16:29 index for our rows 0 to 7 in our case
- 00:16:32 and we also have an index for our
- 00:16:34 columns this index is
- 00:16:36 automatically created in our case
- 00:16:37 because the csv file had
- 00:16:39 headers so our headers basically are now
- 00:16:41 the index for our different columns
- 00:16:43 but you could also create these indexes
- 00:16:45 on your own nothing we'll have a look at
- 00:16:47 in this video though
- 00:16:48 so let's just keep in mind that we have
- 00:16:50 a data frame here which is a structure
- 00:16:52 that is created by pandas
- 00:16:54 and that we can use this data frame to
- 00:16:56 basically access our columns and
- 00:16:58 rows by certain indexes we'll see how
- 00:17:02 this works in a few seconds
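A short sketch of what `type(content)` and the two indexes look like. The data is again a hypothetical stand-in for the video's file; column names and values are assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
csv_text = """No Data;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
2013;140;100;40
2014;160;115;45
2015;179;120;59
2016;190;130;60
2017;210;140;70
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

# read_csv returns a pandas DataFrame.
print(type(content))

# Row index: created automatically, one integer label per row (0 to 7).
print(list(content.index))

# Column index: taken from the header line of the CSV.
print(list(content.columns))
```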
- 00:17:04 now with this data frame structure we
- 00:17:06 also get a lot of data frame
- 00:17:08 functionalities you could say now what
- 00:17:10 do you mean by that
- 00:17:12 if i enter content here once again
- 00:17:15 and now type head and add brackets and
- 00:17:18 press shift and enter
- 00:17:20 you can basically see the same structure
- 00:17:22 we had before
- 00:17:23 the difference now is that we don't have
- 00:17:25 a preview of our entire content right
- 00:17:27 here
- 00:17:27 but that we only see a preview that we
- 00:17:30 can define on our own now what do you
- 00:17:32 mean by that
- 00:17:33 by default the head gives us back the
- 00:17:35 first five rows
- 00:17:36 of our file but if i enter a three into
- 00:17:39 these brackets right here
- 00:17:41 and press shift and enter once again you
- 00:17:43 can see that we only see the first
- 00:17:45 three rows guess you see how this works
- 00:17:47 now if i enter four we get four rows and
- 00:17:49 so on
- 00:17:50 and if we enter eight right here because
- 00:17:52 we have eight rows
- 00:17:53 zero with the index zero to the index
- 00:17:56 seven well
- 00:17:56 then we get the entire content so that's
- 00:17:59 already a specific
- 00:18:00 data frame functionality of pandas
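The behaviour of `head` can be checked with a minimal sketch on hypothetical stand-in data (column names and values are assumptions, not the video's file).

```python
import io
import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
csv_text = """No Data;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
2013;140;100;40
2014;160;115;45
2015;179;120;59
2016;190;130;60
2017;210;140;70
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

default_preview = content.head()   # no argument: first five rows
short_preview = content.head(3)    # first three rows
full_preview = content.head(8)     # all eight rows of this small table

print(len(default_preview), len(short_preview), len(full_preview))  # 5 3 8
```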
- 00:18:03 but we have more of these for example we
- 00:18:06 can see that no data right here
- 00:18:08 apparently is not the right label for
- 00:18:10 our column
- 00:18:11 we can easily rename that in pandas for
- 00:18:14 that
- 00:18:14 i'll again set content equal to
- 00:18:17 something and this something simply is
- 00:18:20 content dot
- 00:18:21 now hit tab again and now you can see
- 00:18:23 that we have a lot of different things
- 00:18:25 that we can do
- 00:18:26 with our content one interesting thing
- 00:18:28 down here if we scroll down
- 00:18:30 is that we can rename it like that
- 00:18:33 now we simply open the brackets once
- 00:18:36 again
- 00:18:36 and what do we want to rename well we
- 00:18:38 want to rename a certain
- 00:18:40 column for that we simply type columns
- 00:18:43 like that equals and now curly braces
- 00:18:46 and now we need two things we first need
- 00:18:49 the
- 00:18:50 name of the the current name of the
- 00:18:52 label so this is
- 00:18:53 no data in our case now we add a colon
- 00:18:56 and now we define the new label name
- 00:18:58 that we want to have which is year in
- 00:19:00 our case
- 00:19:01 i guess this makes sense if we now press
- 00:19:03 shift and enter
- 00:19:04 we can see nothing but if we now again
- 00:19:06 say content
- 00:19:08 oops head like that maybe
- 00:19:12 four five you can see
- 00:19:15 that year now replaced no data so we can
- 00:19:18 also easily
- 00:19:19 access the data and change it right here
- 00:19:21 in our notebook
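The rename step could look roughly like this. The exact spelling of the old and new labels ("No Data", "Year") is an assumption based on how they are spoken in the video, and the data is a hypothetical stand-in.

```python
import io
import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
csv_text = """No Data;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

# rename returns a new DataFrame, so the result is assigned back to content.
# The dictionary maps the current label to the new label.
content = content.rename(columns={"No Data": "Year"})

print(list(content.columns))
```

Because the assignment itself produces no output, the cell shows nothing until `content` or `content.head()` is evaluated afterwards, which matches what happens in the video.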
- 00:19:23 now with that let's say our data looks
- 00:19:25 fine and now we want to access specific
- 00:19:28 rows or columns let's start with a
- 00:19:30 specific row maybe
- 00:19:32 so let's start let's say we want to have
- 00:19:34 the
- 00:19:35 single year let's say a specific year
- 00:19:38 and we want to retrieve the data only
- 00:19:40 for the year 2012 for example
- 00:19:43 for that we could say content because
- 00:19:46 that's what we want to access
- 00:19:47 and specifically we want to access the
- 00:19:50 year column of our content so of the
- 00:19:52 table right here in our csv file
- 00:19:55 we can then set year equal to 2012
- 00:19:59 like that and now press shift and enter
- 00:20:02 and if we now simply say
- 00:20:04 single year like that then you can see
- 00:20:08 that we retrieve the data for the year
- 00:20:10 2012 only
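The transcript does not capture the exact expression typed here. One common way to get the rows for a single year is a boolean filter on the year column, sketched below on hypothetical stand-in data; the variable name `single_year` follows the video, everything else is an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in data (header already renamed to Year).
csv_text = """Year;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
2013;140;100;40
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

# content["Year"] == 2012 yields one True/False value per row;
# indexing with that mask keeps only the rows where it is True.
single_year = content[content["Year"] == 2012]

print(single_year)
```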
- 00:20:12 so this is one way how we can access
- 00:20:14 specific
- 00:20:15 rows now what about columns then well we
- 00:20:18 could also
- 00:20:20 say if we want to access a single column
- 00:20:24 one way to do so would be to simply type
- 00:20:26 content right here
- 00:20:27 and after content we now can specify
- 00:20:30 which column we want to access
- 00:20:32 in our case this could be the revenue if
- 00:20:34 we now press shift and enter
- 00:20:36 and type single column right here
- 00:20:40 then you can see that we only got the
- 00:20:41 data for the revenue column so 100
- 00:20:44 100 and so on if you want to access
- 00:20:48 multiple columns by the way you could
- 00:20:50 also write something like this
- 00:20:51 so you open the square brackets two times
- 00:20:54 and now say you want to have
- 00:20:55 revenue like this
- 00:20:58 and profit like that if you do that
- 00:21:02 you can see that we have the revenue and
- 00:21:04 the profit columns
- 00:21:05 both in our table right here now these
- 00:21:08 ways to access data are
- 00:21:10 fine but i don't think these are really
- 00:21:12 clear and
- 00:21:13 really well flexible as i would call
- 00:21:15 them
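A minimal sketch of the two column selections described above, on hypothetical stand-in data (names and values are assumptions). Note that the doubled brackets are square brackets: the inner pair is a Python list of column names.

```python
import io
import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
csv_text = """Year;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

# One column name in single brackets returns a pandas Series.
single_column = content["Revenue"]

# A list of names (hence the doubled brackets) returns a DataFrame.
two_columns = content[["Revenue", "Profit"]]

print(single_column)
print(two_columns)
```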
- 00:21:16 because of that we have another
- 00:21:18 functionality
- 00:21:19 implemented into pandas which makes
- 00:21:21 things like that a lot easier
- 00:21:23 the so-called loc argument now what does
- 00:21:25 this mean
- 00:21:26 well we can simply access our content so
- 00:21:30 up here the content right here once
- 00:21:32 again by typing
- 00:21:34 content and now we add loc to it like
- 00:21:37 that
- 00:21:38 and this simply allows us to select the
- 00:21:41 rows
- 00:21:41 and the columns by their label names
- 00:21:44 and this makes this really easy to use
- 00:21:47 because
- 00:21:48 we can see right here that for the rows
- 00:21:49 we have the label names from 0
- 00:21:51 to 4 actually not 4 if we
- 00:21:55 increase the head we could see that it's
- 00:21:57 zero to seven actually
- 00:21:58 and that we have column labels
- 00:22:02 named year revenue cost and profit and
- 00:22:05 so on
- 00:22:06 now what does this mean well let me
- 00:22:08 maybe first add another
- 00:22:09 cell above right here and let's print
- 00:22:11 out the head one more time
- 00:22:13 to see our entire table right here now
- 00:22:17 what i could say here is that i would
- 00:22:19 like to have
- 00:22:20 the row with the label name
- 00:22:23 two and i want to have the column
- 00:22:27 with the label name revenue something
- 00:22:29 like that
- 00:22:30 if i do this you can see that we get
- 00:22:34 130 that's exactly what we can see right
- 00:22:36 here because the revenue in the year
- 00:22:38 2012 was at 130
- 00:22:41 but we can do more let's say we also
- 00:22:43 want to
- 00:22:44 add the year 2015. for that we can
- 00:22:47 simply add
- 00:22:48 5 right here but now put both of these
- 00:22:51 into square brackets
- 00:22:52 otherwise this wouldn't work like this
- 00:22:54 and like that
- 00:22:55 so what we basically want to have is we
- 00:22:57 want to
- 00:22:59 display the revenue column or the
- 00:23:01 revenue basically
- 00:23:02 for the years with the label 2 and 5. so
- 00:23:05 2012 and 2015. if we now press shift and
- 00:23:09 enter
- 00:23:09 well you can see that we get 130 right
- 00:23:11 here and 179 right there
- 00:23:14 the same logic of course also applies
- 00:23:16 for the column labels
- 00:23:18 so if we add square brackets right here
- 00:23:21 and then
- 00:23:22 simply put the revenue right here and
- 00:23:24 now also add the profit for example
- 00:23:26 like that and press shift and enter you
- 00:23:29 can see that we get
- 00:23:30 the second and fifth label name for the
- 00:23:33 row
- 00:23:33 and two columns for the for revenue and
- 00:23:35 profit
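The label-based selections just described can be sketched as follows. The revenue values 130 (row 2) and 179 (row 5) come from the video; all other values and the exact column names are assumptions in this hypothetical stand-in data.

```python
import io
import pandas as pd

# Hypothetical stand-in data; only 130 and 179 are taken from the video.
csv_text = """Year;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
2013;140;100;40
2014;160;115;45
2015;179;120;59
2016;190;130;60
2017;210;140;70
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

# Single row label and single column label: one value.
one_value = content.loc[2, "Revenue"]

# Several row labels go into a list (square brackets).
two_rows = content.loc[[2, 5], "Revenue"]

# The same applies to several column labels.
two_rows_two_cols = content.loc[[2, 5], ["Revenue", "Profit"]]

print(one_value)
print(two_rows)
print(two_rows_two_cols)
```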
- 00:23:37 you could also say that you want to have
- 00:23:38 the data for a specific
- 00:23:41 range of years so let's say you want to
- 00:23:43 start in the year 2012
- 00:23:44 and you want all the data up to year
- 00:23:46 2017. for that you could say you want to
- 00:23:49 have 2 up to 7
- 00:23:50 from the label name perspective to do
- 00:23:53 this you can get rid of the
- 00:23:55 square brackets here and simply type 2
- 00:23:57 to 7
- 00:23:58 like that important this refers to the
- 00:24:01 label name so this includes
- 00:24:03 all of these rows so if i enter shift
- 00:24:06 and
- 00:24:06 enter now then you can see that we get
- 00:24:08 all the data
- 00:24:09 from the year 2012 up to the year 2017
- 00:24:13 for our column right there and of course
- 00:24:16 we could also include
- 00:24:17 the year for example right there to make
- 00:24:19 sure we can actually see
- 00:24:21 which year we are referring to if you
- 00:24:23 would like to have
- 00:24:24 all columns to be included right here
- 00:24:26 then you can simply also
- 00:24:28 do it like this delete that and add the
- 00:24:30 colon right there if you do that
- 00:24:32 you can see that we added all the
- 00:24:34 columns and the selected rows
- 00:24:36 as we defined it right here you could
- 00:24:38 also do it like this
- 00:24:39 this would also work but it's not the
- 00:24:41 best practice code from a python
- 00:24:43 perspective
- 00:24:44 because you should always be as precise
- 00:24:46 as clear as possible
- 00:24:48 about what we want to achieve with our
- 00:24:49 code so definitely make sure to add the
- 00:24:51 colon right here
- 00:24:52 same result but better code in the end
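A sketch of the label range with `loc`, including the explicit colon for "all columns". The data is a hypothetical stand-in; column names and most values are assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
csv_text = """Year;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
2013;140;100;40
2014;160;115;45
2015;179;120;59
2016;190;130;60
2017;210;140;70
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

# Label slices with loc include both ends: rows 2, 3, 4, 5, 6 and 7.
revenue_range = content.loc[2:7, "Revenue"]

# Adding the year column makes the rows easier to read.
year_and_revenue = content.loc[2:7, ["Year", "Revenue"]]

# Explicit colon: all columns for the selected rows.
explicit = content.loc[2:7, :]

# Leaving the column part out gives the same result, just less explicitly.
implicit = content.loc[2:7]

print(len(revenue_range))         # 6
print(explicit.equals(implicit))  # True
```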
- 00:24:55 so this is how we can
- 00:24:56 basically tailor the columns
- 00:25:00 and the rows that we want to display
- 00:25:01 right here i also want to show you
- 00:25:03 another functionality this is not the
- 00:25:05 loc argument which
- 00:25:07 just to keep that in mind refers to the
- 00:25:09 label names
- 00:25:10 so this one right here and the label
- 00:25:13 names right here
- 00:25:14 our index that we have for the different
- 00:25:16 rows but we also have
- 00:25:18 iloc like this iloc simply refers to
- 00:25:22 the
- 00:25:22 integers you could say of these
- 00:25:25 different
- 00:25:25 rows and columns now what do i mean by
- 00:25:27 that
- 00:25:29 well let's simply add iloc and let's
- 00:25:31 simply press shift and enter
- 00:25:33 and as you can see the way our table is
- 00:25:36 displayed
- 00:25:37 changed a bit for the columns it remains
- 00:25:40 the same because we want to display
- 00:25:42 all of our columns but for the rows we
- 00:25:44 still have the year 2012
- 00:25:46 included right here but we don't have
- 00:25:49 the number 7 you could say so the year
- 00:25:51 2017 included here anymore
- 00:25:53 now the reason for that is that loc
- 00:25:56 referred to the label name
- 00:25:58 and we wanted to have the label name
- 00:26:00 included here so it was there so we have
- 00:26:02 to put it also into our output
- 00:26:04 now for iloc the behavior is different
- 00:26:08 iloc excludes the last value right here
- 00:26:11 so we basically
- 00:26:12 include the label number two but we
- 00:26:14 exclude the last one
- 00:26:15 that's an important thing that you have
- 00:26:16 to keep in mind this is simply
- 00:26:18 how the two different arguments behave
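The difference between the two can be seen side by side in a small sketch. The data is a hypothetical stand-in (names and values are assumptions); the slice `2:7` is the one used in the video.

```python
import io
import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
csv_text = """Year;Revenue;Cost;Profit
2010;100;80;20
2011;100;85;15
2012;130;95;35
2013;140;100;40
2014;160;115;45
2015;179;120;59
2016;190;130;60
2017;210;140;70
"""
content = pd.read_csv(io.StringIO(csv_text), sep=";")

# loc works on labels and includes the end label 7.
by_label = content.loc[2:7, :]

# iloc works on integer positions and, like a normal Python slice,
# stops before position 7.
by_position = content.iloc[2:7, :]

print(by_label["Year"].tolist())     # 2012 through 2017
print(by_position["Year"].tolist())  # 2012 through 2016
```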
- 00:26:21 what you can also do here is
- 00:26:23 you can add something like revenue for
- 00:26:25 example and then
- 00:26:26 access the revenue that's not working
- 00:26:29 why is it not working because as i said
- 00:26:32 iloc
- 00:26:32 refers to the integers so you have to
- 00:26:35 write numbers here
- 00:26:36 the index numbers to be able to access
- 00:26:38 the different data
- 00:26:40 now what does this mean for the revenue
- 00:26:42 well the revenue right here
- 00:26:44 has the index one why does it have the
- 00:26:46 index one
- 00:26:47 because the index starts at zero in the
- 00:26:50 first column so with year
- 00:26:51 and then it continues with one in the
- 00:26:53 second third is then two
- 00:26:55 fourth is three and so on so if i enter
- 00:26:57 right here
- 00:26:58 one now and press shift enter you can
- 00:27:01 see
- 00:27:01 that we have the revenue column right
- 00:27:03 there and well we know it from before
- 00:27:05 the specific rows
- 00:27:07 as defined by our index numbers we can
- 00:27:10 also add
- 00:27:11 a colon and maybe three so this would be
- 00:27:15 zero
- 00:27:16 one two three but if we hit shift and
- 00:27:18 enter
- 00:27:19 well you can see that the profit column
- 00:27:21 is not included
- 00:27:22 again because of this behavior that iloc
- 00:27:24 does not
- 00:27:26 include the last index number right here
- 00:27:29 in the result as i said just something
- 00:27:31 that you have to keep in mind
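Selecting columns by position can be sketched like this. The DataFrame is again a hypothetical stand-in with invented column names and numbers, and the exact exception type raised for a label may differ between pandas versions, so the example catches more than one.

```python
import pandas as pd

# Hypothetical data: columns sit at positions 0 (Year), 1 (Revenue),
# 2 (Cost) and 3 (Profit). Names and values are invented.
content = pd.DataFrame({
    "Year": [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017],
    "Revenue": [100, 110, 125, 130, 150, 165, 170, 190],
    "Cost": [80, 85, 95, 100, 110, 120, 125, 140],
    "Profit": [20, 25, 30, 30, 40, 45, 45, 50],
})

# iloc only accepts integer positions, so a column name is rejected.
label_rejected = False
try:
    content.iloc[2:7, "Revenue"]
except (ValueError, TypeError):
    label_rejected = True

# Position 1 is the Revenue column (counting starts at 0 with Year).
revenue_only = content.iloc[2:7, 1]

# 1:3 covers positions 1 and 2; position 3 (Profit) is excluded.
revenue_and_cost = content.iloc[2:7, 1:3]

# 0:3 brings the Year column back in.
with_year = content.iloc[2:7, 0:3]

print(label_rejected)
print(list(revenue_and_cost.columns))
print(list(with_year.columns))
```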
- 00:27:32 so let's maybe add the year once again i
- 00:27:35 think it's the better way to display
- 00:27:37 data
- 00:27:38 so let's say that this is our final
- 00:27:40 table now
- 00:27:41 which includes all the data that
- 00:27:43 we want to display
- 00:27:44 so for that let's now set content equal
- 00:27:48 to this content that we have right here
- 00:27:50 so if we now press
- 00:27:51 shift and enter and now say content
- 00:27:56 head once again and enter eight you can
- 00:27:59 see that we don't display
- 00:28:00 eight rows as in the beginning but only
- 00:28:03 the five rows
- 00:28:04 and the three columns that we now
- 00:28:06 defined for our content right here
- 00:28:08 with that we can now continue with two
- 00:28:11 last steps that i would like to show you
- 00:28:13 in this video
- 00:28:14 one thing is the describe method
- 00:28:16 because it allows you to get a really
- 00:28:18 quick
- 00:28:18 idea of some statistics related
- 00:28:22 to your data
- 00:28:23 so simply type content dot describe
- 00:28:26 right here
- 00:28:27 and the parentheses and if
- 00:28:29 you now hit shift and enter
- 00:28:31 you can see some really basic statistics
- 00:28:34 about your data
- 00:28:35 you can see that we have five different
- 00:28:37 years so five different revenue data
- 00:28:39 and cost data you can see the mean you
- 00:28:42 can see the standard deviation
- 00:28:44 and you can see a minimum and a maximum
- 00:28:46 value for example
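A minimal sketch of `describe()` on a five-row table like the one built above. The numbers are invented for illustration, so the statistics will not match the ones shown in the video.

```python
import pandas as pd

# Hypothetical five-row table (values invented), indexed 2..6 as it
# would be after the iloc selection.
content = pd.DataFrame(
    {
        "Year": [2012, 2013, 2014, 2015, 2016],
        "Revenue": [125, 130, 150, 165, 170],
        "Cost": [95, 100, 110, 120, 125],
    },
    index=range(2, 7),
)

# describe() returns a new DataFrame with one row per statistic.
summary = content.describe()
print(summary)

# Individual statistics can be read with loc[row label, column label].
print(summary.loc["count", "Revenue"])
print(summary.loc["mean", "Revenue"])
```

For numeric columns the rows of the result are count, mean, std, min, the 25%, 50% and 75% quantiles, and max.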
- 00:28:47 so i won't dive too deep into that i
- 00:28:49 think it's quite straightforward
- 00:28:50 i just wanted to make sure that you are
- 00:28:52 aware of this functionality
- 00:28:53 also included in pandas by the way so
- 00:28:56 this is
- 00:28:57 one thing the describe method right
- 00:28:59 here now let me conclude this video
- 00:29:01 with plotting a quick chart i said that
- 00:29:04 we will need the
- 00:29:05 matplotlib library and that we installed
- 00:29:07 it already
- 00:29:08 as part of the anaconda navigator so
- 00:29:11 before we create that quick chart let's
- 00:29:13 go to the matplotlib page and see what
- 00:29:16 matplotlib
- 00:29:17 actually is so here we are on
- 00:29:19 matplotlib.org
- 00:29:21 and well as you can read right here
- 00:29:23 matplotlib
- 00:29:24 basically allows us to quickly create
- 00:29:27 nice
- 00:29:27 and tailored so individually tailored
- 00:29:30 charts
- 00:29:31 for our python code and for our data
- 00:29:33 therefore
- 00:29:34 now i won't dive into this entire
- 00:29:36 documentation in this video
- 00:29:38 a nice thing to see though is this
- 00:29:41 examples page right here
- 00:29:42 because if you go to the stacked bar graph
- 00:29:44 for example you can see a chart type and
- 00:29:46 what it could look like in python
- 00:29:48 and you can also see that we have some
- 00:29:51 example code down there
- 00:29:52 so feel free to play around with this
- 00:29:53 code and create your own charts
- 00:29:56 and as you can see also by this import
- 00:29:58 we will use matplotlib well specifically
- 00:30:01 in connection with pyplot which you can
- 00:30:03 find right here
- 00:30:04 which basically provides a matlab-like
- 00:30:06 plotting framework for our python code
- 00:30:09 we won't dive deeper into this right now
- 00:30:11 the important thing is that we can
- 00:30:12 simply
- 00:30:13 take this command right here and create
- 00:30:16 our chart right there a quick
- 00:30:17 line chart by the way so back in our
- 00:30:20 jupyter notebook right here
- 00:30:22 we can now import matplotlib
- 00:30:26 dot pyplot as we just saw it as plt
- 00:30:29 and important to be able to plot the
- 00:30:32 chart here
- 00:30:33 inline in our jupyter notebook we have
- 00:30:35 to enter
- 00:30:36 %matplotlib
- 00:30:39 oops like that
- 00:30:42 inline like this so if we now press
- 00:30:45 shift and enter
- 00:30:46 we should be ready to go to put
- 00:30:50 these data into our chart let me maybe
- 00:30:53 print the head once again to make sure
- 00:30:56 we can see
- 00:30:57 which data that we actually want to
- 00:30:59 print here
- 00:31:00 now how can we do this now well we can
- 00:31:02 refer to our
- 00:31:04 well pyplot right here as plt
- 00:31:07 so let's type plt dot
- 00:31:10 plot that's the first thing and now we
- 00:31:13 have to specify
- 00:31:15 what exactly so what data that we want
- 00:31:17 to display in this chart
- 00:31:19 well in our case i would say that the
- 00:31:21 x-axis could be the year
- 00:31:22 so content dot year
- 00:31:26 and for our y-axis i would say that we
- 00:31:29 basically just want to display our
- 00:31:31 revenue so let's say
- 00:31:32 content dot revenue like that so that's
- 00:31:35 the data that we want to display right
- 00:31:37 here
- 00:31:37 what we then need are some labels so
- 00:31:40 let's type plt
- 00:31:42 dot xlabel and let's give it a name of
- 00:31:45 well
- 00:31:46 year and then we need a y
- 00:31:49 label also which should have a name of
- 00:31:52 well
- 00:31:52 revenue and now we finally want to make
- 00:31:55 sure
- 00:31:55 that we can display this chart so let's
- 00:31:58 simply type plt.show like this
- 00:32:00 and hit shift and enter and with that we
- 00:32:03 can see
- 00:32:04 that we now created our first chart we
- 00:32:06 can see the name for the y-axis for the
- 00:32:08 x-axis
- 00:32:09 we can see that we have some issues with
- 00:32:11 our dates though but that is something
- 00:32:13 we won't dive too deep into now in this video
- 00:32:16 and we can also see that we have our
- 00:32:18 revenues displayed right here
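Put together, the plotting steps might look like the sketch below. The data is a made-up stand-in for the table from the video, and the column names `Year` and `Revenue` are assumptions. In a Jupyter notebook, the line `%matplotlib inline` would go in a cell before this code; it is left out here because it is notebook syntax, not plain Python.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical five-row table (values invented for illustration).
content = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2015, 2016],
    "Revenue": [125, 130, 150, 165, 170],
    "Cost": [95, 100, 110, 120, 125],
})

# x-axis: year, y-axis: revenue. content.Year is attribute-style
# access and is equivalent to content["Year"].
lines = plt.plot(content.Year, content.Revenue)

# Axis labels.
plt.xlabel("Year")
plt.ylabel("Revenue")

# Render the chart.
plt.show()
```

`plt.plot` returns a list of the line objects it drew, which is kept in `lines` here only so the result can be inspected afterwards.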
- 00:32:19 now that's for sure not the most
- 00:32:21 beautiful chart in the end but still we
- 00:32:23 made it
- 00:32:23 from python installation up to
- 00:32:26 connecting or
- 00:32:27 reading data from a csv file we even
- 00:32:30 renamed column labels
- 00:32:32 we selected or accessed specific columns
- 00:32:35 and rows
- 00:32:36 and we finally created a well not too
- 00:32:38 beautiful
- 00:32:39 chart in here so this is it with this
- 00:32:41 getting started video
- 00:32:42 as you can imagine we are just at the
- 00:32:45 very very basics of python
- 00:32:47 and its data analysis capabilities but
- 00:32:50 still
- 00:32:50 these are the first steps that should
- 00:32:52 help you to get a grip on python
- 00:32:55 in relation to data analysis so i hope
- 00:32:57 that you like this video and that this
- 00:32:59 was helpful for you
- 00:33:00 and of course i hope to see you in one
- 00:33:02 of the next videos maybe
- 00:33:04 also related to python and data analysis
- 00:33:06 so
- 00:33:07 thanks a lot for watching and see you in
- 00:33:09 one of these next videos
- 00:33:12 bye