- 00:00:00 welcome back in this video we will not
- 00:00:04 create a project in power of your
- 00:00:06 desktop we will work on big data this
- 00:00:08 time well big data in our case is
- 00:00:11 actually just a single CSV file with two
- 00:00:14 columns but each column has more than 77
- 00:00:17 million rows and by that we have a total
- 00:00:20 file size of 1.8 gigabytes which is
- 00:00:23 quite big actually so we will connect
- 00:00:25 part of your desktop to this source file
- 00:00:27 then aggregate the data so reduce the
- 00:00:30 roles basically and finally export that
- 00:00:33 data as a CSV file so create a new
- 00:00:36 source file actually now before we start
- 00:00:39 one quick note I'm aware that normally
- 00:00:43 you don't do this in power of your
- 00:00:44 desktop because there are meta tools
- 00:00:46 available on the market for purposes
- 00:00:48 like that nevertheless in case you only
- 00:00:51 can use probably a desktop and you have
- 00:00:54 to do things like that I just want to
- 00:00:56 show you that it is possible now having
- 00:00:58 said that let's start
- 00:01:03 so this is the new power bi desktop file
- 00:01:06 I created and they're already connected
- 00:01:08 to free different source files free
- 00:01:11 source files I talked about one right
- 00:01:13 yes but we have data for the New York
- 00:01:17 City yellow caps in here so in the first
- 00:01:19 query you can see the data for the
- 00:01:22 second half of 2014 so we have each day
- 00:01:25 of the second half we have to sum up the
- 00:01:28 rights of that day and we have to sum of
- 00:01:30 the fare paid at that day so we have the
- 00:01:33 same thing for the first half of 2015
- 00:01:36 but for the rights only as you can see
- 00:01:39 right here so this is the second query
- 00:01:42 that we have another third Kure this is
- 00:01:45 the one I talked about in the beginning
- 00:01:46 and this is that big source file and as
- 00:01:49 you can see we don't have daily data in
- 00:01:51 here well we have daily data right but
- 00:01:54 data for each day and each single right
- 00:01:58 that took place so we have as I said
- 00:02:00 more than 77 million rows in here and
- 00:02:04 what we want to do is we want to
- 00:02:06 aggregate that information to daily data
- 00:02:09 only and to do this we need a function
- 00:02:12 in power bi desktop but before we start
- 00:02:15 let's first format that first column
- 00:02:18 differently because we only need the
- 00:02:20 actual date and not the time so to
- 00:02:23 change this we simply go to the
- 00:02:25 transform ribbon right here now go to
- 00:02:28 data type right here and select date now
- 00:02:32 its power bi already changed the format
- 00:02:35 when we connected it to the source file
- 00:02:37 we shouldn't press replace current right
- 00:02:39 here but add new step so that new
- 00:02:41 formatting right now if we do this we
- 00:02:44 can see that we now only have the dates
- 00:02:47 in that first column but why was that
- 00:02:49 important well as I said I want to
- 00:02:53 aggregate the data and to do this we
- 00:02:56 again go to the transformer band right
- 00:02:58 here and now select the group by
- 00:03:00 function right there and this now allows
- 00:03:04 us to group and by that to aggregate the
- 00:03:07 data depending on a column that we
- 00:03:10 select in our case we simply select the
- 00:03:13 first column so the pickup date time
- 00:03:15 column right here where we know if that
- 00:03:17 daily data actually and now each time
- 00:03:20 power bi finds an equal date in that
- 00:03:23 column it would simply aggregate that
- 00:03:25 now this is good but we shouldn't forget
- 00:03:28 about the second column that fair amount
- 00:03:30 right here because this should also be
- 00:03:33 aggregated to each single day but we
- 00:03:36 have to specify how exactly this
- 00:03:38 aggregation or that grouping should be
- 00:03:41 done to do this we simply can go to that
- 00:03:44 part right here first define a new
- 00:03:47 column name which would be fair in our
- 00:03:49 case like this and now the operation
- 00:03:52 shouldn't be counter OHS it should be
- 00:03:54 the sum so that we retrieved the sum of
- 00:03:58 the total fare paid at the corresponding
- 00:04:00 day so we select sum right here and now
- 00:04:04 we have to change the column from the
- 00:04:05 daytime to the fair amount of course and
- 00:04:09 this is it if we now press ok this is
- 00:04:13 going to take a while of course because
- 00:04:15 power bi will now scan through the
- 00:04:17 entire source file with that one point 8
- 00:04:20 gigabytes so we'll see each other as
- 00:04:23 soon as it's finished and this is it
- 00:04:26 well to be honest this took quite a
- 00:04:28 while but the great thing is that we
- 00:04:31 know if the data the way we want it to
- 00:04:33 be so we now have daily data in that
- 00:04:36 first column you can see that and in the
- 00:04:38 second column we have two some of the
- 00:04:40 fares of the corresponding days so this
- 00:04:43 is really good but there is one thing
- 00:04:45 that is not so good right here power bi
- 00:04:48 is still connected to that initial
- 00:04:50 source file so that 1.8 gigabyte file so
- 00:04:53 whenever it scans through the tile again
- 00:04:55 it will take a long time so we should
- 00:04:59 avoid that and to do this we will now
- 00:05:02 simply load this data we have right here
- 00:05:04 so this queries with that clean tables
- 00:05:06 into our actual data model and then
- 00:05:10 export the data out of the data model
- 00:05:12 into a new CSV file which will then
- 00:05:15 become our new source file now let's
- 00:05:18 load the data into the model first by
- 00:05:20 going to the home ribbon right here and
- 00:05:23 pressing close and apply like this
- 00:05:28 again this will take a few minutes so
- 00:05:31 see you again when this is done and now
- 00:05:36 this is done and the great thing now is
- 00:05:38 that if we change to the data view right
- 00:05:40 here and open our 2015 fares file right
- 00:05:45 there then you can see that the format
- 00:05:48 is no good of course but this is
- 00:05:49 something we don't have to take care
- 00:05:51 about right now but you can see that we
- 00:05:53 only have that 181 rows in here in the
- 00:05:57 data model as you can see right there so
- 00:06:00 what we can do right now is we can
- 00:06:02 export the data if the problem is that
- 00:06:05 we cannot do it with part of your
- 00:06:07 desktop the good thing is there is a
- 00:06:09 separate tool to do this let's take a
- 00:06:12 look at that tool and this tool is
- 00:06:15 called back studio you can find it right
- 00:06:17 here and deck studio simply will allow
- 00:06:20 us to query data out of power bi so out
- 00:06:24 of the data model that we just saw and
- 00:06:26 this means we can retrieve the data from
- 00:06:29 the table that we just created so if we
- 00:06:32 now press download right here and then
- 00:06:35 open back studio right there then we can
- 00:06:39 see that we have that connect window
- 00:06:41 that opens in that connect window we can
- 00:06:44 simply select the second option right
- 00:06:46 here and here we can see that we now
- 00:06:49 have our power bi file so this group
- 00:06:51 file let's select that and press connect
- 00:06:54 and now you can already see right here
- 00:06:58 that we have our free tables in there
- 00:07:00 and these are the data of the actual
- 00:07:03 data model not of the queries or of the
- 00:07:05 initial source file that's important to
- 00:07:07 know if we want to get that data now we
- 00:07:11 simply go to this window right evaluate
- 00:07:14 like this so we can select it right here
- 00:07:16 with the arrow keys on a keyboard press
- 00:07:19 tab now space well and now simply type
- 00:07:23 taxi and we can already see that we can
- 00:07:27 now select our fare data right here and
- 00:07:29 press tab again and if we now press f5
- 00:07:33 on the keyboard like this then it will
- 00:07:36 query our data and you can see again
- 00:07:39 that it Kure the data of the data model
- 00:07:42 as I just said now we have the data in
- 00:07:46 here but we want to export it as a CSV
- 00:07:49 to do this we simply click output over
- 00:07:52 here and select file like that now we
- 00:07:58 need to press f5 again to run the query
- 00:08:00 like this and as you can see we can now
- 00:08:04 define a name for that new file let's
- 00:08:06 maybe call it 2015 new just to know that
- 00:08:10 this is the new CSV file and now change
- 00:08:12 the text format right here to the CSV
- 00:08:15 format right there like this and here
- 00:08:18 you can see our other input files by the
- 00:08:19 way so we are in the right folder and
- 00:08:21 now we simply press save like this and
- 00:08:26 we now save the file so if we now go
- 00:08:29 back to power bi desktop so here I am in
- 00:08:32 the data model and open the query editor
- 00:08:34 like this and now go to our data sources
- 00:08:39 right here select as you see file so
- 00:08:43 here I am now again in that source
- 00:08:45 folder and you can see our 2015 new file
- 00:08:48 that we just created so let's select
- 00:08:51 that and press open and this is fine so
- 00:08:55 we can press ok right here and as you
- 00:08:58 can see if we now simply change the
- 00:09:01 format again of the first column by
- 00:09:03 going through transform ribbon go to
- 00:09:04 data type and change it to date and add
- 00:09:07 the new step then we see that we again
- 00:09:11 have our 181 rows but now with that
- 00:09:14 connections to that cleaned and small
- 00:09:17 source file and this will make the work
- 00:09:19 in our Creator a lot easier right now
- 00:09:21 and this is it actually this is what I
- 00:09:24 wanted to show you and as I said in the
- 00:09:27 beginning power bi desktop is probably
- 00:09:30 not the best tool to work with such huge
- 00:09:32 source data however depending on the
- 00:09:35 performance of the computer you can do
- 00:09:37 things like that if you want to do it
- 00:09:39 and in combination with tech studio you
- 00:09:42 can also easily extract that new
- 00:09:44 information and by that create cleaned
- 00:09:47 and smaller source files so this is it
- 00:09:50 thanks for watching and see you in the
- 00:09:52 next videos bye bye
- 00:09:57 maybe some documentaries which would be
- 00:10:00 some something for us like furs in the
- 00:10:02 nature yes jong-il feeling yeah