Hello, and welcome to Using MySQL to Build Big Data Applications. This is going to be a tutorial about, obviously, using MySQL to build Big Data applications. But when I say Big Data, there could be two things; sorry, there could be two problems that you are addressing. Either it's a problem of scaling, as in: my system already has a lot of data, and I would like to make the existing features more performant or be able to handle more volume. The other problem is reporting, in the sense that you already have Big Data and you are asked to make use of it in some way: either to give more insight to the business users in your organization, or to give aggregated reports to your customers about how they are performing. Today I'm going to focus on this side (reporting) of the Big Data problem. So what is the problem with Big Data?
Basically, it's as if you have a very large table with millions or billions of rows, and in order to do the reporting you need to do, you have to gather all this information from this table and process it in some way. But what does that mean in terms of the underlying physics? You have a hard disk (let's pretend that's a hard disk), and in order to get certain rows from the table on the hard disk, you have to go to many different places on the disk. If it is a large amount of data, that will obviously take more time. If the data is fragmented across different places on the hard disk, the disk has to spin and seek more. Once you have the data, you need to get it into the CPU (roughly) to aggregate it and to process it.
You manipulate it into whatever form you need, and then you produce a report, which you later provide to your users, and they are happy about it (I'm not sure if you can see that). So this problem has actually been around for a very long time: with existing hardware, how are we able to get more data faster, so that we can process it and turn it into a report? Many years ago, Ralph Kimball, who is one of the two main contributors to data warehousing, came up with data warehousing..
I wouldn't say a movement, but a technology. He came up with the idea in 1995 or 1996. What he said was, basically, no matter what the technology is, we'll always have to go through a large number of rows, so how can we design our database in a way that lets us produce reports without very resource-intensive operations? His solution to this problem was to create something called a summary table. A summary table is an aggregated version of this table: obviously smaller, with fewer rows, because the data has already been taken from here (the large table) and summarized here (the small table). So when you access this summary table, it's obviously much faster to get the rows and to give back the results. Let me give some examples of what that would look like. Let's say you have an orders table, like a basic e-commerce site, and you usually get a hundred thousand rows per day, so it's not really an issue for any relational database. You store those rows in the database. That's fine.
But over a period of time, let's say a year, you have quite a large number of rows: 36.5 million rows (100,000 × 365), and that can get cumbersome. In some cases it could be much more than 100,000 rows a day, but let's stick to this example. So you want to create a report from the orders table: the business users in your organization want to know how certain products are doing across particular dates. What you could do is create a summary table like this. For the sake of clarity, I'll write a SELECT statement here that explains the contents of the summary table. So we need date, because that's what was requested, and product_id, and we want the aggregated revenue, and then we GROUP BY date and product_id: SELECT date, product_id, SUM(revenue) FROM orders GROUP BY date, product_id. Those are basically the two keys (columns), date and product_id. This is now the new summary table, and we can call it product revenue summary. Let's say we have a hundred products, so this will have at most a hundred rows a day. So, after generating this table, you could provide it to your business users and say: "Do whatever you need. Find out whatever information you want to gather."
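As a rough sketch of the product revenue summary, here is the same GROUP BY run from Python against an in-memory SQLite database, which stands in for MySQL here. The column names (order_id, date, product_id, revenue), the table name product_revenue_summary and the four orders are assumptions made for illustration; the primary key on (date, product_id) reflects the two key columns mentioned above. The last lines just check the size argument: 100,000 orders a day is 36.5 million rows a year, while a 100-product summary has at most 36,500.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the orders layout is a hypothetical
# version of the table described in the talk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    date       TEXT    NOT NULL,   -- 'YYYY-MM-DD'
    product_id INTEGER NOT NULL,
    revenue    REAL    NOT NULL
);
CREATE TABLE product_revenue_summary (
    date       TEXT    NOT NULL,
    product_id INTEGER NOT NULL,
    revenue    REAL    NOT NULL,
    PRIMARY KEY (date, product_id)  -- the two key columns
);
""")

# Made-up orders for illustration.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2021-01-09", 13, 20.0),
        (2, "2021-01-09", 13, 5.0),
        (3, "2021-01-09", 7, 12.5),
        (4, "2021-01-11", 13, 8.0),
    ],
)

# Fill the summary table: one row per (date, product_id) instead of one per order.
conn.execute("""
INSERT INTO product_revenue_summary (date, product_id, revenue)
SELECT date, product_id, SUM(revenue)
FROM orders
GROUP BY date, product_id
""")

rows = conn.execute(
    "SELECT date, product_id, revenue FROM product_revenue_summary "
    "ORDER BY date, product_id"
).fetchall()
print(rows)

# The size argument from the talk: 100,000 orders/day vs. at most 100 summary rows/day.
orders_per_year = 100_000 * 365
summary_rows_per_year = 100 * 365
print(orders_per_year, summary_rows_per_year)  # 36500000 36500
```

In MySQL the INSERT INTO ... SELECT statement would read essentially the same; only the connection code differs.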
So let's say, for example, someone wants to know how product 13A did (performed) on weekends. Perhaps you would have a table of dates, get only the weekends, and INNER JOIN it with the summary table, and you'll get the answer very quickly, and your users will be happy because of it. A different summary table could be for people who are interested in how the product is selling across a particular geography, in this case, let's say, city. What we would need to do for that: city_id isn't recorded in the orders table, so we need to enrich the table a little bit, and the way we do that is to INNER JOIN it with the addresses table. I'll just write it here, using o for orders: SELECT o.date, a.city_id, SUM(o.revenue) FROM orders o INNER JOIN addresses a USING (address_id) GROUP BY o.date, a.city_id. With that we fill up a new summary table called city revenue summary. So here we have two summary tables: two different ways of slicing the data.
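A minimal sketch of the two queries just described, again using SQLite in place of MySQL. The addresses table (address_id, city_id), the dates calendar table with an is_weekend flag, and product id 13 (standing in for "13A") are hypothetical; the talk does not specify them. The city summary is built with INNER JOIN ... USING (address_id), and the weekend question is answered by joining the small summary table to the dates table instead of scanning orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, city_id INTEGER NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, date TEXT NOT NULL,
    product_id INTEGER NOT NULL, address_id INTEGER NOT NULL, revenue REAL NOT NULL
);
-- Hypothetical calendar table: one row per day with a weekend flag.
CREATE TABLE dates (date TEXT PRIMARY KEY, is_weekend INTEGER NOT NULL);

INSERT INTO addresses VALUES (1, 100), (2, 200), (3, 100);
INSERT INTO orders VALUES
    (1, '2021-01-09', 13, 1, 20.0),
    (2, '2021-01-09', 13, 2, 5.0),
    (3, '2021-01-09', 7, 1, 12.5),
    (4, '2021-01-11', 13, 3, 8.0);
-- 2021-01-09 and 2021-01-10 are a Saturday and a Sunday.
INSERT INTO dates VALUES ('2021-01-09', 1), ('2021-01-10', 1), ('2021-01-11', 0);

CREATE TABLE product_revenue_summary AS
SELECT date, product_id, SUM(revenue) AS revenue
FROM orders GROUP BY date, product_id;

-- orders has no city_id, so enrich it through addresses before aggregating.
CREATE TABLE city_revenue_summary AS
SELECT o.date AS date, a.city_id AS city_id, SUM(o.revenue) AS revenue
FROM orders o
INNER JOIN addresses a USING (address_id)
GROUP BY o.date, a.city_id;
""")

city_rows = conn.execute(
    "SELECT date, city_id, revenue FROM city_revenue_summary ORDER BY date, city_id"
).fetchall()

# "How did product 13 do on weekends?" answered from the summary, not from orders.
weekend_revenue = conn.execute("""
    SELECT SUM(s.revenue)
    FROM product_revenue_summary s
    INNER JOIN dates d ON d.date = s.date
    WHERE d.is_weekend = 1 AND s.product_id = ?
""", (13,)).fetchone()[0]
print(city_rows, weekend_revenue)
```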
Now, you aren't exactly limited in the number of summary tables you can have. Obviously they take a certain amount of space, and they also take some effort to create, but we'll get into that soon. What you could have done here, for example, is add city to the product summary, so you have date, product_id and city_id. That makes a larger summary table, but you can get at the data in both ways. Or perhaps you can have a more extensive summary table with a finer level of granularity, where you could search by product, city and date; that could be a user requirement. It depends on how you want to get to the data. Currently you have two summary tables because you are only interested in slicing the data this way or this way (the second summary table), and a summary table like this saves you an INNER JOIN at query time, which could be quite valuable in terms of performance. So, those are the two examples. I'd just like to quickly give another example of what happens nowadays in some other companies: some social networks already use the idea of summary tables in their systems. Let's say they have lots of servers spread geographically: this is Europe, this is North America.
This is South America, and this is Asia. In order to get the reports they are interested in, they would get data from all the servers into, let's say, a map/reduce system, in this case Hadoop, for example. And remember, once it arrives here, we don't need the exact data from them; we need the aggregated data to go to another database or another summary table. Once the data from here is aggregated, it goes into a reporting database. Depending on their needs, this can be a MySQL database; if their needs are greater, it could be any number of commercial or open-source solutions that can handle larger amounts of data. But the theory is very similar to the summary table example: you get data from someplace, you summarize it, you aggregate it into reporting databases, and then your users have access to those databases and can query them as they see fit [unintelligible], and also create reports on their own [unintelligible]. With a static report, of course, you can't change anything; the report is as it is. Whereas here, if they want to change the query to get different information, for example once they discover there's something wrong and they want more information according to what they found, they can [unintelligible]. A report, by contrast, is static.
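The aggregation step in the social-network example can be sketched in plain Python: each region (or a Hadoop job) produces partial sums keyed by (date, product_id), and the reduce step simply adds them before the result is loaded into the reporting database. The region names and figures are made up, and this is not how Hadoop itself is programmed; it only shows why SUM-based summaries combine cleanly.

```python
from collections import Counter

# Hypothetical per-region partial aggregates: (date, product_id) -> revenue.
europe = {("2021-01-09", 13): 25.0, ("2021-01-09", 7): 12.5}
north_america = {("2021-01-09", 13): 10.0}
asia = {("2021-01-11", 13): 8.0}


def merge_partials(*partials):
    """Reduce step: add partial sums that share the same (date, product_id) key."""
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds values for matching keys
    return dict(total)


summary = merge_partials(europe, north_america, asia)
print(summary)
```

This works because addition is associative and commutative. An average cannot be merged this way from per-region averages; carrying (sum, count) pairs and dividing at the end avoids that problem.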
So, as well as that, you're basically adding an ETL system, a subsystem or something, and you need to make sure that data is constantly arriving in the summary tables, and it needs more thought so that everything goes according to plan. There are two ways you can fill the summary tables: one is real time and one is batch. Batch is usually either once an hour or a job that runs at night, for example when no one's using the system, and the process is very straightforward. Let's say, for example, it runs once an hour: it gets all the orders that happened over the last hour, aggregates them and puts them into the summary tables. At night it's the same principle, but for a whole day's worth of orders, which is becoming a bit of a large amount of data. In these cases it's important to note that the summary tables are only refreshed once an hour or once a day, so it's a business decision whether that's okay, and then you can go ahead and do that [unintelligible]. With this system you need to set up monitoring, because you can't just take for granted that the data has arrived. You can't just create something like the SELECT statement that I created, put it in a cron job, and assume nothing goes wrong. You have to make sure that everything is there: no warning messages, no errors, and the data is doing what it should. Regarding MySQL, to set something like this up, you can basically use a SELECT statement like the one I did, with a WHERE clause depending on your [unintelligible], as an INSERT INTO ... SELECT. If it's on the same server, for example [unintelligible] reporting,
you have a database [unintelligible], which is the regular [unintelligible], so you can set up [unintelligible]. For example, this is your [unintelligible] schema and here is your reporting schema [unintelligible], and you have, for example, the two summary tables, and you would use the INSERT statement here, with the SELECT [unintelligible] taking the data from here and inserting it into these two tables. The other way of doing this is perhaps slightly more complicated, but sometimes it's a requirement of some databases: you do a SELECT ... INTO OUTFILE into some kind of file, and then use the LOAD DATA INFILE command in MySQL, and this sometimes helps with replication [unintelligible] requirements. So those are the options; do whatever is more convenient for you. I would advocate here, for example, [unintelligible] the GROUP BY. I would also recommend adding ORDER BY NULL, because MySQL by default has GROUP BY also sort by the columns that you chose, which means it has to do an additional filesort even if you don't require it (this applies to MySQL versions before 8.0, which sorted GROUP BY results implicitly). [unintelligible] Another recommendation: you can put unique keys on these tables and then use REPLACE INTO [unintelligible] when your job loads the data. If only new data arrives, that's fine, but if, for example, there's a chance of older data being updated, then you have to perhaps include more than one hour of data in your interval and, as I said, use REPLACE INTO for the orders that may have been updated. You have to either record which days changed, or you can just, in bulk, say the last three, six or seven days,
and if anything in that period of time was updated, please update the [unintelligible]. Sometimes [unintelligible]; it's something to look out for. What you can do is [unintelligible] Hadoop; Hadoop has gotten quite good recently [unintelligible]. I would say, though, that relational databases are still very, very strong at GROUP BYs and aggregation functions, even though [unintelligible] a single query runs on just one CPU, so it doesn't parallelize very well. If it takes a very long time, you may want to split it up in some way. The difference here is that this is one server, whereas that can be, you know, five, six, seven, eight servers [unintelligible], so you would get the results into a MySQL database [unintelligible] from your cluster, or you can have a specific reporting database. Regarding real time, this basically means triggers [unintelligible]. With triggers, once you update the regular database, when you insert into it, it also updates the summary tables in the same instance. So there is a bit of overhead if you have a high load on the database in general, and if you can't take this additional load, you may want to reconsider, unless the requirement has to be real time, and I think for social networks this is going to be pretty much every requirement. So basically, when you have an insert here, for example, and you have
some aggregation, then for this particular row you would add its amount to the matching row in this table, and that's it. If it's been updated, change it; if it's deleted, remove it. There are also a lot of other functions, AVG, MIN, MAX, that can be more complicated [unintelligible]; there are [unintelligible] on other websites that talk about how to [unintelligible]. If it's not an issue to add these triggers, it's a bit more convenient, because you don't need to monitor it so much: if you insert and something goes wrong, you get an error, you fix it, done. But if you're using a trigger to log something and then deal with it asynchronously, then you may need to monitor it. I'm not trying to confuse you; it's basically triggers, and you can find out how to set that up. So, just to summarize: if you want to speed up your reports, it is a good idea to have summary tables and use them for your reports. It's very common in many places nowadays, and I hope it can help you. It does take a bit of additional design, and I've only spoken about it a little, for example where we removed an INNER JOIN [unintelligible]; it's something you have to look at more, and, as you can see, I haven't given many code examples. So, I hope this was interesting and you now understand what some of the rules are. Thank you for watching. If you want to contact me, [unintelligible] my website slash blog, where you can find out more information [unintelligible]. Thank you very much.
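The two ways of filling summary tables from the talk, a batch job and triggers, can be compared in one small sketch. It again uses SQLite instead of MySQL, so MySQL-specific details mentioned above (cron scheduling, ORDER BY NULL, SELECT ... INTO OUTFILE with LOAD DATA INFILE, INSERT ... ON DUPLICATE KEY UPDATE) are not shown. refresh_window and snapshot are hypothetical helpers: the batch job re-aggregates a window of recent days with REPLACE INTO over the (date, product_id) primary key, while the triggers add, adjust or subtract each order's revenue as it is inserted, updated or deleted. Only SUM is handled; as the talk notes, AVG, MIN and MAX need more work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, date TEXT NOT NULL,
    product_id INTEGER NOT NULL, revenue REAL NOT NULL
);
-- Filled by the batch job; the primary key is the unique key REPLACE relies on.
CREATE TABLE batch_summary (
    date TEXT NOT NULL, product_id INTEGER NOT NULL, revenue REAL NOT NULL,
    PRIMARY KEY (date, product_id)
);
-- Kept current by triggers (the real-time option).
CREATE TABLE live_summary (
    date TEXT NOT NULL, product_id INTEGER NOT NULL, revenue REAL NOT NULL,
    PRIMARY KEY (date, product_id)
);
CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    INSERT OR IGNORE INTO live_summary VALUES (NEW.date, NEW.product_id, 0);
    UPDATE live_summary SET revenue = revenue + NEW.revenue
     WHERE date = NEW.date AND product_id = NEW.product_id;
END;
CREATE TRIGGER orders_au AFTER UPDATE ON orders BEGIN
    UPDATE live_summary SET revenue = revenue - OLD.revenue
     WHERE date = OLD.date AND product_id = OLD.product_id;
    INSERT OR IGNORE INTO live_summary VALUES (NEW.date, NEW.product_id, 0);
    UPDATE live_summary SET revenue = revenue + NEW.revenue
     WHERE date = NEW.date AND product_id = NEW.product_id;
END;
CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
    UPDATE live_summary SET revenue = revenue - OLD.revenue
     WHERE date = OLD.date AND product_id = OLD.product_id;
END;
""")


def refresh_window(conn, start_date, end_date):
    """Batch job: re-aggregate every day in [start_date, end_date].

    Re-running a window of recent days, not just the newest one, also picks up
    older orders that changed since the last run. REPLACE overwrites existing
    (date, product_id) rows; a group whose orders were all deleted would keep
    its stale row, so a real job might clear the window first.
    """
    conn.execute("""
        REPLACE INTO batch_summary (date, product_id, revenue)
        SELECT date, product_id, SUM(revenue) FROM orders
         WHERE date BETWEEN ? AND ?
         GROUP BY date, product_id
    """, (start_date, end_date))


def snapshot(conn, table):
    # The table name comes from this script, not from user input.
    return conn.execute(
        f"SELECT date, product_id, revenue FROM {table} ORDER BY date, product_id"
    ).fetchall()


conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "2021-01-09", 13, 20.0),
    (2, "2021-01-09", 13, 5.0),
    (3, "2021-01-09", 7, 12.5),
    (4, "2021-01-10", 13, 8.0),
])
refresh_window(conn, "2021-01-09", "2021-01-10")  # e.g. the nightly run

# Later changes: an old order is corrected, a new one arrives, one is removed.
conn.execute("UPDATE orders SET revenue = 30.0 WHERE order_id = 1")
conn.execute("INSERT INTO orders VALUES (5, '2021-01-10', 13, 2.0)")
conn.execute("DELETE FROM orders WHERE order_id = 2")

stale = snapshot(conn, "batch_summary")           # batch table lags behind
refresh_window(conn, "2021-01-09", "2021-01-10")  # next run covers both days
batch, live = snapshot(conn, "batch_summary"), snapshot(conn, "live_summary")
print(batch == live, batch)
```

The stale snapshot shows the trade-off: between runs the batch table lags behind, while the trigger-maintained table is always current at the cost of extra work on every write.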
Using MySQL to Build Big Data Applications (level A2, elementary). Posted by Chris Lyu on January 14, 2021.