Placeholder Image

字幕列表 影片播放

  • >> Okay. Hi, guys. So this is Jay Pipes. He's a MySQL employee. He's in charge of North

  • American Community Relations. I met him this week at the MySQL User Conference which is

  • down in Sta. Clara. Now, the deal is that we're going to host this on Google video.

  • So if you have confidential questions, if you could leave them till the cameras are

  • turned off, that would be cool. With that, no more ado, Jay Pipes.

  • >> PIPES: All right. Nice to meet all of you. I'm going to be giving a talk that I gave

  • on, I think, Tuesday or Wednesday, on "Performance Tuning Best Practices for MySQL." Some of

  • the things are version specific but for the most part, it's pretty general. We're going

  • to be covering Profiling and Benchmarking Core Concepts, Sources of Problems that you

  • might encounter, Index Guidelines, Schema Guidelines, Coding Techniques, and lastly,

  • we'll go over some general things about tuning server parameters. Since the audience is,

  • I think, sort of mixed levels of experience, we'll just shout out questions or tell me

  • if you want some grand stuff or so. All right, Benchmarking Concepts. The general idea behind

  • benchmarking MySQL is to, obviously, have a track record. You know, when you change

  • things in your code, in your application, or the configuration files with MySQL, or

  • your schema design, or you're indexing, you want to make sure you have a track record

  • of what you're doing. Because a lot of times, if you--if you don't keep a track record and

  • don't do benchmarking, you won't really know whether a specific change that you made had

  • a detrimental impact or whether it actually increased performance. So, the baseline--and

  • you always want to have your baseline for your starting point. And always give yourself

  • a target. Don't just do it so it's a moving start--moving target, you always want to have

  • an end goal, you know. You either want to get, you know, 20% increase in performance.

  • So that's your goal. But just don't have it, you know, just kind of end with nothing because

  • then you'll just spend years and years and years trying to, you know, tune milliseconds

  • out of your--out of your application when there's really no benefit to it. So, change

  • only one thing at a time. A pretty basic concept but it really--it goes a long way in benchmarking.

  • If you change two things and you rerun a test, how do you know which one of those two things

  • actually affected the difference between the two tests? So, always change one thing at

  • a time and rerun the benchmarks. And record everything, configuration files, information

  • on the hardware, information on obviously the operating system, the version of MySQL,

  • and everything to do with the benchmarks. Keep it in a separate folder and keep everything

  • for historical record. A lot of times, you'll think that you won't need information when

  • you're running benchmarks. And then, six months later, you realize that, "Wow! It would be

  • really useful if I had--oh, I threw it away". So, always keep information from your benchmarks

  • so you have a historical record of what you did. And specifically with MySQL, you want

  • to disable the query cache whenever you're running benchmarks. The reason is it will

  • skew the results that you get from read-intensive queries and write-intensive queries as well.

  • So, disable the query cache by setting the query cache size equal to zero when you--when

  • you run benchmarks, so. ProfYes. Go ahead. >> [INDISTINCT] benchmarking for [INDISTINCT]

  • problem with the operating system, caches [INDISTINCT]?

  • >> PIPES: Right. Yes. The question was--what you were experiencing that the operating system

  • caching was skewing the results of the benchmark. Generally, the idea is either you can disable

  • caching if it's actually getting in the way or if you bump up the actual number of runs

  • in the benchmark tests you can minimize the skew effect. So that's usually what we recommend.

  • But, when you get into specific caches there's very specific things to do. So it kind of

  • depend on the exact scenario that you're doing, so. But, yes. Bumping up the number of runs

  • in the benchmarks generally tends to mitigate a lot of that, so. Does that answer your question?

  • Okay. Profiling Concepts, profiling is a little different than benchmarking. Generally, you're

  • profiling a production or running a test system that you're trying to figure out, you know,

  • diagnosing what's going on while it's running. So, instead of doing--you know finding the

  • stress or the load that the system can take, you're actually trying to pinpoint where the

  • bottlenecks are in the code while it's running. With MySQL, the best tool that you have is

  • the EXPLAIN command. So, regardless of what version of MySQL you are running, you really

  • want to get familiar with everything in the EXPLAIN command. Understand what the access

  • types, does everyone know what the EXPLAIN command is? Everyone use it? Okay. Does everyone

  • know what every single type of access type is? Okay. It's sort of a poorly named column

  • in the--in the EXPLAIN results. But, it actually describes the path or the optimization pattern

  • at which the optimizer's saying, "Okay well, I'm going to, you know, access, you know,

  • these results at and join it to these results at." And each of those things, the ref, the

  • refrenol, the equaf range, all the different access types mean different things. And the

  • performance that you get from each of them, you can go into MySQL manual, depending on

  • your version. You can see which ones have a better performance for different types of

  • queries. It's not always the same. So, you want to make sure you understand those. Understanding

  • obviously that if you see all in the access type, it means it's doing a full table scan.

  • You know just little things like that. Be aware of what they all mean and what the performance

  • impact of the different ones are. So, by using the Slow Query Log and mysqldumpslow to parse

  • the query log, very easy to set up, Log Slow Queries and then you give it a long query

  • time, anything above that, it will log to the--to the Slow Query Log. There's also--I

  • think in 4.1, we put it an option that you can--anything that--any table, any select

  • statement that does not use an index on a table can get logged to the Slow Query Log

  • regardless of how fast it runs. And that's--and that's useful for when you're doing an indexing

  • strategy and you're trying to determine okay my patterns have access, do I--do I have indexes

  • on those fields that I'm actually accessing? You know, and my where clause and ON clause

  • and that kind of thing. So, low hanging fruit law of diminishing returns tackles stuff that's

  • going to get you the absolute best return on investment for your time. Don't concentrate

  • on, you know, I know you guys are Google. So, a millisecond definitely does count. But,

  • if you got a--if you got a choice of, you know, going from five milliseconds to four

  • milliseconds, you're going from a minute to, you know, a second, it's a no-brainer. You

  • know, concentrate on the stuff that--that's going to give you the biggest impact. Using

  • Mytop. Jeremy Zawodny wrote a nice little tool. Most of you are probably familiar with

  • the top utility in UNIX. It's almost identical but it's just for MySQL. And it gives you

  • an ability to on a delay--on a repeating delay, you know, show you the process list of what's

  • going on in MySQL, and show, you know, show various show commands, the status variables

  • and gives you a nice little interface to profile into the system while it's running. And one

  • nice thing about Mytop that I really like is that you can--you can use it for remote

  • hosts as well. So, you can--you can run it on your local machine and access a remote

  • host. So that it is a nice little tool. Okay. Sources of Problems. This is kind of my personal

  • opinion of where I rank the law of diminishing returns, like what I actually like go for

  • immediately. The performance team tends to think that, you know, sometimes you can--you

  • can tune a server parameter and that's like the biggest bang for the least, you know,

  • amount of time. But I think, really, where performance problems come in are the--are

  • the definitely the top three here. Poor Indexing Choices will kill an application faster than

  • really anything else. If you're querying on a field and you don't have an index on it

  • and you're repeatedly hitting that, it's going to kill the application faster than, you know,

  • whether sort buffer size can be adjusted. So, same thing with the Inefficient or Bloated

  • Schema Design, you know, I talked in the--in the performance talk. I've seen a number of

  • cases where people designing new applications, well, actually de-normalized from the beginning

  • because they think it's like going to give them better performance. Don't do that. You

  • know, there's specific cases where you can de-normalize the schema and you can get some

  • performance benefits. But don't do it right away. You know, just because you think it's

  • going to get, you know, a performance benefit. Use good, efficient, normalize schemas. And

  • we'll talk--we'll go into detail on this later. Bad coding practices. MySQL has some inefficiency

  • problems with sub-queries. But in general, using joins is a lot more efficient than using

  • sub-queries. We're going to go show some examples of specific where you can really get a performance

  • increase by just simply changing the way that you--the way that you code SQL. Server Variables

  • Not Tuned Properly, that's a source of problems. But in my opinion, very specific situations

  • can be--can be helped by tuning server parameters. It's not some catch-all, you know, it totally

  • depends on, you know, how much memory you have in the machine, what the architecture

  • is, what operating system you're running on, what version of MySQL, what storage engines

  • you're using, your, you know, the type of the application that you have, whether it's

  • OLTP or heavy reads, you know, all these things. Yes. Tuning server parameters can help but

  • it's very specific to certain situations. And it also requires retesting and rerunning.

  • But, you know, you tweak one thing, you have to re, you know, run the benchmark test to

  • see if--see if you got the performance gain out it. It can be time consuming to tweak

  • server variables whereas it's pretty easy to, you know, change some seq--some SQL statements

  • and immediately rerun it and see if you got a huge performance gain from doing that. And

  • finally, Hardware and--and Network Bottlenecks. I/O, CPU bound, network latency. Right. There

  • is a good book that I brought along that I recommend for that and when we get to the

  • last slide, I'll show you that, so. All right, Indexing Guidelines. A poor or missing index,

  • definitely the fastest way to kill an application, like I said. Look for covering index opportunities.

  • We're going to go into some slides here to kind of explain what that is. But the covering

  • index is--especially for my--who's using MyISAM versus InnoDB? InnoDB, raise hands. MyISAM,

  • raise hands. Nobody is using MyISAM? For a single thing, okay.

  • >> [INDISTINCT] >> PIPES: Say again?

  • >> [INDISTINCT] >> PIPES: Got you, okay. Well, we'll explain

  • some things about covering index opportunities that are very specific to InnoDB. But covering

  • index is basically what it means is that the MySQL can get all the information from the

  • index records which are by definition slimmer than the data records because they don't contain

  • all the fields in the table and use that information to complete whatever the query was without

  • having to go into the data records. One of the key concepts to this, the slimmer that

  • your index records are, the more that you can fit into a single index block and the

  • more that you can fit into an index block, the fewer reads you're going to do, the faster

  • your application's going to be. So, covering index opportunities definitely look around

  • for those. When you're deciding on which fields you actually want to index, make sure you

  • take a look at what the selectivity of the--of the field is. Selectivity is the cardinality

  • or the number of unique values that are contained in that field divided by the total number

  • of values in a table. So, obviously a unique index would be a cardinality of one. Because

  • it's, you know, unique over the total count. That's obviously the best-case scenario but

  • you can get situations where you've got awful selectivity. If you're querying just on that

  • index or just on that field you may get awful performance from that because--or may just

  • decide not to use the index because it ceases to be useful. I think generally like, [INDISTINCT]

  • could talk a little bit more about this but I think that it numbers about 30%. If your

  • optimizer sees that, okay you do--you do a query select, you know, start from whatever,

  • where column A equals two. And it will take a look and see an estimate from what the information

  • that has on the--on the index, well this particular query returned a third--you know, around 30%

  • or more of the rows in the table. If it does it's actually much more efficient to just

  • sequentially scan the table than it is to go into the index and randomly seek and grab

  • a bunch of records and then do lookups back to the data records. So, be aware of--of the

  • uniqueness of the values in your index. For lower selective fields were you--were you

  • don't have very much selectivity you can sometimes tack them on, you know, to a multicolumn index

  • so that, especially we grouped by queries and we'll show an example of this you get

  • some efficiencies there. But just be aware of the selectivity, so. On multicolumn indexes,

  • like those kind of explained just now, the order of the fields is very important. You

  • have to look at the queries that you're running against the database. You know, if you have

  • the columns mixed up or they're not optimal, the optimizer just won't use the index. And

  • I'll show you a perfect example of something that happens all the time with this on group

  • by queries. And as the database grows you want to ensure that the distribution is good.

  • You know if you set up an application, writing new application and, you know, you go through

  • a couple of months and you got a couple of hundred thousand records in there. Okay, great.

  • Everything looks great. The index strategy that you designed for it is wonderful but

  • that doesn't necessarily mean that in nine months, you know, you've got millions of records

  • in there that the index strategy that you initially picked out is going to be optimal,

  • you know. Make sure that you understand how your data changes over time, you know. If

  • business practices can change. And so, if you've got values, let say's you got seven

  • values in a--in a field, right. And you've got an index on that, seven distinct values

  • that you're putting into like a status field. Okay. And over time 90% of the field values

  • contain one specific thing, that index ceases to be as useful as it was when you initially

  • designed the application because 90% of the--the rows contain the same value. So just be aware

  • that as your database grows and your application changes that you need to reexamine, you know,

  • what you initially thought was a good index strategy. Okay. Remove redundant indexes for

  • faster write performance. Obviously, every index that you put on a field you get a, you

  • know, slight performance impact on writes because it's got to write an index record

  • as well as the data record. Every time you do an insert and every time you update. If,

  • you know, update the key value, it's got to write to the index. So unfortunately, MySQL

  • allows you to create redundant indexes on the exact same things. So, I've seen cases

  • where you get 40 indexes on the table and, you know, 10 of them are redundant. You know,

  • because over time teams of people work on it and no one wants to change other people's

  • stuff, you know. So, they just add their own indexes and with different names and all that

  • but, they're redundant. They're on the same field. So, you know, as time goes by make

  • sure you check, remove redundant indexes to get some write performance. Okay. Common Index

  • Problem here. Kind of a web 2.0 example. You got a--you got a tag's table which got tag

  • ID and a text for the tag and a products table. Both of these tables have an auto-incrementing

  • primary key and then you've got a relationship table that stores the many to many relationship

  • between products and tags. And if you noticed that the primary key on these Products2Tags

  • table is product ID--tag ID. Makes sense? Pretty basic? Okay. Here's two examples of

  • pretty common queries run on this Products2Tags table. Only the top one's going to use the

  • index on it. And the reason is because of the order of the columns in the index. On

  • the top one, you're grouping by the--what's essentially the product ID. And so, it's going

  • to be able to take that left side of the index and use it in order to count the right side

  • value which is the tag ID. Now in the bottom you're reversing that. You're saying for each

  • distinct tag give me the--give me the count of products. You can't use the index. You

  • can't use that primary key index. So, you won't use it, you get no read performance

  • from it. The remedy would be to create an index that has tag on the left side. Okay.

  • Now, there's two examples here where it says create index, the--the top example, create

  • index, I X'd tag on just the tag ID for all you guys using InnoDB, that's exactly what

  • you'd want to do. From MyISAM, you'd want to do the bottom one to create a covering

  • index opportunity. Does anyone know why the top one should be used for InnoDB and not

  • the bottom one? What's the difference between the secondary indexes in InnoDB versus MyISAM?

  • >> Because it went back to the primary key. >> PIPES: Exactly. For it--it--because it's

  • a clustered index organization in InnoDB, every time you add an index on a field, that's

  • not the primary--every non-primary key index, the primary key is actually appended to every

  • single index record. So, if you've got a primary key product ID, tag ID and you add an index

  • on tag ID, product ID you're being redundant because you've already got the product ID

  • actually in the index record in InnoDB. So, it's important to understand what the difference

  • between the cluster organization and the MyISAM which is non-clustered where it's just a record

  • number that's attached to each index record instead of the clustering key. So, be aware

  • of those differences. So, I was talking about redundant indexes. That one on the bottom

  • would be redundant for InnoDB. Did the mic just change?

  • >> [INDISTINCT]. >> PIPES: Okay. Okay. Sorry. Schema Guidelines.

  • Inefficient schema, another great way to kill your application. Use the smallest data types

  • possible that you really need. If--if you don’t have four billion rows in the table,

  • don’t use big int. I mean, it’s just the basics, you know. Keep them as trim as possible,

  • you know, if you have maximum of a hundred values, don’t use int. Use tiny int, you

  • can feed, you know, 255 unique values in it. Don’t use char[200], you know, when a varchar

  • or you know, smaller char will do. Little things like this make a huge impact in performance.

  • And the reason is remember that every index record has got to contain, you know, that

  • width. So, if you've got an index on a char[40], every single records got 40 bytes in it. And

  • well, a little bit different because there's some index prefix compression that’s going

  • on but, you know, the--the general concept stays. If you have a index on first_name,

  • just index the first 10 characters of it or the first 15 characters of it. You fit much--many

  • more index records into a single key block and therefore you get fewer reads, faster

  • performance. When you've got many columns in a table, especially many nullable columns,

  • consider splitting the tables into two tables like a main and a detail table, something

  • like that. If you've got many nullable columns or columns that are rarely even accessed or

  • rarely searched on, put those in a separate table and join them on the one-to-one, you

  • know, relationship, okay? And the reason for that is you can pull many more of the main

  • records that you are actually searching on into memory. You get--especially, if you don’t

  • need the detail records of those fields, you can have a lot more opportunities to get more

  • performance from that. Consider vertically splitting many road tables using partitioning

  • or the MyISAM merge tables which--since you guys are using InnoDB, it doesn’t matter.

  • Partitioning is in 5.1, so, you guys are a little ways away from that. It's all right.

  • You can also do, you know, your own application level vertical, you know, partitioning where

  • you're naming different sets of tables and then using views on--the one person that was

  • using 5.0 you can use views in order to aggregate many tables into a single view. And always

  • remember that, you know, fewer reads has faster results. And this is--this is my--the limit

  • of my graphical ability right here. That's about it. It's just, you know, a little flow,

  • smaller data types placed in narrower index records. More records per block, fewer reads,

  • faster performance. It's just the general thing. How to get better performance from

  • your index, use schema. InnoDB, choose the smallest possible clustering key. Since like,

  • I just explained, it's appended to every single secondary index record. So, if you've got

  • a 12 byte primary key on your index in--on your InnoDB table, that's 12 bytes per index

  • record that are being appended every single time you insert in the table. So, be aware

  • of that. Don’t use surrogate keys when a naturally occurring one exists. We see this

  • all the time in the forums. And I often wonder what the use of it is and why people do it.

  • You've got a naturally occurring product ID, tag ID key. It's a naturally occurring primary

  • key. Yet a lot of people will just kind of put an auto-increment in there just for the

  • heck of it, you know, but you're never really going to be looking up by record ID. You will

  • always going to be looking up based on product ID or tag ID. So, some people call it a surrogate

  • key or a false key, get rid of them, they're just wasting space. Coding Guidelines, I tend

  • to call it chunky coding habits. Cutting everything down into the smallest possible units you

  • can. You know, thinking of SQL in terms of procedural programming is a bad idea. At least

  • to--well, correlated sub-queries and other things I hate. You want to--you want to be

  • an SQL programmer that thinks in sets. Don’t think in terms of for-loops, okay. Think in

  • terms of merging and intersecting sets of data. And you'll be able to cut it up into

  • chunks and you'll be much more efficient in the way when--in the way you code SQL. The

  • 5.0 guy, use store procedures. If you're not using store procedures, you're missing out

  • on a performance increase in 5.0. InnoDB, how many of you using SELECT COUNT(*) on InnoDB

  • tables? I know they saw you. Okay. Don’t. InnoDB cannot optimize SELECT COUNT(*) queries,

  • okay? So, your table and your schema will not scale well, okay? The more records you

  • get in there, it requires actually reading through and counting all the records. Whereas

  • with MyISAM, it can actually get the count from the index. With InnoDB, it cannot. It

  • actually has to count all the values. And the reason for that is the way that InnoDB

  • implements what's called multi-version concurrency control. And there's a version so that the

  • row level locking can occur in InnoDB. There's a version for each row. So there's complexities

  • involved with actually counting, you know, the number of values because well--which version

  • are you--are you comparing and so use counter tables. So if you're doing, you know, select

  • counts to our products, have a separate table that stores the number of products and when

  • you insert into the product table, increment it, when you delete, decrement it. Okay. That’s

  • how to scale InnoDB for performance. We've seen this by a number of big form applications

  • that, you know, that they always do, you know, count the number of threads and then count--some--how

  • many users are logged in. This is a big problem when, you know, if people use--switch from

  • MyISAM to InnoDB without looking at the code, just simply switch the engines, you know,

  • it was a--yeah. >> Well, also consider you don’t actually

  • necessarily want to know the exact number of rows. Do you care if you've got a 1000

  • threads or 1100 threads? It really doesn’t matter. You want the ballpark figure. So even

  • if your numbers are not fully up-to-date, it doesn’t matter. If you want to run Chrome

  • job that does the InnoDB count once an hour, then put that values [INDISTINCT]. You could

  • put it in another table. You could put it in a file that can include into whatever page

  • is displayed, that works fine. Then reduce the problem to once an hour rather than every

  • time you run that query. >> PIPES: All right. Isolating indexes or--index

  • fields on one side of the equation, you know, your basic--your basic Math class, you want

  • to put the single on the one side and had your equation on the other, you know. Well,

  • we'll show an example of how to do this in a second. And using calculated fields when

  • they're necessary, you know. Let's go into the example of isolating index fields on one

  • side of the equation. Okay. Here we've got--whenever you apply a function to a field that’s indexed,

  • it--the itemizer can't use the index on, okay. So you want to look for opportunities where

  • you can rewrite SQL statements like on the top here into the exact same SQL statement

  • or the exact same result, but youve isolated the index field without any function operating

  • on it on, you know, the left or whatever, right. And just converted the actual statement

  • into a--just a different way of representing it, okay. Now, in this case, you're just taking

  • the two days and converting it into current date minus seven day, you know. Just make

  • sure you understand the greater than and equals all that, you know, so you're including that.

  • Now, what's wrong with the second part? How can we improve that with our performance?

  • Anyone know? >> [INDISTINCT]

  • >> PIPES: Say again? >> [INDISTINCT]

  • >> PIPES: Yes, well, yeah, basically that. The current date function is a non-deterministic

  • function and the problem with that is that the query cache won't cache this query. Well,

  • the thing about current date is, it only returns a different value once every what, 24 hours?

  • So if you're running this, you know, let's say a hundred thousand times in a day, instead

  • of running this once, it's going to run a hundred thousand times. You know, with the

  • query cache it actually puts the query into the cache stores the results and every time

  • you issue that exact same query, just takes it rather to query cache never even has to

  • hit the Storage Engine, the mySQLs, whatever. It just takes right in the query cache. So

  • in your Perl Scripts or your PHP scripts or whatever you're using to write, just insert

  • the date instead of using current date, okay. A lot of people don’t realize this, but

  • very good way to get performance out of your query cache. Okay, using calculated fields.

  • When you got [INDISTINCT] email address like present.com, index can't be used. The reason

  • is because you can't have a wildcard on the left side of a constant, comparing it with

  • like. If you put the wildcard on the right--on the end and you have a constant string and

  • then wildcard the index can be used because you can look at the left side of the values,

  • okay. So the way to do this is to do a little trickery and put a reversed email address

  • actually in your table, okay, with 5.0 you can use a trigger. It actually inserts the

  • reverse of the email address that you're inserting directly into that new reversed email address

  • field, okay. And then on the bottom, you can just do the like reversed CONCAT and now you

  • can--now you actually can use an index on that field. So that’s a way of using calculated

  • fields or, you know, fields that, you know store different values based on other fields

  • on the table to get performance out of something that you'd never be able to use an index on,

  • you know. >> [INDISTINCT]

  • >> PIPES: Yes. This is something, you know, if you have an application that's looking

  • up, you know, all the email addresses in your database that end it dotcom or aol.com or

  • whatever it is, you know, that's--you don’t want to be running that top one every single

  • time if someone request that. You want to be operating on a reversed email address and

  • using the index on it. Okay. Learn to use joins. Like I said earlier, eliminating correlated

  • subqueries is a great way to get performance out of your application currently. Like, Timour

  • who's going to be speaking in a little bit, is on the optimizer team and he's working

  • on subquery optimizations but there's, you know, there's just a better way of doing things

  • with the joins, okay? It's easier and it follows the set-based programming principles of SQL

  • instead of the procedural language principles that most of us are very familiar with, you

  • know. The for-loop, the operating on, you know, everything one at the time. So, let's

  • go to some examples here. Don’t try and outthink the optimizing team. The team works

  • pretty smart. You'll meet them in a little bit but he's really smart. You can give the

  • SQL server--the MySQL server in a join hint using like straight join but the problem with

  • that is while you may get some performance benefit if the optimizer currently can't figure

  • out the right path and I've only seen this in like one or two instances. Well, in two

  • months your data sets changes and the optimizer that you've now overwritten with your straight

  • join hint, now you may have the suboptimal plan for it and you forgot that you put straight

  • join in there and now you've got a suboptimal plan. So, if you are going to use join hint,

  • make sure you thoroughly document where you put it in the code and don’t forget about

  • it. Because as your data set changes, your optimizer could much more easily pick an optimal

  • path for the query execution. Okay. Eliminating correlated subqueries. With 4.01, it looked

  • like most of you are on 4.01. You could use subqueries. I tend to avoid them like the

  • plague. We see this a lot on the forms, this kind of thing. Most of you like advance SQL

  • programmers, you probably not do anything like this but we see it everyday. You just,

  • I mean, the thing is, to try and get the maximum sold price from an order items table for each

  • product. And the procedural programmer, you know, who's a pro coder doesn’t necessarily,

  • you know, isn't familiar with SQL, hasn’t been trained in set-based, you know, thinking.

  • This is a very natural way of doing something. Your saying, "Okay. I want my products table.

  • I'm selecting for my product's table. Okay. And I want each of those products." And for

  • each one, I want to get the maximum price that it was sold in the order items table.

  • You know, correlating on the product ID. It's a very natural way for a procedural programmer

  • to think of things, you know. But remember, it's for each product, I mean, that's the

  • way that people think about it. That's not the set-based way of doing things. On the

  • right is the set-based way. You're joining, you say, "Okay. I've got two sets of information.

  • I've got my products and I got my order items. Okay. And I want to transform these order

  • items by grouping them on a product id and then finding the maximum price." So, that’s

  • my set. The maximum price is for each product today. And then, I have my products set and

  • the maximum price is set and I join them. That--that actually is two operations, right?

  • Two conceptual operations, grouping and joining to the products. On the left, you are actually

  • doing a query for every single product so, it doesn’t scale to linear model. Actually,

  • you have a thousand products; you're doing a thousand sub queries to get the maximum

  • price from the order items table. You have a million, you do a million queries. You're

  • still doing two queries on the--on the right. So, think about how you can get rid of correlated

  • subqueries. This is one of the best ways to get better performance out of your SQL code.

  • Using derived tables, it might seem like a big mess but it’s actually more efficient

  • on the right. And it goes back to the same principle of how many queries are you actually

  • issuing? On the left, you've got a join from customers to orders and in the where clause,

  • you have a correlated subquery with a group by in it, okay? Not a group, with an aggregate.

  • So, you're saying, the customers join to the orders but I want a group by the company of

  • the customer, not the customer id. And in order date equals the maximum or the last

  • order date in the orders table where the customer's the same, it's correlated to your customer's

  • table C and then group by the company. Well, you're going to be doing a sub-query for this

  • select MAX--for each customer in the order's table. Now, on the right, we say, "Okay, are

  • these customers set?" And we're interjoining on what's called the derived table, a sub-query

  • in the FROM clause. That’s one set. We're taking the maximum or the last order date

  • for each customer, distinct customer ID in our order's table, that’s a set. And then

  • we're joining back to the order's table so there's three--three operations. Three conceptual

  • sets to get that result versus the non-scaling method of a correlated sub-query in the WHERE

  • clause which has to be executed for every single customer in the order's table. Okay.

  • All right. And this is my last slide and I tend to kind of scheme over server parameters

  • because it's--I'm very happy to talk specifics with anyone of you but to generalize about

  • server parameters, it's okay but there're very specific situations where they can help--where

  • you can get some definite performance boost. Be aware of which server parameters are global

  • versus what's per thread--per connection made to the database. Okay. For instance, key buffer

  • size. It's a global variable. Sort buffer size, it's per thread so be aware of that.

  • Don’t, you know, assume that sort buffer size is for the whole database and bump it

  • up because it's actually per thread. So make small changes, preferably a single change

  • to your configuration file, retest. See if you get a performance benefit from that specific,

  • you know, change. Often provide a quick solution but they can be temporary. Really--really,

  • the performance benefit is optimizing application, your indexes, your schema--your coding. Query

  • cache, it's off by default so the size is zero for your query cache so if you haven't

  • changed it, it's not on. You can do show status and they'll say query cache type--I think

  • it's query cache type on but there's no size to it so that actually is use by default.

  • So unless you turned it on, you're not using your query cache. If you're doing read-intensive

  • applications, like, how many are doing, like, lots and lots of writes versus lots and lots

  • of reads? Who has 90% reads or more? Okay. Do you have your query cache as turned on?

  • Who has the query cache turned on that they know they've added size to? One guy in the

  • back, all right. Two, all right, excellent. Okay. If you're doing read-intensive applications,

  • turn the query cache on. Why? Because it doesn’t have to execute the query. If it's the exact

  • same query and your dataset isn’t changing much, it just takes it directly from the query

  • cache. The query cache actually stores the results set. It doesn’t just, like, compile

  • a query and store the query so they can be re-executed. It actually stores the rows so

  • that it doesn’t actually have to do in either parsing or anything and the optimization.

  • It just sends the rows right back to the client. So if you're doing repeated queries like that,

  • turn the query cache on and give it some size the--in the configuration file, it's the query_cache_size,

  • set it to, whatever, 32 meg or 64 meg. You can play around with it. You can also do it

  • online. I think it's sets global query cache size equal something or other, it's in the

  • menu. So you can do it online or you can do it in your configuration file. Beware of--since

  • you don’t use MyISAM tables. How many of you think that you're not MyISAM at all? I

  • hope not. Okay. The MySQL's internal database is in MyISAM. Okay. So that’s why if you

  • do show status, you'll see, actually, that you have MyISAM operations going on. So even

  • if all your tables are, InnoDB, the MySQL internal database is MyISAM, so you still

  • need to put, you know, something in there. So leave the defaults for key buffer size

  • there so the MySQL database can, you know, live happily. But the InnoDB buffer pool size,

  • that’s--that’s your main one for InnoDB that determines how many--how much memory

  • is allocated for storing date and index pages in InnoDB. I think Hecky recommends about

  • 50 to 80% of your total memory on the box should be allocated to InnoDB buffer size.

  • Okay. Other important InnoDB stuff is InnoDB log, final size and log buffer size. I think

  • he recommends that the log--I think the log buffer size maxes out for a gig but you can

  • adjust it depending on, you know, what's your situation is and I can take some questions.

  • Oh, yeah, memory is the cheapest, fastest, easiest way to get better performance out

  • of MySQL. Don't even ask your boss. Just stick another piece of RAM in there. It really is.

  • MySQL just loves RAM, you know. The more RAM you give it, it's like Pac-Man it just eats

  • the stuff up. It loves it. So, that's the easiest and cheapest way to get better performance

  • from a hardware situation. Yeah? >> you talked about

  • >> PIPES: Okay. The question was if you've got a logical key in a character field, right?

  • Say for instance--I don’t know, Social Security number, something like that. Is there a benefit

  • to making a--what you're calling a surrogate key out of an auto incrementing ID? There

  • is a benefit to that, a big benefit to that. Depending on the--especially like if you have,

  • you know, a CHAR 4 versus an INT you're probably not going to see much of performance difference.

  • If you got, you know, CHAR 8, CHAR 16, that--that's, you know, your product ID or your record ID

  • or whatever it is versus an auto incrementing key. You're going to get a lot better performance

  • from the numeric data type in four bytes. It's optimized for that, so.

  • >> Why do you go up to [INDISTINCT] >> PIPES: Okay. There's that string data type

  • that we need. Just--you mean versus like VARCHAR 4,000 or something versus text or whatever,

  • there's some--maybe [INDISTINCT] can talk to that. I don't know--I don’t know if there's

  • a specific, you know, how much performance benefit you get from doing VARCHAR versus

  • text. I think in InnoDB, there maybe something but

  • >> Yeah. So the question is there a benefit to using CHAR rather a VARCHAR in certain

  • cases? >> PIPES: Or text versus VARCHAR.

  • >> Or text versus VARCHAR. In MyISAM, text fields actually store it separately, text

  • in blobs. The actual row contains a pointer to where the text field is. In case of a CHAR

  • or VARCHAR it's part of the row. In InnoDB, same up to a point. In certain cases you would

  • just find everything in the row depends on which version and so on. InnoDB itself doesn’t

  • care where the things are CHAR or VARCHAR. It's always internally uses VARCHAR basically.

  • So, in InnoDB you don't need to make that distinction. Okay. I want to optimize for

  • fixed length fields, that kind of stuff. It doesn't care about that. MyISAM would care

  • so that's the basic story. Does that answer your question? Okay.

  • >> PIPES: Any other questions? No one? All right. Well, thanks guys. Thanks for letting

  • us come here.

>> Okay. Hi, guys. So this is Jay Pipes. He's a MySQL employee. He's in charge of North

字幕與單字

單字即點即查 點擊單字可以查詢單字解釋

B1 中級

MySQL的性能調優最佳實踐 (Performance Tuning Best Practices for MySQL)

  • 343 16
    陳柏霖 發佈於 2021 年 01 月 14 日
影片單字