字幕列表 影片播放
>> Okay. Hi, guys. So this is Jay Pipes. He's a MySQL employee. He's in charge of North
American Community Relations. I met him this week at the MySQL User Conference which is
down in Sta. Clara. Now, the deal is that we're going to host this on Google video.
So if you have confidential questions, if you could leave them till the cameras are
turned off, that would be cool. With that, no more ado, Jay Pipes.
>> PIPES: All right. Nice to meet all of you. I'm going to be giving a talk that I gave
on, I think, Tuesday or Wednesday, on "Performance Tuning Best Practices for MySQL." Some of
the things are version specific but for the most part, it's pretty general. We're going
to be covering Profiling and Benchmarking Core Concepts, Sources of Problems that you
might encounter, Index Guidelines, Schema Guidelines, Coding Techniques, and lastly,
we'll go over some general things about tuning server parameters. Since the audience is,
I think, sort of mixed levels of experience, we'll just shout out questions or tell me
if you want some grand stuff or so. All right, Benchmarking Concepts. The general idea behind
benchmarking MySQL is to, obviously, have a track record. You know, when you change
things in your code, in your application, or the configuration files with MySQL, or
your schema design, or you're indexing, you want to make sure you have a track record
of what you're doing. Because a lot of times, if you--if you don't keep a track record and
don't do benchmarking, you won't really know whether a specific change that you made had
a detrimental impact or whether it actually increased performance. So, the baseline--and
you always want to have your baseline for your starting point. And always give yourself
a target. Don't just do it so it's a moving start--moving target, you always want to have
an end goal, you know. You either want to get, you know, 20% increase in performance.
So that's your goal. But just don't have it, you know, just kind of end with nothing because
then you'll just spend years and years and years trying to, you know, tune milliseconds
out of your--out of your application when there's really no benefit to it. So, change
only one thing at a time. A pretty basic concept but it really--it goes a long way in benchmarking.
If you change two things and you rerun a test, how do you know which one of those two things
actually affected the difference between the two tests? So, always change one thing at
a time and rerun the benchmarks. And record everything, configuration files, information
on the hardware, information on obviously the operating system, the version of MySQL,
and everything to do with the benchmarks. Keep it in a separate folder and keep everything
for historical record. A lot of times, you'll think that you won't need information when
you're running benchmarks. And then, six months later, you realize that, "Wow! It would be
really useful if I had--oh, I threw it away". So, always keep information from your benchmarks
so you have a historical record of what you did. And specifically with MySQL, you want
to disable the query cache whenever you're running benchmarks. The reason is it will
skew the results that you get from read-intensive queries and write-intensive queries as well.
So, disable the query cache by setting the query cache size equal to zero when you--when
you run benchmarks, so. Prof… Yes. Go ahead. >> [INDISTINCT] benchmarking for [INDISTINCT]
problem with the operating system, caches [INDISTINCT]?
>> PIPES: Right. Yes. The question was--what you were experiencing that the operating system
caching was skewing the results of the benchmark. Generally, the idea is either you can disable
caching if it's actually getting in the way or if you bump up the actual number of runs
in the benchmark tests you can minimize the skew effect. So that's usually what we recommend.
But, when you get into specific caches there's very specific things to do. So it kind of
depend on the exact scenario that you're doing, so. But, yes. Bumping up the number of runs
in the benchmarks generally tends to mitigate a lot of that, so. Does that answer your question?
Okay. Profiling Concepts, profiling is a little different than benchmarking. Generally, you're
profiling a production or running a test system that you're trying to figure out, you know,
diagnosing what's going on while it's running. So, instead of doing--you know finding the
stress or the load that the system can take, you're actually trying to pinpoint where the
bottlenecks are in the code while it's running. With MySQL, the best tool that you have is
the EXPLAIN command. So, regardless of what version of MySQL you are running, you really
want to get familiar with everything in the EXPLAIN command. Understand what the access
types, does everyone know what the EXPLAIN command is? Everyone use it? Okay. Does everyone
know what every single type of access type is? Okay. It's sort of a poorly named column
in the--in the EXPLAIN results. But, it actually describes the path or the optimization pattern
at which the optimizer's saying, "Okay well, I'm going to, you know, access, you know,
these results at and join it to these results at." And each of those things, the ref, the
refrenol, the equaf range, all the different access types mean different things. And the
performance that you get from each of them, you can go into MySQL manual, depending on
your version. You can see which ones have a better performance for different types of
queries. It's not always the same. So, you want to make sure you understand those. Understanding
obviously that if you see all in the access type, it means it's doing a full table scan.
You know just little things like that. Be aware of what they all mean and what the performance
impact of the different ones are. So, by using the Slow Query Log and mysqldumpslow to parse
the query log, very easy to set up, Log Slow Queries and then you give it a long query
time, anything above that, it will log to the--to the Slow Query Log. There's also--I
think in 4.1, we put it an option that you can--anything that--any table, any select
statement that does not use an index on a table can get logged to the Slow Query Log
regardless of how fast it runs. And that's--and that's useful for when you're doing an indexing
strategy and you're trying to determine okay my patterns have access, do I--do I have indexes
on those fields that I'm actually accessing? You know, and my where clause and ON clause
and that kind of thing. So, low hanging fruit law of diminishing returns tackles stuff that's
going to get you the absolute best return on investment for your time. Don't concentrate
on, you know, I know you guys are Google. So, a millisecond definitely does count. But,
if you got a--if you got a choice of, you know, going from five milliseconds to four
milliseconds, you're going from a minute to, you know, a second, it's a no-brainer. You
know, concentrate on the stuff that--that's going to give you the biggest impact. Using
Mytop. Jeremy Zawodny wrote a nice little tool. Most of you are probably familiar with
the top utility in UNIX. It's almost identical but it's just for MySQL. And it gives you
an ability to on a delay--on a repeating delay, you know, show you the process list of what's
going on in MySQL, and show, you know, show various show commands, the status variables
and gives you a nice little interface to profile into the system while it's running. And one
nice thing about Mytop that I really like is that you can--you can use it for remote
hosts as well. So, you can--you can run it on your local machine and access a remote
host. So that it is a nice little tool. Okay. Sources of Problems. This is kind of my personal
opinion of where I rank the law of diminishing returns, like what I actually like go for
immediately. The performance team tends to think that, you know, sometimes you can--you
can tune a server parameter and that's like the biggest bang for the least, you know,
amount of time. But I think, really, where performance problems come in are the--are
the definitely the top three here. Poor Indexing Choices will kill an application faster than
really anything else. If you're querying on a field and you don't have an index on it
and you're repeatedly hitting that, it's going to kill the application faster than, you know,
whether sort buffer size can be adjusted. So, same thing with the Inefficient or Bloated
Schema Design, you know, I talked in the--in the performance talk. I've seen a number of
cases where people designing new applications, well, actually de-normalized from the beginning
because they think it's like going to give them better performance. Don't do that. You
know, there's specific cases where you can de-normalize the schema and you can get some
performance benefits. But don't do it right away. You know, just because you think it's
going to get, you know, a performance benefit. Use good, efficient, normalize schemas. And
we'll talk--we'll go into detail on this later. Bad coding practices. MySQL has some inefficiency
problems with sub-queries. But in general, using joins is a lot more efficient than using
sub-queries. We're going to go show some examples of specific where you can really get a performance
increase by just simply changing the way that you--the way that you code SQL. Server Variables
Not Tuned Properly, that's a source of problems. But in my opinion, very specific situations
can be--can be helped by tuning server parameters. It's not some catch-all, you know, it totally
depends on, you know, how much memory you have in the machine, what the architecture
is, what operating system you're running on, what version of MySQL, what storage engines
you're using, your, you know, the type of the application that you have, whether it's
OLTP or heavy reads, you know, all these things. Yes. Tuning server parameters can help but
it's very specific to certain situations. And it also requires retesting and rerunning.
But, you know, you tweak one thing, you have to re, you know, run the benchmark test to
see if--see if you got the performance gain out it. It can be time consuming to tweak
server variables whereas it's pretty easy to, you know, change some seq--some SQL statements
and immediately rerun it and see if you got a huge performance gain from doing that. And
finally, Hardware and--and Network Bottlenecks. I/O, CPU bound, network latency. Right. There
is a good book that I brought along that I recommend for that and when we get to the
last slide, I'll show you that, so. All right, Indexing Guidelines. A poor or missing index,
definitely the fastest way to kill an application, like I said. Look for covering index opportunities.
We're going to go into some slides here to kind of explain what that is. But the covering
index is--especially for my--who's using MyISAM versus InnoDB? InnoDB, raise hands. MyISAM,
raise hands. Nobody is using MyISAM? For a single thing, okay.
>> [INDISTINCT] >> PIPES: Say again?
>> [INDISTINCT] >> PIPES: Got you, okay. Well, we'll explain
some things about covering index opportunities that are very specific to InnoDB. But covering
index is basically what it means is that the MySQL can get all the information from the
index records which are by definition slimmer than the data records because they don't contain
all the fields in the table and use that information to complete whatever the query was without
having to go into the data records. One of the key concepts to this, the slimmer that
your index records are, the more that you can fit into a single index block and the
more that you can fit into an index block, the fewer reads you're going to do, the faster
your application's going to be. So, covering index opportunities definitely look around
for those. When you're deciding on which fields you actually want to index, make sure you
take a look at what the selectivity of the--of the field is. Selectivity is the cardinality
or the number of unique values that are contained in that field divided by the total number
of values in a table. So, obviously a unique index would be a cardinality of one. Because
it's, you know, unique over the total count. That's obviously the best-case scenario but
you can get situations where you've got awful selectivity. If you're querying just on that
index or just on that field you may get awful performance from that because--or may just
decide not to use the index because it ceases to be useful. I think generally like, [INDISTINCT]
could talk a little bit more about this but I think that it numbers about 30%. If your
optimizer sees that, okay you do--you do a query select, you know, start from whatever,
where column A equals two. And it will take a look and see an estimate from what the information
that has on the--on the index, well this particular query returned a third--you know, around 30%
or more of the rows in the table. If it does it's actually much more efficient to just
sequentially scan the table than it is to go into the index and randomly seek and grab
a bunch of records and then do lookups back to the data records. So, be aware of--of the
uniqueness of the values in your index. For lower selective fields were you--were you
don't have very much selectivity you can sometimes tack them on, you know, to a multicolumn index
so that, especially we grouped by queries and we'll show an example of this you get
some efficiencies there. But just be aware of the selectivity, so. On multicolumn indexes,