Prior to beginning the actual work to satisfy a query Oracle takes a look
around and decides on what it thinks is the best way to perform the query. A
table may have a number of indexes, they may be composite, unique, functional
etc. The statement may have a number of joins across many tables with many
conditions and there could be various ways to perform the same operation.
The optimiser gathers information about the various options and considers how
a full table scan, an index scan and different ordering of those scans would
result in different performance and workload.
There are two main modes in which the optimiser can operate. The Optimiser
Mode as it is known is set on the database instance by the DBA's. Oracle DBA can find
out which mode is in use with the following query.
SELECT VALUE
FROM v$parameter
WHERE NAME = 'optimizer_mode';
NAME
====
CHOOSE
The possible values of optimizer_mode are
- CHOOSE
- FIRST_ROWS
- ALL_ROWS
- RULE
FIRST_ROWS and ALL_ROWS explicitly use the cost based
optimiser. RULE uses the rule based optimiser and CHOOSE allows Oracle to choose
on an individual query basis. The variations on the ALL_ROWS or FIRST_ROWS
indicate to Oracle whether it should aim to complete the entire job with the
minimum of work, or whether it should aim to return it's first response as soon
as possible.
In a batch operation where no users are involved it is preferable to have the
entire job completed with the minimum of work whereas when Oracle DBA have a user
waiting for a result, it is preferable to have the first item in a list appear
while Oracle is still working on obtaining the rest of the list.
As a very simple rule of thumb, databases on Oracle versions upto 7.3 are
usually configured to use the rule based optimiser. From version 8 onwards, the
cost based optimiser became the default, recommended method and is usually in
use.
Learn Oracle - Table and index statistics
For rule based optimisation, table and index statistics are not required. For
cost based optimisation the statistics need to exist on atleast one of the
tables being referenced.
Statistics are needed by the cost based optimiser in order for it to
determine the best access path to the data Oracle DBA require. Consider a table
containing the gender and age of 20000 people. If Oracle DBA wanted to obtain all the
males aged 34 and the table had an index on gender and an index on age there
would be two possibilities.
- Use the gender index to first obtain all males and then go through the
result set finding the 34 year olds
- Use the age index to first obtain all 34 year olds and then search the
result set finding the males
Consider both approaches. We know there
are 20,000 people in the table. If we used the index to find all males it is
fair to say we would return about 10,000 rows. We would then have 10,000 rows
with no index available and so would have to search through the 10,000 taking
out non 34 year olds from the result set. If we used the index to find all 34
year olds first, it would be fair to say we would return about 200 rows which we
would then have to look through manually finding the males.
It is clear then that somehow Oracle needs to know the makeup of an index, ie
how selective it is for the given data (or index expression). Oracle
documentation is not particularly clear to me on the terminology used. They
refer to selectivity and cardinality. I'll probably get shot down for this, but
in my mind I call it the granularity of the index.
The granularity of the gender index could be said to be about 0.5 and the
granularity of the age index about 0.01.
These figures can be obtained by instructing Oracle to Analyze the table and
it's indexes.
ANALYZE TABLE PEOPLE COMPUTE STATISTICS;
ANALYZE TABLE PEOPLE COMPUTE STATISTICS FOR ALL INDEXES;
The resulting values can be seen in the system views USER_TABLES and
USER_INDEXES. From Oracle 8 onwards (I don't know about 7 and don't have one
here to check) Oracle DBA can use the DBMS_UTILITY package to have an entire schema or
instance analyzed. I should mention that I have seen reports of erratic
behaviour on 9 onwards when using DBMS_UTILITY to analyse the schema. Oracle DBA should
check this for yourself.
Since writing this, it has been brought to my attention that Tom Kyte (a
virtual reference library of Oracle knowledge) has said that for gathering
statistics ... DBMS_UTILITY PACKAGE : is deprecated
ANALYZE TABLE COMMAND : works but is not preferred
DBMS_STATS PACKAGE : is the preferred method
Certainly the ANALYZE TABLE command is very straight forward to use, and
to get Oracle DBA moving forward into SQL statement tuning it should suffice for the
moment. DBMS_STATS is far more powerful, but is also quite involved.
When instructing Oracle to perform object or schema analysis Oracle DBA can also
indicate whether it should compute the statistics (perform exact counts to get
the statistics) or estimate the statistics (work on a subset of the data to come
up with a reasonable estimation). For very large tables it is usually best to
estimate the statistics.
Refer to Oracle documentation for exact syntax of the ANALYZE TABLE command,
DBMS_UTILITY functions and DBMS_STATS functions applicable to your database
version.
When SQL is being optimised through the RULE based optimiser there is no
benefit to having statistics on your tables or indexes. The statistics are not
used to effect the actual access paths.
When using the cost based optimiser Oracle DBA should also ensure that your
statistics are up to date. The statistics on a table or index are correct at the
time that table or index was last analysed. If the demographics of your table
have changed substantially Oracle DBA should bring the statistics up to date by having
the statistics re-estimated or computed.
Learn Oracle - The rule based optimiser
The rule based optimiser, as it's name suggests follows a set of fairly
comprehensive rules for accessing data. The rules are ranked in order of usual
performance and Oracle will always use the highest ranked rules to perform the
data access.
Essentially, each predicate (part of your where clause) causes workload. By
looking at what indexes are available and what would be the likely workload
against that index, Oracle forms it's execution plan. For instance, Oracle would
usually execute an equals predicate first rather than a range predicate over an
index. Quite simply the equals predicate is generally more likely to return less
data and so have less disk I/O.
The rules are a little more complex than this and the Oracle documentation
goes into some detail explaining each of the rules and their ranks.
Prior to version 8, the rule based optimiser was the optimiser of choice, and
the default for a database. The cost based optimiser was still a little flaky
and the reliability of the rule based optimiser to produce consistent access
plans was a definate advantage.
Learn Oracle - The cost based optimiser
The cost based optimiser determines the cost of various different access
methods against a query and will always use the method with the lowest cost.
The Oracle documentation both gives a measurable unit to the cost (one unit
of Disk I/O but can also be configured to measure network traffic or CPU work)
and also states that the Cost has no actual measure, ie it is there as a guide.
Thanks guys.
Whilst the cost figure can be a useful indicator of a statement, it should
not be used as the final target for your tuning goals. As the cost may or may
not only measures disk I/O it may or may not not include the amount of memory
used by your query, or the amount of CPU time.
That all sounds a bit confusing, so in plain terms the Cost is calculated as
some arbitrary unit of work. It's only value is in comparing one execution plan
to another, but do not rely on it as the single measure of workload.
Consider the following... CREATE OR REPLACE FUNCTION DONT_RUN_THIS RETURN NUMBER AS
BEGIN
RETURN DONT_RUN_THIS;
END DONT_RUN_THIS;
/
SELECT DONT_RUN_THIS FROM DUAL;
/
Obviously this is a far fetched piece of code, but it demonstrates that
while it has no Disk I/O it would actually bring your server to it's knees. With
each iteration, slightly more memory would be allocated to the process and it
would also be using up all available CPU time.
There are some scenarios where the cost based optimiser can get it wrong. In
the standard form of gathering statistics there is no allowance for how data may
be 'skewed'. An example would be a table recording the favourite number of 1
million different people. Quite likely a large number of people might choose 7
as their favourite number. For arguments sake let's say it's 95%.
We could have the "number" column indexed but under the cost based optimiser
with standard statistics, a query on that table of the form SELECT NAME
FROM FAVOURITE_NUMBERS
WHERE NUMBER = 6;
would almost certainly force a full table scan. The reason is that Oracle
would have through the statistics determined that the index has a very low
granularity (almost all of the values are the same). It would therefore decide
that it is likely to be returning almost all of the rows of the table even if it
uses the index and so it would wrongly decide that a full table scan is quicker.
Clearly that is wrong. If Oracle had a better understanding of the skew of data
in the table it would recognise that when querying any favourite number except
7, the index path is quicker but when querying 7, the index path is redundant.
To allow the optimiser to understand the data skew, histograms can be
generated on the data as part of the statistics gathering process. The actuall
processes involved and the background to understanding histograms appears fairly
involved. I need to understand them far better myself before writing about them
here. Watch for a separate paper on histograms.
The cost based optimiser is the optimiser of choice for databases from
version 8 onwards. It does require that upto date statistics be maintained for
your indexes and tables and while preferable to the rule based optimiser it
still has some pitfalls.
|