To better understand dependent/correlated sets, let’s take a brief look at dependent/correlated columns.
Oracle works under the assumption that the data in each column is independent. If an equality predicate on column COL1 leaves 10% of the records, and an equality predicate on column COL2 leaves 20% of the records, then by default the Oracle CBO assumes that predicates on both COL1 and COL2 together would leave 2% (10% x 20%) of the records. If the data in COL1 is correlated with the data in COL2, then equality predicates on both COL1 and COL2 would leave significantly more than 2% of the records. That difference in estimated cardinality can cause serious trouble when the optimizer estimates the subsequent execution steps.
Oracle recognized the problem and came up with a comprehensive solution: extended statistics, introduced in Oracle 11g.
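For completeness, a minimal sketch of that fix for correlated columns (the table name T here is illustrative, not part of the example below):
-- Create a column group on the correlated pair, then re-gather stats so
-- the CBO can use the number of distinct (COL1,COL2) combinations.
select dbms_stats.create_extended_stats(NULL,'T','(COL1,COL2)') from dual;
exec dbms_stats.gather_table_stats(NULL,'T');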
Now to dependent/correlated sets: two sets are dependent/correlated if they contain the same or similar records. Oracle assumes that sets are independent. If a predicate like "COL1 IN (select ... from ... SET1)" leaves 10% of the records, and a similar predicate "COL1 IN (select ... from ... SET2)" leaves 20% of the records, then Oracle assumes that applying both predicates together would leave 2% (10% x 20%) of the records. If SET1 is identical or very similar to SET2, however, the two predicates together would leave significantly more than 2% of the records. Needless to say, the difference in estimated cardinality can cause serious trouble when the optimizer estimates the subsequent execution steps.
The big challenge is that there is nothing out of the box that can help us in the above scenario.
Tables TAB1 and TAB2 provide the correlated sets (SET1 and SET2):
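-- TAB1: 1,000,000 rows; COL1 is unique, COL2 cycles through 0-1023,
-- COL3 alternates between 0 and 1.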
create table tab1 as
with generator as (
select
rownum id
from dual
connect by
rownum <= 4000
)
select
id col1,
mod(id,1024) col2 ,
mod(id,2) col3
from (
select
/*+ no_merge */
rownum id
from
generator,
generator
where
rownum <= 1000000
)
;
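-- TAB2: same shape, but COL2 is offset by 134, so TAB2.COL2 IN (200,...,750)
-- selects exactly the same COL1 values as TAB1.COL2 IN (66,...,616).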
create table tab2 as
with generator as (
select
rownum id
from dual
connect by
rownum <= 4000
)
select
id col1,
mod(id,1024) + 134 col2 ,
mod(id,2) col3
from (
select
/*+ no_merge */
rownum id
from
generator,
generator
where
rownum <= 1000000
)
;
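As a quick sanity check: MOD(id,1024) + 134 maps 66 to 200, 166 to 300, and so on, so the COL1 values that survive the TAB1 filter are exactly the COL1 values that survive the TAB2 filter, and the following query should return no rows:
select col1 from tab1 where col2 in (66,166,316,416,516,616)
minus
select col1 from tab2 where col2 in (200,300,450,550,650,750);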
TAB3 is the table against which we will apply the filters driven by those correlated sets:
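-- TAB3: 16,000,000 rows (a 4000 x 4000 generator cross join; the ROWNUM
-- limit of 1,000,000,000 is never reached).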
create table tab3 as
with generator as (
select
rownum id
from dual
connect by
rownum <= 4000
)
select
id col1
from (
select
/*+ no_merge */
rownum id
from
generator,
generator
where
rownum <= 1000000000
)
;
Gather stats:
exec dbms_stats.gather_table_stats(NULL,'TAB1');
exec dbms_stats.gather_table_stats(NULL,'TAB2');
exec dbms_stats.gather_table_stats(NULL,'TAB3');
The query we are interested in is:
select
t3.col1
from
tab3 t3 ,
tab1 t1 ,
tab2 t2
where
t3.col1 = t1.col1
and t1.col2 in (66,166,316,416,516,616)
and
t3.col1 = t2.col1
and t2.col2 in (200,300,450,550,650,750)
and t1.col3 = t2.col3;
Please note that the sets that come from TAB1 and TAB2 are identical, so the clause
t3.col1 = t1.col1 and t3.col1 = t2.col1
eliminates significantly fewer records than the Oracle CBO estimates.
Due to the dependent/correlated set behavior described above, the CBO estimates that only 17 records will be returned by the query (see the DBMS_XPLAN.DISPLAY_CURSOR output below), even though the query actually returns 5861 records.
---------------------------------------------------------
| Id | Operation |Name |Rows |Bytes |Cost (%CPU)|
---------------------------------------------------------
| 0 | SELECT STATEMENT | | | |8182 (100) |
|* 1 | HASH JOIN | |17 |510 |8182 (4) |
|* 2 | TABLE ACCESS FULL |TAB2 |5859 |70308 |657 (4) |
|* 3 | HASH JOIN | |5872 |103K |7524 (4) |
|* 4 | TABLE ACCESS FULL |TAB1 |5859 |70308 |653 (4) |
| 5 | TABLE ACCESS FULL |TAB3 |16M |91M |6786 (2) |
---------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T3"."COL1"="T2"."COL1" AND "T1"."COL3"="T2"."COL3")
2 - filter(("T2"."COL2"=200 OR "T2"."COL2"=300 OR "T2"."COL2"=450 OR
"T2"."COL2"=550 OR "T2"."COL2"=650 OR "T2"."COL2"=750))
3 - access("T3"."COL1"="T1"."COL1")
4 - filter(("T1"."COL2"=66 OR "T1"."COL2"=166 OR "T1"."COL2"=316 OR
"T1"."COL2"=416 OR "T1"."COL2"=516 OR "T1"."COL2"=616))
Dynamic sampling, even at the maximum level, actually makes the estimate worse. Delete the statistics and raise the sampling level:
exec dbms_stats.delete_table_stats(NULL,'TAB1');
exec dbms_stats.delete_table_stats(NULL,'TAB2');
exec dbms_stats.delete_table_stats(NULL,'TAB3');
alter session set optimizer_dynamic_sampling=10;
---------------------------------------------------------
| Id | Operation |Name |Rows |Bytes |Cost (%CPU)|
---------------------------------------------------------
| 0 | SELECT STATEMENT | | | |8156 (100) |
|* 1 | HASH JOIN | |1 |91 |8156 (4) |
|* 2 | TABLE ACCESS FULL |TAB2 |5861 |223K |644 (3) |
|* 3 | HASH JOIN | |5861 |297K |7511 (4) |
|* 4 | TABLE ACCESS FULL |TAB1 |5861 |223K |640 (3) |
| 5 | TABLE ACCESS FULL |TAB3 |16M |198M |6786 (2) |
---------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T3"."COL1"="T2"."COL1" AND "T1"."COL3"="T2"."COL3")
2 - filter(("T2"."COL2"=200 OR "T2"."COL2"=300 OR "T2"."COL2"=450 OR
"T2"."COL2"=550 OR "T2"."COL2"=650 OR "T2"."COL2"=750))
3 - access("T3"."COL1"="T1"."COL1")
4 - filter(("T1"."COL2"=66 OR "T1"."COL2"=166 OR "T1"."COL2"=316 OR
"T1"."COL2"=416 OR "T1"."COL2"=516 OR "T1"."COL2"=616))
Note
-----
- dynamic sampling used for this statement (level=10)
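Before moving on, reset dynamic sampling to its default (level 2 in recent releases; the table statistics are re-gathered below in any case):
alter session set optimizer_dynamic_sampling=2;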
One way to resolve this is to force the execution order so that TAB3 is visited before TAB1 or TAB2, as sketched below.
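A minimal sketch with a LEADING hint (the hint is illustrative; a SQL plan baseline or SQL profile could serve the same purpose):
-- Drive from TAB3 so the misestimated final join cardinality cannot
-- mislead the join-order decisions made earlier in the plan.
select /*+ leading(t3 t1 t2) */
       t3.col1
from
       tab3 t3,
       tab1 t1,
       tab2 t2
where
       t3.col1 = t1.col1
and    t1.col2 in (66,166,316,416,516,616)
and    t3.col1 = t2.col1
and    t2.col2 in (200,300,450,550,650,750)
and    t1.col3 = t2.col3;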
Another solution is to create a virtual column identical to COL1:
alter table tab3 add col1_corr generated always as (col1*1) not null;
Next, gather stats:
exec dbms_stats.gather_table_stats(NULL,'TAB1');
exec dbms_stats.gather_table_stats(NULL,'TAB2');
exec dbms_stats.gather_table_stats(NULL,'TAB3');
Then set the number of distinct values for the new virtual column to 1:
exec dbms_stats.set_column_stats(NULL,'TAB3','COL1_CORR',distcnt=>1);
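You can confirm that the manual setting took effect by querying the dictionary; COL1_CORR should now report NUM_DISTINCT = 1:
-- Verify the manually set NDV on the virtual column
select column_name, num_distinct, virtual_column
from   user_tab_cols
where  table_name = 'TAB3';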
Finally, use the new virtual column (COL1_CORR), instead of COL1, in one of the join predicates:
select
t3.col1
from
tab3 t3 ,
tab1 t1 ,
tab2 t2
where
t3.col1_corr = t1.col1
and t1.col2 in (66,166,316,416,516,616)
and t3.col1 = t2.col1
and t2.col2 in (200,300,450,550,650,750)
and t1.col3 = t2.col3;
Now the CBO expects 2936 records, much better than the original estimate of 17 records.
---------------------------------------------------------
| Id | Operation |Name |Rows |Bytes |Cost (%CPU)|
---------------------------------------------------------
| 0 | SELECT STATEMENT | | | |8182 (100) |
|* 1 | HASH JOIN | |2936 |103K |8182 (4) |
|* 2 | TABLE ACCESS FULL |TAB1 |5859 |70308 |653 (4) |
|* 3 | HASH JOIN | |5872 |137K |7528 (4) |
|* 4 | TABLE ACCESS FULL |TAB2 |5859 |70308 |657 (4) |
| 5 | TABLE ACCESS FULL |TAB3 |16M |183M |6786 (2) |
---------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("COL1"*1="T1"."COL1" AND "T1"."COL3"="T2"."COL3")
2 - filter(("T1"."COL2"=66 OR "T1"."COL2"=166 OR "T1"."COL2"=316 OR
"T1"."COL2"=416 OR "T1"."COL2"=516 OR "T1"."COL2"=616))
3 - access("T3"."COL1"="T2"."COL1")
4 - filter(("T2"."COL2"=200 OR "T2"."COL2"=300 OR "T2"."COL2"=450 OR
"T2"."COL2"=550 OR "T2"."COL2"=650 OR "T2"."COL2"=750))
Setting the number of distinct values for COL1_CORR to 1 forces the CBO to believe that the clause on COL1_CORR will not eliminate any records. That is the only way (for me, anyway) to tell the optimizer that the COL1_CORR clause will not further reduce the number of records.
Interestingly, I get the same result for any value of distcnt …