An initiative of Mobiliar IT

Digging for the perfect In Memory Query – Part 1

Last week I held a webinar for the German Oracle User Group (DOAG) about our journey here @Mobiliar with Oracle In Memory (you may find the recordings of the webinar in German here). Five years have passed, and we are still only using the In Memory option for our Risk Management Application (RICO). You can find more information about our journey in the presentation I held (DOAG_INMEMORY_Mobiliar).

But why only one database? Well, it’s a question of priorization and effort needed to find the fitting application for it. First you have to look closer at the database, find the most executed queries, look at the execution plans and then try to imagine which queries could benefit from Oracle In Memory. This obviously takes time and concentration, and normally, in the daily craziness, you do not have the time for it. After you have found the queries you’ll get the performance benefits very fast. You only have to put the objects (tables, partition or even columns) in the object store and begin enjoying. It is about two years now that I’ve been thinking about how to identify the applications that can profit from In Memory, with the lowest effort possible in searching for them. As I wrote in another article we have built our own performance warehouse here at Mobiliar: we have a centralized copy of all queries, all execution plans and execution statistics of all our databases over a period of five years. Could it really be so difficult?

I’ve invested some time to prepare a demo for the webinar and I will now describe the findings during the preparation of this demo and how our performance warehouse can be involved by calling the first step to find out objects that would fit to be put in memory, basing on executed queries.

I took a big table from our RICO database and tried to execute a query on it, that -theoretically – should be perfect for In Memory.

This table called TPA9490 has 378 columns (!) and every column is a numeric field.
Values are well distributed within the table. So that if we tried to put this table In Memory and perform a sum on a column with a filtering, we should certainly profit from the In Memory option.

Let’s have a look at the column store containing the table: We can see that the table is completely in the column store and has a very high compression ratio: 3 GB in the column store, 128 GB on Disk. That’s very good!

SQL> select SEGMENT_NAME, INMEMORY_SIZE/1024/1024/1024  in_memory_giga, 
BYTES/1024/1024/1024 as disk_giga, INMEMORY_COMPRESSION, populate_status 
from v$im_segments where segment_type = 'TABLE' order by 1;
SEGMENT_NA IN_MEMORY_GIGA  DISK_GIGA INMEMORY_COMPRESS POPULATE_STAT

---------- -------------- ---------- ----------------- -------------

TPA9490        2,92462158 127,828751 FOR CAPACITY HIGH COMPLETED

So if we try on the column C96624 and issue the following query, we immediately see
that Oracle is performing the operation towards the In Memory column store with incredible performance.

SQL> select sum(C96624) from  AOO_RICO.TPA9490 where C864BSAJ > 100;

SUM(C96624)

-----------

 11083

Elapsed: 00:00:00.05

plan1

So far, so good. To be sure, I tried this with another column of the table:

select /*+parallel(8)*/  sum(C96624)from AOO_RICO.TPA9490 where c96624 > 1;

SUM(C96624)

-----------

6,0517E+11

Elapsed: 00:00:09.26

I got 9 seconds???

I immediately had a look at the execution plan and was astonished as I noticed that the query is using an index

plan2.2

Why did the optimizer decide to use this index? How is the index defined?

SQL> select index_name, column_name from dba_ind_columns 
where INDEX_NAME = 'I_XPA94902';

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
I_XPA94902           C96624

The index is defined only on one column, the column we are querying, C96624.

SQL> select segment_name, bytes/1024/1024/1024 
from dba_segments where segment_name = 'I_XPA94902' ;

SEGMENT_NA BYTES/1024/1024/1024
---------- --------------------
I_XPA94902           2,24707031

And on disk it occupies a bit less than the whole table in the column store.

Indeed an index on only the selected column sounds very sexy to the optimizer. Nevertheless, a Full Index Scan may lack in performance, for example because it will perform I/O if the index does not reside totally in the buffer cache.
A SQL_AWR report helps us to identify the wait events on this query. As we imagined the operation is slow due to I/O – 300’000 Disk Reads.

SQL_AWR1

Well, we could not start tuning in the conventional way, put the index in the Keep Pool in order to avoid I/O at all (keep in mind: You will reserve 2,2 GB for the index in the Buffer Cache compared to 2.9 GB for the whole table in the column store). Avoiding I/O at all, we broke down the execution time to 2.36 seconds, performing 300’000 Buffer Gets.

SQL_AWR2

But what would have happened without the index?
I tried to set the Index invisible and execute the query again.

SQL> alter index aoo_rico.I_XPA94902 invisible;

The result this time is less than a second (!), and Oracle is querying the column store.

Please note that the cost of the operation raised, from 11’128 with the Index Fast Full Scan to 268’000 with the In Memory Plan. Quite a big difference! A much higher cost for the In Memory operation, but In Memory is about 100 times faster compared to the Full Index Scan when performing I/O and 30 times faster compared to the Full Index Scan without I/O.

SQL> select /*+  monitor parallel(8)*/  sum(C96624) 
from  AOO_RICO.TPA9490 where c96624 > 6000;

SUM(C96624)
-----------
1,3056E+10
Elapsed: 00:00:00.15

plan3

Very interesting – the AWR_SQL Report shows only 800 Buffer Gets!!!

SQL_AWR3

So the first question that I ask myself is:

Why does the optimizer think that performing a Full Index Scan is much cheaper than performing the operation directly on the column store?

I cannot answer the question at this time. Maybe it works as designed, maybe we have a misconfiguration in our database parameters or maybe it is a bug. As we are a reference customer for In Memory I will talk about this to the In Memory Product Management. This will be exciting!

The next question I ask myself is: Do we really need an index on a single column if this column is In Memory?

I changed the index again to visible and executed another, more selective query, forcing an Index Range Scan rather than a Full Index Scan

SQL> alter index aoo_rico.I_XPA94902 visible;
SQL> select /*+  monitor parallel(8)*/  sum(C96624) 
      from  AOO_RICO.TPA9490 where c96624 > 6000;

SUM(C96624)
-----------
1,3056E+10

Elapsed: 00:00:00.58

 

plan4

Indeed the execution time was less than a second. Again I show you the comparison with the In Memory Option after having set the index to invisible again:

SQL> alter index aoo_rico.I_XPA94902 invisible;

Index altered.

Elapsed: 00:00:00.03

SQL> select /*+  monitor parallel(8)*/  sum(C96624) 
from  AOO_RICO.TPA9490 where c96624 > 6000;

SUM(C96624)
-----------
1,3056E+10

Elapsed: 00:00:00.15

 

plan5

The column store is a bit faster than the Index Range Scan. That’s good, as we do not see any degradation with the index set to invisible on the query.

Next question: Could we drop the index? The application could benefit on the queries performing the full index scan, and on dml operations the database would not have to maintain the index, either.

To decide wether to drop the index or not, we first have to answer other questions: In which other query is the index used? Would other queries degrade?
Can we determine all the queries in our database were the query is used?

We can monitor the index with the monitoring usage or with the new index usage tracking feature in 12.2,  but the only information we collect with this is if the index is used or not, we learn nothing about the query itself that uses the index. And we would like to have a look at the queries.

We could look in our AWR and search in the dba_hist_sql_plan to find out the plans and the sql_ids the index is used in. But normally AWR covers only a short period of time, what about the monthly or even the yearly proceedings? Could we have a performance degradation here if we dropped the index? Can we risk that?

Fortunately we have a performance warehouse here at Mobiliar, containing the whole execution plans of all the queries over all instances of the last 5 years.

We issue the following query on our performance warehouse: the inner query searches for all sql ids in the plans containing our index. The outer query shows us the first characters of the sql text.

select sql_id, to_char(substr(sql_text, 1,300)) 
from aoo_awr.dba_hist_sqltext 
where sql_id in (
                 select /*+ parallel(8) */ distinct a.sql_id  
                  from aoo_awr.dba_hist_sql_plan a 
                   where a.object_name = 'I_XPA94902');

 

sqls

 

Exactly 442 distinct sql ids were recorded in our warehouse and as we can see from the screenshot above, most of the queries follow the same pattern. It will not be very difficult
to find out what would be the consequences of dropping this index.

Summarizing, the next steps we will perform will be:

Step 1: Talk to the Oracle In Memory product management and find out what they think about the optimizer costs for In Memory scans compared to Index Scans.

Step2: Have a look at the statements using the index and probably drop it.

Step3: Try to scan our performance warehouse to identify queries that could fit In Memory and see if the assumption of substituting indexes with memory scans could be a good approach.

More in the next article, so stay tuned!

 

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Basic HTML is allowed. Your email address will not be published.

Subscribe to this comment feed via RSS

%d bloggers like this: