Clustering factor and unique key
Clustering Factor - A Awesome Simple Explanation of how it is calculated:
Basically, CF is calculated by performing a full index scan and looking at the rowid of each index record. If the table block is referenced different from the previous index entry, CF is incremented. If the reference block of the table is the same as the previous index, CF is not incremented. Thus, CF gives an indication of how well the data in the table is ordered with respect to the index entry (which is always sorted and stored in the index entry order). The better (lower) the CF, the more efficient it would be to use the index, since fewer table blocks would have to be accessed to get the required data through the index.
My index stats:
So here are my analyzes (index on one column).
The starting index PK_
is my primary key and UI
is the unique key. (Of course both have unique values)
Query1:
SELECT index_name,
UNIQUENESS,
clustering_factor,
num_rows,
CEIL((clustering_factor/num_rows)*100) AS cluster_pct
FROM all_indexes
WHERE table_name='MYTABLE';
Result:
INDEX_NAME UNIQUENES CLUSTERING_FACTOR NUM_ROWS CLUSTER_PCT
-------------------- --------- ----------------- ---------- -----------
PK_TEST UNIQUE 10009871 10453407 96 --> So High
UITEST01 UNIQUE 853733 10113211 9 --> Very Less
We can see that PK has the highest CF while the other unique index does not.
The only logical explanation that amazes me is that the data below is actually stored in column order above the Unique index.
1) Am I correct with this understanding?
2) Is there a way to give PK, the lowest CF
number?
3) Viewing the cost of a query using both of these indices is very fast for individual samples. But still, the CF number is what worries us.
The table is relatively huge at 10M records and also receives realtime inserts / updates.
My database version is Oracle 11gR2, over Exadata X2
source to share
You see the heap table data indexed by an ordered tree structure.
To get extremely low CF numbers, you will need to order the data according to the index. If you want to do this (for example SQL Server or Sybase clustered indexes), in Oracle you have several options:
- Just create additional indexes with additional columns that can suit your general queries. Oracle can return a result set from an index without a reference to the underlying table as long as all required columns are in the index. If possible, consider adding columns to the end of your PC to serve your heaviest query (practical if your query has a small number of columns). This is usually the recommended way to change all of your tables in the IOT.
- Use IOT (Indexed Ordered Table) is a table stored as an index, so it is ordered using a primary key.
- A sorted hash cluster is more complex, but can also be profitable when accessing a list of entries for a specific key (like a bunch of text messages for a given phone number).
- Reorganize your data and keep the records in the table in the order of your index. This option is ok if your data does not change and you just want to change the order of the heap, although you cannot explicitly control the order; all you can do is order a query and let Oracle add it to a new bucket.
If most of your access patterns are random (OLTP) for single record access, I wouldn't bother with the clustering factor. It's just a metric that is neither bad nor good, it just depends on the context and what you are trying to accomplish.
Always remember that Oracle issues are not SQL Server issues, so make sure any design change is justified by performance measurement. Oracle is very parallel and very weak. Its design with multiple concurrency versions is very efficient and different from other databases. However, it's still good practice to set up ordering data for sequential access if that's your common use case.
For some advice on the subject, read Ask Tom: What are oracle clusters and non-clustered indexes
source to share