The table is the index and the index is the table
Sorry, could you repeat that?
It took me some time and reading to fully understand that, and, even more, to see the benefits of it. After trying them out, and sometimes stubbornly insisting on understanding all elements behind it (i guess our DBA is starting slowly to grow white hair because of me) , i now have a good understanding of them, how they work, when should they be used, what are the advantages and disadvantages over regular heap tables, and so on.
Although a lot of information has sum up after reading all kind of articles and books, a lot of what i am going to write below is based on personal opinions, benchmarks, tests that i have performed, and even more “try and err”
What is an IOT – IOT vs HEAP TABLES
In order to understand this one, you need to understand once more the basic principle behind a heap organized table. Imagine the heap table as an empty bus waiting for its passengers. Each of the passengers will take place in the first empty place, so that the last place, in the tail of the bus will be seated only after the last passenger has embarked the bus. Every new passenger gets to be seated to the end of the tail. This is how HEAP tables work. New data will be inserted in the tail, and no logic organization of data is being taken care of.
Imagine a family of two, where the mother boards the bus first, and the child is the last to board the bus. Now, in order to control the tickets of the family, you will need to visit all rows of the bus. If the bus had 10 lines of seats, and you only needed to control the tickets for this family, you would need to visit all 10 lines.
The simple solution for the problem above is of course building an index containing some sort of family id (of course an index is not suitable for such a small population, but let’s just use this for now) Now, in order to get to the members of the family, you need to visit the index and extract the seat numbers. Only then you can go to each seat number and control the ticket.
Let’s do a sum up of the steps needed: go to the index (root + branch + block = 3 I/O); extract the row id’s from the index; go to the seats (2 I/O, one for each member) – that makes a total of 5 I/O’s
Now, let us imagine that instead of the bus, we have a transatlantic 747, with rows of up to 9 people. Also imagine this family had 9 members. Spreading them all over the plane would be quite costly (heap organized table), therefore you try organizing them altogether, placing them all in a row. This is how IOT works.
IOT organizes the data logically, grouping data belonging to the same PK (primary key) into the same block (or several adjacent blocks) This way, getting to the family will actually only need getting to the row. The whole family is there, on the row, and hey…it’s even placed in ascending birthday order!
Let’s do a sum up of the steps needed: go to the index (root + branch + block = 3 I/O); … that’s it. The data IS STORED in the index, but ORGANIZED after the primary key. All data is stored in one block so you only need to do one I/O to read the data. So instead of 5 I/O (like in the heap organized table), you need only two. Additionally, you save the extra table I/O by ROW ID, which costs much more compared to the index I/O (all data is stored in the index, getting to the index means getting to the data)
What is two compared to five you would ask…it’s a huge difference if we use the same family model, and we are talking about a 40 members family.With an IOT, depending on the size of the data stored you could still have the chance to read ALL data from one block. Using the heap table, you could do up to (worse case scenario) 40 block reads!!!
Summary – IOT vs HEAP Tables:
In a Heap organized table, data comes wherever there is place, at the end of the table. Getting to the data means getting to the index, extracting the row id’s, going to the table.
In an Index Organized Table, the data is stored in a “group by” fashion, based on the primary key. Getting to the data means getting to the index. There is no need for an extra I/O on the table, since all data is in the index.
Now that we have an understanding of how the two tables work, i’ll point some directions on :
- when to use them
- when to not use them
- things to take care of when working with them
- advantages and hints
As i have tested with a lot of options, i will exemplify where needed with different explain plans, statistics, etc.
When to use Index Organized Tables
I think the starting point of this topic should be Christian Antognini’s summary of when to use IOT: Use an IOT only when:
- Getting to the data can be done exclusively using the primary key
- Storing a row in the IOT does not need more than 50 % of the block size
These two points should be your starting point when thinking about changing to IOT. IF and ONLY IF the above is true, following points could be considered:
- When you have a classic m:n parent child relationship, and the cardinality of values for the child is high. In order to understand this, imagine an online transaction system, where the ORCL ticker is updated on a secondly basis, with the newest put and pull prices.Imagine the following structure:
SYMBOL(id,name) – PK(id) ( this is the parent )
VALUES(pk_id,hour,pull_value,put_value,timestamp) – PK(pk_id,hour) ( this is the child)
Having the ORCL symbol updated on a secondly basis means having 3600 inserts for each hour. Getting to the data would be in this case very straightforward:
SELECT * FROM VALUES WHERE PK_ID=:1 AND HOUR=:2
Now, if the block size would allow you to store 3600 records in a block, you could get to all of the data (most optimistic scenario) in only 3 I/O’s: go to root -> go to branch -> read the block
Organizing the data this way guarantees a low level of I/O, depending on the number of records related to one value of the primary key
- When joining m:n parent/child tables
A nested join between two tables goes like this: for each record of the leading table, the second table will be searched. If the search can be performed by reading a single block, the total numbers of I/O’s necessary will be considerably reduced
- When you need to retrieve the data sorted by the primary key
Since the records are sorted ascending (this is how a B-Tree index works) by the primary key, the retrieval of data will be already sorted. It is a benefit that a lot of people tend to oversee. That will surely change once one truly knows what an ORDER BY clause means (from a performance point of view on the database) More to this in the points to follow
- When you know that your business pattern is inserting data in a scattered manner (primary keys not close to each other)
Inserting (in a relatively high speed scenario) of records having primary keys close to each other is highly unlikely (example: a lot of users, registered in the system at very far points from each other, inserting data at the same time. Since the id’s of the users are very scattered, the chance of writing the data in the same blocks is very low)
- Extracting “ranges” of data – Using ranges on the primary key
The IOT is great for range scans, since the most costly operation is getting to the first key. Because all data is at leaf level, there is no need to return to the branch level in order to get to the next values, therefore, again, saving I/O
When not to use Index Organized Tables
- When you need to sort the data.
Although this is a very fast structure, the “order by” clause will still need to do a full traverse of the index tree, therefore doing a FULL SCAN. A full scan is a sequential process, which will read blocks one after the other. We cannot do “bulk, multi block” reading, since we need the data to be sorted
- When building matrix type parent:child relations (primary keys that are built over multiple columns)
Building a structure like P(m,n):C(n,p) – where P is the leading (parent) table, and C is the child table, will surely provide no benefit from this if the intention is to speed up the join over the two tables by doing the join over the index. Why? The reason is quite simple: the keys m,n will almost never match the keys n,p. It is like trying to join the following pairs:
You are loosing a lot of resource for finding a match where there is no match to be found, since for each value of the leading index a scan will be performed in the child index.
- When inserting and updating in the same transaction in a OLTP type system, where the primary key is additionally built using monotonic ascending values (ex:sequence). Updating right after inserting will aquire a latch on the buffer cache, holding a lock on the block, preventing other sessions from inserting new data in the same block (buffer busy waits)
What to watch out for when working with Index Organized Tables
- Using the suffix without the prefix on a IOT as predicate will do a full scan.
The data is organized after the primary key, and the primary key is composed of (prefix,suffix) In the IOT table, the suffix values are actually grouped into the prefix value, so the key is organized in this case after prefix. Supposing we had 5 columns, of which we wanted three of them to be in the primary key, this is how the structure will look like:
IOT_TABLE(col1,col2,col3,col4,col5) ; PK(col1,col2,col3)
Trying to find the data using a statement like:
SELECT * FROM IOT_TABLE WHERE col3=:1
will generate a full scan, since col3 is a suffix only. On the other hand, using a statement like:
SELECT * FROM IOT_TABLE WHERE col2=:1 will perform a fast full scan, therefore taking advantage of multi block reads. A better way to get to the values would be
SELECT * FROM IOT_TABLE WHERE col1=:1 and col3=:2
- Additional indexes in an Index Organized Table
Although allowed, this is not recommended. That is because the primary key values will actually be stored in the index as a replacement for the row id, and guessing will be used instead of direct access. The guessing can go wrong if data is being moved in the index over time, since the logical guess is based on a “pointer to block” guess system. If the data is not found in the guessed block, a full process is started to find the data (going again right from the top)
- Storing large data in an Index Organized Table
This is actually one of the only reasons that i find valid for statements like “IOT is harder to maintain”. I do think that moving larger data when splitting the index costs more, but that is the only additional cost that i can think of compared to a normal table. The IOT is using the same B-Tree structure and algorithms.
Advantages of Index Organized Tables
Data is delivered sorted, since it is already organized in a sorted manner, after the primary key. No explicit sorting is needed anymore (which would require the costly, sequential, I/O consuming FULL SCAN)
Since the table is stored in an index structure, there is no need for an index AND a table, so only a segment is reserved for the index. The table is the index and the index is the table
Since the data is only stored once, in the index, the access is done strictly with I/O on a single structure (only one segment)
Data is organized in a logically manner, after the primary key. Reading ALL data for one PK value will in the best case scenario take only one I/O Block Read operation, since all the data is logically and physically stored in one single block
- BUFFER CACHE USAGE
Since less blocks are needed to read the data, the burden on the buffer cache is sensitively reduced (less space to be allocated in the buffer cache for reading the data)
Tips and hints on using Index Organized Tables
- LOWER BLOCK SIZES
Use lower block sizes for OLTP systems, where inserting data patterns reflect close to each other Primary Key values (imagine the first 10 users registered in the system, with id’s of 1 to 10, permanently inserting data) This way you will reduce block contention, since the blocks will be filled sooner, and wait times to write in the same block will be reduced.
Do that either by altering the block size at table space level, or by increasing the PCTFREE parameter (which defines the maximum level of free space on the block, when inserting is not allowed no more, and a new block is allocated)
- Overflow large data
If you are working with large data, be sure to overflow it at DML level by indicating ORACLE to store it in a separate TABLESPACE, other than the one for the IOT table. This is very useful when working with LOB data
In the next post (depending on the time, since i am behind with some other promised ”follow up” posts, i will update the real world scenario, with “before and after” examples, based on my tryouts and modifications. There will be both negative and positive impact examples, where i will provide details, explain plans, sql statistics, etc., going back to the points described in this article