How do database indexes work exactly?

Fuma

Executive Member
Joined
Jul 9, 2007
Messages
5,365
Reaction score
346
Location
Pretoria
I still can't understand the technical background of how indexes work really.

I was running this simple (maybe a little complex) script on these tables that have thousands of data and it took about 3 minutes to return values.
I put indexes on some of the fields I search by (WHERE CLAUSE) and it took less than a second to bring the data up.
So I just read trough what indexes an how they work, but technically how does it work?
Some people say too many indexes can hamper the database's performance.
 
It may have been the indexes as they do help hugely but sounds to me like you query could have been cached too.

How indexes work
 
Last edited:
It may have been the indexes as they do help hugely but sounds to me like you query could have been cached too.

How indexes work
I ran my query a couple of times (without indexes) and it took 3 minutes each time. I then created the indexes and it took less than a second. I removed the indexes and it went back to 3 minutes.
 
Index are good for search in terms of DB perfomance, but bad when doing lots of insert/updates.

Without going too deep, Index work like an index of a book, instead of searching through the whole pages of a book to find what you want, they only go to the exact page based on the index (your where clause).
hope this helps...
 
Indexes.....aaaah...very interesting subject and also very very complicated.

When you index a column on a table, the db stores the column + reference to the table row, in such a way that makes searching for a particular column's value very efficient.

So, when you table has an index on a column you are filtering on, the db will very quickly determine which rows you want. If you did not have an index, the db needs to scan through ALL the data to find the rows that match.

If indexing of data interests you, research the following:
- Big O notation (a simple notation to describe the "performance" of an algorithm related to the volume of data it works with)
- Binary search algorithm
-- VERY fast, but only works well with static data
- B Tree
- VERY fast, specifically designed to minimize the number of times dist reads are done
- B+ Tree
-- VERY fast, specifically designed to minimize the number of times dist reads are done with some more optimizations.

As an example, a binary search based index:
Complexity = O(log 2 N)
Searching for your answer through 1 million records requires at most 20 checks. Impressive eh ?


B/B+ trees are extremely complex and were specifically designed for databases, back in the day when data resided on very slow tape drives and disk "seeks" had to be kept at an absolute minimum.

Fasinating stuff :D
 
So, when you table has an index on a column you are filtering on, the db will very quickly determine which rows you want. If you did not have an index, the db needs to scan through ALL the data to find the rows that match.
OK how's scanning through an index any different from scanning through a regular db listing though? 0_o
I can haz not understand...
 
OK how's scanning through an index any different from scanning through a regular db listing though? 0_o
I can haz not understand...

Because the Index is sorted, a Binary Search can be performed on the Index.
 
See it like this:

An index is a special data structure, not a database table, that stores data in such a way that it is very efficient to find what you are looking for without having to search through all of the data.
As mentioned before, the simplest of these is a sorted index where you can use the binary search algorithm to find your data.

What the binary search algorithm does is:
Imagine you have a list of id numbers, sorted numerically from lowest to highest.
to find a specific id number you:
- start in the middle of this list and read the id number in the middle
- if the entry in the middle is less than your search value, you know that the lower half of the numbers simply cannot be one of the numbers you want
-- so right there you've eliminated half of the data out of your search, on the first check!
- moving forwards, you now choose a new mid point which will be 75% into the list (half way within the upper half of your list)
--where you will again compare one number and decide if you should continue searching on the lower quarter or upper quarter.

I found this definition of the algo on the interwebz:
"A searching algorithm which works on a sorted table by testing the middle of an interval, eliminating the half of the table in which the key cannot lie, and then repeating the procedure iteratively. "

Hope this helped.
 
See it like this:

An index is a special data structure, not a database table, that stores data in such a way that it is very efficient to find what you are looking for without having to search through all of the data.
As mentioned before, the simplest of these is a sorted index where you can use the binary search algorithm to find your data.

What the binary search algorithm does is:
Imagine you have a list of id numbers, sorted numerically from lowest to highest.
to find a specific id number you:
- start in the middle of this list and read the id number in the middle
- if the entry in the middle is less than your search value, you know that the lower half of the numbers simply cannot be one of the numbers you want
-- so right there you've eliminated half of the data out of your search, on the first check!
- moving forwards, you now choose a new mid point which will be 75% into the list (half way within the upper half of your list)
--where you will again compare one number and decide if you should continue searching on the lower quarter or upper quarter.

I found this definition of the algo on the interwebz:
"A searching algorithm which works on a sorted table by testing the middle of an interval, eliminating the half of the table in which the key cannot lie, and then repeating the procedure iteratively. "

Hope this helped.
Actually yes that helps a lot! Thanks!
But that begs the question, why don't databases sort and store binary by default?
 
Top
Sign up to the MyBroadband newsletter
X