Jump to content

Talk:Bitmap index

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by GregorB (talk | contribs) at 15:06, 10 June 2009 (Dubious: Re). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject iconDatabases (inactive)
WikiProject iconThis article is within the scope of WikiProject Databases, a project which is currently considered to be inactive.

Something to note?

Should we note that bitmap indexes are good for equality operations in WHERE clauses, but not so good for less than or greater than operations? - Ta bu shi da yu 15:45, 10 November 2007 (UTC)[reply]

Huh, in what way aren't they "good" for < or >? -- intgr [talk] 23:20, 10 November 2007 (UTC)[reply]
"Bitmap indexes are also not suitable for columns that are primarily queried with less than or greater than comparisons. For example, a salary column that usually appears in WHERE clauses in a comparison to a certain value is better served with a B-tree index. Bitmapped indexes are only useful with equality queries, especially in combination with AND, OR, and NOT operators." Oracle Concepts document. Reread what they do and I'm sure you'll work out why this is the case. - Ta bu shi da yu 08:42, 11 November 2007 (UTC)[reply]
Bitmap indexes can be generated for any logical operation; e.g. you can have a bitmap index for the expression 'salary > 10000' and every time you query WHERE salary > 10000, this bitmap will be useful. I have no idea why Oracle's documentation singles out equality operations as the only use case for bitmap indexes — it isn't. -- intgr [talk] 21:12, 11 November 2007 (UTC)[reply]
I suppose that it's to do with a higher cardinality. My understanding of bitmap indexes is that the higher the cardinality (or percentage of unique elements) then the less efficient in storing the data in the index. Could be wrong. - Ta bu shi da yu 02:05, 12 November 2007 (UTC)[reply]
Ok I see where's the confusion. When the Oracle documentation talks of bitmap indexes, they actually imply that there will be a separate bitmap for every possible value in an indexed column; for example, one for gender='M', another for gender='F' — for a cardinality of two, 2 bitmaps will be created. When you perform either of these queries, the planner recognizes these expressions and uses the appropriate one of the indexes.
In a similar way, you could create bitmap indexes for two expressions salary > 10000, salary > 20000, and there would be just two bitmaps. However, to use this approach, you have to create indexes for both of the expressions manually, because the DBMS cannot know which expressions your queries will be using. And if the query expression doesn't exactly match any of the index expressions, no indexes can be used. -- intgr [talk] 11:34, 12 November 2007 (UTC)[reply]
Ah, I follow. Maybe this is something we should add to the article. - Ta bu shi da yu 12:19, 12 November 2007 (UTC)[reply]

Oracle uses BBC to compress bitmap index, it stores each compressed bitmap for each distinct value. This kind of bitmap doesn't support range queries well. Chan had introduced range or interval encoding schemes for range queries and two-sided range queries, which can improve range query performance. In the above example," A>10000", if using range encoding, it only needs two bitmap to answer that query. In oracle it may have to use 10000 bitmaps, so it is slow.

zhuo wang —Preceding unsigned comment added by 218.25.35.20 (talk) 07:43, 10 June 2008 (UTC)[reply]

Oracle's compressed bitmap indexes in fact works very well even when the indexed column has many distinct values. This is one of the main points of reference # 1.

Compressed bitmap indexes are very efficient for range queries such as "A > 10000", reference #4 and [1] give proofs in terms of theoretical analysis and timing measurements. In fact, the timing measurements published there are exclusively on range queries (not equality queries). Even though it might involve ORing a lot of bitmaps to answer such a query, each bitmap must be very small (after compression) in this case. Since these bitwise logical OR operations are very efficiently, the answer can be computed quickly.

A historical note: in Dr. O'Neil's paper on Model 204, Query 3 involves a range condition "Ownername > 'Z'" -- even the first commercial implementation of bitmap index did not shy away from range conditions. -- user:oaf2 21:02, 26 August 2008 (UTC)[reply]

Usage

See Prof. Lemire's comment on the usability of bitmap indices. --Ragib (talk) 15:01, 20 August 2008 (UTC)[reply]

I just read the article by Vivek Sharma that Daniel Lemire linked to in his post. I think that anyone who is editing this page would do well to read it. Dtunkelang (talk) 15:19, 20 August 2008 (UTC)[reply]

Dubious

For a table with n columns, the total number of distinct indexes to satisfy all possible queries is n!

Not that it matters much, but shouldn't this be 2n-1? Set of all possible indexes is the set of all column subsets, i.e. a power set (minus one, because you can't have a zero-column index). GregorB (talk) 15:16, 15 April 2009 (UTC)[reply]

You also have to consider the order of the columns. According to the PostgreSQL documentation about multicolumn indexes:
The query planner can use a multicolumn index for queries that involve the leftmost column in the index definition plus any number of columns listed to the right of it, without a gap. For example, an index on (a, b, c) can be used in queries involving all of a, b, and c, or in queries involving both a and b, or in queries involving only a, but not in other combinations.
This means that, to be able to answer every possible query using indexes, you need one for each possible column permutation, that is, n! --Pezezin (talk) 21:30, 5 May 2009 (UTC)[reply]


Order of columns is relevant not only for PostgreSQL. You cannot easily use an index that accesses columns A,B,C (in that order) to execute a query that uses B and C: http://richardfoote.wordpress.com/2008/03/10/index-skip-scan-does-index-column-order-matter-any-more-warning-sign/ --194.139.55.62 (talk) 05:52, 12 May 2009 (UTC)[reply]

You're correct, e.g. SQL Server and Oracle will use the index for any leading set of columns. However, my line of thinking was this: a WHERE clause can constrain an n-column query result in at most 2n-1 ways. For each of these WHERE clauses you need exactly one index, covering all the columns used by the clause. So, even if databases could not use leading sets of columns (i.e. even if your WHERE clause columns had to match precisely those in the index), 2n-1 indexes would still be enough. GregorB (talk) 15:06, 10 June 2009 (UTC)[reply]