MySQL Hash Index vs. B-Tree Index (Comparison of B-Tree and Hash Indexes)
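In my previous post I mentioned that MySQL's B-tree and hash indexes differ, but I didn't go into detail. Having some free time today, I looked up the answer in the official MySQL documentation. After reading it, here is my summary of the differences between the two: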
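First, here are Wikipedia's descriptions of the two data structures (this material is also covered in a standard Data Structures and Algorithms course):
What is a B-tree:
In computer science, a B-tree is a self-balancing tree that keeps its data sorted. It supports searches, sequential access, insertions, and deletions in logarithmic time. Broadly speaking, a B-tree generalizes the binary search tree: a node can have more than two children. Unlike self-balancing binary search trees, B-trees suit storage systems that read and write relatively large blocks of data, such as disks. A B-tree reduces the number of intermediate steps needed to locate a record, which speeds up access. The structure is well suited to external storage and is commonly used to implement databases and file systems.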
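The following Python calculation makes "fewer intermediate steps" concrete. It is idealized and assumes every node is full. Each level visited divides the remaining search space by the node's fanout, so the number of levels is the smallest h with fanout^h ≥ n. The fanout values 100 and 1000 are illustrative assumptions, not measured InnoDB page fanouts.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= n_keys (idealized: every node is full).

    Each level of the tree splits the remaining search space `fanout` ways,
    so h levels can distinguish up to fanout**h keys.
    """
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


for fanout in (2, 100, 1000):
    # 2 -> 30 levels, 100 -> 5 levels, 1000 -> 3 levels for a billion keys
    print(fanout, levels_needed(10**9, fanout))
```

A binary tree needs about 30 levels to narrow a billion keys down to one, while a wide B-tree node needs only a handful. On disk, each level can mean a separate block read, which is why wide nodes pay off.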
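What is a hash data structure:
A hash table is a data structure that accesses a location in memory directly from a key. It computes a function of the key that maps the requested data to a position in the table, and this speeds up lookups. The mapping function is called the hash function, and the array that stores the records is called the hash table.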
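An everyday example: to find someone's number in a phone book, you could build a table sorted by the first letter of each name (that is, a function that maps a name x to its initial F(x)). Looking up the surname "Wang" (王) under the letter W is clearly much faster than searching the whole book. Here the name is the key, "take the first letter" is the rule of the hash function F(), and the table of initials corresponds to the hash table. In principle, both the key and the function can be chosen freely.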
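The phone-book analogy can be sketched as a toy hash table in Python. This is a minimal illustration, not how MySQL's MEMORY engine hashes keys. F(x) is "take the first letter", and names are romanized in pinyin (so 王 becomes "Wang"). The class, names, and phone numbers are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


def first_letter(name: str) -> str:
    # The "hash function" F(x) from the example: map a name to its first letter.
    return name[0].upper()


class PhoneBook:
    """Toy hash table: one bucket per first letter. Names that share a letter
    (a collision) are kept together in that bucket's list."""

    def __init__(self) -> None:
        self.buckets: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, name: str, number: str) -> None:
        self.buckets[first_letter(name)].append((name, number))

    def lookup(self, name: str) -> Optional[str]:
        # Go straight to one bucket, then scan only the few entries inside it.
        for stored_name, number in self.buckets.get(first_letter(name), []):
            if stored_name == name:
                return number
        return None


book = PhoneBook()
book.add("Wang Wei", "555-0101")
book.add("Wu Lei", "555-0102")
book.add("Li Na", "555-0103")
print(book.lookup("Wang Wei"))  # prints 555-0101
```

"Wang Wei" and "Wu Lei" collide in bucket W. The lookup still only examines that one bucket instead of the whole book.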
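In summary:
- A hash structure is unordered and key-value based, which makes it very efficient for exact (equality) matches. As a result, a hash index in MySQL cannot be used for pattern matching such as LIKE 'aaa%'.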
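- A B-tree keeps its data ordered, and B-tree is the default index type in MySQL. Because the data is ordered, MySQL can use a B-tree index for range matching with the =, >, >=, <, <=, and BETWEEN operators. It also supports LIKE pattern matching, but only when the wildcard is at the end (for example LIKE 'xxx%'). A pattern with a leading wildcard, such as LIKE '%xxx%', cannot use the index.
- Having an index does not guarantee that MySQL will use it. During query optimization, MySQL estimates how many rows it would have to read, and if a full table scan looks faster, it skips the index. Adding a LIMIT that fetches only a few rows can make MySQL use the index again.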
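The following Python sketch illustrates the summary. It models a hash index as a dict and a B-tree index as a sorted list searched with binary search (the standard `bisect` module). The sorted list stands in for the ordered leaf level of a B-tree. It is an assumption for illustration, not MySQL's implementation, and the sample keys are hypothetical.

```python
import bisect
from typing import Dict, List, Optional

# Hypothetical key_col values mapped to row ids.
rows = {"aaa1": 1, "aab": 2, "aaa2": 3, "abc": 4, "zzz": 5}

# Hash-style index: average O(1) equality lookups, but no key order to exploit.
hash_index: Dict[str, int] = dict(rows)

# B-tree-style index: keys kept sorted, so ranges and prefixes are contiguous runs.
sorted_keys: List[str] = sorted(rows)


def find_equal(key: str) -> Optional[int]:
    # key_col = 'x': the hash index answers this directly.
    return hash_index.get(key)


def find_range(low: str, high: str) -> List[str]:
    # low <= key_col < high: binary-search both ends, return the run between them.
    i = bisect.bisect_left(sorted_keys, low)
    j = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[i:j]


def find_prefix(prefix: str) -> List[str]:
    # LIKE 'prefix%': binary-search to the first candidate, then walk while it matches.
    i = bisect.bisect_left(sorted_keys, prefix)
    result = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        result.append(sorted_keys[i])
        i += 1
    return result


def find_prefix_with_hash(prefix: str) -> List[str]:
    # Without key order the hash index cannot help: every key is examined (a full scan).
    return sorted(k for k in hash_index if k.startswith(prefix))
```

A pattern like '%aa%' has no fixed starting point in either structure, so both would fall back to examining every key.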
Below is the explanation from the official MySQL documentation. It is clear and concise and includes examples. Original link: https://dev.mysql.com/doc/refman/5.7/en/index-btree-hash.html
8.3.8 Comparison of B-Tree and Hash Indexes
Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine that lets you choose B-tree or hash indexes.
B-Tree Index Characteristics
A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character. For example, the following SELECT statements use indexes:
SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';
In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.
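The rewrite of a LIKE pattern into a key range can be sketched in Python. This is a simplification under stated assumptions. It compares strings by code point and ignores escape characters and collation rules, which MySQL does honor. It also does not handle a prefix whose last character is already the maximum code point. The function name `like_to_range` is hypothetical.

```python
from typing import Optional, Tuple


def like_to_range(pattern: str) -> Optional[Tuple[str, str]]:
    """Return (low, high) such that every match satisfies low <= value < high,
    or None when the pattern starts with a wildcard (no index range possible).

    Simplified: code-point ordering, no escape characters, no collations.
    """
    prefix_chars = []
    for ch in pattern:
        if ch in "%_":  # stop at the first wildcard
            break
        prefix_chars.append(ch)
    prefix = "".join(prefix_chars)
    if not prefix:
        return None
    # Upper bound: bump the last character of the constant prefix by one code point.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high
```

For 'Patrick%' the sketch yields ('Patrick', 'Patricl'), and for 'Pat%_ck%' it yields ('Pat', 'Pau'). These match the ranges described above. The rest of the pattern is then checked only against rows inside that range.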
The following SELECT statements do not use indexes:
SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE other_col;
In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.
If you use ... LIKE '%string%' and string is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then uses this pattern to perform the search more quickly.
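The documentation does not show MySQL's Turbo Boyer-Moore code. The sketch below therefore uses the simpler Boyer-Moore-Horspool variant, a related but different algorithm, to illustrate the idea. "Initializing the pattern" means building a skip table from the search string once. The search then uses that table to jump ahead by several characters after a mismatch instead of one.

```python
def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: precompute a bad-character shift table from the
    pattern, compare right to left, and on a mismatch skip ahead by the shift
    for the text character aligned with the pattern's last position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Pattern "initialization": distance from each character's last occurrence
    # (excluding the final position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Characters not in the pattern allow a full jump of m positions.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

Longer patterns tend to produce larger skips, which is one reason a preprocessing step like this pays off for longer search strings.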
A search using col_name IS NULL employs indexes if col_name is indexed.
Any index that does not span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.
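The leftmost-prefix rule for a single AND group can be sketched in Python. This is a simplified model. It only considers columns compared with =, whereas MySQL can also use one range condition on the column right after the equality prefix. The function name is hypothetical, and the column names mirror the documentation's own examples.

```python
from typing import List, Set


def usable_prefix(index_cols: List[str], eq_cols: Set[str]) -> List[str]:
    """Columns of a composite index usable for one AND group.

    eq_cols is the set of columns compared with '=' in that group. The usable
    part stops at the first index column the group does not constrain.
    """
    prefix = []
    for col in index_cols:
        if col not in eq_cols:
            break
        prefix.append(col)
    return prefix


idx = ["index_part1", "index_part2", "index_part3"]
print(usable_prefix(idx, {"index_part1", "index_part3"}))  # ['index_part1']
```

An empty result means the group does not touch the index's first column, so the index cannot be used for that group.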
The following WHERE clauses use indexes:
... WHERE index_part1=1 AND index_part2=2 AND other_column=3
/* index = 1 OR index = 2 */
... WHERE index=1 OR A=10 AND index=2
/* optimized like "index_part1='hello'" */
... WHERE index_part1='hello' AND index_part3=5
/* Can use index on index1 but not on index2 or index3 */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
These WHERE clauses do not use indexes:
/* index_part1 is not used */
... WHERE index_part2=1 AND index_part3=2
/*Index is not used in both parts of the WHERE clause*/
... WHERE index=1 OR A=10
/* No index spans all rows*/
... WHERE index_part1=1 OR index_part2=10
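The OR rule in these examples can be checked mechanically. Write the WHERE clause in disjunctive normal form (an OR of AND groups) and require every group to constrain the index's first column. The Python sketch below is a simplified model under that assumption. It ignores MySQL's index merge optimization, which can sometimes combine several different indexes to answer an OR. The function name is hypothetical.

```python
from typing import List, Set


def index_usable(index_cols: List[str], or_groups: List[Set[str]]) -> bool:
    """or_groups: the WHERE clause in disjunctive normal form, each element being
    the set of columns tested with '=' in one AND group. A single index can serve
    the whole clause only if every group constrains the index's first column."""
    first = index_cols[0]
    return all(first in group for group in or_groups)


# ... WHERE index=1 OR A=10 AND index=2  ->  groups {index} and {A, index}
print(index_usable(["index"], [{"index"}, {"A", "index"}]))  # True
# ... WHERE index=1 OR A=10  ->  the second group never mentions index
print(index_usable(["index"], [{"index"}, {"A"}]))  # False
```

Applied to index1=1 AND index2=2 OR index1=3 AND index3=3, only index1 appears in both groups. That matches the documentation's note that index1 can be used but index2 and index3 cannot.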
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.
Hash Index Characteristics
Hash indexes have somewhat different characteristics from those just discussed:
- They are used only for equality comparisons that use the = or <=> operators (but are very fast). They are not used for comparison operators such as < that find a range of values. Systems that rely on this type of single-value lookup are known as "key-value stores"; to use MySQL for such applications, use hash indexes wherever possible.
- The optimizer cannot use a hash index to speed up ORDER BY operations. (This type of index cannot be used to search for the next entry in order.)
- MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a MyISAM or InnoDB table to a hash-indexed MEMORY table.
- Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)
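The last two points (no ordered scans, whole keys only) can be contrasted in a Python sketch with a hypothetical two-column key (last_name, first_name). A dict hashes the whole tuple, while a sorted list models a B-tree index on the same columns. Both structures and the sample data are illustrative assumptions.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # hypothetical composite key (last_name, first_name)

keys: List[Key] = [("Wang", "Wei"), ("Li", "Na"), ("Zhang", "San"), ("Wang", "Fang")]

# Hash-style index: the whole (last_name, first_name) tuple is hashed as one unit.
hash_index: Dict[Key, int] = {key: row_id for row_id, key in enumerate(keys)}

# B-tree-style index: keys stored sorted by last_name, then first_name.
btree_index: List[Key] = sorted(keys)


def lookup_whole_key(last: str, first: str) -> Optional[int]:
    # Works only when every key column is supplied.
    return hash_index.get((last, first))


def lookup_leftmost_prefix(last: str) -> List[Key]:
    # (last,) sorts before every (last, first), so bisect lands on the first match.
    # All keys sharing the prefix are contiguous and already in ORDER BY order.
    i = bisect.bisect_left(btree_index, (last,))
    result = []
    while i < len(btree_index) and btree_index[i][0] == last:
        result.append(btree_index[i])
        i += 1
    return result
```

Looking up only ("Wang",) in the dict would miss, because that tuple was never hashed. The sorted list finds both Wang rows and returns them already in key order.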