C語言實現一個簡易的Hash table(6)

雜湊表 C語言 · 發表 2019-01-27 22:48:58

摘要：上一章中，我們實現了 Hash表中的插入、搜尋和刪除介面，我們在初始化 hash表時固定了大小為 53 ，為了方便擴充套件，本章將介紹如何修改 hash表的大小。設定 Hash表大小現在，我們的 hash表是固定大小( 53 )的，當插...

上一章中，我們實現了 Hash表 中的 插入 、 搜尋 和 刪除 介面，我們在初始化 hash表 時固定了大小為 53 ，為了方便擴充套件，本章將介紹如何修改 hash表 的大小。

設定 `Hash表` 大小

現在，我們的 hash表 是固定大小( 53 )的，當插入越來越多資料時，我們的 hash表 就會被插滿，這個問題有兩個原因：

雜湊表的效能隨著高衝突率而降低
我們的'hash表'只能儲存固定數量的記錄，如果我們儲存更多，將無法插入資料

為了減少 hash表 被插滿的情況發生，當插入很多資料時，我們可以增大 hash表 的大小， hash表 中的 count 屬性代表已經插入的資料條數，在每次插入和刪除時，我們計算表的“負載”，或插入的數量和總的大小的比率，如果它高於或低於某些值，我們會減小或擴大 hash表 的大小。

我們定義如下規則：

如果 負載>0.7 ，就 擴大
如果 負載<0.1 ，就 縮小

要調整大小，我們建立一個大約是當前大小的一半或兩倍的新雜湊表，並將所有未刪除的項插入其中。

我們的新 hash表 大小應該是大約是當前大小的兩倍或一半的素數，找到新的 hash表 大小並非易事。為了確定 hash表 的大小，我們現設定一個最基本的大小，然後將實際大小定義為大於基本大小的第一個素數。擴大時，我們先將基本大小加倍，找到第一個更大的素數，然後作為 hash表 的大小，縮小時，我們將大小減半並找到下一個更大的素數。

我們先從基本大小 50 開始，我們使用最簡單粗暴的方法通過檢查每個連續數是否為素數來查詢下一個素數。這個簡單粗暴的方法看起來不是很理想，但是我們實際需要檢查的值很少，並且花費的時間超過了重新散列表中每個專案所花費的時間。

首先，我們先定義一個函式用來找到下一個素數， prime.h 和 prime.c 的內容如下：

// prime.h
int is_prime(const int x);
int next_prime(int x);

// prime.c
#include <math.h>
#include "prime.h"

/*
 * Return whether x is prime or not
 *
 * Returns:
 *1- prime
 *0- not prime
 *-1 - undefined (i.e. x < 2)
 */
int is_prime(const int x) {
if (x < 2) { return -1; }
if (x < 4) { return 1; }
if ((x % 2) == 0) { return 0; }
for (int i = 3; i <= floor(sqrt((double) x)); i += 2) {
if ((x % i) == 0) {
return 0;
}
}
return 1;
}

/*
 * Return the next prime after x, or x if x is prime
 */
int next_prime(int x) {
while (is_prime(x) != 1) {
x++;
}
return x;
}

下一步，我們需要修改 ht_new 函式，使之可以在建立 hash表 時指定大小，為此我們要建立一個新的函式 ht_new_sized ，在 ht_new 中我們呼叫 ht_new_sized 並給我們的 hash表 一個預設大小：

// hash_table.c
static ht_hash_table* ht_new_sized(const int base_size) {
ht_hash_table* ht = xmalloc(sizeof(ht_hash_table));
ht->base_size = base_size;

ht->size = next_prime(ht->base_size);

ht->count = 0;
ht->items = xcalloc((size_t)ht->size, sizeof(ht_item*));
return ht;
}


ht_hash_table* ht_new() {
return ht_new_sized(HT_INITIAL_BASE_SIZE);
}

現在一切準備就緒。在我們的設定 hash表 大小函式中，我們需要檢查以確保我們沒有將雜湊表的大小減小到最小值以下，然後，我們初始化一個所需大小的新 hash表 ，原表中所有非 NULL 或者未被刪除的都會插入到新 hash表 中，然後我們在刪除舊的 hash 表之前將屬性賦值給新的 hash表 。

// hash_table.c
static void ht_resize(ht_hash_table* ht, const int base_size) {
if (base_size < HT_INITIAL_BASE_SIZE) {
return;
}
ht_hash_table* new_ht = ht_new_sized(base_size);
for (int i = 0; i < ht->size; i++) {
ht_item* item = ht->items[I];
if (item != NULL && item != &HT_DELETED_ITEM) {
ht_insert(new_ht, item->key, item->value);
}
}

ht->base_size = new_ht->base_size;
ht->count = new_ht->count;

// To delete new_ht, we give it ht's size and items 
const int tmp_size = ht->size;
ht->size = new_ht->size;
new_ht->size = tmp_size;

ht_item** tmp_items = ht->items;
ht->items = new_ht->items;
new_ht->items = tmp_items;

ht_del_hash_table(new_ht);
}

為了簡化設定大小，我們定義了兩個函式：

// hash_table.c
static void ht_resize_up(ht_hash_table* ht) {
const int new_size = ht->base_size * 2;
ht_resize(ht, new_size);
}


static void ht_resize_down(ht_hash_table* ht) {
const int new_size = ht->base_size / 2;
ht_resize(ht, new_size);
}

要執行調整大小，我們先檢查插入和刪除時 hash表 上的負載。如果它高於或低於0.7和0.1的預定義限制，我們分別調高或調低。

為了避免進行浮點運算，我們將計數乘以 100 ，並檢查它是高於還是低於 70 或 10 ：

// hash_table.c
void ht_insert(ht_hash_table* ht, const char* key, const char* value) {
const int load = ht->count * 100 / ht->size;
if (load > 70) {
ht_resize_up(ht);
}
// ...
}


void ht_delete(ht_hash_table* ht, const char* key) {
const int load = ht->count * 100 / ht->size;
if (load < 10) {
ht_resize_down(ht);
}
// ...
}

上一章：實現介面

下一章：附錄：替代碰撞處理

C語言實現一個簡易的Hash table(6)

設定 Hash表 大小

您可能也會喜歡…

設定 `Hash表` 大小