Redis 字典重点内容的代码实现示例

Redis 字典概述

Redis 字典是 Redis 中一种非常重要的数据结构，它被广泛应用于 Redis 的各种数据类型的底层实现，例如数据库、哈希表等。字典提供了一种键值对（key - value）的存储方式，允许快速地根据键来查找对应的值。

从本质上来说，Redis 字典是基于哈希表实现的。哈希表是一种以数组为基础的数据结构，它通过哈希函数将键映射到数组的某个位置，从而实现快速的查找。在理想情况下，哈希表的查找、插入和删除操作的时间复杂度接近 O(1)。

Redis 字典的数据结构

1. 哈希表（`dict.h/dictht`）

Redis 中的哈希表结构定义如下：

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

table：是一个指向 dictEntry 指针数组的指针，数组中的每个元素都是一个指向 dictEntry 结构体的指针，实际的键值对数据就存储在 dictEntry 结构体中。
size：表示哈希表的大小，即 table 数组的长度。
sizemask：size - 1 的值，用于在计算哈希值时，通过与哈希值进行按位与操作，将哈希值映射到 table 数组的有效索引范围内。
used：表示哈希表中已经使用的节点数。

2. 哈希表节点（`dict.h/dictEntry`）

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

key：存储键，是一个 void * 类型的指针，这使得 Redis 可以存储任意类型的键。
v：是一个联合体，用于存储值。这种设计可以在存储不同类型的值时节省内存空间，val 指针用于存储一般的指针类型值，u64 和 s64 分别用于存储无符号 64 位整数和有符号 64 位整数，d 用于存储双精度浮点数。
next：是一个指针，用于链接哈希冲突时的下一个节点，通过链地址法解决哈希冲突。

3. 字典（`dict.h/dict`）

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx;
    unsigned long iterators;
} dict;

type：指向一个 dictType 结构体的指针，该结构体定义了一系列操作哈希表的函数，例如计算哈希值、比较键等。

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

privdata：是一个指针，用于传递给 dictType 中定义的函数作为私有数据。
ht：是一个包含两个 dictht 结构体的数组，通常情况下，只有 ht[0] 用于存储数据，ht[1] 仅在 rehash 时使用。
rehashidx：记录当前 rehash 的进度，如果值为 -1，表示当前没有进行 rehash 操作。
iterators：记录正在使用的迭代器数量，用于确保在迭代过程中字典不会被修改。

哈希函数

Redis 使用 MurmurHash2 算法作为默认的哈希函数，该算法具有良好的性能和哈希分布特性。以下是 Redis 中使用的 MurmurHash2 哈希函数的简化实现：

// MurmurHash2, 64-bit versions, by Austin Appleby
// Note - This code makes a few assumptions about how your machine behaves -
// 1. We can read a 4-byte value from any address without crashing
// 2. sizeof(int) == 4
// 3. We can write a 4-byte value to any address without crashing
// 4. Unsigned arithmetic does not overflow

// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.
#define M 0x5bd1e995
#define R 24

// Hash a 64-bit value.
// NOTE: This function requires that your machine has a 64-bit type.
// If you want to support 32-bit machines, you'll need to use a different
// function.
unsigned long long murmurhash64A(const void *key, const int len, const unsigned long long seed) {
    const unsigned long long m = 0xc6a4a7935bd1e995ULL;
    const int r = 47;
    unsigned long long h = seed ^ (len * m);
    const unsigned long long *data = (const unsigned long long *)key;
    const unsigned long long *end = data + (len / 8);

    while (data != end) {
        unsigned long long k = *data++;
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
    }

    const unsigned char *data2 = (const unsigned char *)data;

    switch (len & 7) {
        case 7:
            h ^= (unsigned long long)data2[6] << 48;
        case 6:
            h ^= (unsigned long long)data2[5] << 40;
        case 5:
            h ^= (unsigned long long)data2[4] << 32;
        case 4:
            h ^= (unsigned long long)data2[3] << 24;
        case 3:
            h ^= (unsigned long long)data2[2] << 16;
        case 2:
            h ^= (unsigned long long)data2[1] << 8;
        case 1:
            h ^= (unsigned long long)data2[0];
            h *= m;
    }

    h ^= h >> r;
    h *= m;
    h ^= h >> r;

    return h;
}

在 Redis 字典中，通过 dictType 结构体中的 hashFunction 指针来调用这个哈希函数，以计算键的哈希值。

哈希冲突的解决

当不同的键通过哈希函数计算得到相同的哈希值时，就会发生哈希冲突。Redis 使用链地址法来解决哈希冲突。在链地址法中，哈希表数组的每个元素都是一个链表头指针，当发生哈希冲突时，新的节点将被插入到链表的头部。

例如，假设有两个键 key1 和 key2，它们通过哈希函数计算得到的哈希值相同，都映射到哈希表数组的 table[i] 位置。此时，key1 和 key2 对应的 dictEntry 节点将通过 next 指针链接起来，形成一个链表。

插入操作

Redis 字典的插入操作步骤如下：

计算键的哈希值，通过哈希值和 sizemask 确定该键值对在哈希表中的索引位置。
检查该位置是否已经有节点，如果没有，则直接在该位置创建新的 dictEntry 节点并插入键值对。
如果该位置已有节点，说明发生了哈希冲突，通过 next 指针将新节点插入到链表头部。
插入成功后，更新哈希表的 used 字段，表示哈希表中已使用节点数加 1。

以下是简化的插入操作代码示例：

int dictAdd(dict *d, const void *key, const void *val) {
    int index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    // 计算哈希值并确定索引位置
    index = _dictKeyIndex(d, key);
    if (index == -1) return DICT_ERR;

    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    // 复制键值
    entry->key = dict->type->keyDup(dict->privdata, key);
    entry->v.val = dict->type->valDup(dict->privdata, val);

    return DICT_OK;
}

查找操作

查找操作是 Redis 字典中最为常用的操作之一。其步骤如下：

计算键的哈希值，通过哈希值和 sizemask 确定该键在哈希表中的索引位置。
从该索引位置开始，遍历链表，逐个比较节点的键，直到找到匹配的键或链表结束。
如果找到匹配的键，则返回对应的值；否则返回 NULL。

以下是简化的查找操作代码示例：

dictEntry *dictFind(dict *d, const void *key) {
    dictEntry *he;
    unsigned int h, idx, table;

    if (d->ht[0].used == 0) return NULL; /* 哈希表为空 */

    if (dictIsRehashing(d)) _dictRehashStep(d);

    h = dict->type->hashFunction(key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while (he) {
            if (dict->type->keyCompare(d->privdata, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return NULL;
}

删除操作

删除操作需要先找到要删除的节点，然后将其从链表中移除，并释放相关的内存。步骤如下：

计算键的哈希值，确定在哈希表中的索引位置。
遍历链表找到要删除的节点，并记录其前驱节点。
调整链表指针，将前驱节点的 next 指针指向要删除节点的后继节点。
释放要删除节点的键和值所占用的内存，然后释放节点本身。
更新哈希表的 used 字段，表示哈希表中已使用节点数减 1。

以下是简化的删除操作代码示例：

int dictDelete(dict *d, const void *key) {
    uint64_t h;
    dictEntry *he, *prevHe;
    int table;

    if (d->ht[0].used == 0) return DICT_ERR;

    h = dict->type->hashFunction(key);
    for (table = 0; table <= 1; table++) {
        he = d->ht[table].table[h & d->ht[table].sizemask];
        prevHe = NULL;
        while (he) {
            if (dict->type->keyCompare(d->privdata, key, he->key)) {
                if (prevHe == NULL)
                    d->ht[table].table[h & d->ht[table].sizemask] = he->next;
                else
                    prevHe->next = he->next;
                dict->type->keyDestructor(d->privdata, he->key);
                dict->type->valDestructor(d->privdata, he->v.val);
                zfree(he);
                d->ht[table].used--;
                return DICT_OK;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return DICT_ERR;
}

Rehash

随着数据的不断插入和删除，哈希表的负载因子（used / size）会发生变化。当负载因子过高（超过一定阈值，Redis 默认是 1）或过低（小于一定阈值，Redis 默认是 0.1）时，为了保持哈希表的性能，需要进行 rehash 操作。

rehash 操作的步骤如下：

为 ht[1] 分配空间，其大小是 ht[0] 的两倍（负载因子过高时）或一半（负载因子过低时）。
将 ht[0] 中的所有键值对重新计算哈希值并插入到 ht[1] 中。
释放 ht[0] 的空间，并将 ht[1] 赋值给 ht[0]，同时为 ht[1] 重新分配一个空的哈希表。
将 rehashidx 设置为 -1，表示 rehash 操作完成。

为了避免在 rehash 过程中对性能产生过大影响，Redis 采用渐进式 rehash 的方式，即每次操作（如插入、删除、查找）时，只进行一小步 rehash 操作，逐步将 ht[0] 中的数据迁移到 ht[1] 中。

以下是渐进式 rehash 操作的部分代码示例：

int _dictRehashStep(dict *d) {
    if (!dictIsRehashing(d)) return 0;

    // 每次迁移一个桶
    while (d->ht[0].used != 0) {
        dictEntry *de, *nextde;
        unsigned int bucket;

        // 找到下一个非空桶
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        bucket = d->rehashidx;

        de = d->ht[0].table[bucket];
        while (de) {
            nextde = de->next;
            unsigned int h = dict->type->hashFunction(de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[bucket] = NULL;
        d->rehashidx++;
        if (d->ht[0].used == 0) {
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 1;
        }
        // 一次只迁移一个桶，避免阻塞
        break;
    }
    return 0;
}

总结

通过深入了解 Redis 字典的代码实现，我们明白了它是如何基于哈希表高效地实现键值对存储的。从哈希函数的选择到哈希冲突的解决，再到插入、查找、删除和 rehash 等操作的具体实现，每一个细节都体现了 Redis 在追求高性能和高效内存利用方面的设计理念。掌握这些内容，对于深入理解 Redis 的底层机制以及优化基于 Redis 的应用程序具有重要意义。无论是在数据库存储、缓存应用还是消息队列等场景中，对 Redis 字典的理解都能帮助开发者更好地使用 Redis 提供的强大功能。同时，在实际开发中，我们也可以借鉴 Redis 字典的设计思路，设计出适合特定应用场景的高效键值对存储结构。