Redis字典数据结构详解

Redis字典简介

Redis是一个高性能的键值对存储系统，其内部使用了多种数据结构来实现不同类型的值存储。字典（dict）是Redis中非常基础且重要的数据结构之一，它用于实现Redis的数据库，以键值对的形式存储数据。无论是普通的字符串键值对，还是哈希类型（hash）等，在底层都依赖字典结构来管理数据。

Redis字典的实现

Redis的字典是基于哈希表实现的。哈希表能够提供高效的查找、插入和删除操作，平均时间复杂度为O(1)。在Redis中，哈希表的实现包含了几个关键的数据结构：哈希表（dict.h/dictht）、哈希表节点（dict.h/dictEntry）和字典（dict.h/dict）。

哈希表结构（`dictht`）

在Redis的源码中，哈希表结构定义如下：

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

table：这是一个指向数组的指针，数组中的每个元素都是一个指向dictEntry结构体的指针，即哈希表节点。这个数组用于存储键值对数据。
size：哈希表的大小，也就是table数组的长度。
sizemask：掩码，值为size - 1。在计算哈希值对应的数组索引时，会使用哈希值与sizemask进行按位与操作，来确定键值对在哈希表中的存储位置。
used：记录哈希表中已使用的节点数量，即当前哈希表中存储的键值对个数。

哈希表节点结构（`dictEntry`）

哈希表节点用于存储具体的键值对数据，其定义如下：

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

key：指向键的指针，键可以是任意类型的数据，在Redis中通常是字符串。
v：这是一个联合体，用于存储值。值可以是指针类型（void *val），也可以是整数（uint64_t u64、int64_t s64）或浮点数（double d）类型，这样设计是为了在不同场景下节省内存空间。
next：指向下一个哈希表节点的指针，用于解决哈希冲突。当不同的键计算出相同的哈希值时，通过链表将这些冲突的节点串联起来。

字典结构（`dict`）

字典结构将哈希表和一些辅助操作函数整合在一起，其定义如下：

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx;
    unsigned long iterators;
} dict;

type：指向dictType结构体的指针，该结构体定义了一系列操作函数，用于处理不同类型的键值对，例如计算哈希值、比较键、释放键和值等操作。
privdata：私有数据指针，用于传递一些与特定类型键值对相关的附加数据，供dictType中的函数使用。
ht[2]：包含两个哈希表，通常情况下只使用ht[0]来存储数据。在进行哈希表扩展或收缩（rehash）操作时，会用到ht[1]。
rehashidx：记录当前rehash操作的进度。当rehashidx为 -1 时，表示当前没有进行rehash操作；否则，它表示正在进行rehash，并且指示当前正在从ht[0]的哪个索引位置开始移动数据到ht[1]。
iterators：记录当前正在使用字典迭代器的数量，用于防止在迭代字典的过程中进行rehash操作，避免数据不一致。

哈希算法与哈希冲突处理

哈希算法

Redis使用MurmurHash2算法来计算键的哈希值。MurmurHash2是一种快速且性能良好的哈希算法，能够产生高质量的哈希分布，减少哈希冲突的概率。在Redis中，通过dict.c/dictGenHashFunction函数来调用MurmurHash2算法。

// 计算哈希值的函数
static unsigned int dictGenHashFunction(const void *key, int len) {
    return dictHashFunction(key, len);
}
// 实际调用MurmurHash2算法的函数
unsigned int dictHashFunction(const void *key, int len) {
    return dictGenMurmurHash(key, len);
}

计算出哈希值后，通过与哈希表的sizemask进行按位与操作，得到键值对在哈希表中的存储位置：

// 根据哈希值计算索引位置
static unsigned int dictHashKey(dictht *ht, const void *key) {
    return dictGenHashFunction(key, dictKeyLen(ht, key)) & ht->sizemask;
}

哈希冲突处理

当不同的键计算出相同的哈希值时，就会发生哈希冲突。Redis采用链地址法（separate chaining）来处理哈希冲突。在哈希表的每个数组位置上，不是直接存储键值对，而是存储一个链表头指针。当发生冲突时，新的键值对节点会被插入到链表的头部。

// 插入新节点到哈希表
dictEntry *dictAddRaw(dict *d, const void *key, dictEntry **existing) {
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    // 选择当前使用的哈希表
    ht = dictIsRehashing(d)? &d->ht[1] : &d->ht[0];

    // 计算哈希值和索引位置
    index = _dictKeyIndex(d, key, dictHashKey(ht, key), existing);
    if (index == -1) return NULL;

    // 分配新节点内存
    entry = zmalloc(sizeof(*entry));
    // 将新节点插入到链表头部
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    return entry;
}

这种处理方式使得哈希表在面对哈希冲突时，能够在不改变数组大小的情况下，将冲突的节点组织起来，保证数据的完整性。同时，在查找键值对时，虽然哈希冲突会导致需要遍历链表，但由于哈希表的负载因子（used / size）通常控制在合理范围内，链表的长度一般不会太长，查找性能仍然能够保持在可接受的水平。

字典的操作

插入操作

在Redis中，向字典插入键值对的主要函数是dictAdd。该函数首先会检查键是否已经存在于字典中，如果存在则根据设置决定是否覆盖值。如果键不存在，则会计算键的哈希值，找到对应的哈希表位置，并将新的键值对节点插入到链表头部。

int dictAdd(dict *d, const void *key, void *val) {
    dictEntry *entry = dictAddRaw(d, key, NULL);
    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

dictAddRaw函数负责查找插入位置并创建新的节点，dictSetVal函数则负责设置节点的值。

删除操作

删除操作通过dictDelete函数实现。该函数首先根据键计算哈希值，找到对应的哈希表位置，然后在链表中查找要删除的节点。找到节点后，将其从链表中移除，并释放相关的内存。

int dictDelete(dict *d, const void *key) {
    dictEntry *he, *prevHe;
    unsigned int h;

    if (d->ht[0].used == 0 && d->ht[1].used == 0) return DICT_ERR;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    h = dictHashKey(d, key);
    he = d->ht[0].table[h & d->ht[0].sizemask];
    prevHe = NULL;

    while (he) {
        if (dictCompareKeys(d, key, he->key)) {
            dictEntry *nextHe = he->next;
            if (prevHe == NULL)
                d->ht[0].table[h & d->ht[0].sizemask] = he->next;
            else
                prevHe->next = he->next;
            dictFreeKey(d, he);
            dictFreeVal(d, he);
            zfree(he);
            d->ht[0].used--;
            return DICT_OK;
        }
        prevHe = he;
        he = he->next;
    }

    return DICT_ERR;
}

在删除节点时，需要注意维护哈希表的used计数器，确保其准确反映当前哈希表中的键值对数量。

查找操作

查找操作通过dictFind函数实现。该函数根据键计算哈希值，找到对应的哈希表位置，然后在链表中查找目标键值对。如果找到，则返回对应的dictEntry指针；否则，返回NULL。

dictEntry *dictFind(dict *d, const void *key) {
    dictEntry *he;
    unsigned int h;

    if (d->ht[0].used == 0 && d->ht[1].used == 0) return NULL;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    h = dictHashKey(d, key);
    he = (dictIsRehashing(d)? d->ht[1].table : d->ht[0].table)[h & (dictIsRehashing(d)? d->ht[1].sizemask : d->ht[0].sizemask)];

    while (he) {
        if (dictCompareKeys(d, key, he->key))
            return he;
        he = he->next;
    }

    return NULL;
}

在查找过程中，如果当前正在进行rehash操作，需要根据rehashidx的状态在合适的哈希表中进行查找。

Rehash机制

Rehash的原因

随着数据的不断插入和删除，哈希表的负载因子会发生变化。当负载因子过高（例如超过1）时，哈希冲突的概率会显著增加，导致性能下降。此时，需要对哈希表进行扩展（expand），即增加哈希表的大小，重新分配内存，并将原哈希表中的数据重新计算哈希值并插入到新的哈希表中，这个过程称为rehash。另一方面，当负载因子过低（例如小于0.1）时，为了节省内存空间，可以对哈希表进行收缩（shrink），减少哈希表的大小。

Rehash的过程

分配新的哈希表：根据当前哈希表的状态决定是扩展还是收缩。如果是扩展，新哈希表的大小为原哈希表大小的2倍；如果是收缩，新哈希表的大小为原哈希表大小的1/2。在dict结构体的ht[1]中分配新的哈希表空间。

// 分配新的哈希表
int _dictExpandIfNeeded(dict *d) {
    // 检查是否需要扩展
    if (dictIsRehashing(d) || d->ht[0].used < d->ht[0].size) return DICT_OK;

    // 计算新的哈希表大小
    unsigned long size = d->ht[0].size? d->ht[0].size * 2 : DICT_HT_INITIAL_SIZE;

    // 分配新的哈希表空间
    if (size > DICT_MAX_MEMORY / sizeof(dictEntry *))
        return DICT_ERR;

    dictht n;
    _dictInit(&n, size);

    // 将新哈希表赋值给ht[1]
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

迁移数据：将ht[0]中的数据逐步迁移到ht[1]中。在迁移过程中，每次处理一个索引位置上的链表，将链表中的所有节点重新计算哈希值并插入到ht[1]的合适位置。这个过程不是一次性完成的，而是在每次字典操作（插入、删除、查找等）时，通过_dictRehashStep函数进行一小步迁移，以避免在rehash过程中长时间阻塞其他操作。

// 执行一步rehash操作
void _dictRehashStep(dict *d) {
    if (d->iterators > 0) return;
    if (!dictIsRehashing(d)) return;

    // 迁移一个索引位置上的链表
    while (d->ht[0].used && d->rehashidx < d->ht[0].size) {
        dictEntry *he, *nextHe;

        if (d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            continue;
        }

        he = d->ht[0].table[d->rehashidx];
        while (he) {
            nextHe = he->next;
            unsigned int h = dictHashKey(d, he->key) & d->ht[1].sizemask;
            he->next = d->ht[1].table[h];
            d->ht[1].table[h] = he;
            d->ht[0].used--;
            d->ht[1].used++;
            he = nextHe;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    // 完成rehash后，将ht[1]赋值给ht[0]，释放ht[1]的空间
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
    }
}

完成rehash：当ht[0]中的所有数据都迁移到ht[1]后，将ht[1]赋值给ht[0]，并释放ht[1]的空间，同时将rehashidx设置为 -1，表示rehash操作完成。

渐进式Rehash

渐进式Rehash的优点

如果在rehash过程中一次性将所有数据从旧哈希表迁移到新哈希表，可能会导致Redis在一段时间内无法处理其他客户端请求，影响系统的响应性能。渐进式rehash通过将rehash操作分散到多次字典操作中，每次只迁移一小部分数据，避免了长时间的阻塞，保证了系统的正常运行。

渐进式Rehash的实现

在字典的每个操作函数中（如dictAdd、dictDelete、dictFind等），都会调用_dictRehashStep函数，执行一步rehash操作。同时，在执行字典操作时，如果当前正在进行rehash，需要根据rehashidx的状态，在ht[0]和ht[1]中正确查找和处理数据。

// 插入操作中调用rehash步骤
int dictAdd(dict *d, const void *key, void *val) {
    dictEntry *entry = dictAddRaw(d, key, NULL);
    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

// dictAddRaw函数中调用rehash步骤
dictEntry *dictAddRaw(dict *d, const void *key, dictEntry **existing) {
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    // 选择当前使用的哈希表
    ht = dictIsRehashing(d)? &d->ht[1] : &d->ht[0];

    // 计算哈希值和索引位置
    index = _dictKeyIndex(d, key, dictHashKey(ht, key), existing);
    if (index == -1) return NULL;

    // 分配新节点内存
    entry = zmalloc(sizeof(*entry));
    // 将新节点插入到链表头部
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    return entry;
}

这样，在处理客户端请求的过程中，字典会逐步完成rehash操作，既保证了数据的一致性，又不会对系统性能造成太大影响。

代码示例

下面通过一个简单的C语言示例代码，展示如何使用Redis字典结构的部分功能。假设我们有一个简化版的字典实现，只包含插入和查找操作。

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// 定义哈希表节点
typedef struct dictEntry {
    char *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

// 定义哈希表
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long used;
} dictht;

// 定义字典
typedef struct dict {
    dictht ht[2];
    int rehashidx;
} dict;

// 初始化哈希表
void _dictInit(dictht *ht, unsigned long size) {
    ht->table = (dictEntry **)malloc(size * sizeof(dictEntry *));
    if (!ht->table) {
        perror("malloc");
        exit(1);
    }
    ht->size = size;
    ht->used = 0;
    for (unsigned long i = 0; i < size; i++) {
        ht->table[i] = NULL;
    }
}

// 初始化字典
void dictInit(dict *d) {
    _dictInit(&d->ht[0], 16);
    _dictInit(&d->ht[1], 0);
    d->rehashidx = -1;
}

// 计算哈希值
unsigned long dictHashFunction(const char *key, unsigned long len) {
    unsigned long hash = 5381;
    for (unsigned long i = 0; i < len; i++) {
        hash = ((hash << 5) + hash) + key[i];
    }
    return hash;
}

// 插入键值对
int dictAdd(dict *d, const char *key, void *val) {
    unsigned long h;
    dictht *ht;
    if (d->rehashidx != -1) {
        // 进行rehash操作
        // 这里简单模拟，只迁移一个索引位置的数据
        dictEntry *he, *nextHe;
        ht = &d->ht[0];
        he = ht->table[d->rehashidx];
        while (he) {
            nextHe = he->next;
            h = dictHashFunction(he->key, strlen(he->key)) & d->ht[1].size;
            he->next = d->ht[1].table[h];
            d->ht[1].table[h] = he;
            d->ht[0].used--;
            d->ht[1].used++;
            he = nextHe;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
        if (d->ht[0].used == 0) {
            free(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictInit(&d->ht[1], 0);
            d->rehashidx = -1;
        }
    }

    ht = d->rehashidx == -1? &d->ht[0] : &d->ht[1];
    h = dictHashFunction(key, strlen(key)) & ht->size;
    dictEntry *entry = (dictEntry *)malloc(sizeof(dictEntry));
    if (!entry) {
        perror("malloc");
        return 0;
    }
    entry->key = strdup(key);
    entry->val = val;
    entry->next = ht->table[h];
    ht->table[h] = entry;
    ht->used++;
    return 1;
}

// 查找键值对
void* dictFind(dict *d, const char *key) {
    unsigned long h;
    dictht *ht;
    ht = d->rehashidx == -1? &d->ht[0] : &d->ht[1];
    h = dictHashFunction(key, strlen(key)) & ht->size;
    dictEntry *he = ht->table[h];
    while (he) {
        if (strcmp(he->key, key) == 0) {
            return he->val;
        }
        he = he->next;
    }
    return NULL;
}

int main() {
    dict d;
    dictInit(&d);

    dictAdd(&d, "key1", (void *)"value1");
    dictAdd(&d, "key2", (void *)"value2");

    void *val = dictFind(&d, "key1");
    if (val) {
        printf("Found value: %s\n", (char *)val);
    } else {
        printf("Key not found\n");
    }

    return 0;
}

在这个示例中，我们实现了一个简单的字典结构，包括初始化、插入和查找操作。同时，模拟了渐进式rehash的过程，在插入操作中，如果检测到正在进行rehash，会执行一小步迁移操作。虽然这个示例与Redis的实际字典实现相比非常简化，但能够帮助理解字典的基本工作原理。

总结

Redis的字典数据结构是其实现高效键值对存储的核心。通过基于哈希表的设计，结合链地址法处理哈希冲突以及渐进式rehash机制，Redis能够在保证高性能的同时，适应数据量的动态变化，有效管理内存空间。深入理解Redis字典的实现原理，对于优化Redis的使用、解决性能问题以及进行定制化开发都具有重要意义。无论是在缓存应用、实时数据分析还是其他场景中，掌握Redis字典的内部机制都能让开发者更好地发挥Redis的优势。