Redis 1-1、字典数据结构

分享未结  0  1298

李延 LV6 2021-07-11

悬赏：20积分

字典就是我们日常在java中使用的map结构，以k-v键值对存储数据。

在redis中是使用最广泛的数据结构，redis的数据结构就是字典来做底层实现的，对于数据库的所有curd操作都是对字典的操作。

# 散列表数据结构说明

参考：http://data.biancheng.net/view/107.html

# redis 字典数据结构定义

所有关于字典数据结构的定义都在源码的在src/dict.h文件中

## 哈希表

在src/dict.h中：

```c

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

```

table是一个数组，每个元素都是dictEntry结构。

size 是记录table数组的大小。

used 记录当前字典的节点数。

sizemask 用于计算当前key应该存放到table的那个索引上，它的值总是size-1

## 哈希表节点
```c

typedef struct dictEntry {
    //键
    void *key;
    //值
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    //指向下一个节点的指针
    struct dictEntry *next;
} dictEntry;

```

每个dictEntry都存放着一个键值对，而且它是一个链表结构，next指向下一个节点

## 字典
```c

typedef struct dict {
    //不同类型特定的函数
    dictType *type;
    
    //私有数据（不太清楚）
    void *privdata;
    //hash表，有两个值，用于扩容
    dictht ht[2];
    // 记录扩容时的索引，当其为-1表示当前字典没有在扩容
    long rehashidx; 
} dict;

```

在dict中我们看到到有一个type属性。这个结构里面是不同的函数指针，对于不同的字典类型，都有各自不同的函数，比如：不同的计算hash方法、各自判断key相等的方法。

dictType

```c

typedef struct dictType {
    //计算hash方法
    uint64_t (*hashFunction)(const void *key);
    //复制key方法
    void *(*keyDup)(void *privdata, const void *key);
    //复制value方法
    void *(*valDup)(void *privdata, const void *obj);
    // 判断两个key相等的方法
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    //销毁key方法
    void (*keyDestructor)(void *privdata, void *key);
    //销毁value方法
    void (*valDestructor)(void *privdata, void *obj);
} dictType;
```

# redis 字典数据结构实现

在上面我们主要看了字典的数据结构定义，下面我们主要分析几个重要的方法。

## 添加

在dict.c 中dictAdd函数。

```c

int dictAdd(dict *d, void *key, void *val)
{
    // 添加k
    dictEntry *entry = dictAddRaw(d,key,NULL);
    //如果返回的entry为空说明添加失败
    if (!entry) return DICT_ERR;
    //为当前节点设置我们需要的值，这里不会创建新空间，还是直接引用之前的val
    dictSetVal(d, entry, val);
    //返回设置成功
    return DICT_OK;
}
```
在上面代码中参数有3个：dict数据结构、我们需要保存的key和value。

其中主要分为两个步骤：
1. 根据我们输入的key在dict结构中插入节点，并将新创建的节点dictEntry返回。
2. 将我们需要保存的value设置到新创建的dictEntry节点中。

我们具体看这两个步骤

### dictAddRaw
```c

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;
    //当前字典是否正在扩容，当标志位不为-1时，说明在扩容
    //当节点正在扩容是，将一个节点刷新至扩容后的数据中
    if (dictIsRehashing(d)) _dictRehashStep(d);

/* Get the index of the new element, or -1 if
     * the element already exists. */
    // 根据key的hash值计算出当前节点应该存放节点的位置
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

/* Allocate the memory and store the new entry.
     * Insert the element in top, with the assumption that in a database
     * system it is more likely that recently added entries are accessed
     * more frequently. */
    // 判断当前map是否正在刷新，如果在刷新说明需要将数据存放在扩容后的空间中
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    //分配节点空间
    entry = zmalloc(sizeof(*entry));
    //将最新添加的节点分配在链表的第一个位置
    entry->next = ht->table[index];
    ht->table[index] = entry;
    //将userd加一
    ht->used++;

/* Set the hash entry fields. */
    //设置节点的key，这里会调用type的 key dup函数
    dictSetKey(d, entry, key);
    //返回新添加的节点
    return entry;
}

```

首先代码判断当前是否正在刷新，如果在刷新会去执行_dictRehashStep方法。这块我们在扩容时说明

下面又通过_dictKeyIndex 方法计算出应该保存在table数组的索引位置

再下一步就是获取到dict结构，将我们新创建的节点插入的链表的第一个，其中。根据当前字典释放在刷新，会保存在不同的ht中。没有刷新时，会保存在第一个dict中，正在刷新时，会保存在扩容后的dict中。

### dictSetVal

前面我们看了节点插入的过程，而dictSetVal就是将我们需要保存的值设置到当前节点上

```c
#define dictSetVal(d, entry, _val_) do { \
    if ((d)->type->valDup) \
        (entry)->v.val = (d)->type->valDup((d)->privdata, _val_); \
    else \
        (entry)->v.val = (_val_); \
} while(0)
```
这块代码比较简单，就是一个负值的过程，但会根据我们之前传人type不同，会有不同的负值策略。

## 跟新

跟新采用的是dictReplace方法，当我们插入的key已经存在时，会替换原来旧的数据

```java

int dictReplace(dict *d, void *key, void *val)
{
    dictEntry *entry, *existing, auxentry;

/* Try to add the element. If the key
     * does not exists dictAdd will succeed. */
    //添加数据，如果entry不为null说明是新的节点直接添加
    entry = dictAddRaw(d,key,&existing);
    if (entry) {
        dictSetVal(d, entry, val);
        return 1;
    }

/* Set the new value and free the old one. Note that it is important
     * to do that in this order, as the value may just be exactly the same
     * as the previous one. In this context, think to reference counting,
     * you want to increment (set), and then decrement (free), and not the
     * reverse. */
    //设置新的值，并释放原来数据的值内存
    auxentry = *existing;
    dictSetVal(d, existing, val);
    dictFreeVal(d, &auxentry);
    return 0;
}
```

代码基本与之前add相同，唯一不同点就是调用dictAddRaw时，传人了existing指针，如果当前key已经存在，那么旧的节点会被负值给existing，我们就可以在下面代码中修改这个节点，实现数据跟新的功能。

## rehash机制

随着不断操作，哈希表的数据会增加或减少，为了维持哈希表维持在一个合理的范围内，就需要对已经保存的数据进行rehash操作。具体步骤如下：

1. 为ht[1]分配空间，如果是扩容，那么ht[1]是原来ht[0]的两倍
2. 将ht[0]上的数据全部负值到ht[1] 上，这里需要根据key的哈希值重新计算索引位置。
3. 当所有数据全部迁移完成后，将ht[1] 设置为ht[0],并释放原来数据的空间。

在java的hashmap中，rehash时是一次性的全部将原来的数据负值到新的空间中，这样会导致如果节点较多时，可能导致服务器在一定时间内停止服务。

而redis采用的是渐进式的方法，具体步骤如下：

1. 同样也是分配新的空间，此时让字典同时拥有ht0和ht1两个哈希表。
2. 将rehashidx设置为0表示正在进行刷新操作。
3. 而在刷新时，我们执行任何curd操作时，都会将ht0哈希表的一个索引数据复制到ht1上。并将rehashidx加一。
4. 随着每次操作的刷新，最后所有数据都会被复制到ht1上。

在add方法中我们看到有这样的判断：
```c
if (dictIsRehashing(d)) _dictRehashStep(d);
#define dictIsRehashing(d) ((d)->rehashidx != -1)

```
这一步就是在判断当前字典是否在刷新，如果在刷新就会去执行_dictRehashStep。

```c
/**
 * 将节点刷新至扩容后的空间中。
 * 每次刷新数组那个元素的链表的数据，
 *
 * redis扩容采用的是渐进式刷新，需要扩容时，会创建一个扩容后的空间，此时系统不会直接将所有数据全部复制到新的空间中，
 * 而是每次操作当前map时，会将一个节点刷新至新的空间中，直到所有数据全部刷新至扩容后的空间。
 *
 * 这也导致当map正在刷新时，对于数据的curd都需要对两个空间同时进行操作
 *
 *
 * @param d
 * @param n
 * @return
 */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;
    //循环n次，每次属性一个链表数据
    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

/* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        //判断数组中哪个元素有值，并将当前元素下标记录在rehashidx中。
        //最多遍历 n*10个元素
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        //获取到当前链表的第一个元素
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        //将整个链表属性至新的空间中
        while(de) {
            uint64_t h;

nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

/* Check if we already rehashed the whole table... */
    //判断是否刷新完成
    if (d->ht[0].used == 0) {
        //释放ht0的空间
        zfree(d->ht[0].table);
        //将ht1负值到ht0
        d->ht[0] = d->ht[1];
        //初始化ht1
        _dictReset(&d->ht[1]);
        //将rehashidx记录为-1说明刷新完成
        d->rehashidx = -1;
        return 0;
    }

/* More to rehash... */
    return 1;
}

```

因为渐进式的刷新，会将数据分配到ht0和ht1两个哈希表中，所以curd操作时，就需要对两个表进行判断。

例如：我们上面的add方法，如果正在刷新，新的数据就会被添加到ht1中，如果没有在刷新，数据就添加在ht0中。

## 删除数据

```c
int dictDelete(dict *ht, const void *key) {
    return dictGenericDelete(ht,key,0) ? DICT_OK : DICT_ERR;
}

/**
 * 删除元素，根据nofree判断不同情况
 * 0、在词典中删除，并释放空间
 * 非0、只是将节点在字典中删除，对于其内存空间不进行是否
 * @param d
 * @param key
 * @param nofree
 * @return
 */
static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
    uint64_t h, idx;
    dictEntry *he, *prevHe;
    int table;

if (d->ht[0].used == 0 && d->ht[1].used == 0) return NULL;
    //判断刷新
    if (dictIsRehashing(d)) _dictRehashStep(d);
    //计算hash
    h = dictHashKey(d, key);
    //在两个数组中检索数据
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        prevHe = NULL;
        while(he) {
            //当找到元素时删除节点
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                /* Unlink the element from the list */
                //在链表中删除当前节点
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;
                //判断释放要释放空间
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                    zfree(he);
                }
                d->ht[table].used--;
                return he;
            }
            //遍历下一个元素
            prevHe = he;
            he = he->next;
        }
        //非刷新时，只判断第一个元素
        if (!dictIsRehashing(d)) break;
    }
    return NULL; /* not found */
}

```

## 查询数据

```c
dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;
    //判断是否有数据
    if (dictSize(d) == 0) return NULL; /* dict is empty */
    //对于正在刷新数据执行刷新操作
    if (dictIsRehashing(d)) _dictRehashStep(d);
    //计算当前key的下标
    h = dictHashKey(d, key);
    //在两个字典中都获取数据
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

```

回帖

消灭零回复