usehook - 前端码易

哈希碰撞

函数曲线

在上一章中，我们讨论了哈希表的直接定址法，但是这种哈希函数的设计理论，仅仅适合关键字连续分布的情况，如果关键字不连续，则会造成空间浪费。

例如，一个班级在大一的时候，有 100 个学生，学号是 1001 到 1100。但是随着后续的发展变化，会经历分班、转专业、休学、退学等情况，导致学号空缺比较多。此时，如果我们依然使用直接定址法，那么就会造成空间的大量浪费。

因此，我们期望设计一个哈希函数，能够在不连续的学号中，将学号尽可能的映射到有限数组的连续空间中。

那么针对这种情况，一个比较常用的哈希函数是：除留余数法，他的核心就是用一个关键字都整数，除以一个整数，用得到的余数去映射数组的下标

因此，这里我们以数组的长度作为除数，用关键字对数组长度取余，得到的结果就是数组的下标。

title="除留余数法"

1function hash(key: number, length: number) {
2  return key % length;
3}

由于我们这里的关键字学号是四位数，数组长度为 8，因此，我们可以针对实际情况对哈希函数进行调整，例如：

title="除留余数法"

1function hash(key: number) {
2  return (key - 1000) % 8;
3}

代入这个哈希函数，我们可以得到如下图所示的分布结果

我们可以看到，由于数组的空间有限，但是学生的个数比较多，因此，必然会有多个学生的学号映射到同一个数组下标中，例如上图中的 1001 和 1033 都映射到了下标为 1 的位置，这就是哈希碰撞。

解决哈希碰撞的问题

很明显，哈希碰撞是一个无法正常使用的问题。因此，我们需要在此基础之上，进一步改变数据的存储方式，来解决冲突问题。

在哈希碰撞的情况之下，我们需要区分两种情况，

一种是输入量大于数组的存储空间
一种是输入量小于等于数组的存储空间「日常中几乎不会出现这种情况」

我们上面的图例演示了当输入量大于数组的存储空间时的情况，此时，同一个数组下标可能会映射到多个输入值。我们可以使用拉链法来解决冲突，也就是说，多个映射到同一个下标地址的输入，通过链表的形式存储在同一个数组下标的位置中

在这个基础之上，为了回顾之前学过的数据结构，我们这里使用环形链表来存储映射同一个下标的多个数据。

首先定义一个链表节点的格式，每一个节点上存储一个键值对，与链表指针

链表节点

1type HNode<V> = {
2  // 由于案例单一，因此这里我们将 key 的类型强制限定为 number
3  key: number,
4  value: V,
5  next: HNode<V> | null
6}

这里之所以需要使用环形链表，是因为在插入新的链表节点时，我们希望直接插入到链表尾部，但是单向链表要拿到最后一个节点的引用需要依次向后查询，这非常不方便，而我们使用环形链表可以通过指向尾部节点的方式，快速拿到尾部节点和头部节点，从而简化查询这个过程

除此之外，使用环形链表更多的用意是基于学习的目的，如果在逻辑中，本身就需要一个查询过程，例如在插入节点时判断链表上是否存在重复节点，就需要遍历链表，那么此时就可以使用单链表在共用这个逻辑去找到尾节点

然后，我们定一个头部指针指向链表的尾部节点，存储在数组中

头部指针

1type Queue<V> = {
2  length: number,
3  last: HNode<V> | null
4}

由于每一个数组节点都有可能存储一个链表，因此，我们可以把每一个数组节点看成是一个存储桶 bucket，因此，在定义 HashMap 类时，我们可以先简单写这样一个代码出来，把基础属性和需要的 API 先标记好

简版

10export default class HashMap<V> {
20  private buckets: Queue<V>[];
30  private size: number;
40
50  constructor(size = 32) {
60    this.buckets = new Array(size);
70    this.size = size;
80  }
90
10  // 哈希函数
11  hash(key: number) {
12    return (key - 1000) % this.size;
13  }
14
15  // 插入键值对
16  set(key: number, value: V) {
17    ...
18  }
19
20  // 获取值
21  get(key: number) {
22    ...
23  }
24
25  // 删除键值对
26  delete(key: number) {
27    ...
28  }
29
30  // 检查是否包含键
31  has(key: number) {
32    ...
33  }
34
35  // 获取所有键
36  keys() {
37    ...
38  }
39
40  // 获取所有值
41  values() {
42    ...
43  }
44
45  // 清空哈希表
46  clear() {
47    this.buckets = new Array(this.size);
48  }
49}

哈希函数我们使用除留余数法已经写好。接下来要考虑把节点存入数组中的情况。构建完成之后，我们期望得到的链表结构如下所示

第一种情况，初始化时，对应数组位置里面什么东西都没有，此时，我们需要创建 queue 对象，并 queue.last 指向新节点

第二种情况，链表节点全部被删除，只剩下 queue 节点，此时我们只需要创建新的节点即可

最终得到一个只有一个节点的链表，首尾节点都是自己

第三种情况，往链表尾节点之后添加新的节点。这里需要注意的是，queue.last 指向的是尾节点，因此指针的变化要小心理解错

再新增一个节点

环形链表往尾部新增节点理解起来比较困难，因此我这里多绘制了几张图，大家可以对比图例仔细揣摩。理解了新增节点情况与过程，我们的代码就很好写出来了

10// 插入键值对
20set(key: number, value: V) {
30  const index = this.hash(key);
40  const bucket = this.buckets[index]
50
60  const node: HNode<V> = {
70    key: key,
80    value: value,
90    next: null
10  }
11  
12  // 如果桶是空的，初始化一个链表
13  if (!bucket) {
14    node.next = node
15    this.buckets[index] = {
16      length: 1,
17      last: node
18    };
19    return
20  }
21
22  const last = bucket.last
23
24  if (!last) {
25    node.next = node
26    bucket.last = node
27    return
28  }
29
30  const first = last.next
31
32  // 检查是否已存在相同的key
33  let current = last
34  while(true) {
35    if (current.key === key) {
36      current.value = value
37      break
38    }
39    current = current.next!
40    if (current === bucket.last) {
41      break
42    }
43  }
44
45  // add the node to last
46  last.next = node
47  node.next = first
48  bucket.last = node
49  bucket.length += 1
50}

完整理解了如何插入链表节点，那么获取和删除就比较简单了。我们直接通过完整代码学习即可

HashMap.ts

100type HNode<V> = {
200  key: number,
300  value: V,
400  next: HNode<V> | null
500}
600
700type Queue<V> = {
800  length: number,
900  last: HNode<V> | null
100}
110
120export default class HashMap<V> {
130  private buckets: Queue<V>[];
140  private size: number;
150
160  constructor(size = 32) {
170    this.buckets = new Array(size);
180    this.size = size;
190  }
200
210  // 哈希函数
220  hash(key: number) {
230    return (key - 1000) % this.size;
240  }
250
260  // 插入键值对
270  set(key: number, value: V) {
280    const index = this.hash(key);
290    const bucket = this.buckets[index]
300
310    const node: HNode<V> = {
320      key: key,
330      value: value,
340      next: null
350    }
360    
370    // 如果桶是空的，初始化一个链表
380    if (!bucket) {
390      node.next = node
400      this.buckets[index] = {
410        length: 1,
420        last: node
430      };
440      return
450    }
460
470    const last = bucket.last
480
490    if (!last) {
500      node.next = node
510      bucket.last = node
520      return
530    }
540
550    const first = last.next
560
570    // 检查是否已存在相同的key
580    let current = last
590    while(true) {
600      if (current.key === key) {
610        current.value = value
620        break
630      }
640      current = current.next!
650      if (current === bucket.last) {
660        break
670      }
680    }
690
700    // add the node to last
710    last.next = node
720    node.next = first
730    bucket.last = node
740    bucket.length += 1
750  }
760
770  // 获取值
780  get(key: number) {
790    const index = this.hash(key);
800    const bucket = this.buckets[index];
810    
820    if (!bucket) return undefined;
830
840    if (!bucket.last) {
850      return undefined
860    }
870    
880    let current: HNode<V> = bucket.last
890    while(true) {
900      if (current.key === key) {
910        return current.value
920      }
930      current = current.next as HNode<V>
940      if (current === bucket.last) {
950        break
960      }
970    }
980    
990    return undefined;
100  }
101
102  // 删除键值对
103  delete(key: number) {
104    const index = this.hash(key);
105    const bucket = this.buckets[index];
106    
107    if (!bucket) return false;
108    
109    let prev = bucket.last
110    let current = bucket.last
111
112    if (current === null || prev === null) {
113      return false
114    }
115
116    while(true) {
117      if (current.key === key) {
118        if (prev === current) {
119          bucket.last = null
120        } else {
121          let next = current.next
122          prev.next = next
123        }
124        return true
125      }
126      prev = current
127      current = current.next as HNode<V>
128      if (current === bucket.last) {
129        break
130      }
131    }
132    
133    return false;
134  }
135
136  // 检查是否包含键
137  has(key: number) {
138    return this.get(key) !== undefined;
139  }
140
141  // 获取所有键
142  keys() {
143    const keys = [];
144    for (const bucket of this.buckets) {
145      if (bucket && bucket.last) {
146        let current = bucket.last;
147        do {
148          keys.push(current.key);
149          current = current.next!;
150        } while (current !== bucket.last);
151      }
152    }
153    return keys;
154  }
155
156  // 获取所有值
157  values() {
158    const values = [];
159    for (const bucket of this.buckets) {
160      if (bucket && bucket.last) {
161        let current = bucket.last;
162        do {
163          values.push(current.value);
164          current = current.next!;
165        } while (current !== bucket.last);
166      }
167    }
168    return values;
169  }
170
171  // 清空哈希表
172  clear() {
173    this.buckets = new Array(this.size);
174  }
175}
176

在个别少数情况下，会有输入量关键字不连续，但是总体数量小于等于存储空间的。这种情况下我们可以使用开放定址法来解决冲突。这个方式的思路是遇到冲突则在数组中寻找下一个下标位置放下输入值。因为数组空间大于等于输入量，因此我们总能把所有的数据直接放到数组中去

12、哈希表

14、哈希函数的运用

专栏首页

到顶

专栏目录