Java Map的基本原理剖析 - 摩柯技术社区

Java Map 概述

在Java编程中，Map是一种非常重要的数据结构，它用于存储键值对（key - value pairs）。Map接口提供了一种将唯一键映射到相应值的方式，使得我们可以通过键快速查找对应的值。这与数组或列表不同，数组和列表是通过索引来访问元素，而Map通过键来访问值。

Map接口有多个实现类，例如HashMap、TreeMap、LinkedHashMap等，每个实现类都有其独特的特性和适用场景，它们都遵循Map接口定义的基本契约。

Map 接口的主要方法

put(K key, V value)：将指定的键值对插入到Map中。如果Map中已经存在该键，则用新值替换旧值，并返回旧值；如果不存在，则插入新的键值对并返回null。
get(Object key)：根据指定的键获取对应的值。如果Map中不存在该键，则返回null。
containsKey(Object key)：判断Map中是否包含指定的键。
size()：返回Map中键值对的数量。
isEmpty()：判断Map是否为空，即键值对数量是否为0。
remove(Object key)：从Map中移除指定键对应的键值对，并返回该键对应的值；如果不存在该键，则返回null。

HashMap 的基本原理

HashMap是Map接口最常用的实现类之一，它基于哈希表（hash table）数据结构来实现。哈希表的核心思想是通过哈希函数（hash function）将键映射到一个特定的桶（bucket）中，从而实现快速的查找和插入操作。

哈希函数

哈希函数是HashMap实现高效查找的关键。在HashMap中，通过调用键对象的hashCode()方法来获取键的哈希码，然后将哈希码进一步处理得到桶的索引。理想情况下，哈希函数应该将不同的键均匀地分布到各个桶中，以减少哈希冲突（hash collision）的发生。

例如，下面是一个简单的自定义类及其hashCode方法的示例：

class CustomKey {
    private int id;
    private String name;

    public CustomKey(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public int hashCode() {
        return 31 * id + name.hashCode();
    }
}

哈希冲突

尽管哈希函数设计得尽可能均匀，但在实际应用中，不同的键可能会产生相同的哈希码，这种情况称为哈希冲突。HashMap采用链地址法（separate chaining）来解决哈希冲突。

当发生哈希冲突时，HashMap会将冲突的键值对存储在同一个桶中，形成一个链表。在Java 8及以后的版本中，如果链表长度超过一定阈值（默认为8），链表会转换为红黑树（Red - Black Tree），以提高查找性能。

存储结构

HashMap的存储结构主要由一个数组（称为桶数组）和链表（或红黑树）组成。数组的每个元素称为一个桶，桶中存储的是键值对的链表（或红黑树）。

以下是HashMap存储结构的简化代码示例：

class Node<K, V> {
    final int hash;
    final K key;
    V value;
    Node<K, V> next;

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

class MyHashMap<K, V> {
    private static final int DEFAULT_INITIAL_CAPACITY = 16;
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;
    transient Node<K, V>[] table;
    int size;
    int threshold;
    final float loadFactor;

    public MyHashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        this.threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        this.table = new Node[DEFAULT_INITIAL_CAPACITY];
    }

    private int hash(Object key) {
        int h;
        return (key == null)? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        addEntry(hash, key, value, i);
        return null;
    }

    private void addEntry(int hash, K key, V value, int bucketIndex) {
        if (size >= threshold) {
            resize(2 * table.length);
        }
        createEntry(hash, key, value, bucketIndex);
    }

    private void createEntry(int hash, K key, V value, int bucketIndex) {
        Node<K, V> e = table[bucketIndex];
        table[bucketIndex] = new Node<>(hash, key, value, e);
        size++;
    }

    private void resize(int newCapacity) {
        Node<K, V>[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == Integer.MAX_VALUE) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        Node<K, V>[] newTable = new Node[newCapacity];
        transfer(newTable);
        table = newTable;
        threshold = (int) (newCapacity * loadFactor);
    }

    private void transfer(Node<K, V>[] newTable) {
        int newCapacity = newTable.length;
        for (Node<K, V> e : table) {
            while (null != e) {
                Node<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

    private int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public V get(Object key) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                return e.value;
            }
        }
        return null;
    }
}

TreeMap 的基本原理

TreeMap是基于红黑树（Red - Black Tree）实现的Map。红黑树是一种自平衡的二叉搜索树，它保证了树的高度大致为O(log n)，从而使得TreeMap的查找、插入和删除操作的时间复杂度也为O(log n)。

红黑树特性

节点颜色：每个节点要么是红色，要么是黑色。
根节点：根节点是黑色。
叶子节点：每个叶子节点（NIL节点，空节点）是黑色。
红色节点：如果一个节点是红色，那么它的两个子节点都是黑色（即不存在两个连续的红色节点）。
路径黑色高度：从任一节点到其每个叶子节点的所有路径都包含相同数目的黑色节点。

插入操作

当向TreeMap中插入一个新的键值对时，首先按照二叉搜索树的规则找到合适的插入位置。然后，可能会破坏红黑树的特性，需要通过旋转（rotation）和颜色调整（color adjustment）操作来恢复红黑树的平衡。

例如，以下是TreeMap插入操作的简化代码示例（仅展示关键逻辑）：

class RedBlackTreeNode<K, V> {
    K key;
    V value;
    RedBlackTreeNode<K, V> left, right, parent;
    boolean color = RED;
    static final boolean RED = false;
    static final boolean BLACK = true;

    RedBlackTreeNode(K key, V value, RedBlackTreeNode<K, V> parent) {
        this.key = key;
        this.value = value;
        this.parent = parent;
    }
}

class MyTreeMap<K extends Comparable<K>, V> {
    private RedBlackTreeNode<K, V> root;

    public V put(K key, V value) {
        RedBlackTreeNode<K, V> t = root;
        if (t == null) {
            root = new RedBlackTreeNode<>(key, value, null);
            fixAfterInsertion(root);
            return null;
        }
        int cmp;
        RedBlackTreeNode<K, V> parent;
        Comparator<? super K> cpr = null;
        do {
            parent = t;
            cmp = cpr == null? ((Comparable<? super K>) key).compareTo(t.key) : cpr.compare(key, t.key);
            if (cmp < 0) {
                t = t.left;
            } else if (cmp > 0) {
                t = t.right;
            } else {
                V oldValue = t.value;
                t.value = value;
                return oldValue;
            }
        } while (t != null);
        RedBlackTreeNode<K, V> newNode = new RedBlackTreeNode<>(key, value, parent);
        if (cmp < 0) {
            parent.left = newNode;
        } else {
            parent.right = newNode;
        }
        fixAfterInsertion(newNode);
        return null;
    }

    private void fixAfterInsertion(RedBlackTreeNode<K, V> x) {
        x.color = RED;
        while (x != null && x != root && x.parent.color == RED) {
            if (parentOf(x) == leftOf(parentOf(parentOf(x)))) {
                RedBlackTreeNode<K, V> y = rightOf(parentOf(parentOf(x)));
                if (colorOf(y) == RED) {
                    setColor(parentOf(x), BLACK);
                    setColor(y, BLACK);
                    setColor(parentOf(parentOf(x)), RED);
                    x = parentOf(parentOf(x));
                } else {
                    if (x == rightOf(parentOf(x))) {
                        x = parentOf(x);
                        rotateLeft(x);
                    }
                    setColor(parentOf(x), BLACK);
                    setColor(parentOf(parentOf(x)), RED);
                    rotateRight(parentOf(parentOf(x)));
                }
            } else {
                // 对称情况
            }
        }
        root.color = BLACK;
    }

    private RedBlackTreeNode<K, V> parentOf(RedBlackTreeNode<K, V> p) {
        return (p == null)? null : p.parent;
    }

    private RedBlackTreeNode<K, V> leftOf(RedBlackTreeNode<K, V> p) {
        return (p == null)? null : p.left;
    }

    private RedBlackTreeNode<K, V> rightOf(RedBlackTreeNode<K, V> p) {
        return (p == null)? null : p.right;
    }

    private boolean colorOf(RedBlackTreeNode<K, V> p) {
        return (p == null)? BLACK : p.color;
    }

    private void setColor(RedBlackTreeNode<K, V> p, boolean c) {
        if (p != null) {
            p.color = c;
        }
    }

    private void rotateLeft(RedBlackTreeNode<K, V> p) {
        if (p != null) {
            RedBlackTreeNode<K, V> r = p.right;
            p.right = r.left;
            if (r.left != null) {
                r.left.parent = p;
            }
            r.parent = p.parent;
            if (p.parent == null) {
                root = r;
            } else if (p.parent.left == p) {
                p.parent.left = r;
            } else {
                p.parent.right = r;
            }
            r.left = p;
            p.parent = r;
        }
    }

    private void rotateRight(RedBlackTreeNode<K, V> p) {
        if (p != null) {
            RedBlackTreeNode<K, V> l = p.left;
            p.left = l.right;
            if (l.right != null) {
                l.right.parent = p;
            }
            l.parent = p.parent;
            if (p.parent == null) {
                root = l;
            } else if (p.parent.right == p) {
                p.parent.right = l;
            } else {
                p.parent.left = l;
            }
            l.right = p;
            p.parent = l;
        }
    }

    public V get(Object key) {
        RedBlackTreeNode<K, V> p = getTreeNode((K) key);
        return (p == null)? null : p.value;
    }

    private RedBlackTreeNode<K, V> getTreeNode(K key) {
        RedBlackTreeNode<K, V> p = root;
        while (p != null) {
            int cmp = key.compareTo(p.key);
            if (cmp < 0) {
                p = p.left;
            } else if (cmp > 0) {
                p = p.right;
            } else {
                return p;
            }
        }
        return null;
    }
}

LinkedHashMap 的基本原理

LinkedHashMap是HashMap的一个子类，它在HashMap的基础上增加了双向链表的结构，以维护插入顺序或访问顺序。

维护顺序

插入顺序：按照键值对插入LinkedHashMap的顺序进行维护。
访问顺序：当一个键值对被访问（通过get方法获取值或put方法更新值）时，将该键值对移动到链表的末尾。

存储结构

LinkedHashMap的存储结构是在HashMap的基础上，为每个节点增加了两个双向链表指针before和after，用于维护链表顺序。

以下是LinkedHashMap存储结构的简化代码示例：

class LinkedHashMapNode<K, V> extends Node<K, V> {
    LinkedHashMapNode<K, V> before, after;
    LinkedHashMapNode(int hash, K key, V value, Node<K, V> next) {
        super(hash, key, value, next);
    }
}

class MyLinkedHashMap<K, V> extends MyHashMap<K, V> {
    private transient LinkedHashMapNode<K, V> head;
    private transient LinkedHashMapNode<K, V> tail;
    private final boolean accessOrder;

    public MyLinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) {
        super(initialCapacity, loadFactor);
        this.accessOrder = accessOrder;
    }

    @Override
    public V get(Object key) {
        Node<K, V> e;
        if ((e = getNode(hash(key), key)) == null) {
            return null;
        }
        if (accessOrder) {
            afterNodeAccess((LinkedHashMapNode<K, V>) e);
        }
        return e.value;
    }

    @Override
    public V put(K key, V value) {
        V oldValue = super.put(key, value);
        if (oldValue == null) {
            afterNodeInsertion(true);
        } else if (accessOrder) {
            afterNodeAccess((LinkedHashMapNode<K, V>) getNode(hash(key), key));
        }
        return oldValue;
    }

    void afterNodeAccess(LinkedHashMapNode<K, V> e) {
        LinkedHashMapNode<K, V> last;
        if (accessOrder && (last = tail) != e) {
            LinkedHashMapNode<K, V> p = e, b = p.before, a = p.after;
            p.after = null;
            if (b == null) {
                head = a;
            } else {
                b.after = a;
            }
            if (a != null) {
                a.before = b;
            } else {
                last = b;
            }
            if (last == null) {
                head = p;
            } else {
                p.before = last;
                last.after = p;
            }
            tail = p;
            ++modCount;
        }
    }

    void afterNodeInsertion(boolean evict) {
        LinkedHashMapNode<K, V> first;
        if (evict && (first = head) != null && removeEldestEntry(first)) {
            K key = first.key;
            removeNode(hash(key), key, null, false, true);
        }
    }

    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return false;
    }
}

性能比较

HashMap：在一般情况下，HashMap的性能非常好，插入、查找和删除操作的平均时间复杂度为O(1)。但是在哈希冲突严重的情况下，性能会下降到O(n)，不过在Java 8及以后，当链表长度过长时转换为红黑树，性能有所改善。
TreeMap：TreeMap的插入、查找和删除操作的时间复杂度为O(log n)，因为它基于红黑树实现。适用于需要对键进行排序的场景，但性能一般不如HashMap。
LinkedHashMap：在维护顺序的同时，性能与HashMap相近。插入、查找和删除操作的时间复杂度与HashMap基本相同，但由于维护链表结构，会有额外的空间开销。

选择合适的 Map 实现

无序且追求高性能：如果不需要对键进行排序，并且追求最高的插入和查找性能，HashMap是最佳选择。
有序场景：如果需要按键的自然顺序或自定义顺序对键值对进行排序，TreeMap是合适的选择。
维护插入或访问顺序：如果需要维护插入顺序或访问顺序，LinkedHashMap是最佳选择。例如，实现LRU（Least Recently Used）缓存时，LinkedHashMap可以方便地实现按照访问顺序淘汰最近最少使用的元素。

通过深入理解Java Map的各种实现类的基本原理，开发者可以根据具体的应用场景选择最合适的数据结构，从而优化程序的性能和资源利用。