Java集合框架源码分析与性能对比

Java集合框架概述

Java集合框架（Java Collections Framework）是Java提供的一组用于存储和操作对象集合的类和接口。它提供了丰富的数据结构和算法，使得开发人员可以轻松地处理各种类型的集合。Java集合框架主要包括以下几个部分：

接口（Interfaces）：定义了集合的操作规范，如Collection、List、Set、Map等。
实现类（Implementing Classes）：具体实现了接口定义的功能，如ArrayList、LinkedList、HashSet、TreeSet、HashMap、TreeMap等。
算法（Algorithms）：提供了一些通用的集合操作算法，如排序、查找等，这些算法定义在Collections类中。

Collection接口

Collection接口是Java集合框架中最基本的接口，它定义了一组操作集合的方法，如添加元素、删除元素、查询元素等。Collection接口有两个重要的子接口：List和Set。

import java.util.ArrayList;
import java.util.Collection;

public class CollectionExample {
    public static void main(String[] args) {
        Collection<String> collection = new ArrayList<>();
        collection.add("apple");
        collection.add("banana");
        collection.add("cherry");

        System.out.println("Size: " + collection.size());
        System.out.println("Contains banana: " + collection.contains("banana"));
        collection.remove("cherry");
        System.out.println("After removal: " + collection);
    }
}

在上述代码中，我们创建了一个ArrayList对象，它实现了Collection接口。然后我们使用add方法添加元素，contains方法查询元素，remove方法删除元素。

List接口及实现类

ArrayList

ArrayList是List接口的动态数组实现。它允许随机访问元素，并且在尾部添加元素的性能较好。

源码分析：

public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    private static final long serialVersionUID = 8683452581122892189L;

    private transient Object[] elementData;
    private int size;

    public ArrayList(int initialCapacity) {
        if (initialCapacity > 0) {
            this.elementData = new Object[initialCapacity];
        } else if (initialCapacity == 0) {
            this.elementData = EMPTY_ELEMENTDATA;
        } else {
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        }
    }

    public boolean add(E e) {
        ensureCapacityInternal(size + 1);
        elementData[size++] = e;
        return true;
    }

    private void ensureCapacityInternal(int minCapacity) {
        if (elementData == EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        ensureExplicitCapacity(minCapacity);
    }

    private void ensureExplicitCapacity(int minCapacity) {
        modCount++;
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

    private void grow(int minCapacity) {
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
}

从源码中可以看出，ArrayList内部使用一个Object数组来存储元素。当添加元素时，如果数组容量不足，会进行扩容，扩容的方式是将原数组大小增加50%。

性能对比：在随机访问元素时，ArrayList的时间复杂度为O(1)，因为它可以通过索引直接访问元素。在中间插入或删除元素时，时间复杂度为O(n)，因为需要移动元素。在尾部添加元素时，平均时间复杂度为O(1)，最坏情况下为O(n)（当需要扩容时）。

LinkedList

LinkedList是List接口的双向链表实现。它不支持随机访问，但是在链表头部和尾部插入或删除元素的性能较好。

源码分析：

public class LinkedList<E>
    extends AbstractSequentialList<E>
    implements List<E>, Deque<E>, Cloneable, java.io.Serializable
{
    transient int size = 0;
    transient Node<E> first;
    transient Node<E> last;

    private static class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E element, Node<E> next) {
            this.item = element;
            this.next = next;
            this.prev = prev;
        }
    }

    public boolean add(E e) {
        linkLast(e);
        return true;
    }

    void linkLast(E e) {
        final Node<E> l = last;
        final Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null)
            first = newNode;
        else
            l.next = newNode;
        size++;
        modCount++;
    }
}

从源码中可以看出，LinkedList内部使用双向链表结构，每个节点包含前驱节点、后继节点和元素。添加元素时，直接在链表尾部添加新节点。

性能对比：在随机访问元素时，LinkedList的时间复杂度为O(n)，因为需要从头开始遍历链表。在链表头部或尾部插入或删除元素时，时间复杂度为O(1)，因为只需要修改节点的引用。在中间插入或删除元素时，时间复杂度为O(n)，因为需要先找到插入或删除的位置。

Set接口及实现类

HashSet

HashSet是Set接口的哈希表实现，它不允许集合中存在重复元素。

源码分析：

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    static final long serialVersionUID = -5024744406713321676L;

    private transient HashMap<E,Object> map;

    private static final Object PRESENT = new Object();

    public HashSet() {
        map = new HashMap<>();
    }

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }
}

从源码中可以看出，HashSet内部使用HashMap来存储元素，将元素作为HashMap的键，值固定为PRESENT。添加元素时，实际上是调用HashMap的put方法。

性能对比：在添加、删除和查找元素时，HashSet的平均时间复杂度为O(1)，这是因为HashMap的哈希算法可以快速定位元素。但是在最坏情况下，当哈希冲突严重时，时间复杂度会退化为O(n)。

TreeSet

TreeSet是Set接口的红黑树实现，它不仅不允许集合中存在重复元素，还会对元素进行排序。

源码分析：

public class TreeSet<E> extends AbstractSet<E>
    implements NavigableSet<E>, Cloneable, java.io.Serializable
{
    private transient NavigableMap<E,Object> m;

    private static final Object PRESENT = new Object();

    public TreeSet() {
        this(new TreeMap<>());
    }

    public TreeSet(Comparator<? super E> comparator) {
        this(new TreeMap<>(comparator));
    }

    public boolean add(E e) {
        return m.put(e, PRESENT)==null;
    }
}

从源码中可以看出，TreeSet内部使用TreeMap来存储元素，同样将元素作为键，值固定为PRESENT。添加元素时，调用TreeMap的put方法。

性能对比：在添加、删除和查找元素时，TreeSet的时间复杂度为O(log n)，因为红黑树的平衡特性保证了操作的高效性。与HashSet相比，TreeSet由于需要维护元素的顺序，性能会稍慢一些。

Map接口及实现类

HashMap

HashMap是Map接口的哈希表实现，它存储键值对，并且允许键为null（但最多只能有一个键为null）。

源码分析：

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
{
    static final long serialVersionUID = 362498820763181265L;

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    transient Node<K,V>[] table;
    transient Set<Map.Entry<K,V>> entrySet;
    transient int size;
    transient int modCount;
    int threshold;
    final float loadFactor;

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) {
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
}

从源码中可以看出，HashMap内部使用数组和链表（或红黑树）来存储键值对。当哈希冲突较少时，使用链表解决冲突；当链表长度超过一定阈值时，会将链表转换为红黑树，以提高查找性能。

性能对比：在添加、删除和查找元素时，HashMap的平均时间复杂度为O(1)，最坏情况下为O(n)（当哈希冲突严重时）。

TreeMap

TreeMap是Map接口的红黑树实现，它存储键值对，并且会按键的自然顺序或自定义顺序进行排序。

源码分析：

public class TreeMap<K,V>
    extends AbstractMap<K,V>
    implements NavigableMap<K,V>, Cloneable, java.io.Serializable
{
    private transient Entry<K,V> root;
    private transient int size = 0;
    private transient int modCount = 0;
    private final Comparator<? super K> comparator;

    static final class Entry<K,V> implements Map.Entry<K,V> {
        K key;
        V value;
        Entry<K,V> left;
        Entry<K,V> right;
        Entry<K,V> parent;
        boolean color = BLACK;

        Entry(K key, V value, Entry<K,V> parent) {
            this.key = key;
            this.value = value;
            this.parent = parent;
        }

        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }

        public V setValue(V value) {
            V oldValue = this.value;
            this.value = value;
            return oldValue;
        }

        public boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            return valEquals(key, e.getKey()) && valEquals(value, e.getValue());
        }

        public int hashCode() {
            int keyHash = (key==null? 0 : key.hashCode());
            int valueHash = (value==null? 0 : value.hashCode());
            return keyHash ^ valueHash;
        }

        public String toString() {
            return key + "=" + value;
        }
    }

    public V put(K key, V value) {
        Entry<K,V> t = root;
        if (t == null) {
            compare(key, key);
            root = new Entry<>(key, value, null);
            size = 1;
            modCount++;
            return null;
        }
        int cmp;
        Entry<K,V> parent;
        Comparator<? super K> cpr = comparator;
        if (cpr != null) {
            do {
                parent = t;
                cmp = cpr.compare(key, t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        else {
            if (key == null)
                throw new NullPointerException();
            @SuppressWarnings("unchecked")
                Comparable<? super K> k = (Comparable<? super K>) key;
            do {
                parent = t;
                cmp = k.compareTo(t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);
            } while (t != null);
        }
        Entry<K,V> e = new Entry<>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;
        fixAfterInsertion(e);
        size++;
        modCount++;
        return null;
    }
}

从源码中可以看出，TreeMap内部使用红黑树结构来存储键值对。添加元素时，会根据键的顺序将新节点插入到合适的位置，并维护红黑树的平衡。

性能对比：在添加、删除和查找元素时，TreeMap的时间复杂度为O(log n)，因为红黑树的特性保证了操作的高效性。与HashMap相比，TreeMap由于需要维护键的顺序，性能会稍慢一些。

总结与选择建议

在选择使用哪种集合类时，需要根据具体的需求来决定：

如果需要随机访问元素：优先选择ArrayList，它的随机访问性能最好。
如果需要频繁在链表头部或尾部插入或删除元素：选择LinkedList，它在这些操作上性能较好。
如果需要存储不重复元素，并且对顺序没有要求：选择HashSet，它的性能在平均情况下较好。
如果需要存储不重复元素，并且要求元素有序：选择TreeSet，它可以保证元素的顺序。
如果需要存储键值对，并且对键的顺序没有要求：选择HashMap，它的性能在平均情况下较好。
如果需要存储键值对，并且要求键有序：选择TreeMap，它可以保证键的顺序。

通过深入了解Java集合框架的源码和性能特点，开发人员可以更加高效地选择和使用合适的集合类，从而提升程序的性能和效率。