Common Concurrency Pitfalls in Go - 摩柯技术社区

Race Condition

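Concept and nature: A race condition is one of the most common pitfalls in concurrent programming. It arises when multiple goroutines access a shared resource concurrently, at least one of them modifies it, and nothing synchronizes the order of those operations. In Go, the shared resource can be a variable, a data structure, and so on.

Fundamentally, race conditions come from the concurrent execution model of modern multi-core processors and schedulers. Goroutines may run in parallel on different CPU cores, and the Go runtime scheduler (together with the operating system, which schedules the underlying threads) can pause and resume them at arbitrary points. The order in which goroutines touch a shared resource is therefore unpredictable, which can lead to inconsistent data.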

Code example

package main

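import (
    "fmt"
    "time"
)

var counter int

func increment() {
    // Unsynchronized read-modify-write: this is where the data race happens.
    counter = counter + 1
}

func main() {
    var numGoroutines = 1000
    for i := 0; i < numGoroutines; i++ {
        go increment()
    }
    // Sleep briefly in the hope that every goroutine has finished.
    // This is not a reliable way to synchronize; real code needs a proper mechanism.
    time.Sleep(time.Second)
    fmt.Println("Final counter value:", counter)
}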

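In the code above, counter is a shared variable that many goroutines increment concurrently by calling increment. The statement counter = counter + 1 is a read, an addition, and a write rather than one atomic step, so two goroutines can read the same value and one of the updates is lost. Because nothing synchronizes these steps, the result can differ from run to run and may be less than 1000. Running the program with go run -race reports the data race.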

Solutions
- Mutex: protect the shared resource with sync.Mutex.

package main

import (
    "fmt"
    "sync"
)

var counter int
var mu sync.Mutex

func increment() {
    mu.Lock()
    counter = counter + 1
    mu.Unlock()
}

func main() {
    var numGoroutines = 1000
    var wg sync.WaitGroup
    wg.Add(numGoroutines)
    for i := 0; i < numGoroutines; i++ {
        go func() {
            defer wg.Done()
            increment()
        }()
    }
    wg.Wait()
    fmt.Println("Final counter value:", counter)
}

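In this improved version, sync.Mutex ensures that only one goroutine at a time can read and modify counter, which removes the race condition. sync.WaitGroup also makes main wait until every goroutine has finished before it prints the result.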
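A package-level variable guarded by a separate package-level mutex is easy to misuse, because nothing stops other code from touching the variable without taking the lock. One common way to tighten this is to bundle the value and its mutex in a single type. The sketch below is only an illustration: SafeCounter and countTo are hypothetical names, not part of the original example.

```go
package main

import (
    "fmt"
    "sync"
)

// SafeCounter is a hypothetical type that keeps the counter next to the
// mutex guarding it, so the field is only reachable through locked methods.
type SafeCounter struct {
    mu sync.Mutex
    n  int
}

// Inc adds one to the counter while holding the lock.
func (c *SafeCounter) Inc() {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.n++
}

// Value returns the current count while holding the lock.
func (c *SafeCounter) Value() int {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.n
}

// countTo starts n goroutines that each increment a fresh counter once
// and returns the final value after all of them have finished.
func countTo(n int) int {
    var c SafeCounter
    var wg sync.WaitGroup
    wg.Add(n)
    for i := 0; i < n; i++ {
        go func() {
            defer wg.Done()
            c.Inc()
        }()
    }
    wg.Wait()
    return c.Value()
}

func main() {
    fmt.Println("Final counter value:", countTo(1000))
}
```

Using defer for Unlock keeps the lock from staying held if the critical section later grows an early return or a panic.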

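- Read-write lock (RWMutex): if the shared resource is read much more often than it is written, sync.RWMutex may be a better fit. It lets any number of goroutines hold the read lock at the same time, while the write lock is exclusive.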

package main

import (
    "fmt"
    "sync"
)

var data int
var rwmu sync.RWMutex

func read() int {
    rwmu.RLock()
    defer rwmu.RUnlock()
    return data
}

func write(newData int) {
    rwmu.Lock()
    data = newData
    rwmu.Unlock()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        write(10)
    }()
    go func() {
        defer wg.Done()
        value := read()
        fmt.Println("Read value:", value)
    }()
    wg.Wait()
}

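In this example, read takes the read lock (RLock), so multiple goroutines can read data at the same time, while write takes the write lock (Lock), which keeps every other reader and writer out for the duration of the update. The lock does not order the two goroutines relative to each other, so the program may print 0 or 10 depending on which one runs first.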
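The benefit of RWMutex shows up when there are many readers. As a rough sketch, assuming a read-mostly key/value store (Settings, NewSettings, Get, and Set are hypothetical names chosen for illustration), the same pattern can protect a map:

```go
package main

import (
    "fmt"
    "sync"
)

// Settings is a hypothetical read-mostly key/value store.
type Settings struct {
    mu   sync.RWMutex
    data map[string]string
}

// NewSettings returns an empty store with its map initialized.
func NewSettings() *Settings {
    return &Settings{data: make(map[string]string)}
}

// Get takes the shared read lock, so concurrent Gets do not block each other.
func (s *Settings) Get(key string) (string, bool) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    v, ok := s.data[key]
    return v, ok
}

// Set takes the exclusive write lock.
func (s *Settings) Set(key, value string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.data[key] = value
}

func main() {
    s := NewSettings()
    s.Set("mode", "fast")

    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            v, _ := s.Get("mode")
            fmt.Printf("reader %d saw %q\n", id, v)
        }(i)
    }
    wg.Wait()
}
```

The readers may print in any order. Whether RWMutex actually beats a plain Mutex depends on the workload, so it is worth measuring before switching.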

Deadlock

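Concept and nature: A deadlock occurs when two or more goroutines each wait for another to release a resource, so that none of them can make progress. It usually shows up when locks or channels are used for synchronization and the resources are acquired in an inconsistent order or the synchronization logic is wrong.

In Go, the essence of a deadlock is a cycle in the resource dependencies. For example, goroutine A holds resource X and waits for resource Y, while goroutine B holds resource Y and waits for resource X. This forms a circular wait, and neither goroutine can continue.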

Code example

package main

import (
    "fmt"
    "sync"
)

var mu1 sync.Mutex
var mu2 sync.Mutex

func goroutine1() {
    mu1.Lock()
    fmt.Println("Goroutine 1 acquired mu1")
    mu2.Lock()
    fmt.Println("Goroutine 1 acquired mu2")
    mu2.Unlock()
    mu1.Unlock()
}

func goroutine2() {
    mu2.Lock()
    fmt.Println("Goroutine 2 acquired mu2")
    mu1.Lock()
    fmt.Println("Goroutine 2 acquired mu1")
    mu1.Unlock()
    mu2.Unlock()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        goroutine1()
    }()
    go func() {
        defer wg.Done()
        goroutine2()
    }()
    wg.Wait()
}

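In the code above, goroutine1 locks mu1 and then tries to lock mu2, while goroutine2 locks mu2 and then tries to lock mu1. If goroutine1 has acquired mu1 and goroutine2 has acquired mu2, each waits for the other to release its lock and the program deadlocks. Since main is also blocked in wg.Wait, the Go runtime then aborts with "fatal error: all goroutines are asleep - deadlock!". Whether this happens on a given run depends on the scheduling.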

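Solutions
- Avoid circular dependencies: design the order in which resources are acquired so that no cycle can form. For example, always acquire locks in the same fixed order.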

package main

import (
    "fmt"
    "sync"
)

var mu1 sync.Mutex
var mu2 sync.Mutex

func goroutine1() {
    mu1.Lock()
    fmt.Println("Goroutine 1 acquired mu1")
    mu2.Lock()
    fmt.Println("Goroutine 1 acquired mu2")
    mu2.Unlock()
    mu1.Unlock()
}

func goroutine2() {
    mu1.Lock()
    fmt.Println("Goroutine 2 acquired mu1")
    mu2.Lock()
    fmt.Println("Goroutine 2 acquired mu2")
    mu2.Unlock()
    mu1.Unlock()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        goroutine1()
    }()
    go func() {
        defer wg.Done()
        goroutine2()
    }()
    wg.Wait()
}

In this improved version, both goroutine1 and goroutine2 lock mu1 first and mu2 second, so a circular wait cannot form and the deadlock is avoided.
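When the locks to take are only known at run time, a fixed order can be derived from the data itself. The following sketch assumes a hypothetical Account type whose ID field ranks the locks; transfer always locks the account with the smaller ID first, whichever direction the money moves.

```go
package main

import (
    "fmt"
    "sync"
)

// Account is a hypothetical type; ID gives every lock a fixed rank.
type Account struct {
    ID      int
    mu      sync.Mutex
    Balance int
}

// transfer locks the two accounts in ascending ID order, whatever the direction.
func transfer(from, to *Account, amount int) {
    if from.ID == to.ID {
        return // same account: locking it twice would deadlock on itself
    }
    first, second := from, to
    if second.ID < first.ID {
        first, second = second, first
    }
    first.mu.Lock()
    defer first.mu.Unlock()
    second.mu.Lock()
    defer second.mu.Unlock()

    from.Balance -= amount
    to.Balance += amount
}

func main() {
    a := &Account{ID: 1, Balance: 100}
    b := &Account{ID: 2, Balance: 100}

    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(2)
        go func() { defer wg.Done(); transfer(a, b, 1) }()
        go func() { defer wg.Done(); transfer(b, a, 1) }()
    }
    wg.Wait()
    fmt.Println(a.Balance, b.Balance) // prints "100 100"
}
```

If transfer locked from before to instead, the two opposite transfers could each hold one lock and wait for the other, which is exactly the situation in the deadlock example.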

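- Use timeouts: put a time limit on lock acquisition and channel operations so that a goroutine never waits indefinitely.

package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

var mu sync.Mutex

// lockWithContext polls mu.TryLock (Go 1.18+) until it succeeds or ctx is done.
// It reports whether the lock was acquired.
func lockWithContext(ctx context.Context) bool {
    ticker := time.NewTicker(10 * time.Millisecond)
    defer ticker.Stop()
    for {
        if mu.TryLock() {
            return true
        }
        select {
        case <-ctx.Done():
            return false
        case <-ticker.C:
            // Try again on the next tick.
        }
    }
}

func goroutine(ctx context.Context) {
    if !lockWithContext(ctx) {
        fmt.Println("Timeout waiting for lock:", ctx.Err())
        return
    }
    defer mu.Unlock()
    fmt.Println("Acquired lock")
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
    defer cancel()
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        goroutine(ctx)
    }()
    wg.Wait()
}

In this example, lockWithContext polls mu.TryLock until it succeeds or the context expires, so a goroutine that cannot get the lock within the time limit runs its timeout logic instead of waiting forever. Polling is needed because sync.Mutex.Lock cannot be interrupted. Calling Lock from the default branch of a select that also waits on time.After and ctx.Done() would not help: default runs immediately, and Lock then blocks as usual. Note that a timeout only bounds the wait; it does not remove the underlying lock-ordering problem.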

Channel-Related Pitfalls

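Misusing unbuffered channels
- Concept and nature: An unbuffered channel stores no data. A send completes only when a receiver is ready, and a receive completes only when a sender is ready; otherwise the goroutine blocks. Misunderstanding this behavior leads to bugs. In essence, an unbuffered channel is a synchronization point between goroutines: the value is handed over at the moment sender and receiver meet. A related trap is the nil channel: a channel variable that is declared but never created with make is nil, and sends and receives on it block forever.
- Code example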

package main

import (
    "fmt"
)

func sendData(ch chan int) {
    ch <- 42
    fmt.Println("Data sent")
}

func main() {
    var ch chan int
    go sendData(ch)
    data := <-ch
    fmt.Println("Received data:", data)
}

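In the code above, ch is declared but never initialized with make, so it is nil. A send on a nil channel blocks forever, so the sendData goroutine never gets past ch <- 42. A receive from a nil channel also blocks forever, so main is stuck as well. With every goroutine blocked, the runtime aborts with "fatal error: all goroutines are asleep - deadlock!".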

Solution: initialize the channel with make, and make sure every send is matched by a receive.

package main

import (
    "fmt"
)

func sendData(ch chan int) {
    ch <- 42
    fmt.Println("Data sent")
}

func main() {
    ch := make(chan int)
    go sendData(ch)
    data := <-ch
    fmt.Println("Received data:", data)
}

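In this improved version, ch is properly initialized, so the sendData goroutine can send its value and main can receive it, and the program exits normally. "Data sent" may or may not be printed, because main can return as soon as it has received the value, before the goroutine reaches its Println.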
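The synchronizing nature of an unbuffered channel can be used on purpose, for example to signal that a piece of work is complete. The sketch below uses illustrative names (worker, done) that are not part of the original example: the worker writes its result and then sends on done, and main reads the result only after the receive.

```go
package main

import "fmt"

// worker sums the numbers into *sum and then signals completion on done.
func worker(nums []int, sum *int, done chan<- struct{}) {
    for _, n := range nums {
        *sum += n
    }
    done <- struct{}{} // blocks until the receiver is ready
}

func main() {
    var sum int
    done := make(chan struct{}) // unbuffered: send and receive meet
    go worker([]int{1, 2, 3, 4}, &sum, done)
    <-done // after this receive, the worker's writes to sum are visible
    fmt.Println("sum:", sum) // prints "sum: 10"
}
```

Reading sum without the receive would be a data race; the channel operation is what orders the worker's writes before main's read.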

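Full and empty buffered channels
- Concept and nature: A buffered channel has a fixed capacity and can hold that many values while sender and receiver are out of step. Ignoring whether the buffer is full or empty still causes problems: a send on a full buffered channel blocks the sending goroutine, and a receive from an empty one blocks the receiving goroutine.
- Code example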

package main

import (
    "fmt"
    "time"
)

func producer(ch chan int) {
    for i := 0; i < 10; i++ {
        ch <- i
        fmt.Printf("Produced %d\n", i)
    }
    close(ch)
}

func consumer(ch chan int) {
    for data := range ch {
        fmt.Printf("Consumed %d\n", data)
        time.Sleep(200 * time.Millisecond)
    }
}

func main() {
    ch := make(chan int, 5)
    go producer(ch)
    go consumer(ch)
    time.Sleep(2 * time.Second)
}

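In this example, producer sends values into ch, a buffered channel with capacity 5, and consumer receives them. Because producer sends much faster than consumer receives, the buffer fills up and producer blocks on each further send until consumer makes room.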

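Solution: adjust the capacity of the buffered channel, or add control logic around the send and receive operations. For example, a select statement combined with time.After can bound how long a goroutine waits on a full or empty channel.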

package main

import (
    "fmt"
    "time"
)

func producer(ch chan int) {
    for i := 0; i < 10; i++ {
        select {
        case ch <- i:
            fmt.Printf("Produced %d\n", i)
        case <-time.After(100 * time.Millisecond):
            fmt.Printf("Timeout while producing %d\n", i)
        }
    }
    close(ch)
}

func consumer(ch chan int) {
    for data := range ch {
        fmt.Printf("Consumed %d\n", data)
        time.Sleep(200 * time.Millisecond)
    }
}

func main() {
    ch := make(chan int, 5)
    go producer(ch)
    go consumer(ch)
    time.Sleep(2 * time.Second)
}

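In this improved version, producer uses select with time.After to handle a full channel: if a value cannot be sent within 100 milliseconds, the timeout branch runs and producer moves on instead of blocking indefinitely. Note that the value is dropped in that case, so this approach only suits data that can be lost; otherwise the producer should retry or simply keep blocking.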
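When the sender should not wait at all, select with a default branch gives a non-blocking send. The helper below is a small sketch (trySend is a hypothetical name) that reports whether the value was queued, leaving the decision about what to do with a rejected value to the caller.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether v was queued.
func trySend(ch chan<- int, v int) bool {
    select {
    case ch <- v:
        return true
    default: // buffer full and no receiver waiting: give up immediately
        return false
    }
}

func main() {
    ch := make(chan int, 2)
    for i := 0; i < 3; i++ {
        // prints true, true, false: the third send finds the buffer full
        fmt.Printf("send %d ok: %v\n", i, trySend(ch, i))
    }
    fmt.Println("queued:", len(ch), "of", cap(ch)) // prints "queued: 2 of 2"
}
```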

Poor Context Management

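Concept and nature: In Go, a context (context.Context) controls the lifetime of concurrent operations by carrying deadlines, cancellation signals, and similar information. Poor context management can cause goroutine leaks and resources that are not released in time. In essence, a context is a mechanism for passing control signals and metadata between goroutines so that concurrent operations end when they are supposed to.

Code example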

package main

import (
    "context"
    "fmt"
    "time"
)

func longRunningTask(ctx context.Context) {
    select {
    case <-ctx.Done():
        fmt.Println("Task cancelled")
        return
    case <-time.After(5 * time.Second):
        fmt.Println("Task completed")
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    go longRunningTask(ctx)
    time.Sleep(3 * time.Second)
}

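In the code above, longRunningTask uses a select statement to watch ctx.Done() and find out whether the task has been cancelled. main creates a context with a 2-second timeout and starts longRunningTask in a goroutine. Because the task would take 5 seconds, longer than the timeout, ctx.Done() is closed after 2 seconds and longRunningTask handles the cancellation and returns.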

If longRunningTask does not handle ctx.Done(), however, the goroutine leaks. For example:

package main

import (
    "context"
    "fmt"
    "time"
)

func longRunningTask(ctx context.Context) {
    time.Sleep(5 * time.Second)
    fmt.Println("Task completed")
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    go longRunningTask(ctx)
    time.Sleep(3 * time.Second)
}

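In this example, longRunningTask never checks ctx.Done(), so the goroutine keeps running even after the context has timed out. Here it is only cut off when main returns after 3 seconds; in a long-lived program, goroutines like this pile up and hold on to their resources, which is a goroutine leak.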

Solution: handle the context's cancellation signal in every long-running goroutine.

package main

import (
    "context"
    "fmt"
    "time"
)

func longRunningTask(ctx context.Context) {
    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            fmt.Println("Task cancelled")
            return
        case <-ticker.C:
            // Simulate one unit of work.
            fmt.Println("Task is running...")
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    go longRunningTask(ctx)
    time.Sleep(3 * time.Second)
}

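In this improved version, longRunningTask watches ctx.Done() in a select statement inside its loop, so it returns promptly when the context is cancelled and the goroutine does not leak. The time.Ticker simulates the task doing its work in steps.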

Unhandled Errors

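Concept and nature: Error handling matters even more in concurrent code. An error that occurs in a goroutine and is not handled properly can lead to unpredictable behavior or even a crash. Because goroutines run concurrently, errors can arise in any of them and are easy to miss. In essence, unhandled errors undermine the robustness and reliability of the program.

Code example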

package main

import (
    "fmt"
)

func divide(a, b int) (int, error) {
    if b == 0 {
        return 0, fmt.Errorf("division by zero")
    }
    return a / b, nil
}

func main() {
    var results []int
    var nums = []int{10, 20, 0, 30}
    for _, num := range nums {
        go func(n int) {
            result, err := divide(100, n)
            if err != nil {
                // The error is not handled here; it is silently ignored.
                return
            }
            results = append(results, result)
        }(num)
    }
    // main does not wait for the goroutines to finish and does not process the results.
    fmt.Println("Results:", results)
}

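In the code above, divide returns an error when the divisor is 0, but the goroutine simply ignores that error instead of handling it. main also does not wait for the goroutines to finish and does not collect the results correctly, and the goroutines append to the shared results slice without synchronization, which is a data race. The program's behavior is therefore unpredictable.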

Solution: use channels to pass both results and errors back, so that every goroutine is waited for and every error is handled.

package main

import (
    "fmt"
)

func divide(a, b int, resultChan chan<- int, errChan chan<- error) {
    if b == 0 {
        errChan <- fmt.Errorf("division by zero")
        return
    }
    resultChan <- a / b
}

func main() {
    var results []int
    var nums = []int{10, 20, 0, 30}
    resultChan := make(chan int)
    errChan := make(chan error)
    for _, num := range nums {
        go divide(100, num, resultChan, errChan)
    }
    for i := 0; i < len(nums); i++ {
        select {
        case result := <-resultChan:
            results = append(results, result)
        case err := <-errChan:
            fmt.Println("Error:", err)
        }
    }
    close(resultChan)
    close(errChan)
    fmt.Println("Results:", results)
}

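In this improved version, divide reports results on resultChan and errors on errChan. Each goroutine sends exactly one message, so main receives len(nums) messages in a select loop over the two channels. That guarantees every goroutine has finished and that all results and errors are handled in a single goroutine.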

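Summary of Common Pitfalls and Countermeasures

Race condition
- Pitfall: multiple goroutines access and modify a shared resource with no ordering between them.
- Countermeasure: synchronize with sync.Mutex or sync.RWMutex so that no goroutine modifies the shared resource while another is reading or writing it.
Deadlock
- Pitfall: goroutines wait for each other to release resources, forming a circular dependency.
- Countermeasure: avoid circular dependencies by acquiring locks in a fixed order, or put a timeout on lock acquisition and channel operations.
Channel-related pitfalls
- Misusing unbuffered channels:
  - Pitfall: using a channel that was never initialized, or sending and receiving on an unbuffered channel when the two sides are not in step.
  - Countermeasure: initialize the channel with make, and make sure every send is matched by a receive.
- Full and empty buffered channels:
  - Pitfall: ignoring whether a buffered channel is full or empty, so that goroutines block.
  - Countermeasure: adjust the channel capacity, or use select with time.After to handle a full or empty channel.
Poor context management
- Pitfall: a goroutine does not handle the context's cancellation signal, causing goroutine leaks or resources that are not released in time.
- Countermeasure: in every long-running goroutine, use a select statement to watch ctx.Done() so that it ends when the context is cancelled.
Unhandled errors
- Pitfall: an error in a goroutine is not caught and handled in time, leading to unpredictable behavior.
- Countermeasure: pass errors and results back through channels, wait for every goroutine to finish, and handle the errors and results in the main function.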

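By understanding and avoiding these common concurrency pitfalls, Go developers can write more robust and reliable concurrent programs. In day-to-day development, build good habits: design synchronization, context management, and error handling carefully, so that you get the full benefit of Go's concurrency features.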