Go语言编程笔记10：使用共享变量实现并发（续）

2021年11月26日 1609点热度 0人点赞 0条评论

本篇笔记是Go语言编程笔记9：使用共享变量实现并发的补充，将以一个《Go程序设计语言》中列举的一个函数缓存示例为基础进一步阐述如何使用并发来解决实际问题，以及说明goroutine和操作系统线程的区别。

函数缓存

所谓的函数缓存，就是对某些极其耗费时间或计算资源的函数调用进行缓存，如果系统中需要再次进行相同调用，直接返回缓存的结果，以达到节省时间或者计算资源的目的。

以下的代码为参考《Go程序设计语言》中的示例后编写。

首先我们先确定需要缓存的函数：

func getUrlBody(url string) (interface{}, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return ioutil.ReadAll(resp.Body)
}

这是一个简单的通过网络获取http返回报文内容的函数，可以看做是一个比较消耗时间的函数，我们接下来利用函数缓存机制来缓存其调用结果，以达到提高程序执行效率的目的。

下面看完整代码：

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "time"
)

func getUrlBody(url string) (interface{}, error) {
    resp, err := http.Get(url)
    if err != nil {
        return make([]byte, 0), err
    }
    defer resp.Body.Close()
    return ioutil.ReadAll(resp.Body)
}

type CachedFunc func(string) (interface{}, error)

type CallResult struct {
    respBody interface{} //报文内容
    err      error       //错误
}

type FuncCache struct {
    cf      CachedFunc            //缓存的函数
    results map[string]CallResult //缓存的函数调用结果
}

func (fc *FuncCache) Get(url string) (interface{}, error) {
    result, ok := fc.results[url]
    if !ok {
        respBody, err := fc.cf(url)
        result = CallResult{respBody: respBody, err: err}
        fc.results[url] = result
    }
    return result.respBody, result.err
}

func NewFuncCache(cf CachedFunc) *FuncCache {
    var fc FuncCache
    fc.cf = cf
    fc.results = make(map[string]CallResult)
    return &fc
}

func main() {
    fc := NewFuncCache(getUrlBody)
    urls := []string{"http://baidu.com", "http://bing.com", "http://google.com", "http://baidu.com", "http://bing.com", "http://google.com"}
    for _, url := range urls {
        start := time.Now()
        respBody, err := fc.Get(url)
        usedTime := time.Since(start).Seconds()
        if err == nil {
            fmt.Printf("url:%s, used time:%.2fs, resp length:%d\n", url, usedTime, len(respBody.([]byte)))
        } else {
            fmt.Printf("url:%s, used time:%.2fs, error:%s\n", url, usedTime, err.Error())
        }
    }
    // url:http://baidu.com, used time:0.08s, error:Get "http://baidu.com": read tcp 192.168.1.13:3428->220.181.38.148:80: wsarecv: An existing connection was forcibly closed by the remote host.
    // url:http://bing.com, used time:0.47s, resp length:73874
    // url:http://google.com, used time:21.07s, error:Get "http://google.com": dial tcp 172.217.163.46:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
    // url:http://baidu.com, used time:0.00s, error:Get "http://baidu.com": read tcp 192.168.1.13:3428->220.181.38.148:80: wsarecv: An existing connection was forcibly closed by the remote host.
    // url:http://bing.com, used time:0.00s, resp length:73874
    // url:http://google.com, used time:0.00s, error:Get "http://google.com": dial tcp 172.217.163.46:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
}

这里构建了一个结构体FuncCache用来缓存函数以及其调用结果，具体的参数和结果映射使用一个map来存储。通过该结构体的Get方法查看网络请求结果时，会检查结果映射是否存在，如果没有就进行缓存。

这个顺序的缓存机制并没有什么难以实现和理解的部分，可以看到测试结果中，第二次请求同一个url时，调用过程被大大缩短，这符合我们的预期。

如我们在Go语言编程笔记7：goroutine和通道中阐述的那样，这里的多个url请求是互相独立的，是一个“并行问题”，所以事实上我们是可以通过并发的方式来进一步改写这个程序，以提高性能。

我们先使用互斥锁来进行改写：

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "sync"
    "time"
)

func getUrlBody(url string) (interface{}, error) {
    resp, err := http.Get(url)
    if err != nil {
        return make([]byte, 0), err
    }
    defer resp.Body.Close()
    return ioutil.ReadAll(resp.Body)
}

type CachedFunc func(string) (interface{}, error)

type CallResult struct {
    respBody interface{} //报文内容
    err      error       //错误
}

type FuncCache struct {
    cf           CachedFunc            //缓存的函数
    resultsMutex sync.RWMutex          //保护results
    results      map[string]CallResult //缓存的函数调用结果
}

func (fc *FuncCache) Get(url string) (interface{}, error) {
    fc.resultsMutex.RLock()
    result, ok := fc.results[url]
    fc.resultsMutex.RUnlock()
    if !ok {
        respBody, err := fc.cf(url)
        result = CallResult{respBody: respBody, err: err}
        fc.resultsMutex.Lock()
        if _, ok := fc.results[url]; !ok {
            fc.results[url] = result
        }
        fc.resultsMutex.Unlock()
    }
    return result.respBody, result.err
}

func NewFuncCache(cf CachedFunc) *FuncCache {
    var fc FuncCache
    fc.cf = cf
    fc.results = make(map[string]CallResult)
    return &fc
}

func main() {
    fc := NewFuncCache(getUrlBody)
    urls := []string{"http://baidu.com", "http://bing.com", "http://google.com", "http://baidu.com", "http://bing.com", "http://google.com"}
    var funcCallWG sync.WaitGroup
    for _, url := range urls {
        url := url
        funcCallWG.Add(1)
        go func() {
            defer funcCallWG.Done()
            start := time.Now()
            respBody, err := fc.Get(url)
            usedTime := time.Since(start).Seconds()
            if err == nil {
                fmt.Printf("url:%s, used time:%.2fs, resp length:%d\n", url, usedTime, len(respBody.([]byte)))
            } else {
                fmt.Printf("url:%s, used time:%.2fs, error:%s\n", url, usedTime, err.Error())
            }
        }()
    }
    funcCallWG.Wait()
    // url:http://baidu.com, used time:0.09s, error:Get "http://baidu.com": read tcp 192.168.1.13:3102->220.181.38.148:80: wsarecv: An existing connection was forcibly closed by the remote host.
    // url:http://baidu.com, used time:0.11s, resp length:81
    // url:http://bing.com, used time:0.55s, resp length:75947
    // url:http://bing.com, used time:0.57s, resp length:75947
    // url:http://google.com, used time:21.08s, error:Get "http://google.com": dial tcp 172.217.163.46:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
    // url:http://google.com, used time:21.08s, error:Get "http://google.com": dial tcp 172.217.163.46:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
}

改写本身很容易，只要添加上用户保护共享变量results map[string]CallResult的互斥锁，并在Get方法中添加上相应的加锁和解锁部分，以及在main函数中使用go来启动相关goroutine，并加上sync.WaitGroup以让主goroutine等待其它goroutine运行完毕。

但如果细心就能发现，这里比较奇怪的是，运行结果中两次对http://baidu.com的请求结果居然不同。仔细思考就能明白其中的要点：这里对http://baidu.com的两次并发Get请求中，一开始都是不存在缓存的，所以进入直接调用实际函数的步骤：respBody, err := fc.cf(url)。也就是说实际对http://baidu.com的请求发生了两次，只不过之后在写入缓存的环节因为我们加了互斥锁，并再次检测有无缓存，所以不会重复写入。

如果要规避这种问题，我们需要巧妙地使用通道：

...
type CallResult struct {
    ready    chan struct{}
    respBody interface{} //报文内容
    err      error       //错误
}

type FuncCache struct {
    cf           CachedFunc             //缓存的函数
    resultsMutex sync.Mutex             //保护results
    results      map[string]*CallResult //缓存的函数调用结果
}

func (fc *FuncCache) Get(url string) (interface{}, error) {
    fc.resultsMutex.Lock()
    result, ok := fc.results[url]
    if !ok {
        result = &CallResult{ready: make(chan struct{})}
        fc.results[url] = result
        fc.resultsMutex.Unlock()
        result.respBody, result.err = fc.cf(url)
        close(result.ready)
    } else {
        fc.resultsMutex.Unlock()
        <-result.ready
    }
    return result.respBody, result.err
}

func NewFuncCache(cf CachedFunc) *FuncCache {
    var fc FuncCache
    fc.cf = cf
    fc.results = make(map[string]*CallResult)
    return &fc
}

func main() {
    fc := NewFuncCache(getUrlBody)
    urls := []string{"http://baidu.com", "http://bing.com", "http://google.com", "http://baidu.com", "http://bing.com", "http://google.com"}
    var funcCallWG sync.WaitGroup
    for _, url := range urls {
        url := url
        funcCallWG.Add(1)
        go func() {
            defer funcCallWG.Done()
            start := time.Now()
            respBody, err := fc.Get(url)
            usedTime := time.Since(start).Seconds()
            if err == nil {
                fmt.Printf("url:%s, used time:%.2fs, resp length:%d\n", url, usedTime, len(respBody.([]byte)))
            } else {
                fmt.Printf("url:%s, used time:%.2fs, error:%s\n", url, usedTime, err.Error())
            }
        }()
    }
    funcCallWG.Wait()
    // request http->http://baidu.com
    // request http->http://bing.com
    // request http->http://google.com
    // url:http://baidu.com, used time:0.10s, resp length:81
    // url:http://baidu.com, used time:0.10s, resp length:81
    // url:http://bing.com, used time:0.57s, resp length:75947
    // url:http://bing.com, used time:0.57s, resp length:75947
    // url:http://google.com, used time:21.08s, error:Get "http://google.com": dial tcp 172.217.163.46:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
    // url:http://google.com, used time:21.08s, error:Get "http://google.com": dial tcp 172.217.163.46:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
}

这里在返回结果中添加了一个标识结果是否已经准备完毕的通道ready，事实上这个通道起到了同步不同goroutine的效果。当第一个没有缓存的goroutine调用Get方法后，程序将在生成一个CallResult类型的实例，并将其添加到缓存映射后解除互斥锁，在此时我们还没有真正调用http请求。这样做的好处在于，如果将http请求也纳入到互斥锁的范围中，将极大的影响性能，这意味着所有的http请求的初始化goroutine都会是串行的，这显然不符合并发的优化要求。但将http请求放在互斥锁范围外又会有一个问题，如果负责初始化的goroutine还没有将结果回写到缓存，那么其他的goroutine尝试读取缓存结果怎么办？

此时就是ready通道发挥作用的地方，初始化goroutine在放开互斥锁前缓存的CallResult实例中虽然不包含真正的http请求结果，但是包含一个ready通道，所有稍后进入Get方法的只读goroutine在读取真实返回数据前，都需要先尝试从ready通道中读取东西。通过前边的学习，我们已经知道，如果尝试从一个没有数据的通道中读取数据，将造成阻塞。在这个示例中，这种阻塞行为一直会等到负责缓存的goroutine调用玩http请求并将结果回写到缓存的CallResult实例中并关闭ready通道之后。

需要注意的是，这里因为不同的goroutine需要访问相同的ready通道进行同步，所以map也要修改为保存指针元素。并且虽然不同的goroutine访问了同一个指针的数据，但读行为都会在写行为之后发生（ready通道的同步保证了这一点），所以并不存在实际的数据竟态。

现在可以看到结果：真实的http请求都仅会发生一次，不会重复调用，且相同http的缓存和非缓存调用时长完全相等，因为实际上它们是同时发生，并同时结束的（缓存调用一直在等待ready通道关闭）。

这个示例存在一个缺陷，当前代码是不能使用读写锁的，只能使用普通的互斥锁。

这里使用ready通道对CallResult的改造很像是Python中的Future类，尝试通过协程访问该类的实例同样会阻塞，直到真实结果返回。

在Go语言编程笔记9：使用共享变量实现并发中我们讨论了几种规避数据竟态的方式，除了上面使用互斥锁外，还可以使用通道来共享数据，下面我们用这种方式来改写：

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "sync"
    "time"
)

func getUrlBody(url string) (interface{}, error) {
    fmt.Printf("request http->%s\n", url)
    resp, err := http.Get(url)
    if err != nil {
        return make([]byte, 0), err
    }
    defer resp.Body.Close()
    return ioutil.ReadAll(resp.Body)
}

type CachedFunc func(string) (interface{}, error)

type CallResult struct {
    ready    chan struct{} //结果是否已经准备好了
    respBody interface{}   //报文内容
    err      error         //错误
}

// 向FuncCache发起请求的结构体
type FCRequest struct {
    url      string          //请求的url
    respChan chan CallResult //返回结果的通道
}

type FuncCache struct {
    cf         CachedFunc             //缓存的函数
    results    map[string]*CallResult //缓存的函数调用结果
    requetChan chan FCRequest
}

func (fc *FuncCache) Get(url string) (interface{}, error) {
    respChan := make(chan CallResult)
    fc.requetChan <- FCRequest{url: url, respChan: respChan}
    result := <-respChan
    return result.respBody, result.err
}

//启动FC服务
func (fc *FuncCache) Service() {
    go func() {
        for fcr := range fc.requetChan {
            fcr := fcr
            result, ok := fc.results[fcr.url]
            if !ok {
                result = &CallResult{ready: make(chan struct{})}
                fc.results[fcr.url] = result
                go func() {
                    result.respBody, result.err = fc.cf(fcr.url)
                    close(result.ready)
                }()
            }
            go func() {
                <-result.ready
                fcr.respChan <- *result
            }()
        }
    }()
}

//关闭FC服务
func (fc *FuncCache) Close() {
    close(fc.requetChan)
}

func NewFuncCache(cf CachedFunc) *FuncCache {
    var fc FuncCache
    fc.cf = cf
    fc.results = make(map[string]*CallResult)
    fc.requetChan = make(chan FCRequest)
    return &fc
}

func main() {
    fc := NewFuncCache(getUrlBody)
    urls := []string{"http://baidu.com", "http://bing.com", "http://google.com", "http://baidu.com", "http://bing.com", "http://google.com"}
    var funcCallWG sync.WaitGroup
    fc.Service()
    for _, url := range urls {
        url := url
        funcCallWG.Add(1)
        go func() {
            defer funcCallWG.Done()
            start := time.Now()
            respBody, err := fc.Get(url)
            usedTime := time.Since(start).Seconds()
            if err == nil {
                fmt.Printf("url:%s, used time:%.2fs, resp length:%d\n", url, usedTime, len(respBody.([]byte)))
            } else {
                fmt.Printf("url:%s, used time:%.2fs, error:%s\n", url, usedTime, err.Error())
            }
        }()
    }
    funcCallWG.Wait()
    fc.Close()
    // request http->http://baidu.com
    // request http->http://bing.com
    // request http->http://google.com
    // url:http://baidu.com, used time:0.12s, resp length:81
    // url:http://baidu.com, used time:0.12s, resp length:81
    // url:http://bing.com, used time:0.52s, resp length:75947
    // url:http://bing.com, used time:0.52s, resp length:75947
    // url:http://google.com, used time:21.07s, error:Get "http://google.com": dial tcp 172.217.160.78:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
    // url:http://google.com, used time:21.07s, error:Get "http://google.com": dial tcp 172.217.160.78:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
}

个人觉得通道版本的代码更难理解一些，这里的核心思想就是将共享变量results map[string]CallResult的访问都集中在一个服务goroutineService中，其它的goroutine无论是缓存还是非缓存访问都通过通道来请求和获取结果。

因为其它goroutine需要通过通道来获取结果，所以通过FuncCache实例的请求通道传入数据的时候需要随数据附带一个返回数据用的通道是一个很自然的想法，所以这里添加了一个新的类型FCRequest。

此外，Service方法中的服务goroutine中，并发的瓶颈同样在真实的网络请求调用上，所以我们需要想办法让其并发调用，这里使用的技巧同样是通过CallResult实例的ready通道来实现同步，在并发调用后等待通道关闭，然后通过回写通道将结果返回给请求的goroutine。

需要注意的是这里并不能不等待同步直接通过通道回写结果，因为我们通过通道回写的实际上是数据的“拷贝”，所以这样做很可能将真实结果还没有生成的CallResult实例的拷贝回写。

main函数中唯一的改动就是先调用Service方法启动服务goroutine，在所有并发任务结束后调用Close方法关闭请求通道以让服务goroutine退出。

goroutine和线程

goroutine和操作系统线程有以下区别。

可增长的栈

对线程或者goroutine，调度器都需要分配一部分内存作为栈来保存其对应的数据，这个栈的大小决定了线程或goroutine的运行情况，比如函数进行深递归调用时的递归层数。

对于操作系统线程，这个栈的大小是固定的，通常为2MB。这个值对一般情况下的goroutine来说太大，会造成性能浪费，而对某些深度递归的goroutine又太小。所以Go语言的goroutine的栈大小是动态的，最大可以支持1GB的栈。

goroutine调度

操作系统的线程调度是由硬件时钟触发的，这保证了多任务的操作系统上多个应用获取到的执行时间机会是均等的，不会造成某个应用长时间没有响应的状况出现。但这么做的代价是频繁的“上下文切换”，即将下一个线程从内存读取到CPU和寄存器，将上一个线程从CPU和寄存器保存到内存。虽然有多级缓存等其他机制进行优化，但这依然会造成某些高负载的计算型任务线程执行效率较低。

Go语言的goroutine调度器不会频繁切换goroutine，它采用的策略是让m个goroutine在n个处理器线程执行（n一般表示当前电脑的CPU核心数），如果其中某个goroutine发生阻塞，就切换其它的goroutine替代。这样就可以保证较低上下文切换消耗的情况下保证执行效率。

当然实际调度算法可能不会这么简单，因为goroutine之间实际上可能是协作关系，比如示例中的服务goroutine，如果长时间没有CPU执行机会，可能会导致其它goroutine卡死。当然如果只有一个服务goroutine这并不是问题，因为其他goroutine阻塞了自然就轮到服务goroutine执行了，但如果有多个服务型的goroutine呢？

GOMAXPROCS

Go语言的goroutine调度器默认使用当前机器的CPU核心数个操作系统线程来执行goroutine，即如果是8核CPU，就会用8个操作系统线程来执行goroutine，这样可以最大发挥硬件的性能。

如果某些时候需要指定调度器的最大OS线程数，可以通过环境变量GOMAXPROCS来修改：

package main

import (
    "fmt"
    "runtime"
)

func main() {
    runtime.GOMAXPROCS(2)
    for {
        go fmt.Print(0)
        fmt.Print(1)
    }
}