跳到主要内容

源码 上解读 GMP模型

· 阅读需 15 分钟
ahKevinXy

path : src/runtime/runtime1.go

go version 1.16.1

G

type g struct {
// Stack (栈)parameters(参数).
// stack describes the actual stack memory: [stack.lo, stack.hi).
// stackguard0 is the stack pointer compared in the Go stack growth prologue.
//stackguard0是在Go堆栈增长序言中比较的堆栈指针
// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
//lo+StackGuard正常,但可以是StackPreempt触发抢占
// stackguard1 is the stack pointer compared in the C stack growth prologue.
//stackguard1是在C堆栈增长序言中比较的堆栈指针

// It is stack.lo+StackGuard on g0 and gsignal stacks.
// It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
stack stack // offset known to runtime/cgo 已知的偏移量为runtime/cgo
stackguard0 uintptr // offset known to liblink 已知liblink的偏移量
stackguard1 uintptr // offset known to liblink

_panic *_panic // innermost panic - offset known to liblink
_defer *_defer // innermost defer
m *m // current m; offset known to arm liblink
sched gobuf
syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
stktopsp uintptr // expected sp at top of stack, to check in traceback
param unsafe.Pointer // passed parameter on wakeup
atomicstatus uint32
stackLock uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
goid int64
schedlink guintptr
waitsince int64 // approx time when the g become blocked
waitreason waitReason // if status==Gwaiting

preempt bool // preemption signal, duplicates stackguard0 = stackpreempt
preemptStop bool // transition to _Gpreempted on preemption; otherwise, just deschedule
preemptShrink bool // shrink stack at synchronous safe point

// asyncSafePoint is set if g is stopped at an asynchronous
// safe point. This means there are frames on the stack
// without precise pointer information.
asyncSafePoint bool

paniconfault bool // panic (instead of crash) on unexpected fault address
gcscandone bool // g has scanned stack; protected by _Gscan bit in status
throwsplit bool // must not split stack
// activeStackChans indicates that there are unlocked channels
// pointing into this goroutine's stack. If true, stack
// copying needs to acquire channel locks to protect these
// areas of the stack.
activeStackChans bool
// parkingOnChan indicates that the goroutine is about to
// park on a chansend or chanrecv. Used to signal an unsafe point
// for stack shrinking. It's a boolean value, but is updated atomically.
parkingOnChan uint8

raceignore int8 // ignore race detection events
sysblocktraced bool // StartTrace has emitted EvGoInSyscall about this goroutine
sysexitticks int64 // cputicks when syscall has returned (for tracing)
traceseq uint64 // trace event sequencer
tracelastp puintptr // last P emitted an event for this goroutine
lockedm muintptr
sig uint32
writebuf []byte
sigcode0 uintptr
sigcode1 uintptr
sigpc uintptr
gopc uintptr // pc of go statement that created this goroutine
ancestors *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
startpc uintptr // pc of goroutine function
racectx uintptr
waiting *sudog // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
cgoCtxt []uintptr // cgo traceback context
labels unsafe.Pointer // profiler labels
timer *timer // cached timer for time.Sleep
selectDone uint32 // are we participating in a select and did someone win the race?

// Per-G GC state

// gcAssistBytes is this G's GC assist credit in terms of
// bytes allocated. If this is positive, then the G has credit
// to allocate gcAssistBytes bytes without assisting. If this
// is negative, then the G must correct this by performing
// scan work. We track this in bytes to make it fast to update
// and check for debt in the malloc hot path. The assist ratio
// determines how this corresponds to scan work debt.
gcAssistBytes int64
}

M


type m struct {
g0 *g // goroutine with scheduling stack
morebuf gobuf // gobuf arg to morestack
divmod uint32 // div/mod denominator for arm - known to liblink

// Fields not known to debuggers.
procid uint64 // for debuggers, but offset not hard-coded
gsignal *g // signal-handling g
goSigStack gsignalStack // Go-allocated signal handling stack
sigmask sigset // storage for saved signal mask
tls [6]uintptr // thread-local storage (for x86 extern register)
mstartfn func()
curg *g // current running goroutine
caughtsig guintptr // goroutine running during fatal signal
p puintptr // attached p for executing go code (nil if not executing go code)
nextp puintptr
oldp puintptr // the p that was attached before executing a syscall
id int64
mallocing int32
throwing int32
preemptoff string // if != "", keep curg running on this m
locks int32
dying int32
profilehz int32
spinning bool // m is out of work and is actively looking for work
blocked bool // m is blocked on a note
newSigstack bool // minit on C thread called sigaltstack
printlock int8
incgo bool // m is executing a cgo call
freeWait uint32 // if == 0, safe to free g0 and delete m (atomic)
fastrand [2]uint32
needextram bool
traceback uint8
ncgocall uint64 // number of cgo calls in total
ncgo int32 // number of cgo calls currently in progress
cgoCallersUse uint32 // if non-zero, cgoCallers in use temporarily
cgoCallers *cgoCallers // cgo traceback if crashing in cgo call
doesPark bool // non-P running threads: sysmon and newmHandoff never use .park
park note
alllink *m // on allm
schedlink muintptr
lockedg guintptr
createstack [32]uintptr // stack that created this thread.
lockedExt uint32 // tracking for external LockOSThread
lockedInt uint32 // tracking for internal lockOSThread
nextwaitm muintptr // next m waiting for lock
waitunlockf func(*g, unsafe.Pointer) bool
waitlock unsafe.Pointer
waittraceev byte
waittraceskip int
startingtrace bool
syscalltick uint32
freelink *m // on sched.freem

// mFixup is used to synchronize OS related m state (credentials etc)
// use mutex to access.
mFixup struct {
lock mutex
fn func(bool) bool
}

// these are here because they are too large to be on the stack
// of low-level NOSPLIT functions.
libcall libcall
libcallpc uintptr // for cpu profiler
libcallsp uintptr
libcallg guintptr
syscall libcall // stores syscall parameters on windows

vdsoSP uintptr // SP for traceback while in VDSO call (0 if not in call)
vdsoPC uintptr // PC for traceback while in VDSO call

// preemptGen counts the number of completed preemption
// signals. This is used to detect when a preemption is
// requested, but fails. Accessed atomically.
preemptGen uint32

// Whether this is a pending preemption signal on this M.
// Accessed atomically.
signalPending uint32

dlogPerM

mOS

// Up to 10 locks held by this m, maintained by the lock ranking code.
locksHeldLen int
locksHeld [10]heldLockInfo
}

P


type p struct {
id int32
status uint32 // one of pidle/prunning/...
link puintptr
schedtick uint32 // incremented on every scheduler call
syscalltick uint32 // incremented on every system call
sysmontick sysmontick // last tick observed by sysmon
m muintptr // back-link to associated m (nil if idle)
mcache *mcache
pcache pageCache
raceprocctx uintptr

deferpool [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
deferpoolbuf [5][32]*_defer

// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
goidcache uint64
goidcacheend uint64

// Queue of runnable goroutines. Accessed without lock.
runqhead uint32
runqtail uint32
runq [256]guintptr
// runnext, if non-nil, is a runnable G that was ready'd by
// the current G and should be run next instead of what's in
// runq if there's time remaining in the running G's time
// slice. It will inherit the time left in the current time
// slice. If a set of goroutines is locked in a
// communicate-and-wait pattern, this schedules that set as a
// unit and eliminates the (potentially large) scheduling
// latency that otherwise arises from adding the ready'd
// goroutines to the end of the run queue.
runnext guintptr

// Available G's (status == Gdead)
gFree struct {
gList
n int32
}

sudogcache []*sudog
sudogbuf [128]*sudog

// Cache of mspan objects from the heap.
mspancache struct {
// We need an explicit length here because this field is used
// in allocation codepaths where write barriers are not allowed,
// and eliminating the write barrier/keeping it eliminated from
// slice updates is tricky, moreso than just managing the length
// ourselves.
len int
buf [128]*mspan
}

tracebuf traceBufPtr

// traceSweep indicates the sweep events should be traced.
// This is used to defer the sweep start event until a span
// has actually been swept.
traceSweep bool
// traceSwept and traceReclaimed track the number of bytes
// swept and reclaimed by sweeping in the current sweep loop.
traceSwept, traceReclaimed uintptr

palloc persistentAlloc // per-P to avoid mutex

_ uint32 // Alignment for atomic fields below

// The when field of the first entry on the timer heap.
// This is updated using atomic functions.
// This is 0 if the timer heap is empty.
timer0When uint64

// The earliest known nextwhen field of a timer with
// timerModifiedEarlier status. Because the timer may have been
// modified again, there need not be any timer with this value.
// This is updated using atomic functions.
// This is 0 if the value is unknown.
timerModifiedEarliest uint64

// Per-P GC state
gcAssistTime int64 // Nanoseconds in assistAlloc
gcFractionalMarkTime int64 // Nanoseconds in fractional mark worker (atomic)

// gcMarkWorkerMode is the mode for the next mark worker to run in.
// That is, this is used to communicate with the worker goroutine
// selected for immediate execution by
// gcController.findRunnableGCWorker. When scheduling other goroutines,
// this field must be set to gcMarkWorkerNotWorker.
gcMarkWorkerMode gcMarkWorkerMode
// gcMarkWorkerStartTime is the nanotime() at which the most recent
// mark worker started.
gcMarkWorkerStartTime int64

// gcw is this P's GC work buffer cache. The work buffer is
// filled by write barriers, drained by mutator assists, and
// disposed on certain GC state transitions.
gcw gcWork

// wbBuf is this P's GC write barrier buffer.
//
// TODO: Consider caching this in the running G.
wbBuf wbBuf

runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point

// statsSeq is a counter indicating whether this P is currently
// writing any stats. Its value is even when not, odd when it is.
statsSeq uint32

// Lock for timers. We normally access the timers while running
// on this P, but the scheduler can also do it from a different P.
timersLock mutex

// Actions to take at some time. This is used to implement the
// standard library's time package.
// Must hold timersLock to access.
timers []*timer

// Number of timers in P's heap.
// Modified using atomic instructions.
numTimers uint32

// Number of timerModifiedEarlier timers on P's heap.
// This should only be modified while holding timersLock,
// or while the timer status is in a transient state
// such as timerModifying.
adjustTimers uint32

// Number of timerDeleted timers in P's heap.
// Modified using atomic instructions.
deletedTimers uint32

// Race context used while executing timer functions.
timerRaceCtx uintptr

// preempt is set to indicate that this P should be enter the
// scheduler ASAP (regardless of what G is running on it).
preempt bool

pad cpu.CacheLinePad
}

Go 的模型调度

M(thread) 内核线程 , P(processor) 进程 ,G (goroutine) 协程

  • G : Go 运行时对goroutine的描述,G中存放并发执行的 代码入口地址,上下文,运行环境 (关联的P和M),运行栈等执行相关的信息,G的新建,休眠,恢复,停止都受到Go运行时的管理

GO运行时的监控线程会监控G的调度,G不会长久地阻塞系统线程,运行时的调度器会自动切换到其他G上运行。G新建或恢复时会添加到运行队列,等待M取出并运行。

  • M : OS内核线程,是操作系统层面调度和执行的实体.M仅负责执行,M不停的唤醒或创建,然后执行

  • P : 代表M和P所需要的资源,是对资源的一种抽象管理,P 不是一段代码实体,而是一个管理的数据结构,P主要是降低 M对G的复杂性,增加一个间接的控制层数据结构,P控制Go的并行度,它不是实体

P持有G的队列,P可以隔离调度,解除P和M的绑定就解除了M对一串G的调用。

G并不是执行体,而是存放并发执行体的元信息,包括并发执行的入口函数、堆栈、上下文等信息。G由于保存的是元信息,为了减少对象的分配和回收,G对象是可以复用,只需要将相关元信息初始化为新值即可。

M仅负责执行,M启动时进入运行时的管理代码,这段管理代码必须拿到P后,才能执行调度。

P的数目默认是CPU核心的数量。M和P的数目差不多,但运行时会根据当前的状态动态地创建M,M有一个最大值上限:10000;G与P是M:N的关系,M可以成千上万,远远大于N.

Work Stealing算法的基本原理

M和P构成一个运行时环境,每个P有一个本地的可调度的G队列,队列里面的G会被M依次调度执行,如果本地队列空了,则去全局队列偷取一部分G,如果全局队列也是空的,则去其他的P中偷取一部分G。

什么时候创建M、P、G

在程序启动过程中会初始化空闲P列表,P是这个时候创建的,同时第一个G也是在初始化过程中被创建的。

每个并发调用都会初始化一个新的G任务,如何唤醒M执行任务。这个唤醒不是特定唤醒某个线程去工作,而是先尝试获取当前线程M,如果无法获取,则从全局调度的空闲M列表中获取可用的M,如果没有可用的M,则新建M,然后绑定P和GY运行。M和P不是一一对应的,而是按需分配的

M线程有管理调度和切换堆栈的逻辑,但是M必须拿到P后才能运行,可用看到M是自驱动的,单需要P的配合。

goroutine经历的过程

  1. 通过go func()来创建一个 goroutine
  2. 有两个存储的G队列,一个是局部调度器P的本地队列,一个是全局G队列,新创建的G会先保存在P的本地队列中,如果P的本地队列已经满了就会保存在全局队列中
  3. G只能运行在M中,一个M必须有一个P,M与P是1:1的关系. M会对P的本地队列弹出一个可执行状态的G来执行,如果P的本地队列为空,就会向其它的MP组合取一个可执行的G来执行
  4. 一个M 调度执行的过程是一个循环机制
  5. 当 M执行某一个G 时候 如果发生了 syscall或其余阻塞操作,M会阻塞,如果