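@Xudong Huang Hi everyone, I'm the author of May, and I'd like to say a bit more about these limitations. The first three are actually problems shared by all of Rust's async libraries. Let me use futures as the example.
Calling thread-blocking APIs is not allowed. No explanation needed.
You can let a future burn CPU for a long time, but that will certainly affect the scheduling of other futures, so the same CPU fairness problem exists. If you don't care about fairness, running a long loop in May is allowed too.
As for accessing TLS, futures can't do it either; otherwise there would be no need for a dedicated future-local concept. For a detailed explanation, see a comment I wrote: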
I'm sure that a Rust panic will be correctly caught before another coroutine is scheduled.
But this is not enough. Even with only a single thread scheduling all the coroutines, accessing TLS is still problematic. I think this is a safety hole in Rust: any sendable struct with a method that accesses TLS could cause undefined behavior if it executes across threads.
But there is a special case, or pattern, in which accessing TLS is safe in a coroutine: the TLS access is finished before the coroutine yields to the scheduler, and its value is not used by the next access after the coroutine is scheduled again. Under this pattern, accessing TLS is just like accessing a local variable.
Alternatively, before another coroutine is scheduled, the TLS is guaranteed to hold a fixed value that is safe to access from any thread. The libstd panic is exactly this case: its TLS panic count is cleared to zero before a new ready coroutine is scheduled.
Binding a coroutine to a single thread is not safe either. E.g. coroutine A on the thread sets a TLS variable to value A', then another coroutine B is scheduled and sets it to value B'; after coroutine A is rescheduled on the same thread, it will read a dirty value!
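To make the single-thread case concrete, here is a minimal sketch using only std. It does not use May or a real coroutine: the `Task` struct, the `CURRENT_ID` slot and `interleave_on_one_thread` are all hypothetical names made up for illustration, and a "yield" is simulated by returning from `resume`.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical TLS slot that each task wrongly treats as its own.
    static CURRENT_ID: Cell<u32> = Cell::new(0);
}

// Hand-rolled stand-in for a coroutine: step 0 writes the TLS slot and
// "yields"; step 1 resumes and reads the slot back.
struct Task {
    id: u32,
    step: u8,
}

impl Task {
    fn new(id: u32) -> Self {
        Task { id, step: 0 }
    }

    // Returns Some(value read from TLS) once the task has finished.
    fn resume(&mut self) -> Option<u32> {
        match self.step {
            0 => {
                CURRENT_ID.with(|c| c.set(self.id));
                self.step = 1;
                None // simulated yield back to the scheduler
            }
            _ => Some(CURRENT_ID.with(|c| c.get())),
        }
    }
}

// Runs two tasks interleaved on the calling thread in the order
// A(step 0), B(step 0), A(step 1), B(step 1) and returns what each one
// read from TLS after being resumed.
pub fn interleave_on_one_thread() -> (u32, u32) {
    let mut a = Task::new(1);
    let mut b = Task::new(2);
    a.resume();
    b.resume();
    let seen_by_a = a.resume().unwrap();
    let seen_by_b = b.resume().unwrap();
    (seen_by_a, seen_by_b)
}
```

Task A wrote 1 but reads back 2, the value B left behind, even though everything ran on one thread.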
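The one remaining problem is out-of-bounds access on the stack, which is also the main reason the spawn API is marked unsafe. There is currently no good solution to this, so the problem is handed to the user, who has to set a reasonable stack size statically. With the default stack size, nothing goes wrong as long as your coroutine does not recurse and does not allocate large blocks of stack memory. If you are still worried, you can check how much stack space your coroutine actually used.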
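As a rough analogy only, the same discipline can be sketched with plain std threads, whose stack size is also fixed at creation time. This is not May's API; `STACK_SIZE`, `run_with_fixed_stack` and `sum_with_heap_buffer` are hypothetical names, and the 128 KiB figure is an arbitrary assumption rather than any library default.

```rust
use std::thread;

// Hypothetical stack budget chosen up front by the user (an assumption,
// not a May default).
const STACK_SIZE: usize = 128 * 1024;

// Runs `work` on a thread whose stack size is fixed when it is created,
// a std-thread analogue of giving a coroutine a statically sized stack.
pub fn run_with_fixed_stack<T, F>(work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker panicked")
}

// Keeps its large buffer on the heap (Vec), so stack usage stays small
// however big `len` is. No recursion, no big stack arrays.
pub fn sum_with_heap_buffer(len: usize) -> u64 {
    let buf: Vec<u64> = (0..len as u64).collect();
    buf.iter().sum()
}
```

For example, `run_with_fixed_stack(|| sum_with_heap_buffer(1_000_000))` works with about 8 MB of data on a 128 KiB stack, because the data lives on the heap.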
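Comments
By that point it is already undefined behavior, so there is no chance to throw an exception. If the program has not crashed, it prints some information telling you the stack overflowed.
Wouldn't it be better to throw an exception when the stack goes out of bounds?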
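Regarding stack space, a recent tokio blog post already mentioned what Go does. One option is segmented stacks, but that needs compiler support and Rust has abandoned the idea, so that road is closed. The other is to copy the coroutine stack into a large enough region of address space, but that requires the compiler to generate code that is independent of stack addresses. If that is not possible, the stack has to be copied to a fixed address, and that creates contention under multithreading: if two threads want to schedule two coroutines at the same address at the same time, they will inevitably race. Copying the stack also adds overhead to every task switch. Switching a coroutine in May currently takes a few tens of ns, and spending another few tens of ns on copying does not seem worth it to me.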
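Leaving the stack size to the user is a compromise made for lack of alternatives, but it will not get out of hand, because the user's app can adjust the stack size and can even use that to optimize memory usage.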
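Access granularity is one factor. If you can guarantee that a future completes after being polled only once, there is no problem. The problem is that a future may be polled multiple times, and accessing TLS on every poll is where things go wrong. The key point is that the semantics of TLS are broken: the future assumes the variable is its own and only it can access it, but other futures assume the same thing, and the whole system falls apart.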
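It's not that a coroutine can never mutate TLS; it's fine as long as you don't rely on that semantic. Take counting how many times an API is called. The usual approach is a global AtomicUsize that is incremented on every call. A more efficient way is to use TLS: increment the TLS on every call, so that it records how many times the API was called on that scheduler thread, and then add up the TLS values of all scheduler threads. This is safe, because each TLS access does not depend on its previous state.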
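A minimal sketch of that counting pattern, using plain std threads as stand-ins for scheduler threads. `HITS`, `api_call` and `count_api_calls` are hypothetical names, and having each thread report its own total when it exits is a simplification I'm assuming for illustration; a real scheduler would need some other way to collect the per-thread values.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Per-thread hit counter; every access only adds to it.
    static HITS: Cell<usize> = Cell::new(0);
}

// Hypothetical API whose calls we want to count.
fn api_call() {
    HITS.with(|h| h.set(h.get() + 1));
}

// Spawns `workers` threads standing in for scheduler threads, has each one
// call the API `calls_per_worker` times, then sums the per-thread counters.
pub fn count_api_calls(workers: usize, calls_per_worker: usize) -> usize {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..calls_per_worker {
                    api_call();
                }
                // Report this thread's own total when it finishes.
                HITS.with(|h| h.get())
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

No caller cares which task made the previous increment on its thread, so it does not matter where a task gets rescheduled, and the hot path involves no shared atomic.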
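Also, since a future runs on the thread's stack, it is less likely to overflow the stack. A coroutine runs on its own stack, though. Say you have tuned the stack size down to 1 KB (Go's minimum is 4 KB; May allows stacks smaller than that) and everything runs fine. Then one day you need to add a log line without noticing how tight the stack already is, and you may well overflow it.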
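1 and 2: nothing more to say.
3: For TLS, it should depend on the scheduling granularity, right? For example, using TLS as a BUF (read it inside a CpuFuture and then use it) is safe in futures-cpupool, because the granularity of a poll there is one CpuFuture running to completion on one thread, so it never reads a dirty value. But TLS cannot be used across futures.
Of course, multi-core coroutines don't need something like cpupool, but you also can't know the granularity at which a future runs, so TLS must not be mutated at all.
4: futures can overflow the stack just the same. Not sure whether I've understood this correctly?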
So many restrictions. I'll just wait for tokio.