Lecture 224: Analysis of ShuffleBlockManager in Spark's Pluggable Shuffle Framework
ShuffleBlockManager: from Spark 1.6.0 onward it is replaced by ShuffleBlockResolver, a trait responsible for actually reading shuffle data.
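package org.apache.spark.shuffle

import org.apache.spark.network.buffer.ManagedBuffer
import org.apache.spark.storage.ShuffleBlockId

trait ShuffleBlockResolver {
  type ShuffleId = Int

  /**
   * Retrieve the data for the specified block. If the data for that block is not available,
   * throws an unspecified exception.
   */
  def getBlockData(blockId: ShuffleBlockId): ManagedBuffer

  // Release any resources held by the resolver.
  def stop(): Unit
}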
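The contract is small: given a block id, return a buffer or throw. The sketch below shows that shape on its own. It is a minimal, self-contained illustration, not Spark's implementation. ToyShuffleBlockId, ToyManagedBuffer, ToyNioBuffer and InMemoryShuffleBlockResolver are all hypothetical stand-ins, so the example compiles without spark-core on the classpath.

```scala
import java.nio.ByteBuffer
import scala.collection.mutable

// Hypothetical stand-in for Spark's ShuffleBlockId (shuffleId, mapId, reduceId).
final case class ToyShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)

// Hypothetical, heavily simplified stand-in for Spark's ManagedBuffer.
trait ToyManagedBuffer {
  def size: Long
  def nioByteBuffer(): ByteBuffer
}

// In-memory buffer; returns a read-only view so callers cannot mutate stored data.
final class ToyNioBuffer(bytes: Array[Byte]) extends ToyManagedBuffer {
  def size: Long = bytes.length.toLong
  def nioByteBuffer(): ByteBuffer = ByteBuffer.wrap(bytes).asReadOnlyBuffer()
}

// Same shape as ShuffleBlockResolver: look up a block, or throw if it is missing.
trait ToyShuffleBlockResolver {
  type ShuffleId = Int
  def getBlockData(blockId: ToyShuffleBlockId): ToyManagedBuffer
  def stop(): Unit
}

// Keeps each block's bytes in a map keyed by block id (illustration only).
final class InMemoryShuffleBlockResolver extends ToyShuffleBlockResolver {
  private val blocks = mutable.Map.empty[ToyShuffleBlockId, Array[Byte]]

  def putBlock(id: ToyShuffleBlockId, data: Array[Byte]): Unit = blocks(id) = data

  def getBlockData(blockId: ToyShuffleBlockId): ToyManagedBuffer =
    blocks.get(blockId) match {
      case Some(bytes) => new ToyNioBuffer(bytes)
      case None => throw new NoSuchElementException(s"Shuffle block $blockId not found")
    }

  def stop(): Unit = blocks.clear()
}

object ResolverDemo {
  def main(args: Array[String]): Unit = {
    val resolver = new InMemoryShuffleBlockResolver
    val id = ToyShuffleBlockId(shuffleId = 0, mapId = 1, reduceId = 2)
    resolver.putBlock(id, "hello".getBytes("UTF-8"))
    println(resolver.getBlockData(id).size) // prints 5
    resolver.stop()
  }
}
```

Callers depend only on the trait, so a different storage strategy can be swapped in behind getBlockData. This is the point of the pluggable design.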
ShuffleBlockResolver no longer has a getBytes method.
The getBlockData(blockId: ShuffleBlockId) method returns a ManagedBuffer, and it is the core of the trait.
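Returning a ManagedBuffer rather than raw bytes lets the resolver hand back a lightweight description of where the data lives, such as a region of a file, instead of eagerly loading it. The sketch below illustrates that idea. SimpleManagedBuffer and FileSegmentBuffer are hypothetical names. They are loosely analogous to Spark's ManagedBuffer and FileSegmentManagedBuffer but much simpler: no reference counting, no memory mapping, and no Netty conversion.

```scala
import java.io.{EOFException, File, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.file.Files

// Hypothetical, simplified analogue of ManagedBuffer: "some bytes" of known size.
trait SimpleManagedBuffer {
  def size: Long
  def nioByteBuffer(): ByteBuffer
}

// Stores only (file, offset, length); the bytes are read from disk on demand.
// Assumes the segment fits in a single heap ByteBuffer (length <= Int.MaxValue).
final class FileSegmentBuffer(file: File, offset: Long, length: Long) extends SimpleManagedBuffer {
  def size: Long = length

  def nioByteBuffer(): ByteBuffer = {
    val raf = new RandomAccessFile(file, "r")
    try {
      val buf = ByteBuffer.allocate(length.toInt)
      val channel = raf.getChannel
      channel.position(offset)
      // A single read may return fewer bytes, so loop until the segment is loaded.
      while (buf.hasRemaining) {
        if (channel.read(buf) < 0) throw new EOFException(s"EOF before end of segment in $file")
      }
      buf.flip()
      buf
    } finally raf.close()
  }
}

object SegmentDemo {
  def main(args: Array[String]): Unit = {
    val f = Files.createTempFile("shuffle", ".data").toFile
    f.deleteOnExit()
    Files.write(f.toPath, "aaabbbbcc".getBytes("UTF-8"))
    val bb = new FileSegmentBuffer(f, offset = 3, length = 4).nioByteBuffer()
    val out = new Array[Byte](bb.remaining())
    bb.get(out)
    println(new String(out, "UTF-8")) // prints bbbb
  }
}
```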
Since Spark 1.6.0, ShuffleBlockResolver is implemented by IndexShuffleBlockResolver (the sort-based shuffle approach); FileShuffleBlockManager (the hash-based shuffle approach) no longer exists.
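In sort-based shuffle, each map task writes one data file plus an index file of offsets. Locating a reducer's block then becomes a lookup of two adjacent longs. The sketch below roughly mirrors that lookup (skip reduceId * 8 bytes, read offset and next offset) but is not Spark's code. IndexFileSketch, writeIndex and segmentFor are hypothetical names, and file naming and error handling are omitted.

```scala
import java.io.{BufferedOutputStream, DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

object IndexFileSketch {
  // Writes numPartitions + 1 longs: 0, then the running total of partition lengths.
  def writeIndex(indexFile: File, partitionLengths: Array[Long]): Unit = {
    val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(indexFile)))
    try {
      var offset = 0L
      out.writeLong(offset)
      for (len <- partitionLengths) {
        offset += len
        out.writeLong(offset)
      }
    } finally out.close()
  }

  // Returns (offset, length) of reduce partition `reduceId` within the data file.
  def segmentFor(indexFile: File, reduceId: Int): (Long, Long) = {
    val in = new DataInputStream(new FileInputStream(indexFile))
    try {
      // Each index entry is an 8-byte long; skip the entries before this partition.
      in.skipBytes(reduceId * 8)
      val offset = in.readLong()
      val nextOffset = in.readLong()
      (offset, nextOffset - offset)
    } finally in.close()
  }
}

object IndexDemo {
  def main(args: Array[String]): Unit = {
    val idx = File.createTempFile("shuffle_0_0_0", ".index")
    idx.deleteOnExit()
    IndexFileSketch.writeIndex(idx, Array(10L, 0L, 25L))
    println(IndexFileSketch.segmentFor(idx, 2)) // prints (10,25)
  }
}
```

An empty partition (length 0) simply yields two equal offsets. The resulting (offset, length) pair is what a file-segment buffer over the data file needs.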
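IndexShuffleBlockResolver: creates and maintains the mapping of shuffle blocks between logical blocks and their physical file locations. Shuffle block data produced by the same map task is stored in a single consolidated data file (a single consolidated data fi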