Lecture 148: The combineByKey and reduceByKey Transformations of Spark RDD Explained
Let's look at reduceByKey in PairRDDFunctions.scala. Similar to a combiner in Hadoop MapReduce, reduceByKey first merges values locally on each mapper, and only the merged results are sent to the reducer. Under the hood, it simply delegates to combineByKey.
/**
 * Merge the values for each key using an associative and commutative reduce function. This will
 * also perform the merging locally on each mapper before sending results to a reducer, similarly
 * to a "combiner" in MapReduce.
 */
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)] = self.withScope {
  combineByKey[V]((v: V) => v, func, func, partitioner)
}

As the body shows, reduceByKey passes the same function func as both the map-side merge (mergeValue) and the reduce-side merge (mergeCombiners), with the identity function as createCombiner.
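To make the relationship concrete, here is a minimal runnable sketch (the object name ReduceByKeyDemo and the local[2] master are illustrative assumptions, not from the original lecture) that computes the same per-key sums once with reduceByKey and once with the equivalent combineByKey call:

import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    // Illustrative local setup; any SparkContext works the same way.
    val conf = new SparkConf().setAppName("ReduceByKeyDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

    // reduceByKey: one function serves as both the map-side (combiner)
    // merge and the reduce-side merge.
    val sums = pairs.reduceByKey(_ + _)

    // The same computation expressed directly with combineByKey:
    //   createCombiner: turn a key's first value into the initial combiner
    //   mergeValue:     fold another value into the map-side combiner
    //   mergeCombiners: merge combiners from different partitions
    val sums2 = pairs.combineByKey(
      (v: Int) => v,
      (c: Int, v: Int) => c + v,
      (c1: Int, c2: Int) => c1 + c2
    )

    sums.collect().foreach(println)   // (a,4) and (b,6); order may vary
    sums2.collect().foreach(println)  // same result

    sc.stop()
  }
}

Because the map-side merge runs before any shuffle, each partition emits at most one record per key, which is exactly the network saving a Hadoop combiner provides.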