排序算法在JDK中的应用（一）归并排序

欢迎点击「算法与编程之美」↑关注我们！

本文首发于微信公众号："算法与编程之美"，欢迎关注，及时了解更多此系列文章。

作者|杨旭

来源|

https://blog.csdn.net/Alex_NINE/article/details/90612759

JDK8中的排序算法

JDK中对于数组的排序使用比较的多的是Arrays.sort()和Arrays.parallelSort()，前者是传统的排序算法，后者是JDK8新增的并行排序算法，基于fork/join框架,今天主要是分析Arrays.sort()的底层实现。

Arrays.sort()可以对所有的基本类型（除boolean）和Object进行排序，如果是数值类型的数就直接按照从大到小的顺序排列，如果是String类型的数组就按照其ASCII码进行排序，如果是对象就按照对象内部实现的compareTo()方法进行比较排序，并且使用的是归并排序。

这里我们通过分析Arrays.sort(int[] a)来分析一下JDK的实现流程和方式。

外层方法

第一步，调用Arrays.sort(int[] a)后，直接调用了DualPivotQuicksort.sort()字面意思是双轴快速排序，如下面代码所示。

/**

* Sorts the specified array into ascending numerical order.

* 将数组按照升序排列

Implementation note: The sorting algorithm is a Dual-Pivot Quicksort

* by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm

* offers O(n log(n)) performance on many data sets that cause other

* quicksorts to degrade to quadratic performance, and is typically

* faster than traditional (one-pivot) Quicksort implementations.

* 简单的翻译：算法的三个大佬作者。这个双轴快速排序可以为那些让传统快速排序性能下降的数据集提供O(n log(n))的时间复杂度，所以速度比传统的快速排序快

* 这里是为什么双轴快排比普通快排更快的Paper：https://arxiv.org/pdf/1511.01138.pdf

* 另外一个博主的笔记：https://www.jianshu.com/p/2c6f79e8ce6e

* @param a the array to be sorted

public static void sort(int[] a) {

DualPivotQuicksort.sort(a, 0, a.length - 1, null, 0, 0);

}

现在我们继续跟踪代码进入sort()的内部实现：

/**

* Sorts the specified range of the array using the given

* workspace array slice if possible for merging

* 在条件允许的情况下，使用给定的辅助空间对指定的数组范围内进行排序。

* @param a the array to be sorted 要被排序的数组

* @param left the index of the first element, inclusive, to be sorted 第一个元素的位置

* @param right the index of the last element, inclusive, to be sorted 最后一个元素的位置

* @param work a workspace array (slice) 工作辅助数组

* @param workBase origin of usable space in work array 工作辅助数组的扩容参数

* @param workLen usable size of work array 工作辅助数组的长度

* 此代码我进行了一些缩进和对齐

static void sort(int[] a, int left, int right,int[] work, int workBase, int workLen) {

// Use Quicksort on small arrays

//If the length of an array to be sorted is less than this constant, Quicksort is used in preference to merge sort.

//当待排序的数组长度小于286时使用快速排序，这时快排的表现好于归并排序

// private static final int QUICKSORT_THRESHOLD = 286;

if (right - left < QUICKSORT_THRESHOLD){

sort(a, left, right, true);

return;

}

* Index run[i] is the start of i-th run

* (ascending or descending sequence).

* 设置run数组第一个位置即需要排序的序列的第一个元素，长度就是是否有序的临界值

* //The maximum number of runs in merge sort.

* private static final int MAX_RUN_COUNT = 67;

int[] run = new int[MAX_RUN_COUNT + 1];

int count = 0; run[0] = left;

// Check if the array is nearly sorted

//检查数组是否接近有序（无序程度）

//这里的检查思路是：任一的数组都可以分割成若干个递增或递减的数组，比如{12,33,2,10,23,21,15}可以分成{12,33},{2,10,23},{21,15}这三个子数组要么是递增的要么是递减的

//上面的run数组就是用于存取子数组开始的下标的，然后通过比较run数组中元素个数来确定数组是否基本有序。如果大于MAX_RUN_COUNT那么就认定数组基本无序

for (int k = left; k < right; run[count] = k) {

// ascending 处理递增数组的位置

if (a[k] < a[k + 1]) {

while (++k <= right && a[k - 1] <= a[k]);

// descending 处理递减数组的位置

} else if (a[k] > a[k + 1]) {

while (++k <= right && a[k - 1] >= a[k]);

//同时将其转换为递增

for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {

int t = a[lo]; a[lo] = a[hi]; a[hi] = t;

}

// equal 处理相等元素

} else {

//The maximum length of run in merge sort. 归并排序的最多运行次数

//private static final int MAX_RUN_LENGTH = 33;

//这一步的意思是如果相等的序列等于MAX_RUN_LENGTH就直接执行快速排序。*不是很不清楚为什么要这么做*

for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {

if (--m == 0) {

sort(a, left, right, true);

return;

}

* The array is not highly structured,

* use Quicksort instead of merge sort.

* 每次循环完成检查count值是否已经等于无序临界值，如果直接等于临界值，那么就直接使用快速排序

* 待排序列越接近有序，归并排序效率越高

if (++count == MAX_RUN_COUNT) {

sort(a, left, right, true);

return;

}

// Check special cases 检查特殊的情况

// Implementation note: variable "right" is increased by 1.在right变量的基础上加1

if (run[count] == right++) { // The last run contains one element run数组的最后添加一个哨兵元素，值为right+1

run[++count] = right;

} else if (count == 1) { // The array is already sorted 如果只有一个单调序列那么说明数组本身就是有序的直接跳出

return;

}

//这下面就是归并排序的相关操作了，到这一步时说明数组通过了初步筛选，即数组长度大于286且接近有序

// Determine alternation base for merge 确定归并排序的交替标准

byte odd = 0;

for (int n = 1; (n <<= 1) < count; odd ^= 1);

// Use or create temporary array b for merging 为归并使用或者创建一个临时数组

int[] b; // temp array; alternates with a 临时数组，为a归并时提供临时存储，即辅助空间

int ao, bo; // array offsets from 'left'设置偏移指针

int blen = right - left; // space needed for b 数组b需要的长度

//操作work数组的相关属性

if (work == null || workLen < blen || workBase + blen > work.length) {

work = new int[blen];

workBase = 0;

}

//通过odd确定a b数组的内容

//通过此操作后 a和b中必然有一个为原数组有一个为作为辅助空间的数组

if (odd == 0) {

System.arraycopy(a, left, work, workBase, blen);

b = a;

bo = 0;

a = work;

ao = workBase - left;

} else {

b = work;

ao = 0;

bo = workBase - left;

}

// Merging 归并操作

for (int last; count > 1; count = last) { //归并次数边界

//具体的归并过程

for (int k = (last = 0) + 2; k <= count; k += 2) {

int hi = run[k], mi = run[k - 1];

for (int i = run[k - 2], p = i, q = mi; i < hi; ++i) {

if (q >= hi || p < mi && a[p + ao] <= a[q + ao]) {

b[i + bo] = a[p++ + ao];

} else {

b[i + bo] = a[q++ + ao];

}

run[++last] = hi;

}

//如果其中的一个子序列元素个数为奇数，就直接把这个序列复制到b中

if ((count & 1) != 0) {

for (int i = right, lo = run[count - 1]; --i >= lo;

b[i + bo] = a[i + ao]

);

run[++last] = right;

}

//交换数组，更新指针，准备进行下一次归并

int[] t = a; a = b; b = t;

int o = ao; ao = bo; bo = o;

}

以上便是JDK对于sort排序中归并排序部分的优化处理，总结而言就是当待排序的数组长度大于等于286时开始考虑使用归并排序。在此同时还需要考虑的条件是待排序的数组是否是基本有序的，JDK采用的办法是将待排序数组分成若干个单调递增或者递减的数组，如果分成的小数组的个数

大于67就认为这个数组基本上是无序的此时就直接调用了快速排序，还有个我不是很理解的条件就是当带待排序的数组中相等的元素子序列长度大于等于MAX_RUN_LENGTH（33）时就直接使用快速排序。

参考文献

双轴快排原理解析
JDK源码解析(1)

END

主编 | 张祯悦

责编 | 杨旭

where2go 团队

微信号：算法与编程之美

长按识别二维码关注我们!

温馨提示：点击页面右下角“写留言”发表评论，期待您的参与！期待您的转发！

排序算法在JDK中的应用（一）归并排序

最近更新

热门博客

[ 申请 ]友情链接：