Hadoop self-join: using one table as both the left table and the right table
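map: call context.write twice for every input row, so that the row is emitted once as the left table and once as the right table.
Left table: context.write(new Text(array[1].trim()), new Text("1_"+array[0].trim())); the left table's first column (the key) is the parent and its second column (the value, tagged "1_") is the child.
Right table: context.write(new Text(array[0].trim()), new Text("0_"+array[1].trim())); the right table's first column (the key) is the child and its second column (the value, tagged "0_") is the parent.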
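reduce: for each key, use the prefix to decide whether a value is a child or a parent. Values tagged "1_" are children of the key and go into grandChildList; values tagged "0_" are parents of the key and go into grandParentList. The Cartesian product of the two lists gives the (grandchild, grandparent) pairs.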
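The map and reduce steps above can be sketched without a cluster. The following is a minimal plain-Java simulation, not the original Hadoop job: the class name SelfJoinSketch, the helper methods, and the in-memory TreeMap standing in for the shuffle phase are all assumptions made for illustration. Only the "1_"/"0_" tagging and the grandChildList/grandParentList names come from the notes above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the self-join; no Hadoop classes are used.
class SelfJoinSketch {

    // Map step: emit each "child parent" row twice, once per side of the join.
    static void map(String line, Map<String, List<String>> shuffle) {
        String[] array = line.trim().split("\\s+");
        if (array.length < 2) {
            return; // skip blank or malformed rows
        }
        String child = array[0];
        String parent = array[1];
        // Left table: keyed by parent, value tagged "1_" carries the child.
        shuffle.computeIfAbsent(parent, k -> new ArrayList<>()).add("1_" + child);
        // Right table: keyed by child, value tagged "0_" carries the parent.
        shuffle.computeIfAbsent(child, k -> new ArrayList<>()).add("0_" + parent);
    }

    // Reduce step: split one key's values by tag, then take the Cartesian product.
    static List<String> reduce(List<String> values) {
        List<String> grandChildList = new ArrayList<>();
        List<String> grandParentList = new ArrayList<>();
        for (String value : values) {
            if (value.startsWith("1_")) {
                grandChildList.add(value.substring(2));   // a child of the key
            } else if (value.startsWith("0_")) {
                grandParentList.add(value.substring(2));  // a parent of the key
            }
        }
        List<String> out = new ArrayList<>();
        for (String grandChild : grandChildList) {
            for (String grandParent : grandParentList) {
                out.add(grandChild + "\t" + grandParent);
            }
        }
        return out;
    }

    // Driver: map every row, group by key (the stand-in for shuffle), reduce each group.
    static List<String> run(List<String> lines) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : lines) {
            map(line, shuffle);
        }
        List<String> result = new ArrayList<>();
        for (List<String> values : shuffle.values()) {
            result.addAll(reduce(values));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows taken from the data file in this post: child, then parent.
        List<String> rows = Arrays.asList(
                "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack", "Lucy Mary");
        for (String pair : run(rows)) {
            System.out.println(pair);
        }
    }
}
```

For the five sample rows, only the key Lucy has both children (Tom, Jone) and a parent (Mary), so the sketch prints the pairs Tom/Mary and Jone/Mary. In a real job the order of values reaching a reducer is not guaranteed, so the order of the output pairs may differ.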
1. Data file
Column 1 is the child and column 2 is the parent; the goal is to find each child's grandparents.
[root@master IMFdatatest]# hadoop dfs -cat /library/selfjoin/selfjoin.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/02/20 17:22:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary