Prerequisite: HDFS development environment setup
Data:
Call me by your name and I'll call you by mine.
请以你的名字呼唤我,我亦将如此。
In spite of you and me and the whole silly world going to pieces around us, I love you.
我爱你,直到世界终结。
Then she's horrible alcohol, tobacco, swearing, everything is not bad.
后来她勇敢的可怕烟酒脏话样样不差
Code
Mapper
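import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: (byte offset, line); output: (word, 1)
public class NLineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        final String line = value.toString();
        final String[] split = line.split("\\s+");
        for (String word : split) {
            // Each iteration sees one occurrence of word, so the count written is the constant 1
            context.write(new Text(word), new LongWritable(1));
        }
    }
}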
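To see what the mapper writes without starting a job, the tokenizing step can be reproduced in plain Java. This is only a sketch: MapEmitSketch and emit are hypothetical names that are not part of the job, and the Hadoop Text and LongWritable types are replaced by String and Long.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original job:
// mimics the (word, 1) pairs that map() writes for a single input line.
class MapEmitSketch {
    static List<Map.Entry<String, Long>> emit(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        // Same tokenization as the mapper: split on runs of whitespace
        for (String word : line.split("\\s+")) {
            out.add(new SimpleEntry<>(word, 1L));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print each pair as key, tab, value
        for (Map.Entry<String, Long> e : emit("I love you.")) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Two consequences of splitting on "\\s+" are worth keeping in mind. A line that contains no whitespace, such as the Chinese lines in the sample data, comes out as a single token. A line that starts with whitespace yields an empty string as its first token.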
Test code:
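import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {
    public static void main(String[] args) throws Exception {
        // Input and output paths, hard-coded for a local test run
        args = new String[2];
        args[0] = "src/main/resources/nlinei/";
        args[1] = "src/main/resources/nlineo";
        Configuration cfg = new Configuration();
        // Run with the local job runner against the local file system
        cfg.set("mapreduce.framework.name", "local");
        cfg.set("fs.defaultFS", "file:///");
        // Number of input lines handed to each map task; the default is 1
        cfg.setInt("mapreduce.input.lineinputformat.linespermap", 2);
        // Delete the output directory if it already exists, otherwise the job fails
        final FileSystem filesystem = FileSystem.get(cfg);
        if (filesystem.exists(new Path(args[1]))) {
            filesystem.delete(new Path(args[1]), true);
        }
        // Define the job
        final Job job = Job.getInstance(cfg); // create a new job
        job.setJarByClass(NLineDriver.class);
        // Configure the map phase
        job.setMapperClass(NLineMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setInputFormatClass(NLineInputFormat.class);
        // Map-only job: no reduce phase
        job.setNumReduceTasks(0);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}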
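The effect of linespermap can be illustrated without Hadoop. The sketch below is a simplification under stated assumptions: NLineSplitSketch and groupLines are hypothetical names, and the real NLineInputFormat produces file splits described by byte offsets rather than lists of strings. It only shows how consecutive lines are grouped when each split holds at most N lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Hadoop code:
// groups consecutive lines into chunks of at most linesPerSplit lines.
class NLineSplitSketch {
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            // The last chunk may be shorter when the line count is not a multiple of N
            int end = Math.min(i + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Placeholder names standing in for the six lines of the sample data
        List<String> lines = Arrays.asList(
                "line 1", "line 2", "line 3", "line 4", "line 5", "line 6");
        List<List<String>> splits = groupLines(lines, 2);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split " + i + ": " + splits.get(i));
        }
    }
}
```

Assuming the input directory holds a single file with exactly the six lines listed under Data, a setting of 2 lines per map gives three splits and therefore three map tasks. Because the job sets the number of reduce tasks to 0, each map task writes its own part-m-* file to the output directory.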
Result: