InputFormat: CombineTextInputFormat

梁云亮, published 2019-12-06 15:36:44

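Prerequisite: HDFS development environment setup

Example: counting words

Preparation

Create an input folder in the HDFS root directory, then place four small files in it, sized 1.5 MB, 35 MB, 5.5 MB, and 6.5 MB, to serve as the input data.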

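Code

Mapper class
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text mapOutputKey = new Text();
    private IntWritable mapOutputValue = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // The input arrives as (byte offset, line content); read the line as a String
        String linevalue = value.toString();
        // StringTokenizer splits on whitespace (space, tab, newline) by default
        StringTokenizer st = new StringTokenizer(linevalue);
        while (st.hasMoreTokens()) { // more tokens means more words on this line
            String word = st.nextToken(); // the text up to the next delimiter, i.e. one word
            mapOutputKey.set(word);
            mapOutputValue.set(1);
            context.write(mapOutputKey, mapOutputValue); // emit (word, 1)
        }
    }
}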
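To make the tokenizing step concrete, here is a minimal standalone sketch (no Hadoop required) that splits one line the way the mapper above does. The class name `TokenizeDemo` and the sample line are illustrative assumptions, not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class for illustration only.
class TokenizeDemo {
    // Splits one line with StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        // Runs of whitespace produce no empty tokens,
        // and punctuation stays attached to its word ("world,").
        System.out.println(tokenize("hello  world,\thello hadoop"));
    }
}
```

Note that punctuation is not stripped, so "world," and "world" would be counted as different words unless the input is cleaned first.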
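Reducer class
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outputValue = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;    // running total for this word
        for (IntWritable value : values) {
            sum += value.get();
        }
        outputValue.set(sum);
        context.write(key, outputValue); // emit (word, total count)
    }
}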
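The combined effect of the shuffle phase and the reducer can be approximated locally in plain Java. The sketch below is an illustration under assumptions: `LocalReduceDemo` is a hypothetical class, and the `TreeMap` only roughly mimics the sorted, grouped keys that the framework hands to `reduce`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration only.
class LocalReduceDemo {
    // Groups the mapper's (word, 1) pairs by word and sums each group.
    static Map<String, Integer> countWords(List<String> mappedWords) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps keys in sorted order
        for (String word : mappedWords) {
            counts.merge(word, 1, Integer::sum); // each occurrence contributes 1
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("hello", "world", "hello")));
    }
}
```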
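Driver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // A core-site.xml file must be provided under resources
        // Hard-coded input and output paths (these override any command-line arguments)
        args = new String[]{
                "/input/",
                "/output/"
        };

        Configuration cfg = new Configuration();   // load the configuration

        Job job = Job.getInstance(cfg, WordCountDriver.class.getSimpleName());
        job.setJarByClass(WordCountDriver.class);

        // If no InputFormat is set, the default is TextInputFormat
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Set the maximum virtual-storage split size to 20 MB
        CombineTextInputFormat.setMaxInputSplitSize(job, 20 * 1024 * 1024);

        // Set the mapper class and its output key/value types
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the reducer class and the final output key/value types
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job to YARN and wait for it to finish
        boolean isSuccess = job.waitForCompletion(true);
        int status = isSuccess ? 0 : 1;
        System.exit(status);
    }
}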
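To see how the 20 MB maximum might group the four input files, the following sketch models the split planning in plain Java. It is a simplified model under stated assumptions, not Hadoop's actual implementation: it ignores node and rack locality and HDFS block boundaries, and it assumes the files are processed in the order listed. The class `CombineSplitSketch` and its methods are hypothetical names; sizes are in KB.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of combine-style split planning.
class CombineSplitSketch {
    // Step 1: cut one file into "virtual storage" blocks.
    // A remainder between maxSize and 2 * maxSize is halved to avoid a tiny tail;
    // otherwise a block of at most maxSize is carved off.
    static List<Long> virtualBlocks(long fileSize, long maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<Long> blocks = new ArrayList<>();
        long left = fileSize;
        while (left > 0) {
            long chunk = (left > maxSize && left < 2 * maxSize)
                    ? left / 2
                    : Math.min(maxSize, left);
            blocks.add(chunk);
            left -= chunk;
        }
        return blocks;
    }

    // Step 2: accumulate virtual blocks until the running total reaches maxSize,
    // then close the split. Any leftover forms a final, smaller split.
    static List<Long> planSplits(long[] fileSizes, long maxSize) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long size : fileSizes) {
            for (long block : virtualBlocks(size, maxSize)) {
                current += block;
                if (current >= maxSize) {
                    splits.add(current);
                    current = 0;
                }
            }
        }
        if (current > 0) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] filesKb = {1536, 35840, 5632, 6656}; // 1.5 MB, 35 MB, 5.5 MB, 6.5 MB
        long maxKb = 20 * 1024;                     // 20 MB
        System.out.println(planSplits(filesKb, maxKb));
    }
}
```

Under these assumptions the four files yield two splits, one of 36.5 MB (37376 KB) and one of 12 MB (12288 KB), so the job would run two map tasks rather than one per file. The split count on a real cluster can differ because of file ordering and data locality.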
Running

[Image: no description provided]
