ubuntu configure

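liyihongcug · posted 2015-4-8 10:34

Last edited by liyihongcug on 2015-4-8 11:23

Most MapReduce WordCount tutorials gloss over how to compile WordCount.java. Those that do cover it mostly describe the approach for old versions such as 0.20, namely javac -classpath /usr/local/Hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java. The newer 2.x releases no longer ship a hadoop-core*.jar file, so compiling and packaging your own MapReduce program works differently from the old versions.

This post uses the WordCount example under Hadoop 2.4.1 to show how to compile your own MapReduce program in the 2.x releases.

Dependency jars in Hadoop 2.x

In Hadoop 2.x the jars are no longer bundled into a single hadoop-core*.jar but are split into several jars. Running the WordCount example, for instance, needs the following three jars: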

$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

Compiling and packaging a Hadoop MapReduce program

Add the jars above to the classpath:

export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"

Now WordCount.java can be compiled (this uses the WordCount.java from the 2.4.1 source; the source is at the end of this post):

javac WordCount.java
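
Compilation prints warnings, which can be ignored. Afterwards you can see that several .class files have been generated.

[Screenshot: compiling your own MapReduce program with javac]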
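
Several .class files appear because javac writes one file per class, including each static nested class. The following is a minimal sketch in plain Java, with no Hadoop dependency; NestedClassNames and classFileFor are hypothetical names used only for illustration, standing in for WordCount and its two nested classes.

```java
// Hypothetical stand-in for WordCount: one outer class with two static nested classes.
class NestedClassNames {
    static class TokenizerMapper {}
    static class IntSumReducer {}

    // Returns the file name javac writes for a class:
    // the binary name without its package prefix, plus ".class".
    static String classFileFor(Class<?> c) {
        String binaryName = c.getName();
        return binaryName.substring(binaryName.lastIndexOf('.') + 1) + ".class";
    }

    public static void main(String[] args) {
        System.out.println(classFileFor(NestedClassNames.class));
        System.out.println(classFileFor(TokenizerMapper.class));
        System.out.println(classFileFor(IntSumReducer.class));
    }
}
```

The nested classes get names of the form Outer$Inner.class, which is why a glob such as WordCount*.class is needed to pick up all of them when packaging.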

Next, package the .class files into a jar so that they can run in Hadoop:

jar -cvf WordCount.jar ./WordCount*.class

Once packaged, try running it. First create a few input files:

mkdir input
echo "echo of the rainbow" > ./input/file0
echo "the waiting game" > ./input/file1

[Screenshot: creating the WordCount input]

Start the run:

/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output
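
However, you may hit the following error here: Exception in thread "main" java.lang.NoClassDefFoundError: WordCount

[Screenshot: error saying the WordCount class cannot be found]

Because the program declares a package, the class name in the command must be fully qualified as org.apache.hadoop.examples.WordCount. (For the JVM to find your own build under that name, the .class files must also sit under org/apache/hadoop/examples/ inside the jar; compiling with javac -d . WordCount.java and packaging with jar -cvf WordCount.jar org arranges that.) The command becomes: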

/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
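
The link between a package declaration and the name used on the command line can be sketched in plain Java. This is only an illustration of the standard lookup rule (dots in the class name become directory separators); JarEntryPath and entryFor are hypothetical names, not part of Hadoop.

```java
// Sketch: the path at which a class loader looks up a class, given its name.
class JarEntryPath {
    // Converts a fully qualified class name into the expected entry path
    // inside a jar or class directory.
    static String entryFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // Name used in the failing command.
        System.out.println(entryFor("WordCount"));
        // Name used in the working command.
        System.out.println(entryFor("org.apache.hadoop.examples.WordCount"));
    }
}
```

A class compiled with package org.apache.hadoop.examples identifies itself by the second form, so asking for plain WordCount cannot match it.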

A correct run produces the following result:

[Screenshot: WordCount run result]
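
To know what counts to expect for the two input files before involving Hadoop at all, the same tokenize-and-sum logic can be run locally. The sketch below is a plain-Java approximation, not the Hadoop job itself: LocalWordCount and count are hypothetical names, and a TreeMap stands in for the sorting that happens between the map and reduce phases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local approximation of WordCount: tokenize each line, then sum counts per word.
class LocalWordCount {
    static Map<String, Integer> count(List<String> lines) {
        // TreeMap keeps the words sorted, like the keys in the job output.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Same whitespace tokenization as TokenizerMapper.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Same summation as IntSumReducer, done incrementally.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The contents of ./input/file0 and ./input/file1 from this post.
        List<String> lines = Arrays.asList("echo of the rainbow", "the waiting game");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For these two lines every word is counted once except "the", which is counted twice.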

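Advanced: compiling and running MapReduce programs with Eclipse

Compiling and running MapReduce programs from the command line is somewhat tedious, since every change means compiling and packaging by hand again. Doing it in Eclipse is more convenient.

WordCount.java source

The file is located in hadoop-2.4.1-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples: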

/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Permanent link to this article: http://www.linuxidc.com/Linux/2015-02/113489.htm

/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*
*    http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Every argument except the last is an input path.
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    // The last argument is the output path.
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

javac WordCount.java
/usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
Note: WordCount.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning

Development environment (Eclipse plugin): http://momodev.blog.51cto.com/4916132/1186313

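liyihongcug · posted 2015-4-9 17:19

Course outline:
Lesson 1: Origins and architecture of Hadoop; deploying a Hadoop cluster; the CDH family
Lesson 2: Principles and operation of the distributed file system HDFS; HDFS API programming; new HDFS features in 2.x: high availability, federation, snapshots
Lesson 3: Deploying a 2.x enterprise cluster with all the new features
Lesson 4: Map-Reduce principles, architecture and working mechanism; connecting Eclipse to a Hadoop cluster; using Maven
Lesson 5: Map-Reduce programming in practice; log analysis
Lesson 6: Complex Map-Reduce use cases; Hadoop Streaming
Lesson 7: YARN, the new-generation computing framework
Lesson 8: Pig principles, deployment and the Pig Latin language; use cases
Lesson 9: Hive architecture, installation and HiveQL
Lesson 10: Hive use cases; the Impala subproject
Lesson 11: ZooKeeper and distributed systems development
Lesson 12: HBase architecture, cluster deployment and administration
Lesson 13: HBase data model; modelling analysis of real-world cases
Lesson 14: Data integration with Sqoop, Flume and Chukwa; connecting commercial databases to a Hadoop cluster
Lesson 15: Connecting to applications: REST and Thrift interfaces, UDFs in practice, RHadoop, connecting data analysis software to a Hadoop cluster
Lesson 16: Diving into the Hadoop source code
Lesson 17: Hadoop use cases at internet companies; integrating the subprojects into an enterprise data analysis platform; Hadoop and machine learning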

liyihongcug · posted 2015-4-16 21:03

Last edited by liyihongcug on 2015-4-30 22:24

http://blog.csdn.net/u012308776/article/details/42194083
A complete record of learning Hadoop: running your first MapReduce program in Eclipse
http://sqcjy111.iteye.com/blog/1735203  (detailed pseudo-distributed configuration)

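Shared by another user; reposting it here.

This is part 2 of my complete record of learning Hadoop. Here I describe how to write your first MapReduce program in Eclipse.

First, my development environment:

Operating system: Ubuntu 10.10, installed under Windows using Wubi
Hadoop version: hadoop-0.20.2.tar.gz
Eclipse version: eclipse-jee-helios-SR1-linux-gtk.tar.gz

To keep things simple for learning, this example is developed against a Hadoop installation in pseudo-distributed mode.

Step 1: start the Hadoop daemons.
If you have read part 1 of this series, an introduction to Hadoop, you should already know how to start the Hadoop daemons in pseudo-distributed mode, so I won't repeat it here.

Step 2: install the hadoop-plugin in Eclipse.

1. Copy <Hadoop install directory>/contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar into <Eclipse install directory>/plugins/.

2. Restart Eclipse and configure the Hadoop installation directory.
If the plugin installed successfully, open Window-->Preferences and you will find a Hadoop Map/Reduce entry, where you need to set the Hadoop installation directory. Close the dialog when done.

3. Configure Map/Reduce Locations.
Open Map/Reduce Locations from Window-->Show View.
Create a new Hadoop Location in Map/Reduce Locations: right-click in this view-->New Hadoop Location. In the dialog that opens you need to set the Location name, for example myubuntu, as well as Map/Reduce Master and DFS Master. The Host and Port values here are the address and port you configured in mapred-site.xml and core-site.xml respectively. For example: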

Map/Reduce Master

Host: localhost
Port: 9001

DFS Master

Host: localhost
Port: 9000

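Close the dialog when the configuration is done. Click DFS Locations-->myubuntu; if it shows a folder node such as "(2)", the configuration is correct. If it shows "connection refused", check your configuration. (After finishing my own configuration I saw only one folder, tmp, and I don't know why.)

Step 3: create a new project.
File-->New-->Other-->Map/Reduce Project
The project name can be anything, for example hadoop-test.
Copy <Hadoop install directory>/src/examples/org/apache/hadoop/examples/WordCount.java into the project you just created.

Step 4: upload a folder of sample data.
To run the program we need an input folder and an output folder. The output folder is created automatically when the program finishes, so we only need to give the program an input folder.

1. In the current directory (for example the Hadoop install directory), create a folder named input containing two files, file01 and file02, with the following contents:

file01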
Hello World Bye World

file02
Hello Hadoop Goodbye Hadoop

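2. Upload the input folder to the distributed file system.

In the terminal where the Hadoop daemons were started, cd to the Hadoop install directory and run the following command: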
bin/hadoop fs -put input input01

This command uploads the input folder to the Hadoop file system, which now contains a new folder named input01. You can check it with the following command:
bin/hadoop fs -ls

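Step 5: run the project.

1. In the new project hadoop-test, select WordCount.java, then right-click-->Run As-->Run Configurations
2. In the Run Configurations dialog that opens, select Java Application, then right-click-->New; this creates a new application named WordCount
3. Configure the run arguments: open Arguments and, under Program arguments, enter the input folder to pass to the program and the folder where the program should save its results, for example: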
hdfs://localhost:9000/user/panhuizhi/input01 hdfs://localhost:9000/user/panhuizhi/output01

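Here input01 is the folder you just uploaded. Adjust the folder addresses to match your own setup.

4. Click Run to run the program.

After a while the run completes. Once it has finished, run this command in a terminal: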
bin/hadoop fs -ls

to check whether the folder output01 has been created.

Use the following command to view the contents of the generated files:
bin/hadoop fs -cat output01/*

If you see the following, congratulations: everything went smoothly and you have run your first MapReduce program in Eclipse.
Bye 1
Goodbye 1
Hadoop  2
Hello 2
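World 2

http://phz50.iteye.com/blog/932373
http://www.cnblogs.com/linjiqin/p/3147902.html  (cluster configuration)
http://blog.csdn.net/xiaocaichonga/article/details/8078704  (very detailed configuration)
http://blog.sina.com.cn/s/blog_7deb436e0101kh0d.html  (other notes)

The Host and Port under Map/Reduce Master on the left correspond to the host and port set in mapred-site.xml under conf in your Hadoop install directory, and DFS Master on the right corresponds to core-site.xml. If core-site.xml contains only localhost with no port number, the default is 8020. Click Finish once this is set, and you can then browse and operate on HDFS from within Eclipse.

hdfs://localhost:8001/psiInput hdfs://localhost:8001/psiOutput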
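
When checking that the Host and Port typed into the DFS Master fields match the hdfs:// addresses used as program arguments, the two values can be read out of an address with the standard java.net.URI class. This is a small sketch under assumptions: HdfsLocation, host and port are hypothetical names, and the fallback of 8020 is simply the default stated in this post rather than something looked up from a Hadoop configuration.

```java
import java.net.URI;

// Sketch: extract the host and port from an hdfs:// address.
class HdfsLocation {
    // Default port named in this post for addresses that omit one (an assumption here).
    static final int DEFAULT_PORT = 8020;

    static String host(String address) {
        return URI.create(address).getHost();
    }

    static int port(String address) {
        // URI.getPort() returns -1 when the address has no explicit port.
        int port = URI.create(address).getPort();
        return port == -1 ? DEFAULT_PORT : port;
    }

    public static void main(String[] args) {
        String withPort = "hdfs://localhost:9000/user/panhuizhi/input01";
        String withoutPort = "hdfs://localhost/psiInput";
        System.out.println(host(withPort) + " " + port(withPort));
        System.out.println(host(withoutPort) + " " + port(withoutPort));
    }
}
```

If the port printed for your program arguments differs from the one entered under DFS Master, one of the two needs correcting.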

liyihongcug · posted 2015-5-17 20:50

http://www.powerxing.com/install-hadoop/
sudo useradd -m hadoop -s /bin/bash
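This command creates a hadoop user that can log in, with /bin/bash as its shell.

Copy and paste shortcuts in the Ubuntu terminal
In an Ubuntu terminal window the copy and paste shortcuts need an added shift; paste, for example, is ctrl+shift+v.

Next, set the password with the following command, entering the password hadoop twice when prompted: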

sudo passwd hadoop

You can give the hadoop user administrator privileges to make deployment easier and to avoid permission problems that are tricky for beginners:

sudo adduser hadoop sudo

http://blog.csdn.net/ggz631047367/article/details/42426391
http://www.cnblogs.com/kinglau/p/3796164.html
http://www.linuxidc.com/Linux/2015-01/111258.htm