
Installing and Using RHadoop

 
Environment: Hortonworks (HDP) 2.3, Ambari 2.1.1, Hadoop 2.7.1

 

1. Download the RHadoop packages

Download the R source tarball from https://cran.r-project.org/src/base/R-3/

The files I downloaded:

https://cran.r-project.org/src/base/R-3/R-3.2.3.tar.gz

https://github.com/RevolutionAnalytics/rmr2/releases/download/3.3.1/rmr2_3.3.1.tar.gz

https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz

https://github.com/RevolutionAnalytics/rhbase/blob/master/build/rhbase_1.2.1.tar.gz
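Two of the GitHub links above are `blob` pages, and fetching such a link with a command-line tool typically saves an HTML page instead of the archive. A minimal sketch for checking that each downloaded file really is a gzip-compressed tarball before installing it. The function name `is_tarball` is hypothetical and only used for illustration; the file names are the ones listed above and are assumed to be in the current directory.

```shell
#!/bin/sh
# is_tarball FILE: succeed only if FILE can be listed as a gzip-compressed tar archive.
is_tarball() {
    tar -tzf "$1" >/dev/null 2>&1
}

# Check the archives named in this step, if they are in the current directory.
for f in R-3.2.3.tar.gz rmr2_3.3.1.tar.gz rhdfs_1.0.8.tar.gz rhbase_1.2.1.tar.gz; do
    if [ ! -e "$f" ]; then
        echo "$f: not downloaded yet"
    elif is_tarball "$f"; then
        echo "$f: looks like a valid tarball"
    else
        echo "$f: NOT a tarball (an HTML page may have been saved instead)"
    fi
done
```

If a file fails the check, downloading it again through the browser's download button on the GitHub page is one way to get the real archive.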

 

2. Install R on CentOS 6.5

First, install the build dependencies:

# yum install gcc-gfortran

# yum install gcc gcc-c++

# yum install readline-devel

# yum install libXt-devel

 

# tar xvf R-3.2.3.tar.gz

# cd R-3.2.3

# ./configure

# make

# make install
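After `make install` it is worth confirming that both `R` and `Rscript` can be found through `PATH`, since Hadoop streaming later launches `Rscript` by name. A small sketch of such a check; `on_path` is a hypothetical helper name, not part of R or Hadoop.

```shell
#!/bin/sh
# on_path CMD: succeed if CMD resolves to an executable via the current PATH.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Report where R and Rscript were found, without failing if they are absent.
for cmd in R Rscript; do
    if on_path "$cmd"; then
        echo "$cmd found at $(command -v "$cmd")"
    else
        echo "$cmd is not on PATH"
    fi
done
```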

 

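3. Confirm the Java environment variables

RHadoop depends on the rJava package. Before installing rJava, confirm that the Java environment variables are configured, then let R set up its connection to the JVM.

Append the following to the end of /etc/profile: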

########################################

export JAVA_HOME=/usr/java/jdk1.7.0_79

export JRE_HOME=/usr/java/jdk1.7.0_79/jre

export PATH=$JAVA_HOME/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_CMD=/usr/bin/hadoop

export HADOOP_STREAMING=/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar

export HADOOP_HOME=/usr/hdp/current/hadoop-client

export JAVA_HOME JRE_HOME PATH CLASSPATH

########################################

[root@dataserver R-3.2.3]# R CMD javareconf
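Changes to /etc/profile only take effect in a new login shell or after running `. /etc/profile`. The following sketch checks that each variable from the block above is set and points at a path that exists on this machine. The helper name `var_points_to_path` is hypothetical, and the paths themselves are specific to this HDP installation.

```shell
#!/bin/sh
# var_points_to_path NAME: succeed if the variable called NAME is set
# and its value is an existing file or directory.
var_points_to_path() {
    eval "value=\${$1:-}"
    [ -n "$value" ] && [ -e "$value" ]
}

# Report the state of every variable used by rJava, rhdfs and rmr2 in this post.
for name in JAVA_HOME JRE_HOME HADOOP_CMD HADOOP_STREAMING HADOOP_HOME; do
    if var_points_to_path "$name"; then
        echo "$name is set and exists"
    else
        echo "$name is unset or points to a missing path"
    fi
done
```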

 

4. Install the R packages that RHadoop depends on, so that the RHadoop packages work properly

[root@dataserver R-3.2.3]# R 

> install.packages("rJava")

> install.packages("reshape2")

> install.packages("Rcpp")

> install.packages("iterators")

> install.packages("itertools")

> install.packages("digest")

> install.packages("RJSONIO")

> install.packages("functional")

> install.packages("bitops")

> install.packages("caTools")

> quit()

Or, in a single call:

install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))
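On a server it can be convenient to run the installation without an interactive R session. A minimal sketch that builds the `install.packages` expression in the shell; `r_install_expr` and the `CRAN_REPO` variable are hypothetical names, and the mirror URL is an assumed default that can be replaced with any CRAN mirror.

```shell
#!/bin/sh
# CRAN mirror to install from (assumed default, override by setting CRAN_REPO).
CRAN_REPO=${CRAN_REPO:-https://cloud.r-project.org}

# r_install_expr PKG...: print an R expression that installs the named packages.
r_install_expr() {
    list=""
    for pkg in "$@"; do
        # Append "pkg" to the list, comma-separated after the first entry.
        list="${list:+$list, }\"$pkg\""
    done
    printf 'install.packages(c(%s), repos = "%s")\n' "$list" "$CRAN_REPO"
}

# Print the expression for the packages listed in this step.
r_install_expr rJava Rcpp RJSONIO bitops digest functional stringr plyr reshape2 caTools

# To actually run it (requires R to be installed):
#   Rscript -e "$(r_install_expr rJava Rcpp)"
```

Passing `repos` explicitly avoids the interactive mirror prompt that `install.packages` otherwise shows in a fresh installation.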

 

5. Install the RHadoop packages

[root@dataserver R-3.2.3]# export HADOOP_CMD=/usr/bin/hadoop

[root@dataserver R-3.2.3]# export HADOOP_STREAMING=/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar

[root@dataserver R-3.2.3]# R CMD INSTALL rhdfs_1.0.8.tar.gz

[root@dataserver R-3.2.3]# R CMD INSTALL rmr2_3.3.1.tar.gz

[root@dataserver R-3.2.3]# R CMD INSTALL rhbase_1.2.1.tar.gz

 

6. Use the RHadoop packages

[root@dataserver R-3.2.3]# R

> library(rhdfs)

> hdfs.init()

> hdfs.ls("/")

 

 

[root@dataserver R-3.2.3]# export HADOOP_HOME=/usr/hdp/current/hadoop-client

> library(rmr2)

 

 

An ordinary R program:

> small.ints = 1:10

> sapply(small.ints, function(x) x^2)

The same computation as a MapReduce R program:

> small.ints = to.dfs(1:10)

> out = mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))

> from.dfs(out)   # or pass the job's temporary output path, e.g. "/tmp/RtmpWnzxl4/file5deb791fcbd5"

 

If the following exception appears:

Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        ... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

the streaming task cannot find Rscript on its PATH, so create a symbolic link:

ln -s /usr/local/bin/Rscript /usr/bin/Rscript
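Running `ln -s` a second time fails because the link already exists, which is awkward when the same fix has to be applied on several machines. A small sketch of an idempotent version; `link_if_missing` is a hypothetical helper, and it assumes R was installed under /usr/local on every node that runs tasks.

```shell
#!/bin/sh
# link_if_missing TARGET LINK: create the symlink LINK -> TARGET unless
# something already exists at LINK. Fails if TARGET does not exist.
link_if_missing() {
    target=$1
    link=$2
    if [ ! -e "$target" ]; then
        echo "target $target does not exist" >&2
        return 1
    fi
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists, leaving it alone"
        return 0
    fi
    ln -s "$target" "$link"
}

# Example call for the paths used in this post (run as root):
#   link_if_missing /usr/local/bin/Rscript /usr/bin/Rscript
```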

 

 

Installing R on CentOS 7 is much simpler.

The steps are:

yum install epel-release

yum install R
