Original poster: jieforest

Deploying the Aurelius Graph Cluster

#11 | Original poster | Posted 2012-12-10 11:34
Once logged in to the machine via ssh, Titan 0.1.0 is downloaded and unzipped, and the Gremlin console is started.
    ubuntu@ip-10-117-55-34:~$ ssh 54.242.14.83
    ...
    ubuntu@ip-10-12-27-208:~$ wget https://github.com/downloads/thinkaurelius/titan/titan-0.1.0.zip
    ubuntu@ip-10-12-27-208:~$ sudo apt-get install unzip
    ubuntu@ip-10-12-27-208:~$ unzip titan-0.1.0.zip
    ubuntu@ip-10-12-27-208:~$ cd titan-0.1.0/
    ubuntu@ip-10-12-27-208:~/titan-0.1.0$ bin/gremlin.sh

             \,,,/
             (o o)
    -----oOOo-(_)-oOOo-----
    gremlin>

#12 | Original poster | Posted 2012-12-10 11:34
A toy 1 million vertex/edge graph is loaded into Titan using the Gremlin/Groovy script below (simply cut-and-paste the source into the Gremlin console and wait approximately 3 minutes).

The code implements a preferential attachment algorithm. For an explanation of this algorithm, please see the second column of page 33 in Mark Newman's article "The Structure and Function of Complex Networks".
    // connect Titan to HBase in batch loading mode
    conf = new BaseConfiguration()
    conf.setProperty('storage.backend','hbase')
    conf.setProperty('storage.hostname','localhost')
    conf.setProperty('storage.batch-loading','true');
    g = TitanFactory.open(conf)

    // preferentially attach a growing vertex set
    // ids holds one entry per edge endpoint, so a vertex is drawn
    // in proportion to the number of edges it already has
    size = 1000000; ids = [g.addVertex().id]; rand = new Random();
    (1..size).each{
      v = g.addVertex();
      u = g.v(ids.get(rand.nextInt(ids.size())))
      g.addEdge(v,u,'linked');
      ids.add(u.id);
      ids.add(v.id);
      // commit and report progress every 10,000 vertices
      if(it % 10000 == 0) {
        g.stopTransaction(SUCCESS)
        println it
      }
    }; g.shutdown()
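
For readers who want to try the sampling scheme without a running cluster, here is a minimal in-memory sketch in Python. It is not Titan code: the function name `preferential_attachment`, the `seed` parameter, and the integer vertex ids are assumptions made for illustration. It mirrors the loop in the Gremlin script, where each new vertex links to an existing vertex drawn from a pool that holds every edge endpoint.

```python
import random


def preferential_attachment(size, seed=None):
    """Grow a graph one vertex at a time and return its edges as (new, existing) pairs."""
    rand = random.Random(seed)
    ids = [0]    # endpoint pool, seeded with vertex 0 (hypothetical integer ids)
    edges = []
    for v in range(1, size + 1):
        # A vertex appears in the pool once per edge it touches, so this draw
        # favours vertices that already have many edges.
        u = ids[rand.randrange(len(ids))]
        edges.append((v, u))    # edge from the new vertex v to the existing vertex u
        ids.append(u)
        ids.append(v)
    return edges


edges = preferential_attachment(1000, seed=42)
print(len(edges), "edges;", "first three:", edges[:3])
```

Because both endpoints are appended to the pool after every edge, the chance of picking a vertex grows with its degree, which is the "rich get richer" behaviour the article relies on.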

#13 | Original poster | Posted 2012-12-11 19:42
Batch Analytics with Faunus



Faunus is a Hadoop-based graph computing framework. It supports performant global graph analyses by making use of sequential reads from disk (see The Pathologies of Big Data).

Faunus provides connectivity to Titan/HBase, Titan/Cassandra, any Rexster-fronted graph database, and to text/binary files stored in HDFS.

From the single machine that runs zookeeper, hadoop-namenode, hadoop-jobtracker, and hbase-master, Faunus 0.1-alpha is downloaded and unzipped.

#14 | Original poster | Posted 2012-12-11 19:43
The provided titan-hbase.properties file should be updated with hbase.zookeeper.quorum=10.12.27.208 instead of localhost. The IP address 10.12.27.208 is provided by ~/.whirr/agc/instances on agc-master. Finally, the Gremlin console is started.
    ubuntu@ip-10-12-27-208:~$ wget https://github.com/downloads/thinkaurelius/faunus/faunus-0.1-alpha.zip
    ubuntu@ip-10-12-27-208:~$ unzip faunus-0.1-alpha.zip
    ubuntu@ip-10-12-27-208:~$ cd faunus-0.1-alpha/
    ubuntu@ip-10-12-27-208:~/faunus-0.1-alpha$ vi bin/titan-hbase.properties
    ubuntu@ip-10-12-27-208:~/faunus-0.1-alpha$ bin/gremlin.sh

             \,,,/
             (o o)
    -----oOOo-(_)-oOOo-----
    gremlin>

#15 | Original poster | Posted 2012-12-11 19:43
A few example Faunus jobs are provided below. The final job, on line 9 of the listing, generates an in-degree distribution. The in-degree of a vertex is the number of edges pointing into it.

The output states how many vertices (second column) have a particular in-degree (first column). For example, 167,050 vertices have exactly 1 incoming edge.
    gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
    ==>faunusgraph[titanhbaseinputformat]
    gremlin> g.V.count() // how many vertices in the graph?
    ==>1000001
    gremlin> g.E.count() // how many edges in the graph?
    ==>1000000
    gremlin> g.V.out.out.out.count() // how many length 3 paths are in the graph?
    ==>988780
    gremlin> g.V.sideEffect('{it.degree = it.inE.count()}').degree.groupCount // what is the graph's in-degree distribution?
    ==>1     167050
    ==>10    2305
    ==>100   6
    ==>108   3
    ==>119   3
    ==>122   3
    ==>133   1
    ==>144   2
    ==>155   1
    ==>166   2
    ==>18    471
    ==>188   1
    ==>21    306
    ==>232   1
    ==>254   1
    ==>...
    gremlin>
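
To make the meaning of the two output columns concrete, the following Python sketch computes the same kind of table on a tiny hand-made graph. It is only an illustration of the counting, not of how Faunus executes the job on Hadoop; the function name `in_degree_distribution` and the five-vertex example graph are assumptions.

```python
from collections import Counter


def in_degree_distribution(vertices, edges):
    """Map each in-degree to the number of vertices that have that in-degree."""
    in_degree = dict.fromkeys(vertices, 0)
    for _tail, head in edges:    # an edge (tail, head) points into head
        in_degree[head] += 1
    return Counter(in_degree.values())


# Hypothetical toy graph: three edges point into vertex 0, one into vertex 1.
vertices = [0, 1, 2, 3, 4]
edges = [(1, 0), (2, 0), (3, 1), (4, 0)]

dist = in_degree_distribution(vertices, edges)
for degree in sorted(dist):
    print(degree, dist[degree])
# prints:
# 0 3
# 1 1
# 3 1
```

Here the first column is the in-degree and the second is the number of vertices with that in-degree, matching the column layout described for the Faunus result.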

#16 | Original poster | Posted 2012-12-11 19:44


To conclude, the in-degree distribution result is pulled from Hadoop’s HDFS (stored in output/job-0). Next, scp is used to download the file to agc-master and then again to download the file to a local machine (e.g. a laptop). If the local machine has R installed, then the file can be plotted and visualized (see the final diagram below).

The log-log plot demonstrates the known result that the preferential attachment algorithm generates a graph with a power-law degree distribution (i.e. “natural statistics”).
    ubuntu@ip-10-12-27-208:~$ hadoop fs -getmerge output/job-0 distribution.txt
    ubuntu@ip-10-12-27-208:~$ head -n5 distribution.txt
    1   167050
    10  2305
    100 6
    108 3
    119 3
    ubuntu@ip-10-12-27-208:~$ exit
    ...
    ubuntu@ip-10-117-55-34:~$ scp 54.242.14.83:~/distribution.txt .
    ubuntu@ip-10-117-55-34:~$ exit
    ...
    ~$ scp ubuntu@ec2-184-72-209-80.compute-1.amazonaws.com:~/distribution.txt .
    ~$ R
    > t = read.table('distribution.txt')
    > plot(t,log='xy',xlab='in-degree',ylab='frequency')
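
If R is not at hand, a rough numeric check of the log-log trend can be done with the Python standard library alone. The sketch below is an assumption-laden illustration: the helper names `parse_distribution` and `log_log_slope` are made up, the sample holds only the five rows shown by `head` above rather than the full file, and an ordinary least-squares fit on log-transformed values is a crude indicator, not a rigorous power-law estimate. Note that the rows in the file are ordered as text (1, 10, 100, 108, ...), so the parser sorts them numerically.

```python
import math


def parse_distribution(text):
    """Parse 'in-degree frequency' rows and return them sorted numerically."""
    rows = [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]
    return sorted(rows)


def log_log_slope(rows):
    """Least-squares slope of log10(frequency) against log10(in-degree)."""
    points = [(math.log10(d), math.log10(f)) for d, f in rows if d > 0 and f > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# The five rows printed by `head -n5 distribution.txt` in the session above.
sample = """1   167050
10  2305
100 6
108 3
119 3
"""

rows = parse_distribution(sample)
print("slope on the sample rows:", round(log_log_slope(rows), 2))
```

A clearly negative slope on the full file would be consistent with the straight, downward-sloping line that the log-log plot is meant to show.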

#17 | Original poster | Posted 2012-12-11 19:45


Conclusion

The Aurelius Graph Cluster is used for processing massive-scale graphs, where massive-scale denotes a graph so large it does not fit within the resource confines of a single machine. In other words, the Aurelius Graph Cluster is all about Big Graph Data. The two cluster technologies explored in this post were Titan and Faunus.

They serve two distinct graph computing needs. Titan supports thousands of concurrent real-time, topologically local graph interactions. Faunus, on the other hand, supports long-running, topologically global graph analyses. In other words, they provide OLTP and OLAP functionality, respectively.

#18 | Original poster | Posted 2012-12-11 19:45
over.
