A Survey on Energy-Efficient Distributed Computing System (分布式处理平台节能计算研究综述)
Yu Jiong, Pu Yonglin, Lu Liang, Liu Su (于炯, 蒲勇霖, 鲁亮, 刘粟)
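Abstract:
Distributed processing platforms are an important component of big data technology, and their low efficiency and high energy consumption cannot be ignored. To address this problem, this paper classifies existing energy-saving algorithms for large-scale data processing into four categories: stream data processing, batch data processing, graph data processing, and interactive data processing. The interactive category is further divided into batch-oriented, stream-oriented, and graph-oriented interactive data processing. After analyzing each category, the paper discusses a series of problems in existing distributed processing architectures and energy-saving algorithms, such as their impact on cluster quality of service and on cluster performance. Finally, it outlines future directions in five areas: distributed processing architectures suited to energy saving, the adaptability of energy-efficient computing to cluster data processing, the generality of energy-efficient computing across cluster data processing, QoS-constraint guarantees for clusters running energy-saving algorithms, and performance guarantees for clusters running energy-saving algorithms.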
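Keywords: distributed processing; big data technology; green computing; energy efficiency; quality of service
Foundation: National Natural Science Foundation of China (61262088, 61462079, 61562086, 61363083, 61562078); Science and Technology Support Program of the Ministry of Science and Technology of China (2015BAH02F01); Xinjiang Graduate Research and Innovation Project (XJGRI2016028)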
DOI: 10.13568/j.cnki.651094.2018.04.002