An Improved Task Scheduling Strategy in the Storm Environment
Liu Yuechao; Yu Jiong; Lu Liang
Abstract:
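With the rapid growth of big data workloads, real-time computation over large-scale data has become a business requirement. The lack of a "real-time Hadoop" had become a major gap in the big data ecosystem, and the emergence of Storm filled this need well. However, Storm's default round-robin resource scheduler takes into account neither the transmission latency between a topology's internal components nor the transmission latency between nodes. This paper therefore proposes an Improved Storm Task Schedule Strategy (ISTS) for the Storm environment, which aims to improve performance by maximizing resource utilization and throughput while minimizing network latency. Experimental results show that, when tasks are scheduled with ISTS, both soft and hard resource constraints are satisfied while the network transmission distance between components is minimized.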
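The abstract contrasts Storm's default round-robin placement with a strategy that respects hard and soft resource limits while keeping communicating components close together. The ISTS algorithm itself is not reproduced here. The following is only a minimal Python sketch of a generic greedy, network-aware placement under assumed models: memory is treated as the hard constraint, CPU as the soft constraint, and network distance is a hypothetical 0 (same node), 1 (same rack) or 2 (different racks). All names (`Node`, `Task`, `round_robin`, `greedy_network_aware`, `traffic_cost`) are illustrative and are not Storm APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    mem: float  # hard capacity (assumed unit: MB)
    cpu: float  # soft capacity (assumed unit: percent of one core)


@dataclass(frozen=True)
class Task:
    name: str
    mem: float  # memory demand, checked as a hard constraint
    cpu: float  # CPU demand, checked as a soft constraint


def distance(a: Node, b: Node) -> int:
    """Hypothetical network distance: 0 same node, 1 same rack, 2 different racks."""
    if a.name == b.name:
        return 0
    return 1 if a.rack == b.rack else 2


def neighbours(task_name, edges):
    """Yield the tasks that exchange tuples with task_name."""
    for u, v in edges:
        if u == task_name:
            yield v
        elif v == task_name:
            yield u


def traffic_cost(placement, edges):
    """Sum of network distances over all communicating task pairs."""
    return sum(distance(placement[u], placement[v]) for u, v in edges)


def round_robin(tasks, nodes):
    """Deal tasks out to nodes in turn, ignoring resources and communication."""
    return {t.name: nodes[i % len(nodes)] for i, t in enumerate(tasks)}


def greedy_network_aware(tasks, nodes, edges):
    """Place tasks in the given order on the node that never violates memory (hard),
    overshoots CPU (soft) as little as possible, then is closest to placed neighbours."""
    free_mem = {n.name: n.mem for n in nodes}
    free_cpu = {n.name: n.cpu for n in nodes}
    placement = {}
    for t in tasks:
        best, best_key = None, None
        for n in nodes:
            if free_mem[n.name] < t.mem:  # hard constraint: skip the node entirely
                continue
            cpu_overflow = max(0.0, t.cpu - free_cpu[n.name])  # soft: penalised, not forbidden
            dist = sum(distance(n, placement[m])
                       for m in neighbours(t.name, edges) if m in placement)
            key = (cpu_overflow, dist)  # lexicographic: respect CPU first, then locality
            if best_key is None or key < best_key:
                best, best_key = n, key
        if best is None:
            raise RuntimeError(f"no node has enough memory for task {t.name!r}")
        placement[t.name] = best
        free_mem[best.name] -= t.mem
        free_cpu[best.name] -= t.cpu
    return placement


# Hypothetical example: a three-stage linear topology on two racks.
NODES = [Node("n1", "rackA", 1024.0, 100.0),
         Node("n2", "rackA", 1024.0, 100.0),
         Node("n3", "rackB", 1024.0, 100.0)]
TASKS = [Task("spout", 256.0, 40.0), Task("split", 256.0, 40.0), Task("count", 256.0, 40.0)]
EDGES = [("spout", "split"), ("split", "count")]

if __name__ == "__main__":
    for label, plan in (("round-robin", round_robin(TASKS, NODES)),
                        ("greedy", greedy_network_aware(TASKS, NODES, EDGES))):
        print(label, {t: n.name for t, n in plan.items()}, "cost =", traffic_cost(plan, EDGES))
```

On this example, round-robin spreads the three tasks over all three nodes (total distance 3), while the greedy placement co-locates `spout` and `split` on `n1` and moves `count` to the same-rack node `n2` once `n1` lacks CPU headroom (total distance 1). Ordering the key as (CPU overflow, distance) is one possible design choice; weighting locality above the soft CPU limit would pack everything onto `n1` instead.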
Keywords: big data; Storm; data processing; scheduling
Funding: National Natural Science Foundation of China (61462079, 61262088, 61562086, 61363083, 61562078)
DOI: 10.13568/j.cnki.651094.2017.01.017