- Access to network
- Account creation in MareNostrum
ISSUES/PROBLEMS:
In this part, please mention the minor and major problems that you are facing in your research.
PLANS:
In this part, please mention your short- and long-term plans for the weeks and months to come.
Barcelona Research Objectives
Overall Objective: Achieve accurate and timely performance prediction on compute clusters, to be used for meta-scheduling in a grid computing environment.
1. Get mpidtrace linking properly with WRF compiled on GCB, then Mind.
2. Use the generated MPI tracefiles (Paraver and Dimemas) to make predictions between Mind and GCB.
3. a) Install Amon and Aprof on MareNostrum.
b) Run benchmarks on MareNostrum
4. Experiment with how well Amon and Aprof scale to larger numbers of nodes.
5. Analyze how Amon and Aprof relate to, and could possibly be combined with, Dimemas.
6. Work with Marc to see how we can optimize the gridification of WRF.
SUMMARIES/CRITIQUES OF PAPERS:
In this part, please include a short review of the papers you have read during the last week. The review should include three short paragraphs for each paper. The first paragraph should be a short summary of the paper; the second paragraph should include a short critique of the paper; and the last paragraph should include a discussion of how the paper is related to your research.
Review: The Vision of Autonomic Computing
The essential idea of autonomic computing is that systems are able to manage themselves according to a system administrator’s policies and goals. This should free the system administrator from many time-consuming tasks and, by reducing opportunities for error, should keep the system performing optimally at all times. The four concepts of self-management are self-configuration, self-optimization, self-healing, and self-protection. The term autonomic sums up the direction of current efforts: taking most of the human element out of the configuration of components and systems, performance improvement and optimization, error detection and repair, and security. Things are further complicated by the interaction of separate, heterogeneous autonomic systems, which may be located across company and country boundaries.
This article describes the vision of, and need for, autonomic computing. It raises the important point that we are now reaching our computational limits as we know them and that autonomic computing is the only option left. Although the text does not cover them explicitly, it notes that grid and web services fall under this umbrella. Autonomic computing seems to me to be a step in the direction of artificial intelligence. That said, the article makes sure to mention at the end the people who will need to be involved in developing it: not only computer scientists and engineers, but also scientists, psychologists, economists, and people in the legal profession. Many different factors have to be taken into account, such as robustness, cost, security, the effect on humans, and legal aspects. Advances in this direction therefore have to be made carefully, to ensure that humans do not develop things too big for them to handle.
This paper lays the overarching foundation for my research. Moving down the hierarchy, my research falls under grid computing, then job management, then meta-scheduling, and finally application performance prediction on clusters. These are all essential topics in a move towards autonomic computing. Our goal is to take job scheduling to the grid environment and develop applications that help meta-schedulers make better scheduling decisions, optimizing the performance of applications across the grid. From there, it will be each individual system’s responsibility to autonomically manage its jobs, performance, and security. If there is a problem with a system, the scheduler should recognize this and not submit any jobs to that system until the issue is resolved.
Reference:
Jeffrey O. Kephart and David M. Chess. The vision of autonomic computing. IEEE Computer, 36(1):41-50, January 2003.
Review: Transparent Grid Enablement of Weather Research and Forecasting
This paper gives a high-level overview of the current status of the transparent grid enablement of the Weather Research and Forecasting code. It goes into great detail in describing the motivation for the project: the accurate and timely prediction of severe weather phenomena such as hurricanes. It details the need for specific zip-code-level forecasts for the benefit of meteorologists, business owners, and emergency response personnel. From there it discusses the transparent grid enablement itself, through technologies such as TRAP/J and Grid Superscalar. Other aspects of the project are then described, such as the middleware, meta-scheduling, job flow management, and profiling tools.
This paper lays out the overarching foundation and motivation for my current research. My current research is in the area of profiling: optimizing the tools Amon and Aprof by considering more compute characteristics, and offering a better comparison with the prediction tool Dimemas.
Reference:
S. Masoud Sadjadi (1), Liana Fong (6), Rosa M. Badia (2), Javier Figueroa (1, 9), Javier Delgado (1), Xabriel J. Collazo-Mojica (8), Khalid Saleem (1), Raju Rangaswami (1), Shu Shimizu (4), Hector A. Duran Limon (5), Pat Welsh (3), Sandeep Pattnaik (10), Anthony Praino (6), David Villegas (1), Selim Kalayci (1), Gargi Dasgupta (7), Onyeka Ezenwoye (1), Juan Carlos Martinez (1), Ivan Rodero (2), Shuyi Chen (9), Javier Muñoz (1), Diego Lopez (1), Julita Corbalan (2), Hugh Willoughby (1), Michael McFail (1), Christine Lisetti (1), and Malek Adjouadi (1). Transparent Grid Enablement of Weather Research and Forecasting.
Affiliations: (1) Florida International University (FIU), Miami, Florida, USA; (2) Barcelona Supercomputing Center, Barcelona, Spain; (3) University of North Florida, Jacksonville, Florida, USA; (4) IBM Tokyo Research Laboratory, Tokyo, Japan; (5) University of Guadalajara, CUCEA, Mexico; (6) IBM T. J. Watson, NY, USA; (7) IBM IRL, India; (8) University of Puerto Rico, Mayaguez Campus, Puerto Rico; (9) University of Miami, Coral Gables, Florida, USA; (10) Florida State University, Tallahassee, Florida, USA.
Review: A Modeling Approach for Estimating Execution Time of Long-Running Scientific Applications
This paper describes a modeling approach, implemented in the software tools Amon and Aprof, for estimating application execution times on a compute cluster. While several factors should be taken into account when considering the run time of an application, a first assumption was made that the most significant contributors are the number of processors and the CPU clock speed. To summarize the details, the model expresses the execution time Texec in terms of contribution terms, where Ci is the i-th contribution and m is the number of contribution terms. In this case there were two terms, C0 and C1, due to the degree of parallelism (number of nodes) and the CPU performance. For this paper, the final model was expanded into a form with the parameters α0, α1, β0, and β1, which are constants representing quantities such as execution overhead and application characteristics.
Experiments were performed on two clusters at Florida International University, GCB and Mind, with 8 nodes and 16 nodes respectively. The experiments varied the number of nodes and the CPU utilization. Monitoring information from Amon was used as input data to Aprof, and predictions were made both within the same cluster and across clusters. All predictions were within 10% of their actual values, thereby supporting the model.
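To make the validation step concrete, here is a minimal sketch of fitting a node-count model and checking a prediction against the 10% error threshold. The functional form T(n) = a0 / n + a1 is an assumption chosen only for illustration and is not taken from the paper; the function names are hypothetical, and the timings are synthetic values generated from the assumed form, not measurements from GCB or Mind.

```python
def fit_inverse_model(nodes, times):
    """Least-squares fit of the ASSUMED form T(n) = a0 / n + a1."""
    xs = [1.0 / n for n in nodes]  # regress time on 1/n
    k = len(xs)
    mean_x = sum(xs) / k
    mean_t = sum(times) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    a0 = sxt / sxx              # parallelizable part of the run time
    a1 = mean_t - a0 * mean_x   # fixed overhead
    return a0, a1


def predict(a0, a1, n):
    """Predicted execution time on n nodes under the assumed form."""
    return a0 / n + a1


def relative_error(predicted, actual):
    """Prediction error as a fraction of the actual value."""
    return abs(predicted - actual) / actual


# Synthetic timings (seconds) generated from the assumed form itself.
nodes = [1, 2, 4, 8]
times = [1000.0 / n + 50.0 for n in nodes]

a0, a1 = fit_inverse_model(nodes, times)

# Extrapolate to a node count outside the fitted range and compare.
predicted_16 = predict(a0, a1, 16)
actual_16 = 1000.0 / 16 + 50.0
error_16 = relative_error(predicted_16, actual_16)
within_threshold = error_16 < 0.10
```

Because the synthetic data follow the assumed form exactly, the fit recovers the generating constants and the error is essentially zero; with real monitoring data the same error check would show how far the extrapolation drifts as the node count grows.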
The Amon and Aprof approach differs from other related work. For instance, it differs from the performance prediction simulator Dimemas [2] in that it focuses on online prediction. The approach of Yang et al. [3] requires analysis of the application source code and a sample execution on the target platform. The Amon and Aprof approach differs in being application agnostic, in not requiring a sample execution on the target platform, and in modeling execution scale for distributed applications. An interesting point here is that the experiments were performed on small clusters. We now want to run benchmarks on a larger number of resources, for which access to the supercomputer MareNostrum will be beneficial. These results will then be used to optimize the model that will be used in grid environments.
My current research is an extension of this work. We now want to validate this model for a larger number of resources and to get a more in-depth comparison with the performance simulator Dimemas. We are also considering adding memory parameters to Aprof’s prediction model as part of the optimization process discussed above. In addition, this research focuses on the Weather Research and Forecasting code, with the ultimate goal of grid-enabling this application. Down the road, however, we would like to further validate our assumption that the model is application agnostic and extend its application.
References:
[1] S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa, Raju Rangaswami, Javier Delgado, Hector Duran, and Xabriel Collazo. A modeling approach for estimating execution time of long-running scientific applications. In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS-2008), the Fifth High-Performance Grid Computing Workshop (HPGC-2008), Miami, Florida, April 2008.
[2] R. Badia, F. Escale, E. Gabriel, J. Gimenez, R. Keller, J. Labarta, and M. S. Müller. Performance Prediction in a Grid Environment. European Across Grids Conference, 2003.
[3] L. T. Yang, X. Ma, and F. Mueller. Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution. Proceedings of Supercomputing, 2005.