
Camilo Silva :: Blog

July 29, 2008

  

WEEKLY STATUS REPORT
Camilo A. Silva

July 21-27

 

ACTIVITIES:
Time has gone fast. This was my last week in Guadalajara, and let me tell you, what a great time I had! I am happy because I was given a great opportunity. Going to another country and becoming familiar with it is a memorable experience. In my case, I was able to excel in my studies, and I was also able to make new Mexican friends.

 

Going back to my activities: this past week I had a meeting with my team members on Monday, where we discussed the progress of our work, and guess what? We completed it. All the parallelization of our project is done. Only one thing was left: the error handling of the program. Gary and I decided to work together on this task.

 

My friend Gary did a great job in getting his part running promptly, while it took me a bit longer to complete mine. We were then able to run the first set of data during the weekend (Michael kindly asked GCB users to let us use the cluster). The first set of data was processed with no errors so far. There was one complication, though; it is discussed below in the “ISSUES/PROBLEMS” section.

 

All in all, I was able to complete my program. I was able to fulfill my goal. My program is a parallel program that is capable of managing MPI communication errors only.

 

On Wednesday, I had my last meeting with Dr. Duran. During that meeting I shared with him the progress of my work and the steps to follow afterwards. He was of great help throughout my visit.

 

Another important thought that I want to emphasize is that I never expected to have such a great working experience with my friends from the Bioinformatics group. I was amazed at how well we were able to work together “virtually” from different parts of the globe: China, Mexico, and the USA. I was also grateful for the great leadership and companionship of Mr. Michael Robinson. He was always there to help me, full of patience and good guidance. My partner Gary is a great team member as well, a hard worker and very knowledgeable about programming. I sincerely feel that I am with the best team members.

 

Looking back on the wonderful time that I have spent in Guadalajara, I am proud to say that I do not regret anything at all. I am sincerely grateful to FIU CIS and the PIRE program for giving me this prestigious opportunity.

 

One important thing: throughout my PIRE experience I had the chance to work with many important people who helped me solve issues. I never had the opportunity to thank them publicly, so I want to take the time to thank them all:

  1. I want to thank God for giving me the opportunity to participate in this research experience
  2. All the FIU students that helped me: David, Juan Carlos, both Javier Delgado and Javier Figueroa, and Michael Robinson
  3. Big thanks to Michael Robinson because he was a great team leader; I am both proud and happy to have partnered with him and Gary
  4. Many thanks to Gary because he gave me tons of insight into my programs
  5. Special thanks to all Professors and Administrators in charge of PIRE: Dr. Sadjadi, Dr. Graham, Ms. Carbajo, Dr. Hector Duran, Dean Yi Deng, and every single person that helped out with the PIRE program
  6. Lastly, special thanks to both of my Professors and Advisors in Mexico and USA: Dr. Duran, and Dr. S. Masoud Sadjadi


ACCOMPLISHMENTS:
I am proud to say that I completed my proposed plan for the Summer. I completed my parallelized program with the capability of self-healing whenever an MPI error message is detected at the time the master node sends a message to the slave nodes.


ISSUES/PROBLEMS:
Just as I mentioned in the “ACTIVITIES” section, we had a little problem during the runtime of our project. Last night during our meeting, Michael shared with us that the problem involved something known as a “memory leak.” To be honest, I am not quite sure what the cause of the problem is or how it should be resolved. This is something that we will find out as a group, and Michael decided to look into it.

PLANS:
The big plan now is to write the technical paper and complete my PIRE DVD on time.

 

SUMMARIES/CRITIQUES OF PAPERS:

  

FIRST READINGS: “MPI Error Handling”

REFERENCES:

http://beige.ucs.indiana.edu/I590/node85.html

 

http://www.mpi-forum.org/docs/mpi-11-html/node148.html

 

http://www.dartmouth.edu/~rc/classes/intro_mpi/mpi_error_functio

 

http://www.hlrs.de/people/gabriel/benchmarking/ft-vsuite-s

 

http://www.netlib.org/utk/people/JackDongarra/PAPERS/ft-mpi-l

 

The basic theory that I learned from all those articles about error handling is that an MPI communicator is more than just the group of processes that belong to it. One of the items that the communicator carries internally is its error handler. It is important to point out that whether an error message is printed or not depends on the MPI implementation.

 

MPI errors arise whenever messages are incorrectly constructed, addressed, sent, or received. Please note that MPI does not provide mechanisms for dealing with failures in the communication system. What MPI does provide are mechanisms for handling recoverable errors: the default error handler, which aborts the MPI program, can be replaced with a more appropriate one. So that the application can identify an error, the MPI_Error_class routine converts any error code into one of a small set of standard error codes known as error classes. Furthermore, MPI provides only two predefined error handlers: MPI_ERRORS_ARE_FATAL, the default, which causes the MPI program to abort whenever an error is found, and MPI_ERRORS_RETURN, which causes MPI to return an error value instead of aborting.
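To make the "return an error value instead of aborting" idea concrete, here is a minimal sketch in plain Python. It is only a simulation under stated assumptions and not my actual MPI program: the error codes, the ERROR_CLASSES table, and the helpers send_with_retry and make_flaky_send are all hypothetical names invented for illustration, and `send` stands in for a send call made while a returning error handler such as MPI_ERRORS_RETURN is active.

```python
# Hypothetical error codes and classes for illustration only; real MPI
# defines its own constants (for example MPI_SUCCESS and MPI_ERR_RANK).
SUCCESS = 0
ERROR_CLASSES = {0: "SUCCESS", 1: "ERR_RANK", 2: "ERR_OTHER"}


def error_class(code):
    """Collapse any error code into a small set of classes.

    This plays the role that MPI_Error_class plays in MPI.
    """
    return ERROR_CLASSES.get(code, "ERR_OTHER")


def send_with_retry(send, dest, payload, max_attempts=3):
    """Call send(dest, payload) until it succeeds or attempts run out.

    `send` is assumed to return an error code instead of aborting.
    Returns (succeeded, list of error classes seen along the way).
    """
    seen = []
    for _ in range(max_attempts):
        code = send(dest, payload)
        if code == SUCCESS:
            return True, seen
        seen.append(error_class(code))
    return False, seen


def make_flaky_send(failures):
    """Build a fake send that fails with code 2 `failures` times, then succeeds."""
    state = {"left": failures}

    def send(dest, payload):
        if state["left"] > 0:
            state["left"] -= 1
            return 2
        return SUCCESS

    return send
```

For example, send_with_retry(make_flaky_send(2), 1, "work") recovers on the third attempt, while a send that keeps failing is reported back to the caller instead of killing the whole program.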

 

Since I read many different papers and documents, I can only generalize and say that all of them were helpful. Some of the documents were heavy on theory, which was a bit monotonous, while others were very simple in their definitions and provided great examples. All of them were easy to comprehend.

  

This topic was extremely important for the last part of my project because it helped me a lot in establishing a self-healing system for my project.

  

SECOND READINGS: “MPI debugging”

RESOURCES:

 

http://cw.squyres.com/columns/2004-12-CW-MPI-Mechanic.pdf

 

http://www.clustermonkey.net/index.php?option=com_search&searchw

 

http://www.hlrs.de/organization/amt/services/tools/debugge

 

http://www.cs.utah.edu/research/techreports/2007/pdf/UUCS-07-0

 

http://www.nacad.ufrj.br/sgi/007-3687-010/sgi_html/ch04.html

 

I wanted to learn what strategies exist for debugging MPI programs, and I was exposed to techniques that are used nowadays to debug parallel programs. The first technique that I learned was “printf()” debugging. This technique is not that effective in parallel programs because of the multiplicative effect: many nodes will print the same thing unless something in the printout identifies them. Also, “printf()” techniques can only display a limited subset of the process state.
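A small sketch of how the "same line from every node" problem is usually softened: prefix every debug line with the identity of the process that printed it. This is plain Python with no MPI involved; tag_line and simulate_output are hypothetical helper names, and the rank and size values are simply passed in rather than obtained from a real communicator.

```python
def tag_line(rank, size, message):
    """Prefix a debug message with the rank that produced it."""
    return f"[rank {rank}/{size}] {message}"


def simulate_output(size, message):
    """Return the lines that `size` processes would print for one message."""
    return [tag_line(rank, size, message) for rank in range(size)]


if __name__ == "__main__":
    for line in simulate_output(4, "opening file"):
        print(line)
```

With the prefix, four copies of the same message become four distinguishable lines, although this still shows only whatever state one chooses to print.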

 

Another debugging technique is to run serial debuggers in parallel. Although serial debuggers were not developed for parallel programs, they might provide some insight in finding certain bugs.

 

Memory-checking debuggers look for erroneous patterns such as accessing memory outside of an array or the local stack, or using heap memory that was already freed. One of their advantages is that they report all errors in a file, with line numbers. The downside is that they cannot be used interactively and cannot be attached to already-running processes.

 

The last category of debugging techniques is the parallel debuggers. Besides the common functionality of all debuggers, such as setting breakpoints, examining variables, and stepping through code, this type of debugger can also individually monitor and control all processes in a running MPI job.

 

This topic was very interesting, and all the authors did a great job in explaining it, although most of the information could be found by reading only one of the five articles that I read.

 

This information was important to me because it helped me in understanding what the different types of debugging techniques are that could be used in a parallel MPI program.

     

Posted by Camilo Silva | 0 comment(s)

July 21, 2008

Just wanted to take this time to let you all know that Colombia's independence from Spain was declared on July 20, 1810. For all of those who have Colombian friends, it's not too late to congratulate us on our independence.

There is a website that I found that talks about the story of our Independence, check it out:
http://www.historyworld.net/wrldhis/PlainTextHistories.asp?historyid=ab81

  

       

Have a nice day my friends and God Bless you and my beautiful Nation, Colombia!!!

:)

Keywords: Colombia

Posted by Camilo Silva | 1 comment(s)

  

WEEKLY STATUS REPORT
Camilo A. Silva

July 21

  

ACTIVITIES:
During last week, I participated in REU’s 2nd meeting, where I shared my project progress with my peers. I presented to them the details of the MPI program and all of its communication patterns.

  

On Monday and Wednesday, I had my weekly meetings with my group members where we shared the progress of the project so far and discussed issues and challenges to be solved.

 


ACCOMPLISHMENTS:
I am happy to report that the parallel program is running! The only catch is that it runs successfully only when the number of tasks is less than or equal to the number of nodes. Whenever there are more tasks than nodes, a file I/O open error occurs. This bug should be solved soon.


ISSUES/PROBLEMS:
The biggest challenge that I had this week dealt with the communication with my group members. Specifically, we were dealing with a problem about a queue implementation for the parallelized project. However, I already had such an implementation active in the parallel code, so there was no need to do it again. I tried to explain that to my group members via EVO, but unfortunately the message was not well understood.

 

Fortunately, in our second meeting of the week, we went over my parallel code and I showed them the queue implementation, which they completely understood. Furthermore, I showed them a file I/O error that the sequential code was throwing when more tasks are submitted than there are nodes. I gave my group members a printout of the error. One of them identified the bug and agreed to help solve it.
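For readers wondering what a queue buys you when there are more tasks than nodes, here is a minimal sketch in plain Python. It is not the project's code and uses no MPI: farm_tasks is a hypothetical function, and the simulation assumes, purely for illustration, that workers finish in the same order in which they were given work.

```python
from collections import deque


def farm_tasks(tasks, n_workers):
    """Simulate a master handing out tasks to a fixed pool of workers.

    Returns a dict mapping worker id -> list of tasks it processed.
    Workers are assumed (for this simulation only) to finish in the
    order in which they were given work.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    pending = deque(tasks)            # tasks not yet handed out
    idle = deque(range(n_workers))    # workers waiting for work
    done = {w: [] for w in range(n_workers)}
    while pending:
        worker = idle.popleft()       # next free worker
        task = pending.popleft()      # next queued task
        done[worker].append(task)     # "send" the task, "receive" the result
        idle.append(worker)           # the worker is free again
    return done
```

With five tasks and two workers, every task is still processed; the extra tasks simply wait in the queue until a worker becomes free, instead of requiring one node per task.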

 

Another challenge that I have is learning how to write a technical paper. Dr. Sadjadi gave me great insight on how to learn: by reading sample technical papers and asking my team members for help.

  

PLANS:
First of all, the biggest plan right now is to have the parallelization program running in the cluster perfectly. There is only one bug to fix which deals with some file I/O of the sequential code of the application.

 

Secondly, my goal is to have an autonomic computing implementation ready for the parallel program as well.

 

Lastly, my goal is to start writing the technical paper and do my best to have it ready by the end of next week.

  SUMMARIES/CRITIQUES OF PAPERS:

N/A

 

Keywords: progress report

Posted by Camilo Silva | 0 comment(s)

July 14, 2008

  

WEEKLY STATUS REPORT
Camilo A. Silva

July 14, 2008

 

ACTIVITIES:
This week was instrumental in fulfilling the objective of parallelizing the project of our group. A lot of work has been invested in this good cause. At the beginning of the week, I finished a testing model that would perform the message passing communication just as designed and desired. The testing model worked as planned. Without difficulties. Without worries.

  

Last Thursday, I started to adapt the code created for the testing model to the project18 code to be parallelized. This adaptation was “completed” on Friday. Tests started to be executed, but little did I know that many troubles awaited me. Since that Friday, I have been performing tests nonstop. It has been a learning experience. Someone who has been instrumental in overcoming the challenges is Michael Robinson, our group leader. On Saturday, we had an informal conference call in order to work out some issues with the parallelization of the code. We were able to find one of the bugs. Once it was fixed, some more tests were submitted that night. However, the tests had errors; this time the errors dealt with file permissions. Early Sunday morning, I modified how the needed files are accessed and tried to run the program once again. To my surprise, the program executed successfully. However, it only runs successfully when the number of tasks submitted is less than or equal to the number of nodes; when there are more tasks than nodes, the program fails. I am hoping to fix that bug soon.


ACCOMPLISHMENTS:
The parallelization of the program was completed.


ISSUES/PROBLEMS:
Most of the challenges faced dealt with the debugging of the parallelization code.

  

PLANS:
The goal for this week is to have the parallelized code working efficiently. Also, I plan to implement the self-healing and self-optimized functions. I plan to start writing the technical paper as well as building the website.

  SUMMARIES/CRITIQUES OF PAPERS:

N/A

  

Keywords: bioinformatics, report

Posted by Camilo Silva | 0 comment(s)

July 07, 2008

WEEKLY STATUS REPORT
Camilo A. Silva

June 29 – July 6

 

ACTIVITIES:

This past week was essential because it was the time for me to perform some tests on the MPI-IO capabilities when writing files collectively (when all nodes write to the same file) and independently. I ran about three different tests and reported the results to my Bioinformatics group.

 

Last week, there was only one group meeting with my team, and we discussed our goal of having the parallelization of our project complete by July 15. It seems that we will be able to meet our goal only if we can work on the parallelization of the program in the next couple of days and execute it during the weekend.

 

I have also met with Dr. Hector Duran and I shared with him our group’s goals and deadlines for the days to come.


ACCOMPLISHMENTS:
Last week was a success. I was able to write different MPI programs that tested MPI-IO capabilities and to learn how those work when writing and reading files on the cluster.

  

Moreover, I completed a PowerPoint presentation that presents a design for parallelizing our project. This design will serve as a programming roadmap for the parallelization of the code. The presentation can be found here:

http://latinamericangrid.org/elgg/camilo.silva/files/23 

It is entitled "Parallelizing ... Bio Project"

I am glad that everything is moving forward; now it’s time to put everything together and put it to work!    


 

ISSUES/PROBLEMS:
I just had some technical problems with the proper usage of some parameters of the MPI-IO functions, especially the buffer. I was able to resolve these issues by running the tests on the Grid; by trial and error I came to understand how the buffer is supposed to be used.

 PLANS:
Implement MPI in the sequential code for our project, parallelize it, run some tests, and have it ready by the end of this weekend.

SUMMARIES/CRITIQUES OF PAPERS: 

FIRST PAPER: Overview of the MPI-IO parallel IO interface

by: Corbett, Peter; et al.

 

This document describes the MPI-IO interface, which is meant to support asynchronous I/O, allowing computation to overlap with I/O, and optimization of the physical file layout on storage devices. The idea behind MPI-IO is to model I/O as message passing while fulfilling some proposed goals, such as: targeting scientific applications as well as other applications, addressing real-world needs, and clearly favoring performance over functionality. In essence, MPI-IO should be used to read and write files in a collective manner, where all processors in the cluster are able to access them.

 

The paper starts by talking about data partitioning, and the authors explain that MPI derived data types are used to describe how data is laid out in the user’s buffer. Thus, MPI-IO uses some elementary derived data types known as filetype and buftype. A filetype simply defines a data pattern that is replicated throughout the file. MPI derived data types consist of fields of data that are located at specified offsets.

 

The next topic that the authors discussed was MPI-IO data access functions. This topic explained the importance of understanding that in a parallel environment, the system must decide whether a file pointer is shared by multiple processes or accessed by a single process. The authors did a great job at explaining terms and definitions; for example, they simply defined the file pointer as what is used to keep track of the file position.

 

In the last topics, the authors talked about blocking and non-blocking synchronization. They explained that blocking I/O calls block until completed, while non-blocking I/O calls only initiate an I/O operation and do not wait for it to complete. This led to the last topic, file layout and coordination, which explained in detail that MPI-IO is intended as an interface that maps between data stored in memory and a file.
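The blocking versus non-blocking distinction can be illustrated without MPI at all. The sketch below is only an analogy in plain Python, using a thread pool from the standard library: blocking_write, nonblocking_write, and demo are hypothetical names, and an in-memory buffer stands in for a real file.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def blocking_write(buf, data):
    """Write and return only when the write has completed."""
    return buf.write(data)


def nonblocking_write(executor, buf, data):
    """Only initiate the write; the returned Future is waited on later."""
    return executor.submit(buf.write, data)


def demo():
    buf = io.BytesIO()  # in-memory stand-in for a file
    with ThreadPoolExecutor(max_workers=1) as executor:
        n1 = blocking_write(buf, b"abc")                # returns when done
        future = nonblocking_write(executor, buf, b"defg")
        overlap = sum(range(10))                        # computation overlapping the I/O
        n2 = future.result()                            # now wait for completion
    return n1, n2, overlap, buf.getvalue()
```

The point of the non-blocking form is the line between submitting the write and waiting on it: useful computation can happen there while the I/O is in flight.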

 

This paper was extremely helpful for me because it provided tons of insight about MPI-IO and how its structure is defined along with its main purpose and objectives. I was able to learn more about the “inside” picture of how a collective file creation would be handled and completed. This information will help me in the completion of my project due to the fact that it seems that we will be implementing an application of our project that will be using collective I/O commands.

   

SECOND PAPER: Sowing MPICH: a Case Study in the Dissemination of a Portable Environment for Parallel Scientific computing

by: William Gropp and Ewing Lusk

 

This paper explains the whole process of how MPICH was put together and covers interesting information on its architecture. It covers topics such as preparing software for unknown environments, structuring software to absorb contributions by others, automating the creation of manual pages and documentation, automating pre-releases, and managing the inevitable problem reports with a minimum of resources for support.

 

The authors did a great job at explaining all the details of MPICH. They successfully covered the goals of MPICH, multisite development, portability, managing documentation, automated testing, release for distribution, and the tools for managing interactions with users. Something that I learned from them is that the goals of the MPI implementation MPICH are simply robustness, performance, and portability. They pretty much presented all aspects of the development of the MPICH project, and provided techniques and tools that might be common to any project whose goal is to create portable, parallel tools and distribute them to a community.

 

This paper is not really related to my research, but I found it interesting to learn how MPICH was developed. I was hoping to learn more about the functions of MPICH and more details on the application during run time, however.

  

THIRD PAPER: Dynamic Process Management in an MPI Setting

by: Gropp and Lusk

 

This paper focuses on how processes are managed during runtime. It describes an architecture for the runtime environment of a parallel program that separates the functions of the job scheduler, the process manager, and the message passing system. An important fact that I learned is that a parallel program never runs in isolation: it must have computing and other resources allocated to it, and its processes must be started and managed. Thus, one way to decompose the complex runtime environment is to separate the functions of the job scheduler, the process manager, the message passing library, and security.

 

The job scheduler's function is to allocate the resources for a parallel program and to decide when the program will run. The process manager is in charge of managing a process once started, specifically the standard input, output, and error signals. The message passing library is used by the program for its interprocess communication. Finally, security ensures that the job scheduler does not allocate resources that are not supposed to be allocated, that the process manager indeed manages the processes that it starts, and that the message passing library delivers the messages only to their proper destinations.

 

The authors did a great job in explaining the different components needed for a parallel environment. Furthermore, they cover an important topic which is about the communication of each different component. They went over three different types of communication applications such as task farming, dynamic distribution, and client/server communication. All in all, the authors were able to explain in good detail the environment where a dynamic process management takes place and some types of applications that could use it.

 

This paper helped my research by giving me a better understanding of how dynamic process management works. Specifically, I wanted to find some insight into how to self-optimize an MPI process. Happily, I was able to get some ideas on how, in this case by exploring the job scheduler, which is the one in charge of assigning resources to a task. I do not have an idea of how to do that yet, but I will be researching it a bit more.

 

Posted by Camilo Silva | 0 comment(s)

July 01, 2008

I was having some difficulties with a simple MPI-IO testing program. The program is supposed to ask the user for the name of a file. The file is created if it does not exist and is then opened so that 'x' number of nodes in the cluster can write to it collectively. Thus, there is only one single file with the content given by the nodes.

My first problem was learning that the setting "PATH=/opt/mpich/gnu/bin:$PATH" changes an environment variable only for the current session, so it is discarded every time one logs out. Well, I did not know that! hehehe! So every time I tried to input the information to my program I could not!

My last challenge was in broadcasting a message to all nodes. In this case, I needed to broadcast to all nodes the name of the file to be opened. Thanks to an MPI-IO program that a member of my team provided me, I was able to find out what I was missing--and I was able to fix it.

In conclusion, after a lot of trials, I was able to reach my goal for the day, and the solutions for my little challenges were found!

 

Keywords: challenge, MPI, MPI-IO, solution to a problem

Posted by Camilo Silva | 0 comment(s)

June 30, 2008

I have some pictures that I would like to share with all of you. These are pictures of my friends Sean and Allison at CUCEA (the university campus). You will see our contact professor, Dr. Hector Duran, in some of the pictures below. He's a great person.

   

Since it was not that hot and it was rather chilly outside, we decided to work in the cyberforest section that this campus has. Each table has next to it an electric outlet and an Ethernet outlet to connect to the LAN. Also, there is Wi-Fi around, but the signal is sometimes weak. After a couple of hours working, we went to see Dr. Duran for our weekly progress report meeting.

   

We usually meet once per week. However, last week we met on two days, Tuesday and Thursday. During our meetings Dr. Duran gives us guidance, insightful feedback, and positive criticism.

  

 

 

Keywords: CUCEA, Hector Duran, professor, Progress reports, Research experience, weekly meetings

Posted by Camilo Silva | 0 comment(s)

WEEKLY STATUS REPORT
Camilo A. Silva

June 23 – June 29
 

ACTIVITIES:
Action. That’s what this whole week has been about. I had the time to read some papers and further my studies on the topics of MPI, MPICH, autonomic computing, and a little on the usage of the Rocks GCB cluster.   On Monday June 23, I had my group meeting with my bioinformatics team members and I was able to share with them my last presentation on MPI: MPICH implementation and derived data types.

During that meeting, it was decided to start designing the structure models for the communication of the MPI program for project18 parallelization. I volunteered to design the model, and I was able to present it to my group members at the following meeting of the week, on Wednesday, June 25. During that meeting, we discussed that the first parallel implementation of project18 is to simply start the processing of different discriminating probes genomes on the different nodes. In other words, project18 would be sent to each node of the GCB cluster and, after it has finished computing, the result files will be saved and accessed in the /share/../bioinformatics/results/ folder.

In order to handle files, MPI possesses a library of I/O functions known as MPI-IO. Therefore, my task from that Wednesday meeting until today was to learn MPI-IO and be able to run a simple test on the GCB cluster, where a file is created and opened collectively so that all nodes can write to it. Additionally, I wanted to create a different I/O test that allows each node to open a file, read its contents, and append information to it.

I have been able to report my progress to Dr. Duran here at the University of Guadalajara. Every Tuesday and Thursday from 4:00 p.m. to 5:00 p.m., Sean, Allison and I have a meeting with Dr. Duran. I have been able to talk with Dr. Duran about autonomic computing self-healing properties for my project as well as the MPI-IO implementations that I needed to learn. I told him that the self-healing implementation of my project was not as concrete as I was expecting, since I have not yet had a chance to write the parallel program, and I was not fully aware of what faults to expect besides the famous ones of “a node going down or connection losses.”
 

ACCOMPLISHMENTS:

The great accomplishment for this week is that the design structure of the MPI communication of the program was completed. The PowerPoint presentation can be found here: http://latinamericangrid.org/elgg/camilo.silva/files/23

Another accomplishment is that I was able to learn the basics of MPI-IO after completing a lot of readings. The materials that guided me tremendously in learning the basics of MPI-IO are listed under “FIRST READINGS: MPI-IO” below.


ISSUES/PROBLEMS:
I could say that the biggest issue until now (BTW, this is something that I am still trying to solve) is a technical one. Throughout this weekend, I was working on some testing programs for the MPI-IO functions in order to practice and learn how they perform on the cluster. The testing program that I created needed interaction with the user, meaning that it asked the user to input some information such as the name of the file to be created or opened. What happened, unfortunately, was that at run time the program would ignore the input from the user and just carry on until the end of the program. It was kind of funny to see a program act this way! So, the only thing left to do was to Google. I looked for similar examples that ask the user for input, and I compiled them as well with hopes of solving the problem. But guess what? The same error kept happening over and over again.
 

It was not until today, Monday, June 30, 2008, that I decided to ask friends of mine who are more knowledgeable and experienced with MPI-IO to give me a hand. Thus, I am still waiting to solve this little issue of user interaction during the execution of a parallel program.

PLANS:
The major goal for this week is to write the parallel code for project18 and hopefully have a test run over this weekend.  

SUMMARIES/CRITIQUES OF PAPERS: 

FIRST READINGS: MPI-IO

References:
http://www-unix.mcs.anl.gov/mpi/tutorial/advmpi/sc2005advmpi.pdf
http://beige.ucs.indiana.edu/I590/node88.html
http://www.mhpcc.edu/training/workshop2/mpi_io/MAIN.html
 

These are a collection of documents that I found online from credible sources that cover the basics of MPI-IO. The important thing that I learned about MPI-IO is that it allows the programmer to design a file I/O system where all the nodes can collectively access a particular file. What that means is that a file can be opened and all the nodes can write to that same file by following an offset that is updated after each node has written to the file. There are different types of functions depending on the objective and purpose of the parallel program that will be run. Some of the functions are categorized as blocking and non-blocking functions. There are some other functions that allow a file to be saved non-contiguously or contiguously.
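One simple way to picture many nodes writing into a single file is to give each rank its own byte offset. The sketch below is a plain-Python simulation and not MPI-IO itself: BLOCK_SIZE, rank_offset, write_block, and simulate_shared_write are hypothetical names, each "rank" is just a loop iteration, and every rank is assumed to contribute a block of the same fixed size.

```python
import os
import tempfile

BLOCK_SIZE = 8  # hypothetical fixed number of bytes each rank contributes


def rank_offset(rank, block_size=BLOCK_SIZE):
    """Byte offset in the shared file where this rank's block starts."""
    return rank * block_size


def write_block(path, rank, data, block_size=BLOCK_SIZE):
    """Write one rank's block at its own offset in an existing file."""
    if len(data) != block_size:
        raise ValueError("each rank must write exactly block_size bytes")
    with open(path, "r+b") as f:
        f.seek(rank_offset(rank, block_size))
        f.write(data)


def simulate_shared_write(n_ranks):
    """Let n_ranks 'processes' write to one file, in reverse order on purpose."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        for rank in reversed(range(n_ranks)):
            write_block(path, rank, f"rank{rank:04d}".encode())
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

Because each block lands at rank * BLOCK_SIZE, the order in which the ranks write does not matter: the file ends up with the blocks in rank order.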

I found that the different references were helpful in different ways. For example, the first reference that is from Argonne labs in Chicago focused their presentation not only on the basics of MPI-IO but also in some other topics such as sparse matrix I/O, passive target RMA and improving performance. In the document material of Indiana.edu I found very interesting all the program examples and detailed explanations of them. On the mhpcc.edu document, I liked very much how each function was described and how all of its parameters were presented and explained.  

The information that I learned from these documents was really important because I was able to learn all the basics of MPI-IO. Almost everything that I learned will be used in project18. Thus, these documents will be a great reference for the work I am currently doing.

Posted by Camilo Silva | 0 comment(s)

June 24, 2008

Hello there friends! I just wanted to let you know how I have been doing after the break-in, which was the subject of my previous personal blog post. Well, the following day, I moved to a suite hotel that was huge--I spent two days over there. Afterwards, I moved to another hotel that is antique and colonial, Hotel de Mendoza. I spent a week over there and pretty much had to grab a cab to go to the University because it was kind of far.

I just want to take a moment and thank God as well, just as some of my peers have done, because I have been able to experience challenges as well as opportunities to excel during my visit to Mexico. Personally, I feel happy for this great opportunity--it has helped me expand my mind and recognize the impact of global communications in our era. Truly, thanks to the Internet, I have been able to communicate with all my close friends, family members, and most importantly my girlfriend (jeje).

So far I have been able to work closely with my Bioinformatics group and even lecture them on a topic that I was not comfortable with at all, since I did not have experience in it: MPI and MPICH. I was able to communicate with them globally through EVO and work on our project together.

Furthermore, I wanted to share with all of you that my professor here, Dr. Hector Duran, has been a tremendous help! He is more than willing to meet with us on a regular basis. Every week we meet for about thirty minutes either on Mondays or Thursdays and we discuss our challenges, ideas, and prospective plans for my project. I could say that the thing I like best about Dr. Duran is his ability to guide and provide constructive feedback in a simple and positive way. Also, he does not mind at all explaining topics which one might not know.

As far as the research experience and laboratories, I am happy to report that our lab has AC! Yes! Believe it or not, AC in Guadalajara is extremely hard to find! Every day I ride the bus to CUCEA with my friends Sean and Allison, and we spend the day there from 9:30 a.m. to 6:30 p.m. Sean and Allison have been great friends. I am so thankful to be sharing my time with them! I have learned really cool stuff from them, ranging from YUI to WOW (World of Warcraft). On the other hand, I am happy to report that my group's first parallel program was successfully run today on the GCB cluster around 6:30 p.m.! YEEEEEYYYY! Please take a look at the pics:

As far as cultural entertainment, I just want to share with you briefly some of the places that we have visited:
-Guadalajara's Downtown--has some of the oldest buildings of Mexico
-La Chata Restaurant
-Don Quixote Ballet Performance at Degollado Theater located in the City's Downtown
-My personal favorite, Santo Coyote restaurant! This is the best place to eat!

I have too many pictures to share with you here, so I would like to invite you to check my personal photo gallery at:
http://latinamericangrid.org/elgg/camilo.silva/files/ There, you will find pictures of my trip so far--BTW, I created a folder just for the restaurant Santo Coyote!

After all, everything works for the BEST!

Keywords: Bioinformatics, Camilo, CUCEA, Mexico, MPI, Projects, Research

Posted by Camilo Silva | 0 comment(s)

WEEKLY STATUS REPORT
Camilo A. Silva

June 16 – June 22
 

ACTIVITIES:
Throughout this week I felt like Rocky: training and training, reading and reading! I read about MPI and learned about its communication modes, such as point-to-point, collective, and asynchronous communication, as well as modular programming techniques. On Tuesday, I continued the MPI presentations to my group and covered the topic of asynchronous communication. On Wednesday, I presented the other part of my presentation, which included the topics of modularity, data types, and buffering issues. On Thursday, I presented my project and its deadlines to the REU students.
 

ACCOMPLISHMENTS:
Completed the basic training of the theory of MPI and MPICH.

 

ISSUES/PROBLEMS:
The main challenge that I am facing at this time is the topic of autonomic computing. I have been reading a few papers on it, as you will see in the summaries of my reading below; however, I have not been able to find example code that shows directly how autonomic computing code is written. In simple words, sample code on the topic would help me a lot to see the structure of such code and its complexity.
 

PLANS:
The goals for this week are as follows:

  1. Complete the drawing model of the MPI communication implementation for project18
  2. Read IBM’s white paper entitled “An Architectural Blueprint for Autonomic Computing”
  3. Finish reading the MPI and MPICH guides
  4. Run a successful MPI job on the cluster—only if the cluster is available for use
  5. Establish the different Autonomic Elements that project18 needs to help solve different self-healing and self-optimization issues
  6. Find out about how to request cluster jobs from the web (if any of you know how—comments are always welcome, thank you!)

SUMMARIES/CRITIQUES OF PAPERS: 

FIRST PAPER: A High Performance, Portable Implementation of the MPI Message Passing Interface Standard by William Gropp, et al.

This paper is an extensive document about MPICH, its architecture, and its major components. MPICH is unique among existing implementations of MPI because it successfully combines its design goals of portability and high performance. So far I have covered the history of MPICH and its portability and performance; the paper also includes topics that I have not read yet, such as the software architecture, the MPI implementation, MPICH as a portable environment for developing parallel applications, and MPICH management.

So far the authors of this document have presented interesting facts about MPICH. For example, I learned that the CH at the end of MPICH stands for Chameleon! Besides that, I think that this document is great for beginners on the topic like me, since it starts with the history of MPI and MPICH. The authors were also successful in presenting introductory and basic concepts of MPI, such as the provision of environmental inquiry, basic timing information for applications, and profiling interfaces for performance monitoring. As an example, I now better understand that MPI makes heterogeneous data conversion a transparent part of its services by requiring a data type specification for all communication operations. All in all, the authors provide plenty of good details and in-depth information.

So far, this paper has given me a lot of general background knowledge. I am still reading it and hoping to get to the “action” part: the implementation of MPI. This paper is definitely a must read for me!


SECOND PAPER: A User’s Guide to MPI by Peter S. Pacheco 

This paper focuses on all the basic knowledge that one needs in order to code parallel programs. The author starts the guide by introducing a parallel “hello world” program. Then he covers the simple functions needed, such as MPI_Init() and MPI_Finalize(). From there, the paper moves progressively through the major components, functions, and elements of MPI. So far, I have learned about the message with its data and envelope, point-to-point communication, tree-structured communication, collective communication and some of its functions such as broadcast and reduce, the count parameter, and derived data types.
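
To make the tree-structured idea concrete, here is a minimal sketch in plain Python. It is not real MPI and is not taken from the guide: the ranks are simulated as positions in a list, and the function name tree_reduce is made up for illustration. It only shows why a tree-structured sum needs about log2(p) rounds instead of p - 1 sequential messages.

```python
def tree_reduce(values):
    """Sum one value per simulated rank using a tree-shaped pattern.

    values[r] plays the role of the local value held by rank r.
    Returns (total, rounds), where the total ends up on rank 0.
    """
    partial = list(values)
    p = len(partial)
    stride = 1
    rounds = 0
    while stride < p:
        # In each round, every rank that is a multiple of 2 * stride
        # "receives" from the rank `stride` positions to its right.
        for rank in range(0, p, 2 * stride):
            partner = rank + stride
            if partner < p:
                partial[rank] += partial[partner]
        stride *= 2
        rounds += 1
    return partial[0], rounds


if __name__ == "__main__":
    total, rounds = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(total, rounds)
```

With 8 simulated ranks the sum finishes in 3 rounds; a broadcast can follow the same pattern in the opposite direction.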

This paper has been helpful for me because it reinforced some concepts that I had learned while preparing the MPI presentations for my group members. I now understand them better. One of the things that I like about this paper is how well each function is explained and defined. Dr. Pacheco has done a good job in writing this paper. I hope that when I finish reading it I will know more than before.

So far this paper has been essential to my research project, since I need to program using MPI. I have been able to better understand concepts such as the envelope and the count parameter of the MPI_Send() and MPI_Recv() functions. This paper is a great reference manual in case I feel lost when using any of the functions. I hope to keep learning more and more while reading it.
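
As a rough illustration of the envelope and the count parameter, the sketch below simulates message matching in plain Python. It is an assumption-laden toy, not MPI: the Mailbox class, its send/recv methods, and the ANY_SOURCE/ANY_TAG constants are hypothetical stand-ins that only mimic how a receive selects a message by source and tag, and how the receive count acts as the capacity of the buffer rather than the exact message length.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the wildcard values a receive may use.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class Message:
    source: int  # envelope: rank of the sender
    dest: int    # envelope: rank of the receiver
    tag: int     # envelope: user-chosen label
    data: list   # the payload itself


class Mailbox:
    """Toy in-memory stand-in for a communicator (not real MPI)."""

    def __init__(self):
        self.pending = []

    def send(self, data, count, dest, tag, source):
        # On the sending side, count is the number of elements sent.
        self.pending.append(Message(source, dest, tag, list(data[:count])))

    def recv(self, count, source, tag, dest):
        # On the receiving side, count is the capacity of the buffer.
        for i, msg in enumerate(self.pending):
            if msg.dest != dest:
                continue
            if source not in (ANY_SOURCE, msg.source):
                continue
            if tag not in (ANY_TAG, msg.tag):
                continue
            if len(msg.data) > count:
                raise OverflowError("message longer than receive buffer")
            del self.pending[i]
            return msg
        return None  # nothing matches this envelope yet


if __name__ == "__main__":
    box = Mailbox()
    box.send([1.0, 2.0, 3.0], count=2, dest=1, tag=7, source=0)
    print(box.recv(count=10, source=ANY_SOURCE, tag=7, dest=1))
```

The receiver asks for up to 10 elements but gets the 2 that were sent, which is the point of the count parameter on the receive side.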


THIRD PAPER: An Architectural Approach to Autonomic Computing by Steve R. White, et al. 

This paper focuses on the architectural approaches to achieving the goals of autonomic computing. The authors describe and outline interfaces and behavioral requirements for individual components. The paper also describes how interactions among components are established, and it provides some design patterns that engender the desired system-level properties of self-configuration, self-healing, self-optimization, and self-protection. The authors give sufficient background on the architecture of autonomic computing. This architecture must, first, describe the external interfaces and behaviors required to make an individual component autonomic or self-managing; second, it must describe how to compose systems out of these autonomic components in such a way that the system as a whole is self-managing.

The authors of this document decided to explain autonomic computing architecture starting from its most basic element, the autonomic element, which they defined very clearly. The paper follows a sequence of interesting parts. First, it talks about the behavioral properties of autonomic computing, the interfaces and interactions among autonomic elements, and the construction of a system with autonomic behaviors. Then, the authors address design patterns in a very theoretical way; in my opinion, this topic would have been much easier to follow if some images or sample code had been provided. At the end, there is a short discussion about verification and refinement of the architecture. All in all, this was an informative paper to read. It gave me ground knowledge on the architecture and, most importantly, on the design patterns of autonomic computing systems.

For my project, I will be implementing the design pattern strategy that was mentioned in the paper. Since I will be focusing on a design that is self-healing and self-optimizing, this paper will be a great reference for my work. Furthermore, the information about the autonomic elements’ behavior, relationships, and policies was essential because it helped me understand the basic requirements that an autonomic computing system must have.
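
Since sample code on this topic is hard to find, here is one possible sketch of a self-healing autonomic element in Python. It is not taken from the paper: the Worker and AutonomicElement classes, the monitor/analyze/plan/execute split, and the max_restarts policy are all assumptions made for illustration. The idea it shows is simply an element that watches a managed resource and applies a policy to heal it, escalating when the policy is exhausted.

```python
class Worker:
    """Hypothetical managed resource that can become unhealthy."""

    def __init__(self):
        self.healthy = True
        self.restarts = 0

    def restart(self):
        self.healthy = True
        self.restarts += 1


class AutonomicElement:
    """Wraps a resource and manages it according to a simple policy."""

    def __init__(self, resource, max_restarts=3):
        self.resource = resource
        self.max_restarts = max_restarts  # the policy: how often to self-heal
        self.log = []

    def monitor(self):
        # Collect symptoms from the managed resource.
        return {"healthy": self.resource.healthy}

    def analyze(self, symptoms):
        return "ok" if symptoms["healthy"] else "failed"

    def plan(self, diagnosis):
        if diagnosis == "ok":
            return "none"
        if self.resource.restarts < self.max_restarts:
            return "restart"
        return "escalate"  # give up locally and ask another element for help

    def execute(self, action):
        if action == "restart":
            self.resource.restart()
        self.log.append(action)
        return action

    def step(self):
        """Run one pass of the management loop."""
        return self.execute(self.plan(self.analyze(self.monitor())))


if __name__ == "__main__":
    worker = Worker()
    element = AutonomicElement(worker, max_restarts=1)
    worker.healthy = False
    print(element.step())  # the element heals the worker
    worker.healthy = False
    print(element.step())  # the policy is exhausted, so it escalates
```

In a real system the escalation would be a message to another autonomic element, which is where the interactions described in the paper would come in.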


FOURTH PAPER: Composing Adaptive Software by McKinley, Sadjadi, Kasten, and Cheng 

Interest in adaptive computing systems has increased dramatically, and ubiquitous computing focuses on dissolving traditional boundaries for how, when, and where humans and computers interact. The paper explains the difference between compositional and parameter adaptation, with compositional adaptation being the more flexible option. The importance of middleware in adaptive software research is presented, because middleware provides a natural place for adaptation. The paper then stresses that the core of compositional adaptation is a level of indirection for intercepting and redirecting interactions among program entities. Subsequently, the paper explains three main topics of adaptive software: separation of concerns, computational reflection, and component-based design.

I learned tons of info from this paper. I learned that the difference between static and dynamic composition is that the latter offers flexible approaches to implementing compositional adaptation at run time. I was able to understand that most of the time the adaptive code will go in middleware layers, and in a few cases the code could be included in the application itself. I was also able to comprehend that although middleware approaches support transparent adaptation, they apply only to programs that are written against a specific middleware platform; this is why developers would sometimes need to implement compositional adaptation in the application itself. Ultimately, the authors supported well the idea that compositional adaptation is powerful but that, without appropriate tools to automatically generate and verify the code, its use could have negative impacts. Therefore, system integrity and security always need to be considered.

This paper was insightful for me. Sometimes I felt as if I were reading about an autonomic computing implementation. Something that I will keep in mind is that the adaptive code can most of the time be placed in a middleware layer. Furthermore, computational reflection is a topic that I will keep in mind when I implement the autonomic self-healing system for my project. In conclusion, this paper is a good resource for understanding how compositional adaptation works and how and where it can be implemented.
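
The level of indirection mentioned above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not code from the paper: the Indirection, PlainSender, and ChecksumSender names are hypothetical. Callers always talk to the indirection object, so the component behind it can be swapped at run time without changing the callers.

```python
import zlib


class PlainSender:
    """Hypothetical component: sends the text as raw bytes."""

    def send(self, text):
        return text.encode("utf-8")


class ChecksumSender:
    """Hypothetical replacement: appends a 4-byte CRC32 to the payload."""

    def send(self, text):
        body = text.encode("utf-8")
        return body + zlib.crc32(body).to_bytes(4, "big")


class Indirection:
    """Intercepts calls and redirects them to the current component."""

    def __init__(self, component):
        self._component = component

    def recompose(self, component):
        # Compositional adaptation: swap the component while running.
        self._component = component

    def __getattr__(self, name):
        # Only reached for names not defined on Indirection itself,
        # so every other call is forwarded to the current component.
        return getattr(self._component, name)


if __name__ == "__main__":
    link = Indirection(PlainSender())
    print(link.send("hi"))
    link.recompose(ChecksumSender())  # adapt without touching the caller
    print(len(link.send("hi")))
```

Parameter adaptation would instead change a setting inside one component, for example a buffer size, while keeping the same component in place.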


FIFTH PAPER: Run-Time Fault Handling for Job Flow Management in Grid Environments by Dasgupta, Kalayci, Sadjadi, et al. 

This paper focuses on adding self-healing behavior to the execution of a job flow without the need to modify the job flow engines or redevelop the job flows themselves. The authors study the feasibility of a non-intrusive approach to self-healing: they insert a generic proxy into an existing two-level job flow management system, which uses service orchestration at the upper level and service choreography at the lower level. In other words, this paper focuses on adding self-healing characteristics to job flow management in a non-intrusive manner.

The authors did a great job of explaining that the interaction of Grid services with dynamic distributed system resources makes fault tolerance a critical aspect of job flow management. Thus, run-time job failures need to be addressed for individual jobs as well as sub-jobs. The approach that the writers took to this problem was to handle failures at run time, without the need to change the job flow management. They used the TRAP/BPEL framework because it follows this approach: an intermediate proxy intercepts calls from the flow engine and handles run-time failures on behalf of the workflow. In that way, there are no changes to the workflow at modeling time. I learned that when a failure is detected, a recovery component kicks in for any adapted component. The authors explained in detail that during the recovery phase the proxy applies recovery policies to the failed invocations. Those policies contain rules to detect failures and a sequence of recovery actions to follow once a failure is detected. Although this paper was short and very informative, I felt that I did not learn much about the proxy itself and how it could be modified and programmed. I wish this paper were longer and covered more details about that.

Well, this paper was extremely important for me, not only because I know some of its authors and could bug them with all sorts of questions, but also because the project in this paper specifically targets an issue to be solved with self-healing. I will surely study this material and, if necessary, ask some of the authors for a little bit of help with my autonomic computing self-healing design.
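
To picture the proxy idea, here is a small Python sketch. It is not TRAP/BPEL and does not use any BPEL engine: the Proxy and RecoveryPolicy classes and the retry and make_fallback helpers are hypothetical names chosen for illustration. It only shows the shape described above: a proxy that intercepts an invocation, uses a rule to detect a failure, and then tries a sequence of recovery actions, while the caller and the service stay unchanged.

```python
class RecoveryPolicy:
    """A failure-detection rule plus an ordered list of recovery actions."""

    def __init__(self, is_failure, actions):
        self.is_failure = is_failure  # rule: exception -> bool
        self.actions = actions        # each action: (service, *args) -> result


class Proxy:
    """Sits between the caller and the service and heals failed calls."""

    def __init__(self, policy):
        self.policy = policy

    def invoke(self, service, *args):
        try:
            return service(*args)
        except Exception as exc:
            if not self.policy.is_failure(exc):
                raise  # not covered by the policy, so pass it through
            failure = exc
        # Recovery phase: try each action in order until one succeeds.
        for action in self.policy.actions:
            try:
                return action(service, *args)
            except Exception as exc:
                failure = exc
        raise failure


def retry(service, *args):
    """Recovery action: call the same service once more."""
    return service(*args)


def make_fallback(backup):
    """Recovery action factory: redirect the call to a backup service."""
    def use_backup(_service, *args):
        return backup(*args)
    return use_backup


if __name__ == "__main__":
    def dead_service(x):
        raise TimeoutError("node is down")

    policy = RecoveryPolicy(
        is_failure=lambda exc: isinstance(exc, TimeoutError),
        actions=[retry, make_fallback(lambda x: x + 1)],
    )
    print(Proxy(policy).invoke(dead_service, 1))
```

The caller only ever sees the final result, which is what makes the approach non-intrusive.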

        

Keywords: Bioinformatics, Cluster, Progress report, report

Posted by Camilo Silva | 0 comment(s)
