GEXEC installed on the cluster
Caltech GEXEC is a scalable cluster remote execution system which provides fast, RSA-authenticated remote execution of parallel and distributed jobs. It provides transparent forwarding of stdin, stdout, stderr, and signals to and from remote processes, propagates the local environment, and is designed to be robust and to scale to systems of over 1000 nodes. Internally, GEXEC operates by building an n-ary tree of TCP sockets and threads between gexec daemons and propagating control information up and down the tree. By using hierarchical control, GEXEC distributes both the work and the resource usage associated with massive amounts of parallelism across multiple nodes, thereby eliminating problems associated with single-node resource limits (e.g., limits on the number of file descriptors on front-end nodes). The initial release of the software consists of a daemon, a client program, and a library which provides a programmatic interface to the GEXEC system.
Please refer to http://www.theether.org/gexec/ and http://www.theether.org/authd/ for the original documentation.
To use it, see the following examples:
1. When you log in to bitc and want to submit a job to node12, you can do the following:
# export GEXEC_SVRS="node12"
# gexec -n 1 <your_command_here>
2. If you want to submit a job to several nodes simultaneously (for example, node11 through node13), you can do the following:
# export GEXEC_SVRS="node11 node12 node13"
# gexec -n 3 <your_command_here>
You can also write a script along these lines to submit a series of jobs to a series of nodes.
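As a rough illustration of such a script, here is a minimal sketch. The function names (count_nodes, run_on_nodes, run_series) and the GEXEC_CMD variable are hypothetical helpers made up for this example, not part of GEXEC; the only GEXEC features assumed are the GEXEC_SVRS variable and the -n option shown above. It also assumes node names and commands contain no shell wildcard characters.

```shell
#!/bin/sh
# Sketch only: the function names and GEXEC_CMD are hypothetical helpers,
# not part of GEXEC itself.

# Count the whitespace-separated names in a node list.
count_nodes() {
    set -- $1        # unquoted on purpose: split the list into words
    echo "$#"
}

# Run one command on every node in the list at once, mirroring
#   export GEXEC_SVRS="node11 node12 node13"; gexec -n 3 <command>
# Set GEXEC_CMD="echo gexec" for a dry run that only prints the call.
# The body runs in a subshell, so GEXEC_SVRS does not leak to the caller.
run_on_nodes() (
    nodes="$1"
    shift
    GEXEC_SVRS="$nodes"
    export GEXEC_SVRS
    ${GEXEC_CMD:-gexec} -n "$(count_nodes "$nodes")" "$@"
)

# Run a series of jobs one after another. Each line on stdin has the form
#   <node> <command and arguments>
# gexec forwards stdin to the remote process, so each call reads from
# /dev/null to keep it from swallowing the rest of the job list.
run_series() {
    while read -r node cmd; do
        [ -n "$node" ] || continue      # skip blank lines
        run_on_nodes "$node" $cmd </dev/null
    done
}
```

For example, after sourcing the file, `run_on_nodes "node11 node12 node13" hostname` would run hostname on three nodes, and `run_series < jobs.txt` would work through a job list file (jobs.txt is a made-up name) one line at a time. Setting `GEXEC_CMD="echo gexec"` first prints the gexec calls instead of running them, which is a cheap way to check a job list before submitting it.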
For questions, please ask me.
Valid nodes: node8, node10, and node11 through node18.
Now you can use
# gsh node_name "your_command"
to execute commands on other nodes without a password.
Try it out.
Known problems:
1. It does not work for interactive commands or GUI programs.
2. How do you know which node to run on?
Goal: run AFNI sub-jobs remotely.