Wednesday, January 16, 2008

What is a cluster?

WHAT IS A CLUSTER?
A cluster is a group of two or more loosely coupled computers that cooperate in such a way that they behave like a single computer. The idea is to take advantage of the high-performance processing power of personal computers. By many measures, including computational speed, size of main memory, available disk space and bandwidth, the PC of today is more powerful than the supercomputers of the past. By harnessing the power of multiple such low-cost processing elements, we can create a powerful supercomputer.
A cluster is made up of nodes, each with one or more processors, shared memory and peripheral devices (such as disks) connected by a network that allows data to move between the nodes. A cluster is a pretty powerful setup even if old computers are used.
Clusters can be
- DIY (assembled by the user out of COTS, i.e. commercial off-the-shelf, components)
- Prepackaged (assembled by a vendor)

WHY IS A CLUSTER USED?
A cluster can be used for
1. Parallel Processing – to make applications run faster by running parallel processes; such clusters are called High Performance Clusters.
Applications that require greater performance than a single processor can provide can utilize the power of these clusters. An application may need higher computational power for several reasons:
- Real-time constraints: computation must be finished within a certain period of time (e.g. weather forecasting)
- Need for a higher throughput: when a single processor would require days or even years to complete a calculation. e.g. – Google uses over 15,000 commodity PCs with fault tolerant software to provide a high performance web search service.
- Larger Memory requirements – application needs to handle huge amounts of data.

2. Fault Tolerance – such clusters operate by having redundant nodes, which can continue to provide service when system components fail; they are also called High-Availability Clusters. The failure of a single component would only reduce the cluster’s power. If the primary node in a high-availability cluster fails, it is replaced by a secondary node that has been waiting for that moment. That secondary node is usually a mirror image of the primary node, so that when it does replace the primary, it can completely take over its identity and thus keep the system environment consistent from the user's point of view. Such clusters are good choices for environments that require guarantees of high availability, such as Web servers, e.g. the Linux-HA project.

3. Load Balancing – A front-end load balancer distributes workload to back-end platforms. Load-balancing clusters provide a more practical system for business needs. As the name implies, this system entails sharing the processing load as evenly as possible across a cluster of computers. The load could be in the form of an application processing load or a network traffic load that needs to be balanced. Such a system is well suited for large numbers of users running the same set of applications, e.g. the Linux Virtual Server project.
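
To make the load-balancing idea concrete, here is a minimal sketch of a front end that hands requests to back-end nodes in strict rotation (round robin). Round robin is only one possible policy and is my assumption for illustration; the node names and the make_round_robin helper are hypothetical and are not taken from the Linux Virtual Server project.

```python
from collections import Counter
from itertools import cycle


def make_round_robin(backends):
    """Return a dispatcher that assigns each request to the next backend in turn."""
    if not backends:
        raise ValueError("need at least one backend")
    rotation = cycle(backends)  # endless iterator: node1, node2, ..., node1, ...

    def dispatch(request):
        # Pair the request with whichever backend is next in the rotation.
        return next(rotation), request

    return dispatch


if __name__ == "__main__":
    # Hypothetical back-end node names, used only for this illustration.
    dispatch = make_round_robin(["node1", "node2", "node3"])
    assignments = [dispatch(f"request-{i}")[0] for i in range(9)]
    # Each node ends up with the same number of requests.
    print(Counter(assignments))
```

A real load balancer would normally also track node health and current load instead of rotating blindly, but the rotation shows the basic goal of spreading requests evenly.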


WHAT ARE BEOWULF CLUSTERS?

The best-known type of Linux-based cluster is the Beowulf cluster. A Beowulf cluster consists of multiple machines on a local area network that pool resources to solve computing tasks. In order for this cooperation to take place, special cluster-enabled software applications must be written using clustering libraries. The most popular clustering libraries are Parallel Virtual Machine (PVM) and Message Passing Interface (MPI), and they are both very mature and work very well. By using PVM or MPI, programmers can create cluster-enabled applications that are able to take advantage of an entire cluster's computing resources, rather than being bound to a single machine.

WHAT ARE MOSIX CLUSTERS?
This kind of clustering technology is easy to set up and can provide an immediate benefit. MOSIX works in a fundamentally different way from PVM or MPI, extending the kernel so that any standard Linux process can take advantage of a cluster's resources. By using special adaptive load-balancing techniques, processes running on one node in the cluster can be transparently "migrated" to another node where they may execute faster. Because MOSIX is transparent, the process that's migrated doesn't even "know" (or need to know) that it is running on a remote system. As far as that remote process and other processes running on the original node (called the "home node") are concerned, the process is running locally.
Because MOSIX is completely transparent, no special programming is required to take advantage of MOSIX's load-balancing technology. In fact, a default MOSIX installation will automatically migrate processes to the "best" node without any user intervention. For example, if an application is designed to fork() many child processes which perform work, then MOSIX will be able to migrate each one of these processes as needed. So, if you wanted to compress 13 digital audio tracks simultaneously (as separate processes), then MOSIX will allow you to immediately benefit from the power of your cluster. If, however, you were to run the 13 encoding processes sequentially (one after another), then you wouldn't see any speedup at all.
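
The difference between the two ways of running the 13 jobs can be sketched as follows. This is only an illustration of the process structure, not MOSIX itself: it contains no MOSIX calls, the worker is a made-up CPU-bound stand-in for an audio encoder, and the function names are hypothetical. In the first function all jobs exist at the same time as separate processes, which is the situation in which a process-migrating kernel has something to spread across nodes; in the second, only one job exists at any moment.

```python
import subprocess
import sys

# Stand-in for one CPU-bound job such as encoding a single audio track.
WORKER_CODE = "sum(i * i for i in range(200000))"


def run_in_parallel(job_count):
    """Start every job at once as a separate process, then wait for all of them."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_CODE])
        for _ in range(job_count)
    ]
    # wait() returns each child's exit code (0 means success).
    return [proc.wait() for proc in procs]


def run_one_by_one(job_count):
    """Start each job only after the previous one has finished."""
    return [
        subprocess.call([sys.executable, "-c", WORKER_CODE])
        for _ in range(job_count)
    ]


if __name__ == "__main__":
    print(run_in_parallel(13))  # 13 processes alive together
    print(run_one_by_one(13))   # never more than one process at a time
```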

WHAT IS OSCAR?

OSCAR (Open Source Cluster Application Resources) is a software package that allows installation of an HPC cluster and is easier to set up than a Beowulf cluster assembled by hand.

Friday, January 11, 2008

FOSS - Benefits for users

This philosophy of FOSS results in tremendous benefits to users. These include:
• Reduced Cost - The cost of FOSS software remains fixed even when the number of users increases. For example, if you use GNU/Linux as the operating system in your office, the same CD can be duplicated and distributed to all users. However, if you opt for a “licensed” Microsoft operating system, it would have to be purchased for each user’s desktop.
• Reliability/ Stability - Free software is the combined result of the experience and the intelligence of all the participants. Its reliability increases as time passes, with all the corrections which are made.
• Portability - This quality is not intrinsic to free software, but is very often seen in free software. If software meets success, it will necessarily be adapted to other environments than those initially considered.
• Performance - Because it is examined by many people, draws on algorithms from advanced research, and is tested across varied uses, free software tends to have good performance characteristics.
• Interoperability - The support in Linux, for example, of a lot of network protocols, file system formats, and even binary compatibility modes assures a good interoperability.
• Reactivity - Corrections to a given problem are delivered rapidly.
• Security – Transparency of source code results in faster identification and fixing of bugs and security loopholes.

FOSS is relevant and important in educational institutions because access to the source code of programs allows students to explore the internals of complex systems and hence acquire a deeper understanding of what they study. For example, students learn about operating system concepts by checking out the code which implements similar functionality in Linux. Today, most educational programs require access to a lot of software, e.g., Matlab, circuit simulators, drawing packages, etc. Institutions mostly use proprietary solutions, which cost lakhs of rupees in license fees. FOSS solutions are available in many areas, with the commonly used licensing terms for distribution and modification, and in almost all cases at zero cost.

There is no dearth of FOSS software. FOSS systems and tools include Linux and BSD operating systems, OpenOffice Writer for word processing, OpenOffice Math for mathematical equations, Moodle as a Learning Management System, Audacity for audio editing, Blender for 3-D animation and rendering, GIMP for photo editing, Scilab for scientific applications, and Beowulf and MOSIX for distributed computing; the list goes on and on.

“To FLOSS, or not to FLOSS - that is the question”

Hey, don’t get me wrong. I am not about to launch into an essay on dental hygiene. Instead, let’s get introduced to “Free/Libre Open Source Software”, also abbreviated as FLOSS. In the computer world, FLOSS or FOSS (Free and Open Source Software) is software whose license gives users the freedom to run the program for any purpose, to study and modify the source code, and to redistribute copies of either the original or the modified program, all legally, without paying any royalty to previous developers!
Source code is the human-readable language in which a computer program is written. The software packaged and installed on your computer exists in a “binary” or “executable” form that is produced from the source code. Conventional, proprietary software just gives you the binary and keeps the source code a secret, lest anybody else learn something from it! This means that the software comes to you like a black box. Its internals, the nuts and bolts that make it work, are hidden from users. It’s as if you are sold a car that you can drive, but one where you can’t look under the hood. With FOSS software, the source code is distributed along with the binary. Not only are you encouraged to look at the engine that runs the software, you are free to customize it according to your needs. This is what is meant by “open source”.
The free part of FOSS does not necessarily mean “free of charge”; it refers first to the freedoms described above. FOSS applications also do not charge a licensing fee for usage. Most FOSS software is free to download or available at nominal cost, much cheaper than similar proprietary software.
So why FOSS? What are the benefits of FOSS? Innovations are often created by combining pre-existing components in novel ways, which generally requires that users be able to modify those components. FOSS software gives innovators this freedom to experiment without spending time on reinventing the wheel from scratch. FOSS software gives you freedom – freedom to run it as you wish, freedom to make changes to it and freedom to help your community, so that others can benefit from your work.
The FOSS movement is spearheaded by Richard Stallman and the Free Software Foundation (FSF). Stallman’s ideas about FOSS were born when the MIT Artificial Intelligence Lab (AI Lab), where he was working, was not able to customize a Xerox printer to suit its needs because Xerox refused to share their precious source code. Here is an interesting analogy he gives to put forth his thoughts about proprietary software.
Imagine what it would be like if recipes were hoarded in the same fashion as software. You might say, “How do I change this recipe to take out the salt?” and the great chef would respond, “How dare you insult my recipe, the child of my brain and my palate, by trying to tamper with it? You don't have the judgment to change my recipe and make it work right!”
“But my doctor says I'm not supposed to eat salt! What can I do? Will you take out the salt for me?”
“I would be glad to do that; my fee is only $50,000.” Since the owner has a monopoly on changes, the fee tends to be large. “However, right now I don't have time. I am busy with a commission to design a new recipe for ship's biscuit for the Navy Department. I might get around to you in about two years.”
His experience with Xerox prompted Richard Stallman to propose the following for software:
• Liberty - every user should be free to copy, distribute and modify a program, either to share it with others or to adapt it to his own needs.
• Equality - every person should have the same rights on the software.
• Fraternity - the whole computing community should be encouraged to cooperate and thus to produce software that is more reliable and useful to all.
• Source code access – this should allow the understanding, adaptation, correction, distribution and improvement of the software.