Saturday 8 September 2012

Distributed computing and how each technology differs from one another.
Introduction:
Distributed computing is the next step in the progress of computing: computers are not only networked, but also intelligently distribute their workload across one another so that every machine stays busy and the electrical energy it consumes is not squandered. When we combine the concept of distributed computing with the tens of millions of computers connected to the Internet, we effectively have the fastest computer on Earth.
Definition: Distributed computing is the field of computer science that studies distributed systems. It is a method of computer processing in which different parts of a program run simultaneously on two or more computers that communicate with each other over a network; the computers cooperate to achieve a common goal. Distributed computing is a type of segmented or parallel computing, but the latter term is most commonly used to refer to processing in which different parts of a program run simultaneously on two or more processors that are part of the same computer. While both types of processing require that a program be segmented (divided into sections that can run simultaneously), distributed computing also requires that the division of the program take into account the different environments in which the different sections of the program will be running.

                             Fig a: Architecture of distributed computing.

Technologies involved in distributed computing:
It all began with RPC:
RPC stands for Remote Procedure Call (RFC 1831), the first distributed computing technology to gain widespread use. The idea behind RPC is to make a function call to a procedure in another process and address space, either on the same processor or across the network on another processor, without having to deal with the concrete details of how this is done beyond making an ordinary procedure call.
Before an RPC call can be made, both the client and the server have to have stubs for the remote function, which are usually generated by an interface definition language (IDL). When a client makes an RPC call, the arguments to the remote function are marshalled and sent across the network, and the client waits until the server sends a response. Some arguments, such as pointers, are difficult to marshal, since a memory address on the client is completely useless to the server, so various strategies for passing pointers are usually implemented, the two most popular being
a.) Disallowing pointer arguments and
b.) Copying what the pointer points at and sending that to the remote function.

An RPC function locates the server in one of two ways:
  • Hard-coding the address of the remote server, which is extremely inflexible and may require a recompile if that server goes down.
  • Using dynamic binding, where servers export whatever interfaces/services they support and clients pick the server they want to use from among those that support the needed service.
RPC programs, as well as other distributed systems, face a number of problems that are unique to their situation, such as:
  • Network packets containing client requests being lost.
  • Network packets containing server responses being lost.
  • The client being unable to locate its server.
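
To make the mechanism concrete, here is a minimal hand-rolled sketch of an RPC round trip in Java. Everything in it (the TinyRpc class, the add procedure, the wire format) is purely illustrative and not part of any RPC standard: the client-side stub marshals the procedure name and arguments onto a socket and blocks until the reply arrives, while the server-side skeleton unmarshals the request and invokes the local procedure.

// Minimal, hand-rolled RPC sketch (illustrative names, not a real RPC library).
import java.io.*;
import java.net.*;

public class TinyRpc {

    // Client-side stub: looks like a local call, but the work runs on the server.
    static int add(String host, int port, int a, int b) throws IOException {
        try (Socket s = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(s.getOutputStream());
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            out.writeUTF("add");   // marshal the procedure name
            out.writeInt(a);       // marshal the arguments
            out.writeInt(b);
            out.flush();
            return in.readInt();   // block until the server's reply arrives
        }
    }

    // Server-side skeleton: unmarshals the request and calls the real procedure.
    static void serve(int port) throws IOException {
        try (ServerSocket listener = new ServerSocket(port)) {
            while (true) {
                try (Socket s = listener.accept();
                     DataInputStream in = new DataInputStream(s.getInputStream());
                     DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                    String proc = in.readUTF();
                    if ("add".equals(proc)) {
                        out.writeInt(in.readInt() + in.readInt());
                    }
                }
            }
        }
    }
}

Real RPC systems generate the stub and the server-side dispatch code from an IDL description instead of writing them by hand, but the marshalling and blocking behaviour is the same.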
The need for distributed object and component systems

As distributed computing became more widespread, more flexibility and functionality were required than RPC could provide. RPC proved suitable for two-tier client/server architectures, where the application logic is either in the user application or within the actual database or file server. Unfortunately this was not enough; more and more people wanted three-tier client/server architectures, where the application is split into a client application (usually a GUI or browser), application logic, and a data store (usually a database server). Soon people wanted to move to N-tier applications, where there are several separate layers of application logic between the client application and the database server.

The advantage of N-tier applications is that the application logic can be divided into reusable, modular components instead of one monolithic codebase. Distributed object systems solved many of the problems in RPC that made large-scale system building difficult, in much the same way that Object-Oriented paradigms supplanted Procedural programming and design paradigms. Distributed object systems make it possible to design and implement a distributed system as a group of reusable, modular and easily deployable components where complexity can be managed and hidden behind layers of abstraction.

CORBA

The Common Object Request Broker Architecture (CORBA) is a standard defined by the Object Management Group (OMG) that enables software components written in multiple computer languages and running on multiple computers to work together (i.e., it supports multiple platforms). A CORBA application usually consists of an Object Request Broker (ORB), a client and a server. An ORB is responsible for matching a requesting client to the server that will perform the request, using an object reference to locate the target object. When the ORB examines the object reference and discovers that the target object is remote, it marshals the arguments and routes the invocation out over the network to the remote object's ORB. The remote ORB then invokes the method locally and sends the results back to the client via the network. There are many optional features that ORBs can implement besides merely sending and receiving remote method invocations, including looking up objects by name, maintaining persistent objects, and supporting transaction processing. A primary feature of CORBA is its interoperability between various platforms and programming languages.

The first step in creating a CORBA application is to define the interface for the remote object using the OMG's interface definition language (IDL). Compiling the IDL file yields two forms of stub files: one that implements the client side of the application and another that implements the server. Stubs and skeletons serve as proxies for clients and servers, respectively. Because IDL defines interfaces so strictly, the stub on the client side has no problem interacting with the skeleton on the server side, even if the two are compiled into different programming languages, use different ORBs and run on different operating systems.

Then, in order to invoke the remote object instance, the client first obtains its object reference via the ORB. To make the remote invocation, the client uses the same code that it would use in a local invocation, but uses an object reference to the remote object instead of an instance of a local object. When the ORB examines the object reference and discovers that the target object is remote, it marshals the arguments and routes the invocation out over the network to the remote object's ORB instead of to another process on the same computer.
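
As a rough sketch, a Java client performing this lookup and invocation might look like the following. Hello and HelloHelper stand in for the stub classes that the IDL compiler (idlj) would generate from an IDL interface such as interface Hello { string sayHello(); };, and a running CORBA naming service is assumed; all names are illustrative.

import org.omg.CORBA.ORB;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;

public class HelloClient {
    public static void main(String[] args) throws Exception {
        // Create and initialise the ORB.
        ORB orb = ORB.init(args, null);
        // Obtain the root naming context and narrow it to the typed interface.
        org.omg.CORBA.Object ref = orb.resolve_initial_references("NameService");
        NamingContextExt naming = NamingContextExtHelper.narrow(ref);
        // Look up the remote object by name; HelloHelper.narrow yields the typed stub.
        Hello hello = HelloHelper.narrow(naming.resolve_str("Hello"));
        // The call looks local, but the ORB marshals it and sends it to the remote ORB.
        System.out.println(hello.sayHello());
    }
}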

CORBA also supports dynamically discovering information about remote objects at runtime. The IDL compiler generates type information for each method in an interface and stores it in the Interface Repository (IR). A client can thus query the IR to get run-time information about a particular interface and then use that information to create and invoke a method on the remote CORBA server object dynamically through the Dynamic Invocation Interface (DII). Similarly, on the server side, the Dynamic Skeleton Interface (DSI) allows a client to invoke an operation of a remote CORBA Server object that has no compile time knowledge of the type of object it is implementing.
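
A hedged sketch of a DII call with the Java ORB, reusing the hypothetical sayHello operation from above, might look like this; the request is assembled at runtime from the operation name rather than through a compiled stub:

import org.omg.CORBA.ORB;
import org.omg.CORBA.Request;
import org.omg.CORBA.TCKind;

public class DynamicHelloClient {
    public static void main(String[] args) throws Exception {
        ORB orb = ORB.init(args, null);
        // Obtain an untyped object reference, e.g. from a stringified IOR passed in.
        org.omg.CORBA.Object target = orb.string_to_object(args[0]);
        // Build the request dynamically: no compile-time stub for "sayHello" is needed.
        Request request = target._request("sayHello");
        request.set_return_type(orb.get_primitive_tc(TCKind.tk_string));
        request.invoke();   // send the call and wait for the reply
        System.out.println(request.return_value().extract_string());
    }
}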

CORBA is often considered a superficial specification because it concerns itself more with syntax than with semantics. CORBA specifies a large number of services that can be provided but only to the extent of describing what interfaces should be used by application developers. Unfortunately, the bare minimum that CORBA requires from service providers lacks mention of security, high availability, failure recovery, or guaranteed behaviour of objects outside the basic functionality provided and instead CORBA deems these features as optional. The end result of the lowest common denominator approach is that ORBs vary so wildly from vendor to vendor that it is extremely difficult to write portable CORBA code due to the fact that important features such as transactional support and error recovery are inconsistent across ORBs. Fortunately a lot of this has changed with the development of the CORBA Component Model, which is a superset of Enterprise Java Beans.
DCOM/COM+

Distributed Component Object Model (DCOM) is the distributed version of Microsoft's COM technology, which allows the creation and use of binary objects/components from languages other than the one they were originally written in; it currently supports Java (J++), C++, Visual Basic, JScript, and VBScript. DCOM works over the network by using proxies and stubs. When the client instantiates a component whose registry entry suggests that it resides outside the process space, DCOM creates a wrapper for the component and hands the client a pointer to the wrapper. This wrapper, called a proxy, simply marshals method calls and routes them across the network. On the other end, DCOM creates another wrapper, called a stub, which unmarshals method calls and routes them to an instance of the component.

A DCOM server object can support multiple interfaces, each representing a different behaviour of the object. A DCOM client calls into the exposed methods of a DCOM server by acquiring a pointer to one of the server object's interfaces. The client object can then invoke the server object's exposed methods through the acquired interface pointer as if the server object resided in the client's address space.
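
DCOM components are normally written and consumed from C++ or Visual Basic, but the idea of a server object exposing several interfaces, with the client acquiring a pointer to just one of them, can be sketched in Java terms. All names below are illustrative, and queryInterface merely mimics the role of COM's QueryInterface:

// Rough Java analogue of a DCOM server object that exposes several interfaces.
interface IGreeter { String greet(String name); }
interface ICounter { int next(); }

class ServerObject implements IGreeter, ICounter {
    private int count;
    public String greet(String name) { return "Hello, " + name; }
    public int next() { return ++count; }

    // Analogue of QueryInterface: hand out a typed reference to one behaviour.
    <T> T queryInterface(Class<T> iface) {
        return iface.isInstance(this) ? iface.cast(this) : null;
    }
}

public class DcomAnalogy {
    public static void main(String[] args) {
        ServerObject server = new ServerObject();
        IGreeter greeter = server.queryInterface(IGreeter.class);  // acquire one interface
        System.out.println(greeter.greet("world"));                // invoke through it
    }
}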
Java RMI

Remote Method Invocation (RMI) is a technology that allows the sharing of Java objects between Java Virtual Machines (JVMs) across a network. An RMI application consists of a server that creates remote objects conforming to a specified interface, which are available for method invocation to client applications that obtain a remote reference to the object. RMI treats a remote object differently from a local object when the object is passed from one virtual machine to another. Rather than making a copy of the implementation object in the receiving virtual machine, RMI passes a remote stub for the remote object. The stub acts as the local representative, or proxy, for the remote object and basically is, to the caller, the remote reference. The caller invokes a method on the local stub, which is responsible for carrying out the method call on the remote object. A stub for a remote object implements the same set of remote interfaces that the remote object implements. This allows a stub to be cast to any of the interfaces that the remote object implements. However, this also means that only those methods defined in a remote interface are available to be called in the receiving virtual machine.
RMI provides the unique ability to dynamically load classes via their byte codes from one JVM to another, even if the class is not defined on the receiver's JVM. This means that new object types can be added to an application simply by upgrading the classes on the server, with no other work being done on the part of the receiver. This transparent loading of new classes via their byte codes is a unique feature of RMI that greatly simplifies modifying and updating a program.
The first step in creating an RMI application is creating a remote interface. A remote interface extends java.rmi.Remote, which indicates that it describes a remote object whose methods can be invoked across virtual machines. Any object that implements this interface becomes a remote object.
To show dynamic class loading at work, an interface describing an object that can be serialized and passed from JVM to JVM shall also be created. This interface extends the java.io.Serializable interface. RMI uses the object serialization mechanism to transport objects by value between Java virtual machines. Implementing Serializable marks the class as being capable of conversion into a self-describing byte stream that can be used to reconstruct an exact copy of the serialized object when the object is read back from the stream. Any entity of any type can be passed to or from a remote method as long as the entity is an instance of a type that is a primitive data type, a remote object, or an object that implements the interface java.io.Serializable. Remote objects are essentially passed by reference. A remote object reference is a stub, which is a client-side proxy that implements the complete set of remote interfaces that the remote object implements. Local objects are passed by copy, using object serialization. By default all fields are copied, except those that are marked static or transient. Default serialization behaviour can be overridden on a class-by-class basis.
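
For illustration, a value object passed by copy in an RMI call might look like the hypothetical Message class below: it implements java.io.Serializable, and its transient field is skipped during serialization, exactly as described above.

import java.io.Serializable;

// Passed by copy between JVMs because it implements Serializable.
public class Message implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String text;          // serialized and copied to the receiver
    private transient long receivedAt;  // transient: not serialized, left at its default

    public Message(String text) { this.text = text; }

    public String getText() { return text; }
    public void markReceived() { receivedAt = System.currentTimeMillis(); }
}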
Thus clients of the distributed application can dynamically load objects that implement the remote interface even if they are not defined in the local virtual machine. The next step is to implement the remote interface; the implementation must define a constructor for the remote object as well as all the methods declared in the interface. Once the class is created, the server must be able to create and install remote objects. The process for initializing the server includes: creating and installing a security manager, creating one or more instances of a remote object, and registering at least one of the remote objects with the RMI remote object registry for bootstrapping purposes. An RMI client behaves similarly to a server: after installing a security manager, the client constructs a name used to look up a remote object. The client uses the Naming.lookup method to look up the remote object by name in the remote host's registry. When doing the name lookup, the code creates a URL that specifies the host where the server is running.
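
Putting these steps together, a minimal RMI application might look like the sketch below. The Greeter names are illustrative, and the security-manager step described above is omitted for brevity.

import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;

// Step 1: the remote interface; every method must declare RemoteException.
interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// Step 2: the implementation; extending UnicastRemoteObject exports it for remote calls.
class GreeterImpl extends UnicastRemoteObject implements Greeter {
    GreeterImpl() throws RemoteException { super(); }
    public String greet(String name) throws RemoteException {
        return "Hello, " + name;
    }
}

// Step 3: the server registers an instance with the RMI registry for bootstrapping.
class Server {
    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(1099);
        Naming.rebind("rmi://localhost/Greeter", new GreeterImpl());
        System.out.println("Greeter bound");
    }
}

// Step 4: the client looks the object up by URL and invokes it through the stub.
class Client {
    public static void main(String[] args) throws Exception {
        Greeter g = (Greeter) Naming.lookup("rmi://localhost/Greeter");
        System.out.println(g.greet("world"));
    }
}

Running Server starts an in-process registry and binds the remote object; running Client on the same machine then looks it up by URL and invokes it through the stub.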

Applications of Distributed Computing:
There are many applications of distributed computing which help us in every walk of life and in all types of communication networks. Important applications are given below:
1. Telecommunication and cellular networks, computer networks such as the Internet, and different types of wireless networks all rely on distributed computing.
2. Airplane control towers and different types of industrial applications are also based on such computing.
3. Different kinds of network applications are also based on this type of computing, such as the World Wide Web (WWW), peer-to-peer networks, distributed databases, and many more.
4. Distributed computing is also used together with parallel computation in applications such as scientific computing and data rendering in distributed graphics.
Advantages of Distributed Computing:
1. Incremental growth: Computing power can be added in small increments.
2. Reliability: If one machine crashes, the system as a whole can still survive.
3. Speed: A distributed system may have more total computing power than a mainframe.
4. Open system: This is the most important and most characteristic point of a distributed system. Since it is an open system, it is always ready to communicate with other systems, and an open system that scales has an advantage over a perfectly closed and self-contained system.
5. Economic: Microprocessors offer a better price/performance ratio than mainframes.

Disadvantages of Distributed Computing:
1. Security: As previously mentioned, distributed systems have an inherent security issue.
2. Networking: If the network becomes saturated, problems with transmission will surface.
3. Software: There is currently very little software support for distributed systems.
4. Troubleshooting: Troubleshooting and diagnosing problems in a distributed system can become more difficult, because the analysis may require connecting to remote nodes or inspecting communication between nodes.
Other forms of Computing:
Apart from Distributed Computing there are other forms of computing like Grid computing, Parallel computing, Mobile computing, Cloud computing, Cluster computing, Utility computing, etc.
Grid Computing:
A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high end computational capabilities. On the other hand, grid computing has some extra characteristics compared to distributed computing. Grid is a software environment that makes it possible to share disparate, loosely coupled IT resources across organizations and geographies. Using a grid, your IT resources are freed from their physical boundaries and offered as services. These resources include almost any IT component – computer cycles, storage spaces, databases, applications, files, sensors, or scientific instruments.
In grid computing, resources can be dynamically provisioned to the users or applications that need them. Resources can be shared within a workgroup or department, across different organizations and geographies, or outside your enterprise. Grid computing is concerned with the efficient utilization of a pool of heterogeneous systems, with optimal workload management, so that an enterprise's entire computational resources (servers, networks, storage, and information) act together to create one or more large pools of computing resources. There is no limitation on users, departments or organizations in grid computing.
The Globus Toolkit, currently at version 5, is an open source toolkit for building computing grids, developed and provided by the Globus Alliance. The Globus project provides an open source software toolkit that can be used to build computational grids and grid-based applications. It allows sharing of computing power, databases, and other resources securely across corporate, institutional and geographic boundaries without sacrificing local autonomy.
                                              Fig b: Architecture of the Globus Toolkit
Parallel Computing:
The simultaneous use of more than one processor or computer to solve a problem is called Parallel Computing. Parallel computations can be performed on shared-memory systems with multiple CPUs, or on distributed-memory clusters made up of smaller shared-memory systems or single-CPU systems. Coordinating the concurrent work of the multiple processors and synchronizing the results are handled by program calls to parallel libraries. MATLAB's Parallel Computing Toolbox, for example, lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. High-level constructs such as parallel for-loops, special array types, and parallelized numerical algorithms let you parallelize MATLAB applications without CUDA or MPI programming.
The toolbox provides twelve workers (MATLAB computational engines) to execute applications locally on a multicore desktop. Without changing the code, you can run the same application on a computer cluster or a grid computing service (using MATLAB Distributed Computing Server). You can run parallel applications interactively or in batch.
                                      Fig c: Architecture of parallel computing
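
The MATLAB toolbox above is one concrete realization; the same idea of spreading independent iterations over the cores of a single machine can be sketched in plain Java with a thread pool. The example and its names are illustrative and not tied to any toolbox.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        long n = 1000000L;
        long chunk = n / cores;
        List<Future<Long>> parts = new ArrayList<Future<Long>>();

        // Hand each core an independent slice of the range 1..n, like a parallel for-loop.
        for (int i = 0; i < cores; i++) {
            final long from = i * chunk + 1;
            final long to = (i == cores - 1) ? n : (i + 1) * chunk;
            parts.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;
                    for (long k = from; k <= to; k++) sum += k;
                    return sum;
                }
            }));
        }

        // Combine the partial results once every worker has finished.
        long total = 0;
        for (Future<Long> part : parts) total += part.get();
        pool.shutdown();
        System.out.println("sum = " + total);
    }
}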

Mobile Computing:
Mobile computing is a generic term for a variety of devices that allow people to access data and information from wherever they are, for example a mobile device. Mobile communication allows transmission of voice and multimedia data via a computer or a mobile device without being connected to any physical or fixed link. Mobile communication is evolving day by day and has become a must-have for everyone. This journey of mobile communication technology started with first-generation mobile technology and has now reached the third generation, totally changing the canvas of communication.
Many types of mobile computers, such as laptops, PDAs, PDA phones and other mobility devices, were introduced in the mid-1990s, including wearable technology as well. To use these types of mobile equipment we need the right technology to make the infrastructure secure and reliable. If we talk about mobile communication technologies, there are many available today, such as 2G, 3G, 4G, WiMAX, WiBro, EDGE, GPRS and many others.

Cloud Computing:
Cloud computing is a computing paradigm shift where computing is moved away from personal computers or an individual application server to a “cloud” of computers. Users of the cloud only need to be concerned with the computing service being asked for, as the underlying details of how it is achieved are hidden. This method of distributed computing is done through pooling all computer resources together and being managed by software rather than a human. It involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).
Cloud computing is essentially infrastructure as a service. Using resources from distributed servers/computers, cloud computing frees you from the configuration of local servers. The cloud is another movement towards making computing a service that can be adjusted based on the needs of a business. For example, for a retail business that might need to utilize a large number of servers during the holiday season, but not nearly as many afterwards, cloud computing is helpful because it allows the business to scale its server usage based on its needs.
Cloud computing has been gaining momentum, particularly in the SaaS environment. Organizations are quickly realizing the value in cutting costs through monthly subscriptions to software, rather than outright purchasing it and having it become obsolete. When newer versions of software become available, companies need to repurchase and sometimes increase the amount of infrastructure needed to support the new software. With cloud-hosted services, it is possible to utilize enterprise-level software through subscription, without worrying about the cost of infrastructure or upgrading to the latest version.
                                         Fig d: Architecture of cloud computing

Benefits of Cloud Computing:
Cloud Computing is helpful in multiple ways. The primary benefits are that:
1) Time to market is significantly reduced.
2) Overall cost savings on maintenance, since the business no longer has to own and maintain its own servers. Cloud computing services also generally have a much higher guarantee of up-time than a smaller company running its own servers can usually match.
3) Organizations can cut software licensing and administrative costs through utilizing online services in the cloud, such as SharePoint, Exchange Server, and Office Communications Server.
Cloud computing is ideal for a small to medium-sized business that has large spikes in infrastructure needs at varying times during the year, rather than a consistent need that in-house servers could sufficiently fulfil. Available services include Windows Azure, Amazon EC2, and GoGrid.

Cluster Computing:
A computer cluster is a group of linked computers working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. Clustering has been available since the 1980s when it was used in DEC's VMS systems. IBM's Sysplex is a cluster approach for a mainframe system. Microsoft, Sun Microsystems, and other leading hardware and software companies offer clustering packages that are said to offer scalability as well as availability. As traffic or availability assurance increases, all or some parts of the cluster can be increased in size or number.
Cluster computing can also be used as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations.
Utility computing:
Conventional Internet hosting services have the capability to quickly arrange for the rental of individual servers, for example to provision a bank of web servers to accommodate a sudden surge in traffic to a web site.
“Utility computing” usually envisions some form of virtualization so that the amount of storage or computing power available is considerably larger than that of a single time-sharing computer. Multiple servers are used on the “back end” to make this possible. These might be a dedicated computer cluster specifically built for the purpose of being rented out, or even an under-utilized supercomputer. The technique of running a single calculation on multiple computers is known as distributed computing.
Amazon Web Services (AWS), despite a recent outage, is the current poster child for this model as it provides a variety of services, among them the Elastic Compute Cloud (EC2), in which customers pay for compute resources by the hour, and Simple Storage Service (S3), for which customers pay based on storage capacity. The main benefit of utility computing is better economics. Corporate data centres are notoriously underutilized, with resources such as servers often idle 85% of the time. Utility computing allows companies to only pay for the computing resources they need, when they need them.