Wednesday, January 14, 2009

High Overload High Perfomance Systems (Distributed)

http://www.danga.com/memcached/
memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

http://www.cs.vu.nl/pub/amoeba/
- The distributed FS is powerful
The Amoeba Distributed Operating System. Amoeba is a powerful microkernel-based system that turns a collection of workstations or single-board computers into a transparent distributed system. It has been in use in academia, industry, and government for about 5 years. It runs on the SPARC (Sun4c and Sun4m), the 386/486, 68030, and Sun 3/50 and Sun 3/60. At the Vrije Universiteit, Amoeba runs on a collection of 80 single-board SPARC computers connected by an Ethernet, forming a powerful processor pool. This equipment is pictured below. It is used for research in distributed and parallel operating systems, runtime systems, languages, and applications.

http://hadoop.apache.org/core/
Distributed file system from apache.
Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.
Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.
For more information about Hadoop, please see the Hadoop wiki.

http://www.greenplum.com/
Greenplum is redefining the database for Petabyte-scale analytics at breakthrough speeds. Gain competitive advantage and extreme scalability at a lower cost, by managing all of your data on commodity hardware running Greenplum Database.
World’s Most Powerful Analytical Database

1 comment:

Unknown said...

You forgot Aster Data! The world's first and most powerful implementation of in-database mapreduce:

http://www.asterdata.com/product/mapreduce.php