Saturday, May 21, 2011

Why we’re using HBase

Why we’re using HBase: Part #1
Why we’re using HBase: Part #2

ZooKeeper - A Reliable, Scalable Distributed Coordination System

Abstract
ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates to key configuration information. ZooKeeper can be used for leader election, group membership, and configuration maintenance. In addition, ZooKeeper can be used for event notification, locking, and as a priority queue mechanism. It's a sort of central nervous system for distributed systems: the coordination service plays the role of the brain, the network is the axons, processes are the monitored and controlled body parts, and events are the hormones and neurotransmitters used for messaging. Every complex distributed application needs a coordination and orchestration system of some sort, so the ZooKeeper folks at Yahoo decided to build a good one and open source it for everyone to use.

P.S. So ZooKeeper is essentially Yahoo's open-source counterpart to Google's closed-source Chubby lock service.
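Leader election, the first use listed in the abstract, is usually done with a simple recipe: each candidate creates an ephemeral, sequential znode under a shared parent, and whoever holds the lowest sequence number is the leader; when the leader's session ends, its ephemeral node is deleted and the next candidate takes over. The sketch below simulates that idea in memory. `FakeZooKeeper`, `create_ephemeral_sequential`, `current_leader`, and the `/election/n_` path are hypothetical stand-ins, not the real ZooKeeper client API, and the sketch ignores watches and network failures.

```python
import itertools


class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, illustration only).

    Models only what the election recipe needs: sequential node names and
    ephemeral nodes that disappear when the owning session ends.
    """

    def __init__(self):
        self._counter = itertools.count()
        self._owners = {}  # node path -> owning session id

    def create_ephemeral_sequential(self, prefix, session):
        # Append a zero-padded, monotonically increasing counter, so that
        # lexicographic order of paths matches creation order.
        path = f"{prefix}{next(self._counter):010d}"
        self._owners[path] = session
        return path

    def children(self, prefix):
        return sorted(p for p in self._owners if p.startswith(prefix))

    def owner(self, path):
        return self._owners[path]

    def close_session(self, session):
        # Ending a session deletes every ephemeral node it created.
        for path in [p for p, s in self._owners.items() if s == session]:
            del self._owners[path]


def current_leader(zk, prefix="/election/n_"):
    """The session owning the lowest-numbered candidate node is the leader."""
    candidates = zk.children(prefix)
    return zk.owner(candidates[0]) if candidates else None


zk = FakeZooKeeper()
for name in ["alpha", "beta", "gamma"]:
    zk.create_ephemeral_sequential("/election/n_", name)
print(current_leader(zk))  # alpha
zk.close_session("alpha")  # the leader crashes; its ephemeral node vanishes
print(current_leader(zk))  # beta
```

In the real recipe each candidate watches only the node just ahead of it, so a leader crash wakes up a single successor instead of every candidate (the "herd effect").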

Monday, May 16, 2011

Failure Trends in a Large Disk Drive Population

Why Existing Databases (RAC) are So Breakable!

Some high-scalability/architecture links

Here

Notes on Google Megastore

More notes

Google Megastore

P.S. Megastore is the data engine supporting Google App Engine.

Designing a Scalable Twitter

Are Cloud Based Memory Architectures the Next Big Thing?

Who are the Major Players in this Space?

With that bit of background behind us, there are several major players in this space (in alphabetical order):
Coherence - a peer-to-peer, clustered, in-memory data management system. Coherence is a good match for applications that need write-behind functionality when working with a database, and for cases where multiple applications require ACID transactions on the database. Java, JavaEE, C++, and .NET.
GemFire - an in-memory data caching solution that provides low latency and near-zero downtime along with horizontal and global scalability. C++, Java, and .NET.
GigaSpaces - GigaSpaces attacks the whole stack: Compute Grid, Data Grid, Messaging, Colocation, and Application Server capabilities. This makes for greater complexity, but it means there's less plumbing to write, so developers can concentrate on business logic. Java, C, or .NET.
GridGain - a compute grid that can operate over many data grids. It specializes in implementing features transparently and with little configuration. Java only.
Terracotta - network-attached memory that allows you to share memory and work with it across a cluster. Terracotta works its magic at the JVM level and provides high availability, an end to explicit messaging, distributed caching, and a single JVM image. Java only.
WebSphere eXtreme Scale - an in-memory data grid that dynamically caches, partitions, replicates, and manages application data and business logic across multiple servers.
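Write-behind, mentioned for Coherence above, means the cache acknowledges a write immediately and persists it to the database later, often batching and coalescing repeated updates to the same key. Here is a minimal single-process sketch, assuming a dict-like backing store; `WriteBehindCache` and its methods are hypothetical and are not the Coherence API.

```python
class WriteBehindCache:
    """Hypothetical write-behind cache (illustration only, not a vendor API).

    put() updates memory immediately and marks the key dirty; flush() later
    writes the newest value of every dirty key to the backing store.
    """

    def __init__(self, store):
        self._store = store  # any dict-like backing store, e.g. a DB wrapper
        self._cache = {}
        self._dirty = {}     # key -> latest value not yet written to the store

    def get(self, key):
        if key not in self._cache and key in self._store:
            self._cache[key] = self._store[key]  # read-through on a cache miss
        return self._cache.get(key)

    def put(self, key, value):
        self._cache[key] = value
        self._dirty[key] = value  # repeated puts to one key keep only the newest value

    def flush(self):
        """Write all pending updates to the store; return how many keys were written."""
        pending, self._dirty = self._dirty, {}
        for key, value in pending.items():
            self._store[key] = value
        return len(pending)


db = {}
cache = WriteBehindCache(db)
cache.put("x", 1)
cache.put("x", 2)      # coalesced with the previous write to "x"
cache.put("y", 3)
print(db)              # {}
print(cache.get("x"))  # 2 (served from memory)
print(cache.flush())   # 2
print(db)              # {'x': 2, 'y': 3}
```

A real data grid would flush from a background thread on a timer and would need to protect the pending updates (for example with a backup copy on another node), since anything not yet flushed is lost if the owning node fails.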

Q&A with Tangosol's Cameron Purdy and Peter Utzschneider

"Their functionality and the data that they depend on must all be continuously available, surviving failures of servers, networks and even data centers."

"We can now detect a remote garbage collection within milliseconds, and dynamically reshape the cluster's traffic to avoid overheating that node."
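The quote doesn't say how the detection works. One common way to approximate it is with heartbeats: every node pings regularly, and a node that has been silent longer than a small threshold (as can happen during a long stop-the-world GC pause) is temporarily taken out of the routing rotation. The sketch below assumes that approach; `PauseAwareRouter`, its 50 ms `suspect_after` threshold, and the injectable clock are illustrative choices, not Coherence's actual mechanism.

```python
import time


class PauseAwareRouter:
    """Hypothetical heartbeat-based router (illustration only).

    A node whose last heartbeat is older than `suspect_after` seconds is
    treated as possibly paused (e.g. in GC) and skipped when routing.
    """

    def __init__(self, nodes, suspect_after=0.05, clock=time.monotonic):
        self._clock = clock
        self._suspect_after = suspect_after
        now = clock()
        self._last_seen = {n: now for n in nodes}
        self._next = 0

    def heartbeat(self, node):
        self._last_seen[node] = self._clock()

    def healthy_nodes(self):
        now = self._clock()
        return [n for n, t in self._last_seen.items()
                if now - t <= self._suspect_after]

    def route(self):
        """Round-robin over nodes that have sent a heartbeat recently."""
        healthy = self.healthy_nodes()
        if not healthy:
            raise RuntimeError("no healthy nodes")
        node = healthy[self._next % len(healthy)]
        self._next += 1
        return node


# Usage with a fake clock so the timing is deterministic.
t = [0.0]
router = PauseAwareRouter(["a", "b", "c"], suspect_after=0.05, clock=lambda: t[0])
t[0] = 0.03
router.heartbeat("a")
router.heartbeat("c")            # "b" stays silent, perhaps stuck in a GC pause
t[0] = 0.06
print(router.healthy_nodes())    # ['a', 'c']
```

Once "b" sends a heartbeat again it rejoins the rotation automatically; a production system would also weigh response latency, not just liveness, when shaping traffic.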

"The self-tuning algorithms, flow control and traffic shaping features have propelled us years ahead of the technology curve, and help to explain how Coherence is the lowest latency, highest throughput and most scalable clustered data management system in existence."

"This 3.2 release is also notable because it is the first release that we have focused specifically on cluster latency, largely related to specific requests from telecom and financial services customers."

"In my experience, switching to InfiniBand from GigE is a big win. But also, I've seen strictly linear, by-the-ruler scaling to nearly 400 cache nodes."