Tuesday, December 28, 2010

Consistent hashing

Consistent hashing is a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped. By using consistent hashing, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots.
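
For intuition, here is a minimal ring-based sketch in Java (my own illustration, not Dynamo's or Last.fm's code): servers are hashed to points on a ring, a key is owned by the first server clockwise from the key's hash, and each server gets several "virtual nodes" so keys spread evenly. Adding or removing a server then only remaps the keys in the affected arcs.

import java.util.SortedMap;
import java.util.TreeMap;

class ConsistentHashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<Integer, String>();
    private final int virtualNodesPerServer;

    ConsistentHashRing(int virtualNodesPerServer) {
        this.virtualNodesPerServer = virtualNodesPerServer;
    }

    void addServer(String server) {
        // Several virtual nodes per server smooth out the key distribution.
        for (int i = 0; i < virtualNodesPerServer; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodesPerServer; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    String serverFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no servers");
        // First ring position at or after the key's hash; wrap around if there is none.
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        // Any reasonably uniform hash will do for the sketch; String.hashCode is only adequate here.
        return s.hashCode() * 0x9E3779B9;
    }
}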

Wiki: Consistent hashing

Theory: Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web

Practice: Web caching with consistent hashing

Finally, you may not know this, but you use consistent hashing every time you put something in your cart at Amazon.com. Their massively scalable data store, Dynamo, uses this technique. Or if you use Last.fm, you’ve used a great combination: consistent hashing + memcached.

Monday, December 27, 2010

DarkStar #1: Dynamic Adaptation of User Migration Policies in Distributed Virtual Environments

Dynamic Adaptation of User Migration Policies in Distributed Virtual Environments

Abstract

A distributed virtual environment (DVE) consists of multiple network nodes (servers), each of which can host many users that consume CPU resources on that node and communicate with users on other nodes. Users can be dynamically migrated between the nodes, and the ultimate goal for the migration policy is to minimize the average system response time perceived by the users. In order to achieve this, the user migration policy should minimize network communication while balancing the load among the nodes so CPU resources of the individual nodes are not overwhelmed. This paper considers a multiplayer online game as an example of a DVE and presents an adaptive distributed user migration policy, which uses Reinforcement Learning to tune itself and thus minimize the average system response time perceived by the users. Performance of the self tuning policy was compared on a simulator with the standard benchmark non-adaptive migration policy and with the optimal static user allocation policy in a variety of scenarios, and the self-tuning policy was shown to greatly outperform both benchmark policies, with performance difference increasing as the network became more overloaded. These results provide yet another demonstration of the power and generality of the methodology for designing adaptive distributed and scalable migration policies, which has already been applied successfully to several other domains [17, 18].

P.S. Wiki:Project Darkstar

DarkStar #0: Scalable Data Storage in Project Darkstar

Scalable Data Storage in Project Darkstar

Abstract
We present a new scheme for building scalable data storage for Project Darkstar, an infrastructure for building online games and virtual worlds. The approach promises to provide data storage with horizontal scaling that is tailored to the special requirements of online environments and that takes advantage of modern multi-core architectures and high throughput networking.
After a brief overview of Project Darkstar, we describe the overall architecture for a caching data store. Then we provide more detail on the individual components used in the solution. Finally, we suggest some of the additional facilities that will be required to bring the full experiment to completion.

P.S. Wiki:Project Darkstar

Friday, December 10, 2010

Some words about distributed frameworks

Some words about distributed frameworks in the post Characterizing Enterprise Systems using the CAP theorem.
* RDBMS
* Amazon Dynamo
* Terracotta
* Oracle Coherence
* GigaSpaces
* Cassandra
* CouchDB
* Voldemort
* Google BigTable

STM in Scala

Scala 2.9 will possibly include STM.
The reference implementation is based on CCSTM.

P.S. CCSTM: A Library-Based STM for Scala

Monday, November 29, 2010

[PJP]: Dynamic Class Loading in the Java Virtual Machine

Sheng Liang and Gilad Bracha, "Dynamic Class Loading in the Java Virtual Machine", ACM OOPSLA'98, pp.36-44, 1998.

Abstract
Class loaders are a powerful mechanism for dynamically loading software components on the Java platform. They are unusual in supporting all of the following features: laziness, type-safe linkage, user-defined extensibility, and multiple communicating namespaces.
We present the notion of class loaders and demonstrate some of their interesting uses. In addition, we discuss how to maintain type safety in the presence of user-defined dynamic class loading.

P.S. Found in a tutorial on Javassist.
P.P.S. An interesting example:
MyClassLoader myLoader = new MyClassLoader();
Class clazz = myLoader.loadClass("Box");   // Box defined by myLoader
Object obj = clazz.newInstance();
Box b = (Box)obj; // this always throws ClassCastException.
The cast fails because the Box type in 'Box b = (Box)...' is resolved by the class loader of the class containing this code, while clazz was defined by myLoader; since the JVM identifies a class by the pair (name, defining loader), the two Box classes are distinct types.

Wednesday, October 27, 2010

An Approach to Internal Domain-Specific Languages in Java

An Approach to Internal Domain-Specific Languages in Java
from the author of Fest-Assert [here, here].

P.S. Fest-Assert is a good example of an internal DSL in Java. In some sense it is a competitor of Hamcrest.

P.P.S. Yep, I like internal DSLs and have written about them before: here about ApacheCamel, here about Hamcrest.
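
For a taste of that fluent style, here is a tiny hedged example (assuming fest-assert on the classpath; the data is made up):

import static org.fest.assertions.Assertions.assertThat;
import java.util.Arrays;
import java.util.List;

class FluentAssertionExample {
    public static void main(String[] args) {
        List<String> components = Arrays.asList("camel", "hamcrest", "fest-assert");
        // Reads almost like a sentence - that is the point of an internal DSL.
        assertThat(components).hasSize(3).contains("fest-assert");
        assertThat("internal DSL").startsWith("internal");
    }
}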

Thursday, October 7, 2010

TPC history

Some notes on TPC history, with two articles:
1985: A Measure of Transaction Processing Power
2005: Thousands of DebitCredit Transactions-Per-Second: Easy and Inexpensive

P.S. TPC (tpc.org) is the most popular transaction-processing benchmark suite.

Tuesday, October 5, 2010

An Analysis of Linux Scalability to Many Cores

An Analysis of Linux Scalability to Many Cores

ABSTRACT
This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48-core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques — this paper introduces one new technique, sloppy counters — these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.

P.S. From MIT researchers.
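
For readers who have not met the term: a sloppy counter lets each thread (or core) increment a cheap local counter and only occasionally fold the local value into a shared global one, trading read precision for update scalability. A minimal Java illustration of the general idea (not the kernel code from the paper):

import java.util.concurrent.atomic.AtomicLong;

// Per-thread counts, flushed to a global counter every THRESHOLD increments,
// so most increments touch only thread-local state.
class SloppyCounter {
    private static final int THRESHOLD = 1024;
    private final AtomicLong global = new AtomicLong();
    private final ThreadLocal<long[]> local = ThreadLocal.withInitial(() -> new long[1]);

    void increment() {
        long[] mine = local.get();
        if (++mine[0] >= THRESHOLD) {
            global.addAndGet(mine[0]);   // rare cross-thread synchronization
            mine[0] = 0;
        }
    }

    // "Sloppy" read: may lag behind by up to THRESHOLD per thread.
    long approximateValue() {
        return global.get();
    }
}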

Thursday, September 30, 2010

We are hiring programmers

Ladies and gentlemen, GridDynamics in Kharkiv has opened several J2EE positions at all levels (intern, junior, middle, senior, architect).

If you are interested in meeting me over tea to learn which projects we are hiring for, the technology stack, and so on - write to Golovach.Ivan@gmail.com.

P.S. All projects target clusters (10-100 machines) and use an In-Memory Data Grid.
P.P.S. I am especially looking forward to those who made it through my lectures last year :)
P.P.P.S. Most likely it will be me mentoring the interns and juniors.
P.P.P.P.S. Most of our clients are from the States/California (eBay, PayPal, Macy's, Cisco, Bank of America), so there is a chance to pass greetings to Arnie in person :)

Monday, September 20, 2010

Erlang vs. Scala

Erlang vs. Scala

Sunday, September 12, 2010

Dimension of partially ordered sets

Theorem
The order dimension and the multiplicative dimension of a partially ordered set coincide.

P.S. The dimension of a partially ordered set can be interpreted
- as the degree of parallelism of the processes in a system
- as the dimension of the system's time
(see the book, p. 58, for details).

Friday, September 10, 2010

[Book]: An excellent book on distributed systems - "Distributed Computing: Principles, Algorithms, and Systems"

An excellent book on distributed systems - "Distributed Computing: Principles, Algorithms, and Systems".

Speaking at IT Jam

I am speaking at IT Jam in the Guru section on "The principal limitations on the scalability of distributed systems".

Thursday, September 9, 2010

Software Transactional Memory for Large Scale Clusters

Software Transactional Memory for Large Scale Clusters

Abstract
While there has been extensive work on the design of software transactional memory (STM) for cache coherent shared memory systems, there has been no work on the design of an STM system for very large scale platforms containing potentially thousands of nodes. In this work, we present Cluster-STM, an STM designed for high performance on large-scale commodity clusters. Our design addresses several novel issues posed by this domain, including aggregating communication, managing locality, and distributing transactional metadata onto the nodes. We also re-evaluate several STM design choices previously studied for cache-coherent machines and conclude that, in some cases, different choices are appropriate on clusters. Finally, we show that our design scales well up to 512 processors. This is because on a cluster, the main barrier to STM scalability is the remote communication overhead imposed by the STM operations, and our design aggregates most of that communication with the communication of the underlying data.

HPCS languages: X10(IBM), Fortress(Sun), Chapel(Cray)

Beyond Locks and Messages: The Future of Concurrent Programming

Conclusions
Here’s the tongue-in-cheek summary of the trends which, if you believe that the HPCS effort provides a glimpse of the future, will soon be entering the mainstream:
1. Threads are out (demoted to latency controlling status), tasks (and semi-implicit parallelism) are in.
2. Message passing is out (demoted to implementation detail), shared address space is in.
3. Locks are out (demoted to low-level status), transactional memory is in.

Monday, September 6, 2010

[PJP]: Web Services Are Not Distributed Objects

"Web Services Are Not Distributed Objects"
Werner Vogels, Cornell University

Abstract
Web services are frequently described as the latest incarnation of distributed object technology. This misconception, perpetuated by people from both industry and academia, seriously limits broader acceptance of the true Web services architecture. Although the architects of many distributed and Internet systems have been vocal about the differences between Web services and distributed objects, dispelling the myth that they are closely related appears difficult.
Many believe that Web services is a distributed systems technology that relies on some form of distributed object technology. Unfortunately, this is not the only common misconception about Web services. In this article, I seek to clarify several widely held beliefs about the technology that are partially or completely wrong.

Monday, August 30, 2010

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

Abstract
We propose a new design for highly concurrent Internet services, which we call the staged event-driven architecture (SEDA). SEDA is intended to support massive concurrency demands and simplify the construction of well-conditioned services. In SEDA, applications consist of a network of event-driven stages connected by explicit queues. This architecture allows services to be well-conditioned to load, preventing resources from being overcommitted when demand exceeds service capacity. SEDA makes use of a set of dynamic resource controllers to keep stages within their operating regime despite large fluctuations in load. We describe several control mechanisms for automatic tuning and load conditioning, including thread pool sizing, event batching, and adaptive load shedding. We present the SEDA design and an implementation of an Internet services platform based on this architecture. We evaluate the use of SEDA through two applications: a high-performance HTTP server and a packet router for the Gnutella peer-to-peer file sharing network. These results show that SEDA applications exhibit higher performance than traditional service designs, and are robust to huge variations in load.

Thursday, August 12, 2010

List of important publications in theoretical computer science

List of important publications in theoretical computer science:

# 1 Computability
# 2 Computational complexity theory
# 3 Algorithms
# 4 Algorithmic information theory
# 5 Information theory
# 6 Formal verification

List of important publications in computer science

List of important publications in computer science:

# 1 Theoretical computer science
# 2 Numerical Analysis
# 3 Concurrent, parallel, and distributed computing
# 4 Networks and security
# 5 Computational linguistics
# 6 Operating systems
# 7 Databases
# 8 Information retrieval
# 9 Artificial intelligence
# 10 Machine learning
# 11 Computer vision
# 12 Compilers
# 13 Software engineering
# 14 Programming languages
# 15 Computer architecture
# 16 Computer graphics
# 17 History of computation
# 18 Teaching in Computer science
# 19 Computer science humor
# 20 Collaborative Networks

List of important publications in concurrent, parallel, and distributed computing

List of important publications in concurrent, parallel, and distributed computing with links to original papers.
If there is no link, you can find the document by searching Google Scholar.

Contents
- Consensus, synchronisation, and mutual exclusion
- Fault-tolerance
- Distributed graph algorithms
- Foundations of distributed systems

The Topological Structure of Asynchronous Computability

The Topological Structure of Asynchronous Computability
Maurice Herlihy and Nir Shavit

Abstract.
We give necessary and sufficient combinatorial conditions characterizing the class of decision tasks that can be solved in a wait-free manner by asynchronous processes that communicate by reading and writing a shared memory.
We introduce a new formalism for tasks, based on notions from classical algebraic and combinatorial topology, in which a task’s possible input and output values are each associated with high-dimensional geometric structures called simplicial complexes. We characterize computability in terms of the topological properties of these complexes. This characterization has a surprising geometric interpretation: a task is solvable if and only if the complex representing the task’s allowable inputs can be mapped to the complex representing the task’s allowable outputs by a function satisfying certain simple regularity properties.

Tuesday, July 27, 2010

CustomConcurrentHashMap - A framework for concurrent hash map implementations.

com.google.inject.internal.CustomConcurrentHashMap:
"A framework for concurrent hash map implementations."
"Retrieval operations (including java.util.Map.get(java.lang.Object)) generally do not block, so may overlap with update operations (including java.util.Map.put(java.lang.Object, java.lang.Object) and java.util.Map.remove(java.lang.Object))."
"Retrievals reflect the results of the most recently completed update operations holding upon their onset."
"For example, the following strategy emulates the behavior of java.util.concurrent.ConcurrentHashMap: ..."

Saturday, July 24, 2010

What Do High-Level Memory Models Mean for Transactions?

What Do High-Level Memory Models Mean for Transactions?

Abstract
Many people have proposed adding transactions, or atomic blocks, to type-safe high-level programming languages. However, researchers have not considered the semantics of transactions with respect to a memory model weaker than sequential consistency. The details of such semantics are more subtle than many people realize, and the interaction between compiler transformations and transactions could produce behaviors that many people find surprising. A language’s memory model, which determines these interactions, must clearly indicate which behaviors are legal, and which are not. These design decisions affect both the idioms that are useful for designing concurrent software and the compiler transformations that are legal within the language.
Cases where semantics are more subtle than people expect include the actual meaning of both strong and weak atomicity; correct idioms for thread safe lazy initialization; compiler transformations of transactions that touch only thread local memory; and whether there is a well-defined notion for transactions that corresponds to the notion of correct and incorrect use of synchronization in Java. Open questions for a high-level memory-model that includes transactions involve both issues of isolation and ordering.
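
One of the idioms the abstract alludes to - thread-safe lazy initialization - is worth recalling in its plain (non-transactional) Java form; the holder-class idiom below is a standard correct version under the existing Java memory model:

// Lazy-initialization holder class idiom: the JVM guarantees that Helper.INSTANCE
// is constructed exactly once, on the first call to getInstance(), and is safely published.
class Helper {
    private Helper() {}

    private static class Holder {
        static final Helper INSTANCE = new Helper();
    }

    static Helper getInstance() {
        return Holder.INSTANCE;
    }
}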

Tuesday, July 20, 2010

Werner’s requirements

Job Opening for a Senior Research Engineer:
"You need to be able to dive deep on technology issues, use your analytical skills to reduce a problem to its fundamentals, and create solutions."
"... use computer science theory and knowledge of advanced research to design solutions that are fundamental in nature ..."
"...understanding of distributed storage systems, scalable database technologies and data stream processing."

Monday, July 12, 2010

JMM links

On the main JMM page some links to Jeremy Manson's papers are BROKEN.
Here are the correct links:
+ Dissertation: The Java Memory Model (Jeremy Manson, Doctor of Philosophy, 2004)
+ The Java Memory Model (POPL '05)
+ Requirements for Programming Language Memory Models (CSJP '04)

P.S. From the JMM page - "A journal submission about the memory model that combines Jeremy Manson's dissertation, the POPL paper and the CSJP paper."
As far as I can see, the journal version wholly includes the POPL and CSJP papers, !!!BUT!!! the dissertation contains a more complete explanation of the causality formalism (without the dissertation it was very hard for me to understand).

Friday, June 18, 2010

The Hotspot Java Virtual Machine

The slides The Hotspot Java Virtual Machine contain many links to GC books/articles.

Tuesday, June 15, 2010

[PJP]: Google Developers - Numbers Everyone Should Know

From "Google: Designs, Lessons and Advice from Building Large Distributed Systems":
Numbers Everyone Should Know

L1 cache reference ------------------------- 0.5 ns
Branch mispredict --------------------------- 5 ns
L2 cache reference ------------------------- 7 ns
Mutex lock/unlock --------------------------- 25 ns
Main memory reference ------------------- 100 ns
Compress 1K bytes with Zippy ----------- 3,000 ns
Send 2K bytes over 1 Gbps network ---- 20,000 ns
Read 1 MB sequentially from memory --- 250,000 ns
Round trip within same datacenter ------ 500,000 ns
Disk seek ------------------------------------- 10,000,000 ns
Read 1 MB sequentially from disk -------- 20,000,000 ns
Send packet CA->Netherlands->CA ----- 150,000,000 ns

Thursday, June 10, 2010

[PJP]: Broker vs. Brokerless

Broker vs. Brokerless

Table of Contents
- Introduction
- Broker
- No Broker
- Broker as a Directory Service
- Distributed broker
- Distributed directory service
- Conclusion

Tuesday, June 8, 2010

Cliff Click vs Doug Lea: about fence-less CAS

Interesting conversation(s) with Doug Lea

Abstract
We are discussing the merits of a no-fence CAS instruction - and exposing this instruction at the Java level. Since Intel does not have a no-fence CAS, this whole idea is a no-op for Intel. Right now there is a weak spurious-fail-allowed CAS in the Unsafe to mimic load-linked/store-conditional ops, to get buy-in from IBM whose chips do LL/SC instead of CAS. Doug wants the fence-less CAS to allow spurious failure, and I do not because it causes me to add retry loops in places.
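
To see where the retry loops come from, here is a minimal sketch using the JDK's AtomicInteger.weakCompareAndSet, which is likewise specified to be allowed to fail spuriously (this is an illustration of the issue, not the Unsafe intrinsic being debated):

import java.util.concurrent.atomic.AtomicInteger;

class WeakCasCounter {
    private final AtomicInteger value = new AtomicInteger();

    int incrementAndGet() {
        for (;;) {                        // retry loop required: weakCompareAndSet may
            int current = value.get();    // fail spuriously even if the value is unchanged
            int next = current + 1;
            if (value.weakCompareAndSet(current, next)) {
                return next;
            }
        }
    }
}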

Friday, June 4, 2010

The Pauseless GC Algorithm

The Pauseless GC Algorithm from Azul Virtual Machine (AVM).

Abstract
Modern transactional response-time sensitive applications have run into practical limits on the size of garbage collected heaps. The heap can only grow until GC pauses exceed the response-time limits. Sustainable, scalable concurrent collection has become a feature worth paying for.

Azul Systems has built a custom system (CPU, chip, board, and OS) specifically to run garbage collected virtual machines. The custom CPU includes a read barrier instruction. The read barrier enables a highly concurrent (no stop-the-world phases), parallel and compacting GC algorithm. The Pauseless algorithm is designed for uninterrupted application execution and consistent mutator throughput in every GC phase.

Beyond the basic requirement of collecting faster than the allocation rate, the Pauseless collector is never in a “rush” to complete any GC phase. No phase places an undue burden on the mutators nor do phases race to complete before the mutators produce more work. Portions of the Pauseless algorithm also feature a “self-healing” behavior which limits mutator overhead and reduces mutator sensitivity to the current GC state.

We present the Pauseless GC algorithm, the supporting hardware features that enable it, and data on the overhead, efficiency, and pause times when running a sustained workload.

P.S. This GC works on Azul Machines (700+ cores).

Profiling java.util.concurrent locks

Profiling java.util.concurrent locks

Abstract
IBM’s Yao Qi, Raja Das, and Zhi Da Luo describe jucprofiler, an alphaWorks tool designed to profile multicore applications that make use of the java.util.concurrent classes introduced in Java 5.

Tuesday, June 1, 2010

Seminars suspended

In view of the exam session starting, and the fact that many of you have quite effectively found jobs (congratulations!), the seminars are suspended (presumably until autumn).

Friday, May 28, 2010

Does anyone have "Distributed Algorithms" by Nancy Lynch?

Ladies and gentlemen, if anyone has this book in electronic form, please send it to me. I have not been able to find it.

Thursday, May 27, 2010

Tuesday, May 25, 2010

Classloader leak!

An interesting type of memory leak - the classloader leak:
- Log4j primer
- IntrospectionUtils, DriverManager, DOM4J, Mozilla Rhino, etc
"...There are two main patterns that cause this situation. The first one is where a library loaded by the container has a cache that keeps strong references to objects or classes in the application. The second one is where the application has ThreadLocal data that is attached to a container thread. In tomcat this thread will be part of the thread pool, so it will never be garbage collected..."

Tuesday, May 18, 2010

deja vu: reordering

wiki: deja vu:
"Существует гипотеза, что при возникновении дополнительных нейронных связей воспринимаемая информация может поступать в область памяти раньше, чем в аппарат первичного анализа. Поэтому мозг, сравнивая ситуацию с её копией, уже поступившей в память, приходит к выводу, что это уже было."

Reordering в работе мозга:)

Tuesday, May 11, 2010

The seminar today (May 11) will take place

The seminar today (May 11) will take place.

Thursday, April 29, 2010

Understanding the Limitations of Causally and Totally Ordered Communication

Understanding the Limitations of Causally and Totally Ordered Communication

Abstract
Causally and totally ordered communication support (CATOCS) has been proposed as important to provide as part of the basic building blocks for constructing reliable distributed systems. In this paper, we identify four major limitations to CATOCS, investigate the applicability of CATOCS to several classes of distributed applications in light of these limitations, and the potential impact of these facilities on communication scalability and robustness. From this investigation, we find limited merit and several potential problems in using CATOCS. The fundamental difficulty with CATOCS is that it attempts to solve state problems at the communication level in violation of the well-known “end-to-end” argument.

Tuesday, April 27, 2010

Seminars will resume on May 11

Ladies and gentlemen,
- there will be no seminar today (April 27)
- the next one will be on May 11; the tentative topic is "Consistency models"

P.S. Happy May holidays to everyone - love, shashlik, and kayaking :)

Friday, April 23, 2010

Concurrency Control, Transaction Isolation and Serializability in SQL92 and Oracle

Concurrency Control, Transaction Isolation and Serializability in SQL92 and Oracle7 - possibly the main reference on the Oracle-specific transaction isolation levels:
+ Oracle Consistent Read
+ Snapshot Isolation

P.S. I found it via this post - Generalized Isolation Level Definitions.
"... and Gemstone [22] and Oracle [24] provide serializability and Snapshot Isolation, respectively, using multi-version optimistic implementations."
"...
[24] Oracle Corporation. Concurrency Control, Transaction Isolation and Serializability in SQL92 and Oracle7, July 1995.
..."

Tuesday, April 20, 2010

Tanenbaum, van Steen: Distributed Systems: Principles and Paradigms

Tanenbaum, van Steen: Distributed Systems: Principles and Paradigms
Contents:
Chapter 1. Introduction 22
Chapter 2. Communication 81
Chapter 3. Processes 164
Chapter 4. Naming 214
Chapter 5. Synchronization 274
Chapter 6. Consistency and Replication 328
Chapter 7. Fault Tolerance 403
Chapter 8. Security 458
Chapter 9. Distributed Object-Based Systems 539
Chapter 10. Distributed File Systems 623
Chapter 11. Distributed Document-Based Systems 699
Chapter 12. Distributed Coordination-Based Systems 752
Chapter 13. Bibliography 790
Glossary 833
Index 855

Seminar on the scalability of high-load systems

Here and here:

Program

  1. Data in high-load distributed applications. 60 min (A. Ragozin, Development Manager, Grid Dynamics, Moscow)
  2. Scalability and transaction isolation levels. 30 min (I. Golovach, Senior Software Engineer, Grid Dynamics, Kharkiv)
  3. Processing large data volumes with Map/Reduce. 60 min (A. Klochkov, Senior Software Engineer, Grid Dynamics, Moscow)

Monday, April 19, 2010

The Memory Management Glossary

http://www.memorymanagement.org/glossary/ - a big memory-management glossary with about 400 terms.

Glossary articles contain links to The Memory Management Reference Full Bibliography.

The bibliography contains many "strange" articles, such as "NREVERSAL of Fortune -- The Thermodynamics of Garbage Collection" [PDF].

Thursday, April 15, 2010

Monitors and Concurrent Pascal: a personal history

Monitors and Concurrent Pascal: a personal history (P. Brinch Hansen)
Abstract
This is a personal history of the early development of the monitor concept and its implementation in the programming language Concurrent Pascal. The paper explains how monitors evolved from the ideas of Dahl, Dijkstra, Hoare, and the author (1971-73). At Caltech the author and his students developed and implemented Concurrent Pascal and used it to write several model operating systems (1974-75). A portable implementation of Concurrent Pascal was widely distributed and used for system design (1976-90). The monitor paradigm was also disseminated in survey papers and text books. The author ends the story by expressing his own mixed feelings about monitors and Concurrent Pascal.

[NIO]: Netty

Netty Framework:
The Netty project is an effort to provide an asynchronous event-driven network application framework and tools for rapid development of maintainable high performance & high scalability protocol servers & clients.

Performance Test Reports:
1) The last link is a comparison of the 5 most popular NIO frameworks in Java: Netty, Grizzly, MINA, NIO Framework, and xSocket.
2) Trustin Lee (founder of Netty)
- has experience with NIO since 2003
- is a co-founder of Apache MINA :)

Tuesday, April 13, 2010

April 13 - the seminar is cancelled

It turns out that I will not be able to hold the seminar today.

Tuesday, April 6, 2010

The seminar is cancelled :(

A thousand apologies, but I cannot hold the seminar today.

Saturday, April 3, 2010

Google Code University Materials

Advanced Concepts in Java

Distributed Systems

Some Linux I/O papers

Comparing and Evaluating epoll, select, and poll Event Mechanisms
Abstract
This paper uses a high-performance, event-driven HTTP server (the µserver) to compare the performance of the select, poll, and epoll event mechanisms. We subject the µserver to a variety of workloads that allow us to expose the relative strengths and weaknesses of each event mechanism.

Thursday, April 1, 2010

NASDAQ+RTJS: 150K tps, 500M tx-per-day

I can't find the slides from the JavaOne 2007 talk by Anna Ewing, CIO of the NASDAQ exchange, but here is some info about NASDAQ (written using the Real-Time Specification for Java (JSR-1)):
- 150K transactions-per-second;
- 500M transactions-per-day.
Here and here.

[DSL]: ApacheCamel

As I wrote earlier, I like a certain 'specific' style of programming. Now I have a few more examples:

From here:
-------------------------
// Route validation failures to a dead-letter queue.
onException(ValidationException.class).to("activemq:validationFailed");

// Handle functional errors with a custom processor and stop routing.
onException(MyFunctionalException.class).process(new MyFunctionFailureHandler()).stop();

// Main route: consume orders from JMS and pipe them through three beans.
from("jms:queue:order:input")
    .to("bean:validateOrder")
    .to("bean:transformOrder")
    .to("bean:handleOrder");

Tuesday, March 30, 2010

The seminar is cancelled

A thousand apologies, but I cannot hold the seminar today.

[Protocols]: Fast TCP

Fast TCP:
- wiki
- "FAST TCP: motivation, architecture, algorithms, performance"
- "FAST TCP: from theory to experiments"

To poll or epoll: that is the question

To poll or epoll: that is the question:

"One of the updates in build 59 of Mustang (JavaTM SE 6) is that the New I/O Selector implementation will use the epoll event notification facility when running on the Linux 2.6 kernel. The epoll event mechanism is much more scalable than the traditional poll when there are thousands of file descriptors in the interest set. The work done by poll depends on the size of the interest set whereas with epoll (like Solaris /dev/poll) the registration of interest is separated from the retrieval of the events. A lot has been written on the topic. The C10K problem has been documenting I/O frameworks and strategies for several years. One relatively recent paper on Comparing and Evaluating epoll, select, and poll Event Mechanisms makes it clear the workloads where epoll performs a lot better than poll.

[Protocols]: Sockets Direct Protocol (SDP)

Lesson: Understanding the Sockets Direct Protocol:

This section is being updated to reflect features and conventions of the upcoming release, JDK7. You can download the current JDK7 Snapshot from java.net. We've published this preliminary version so you can get the most current information now, and so you can tell us about errors, omissions, or improvements we can make to this tutorial.

For high performance computing environments, the capacity to move data across a network quickly and efficiently is a requirement. Such networks are typically described as requiring high throughput and low latency. High throughput refers to an environment that can deliver a large amount of processing capacity over a long period of time. Low latency refers to the minimal delay between processing input and providing output, such as you would expect in a real-time application.

In these environments, conventional networking using socket streams can create bottlenecks when it comes to moving data. Introduced in 1999 by the InfiniBand Trade Association, InfiniBand (IB) was created to address the need for high performance computing. One of the most important features of IB is Remote Direct Memory Access (RDMA). RDMA enables moving data directly from the memory of one computer to another computer, bypassing the operating system of both computers and resulting in significant performance gains.

The Sockets Direct Protocol (SDP) is a networking protocol developed to support stream connections over InfiniBand fabric. SDP support was introduced to Java Platform, Standard Edition ("Java SE Platform") in JDK7 for applications deployed in the Solaris Operating System ("Solaris OS"). The Solaris OS has supported SDP and InfiniBand since Solaris 10 5/08.

Monday, March 29, 2010

Seminar 30.03: GFS, Chubby, BigTable

Themes for seminar 30.03:
- GFS
- Chubby
- BigTable

These are core technologies at Google.

Friday, March 26, 2010

The Google File System

The Google File System
Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung, Google

Abstract
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

MapReduce: Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat, Google Inc.

Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.

Interpreting the Data: Parallel Analysis with Sawzall

Interpreting the Data: Parallel Analysis with Sawzall
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan, Google Inc.

Abstract
Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design—including the separation into two phases, the form of the programming language, and the properties of the aggregators—exploits the parallelism inherent in having data and computation distributed across many machines.

The Chubby lock service for loosely-coupled distributed systems

The Chubby lock service for loosely-coupled distributed systems.
Mike Burrows, Google Inc.

Abstract
We describe our experiences with the Chubby lock service, which is intended to provide coarse-grained locking as well as reliable (though low-volume) storage for a loosely-coupled distributed system. Chubby provides an interface much like a distributed file system with advisory locks, but the design emphasis is on availability and reliability, as opposed to high performance. Many instances of the service have been used for over a year, with several of them each handling a few tens of thousands of clients concurrently. The paper describes the initial design and expected use, compares it with actual use, and explains how the design had to be modified to accommodate the differences.

Bigtable: A Distributed Storage System for Structured Data

Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, and others, Google Inc.

Abstract
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

Wednesday, March 24, 2010

Comparing Two High-Performance I/O Design Patterns

Comparing Two High-Performance I/O Design Patterns:
1. Short and clear comparison of Reactor and Proactor I/O patterns.
2. Interesting info about the properties of different I/O mechanisms on different platforms (Windows, Linux, BSD, Solaris).

Summary
This article investigates and compares different design patterns of high performance TCP-based servers. In addition to existing approaches, it proposes a scalable single-codebase, multi-platform solution (with code examples) and describes its fine-tuning on different platforms. It also compares performance of Java, C# and C++ implementations of proposed and existing solutions.

Monday, March 15, 2010

Concurrent Data Structures

Concurrent Data Structures from concurrency experts Mark Moir and Nir Shavit (Sun Microsystems Laboratories).

In a 23-page overview the authors cite !!!138!!! world-class sources and provide a good index.

Contents:
1.1 Designing Concurrent Data Structures
- Performance
- Blocking Techniques
- Nonblocking Techniques
- Complexity Measures
- Correctness
- Verification Techniques
- Tools of the Trade
1.2 Shared Counters and Fetch-and-φ Structures
1.3 Stacks and Queues
1.4 Pools
1.5 Linked Lists
1.6 Hash Tables
1.7 Search Trees
1.8 Priority Queues
1.9 Summary

Friday, March 12, 2010

Amazon SimpleDB Consistency Enhancements

Amazon SimpleDB Consistency Enhancements.

Abstract
This document outlines the new strong consistency features in SimpleDB - consistent read and conditional put/delete. It also demonstrates how they can be used to program key database application scenarios such as persistent application state, concurrency control, item counter, and conditional update/delete.
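
As a sketch of how conditional put enables one of those scenarios - an item counter - consider the optimistic loop below. The small client interface is hypothetical (the real calls are GetAttributes and PutAttributes with an update condition in the SimpleDB API); the point is the read-then-conditionally-write retry pattern:

// Hypothetical facade over SimpleDB-style consistent read and conditional put.
interface SimpleDbClient {
    String getAttribute(String domain, String item, String attribute);          // consistent read
    boolean conditionalPut(String domain, String item, String attribute,
                           String newValue, String expectedValue);              // put iff attribute == expectedValue
}

class PageViewCounter {
    private final SimpleDbClient db;

    PageViewCounter(SimpleDbClient db) { this.db = db; }

    long increment(String pageId) {
        while (true) {
            String current = db.getAttribute("counters", pageId, "views");
            long next = (current == null ? 0 : Long.parseLong(current)) + 1;
            // Succeeds only if no other writer changed "views" since our read; otherwise retry.
            if (db.conditionalPut("counters", pageId, "views", Long.toString(next), current)) {
                return next;
            }
        }
    }
}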

Thursday, March 11, 2010

Scalable servers: NIO + NIO.2

Some interesting slides:
+ Doug Lea, "Scalable IO in Java" (the Reactor pattern in NIO);
+ Alan Bateman (Sun), Jeanfrancois Arcand (Sun), "Asynchronous I/O Tricks and Tips".

Wednesday, March 10, 2010

HotSpot Glossary of Terms

HotSpot Glossary of Terms:
- adaptive spinning
- biased locking
- block start table
- bootstrap classloader
- bytecode verification
- C1 compiler
- C2 compiler
- card table
- class data sharing
- class hierarchy analysis (CHA)

Seminar

A thousand apologies that I could not hold the seminar on March 9 and could not warn everyone in advance.

Sunday, February 28, 2010

Google Collections Library 1.0 final

Google's collections help you avoid reinventing the wheel; there are convenient things for multithreading.
Google Collections Library 1.0 final (part 1)
Google Collections Library 1.0 final (part 2)

P.S. Never use the service http://ru.savefrom.net; to download the videos, register on youtube.com :)

Thursday, February 25, 2010

Cloud vs Grid

The effective use of a Cloud is dependent on the applications. The Cloud can be used appropriately when dealing with linear processes and independent, relatively small data volumes. For applications with larger storage requirements or closely coupled parallel processes with high I/O requirements, Clouds are often useless [here].

Grid works at the "library level", where users are provided a known "environment" in which their applications should be able to run. In a sense, cloud computing offers what grid cannot: a predictable execution environment [here].

Some links from second source:
- Benchmarking Amazon EC2 for High-performance Scientific Computing
- Can Cloud Computing Reach The TOP500?
- HPC in the Cloud

Wednesday, February 24, 2010

3 Phase Commit [3PC]

According to the Wiki, the paper "A Formal Model of Crash Recovery in a Distributed System" by Dale Skeen and Michael Stonebraker is the main source on 3PC [PDF or PDF].

The paper contains a very good analysis and critique of 2PC and a definition of 3PC.

It also contains some exploration of the thin boundary between Consistency and Partition Tolerance (in the spirit of the CAP theorem):
"Theorem #2: Rules 1 and 2 are sufficient for designing protocols resilient to a single site failure.
Theorem #3: There exists no protocol using independent recovery that is resilient to arbitrary failures by two sites.
Theorem #4: There exists no protocol resilient to a network partitioning when messages are lost.
Theorem #5: Design Rules 3 and 4 are necessary and sufficient for making protocols resilient to a partition in a two-site protocol.
Corollary #6: There exist multisite protocols that are resilient to a simple partition when undeliverable messages are returned to the sender.
Corollary #7: Knowledge of which messages were undelivered at the time the network fails is necessary and sufficient for recovering from simple partitions.
Theorem #8: There exists no protocol resilient to a multiple partition."

BASE: An Acid Alternative

The article; there is a link to the PDF there as well. From the article:
--------------------------
Conclusion

Scaling systems to dramatic transaction rates requires a new way of thinking about managing resources. The traditional transactional models are problematic when loads need to be spread across a large number of components. Decoupling the operations and performing them in turn provides for improved availability and scale at the cost of consistency. BASE provides a model for thinking about this decoupling.

Tuesday, February 23, 2010

[PARALLEL ALGORITHMS]

I found a collection of resources on the topic :) I think it is worth sharing the link.

Monday, February 22, 2010

Seminar schedule: Tuesdays at 18:00

The seminar on Distributed Systems, Transactions, and Data Consistency will take place on Tuesdays at 18:00 in room 6-52 (starting February 23).
This Tuesday we will cover:
the CAP theorem
Amazon Dynamo
Oracle Coherence
implementation of the SQL-92 transaction isolation levels in lock-based databases

Friday, February 19, 2010

A model of multithreading

A clarification about tomorrow's seminar: it will take place in room 6-56 (or 6-52).

Saturday, February 13, 2010

Java and dynamically typed languages?

An interesting article about integrating scripting languages in the upcoming Java 7.

Friday, February 12, 2010

A model of multithreading

I have proposed an abstract model of a particular case of threads sharing critical sections, and obtained results on detecting so-called deadlocks by fairly simple methods. The ideas are still raw, but everyone interested is invited to discuss them on February 20 (Saturday) at 11:00 at Karazin University, room 6-52. I will be presenting at Grigoriy Nikolaevich Zholtkevich's seminar.

Seminars are starting

I propose to hold the first seminar of the spring semester on Tuesday at !!!18:00!!!.
By then I will definitely be in Kharkiv.
Tentative topics:
- an overview of transactions in Java (JTA, JTS, OTS (CORBA));
- transactions in Spring + EJB3;
- isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, READ CONSISTENCY, SNAPSHOT ISOLATION, SERIALIZABLE, CURSOR STABILITY) + phenomena (Dirty Write, Dirty Read, Cursor Lost Update, Lost Update, Fuzzy Read, Phantom, Read Skew, Write Skew), and the three classic problems;
- possible atypical approaches to implementing transactions (neither lock-based, nor multi-version, nor optimistic conflict resolution) and the phenomena/anomalies these may give rise to.

Anatomy of the Linux ...

Series: Anatomy of the Linux ... [ALL ARTICLES]:
- Anatomy of Linux process management
- Anatomy of real-time Linux architectures
- Anatomy of Linux synchronization methods
+
All articles from this author:
- Inside the Linux 2.6 Completely Fair Scheduler
- Linux and symmetric multiprocessing
- Inside the Linux scheduler

Thursday, February 11, 2010

Serializable: Information about the effort to add a true serializable transaction isolation level to PostgreSQL.

Serializable.
Information about the effort to add a true serializable transaction isolation level to PostgreSQL.
* 1 Overview
o 1.1 Serializable and Snapshot Transaction Isolation Levels
o 1.2 Serializable Isolation Implementation Strategies
o 1.3 Predicate Locking
o 1.4 Mixed Isolation Levels
* 2 Current Status
* 3 Development Path
o 3.1 Source Code Management
o 3.2 Implementation
o 3.3 Testing
* 4 R&D Issues
* 5 Discussion
* 6 Publications

"A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008[1][2]."

P.S. This is the first time I have come across material on how true-serializable isolation is implemented in an MVCC database.

Above the clouds with Android

This article, it seems to me, deserves our attention :)

Summary: The open source Android operating system has taken the world by storm, allowing sophisticated cloud computing applications to run wherever you are. Designed to be highly efficient on battery-powered devices like the T-Mobile G1 smartphone, at heart, Android is Linux®, and there are several layers to the Android programming model that permit the creation of secure applications tailor-made for cloud computing. Soar to new heights with Android and experience mobile computing as you've never experienced it before.

There is a Russian version here

Edsger W. Dijkstra Prize in Distributed Computing: Prize winning papers

Edsger W. Dijkstra Prize in Distributed Computing

"Prize winning papers:
* 2009: Joseph Halpern and Yoram Moses for "Knowledge and Common Knowledge in a Distributed Environment" in the Journal of the ACM, 37(3):549-587, July 1990.
* 2008: Baruch Awerbuch and David Peleg for "Sparse Partitions," in Proceedings of the 31st Annual Symposium on Foundations of Computer Science (FOCS), 503-513, October 1990.
* 2007: Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer for "Consensus in the presence of partial synchrony," Journal of the ACM, 35(2):288-323, April 1988.
* 2006: John M. Mellor-Crummey and Michael L. Scott for "Algorithms for scalable synchronization on shared-memory multiprocessors," ACM Transactions on Computer Systems, 9(1):21-65, February 1991.
* 2005: Marshall Pease, Robert Shostak, and Leslie Lamport for "Reaching agreement in the presence of faults," Journal of the ACM, 27(1):228-234, April 1980.
* 2004: R. G. Gallager, P. A. Humblet, and P. M. Spira for "A Distributed Algorithm for Minimum-Weight Spanning Trees", ACM Transactions on Programming Languages and Systems, 5(1):66-77, January 1983.
* 2003: Maurice Herlihy for "Wait-Free Synchronization", ACM Transactions on Programming Languages and Systems, 13(1):124-149, January 1991.
* 2002: Edsger W. Dijkstra for "Self-stabilizing systems in spite of distributed control," Communications of the ACM, 17(11):643-644, November 1974.
* 2001: Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson for "Impossibility of Distributed Consensus with One Faulty Process," Journal of the ACM, 32(2):374-382, April 1985.
* 2000: Leslie Lamport for "Time, Clocks, and the Ordering of Events in a Distributed System," Communications of the ACM, 21(7):558-565, July 1978."

Wednesday, February 10, 2010

CAP Theorem

C == Consistency
A == Availability
P == Tolerance to network Partitions

---
Theorem: You can have at most two of these properties for any shared-data system.
---

Here is the keynote speech by Eric Brewer at the ACM Symposium on Principles of Distributed Computing (PODC) where he first introduced this theorem.

Two years later, in 2002, Seth Gilbert and Nancy Lynch of MIT formally proved Brewer to be correct, and thus Brewer's Theorem was born.

A very good discussion of the CAP theorem is here.

Tuesday, February 9, 2010

Erlang: AXD 301 – ATM switch from Ericsson

Taken from here.
"Whenever the Erlang language is mentioned, the AXD 301 project - an ATM switch (a digital telephone switch, if I translate it correctly) from Ericsson - is immediately played as the trump card. Since almost every time it comes up one has to search the Internet for articles giving the code size and the number of developers, I decided to publish some of them here. So:

AXD 301 - A new generation ATM switching system - a promotional overview of the device itself. 1998.

Ulf Wiger's Four-fold Increase in Productivity and Quality - a presentation on how programming in Erlang increases development speed and quality, with the AXD 301 example, naturally. 2001.

Joe Armstrong's Concurrency Oriented Programming in Erlang - a talk on concurrency-oriented programming in Erlang. 2002 (and the same talk again in 2003).

Joe Armstrong's Making reliable distributed systems in the presence of software errors - Armstrong's dissertation, in which he also discusses the AXD 301. 2003.

Mats Cronqvist's Troubleshooting a Large Erlang System - a short article about how the software for the AXD 301 was developed. 2004."

Leslie Lamport's book: "Specifying Systems"

Leslie Lamport - one of the monsters of the theory of concurrent and distributed systems - has written the book "Specifying Systems" [PDF].

CONTENTS
1. A Little Simple Math
2. Specifying a Simple Clock
3. An Asynchronous Interface
4. A FIFO
5. A Caching Memory
6. Some More Math
7. Writing a Specification: Some Advice
8. Liveness and Fairness
9. Real Time
10. Composing Specifications
11. Advanced Examples
12. The Syntactic Analyzer
13. The TLATeX Typesetter
14. The TLC Model Checker
15. The Syntax of TLA+
16. The Operators of TLA+
17. The Meaning of a Module
18. The Standard Modules

P.S. The book's page contains a link to "The Wildfire Verification Challenge Problem".
P.P.S. TLA - the Temporal Logic of Actions

Monday, February 8, 2010

CUDA programming contest in Ukraine

CUDA programming contest in Ukraine

NVIDIA Corporation, together with its Ukrainian partner, the company Yustar, announces a parallel-programming contest in the CUDA environment.

The main goal of the contest is to popularize CUDA among Ukrainian software developers, specialists and scientists who use parallel programming, as well as students of technical universities.

NVIDIA CUDA™ technology is a C development environment that lets programmers and developers write software that solves complex computational problems in less time, thanks to the many-core computing power of GPUs. Millions of CUDA-capable GPUs are already installed around the world, and thousands of programmers already use the free CUDA tools to accelerate applications - from video and audio encoding to oil and gas exploration, product modeling, medical imaging, and scientific research.

The contest will run from February 15 to May 14, 2010.

Contest details and participation terms are on the CUDA Contest page.

P.S. One could take part, if some non-trivial topic comes to mind :)

February 11, Kharkiv: Conference "High-performance computing and programming on hybrid supercomputer systems based on TESLA, the CUDA architecture"

Conference "High-performance computing and programming on hybrid supercomputer systems based on NVIDIA® TESLA™, the CUDA architecture", Kharkiv

Date: 11.02.2010

Venue: Kharkiv, Institute for Scintillation Materials of the NAS of Ukraine, 60 Lenin Avenue
The event is held with the assistance of the Institute for Scintillation Materials of the NAS of Ukraine.

NVIDIA's CUDA parallel computing architecture has allowed researchers all over the world to use the C language to harness the computing power of GPUs for solving complex high-performance computing problems.

Hybrid supercomputer systems based on NVIDIA Tesla are used for computational workloads in research, opening access to new computing heights. Solutions come in various form factors - from personal supercomputers right on your desk to pre-configured hybrid supercomputer clusters.

ThreadLocal: history of performance improvment

From "Threading lightly, Part 3: Sometimes it's best not to share. Exploiting ThreadLocal to enhance scalability"

ThreadLocal performance
While the concept of a thread-local variable has been around for a long time and is supported by many threading frameworks including the Posix pthreads specification, thread-local support was omitted from the initial Java Threads design and only added in version 1.2 of the Java platform. In many ways, ThreadLocal is still a work in progress; it was rewritten for version 1.3 and again for version 1.4, both times to address performance problems.

In JDK 1.2, ThreadLocal was implemented in a manner very similar to Listing 2, except that a synchronized WeakHashMap was used to store the values instead of a HashMap. (Using WeakHashMap solves the problem of Thread objects not getting garbage collected, at some additional performance cost.) Needless to say, the performance of ThreadLocal was quite poor.
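
To make the quoted description concrete, here is a rough reconstruction of that 1.2-era approach (my own sketch, not the actual JDK source): one shared synchronized, weak-keyed map per ThreadLocal, keyed by the accessing Thread, so every get() contends on the same lock.

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Roughly what a pre-1.3 ThreadLocal looked like: a shared synchronized map keyed by Thread.
// Weak keys let Thread objects be collected after the thread dies, at some extra cost.
class OldStyleThreadLocal<T> {
    private final Map<Thread, T> values =
            Collections.synchronizedMap(new WeakHashMap<Thread, T>());

    protected T initialValue() { return null; }

    public T get() {
        Thread current = Thread.currentThread();
        T value = values.get(current);               // every call locks the shared map
        if (value == null && !values.containsKey(current)) {
            value = initialValue();
            values.put(current, value);
        }
        return value;
    }

    public void set(T value) {
        values.put(Thread.currentThread(), value);
    }
}

In current JDKs the design is inverted: each Thread carries its own map from ThreadLocal to value, so get() touches only per-thread state and no shared lock.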

Dynamo: Amazon’s Highly Available Key-value Store

Dynamo: Amazon’s Highly Available Key-value Store

Abstract
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.

This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

Categories and Subject Descriptors
D.4.2 [Operating Systems]: Storage Management; D.4.5 [Operating Systems]: Reliability; D.4.2 [Operating Systems]: Performance;

General Terms
Algorithms, Management, Measurement, Performance, Design, Reliability.

Group Communication, Total Ordering and Consensus

These are scientific papers which have influenced Postgres-R in one way or another - very interesting reading.

Total Order Broadcast and Multicast Algorithms: Taxonomy and Survey
a good overview, classification and theoretical comparison of group communication systems, by Xavier Défago, André Schiper and Péter Urbán, April 2004
Comparative Performance Analysis of Ordering Strategies in Atomic Broadcast Algorithms
Theoretical analysis of various algorithms for atomic broadcast used for total ordered delivery of messages, by Xavier Défago, André Schiper and Péter Urbán, December 2003
PLATO: Predictive Latency-Aware Total Ordering
An interesting approach to implement low-latency reliable multicasting in datacenters, based on observing arrival times, by Mahesh Balakrishnan, Ken Birman and Amar Phanishayee, October 2006
Ricochet: Lateral Error Correction for Time-Critical Multicast
Ricochet uses IP multicast and lateral error correction to implement reliable multicasting for clustered applications, by Mahesh Balakrishnan, Ken Birman, Amar Phanishayee and Stefan Pleisch, April 2007.

P.S. Taken from here.

Database Replication

These are scientific papers which have influenced Postgres-R in one way or another - very interesting reading.

Don't be lazy, be consistent
The paper that started it all: Don't be lazy, be consistent: Postgres-R, a new way to implement Database Replication, by Bettina Kemme and Gustavo Alonso, September 2000.
Postgres-R(SI)
Combining Replica Control with Concurrency Control based on Snapshot Isolation — a continuation of the Postgres-R idea, making use of MVCC by Shuqing Wu and Bettina Kemme, April 2005.
Database Replication
Lots of other interesting readings about database replication by Bettina Kemme.
Processing Transactions over Optimistic Atomic Broadcast Protocols
Proposes exploiting the spontaneous ordering and using optimistic delivery to decrease the delay induced by the GCS, by Bettina Kemme, Fernando Pedone, Gustavo Alonso and André Schiper.
Pronto: High availability for standard off-the-shelf databases
An eager, update-everywhere replication middleware, using JDBC, by Fernando Pedone and Svend Frolund, 2008
Middleware-based Database Replication: The Gaps Between Theory and Practice
A good overview and comparison of the current state of theory and practice, by Emmanuel Cecchet, George Candea, Anastasia Ailamaki

P.S. Taken from here.

Sunday, February 7, 2010

Postgres-R (8) Architecture

Postgres-R (8) Architecture.

Abstract
This document describes the design and architecture of Postgres-R (8), a multi-master replication system for Postgres. It is an extension of the work presented by [KA00] and incorporates enhancements from the subsequent paper Postgres-R (SI) by [WK05]. Further inspiration originates from Slony-II of Neil Conway and Gavin Sherry and from conversations with other fellow hackers of Postgres. Please note that this paper describes the underlying concept and does not necessarily reflect the status of the prototype implementation, which is available from http://www.postgres-r.org. The reader is supposed to be familiar with Postgres internals, especially with Multi-Version Concurrency Control (MVCC) and transaction isolation issues.

Friday, February 5, 2010

PJP: OSGi: "Listeners Considered Harmful: The “Whiteboard” Pattern"

Listeners Considered Harmful: The “Whiteboard” Pattern.

OSGi is probably the most popular technology that lets you install, replace and remove individual software components within a single JVM WITHOUT RESTARTING THE APPLICATION, right WHILE THE APPLICATION IS RUNNING. And the Whiteboard pattern is precisely the key ingredient that makes these capabilities possible.
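A minimal sketch of the whiteboard style (registerService is the real OSGi API; MyTopicListener and the "topic" property are made up for the example): instead of finding an event source and calling addListener() on it, the listener simply publishes itself in the service registry, and the event source asks the registry for all currently registered listeners.

import java.util.Hashtable;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

// Hypothetical listener published on the whiteboard.
class MyTopicListener {
    void onMessage(String message) {
        System.out.println("got: " + message);
    }
}

public class MyListenerActivator implements BundleActivator {
    private ServiceRegistration registration;

    public void start(BundleContext context) {
        Hashtable<String, Object> props = new Hashtable<String, Object>();
        props.put("topic", "news/sports");   // made-up filtering property
        registration = context.registerService(
                MyTopicListener.class.getName(), new MyTopicListener(), props);
    }

    public void stop(BundleContext context) {
        registration.unregister();   // the registry also cleans up when the bundle stops
    }
}

The event source never keeps a strong reference handed to it from the listener's code, which is what makes the lifecycle and leak problems of plain listeners go away.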

P.S. Taken from here. For the curious, there are links to a few more PDFs there.
P.P.S. At the end of the document there is a link to Java Tip 79: Interact with garbage collector to avoid memory leaks, about fighting memory leaks in Java caused by forgotten (never deregistered) listeners. The fight is waged with weak references.

Friday, January 29, 2010

Book: Beautiful Code

Beautiful Code: Leading Programmers Explain How They Think (Theory in Practice (O'Reilly)).

--- Foreword ---
I got my first job as a programmer in the summer of 1982. Two weeks after I started, one of the system administrators loaned me Kernighan and Plauger's The Elements of Programming Style (McGraw-Hill) and Wirth's Algorithms + Data Structures = Programs (Prentice Hall). They were a revelation—for the first time, I saw that programs could be more than just instructions for computers. They could be as elegant as well-made kitchen cabinets, as graceful as a suspension bridge, or as eloquent as one of George Orwell's essays.

Generalized Isolation Level Definitions

Generalized Isolation Level Definitions.
More recent, more complete, more difficult to follow.

Atul Adya - Microsoft Research
Barbara Liskov - Laboratory for Computer Science, MIT, Cambridge
Patrick O’Neil - Univ. of Massachusetts, Boston

Abstract
Commercial databases support different isolation levels to allow programmers to trade off consistency for a potential gain in performance. The isolation levels are defined in the current ANSI standard, but the definitions are ambiguous and revised definitions proposed to correct the problem are too constrained since they allow only pessimistic (locking) implementations. This paper presents new specifications for the ANSI levels. Our specifications are portable; they apply not only to locking implementations, but also to optimistic and multi-version concurrency control schemes. Furthermore, unlike earlier definitions, our new specifications handle predicates in a correct and flexible manner at all levels.

A Critique of ANSI SQL Isolation Levels

A citation classic:
A Critique of ANSI SQL Isolation Levels.
Good, fairly readable discussion of transaction isolation.

Hal Berenson Microsoft Corp.
Phil Bernstein Microsoft Corp.
Jim Gray U.C. Berkeley
Jim Melton Sybase Corp.
Elizabeth O’Neil UMass/Boston
Patrick O'Neil UMass/Boston

Abstract:
ANSI SQL-92 [MS, ANSI] defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered. Ambiguity in the statement of the phenomena is investigated and a more formal statement is arrived at; in addition new phenomena that better characterize isolation types are introduced. Finally, an important multiversion isolation type, called Snapshot Isolation, is defined.

1 . Introduction
"...which defined Degrees of Consistency in three ways: locking, data-flow graphs, and anomalies."
"The three ANSI phenomena are ambiguous, and even in their loosest interpretations do not exclude some anomalous behavior that may arise in execution histories."
"...lock-based isolation levels have different characteristics than their ANSI equivalents."
"...degrees of consistency defined in 1977 in [GLPT]."
"...Chris Date’s definitions of Cursor Stability..."

Seminars postponed to mid-February

The seminars are postponed to mid-February because I am leaving on a business trip.

Tuesday, January 26, 2010

Matchers, Constraints, Predicates

This first appeared in jMock under the name 'Constraints'. Then it was extracted into the Hamcrest project and renamed 'Matchers' (= 'Constraints' = 'Predicates'). Eventually JUnit adopted Hamcrest and calls it 'Matchers', or simply 'assertThat'.

assertThat(x, is(3));
assertThat(x, is(not(4)));
assertThat(responseString, either(containsString("color")).or(containsString("colour")));
assertThat(myList, hasItem("3"));


assertThat(something, isA(Color.class));
assertThat(something, contains("World"));
assertThat(something, same(Food.CHEESE));
assertThat(something, not(eq("Hello")));
assertThat(something, not(contains("Cheese")));
assertThat(something, or(contains("color"), contains("colour")));
assertThat(something, between(10, 20));


assertThat(list, includes("peach", "pear", "plum"));

--------------

JUnit has also added Assumptions and Theories to its library, as sketched below.
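A minimal sketch of both (plain JUnit 4 API; the class name and data values are made up): assumeTrue() silently skips combinations that do not apply, and the Theories runner checks the @Theory method against every @DataPoints value.

import static org.junit.Assert.assertTrue;
import static org.junit.Assume.assumeTrue;

import org.junit.experimental.theories.DataPoints;
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import org.junit.runner.RunWith;

@RunWith(Theories.class)
public class SquareTheories {

    @DataPoints
    public static Integer[] values = {-2, -1, 0, 1, 2};   // candidate inputs

    @Theory
    public void squareOfNonZeroIsPositive(Integer x) {
        assumeTrue(x != 0);          // combinations with x == 0 are skipped, not failed
        assertTrue(x * x > 0);
    }
}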

P.S. Thanks to Java 5 for static import + varargs.

Monday, January 25, 2010

Some cache patterns and terms

From here:

Patterns:
------------------------
Read-Through Caching (synchronous read from the backing store)
Write-Through Caching (synchronous write to the backing store)
Write-Behind Caching (asynchronous write to the backing store, with support for write-coalescing and write-combining)
Refresh-Ahead Caching (asynchronous pre-fetching from the backing store)
------------------------

Terms:
-------------
Cache-aside. Means nothing more than that we ourselves choose when entities are loaded, updated, removed and stored. With Read-Through/Write-Through/Write-Behind/Refresh-Ahead it is the cache that decides when to make those calls.
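A rough illustration of the difference (plain Java, with a hypothetical loadFromDatabase() standing in for the backing store): in cache-aside the miss handling lives in our code; in read-through the same logic sits inside the cache, behind a configured loader.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideExample {
    private final Map<String, String> cache = new ConcurrentHashMap<String, String>();

    // Cache-aside: the application decides when to go to the backing store.
    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = loadFromDatabase(key);   // hypothetical helper
            cache.put(key, value);
        }
        return value;
    }

    // With read-through, the application would just call cache.get(key)
    // and the cache itself would invoke its loader on a miss.

    private String loadFromDatabase(String key) {
        return "value-for-" + key;           // stand-in for a real query
    }
}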

Write-coalescing:
The writes - which are typically much more expensive operations - are often reduced because multiple changes to the same object within the write-behind interval are "coalesced" and only written once to the underlying datasource ("write-coalescing").

Write-combining:
Additionally, writes to multiple cache entries may be combined into a single database transaction ("write-combining") by using the CacheStore.storeAll() method.

Cluster-durable, Disk-durable.

System of record (SoR), more here.

Extract, transform, load (ETL).
------------------------

P.S. The last two terms (SoR, ETL) belong more to the Data Warehouse world, but a good clustered transactional cache often takes over part of a DW's functions.

HPC for Dummies

High Performance Computing for Dummies [PDF].

HPC enables us to first model then manipulate products, services, and techniques. These days, HPC has moved from a selective and expensive endeavor to a cost-effective enabling technology within reach of virtually every budget. This book will help you to get a handle on exactly what HPC does and can be.

This special edition eBook from Sun and AMD shares details on real-world uses of HPC, explains the different types of HPC, guides you on how to choose between different suppliers, and provides benchmarks and guidelines you can use to get your system up and running.

SOA for Dummies

SOA for Dummies

Welcome to Service Oriented Architecture For Dummies, 2nd IBM Limited Edition. Service Oriented Architecture (SOA) is the most important technology initiative facing businesses today. SOA is game changing, and early SOA successes make it clear that SOA is here to stay.

This book introduces you to the basics of SOA in context with the real life experiences of seven companies. Seen through the varied business environments depicted in each of the case studies, we hope you will recognize that SOA is more than a bunch of new software products strung together to allow technology companies to have something else to sell. SOA represents a dramatic change in the relationship between business and IT. SOA makes technology a true business enabler and empowers business and technology leaders alike.

Doug Lea Discusses the Fork/Join Framework

Doug Lea Discusses the Fork/Join Framework.

Summary
Doug Lea talks to InfoQ about the evolution of the Fork/Join Framework, the new features planned for java.util.concurrent in Java 7, and the "Extra 166" package. The interview goes on to explore some of the hardware and language changes that are impacting concurrent programming, and the effect the increasing prevalence of alternative languages in the JVM are having on library design.
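To make the subject of the interview concrete, here is a minimal Fork/Join sketch (standard java.util.concurrent API from Java 7; the task and numbers are made up): a RecursiveTask splits the work in half until chunks are small enough to compute directly.

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array by splitting it in half until a chunk is small enough.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000;
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {             // small enough: sum sequentially
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                              // schedule the left half asynchronously
        return right.compute() + left.join();     // do the right half, then wait for the left
    }
}

public class ForkJoinDemo {
    public static void main(String[] args) {
        long[] data = new long[1000000];
        Arrays.fill(data, 1L);
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println(pool.invoke(new SumTask(data, 0, data.length)));   // prints 1000000
    }
}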

Bio
Doug Lea is a professor of computer science at State University of New York at Oswego where he specialises in concurrent programming and the design of concurrent data structures. He wrote "Concurrent Programming in Java: Design Principles and Patterns", one of the first books on the subject, and chaired JSR 166, which added concurrency utilities to Java.

Tuesday, January 19, 2010

GPUs (CUDA) + tomography

A letter from Kostiantyn Sokolinskyi:
--------------------
A colleague in Belgium is working on this:

"These are the pages we made about our "Super Computer" project:
several GPUs in one big pc... Just give him my contact info so he can
ask me any question he might have with regard to programming for
GPUs."

http://fastra.ua.ac.be/en/index.html
http://fastra2.ua.ac.be/

if you're interested, I can give you his contact details :)
--------------------
If anyone is interested, I can pass along the contact details of the person who can pass along the contact details :)

Seminars resuming

1. The seminars are expected to resume at the beginning of February. Exact dates and times will follow a bit later.
2. Vitaliy Penkov (Skorp) has material for 5 lectures on testing.
3. I will start with a talk on transactions in J2EE (JTA, 2PC, JTS, OTS, EJB + Spring transactions).
4. I think we will manage to invite a few more interesting speakers :)

Event-Based Programming without Inversion of Control

"Event-Based Programming without Inversion of Control" from Philipp Haller, Martin Odersky(He designed the Scala programming language and Generic Java).
"...
Most programming models support event-driven programming only through inversion of control. Instead of calling blocking operations (e.g. for obtaining user input), a program merely registers its interest to be resumed on certain events (e.g. an event signaling a pressed button, or changed contents of a text field). In the process, event handlers are installed in the execution environment which are called when certain events occur. The program never calls these event handlers itself. Instead, the execution environment dispatches events to the installed handlers. Thus, control over the execution of program logic is 'inverted'.

Virtually all approaches based on inversion of control suffer from the following two problems: First, the interactive logic of a program is fragmented across multiple event handlers (or classes, as in the state design pattern [13]). Second, control flow among handlers is expressed implicitly through manipulation of shared state [10].
..."

Concurrency Oriented Programming in Erlang

35 pages presentation:
"Concurrency Oriented Programming in Erlang".

At the start it shows an interesting way to test how "concurrent" a language is (Erlang, C#, Java): passing a message around a ring of processes. A rough Java sketch of that benchmark is below.
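A rough Java version of that ring test (threads and blocking queues standing in for Erlang processes and mailboxes; ring size and lap count are arbitrary): a hop counter is passed around the ring and decremented until it reaches zero, then a poison pill is circulated so every node stops.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RingBenchmark {
    public static void main(String[] args) throws InterruptedException {
        final int nodes = 100;     // Erlang versions of this test use thousands of processes
        final int laps = 1000;

        final List<BlockingQueue<Integer>> inbox = new ArrayList<BlockingQueue<Integer>>();
        for (int i = 0; i < nodes; i++) {
            inbox.add(new ArrayBlockingQueue<Integer>(1));
        }

        final long start = System.nanoTime();
        for (int i = 0; i < nodes; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    while (true) {
                        int hops = inbox.get(id).take();
                        if (hops == 0) {
                            inbox.get((id + 1) % nodes).put(0);   // pass the poison pill along
                            if (id == 0) {
                                System.out.println("done in "
                                        + (System.nanoTime() - start) / 1000000 + " ms");
                            }
                            return;
                        }
                        inbox.get((id + 1) % nodes).put(hops - 1);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        inbox.get(0).put(nodes * laps);   // inject the message into the ring
    }
}

The Java version needs one OS thread per node, which is the whole point of the comparison: Erlang processes are far cheaper, so the same test scales to much larger rings.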

Herb Sutter: Books & Articles

Herb Sutter: Books & Articles.
P.S. A pity he works in C++ rather than Java :)
P.P.S. Thanks to Kostiantyn Sokolinskyi.

Saturday, January 16, 2010

Stroustrup: "What Should We Teach New Software Developers? Why?"

A recent article by Stroustrup, "What Should We Teach New Software Developers? Why?" (EN, RU).

Thursday, January 14, 2010

I'm going to be working for Silicon Valley!

I passed the interviews for the Kharkiv office of GridDynamics; the company's headquarters is in San Francisco, California. On the local market it has merged with the Russian company Mirantis.

I couldn't resist this:
----
"... Grid Dynamics работает с критически важными для компании-клиента технологиями и является мировым лидером в области масштабирования сложных систем, которые выдерживают предельные нагрузки и функционируют в непрерывном цикле ..."
----
"... Как правило, после основного курса обучения в высшем учебном заведении наши инженеры продолжают образование, занимаясь научной деятельностью. На данный момент аспирантуру Саратовского университета закончили или продолжают обучение треть сотрудников «Мирантиса». В компании работает пять кандидатов физико-математических наук (из них два доцента кафедры Математической Кибернетики и Компьютерных Наук) и в ближайшее время планируется защита ещё четырёх кандидатских диссертаций ..."
----
"... Обязанности:
* участие в разработке архитектуры высокопроизводительных бизнес-приложений.
* участие в разработке высокопроизводительных бизнес-приложений, от сбора требований до вывода в эксплуатацию.
* анализ и улучшение производительности приложений
."
----
"... 8. Возможно ли, работая в «Мирантисе», параллельно заниматься научной деятельностью?
- Да, это вполне возможно. Часть наших специалистов имеют защищенные научные работы и создают новые, некоторые из сотрудников обучаются в аспирантуре и получают дополнительное образование
."
----
and, to top it off:
"... Comfortable and spacious work rooms ... 4-7 people work in each room."

Tuesday, January 12, 2010

java->ejb->jca->cics->cobol

Integrating WebSphere Application Server and CICS using the J2EE Connector Architecture.

24 pages from IBM on how Java can call legacy COBOL code running under the CICS transaction manager.
It uses the magic of JCA (Java Connector Architecture), which makes it possible not only to use any data source from Java, but also to integrate it "smoothly" into the container at the level of transactions/security/connection pooling.

P.S. Attention! Name collision: JCA == Java Cryptography Architecture.
P.P.S. You can just look at Figure 3 (p. 11).

Tuesday, January 5, 2010

The C10K problem

The C10K problem.
A classic article on the software problems of serving 10,000 clients simultaneously.
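One of the strategies the article discusses is serving many connections from one thread using readiness notification instead of a thread per client; a minimal sketch with Java NIO (standard API; port and buffer size are arbitrary):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Single-threaded echo server: one Selector multiplexes all client sockets.
public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                        // block until some channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int read = client.read(buffer);
                    if (read == -1) {
                        client.close();               // peer closed the connection
                    } else {
                        buffer.flip();
                        client.write(buffer);         // echo back (partial writes ignored here)
                    }
                }
            }
        }
    }
}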

Monday, January 4, 2010

Books on RDBMS

"Классика жанра:
Бернстайн
Concurrency Control and Recovery in Database Systems
Джефри Ульман
Системы баз данных. Полный курс
Джим Грей
Transaction Processing : Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)"

Taken from here. I have only read "Database Systems: The Complete Book". It's the Stanford textbook: the first part is about how to build a database, the second about how to write the RDBMS engine itself.
In my view, to get to grips with databases (say, Oracle):
1) Date (the introduction)
2) Ullman (the fundamentals)
3) Tom Kyte (the Oracle specifics)
Three books, and into battle :).

P.S. If anyone finds them in electronic form, send them to my email :).