In 2005, Stonebraker et al. published a paper outlining eight key requirements for stream processing architecture. These requirements translate naturally into the building blocks of a stream processing architecture. Although the paper predates systems such as Apache Kafka, Amazon Kinesis, Apache Spark,...
Apache Mesos is a popular open source cluster manager that enables building resource-efficient distributed systems. Mesos provides efficient, dynamic resource isolation and sharing across multiple distributed applications, such as Hadoop, Spark, Memcached, and MySQL, on a shared, dynamic pool of resource nodes. This means with...
My notes and thoughts on the Hadoop ecosystem from the book Hadoop Operations [1].
One of the major takeaways is the emergence of Hadoop cluster deployment and management tools such as hstack and Apache Ambari. In our own setup we managed to deploy and scale...
Notes and thoughts from my recent read of Cassandra: The Definitive Guide. Common ways to solve scalability bottlenecks with relational databases:
Throw more/better hardware (memory and CPU)
* Vertical scaling
* Faster disks (SSD vs RAID)
Move to a database cluster
With master-slave configuration:
* Master is now...
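To make the master-slave idea concrete, here is a minimal Python sketch, assuming a single master that accepts every write and slaves that serve reads in round-robin order. The `Node` and `MasterSlaveRouter` names are hypothetical, and replication is modeled as an in-process copy rather than any real database's replication protocol.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a database server."""
    name: str
    data: dict = field(default_factory=dict)


class MasterSlaveRouter:
    """Hypothetical router: all writes hit the master, reads rotate over slaves."""

    def __init__(self, master: Node, slaves: list[Node]):
        self.master = master
        self.slaves = slaves
        # Round-robin iterator over read replicas (None if there are no slaves).
        self._next_slave = itertools.cycle(slaves) if slaves else None

    def write(self, key, value):
        # The master is the single source of truth for writes.
        self.master.data[key] = value
        # Replication is modeled as an immediate copy; real systems often
        # replicate asynchronously, so slave reads can be stale.
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        # Spread read load across slaves; fall back to the master if none exist.
        node = next(self._next_slave) if self._next_slave is not None else self.master
        return node.name, node.data.get(key)


router = MasterSlaveRouter(Node("master"), [Node("slave-1"), Node("slave-2")])
router.write("user:1", "alice")
print(router.read("user:1"))  # ('slave-1', 'alice')
print(router.read("user:1"))  # ('slave-2', 'alice')
```

Reads scale out by adding more slaves, but write throughput is still bounded by the single master.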
It is good to know your data, but there is a clear distinction between being data-driven and being data-informed. No matter which area you work in, there is always an opportunity to make additional gains by closely observing the characteristics and quality of your data. By...