Big Data

5 data integration trends that will define the future of ETL in 2018

ETL stands for extract, transform, load, and it is generally used for data warehousing and data …

Kubernetes for Big Data Workloads

Kubernetes has emerged as the go-to container orchestration platform for data engineering teams. …

For next wave of innovation organisations will need internal data services

To unlock the true value of data, organisations will need internal data services. Data services …

Reflections on Apache Drill

I have been playing with Apache Drill for quite some time now. In layman’s terms, Apache Drill …

Rise of Elastic Data Warehouse and Database Services

Currently, the majority of cloud-based database and data warehouse services are provisioned with …

Requirements for stream processing architecture

In 2005, Stonebraker et al. published a paper that outlined eight key requirements for stream processing …

Building Distributed Systems with Mesos

Apache Mesos is a popular open source cluster manager which enables building resource-efficient …

Hadoop Ecosystem- Deployment And Management

My notes and thoughts on the Hadoop ecosystem from the book Hadoop Operations. One of the major key take …

Traditional Ways To Solve Scalability Problems With RDBMS

Notes and thoughts from my recent read, Cassandra: The Definitive Guide. Common ways to solve …