Recently I discovered Looker, a relatively new Business Intelligence (BI) platform which utilises a YAML based modeling language LookML. LookML is core engine of Looker. LookML offers a very powerful, highly expressive yet lightweight modeling layer. LookML combines data modelling with on-demand data transformations, and data derivations at the same layer. This kind of late-binding modeling bypasses traditional ETL based data loading and warehousing.
Lack of agilitiy is one of the biggest pains with traditional BI workflows. Often you have multiple data sources and a daily batch ETL process loads data in warehouse. Daily report snapshots (often csv or excel files) are generated from data warehouse. With this kind of workflow there is hardly any scope for experimentation. Ad hoc data requests take way to long. A minor change in report snapshot or ETL process may take upto a week.
Looker changes all of that. First, Looker can connect source databases directly and work with raw data. This means data used for reporting and visualisation is fresh and untransformed. Second, with LookML one can model very complex business logic with ease. LookML models are extensible, and reusable. In LookML model, you can define the key metrics and queries once and reuse them throughout the model. Third, LookML enables on-demand data transformations and derivation (closer to or at, query time) which means reports and visualisation are based on fresh data. Last but not least, LookML model files can be version controlled (GIT integration) and hence Looker provide a sophisticated change management provide.
Enough talking lets see a quick example which is borrowed from the Looker blog to demonstrate power and ease of LookML.
Lets say you are running an ecommerce store and want to calculate dashboard view for the total amount of orders and may be tier them. Your ecommerce store database has 3 tables: order, order items and inventory items.
Following LookML model will render our desired dashboard view.
- view: Order
fields:
- dimension: total_amount_of_order_usd
type: number
decimals: 2
sql:
(SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
- dimension: total_amount_of_order_usd_tier
type: tier
sql: ${total_amount_of_order_usd}
tiers: [0,10,5,150,500,1000]
As you can see, dimension total_amount_of_order_usd
was defined once and then reused to calculate total_amount_of_order_usd_tier
dimension using a handy Looker in-built function tier
. You can see outcome below.
All these dashboard views can be generated, altered, and updated - without the need to rewrite any SQL. You define your key dimensions with essential business logic once, then reference them within other dimensions and measures. To me that is a hell of a lot better than traditional BI workflow, allowing quicker iteration and experimentation.