Power Query (or rather M) is a functional language. This simply means that functions are values: they can be passed around, or stored in lists, records and tables, just like any other value. As a consequence, a function can take other functions as arguments, or itself return a new function.
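To make that concrete, here is a minimal sketch (all names are made up for this illustration): ApplyTwice takes a function and returns a new function, and functions sit inside a record next to other values.

```
let
    // A function that takes a function and returns a new function
    ApplyTwice = (f as function) as function => (x) => f(f(x)),

    // An ordinary function...
    AddTax = (price as number) as number => price * 1.2,

    // ...stored in a record alongside another function, like any other value
    Toolbox = [Double = each _ * 2, CompoundTax = ApplyTwice(AddTax)],

    Result = Toolbox[CompoundTax](100)  // 100 * 1.2 * 1.2 = 144
in
    Result
```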
I want to delve a bit deeper into this at some point, but this post will give a very tangible example of how applying functional techniques can make your code both easier to write and easier to read.
…no one ever said (and if they did, they shouldn’t have). You absolutely can have too many metrics; yes, I’m talking to you.
That being said, here’s another one: the Gini coefficient. This may ring a bell: it is traditionally used to measure income inequality within a population, so it regularly crops up in the news, articles etc. Technically there’s no reason we can’t use it to measure inequality in other domains, but why would we do that?
It’s all about the eggs-to-baskets ratio: the main reason Gini is useful is that inequality is often an excellent proxy for risk. If, say, a handful of customers account for most of your revenue, losing just one of them hurts.
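As a quick aside, here is what computing a Gini coefficient can look like in M, using the common formula G = 2 * Σ(i * x_i) / (n * Σ x_i) - (n + 1) / n over the values sorted ascending. The function name and sample data are made up for illustration; treat it as a sketch rather than the exact formulation you’d ship.

```
// Sketch only: Gini coefficient of a list of non-negative numbers,
// e.g. revenue per customer. 0 = perfectly even, values near 1 = very unequal.
let
    Gini = (values as list) as number =>
        let
            sorted = List.Sort(values, Order.Ascending),
            n      = List.Count(sorted),
            total  = List.Sum(sorted),
            // Sum of rank * value, with 1-based ranks
            weighted = List.Sum(
                List.Transform(List.Positions(sorted), each (_ + 1) * sorted{_})
            )
        in
            (2 * weighted) / (n * total) - (n + 1) / n
in
    Gini({1, 1, 1, 1, 10})  // ≈ 0.51: one "customer" dominates the total
```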
Have you ever needed to count unique things like transactions or customers? Of course you have, and counting things is thankfully very straightforward in the Power Pivot universe: just use DISTINCTCOUNT('Harry'[BulletId]) and you’re done.
Have you ever needed to count lots and lots of things, though? Maybe billions of things? You can still use the same function, but the sheer volume of data presents a challenge of its own, especially if you’re working with Excel (or Power BI Desktop).
Thankfully, if you’re willing to trade some accuracy for the ability to handle far more data points (or more information, perhaps), there’s an excellent option. If you can live with an error of, say, ±2%, you can use the HyperLogLog algorithm to estimate the number of distinct elements in your data set.
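To preview the intuition (the actual Power Pivot implementation comes later in the post): HyperLogLog rests on the observation that if you hash every element to a uniformly distributed bit pattern, the longest run of leading zeros you observe tells you roughly how many distinct elements you must have seen. Here is a deliberately naive sketch in M with made-up, pre-hashed sample values; real HyperLogLog splits the hash space into many registers and combines them with a harmonic mean to bring the error down to a few percent.

```
// Toy illustration only: assumes the inputs are already hashed to
// (pseudo-)uniformly distributed 32-bit integers.
let
    // Number of leading zero bits in a 32-bit integer
    LeadingZeros = (x as number) as number =>
        if x = 0 then 32
        else 31 - Number.RoundDown(Number.Log(x, 2)),

    // Made-up "hash values" standing in for hashed customer ids
    hashes = {123456, 9876543, 42, 2147000111, 65536},

    // Single-register estimate: 2 ^ (longest leading-zero run + 1).
    // With one register this is extremely noisy; HyperLogLog averages
    // many registers to make the estimate usable.
    maxZeros = List.Max(List.Transform(hashes, LeadingZeros)),
    estimate = Number.Power(2, maxZeros + 1)
in
    estimate
```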
This post lays out how to implement HyperLogLog in Power Pivot or Power BI, and in a way that is surprisingly easy. And to be clear, we will maintain the ability to drill and slice our data just like we normally would. In fact, apart from some loss of accuracy we’re making no concessions whatsoever, i.e. we’re not introducing any additional complexity to the model.
We’ll start with the why-this-works though, and please bear with me as it will get a bit geeky (or interesting, your choice).