Saturday, April 15, 2017

Decision Theory Under Uncertainty (Rational Choice)

I guess enough time has passed that I can confess that my previous post titled "Mandates and Pragmatism" was an allegory about the IUID mandate, and other mandates like it (if the shoe fits...).

When one steps back and asks enough "why" questions, it becomes evident that the problem that we are trying to solve is the problem of a mandate.  It is the "because I said so" rationale of our parents. Therefore, removing the mandate solves the problem. No? So, go ahead, kids, and eat your dessert first. Tell your parents I said so.

During one particularly futile discussion about the subject at an IUID conference I took to doodling in my notebook and produced the following mathematical model.


You see, I spent many frustrating months studying Itzhak Gilboa's book "Theory of Decision under Uncertainty", so I figured that I would finally put some of that knowledge to good use. Gilboa is one of the living legends in the field of decision theory. Another one of his well-known works is "Rational Choice". I must warn you that his works are no light reading. What I usually found when reading Gilboa was that it would take me a lot of effort to get through his mathematical models and proofs, only to arrive at a "duh" moment. So you mean that green marbles are not blue?

To summarize my doodle, the model is a mathematical representation, in decision theory notation, of a very fundamental issue: what is the rational choice when the set of choices is Navy ERP, IUID alone, or a combination of the two?

The value of the utility function was in terms of FIAR compliance.

The conclusion of my doodling is that concentrating on ensuring that the Accountable Property System of Record (APSR) is FIAR compliant would be a more rational choice than trying to comply with both the IUID mandate and APSR FIAR compliance, since IUID alone would never provide 100% FIAR compliance. Like my kids would say... uhmmm.... duh! Once we put it like that... who needs math?

Monday, April 3, 2017

What's the formula for statistical sampling?


Statistical sampling, for some reason, is one of those concepts that give some people a lot of trouble. There is a book by Daniel Kahneman titled "Thinking, Fast and Slow" that I always recommend. One of the concepts Kahneman likes to write about is System 1 thinking vs. System 2 thinking. I suppose that those who feel comfortable dealing with uncertainty and probabilistic models are good System 2 thinkers.

So, what is statistical sampling? To keep things simple, it may be better that we start with an example.

Suppose that we work at a light bulb factory that makes millions of light bulbs per day and that we are in charge of testing them. We could test for many things but, to keep things simple, let's say that we are testing whether a light bulb can survive a drop from 3 feet onto concrete. I am not sure why anyone would want that, but it sounds like fun.



There is one sure way to know exactly what percentage of our light bulbs would pass the test: drop every single light bulb onto a concrete floor.

The problem with that approach is that we need the light bulbs for other things, such as selling them to make a profit.

Statistics gives us a way to test the light bulbs and have less broken glass to sweep.

Using statistical sampling, we could take a small sample of light bulbs and use them to represent the entire batch. We just need to follow some simple rules:
  • We must choose the light bulbs in our sample at random
  • Each light bulb must have an equal chance of being selected
  • We must select a large enough sample size... (more on that in another post)
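The first two rules can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the batch is represented by made-up serial numbers, and the names `draw_sample`, `batch` and `seed` are hypothetical. The standard library's `random.sample` picks without replacement and gives every item the same chance of being chosen.

```python
import random

def draw_sample(batch_ids, sample_size, seed=None):
    """Pick sample_size bulbs at random, each with an equal chance."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(batch_ids, sample_size)

# Hypothetical batch: one million bulbs identified by serial number.
batch = range(1_000_000)
sample = draw_sample(batch, 1000, seed=42)

print(len(sample))       # number of bulbs selected for the drop test
print(len(set(sample)))  # same number, since no bulb is picked twice
```

How large the sample needs to be is a separate question, so the 1,000 used here is just a placeholder.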

So, if we sampled 1,000 light bulbs at random and 100 of them broke, then 900 of them survived, and statistics allows us to say that 90% of all the light bulbs we produce can survive a fall of 3 feet onto a concrete floor.
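The arithmetic behind that number is just the share of sampled bulbs that did not break. Here is a minimal sketch; the function name `survival_rate` is my own and not part of any library.

```python
def survival_rate(sample_size, broken):
    """Fraction of the sampled bulbs that survived the drop."""
    if sample_size <= 0 or not 0 <= broken <= sample_size:
        raise ValueError("need 0 <= broken <= sample_size and sample_size > 0")
    return (sample_size - broken) / sample_size

rate = survival_rate(1000, 100)
print(f"{rate:.0%}")  # 90%
```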

What scares people off is what follows. There is a probabilistic nature to statistical processes. So the result of our drop test is not really 90%. It is actually "around 90%": a range of values called a "confidence interval".
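To give a feel for how wide "around 90%" might be, here is a rough sketch using the common normal approximation for a proportion at a 95% confidence level. The method, the 1.96 multiplier and the name `survival_interval` are my assumptions for illustration; other interval formulas exist and may be a better fit for small samples or for rates close to 0% or 100%.

```python
import math

def survival_interval(sample_size, broken, z=1.96):
    """Approximate confidence interval for the survival rate.

    Uses the normal approximation; z=1.96 corresponds to roughly 95%.
    """
    p = (sample_size - broken) / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p - margin), min(1.0, p + margin)

low, high = survival_interval(1000, 100)
print(f"{low:.3f} to {high:.3f}")  # 0.881 to 0.919
```

Under these assumptions, the drop test tells us that somewhere between about 88% and 92% of our light bulbs would survive the fall.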

We are not going to get that deep into statistical sampling in this post, but I wanted to start things off with a very simple scenario that shows the benefit of statistical sampling.