6. Probability
How Formal Systems Reorganize Belief
From Certainty To Uncertainty
In the previous chapters we encountered several procedures that stabilize conviction by aligning the reasoning process to formal devices. This progression culminated in logical systems, which are designed so that true premises lead to true conclusions through clearly defined rules.
Most domains do not offer the luxury of definitely true premises.
Measurements fluctuate. Observations vary. Outcomes depend on processes that cannot be predicted with certainty.
In such situations strict logical derivation cannot be applied directly. Yet structured procedures have been developed that still allow conviction to stabilize in a controlled way.
The systematic investigation of these patterns forms the domain of probability and statistics.
Games Of Chance
The earliest systematic investigations of uncertainty arose in the context of games of chance and betting.
Dice and cards produce outcomes that are unpredictable in individual cases, yet repeated play reveals stable patterns. Betting turns these patterns into practical questions: "How should the pot be divided fairly if a game ends early?", or "When should I bet on a game at all?"
The early answers seem simple. One lists the possible outcomes, counts how many favor a certain bet, and compares that number to all possible outcomes.
From this perspective the convincing force of probability appears to arise mostly from counting and taking ratios.
But these calculations rely on a deeper idea that emerged in this context: the concept of a random device.
Random Devices And Possible Outcomes
A random device has a known set of mutually exclusive possible outcomes and, when used, produces exactly one outcome from that set. Which outcome will appear, however, is unpredictable.
Examples of random devices are urns filled with colored balls, a deck of cards, a die, lottery tickets in a basket, socks in a drawer, or modern random number generator algorithms.
However, these objects do not automatically function as random devices. They do so only when used in a specific way.
There needs to be mixing, shuffling, throwing, shaking, or simply not knowing the result before it is revealed. In the case of random number generators, the seed must not be known.
What makes such situations tractable is that the set of possible outcomes is fixed and known in advance. In addition, the procedure is arranged so that no outcome can be deliberately selected.
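This definition can be sketched in code. The class below is a hypothetical illustration, not part of the text: it pairs a fixed, known outcome set with an unpredictable selection step, here delegated to Python's pseudo-random generator.

```python
import random

# A minimal sketch of a random device: a fixed set of mutually
# exclusive outcomes, known in advance, and a use() step that
# produces one of them unpredictably. Names are illustrative.
class RandomDevice:
    def __init__(self, outcomes):
        self.outcomes = list(outcomes)  # the set is fixed and known in advance

    def use(self):
        # random.choice stands in for the mixing, shuffling, or throwing
        # that prevents any outcome from being deliberately selected
        return random.choice(self.outcomes)

die = RandomDevice([1, 2, 3, 4, 5, 6])
result = die.use()  # one outcome from the known set; which one is unpredictable
```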
The Die Model
The die relies on symmetry to assign probabilities.
When it is rolled or thrown in uncontrolled ways, this symmetry means that no face has a higher or lower chance of appearing than any other.
What is remarkable about this reasoning is that the probabilities are not guessed or estimated. They are read directly from the structure of the situation. The symmetry of the device allows the possible outcomes to be treated as equally likely, and the probabilities follow from counting.
Looking at one throw, the equally probable outcomes are
1, 2, 3, 4, 5, 6.
And only one will appear.
If we want to bet on a six, we count the favorable outcomes and divide their number by the number of all possible outcomes. There is one favorable outcome out of six, so the probability is $\frac{1}{6}$.
A fair bet would therefore have to pay out in proportion to this probability. Over many games, such a bet yields no average advantage to either side.
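This counting argument can be sketched in a few lines of Python, using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# One throw of a symmetric die: six equally probable outcomes.
outcomes = [1, 2, 3, 4, 5, 6]

# Count the favorable outcomes for a bet on a six and take the ratio.
favorable = [o for o in outcomes if o == 6]
p_six = Fraction(len(favorable), len(outcomes))  # 1/6

# A fair bet pays in proportion to this probability: a 1-unit stake
# should return 6 units, so that neither side gains on average.
fair_payout = 1 / p_six  # 6
```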
Looking at two throws, the equally probable outcomes are
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6).
Each pair represents one possible sequence of results. Because each individual throw has six equally probable outcomes, the game has $6 \times 6 = 36$ equally probable outcomes.
If we bet on getting at least one six, we count all outcomes that contain a six. These are the last row and the last column. There are 11 such cases.
The probability of at least one six in two throws is therefore $\frac{11}{36}$.
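The same count can be checked by enumerating the table above in code (a small sketch, not part of the text):

```python
from fractions import Fraction
from itertools import product

# All 36 equally probable outcomes of two throws, as in the table above.
outcomes = list(product(range(1, 7), repeat=2))  # (1, 1) up to (6, 6)

# Favorable outcomes for "at least one six": the last row and last column.
with_six = [pair for pair in outcomes if 6 in pair]

p_at_least_one_six = Fraction(len(with_six), len(outcomes))  # 11/36
```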
Probability calculations can be simplified and extended, and a rich theory has been built on this foundation. But the basic idea is straightforward. Because the die is symmetric, all elementary outcomes are treated as equally probable. To determine the probability of an event, one counts the outcomes that produce it and compares them with the total number of possible outcomes. The resulting fraction gives the probability and therefore the fair betting odds.
Symmetry is not the only way to obtain such probabilities. Whenever a process selects outcomes from a known set without favoring any particular one, the outcomes are treated as equally probable. In a lottery basket the tickets are mixed and drawn blindly. In the case of socks in a drawer we rely on the fact that socks are usually thrown in without order and that the selection is blind.
Under these conditions the probabilities are not merely guesses. They are determined by the structure of the situation we look at. Mathematics becomes applicable to life.
The Cards Model
Cards introduce a small complication.
Assume we have a deck of only six cards labeled 1 to 6.
Drawing one card from the shuffled deck is essentially the same situation as throwing a six-sided die. Each card is indistinguishable during the draw, so the six possible outcomes are equally likely.
But after that card has been drawn and not returned to the deck, this changes. The second draw no longer has six possible outcomes. One card has already been removed; only five remain.
The second draw therefore resembles throwing a smaller die with five faces. But which five faces exactly depends on the first result.
If the first card was 1, the remaining possibilities are 2, 3, 4, 5, 6.
If the first card was 2, the remaining possibilities are 1, 3, 4, 5, 6.
And so on.
In other words, the probabilities for the next step depend on what has already happened. This resembles using a smaller die on each draw. Each die has fewer and fewer options, and which options disappear depends entirely on the previous draws.
To keep track of the probabilities we need a form of bookkeeping. We can no longer speak simply of the probability of drawing a certain card in a certain draw. Instead we must ask for the probability of drawing that card given the cards that have already been drawn. This is called a conditional probability. The set of previous draws defines the state of the process.
The basic idea remains the same as before. We still rely on a random device and counting of outcomes. But the set of possible outcomes at each step now depends on the state created by earlier steps.
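This bookkeeping can be sketched as follows. The code draws cards without replacement from the six-card deck and records the state; the names are illustrative, and the conditional probability of any particular remaining card follows from counting the remaining options.

```python
import random
from fractions import Fraction

deck = [1, 2, 3, 4, 5, 6]
state = []   # the cards drawn so far define the state of the process
probs = []   # conditional probability of each drawn card, given the state

for _ in range(3):
    remaining = [c for c in deck if c not in state]
    # Given the draws recorded in `state`, each remaining card is
    # equally probable: a "smaller die" with len(remaining) faces.
    probs.append(Fraction(1, len(remaining)))
    state.append(random.choice(remaining))

# probs is [1/6, 1/5, 1/4]: the options shrink with every draw.
```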
Probability Theory And Propositional Logic
Probability theory eventually takes the patterns we have just seen and organizes them into a symbolic system.
In this respect it resembles propositional logic from the previous chapter. Logic introduced symbols such as $A$ and $B$ to represent statements and then specified rules for how these statements may be combined.
Probability theory uses a similar strategy. In the presentation used here, the symbols now represent outcomes rather than statements.
The set of possible outcomes of a random device is often called the sample space in probability theory.
We write $P(A)$ for the probability that outcome $A$ occurs, as in the die example.
We introduce $P(A|B)$ for the probability that outcome $A$ occurs given that outcome $B$ has occurred, as in the cards example.
Outcomes can then be combined in ways that parallel the logical connectives introduced earlier.
For example, the outcome $A\ and\ B$ means that both outcomes occur. In the die example this could represent "a six appears in the first throw and a six appears in the second throw."
The probability of this outcome has the form $P(A\ and\ B)=P(A) \times P(B|A)=P(B) \times P(A|B)$, which expresses that the probability of two outcomes occurring together equals the probability of one outcome multiplied by the probability of the other given the first.
The corresponding rules for the other connectives are:
$P(A\ or\ B) = P(A) + P(B) - P(A\ and\ B)$, which expresses that the probability of either outcome occurring is the sum of their probabilities minus the probability of the overlap, which would otherwise be counted twice.
$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$, which relates conditional probabilities in both directions. It is obtained by rearranging the two products in the rule for $P(A\ and\ B)$.
$P(not\ B) = 1 - P(B)$, which expresses that the probability of an outcome not occurring is the remainder to one.
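Because the sample spaces here are finite, these rules can be checked against brute-force counting. The sketch below uses the two-throw sample space, with A as "the first throw shows a six" and B as "the second throw shows a six":

```python
from fractions import Fraction
from itertools import product

# The 36-outcome sample space of two throws, as a set of pairs.
space = set(product(range(1, 7), repeat=2))
A = {s for s in space if s[0] == 6}   # first throw shows a six
B = {s for s in space if s[1] == 6}   # second throw shows a six

def P(event):
    # Probability by counting: favorable outcomes over all outcomes.
    return Fraction(len(event), len(space))

p_B_given_A = Fraction(len(A & B), len(A))   # P(B|A): counting within A
p_A_given_B = Fraction(len(A & B), len(B))   # P(A|B): counting within B

assert P(A & B) == P(A) * p_B_given_A              # and-rule
assert P(A | B) == P(A) + P(B) - P(A & B)          # or-rule
assert p_A_given_B == P(A) * p_B_given_A / P(B)    # conditional rule
assert P(space - B) == 1 - P(B)                    # not-rule
```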
These symbolic rules allow probability calculations to be carried out without listing every possible outcome explicitly, as we did in the examples above. Like logic, probability can be organized into a system of rules that can be applied to problems. This system has proven reliable in practice, though applying the rules now involves calculation as well.
For example, consider the case of two throws of a die. Let A be the outcome "the first throw shows a six" and B the outcome "the second throw shows a six". Instead of asking for at least one six, we ask for exactly two sixes.
From symmetry we know that $P(A)=\frac{1}{6}$ and $P(B)=\frac{1}{6}$.
The result of the first throw does not influence the second because dice have no memory, thus also $P(B|A)=P(B)=\frac{1}{6}$.
Using the multiplication rule we obtain $P(A\ and\ B)=P(A) \times P(B|A)=\frac{1}{6}\times\frac{1}{6}=\frac{1}{36}$.
This is the outcome (6, 6) in the table above, which appears once in the 36 possible outcomes.
Like logical calculi, probability theory does not at first possess much convincing force of its own. The symbols and rules are introduced deliberately. Their convincing force arises elsewhere: they summarize the simpler models we have already seen, stabilize reasoning by constraining possible steps, and prove useful across a wide range of structured practical and scientific contexts.
In that sense probability theory and logic share a similar role. They are not read directly off our thinking. They are refined systems built on earlier patterns of conviction formation. Their value lies in organizing those patterns, preventing certain kinds of mistakes, and providing mathematical guarantees when their assumptions are satisfied.
Some authors, such as Cox, Jaynes, and de Finetti, also tried to show that the probability rules themselves can be motivated from simpler and more compelling constraints. This does not make the theory a direct reading of thought, but it does show that its formal structure was not introduced in a wholly arbitrary way.