The complete question
was: The other day I was trying to explain the
difference between “impossible” and “with probability zero” to a friend. I
remarked “if you pick an integer, and I pick an integer, the mathematical
probability that we picked the same number is zero, but it’s certainly not
impossible.” A seemingly harmless statement, but only later did I think, what
does it even mean to “pick ANY integer with EQUAL probability?” Is there any
meaning to a random variable that can take on an infinite number of values with
equal probability? It would be like a uniform distribution that has one or both
bounds stretched to infinity. Does this kind of object have a name? What kind of
mathematical properties would this even have? Oh dear, this is blowing my mind
just thinking about it. Please help!
Mathematician:
Developing a consistent theory of probability for sets with an infinite number
of elements (like the set of all integers, the set of all real numbers, or the
set of real numbers between 0 and 1) requires dealing with a handful of subtle
and tricky problems. In fact, to determine whether it is possible to choose one
object from an infinite number of objects such that each object has the same
probability of being chosen, we must delve into what we really mean by
“possible”. Various interpretations of this question are:
1. Can we work with
probabilities (as they are defined in a formal, mathematical way) on infinite
sets of items, and if so, can we assign equal probabilities to each item?
2. Can a (formal,
mathematical) procedure be devised to sample uniformly at random from an
infinite set?
3. Can an algorithm be
designed that can carry out this sampling procedure on a real computer such that
the algorithm will terminate in a finite amount of time?
Each of these questions,
which I’ll address one by one, raises interesting considerations.
Can a formal theory of
probability be developed for infinite sets of items? Absolutely. For example,
the Gaussian distribution (often known as the bell curve or normal distribution)
is defined on the set of real numbers (so when you draw a sample according to
this distribution, you get some real number). Strictly speaking, it assigns a
probability of zero to each of the individual real numbers, but assigns a
non-zero probability to subsets of real numbers. For example, if we sample from
a Gaussian distribution, there is a formula that can tell us how likely the
number is to be less than any particular value X (so the set of all numbers less
than X is assigned a positive probability). Why does the Gaussian distribution
not assign non-zero probabilities to the actual numbers themselves? The reason
(loosely speaking) is because the probability of getting ANY number when we
sample from a distribution must be 1 (which just means that some number must
always occur). On the other hand, the set of real numbers contains so many
numbers that, if each of them had a non-zero probability of occurring, it would
not be possible for the total probability (which is, essentially, just the sum
or integral of all of the numbers’ probabilities) to be 1. Another, more
intuitive way to think about this is to consider a dart board. If we throw a
dart at the board and are willing to assume that matter is infinitely divisible
(sorry, Physicist) then we will always hit some point on the board (assuming our
aim isn’t too terrible). But, at the same time, the chance of hitting any
particular point is negligibly small (zero in fact) since there are so many
possible points. So clearly, to describe the probability of hitting this board’s
points, it is not sufficient to consider only the probability of each individual
point being hit, but rather we have to consider how likely we are to hit various
regions of the board, such as the region constituting the bullseye. Even though
each particular point has a zero probability of being hit, some point
is always hit in practice, and the set of points that make up the bullseye
together have a positive probability of being hit with each throw.
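The dart-board picture can be made concrete with the Gaussian itself. Here is a small Python sketch (standard library only; the helper `normal_cdf` is our own illustrative function, not something from the discussion above): an interval of outcomes receives a positive probability, while a single point, viewed as the degenerate interval [x, x], receives exactly zero.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # P(X <= x) for a Gaussian with mean mu and standard deviation sigma,
    # written with math.erf so no external libraries are needed.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# A region (here the interval [-1, 1]) gets positive probability...
p_interval = normal_cdf(1.0) - normal_cdf(-1.0)   # about 0.683

# ...but any single point, as the interval [x, x], gets exactly zero.
p_point = normal_cdf(0.0) - normal_cdf(0.0)       # exactly 0.0
```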
Okay, so we can
define probabilities on an infinite set, though as the Gaussian distribution
case shows, we may have to actually let the probabilities be assigned to subsets
of our original set, rather than to every object in the set itself. But can we
do this in such a way that every object in our set is sampled with equal
likelihood? The answer is again yes, though with some caveats. For instance, if
we limit ourselves to the real numbers that are between 0 and 1, we can assign a
uniform distribution to these numbers which will give them each an equal
probability. The uniform distribution basically says that the probability that a
given sampled number will be between a and b (for 0<a<b<1) is equal to
b-a. This fact implies that all numbers are equally likely to be sampled from
this distribution.
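The b − a rule is easy to check empirically. A minimal sketch, using Python's built-in `random.random()` (which samples the interval [0, 1) in exactly this uniform sense), with a fixed seed and the interval (0.2, 0.5) chosen arbitrarily for illustration:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

a, b = 0.2, 0.5
trials = 100_000
hits = sum(1 for _ in range(trials) if a < random.random() < b)
frequency = hits / trials  # should hover near b - a = 0.3
```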
Fine, but can we define a
probability distribution on the set of integers (rather than the real numbers
between 0 and 1) such that they each occur with equal probability (i.e. does a
uniform distribution on the integers exist)? The answer, sadly, is no. A
probability mass function (which is the kind of probability distribution we need
in this case) is defined to be a nonnegative function whose values sum to 1. But
any function that assigns the same value to every integer must have values that
sum to either zero (if that common value is zero) or infinity (if it is
positive), so the desired distribution is impossible to construct. As a
technical side note
though, people sometimes try to get around this issue in Bayesian analysis by
applying what are known as improper priors. Attempting to define a uniform
distribution on the full set of real numbers also fails, for a reason very
similar to why it doesn’t work on the integers (it cannot be the case that each
real number (or each equally sized interval of real numbers) has the same
probability while the probability density function still integrates to 1).
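The zero-or-infinity dichotomy can be illustrated with a few lines of Python. The value of p below is an arbitrary positive choice for the sketch (a power of two, so the arithmetic is exact): after only finitely many integers the running total of probabilities already reaches 1, yet infinitely many integers remain, so the full sum must diverge; and if p were zero the total over all integers would be zero.

```python
import math

# Suppose every integer got the same probability p.  If p > 0, summing p
# over just ceil(1/p) integers already brings the total to 1, and infinitely
# many integers remain, so the full sum diverges.  If p == 0, the sum over
# ALL integers is 0.  Either way it cannot equal 1.
p = 2.0 ** -30                 # an arbitrary positive value for illustration
n = math.ceil(1.0 / p)         # only finitely many integers are needed...
partial_total = n * p          # ...before the running total reaches 1
```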
On to our second question: is it possible to come up with a formal
mathematical procedure for
sampling from infinite sets. The answer is yes, if we have an unlimited amount
of time to spare. For real numbers between 0 and 1, we can use the following
sampling procedure:
i. Start with the number
0.0, and set n=1
ii. Set the nth digit of our
number after the decimal point to a random number from 0 to 9.
iii. Increase n by 1 and
return to step ii.
If this procedure were
iterated forever, it would produce a single random number between 0 and 1, and
all real numbers between 0 and 1 are equally likely to be generated. Essentially
we are just constructing a number by choosing each of its decimal digits
randomly.
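A truncated version of steps i–iii can be sketched in Python (the cutoff of 40 digits is our own arbitrary choice; the true procedure never stops). The expansion is built as a string so that no digits are lost to floating-point rounding:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_unit_interval(digits=40):
    # Steps i-iii, stopped after `digits` rounds: start from "0." and
    # append one random decimal digit per round.
    expansion = "0."
    for n in range(1, digits + 1):
        expansion += str(random.randint(0, 9))  # the nth digit after the point
    return expansion

x = sample_unit_interval()
value = float(x)  # a float approximation of the sampled number
```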
But what if we wanted to
carry out this procedure for the set of all integers? Here things get stranger.
The natural choice is the following procedure:
i. Start with the number
0.0, and set n=1
ii. Set the nth digit of our
number BEFORE the decimal point to a random number from 0 to 9.
iii. Increase n by 1 and
return to step ii.
If this procedure were
carried out forever, it would seem as though it would produce an integer, and
that this integer would have an equal probability of being any integer. This is
true, in the sense that all integers that the algorithm produces have an equal
likelihood of being produced (i.e. when it DOES produce integers those integers
are each equally likely). But the algorithm does not actually do what we would
like. We begin to see the problem when we pick any integer X, and ask the
question, “what is the probability that this procedure would produce a number
that is greater than X?”. The answer is that there is a probability of 1 (i.e.
a 100% chance), NO MATTER WHAT X IS. This makes sense, given that the integers
stretch off to infinity, and that the number of integers close to 0 will always
be dwarfed by the number of them far from it. But how can this procedure (when
run forever) produce one, single integer, while at the same time having a
probability of 1 of producing a number bigger than any particular number we
choose? Well, each integer can be thought of as having an infinite sequence of
zero digits to the left of its first non-zero digit. This algorithm will have a
probability of 0 of producing a number with an infinite successive sequence of
zeros, and therefore will have a zero probability of producing an integer! In
other words, there is a probability of 1 that each number it produces will be
(in a certain sense) “infinite”, so it does not serve the purpose that we
hoped.
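The trouble shows up numerically, too. After n digit draws, the procedure has produced a uniform integer between 0 and 10^n − 1, and for any fixed X the fraction of those outcomes exceeding X rushes toward 1 as n grows (X = 1,000,000 below is an arbitrary choice for the sketch):

```python
# After n digit draws, the procedure has picked a uniform integer in
# [0, 10**n - 1].  For any fixed X, the fraction of outcomes exceeding X
# tends to 1 as n grows, so no particular integer can be the typical result.
X = 1_000_000  # an arbitrary fixed integer, chosen in advance

probs = []
for n in (7, 10, 20):
    outcomes = 10 ** n                 # equally likely results of n draws
    greater = outcomes - (X + 1)       # results strictly greater than X
    probs.append(greater / outcomes)   # fraction of results that beat X
```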
This brings us to our final
question, regarding whether a terminating algorithm (that can be run on a real
computer) can be created that will sample uniformly at random from an infinite
set of numbers. The answer to this question is no, but with the footnote that
this does not matter too much in practice (for reasons to be discussed). One way
to understand why this is impossible is to consider how many bits it would take
to transmit the number produced by such a sampling procedure. We would have to
transmit some kind of code (agreed upon in advance) that represents the number
we got from sampling, but since there are an infinite number of possible
outcomes and since we have no knowledge (in advance) of what number will occur,
we would need to have an infinite number of codes to represent all the outcomes.
Hence, if we think of the codes as numbers, and choose some number N, then some
of the codes must contain more than N digits (since N digits is only enough to
describe a finite number of different codes). But since this holds for all N no
matter how large, this means that, on average, it would take literally forever
to transmit one of these codes. But, if the number cannot be transmitted, no
computer could ever make a copy of it, which implies that no computer could ever
generate such a number. What this confusing, convoluted argument is getting at
is the fact that a uniform distribution on an infinite set of items (if it
existed) would have an infinite entropy, so the numbers sampled from such a
procedure could never be transmitted or stored (as doing so, on average, would
require an infinite number of bits) so there is no way that such an algorithm
could be used in real life. One way to see that the entropy of such a
distribution would be infinite is to note that if we define a uniform
distribution on n items, that as we let n go to infinity the entropy of the
distribution approaches infinity.
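That limiting claim is easy to check: a uniform distribution on n items assigns each item probability 1/n, so its entropy is n · (1/n) · log2(n) = log2(n) bits, which grows without bound as n does:

```python
import math

# Entropy of a uniform distribution on n items: n outcomes, each with
# probability 1/n, contribute (1/n) * log2(n) apiece -- log2(n) bits total.
def uniform_entropy(n):
    return math.log2(n)

entropies = [uniform_entropy(n) for n in (2, 1024, 10 ** 9)]
# 1 bit, 10 bits, then roughly 29.9 bits -- unbounded as n grows.
```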
Despite these problems, some
infinite sets have a nice property that the procedure of sampling (with equal
probability) from them can be nicely approximated on a computer. For example, if
we want to sample a real number between 0 and 1, we can approximate this
procedure by limiting ourselves only to numbers with at most 40 digits after the
decimal point, and then sampling uniformly at random from this restricted set.
While this procedure is not perfect, it will produce numbers that (for most
purposes) look like those we would get if we truly sampled from all real numbers
between 0 and 1. On the other hand, sampling uniformly at random from the set of
all integers cannot be approximated in any nice way (and hence, is in some sense
an inaccessible procedure). The problem here is that, as noted, if you fix any
number X you like, 100% of integers are greater than X. Since real
computers are limited in the size of the numbers they can
store, any attempt to approximate the procedure of sampling from all integers
will be limited to sampling from integers less than some number X, despite the
fact that 100% of integers truly are above X. If we try to sample uniformly at
random from the set of all integers (or the set of real numbers, for that
matter) we are doomed to complete failure.