FOUNDATIONS
OF THE
THEORY OF PROBABILITY
BY
A. N. KOLMOGOROV
Second English Edition
TRANSLATION EDITED BY
NATHAN MORRISON
WITH AN ADDED BIBLIOGRAPHY BY
A. T. BHARUCHA-REID
UNIVERSITY OF OREGON
CHELSEA PUBLISHING COMPANY
NEW YORK
COPYRIGHT 1950 BY CHELSEA PUBLISHING COMPANY
COPYRIGHT © 1956, CHELSEA PUBLISHING COMPANY
LIBRARY OF CONGRESS CATALOGUE CARD NUMBER 56-11512
PRINTED IN THE UNITED STATES OF AMERICA
EDITOR'S NOTE
In the preparation of this English translation of Professor
Kolmogorov's fundamental work, the original German monograph
Grundbegriffe der Wahrscheinlichkeitsrechnung, which appeared
in the Ergebnisse Der Mathematik in 1933, and also a Russian
translation by G. M. Bavli published in 1936 have been used.
It is a pleasure to acknowledge the invaluable assistance of
two friends and former colleagues, Mrs. Ida Rhodes and Mr.
D. V. Varley, and also of my niece, Gizella Gross.
Thanks are also due to Mr. Roy Kuebler who made available
for comparison purposes his independent English translation of
the original German monograph.
Nathan Morrison
PREFACE
The purpose of this monograph is to give an axiomatic
foundation for the theory of probability. The author set himself
the task of putting in their natural place, among the general
notions of modern mathematics, the basic concepts of probability
theory-concepts which until recently were considered to be quite
peculiar.
This task would have been a rather hopeless one before the
introduction of Lebesgue's theories of measure and integration.
However, after Lebesgue's publication of his investigations, the
analogies between measure of a set and probability of an event,
and between integral of a function and mathematical expectation
of a random variable, became apparent. These analogies allowed
of further extensions; thus, for example, various properties of
independent random variables were seen to be in complete analogy
with the corresponding properties of orthogonal functions. But
if probability theory was to be based on the above analogies, it
still was necessary to make the theories of measure and integration independent of the geometric elements which were in the
foreground with Lebesgue. This has been done by Frechet.
While a conception of probability theory based on the above
general viewpoints has been current for some time among certain
mathematicians, there was lacking a complete exposition of the
whole system, free of extraneous complications. (Cf., however,
the book by Frechet, [2] in the bibliography.)
I wish to call attention to those points of the present exposition which are outside the above-mentioned range of ideas familiar to the specialist. They are the following: Probability distributions in infinite-dimensional spaces (Chapter III, § 4); differentiation and integration of mathematical expectations with respect to a parameter (Chapter IV, § 5); and especially the theory of conditional probabilities and conditional expectations (Chapter V). It should be emphasized that these new problems arose, of necessity, from some perfectly concrete physical problems.1
1 Cf., e.g., the paper by M. Leontovich quoted in footnote 6 on p. 46; also the joint paper by the author and M. Leontovich, Zur Statistik der kontinuierlichen Systeme und des zeitlichen Verlaufes der physikalischen Vorgänge, Phys. Jour. of the USSR, Vol. 3, 1933, pp. 35-63.
The sixth chapter contains a survey, without proofs, of some
results of A. Khinchine and the author of the limitations on the applicability of the ordinary and of the strong law of large numbers. The bibliography contains some recent works which should
be of interest from the point of view of the foundations of the
subject.
I wish to express my warm thanks to Mr. Khinchine, who
has read carefully the whole manuscript and proposed several
improvements.
Kljasma near Moscow, Easter 1933.
A. Kolmogorov
CONTENTS

Page
EDITOR'S NOTE ..... iii
PREFACE ..... v

I. ELEMENTARY THEORY OF PROBABILITY
§ 1. Axioms ..... 2
§ 2. The relation to experimental data ..... 3
§ 3. Notes on terminology ..... 5
§ 4. Immediate corollaries of the axioms; conditional probabilities; Theorem of Bayes ..... 6
§ 5. Independence ..... 8
§ 6. Conditional probabilities as random variables; Markov chains ..... 12

II. INFINITE PROBABILITY FIELDS
§ 1. Axiom of Continuity ..... 14
§ 2. Borel fields of probability ..... 16
§ 3. Examples of infinite fields of probability ..... 18

III. RANDOM VARIABLES
§ 1. Probability functions ..... 21
§ 2. Definition of random variables and of distribution functions ..... 22
§ 3. Multi-dimensional distribution functions ..... 24
§ 4. Probabilities in infinite-dimensional spaces ..... 27
§ 5. Equivalent random variables; various kinds of convergence ..... 33

IV. MATHEMATICAL EXPECTATIONS
§ 1. Abstract Lebesgue integrals ..... 37
§ 2. Absolute and conditional mathematical expectations ..... 39
§ 3. The Tchebycheff inequality ..... 42
§ 4. Some criteria for convergence ..... 43
§ 5. Differentiation and integration of mathematical expectations with respect to a parameter ..... 44

V. CONDITIONAL PROBABILITIES AND MATHEMATICAL EXPECTATIONS
§ 1. Conditional probabilities ..... 47
§ 2. Explanation of a Borel paradox ..... 50
§ 3. Conditional probabilities with respect to a random variable ..... 51
§ 4. Conditional mathematical expectations ..... 52

VI. INDEPENDENCE; THE LAW OF LARGE NUMBERS
§ 1. Independence ..... 57
§ 2. Independent random variables ..... 58
§ 3. The Law of Large Numbers ..... 61
§ 4. Notes on the concept of mathematical expectation ..... 64
§ 5. The Strong Law of Large Numbers; convergence of a series ..... 66

APPENDIX. Zero-or-one law in the theory of probability ..... 69
BIBLIOGRAPHY ..... 73
NOTES TO SUPPLEMENTARY BIBLIOGRAPHY ..... 77
SUPPLEMENTARY BIBLIOGRAPHY ..... 81
Chapter I
ELEMENTARY THEORY OF PROBABILITY
We define as elementary theory of probability that part of
the theory in which we have to deal with probabilities of only a
finite number of events. The theorems which we derive here can
be applied also to the problems connected with an infinite number
of random events. However, when the latter are studied, essentially new principles are used. Therefore the only axiom of the mathematical theory of probability which deals particularly with the case of an infinite number of random events is not introduced until the beginning of Chapter II (Axiom VI).
The theory of probability, as a mathematical discipline, can
and should be developed from axioms in exactly the same way
as Geometry and Algebra. This means that after we have defined
the elements to be studied and their basic relations, and have
stated the axioms by which these relations are to be governed,
all further exposition must be based exclusively on these axioms,
independent of the usual concrete meaning of these elements and
their relations.
In accordance with the above, in § 1 the concept of a field of
probabilities is defined as a system of sets which satisfies certain
conditions. What the elements of this set represent is of no importance in the purely mathematical development of the theory of probability (cf. the introduction of basic geometric concepts in the Foundations of Geometry by Hilbert, or the definitions of groups, rings and fields in abstract algebra).
Every axiomatic (abstract) theory admits, as is well known,
of an unlimited number of concrete interpretations besides those
from which it was derived. Thus we find applications in fields of
science which have no relation to the concepts of random event
and of probability in the precise meaning of these words.
The postulational basis of the theory of probability can be
established by different methods in respect to the selection of
axioms as well as in the selection of basic concepts and relations.
However, if our aim is to achieve the utmost simplicity both in
the system of axioms and in the further development of the
theory, then the postulational concepts of a random event and
its probability seem the most suitable. There are other postulational systems of the theory of probability, particularly those in
which the concept of probability is not treated as one of the basic
concepts, but is itself expressed by means of other concepts.1
However, in that case, the aim is different, namely, to tie up as
closely as possible the mathematical theory with the empirical
development of the theory of probability.
§ 1. Axioms2
Let E be a collection of elements ξ, η, ζ, …, which we shall call elementary events, and 𝔉 a set of subsets of E; the elements of the set 𝔉 will be called random events.

I. 𝔉 is a field3 of sets.
II. 𝔉 contains the set E.
III. To each set A in 𝔉 is assigned a non-negative real number P(A). This number P(A) is called the probability of the event A.
IV. P(E) equals 1.
V. If A and B have no element in common, then

P(A + B) = P(A) + P(B).

A system of sets, 𝔉, together with a definite assignment of numbers P(A), satisfying Axioms I-V, is called a field of probability.

Our system of Axioms I-V is consistent. This is proved by the following example. Let E consist of the single element ξ and let 𝔉 consist of E and the null set 0. P(E) is then set equal to 1 and P(0) equals 0.
1 For example, R. von Mises [1] and [2] and S. Bernstein [1].
2 The reader who wishes from the outset to give a concrete meaning to the following axioms is referred to § 2.
3 Cf. HAUSDORFF, Mengenlehre, 1927, p. 78. A system of sets is called a field if the sum, product, and difference of two sets of the system also belong to the same system. Every non-empty field contains the null set 0. Using Hausdorff's notation, we designate the product of A and B by AB; the sum by A + B in the case where AB = 0; and in the general case by A + B; the difference of A and B by A − B. The set E − A, which is the complement of A, will be denoted by Ā. We shall assume that the reader is familiar with the fundamental rules of operations of sets and their sums, products, and differences. All subsets of E will be designated by Latin capitals.
Our system of axioms is not, however, complete, for in various
problems in the theory of probability different fields of proba
bility have to be examined.
The Construction of Fields of Probability. The simplest fields of probability are constructed as follows. We take an arbitrary finite set E = {ξ₁, ξ₂, …, ξ_k} and an arbitrary set {p₁, p₂, …, p_k} of non-negative numbers with the sum p₁ + p₂ + … + p_k = 1. 𝔉 is taken as the set of all subsets in E, and we put

P{ξ_{i₁}, ξ_{i₂}, …, ξ_{iλ}} = p_{i₁} + p_{i₂} + … + p_{iλ}.

In such cases, p₁, p₂, …, p_k are called the probabilities of the elementary events ξ₁, ξ₂, …, ξ_k, or simply elementary probabilities. In this way are derived all possible finite fields of probability in which 𝔉 consists of the set of all subsets of E. (The field of probability is called finite if the set E is finite.) For further examples see Chap. II, § 3.
§ 2. The Relation to Experimental Data4
We apply the theory of probability to the actual world of
experiments in the following manner:
1) There is assumed a complex of conditions, S, which allows
of any number of repetitions.
2) We study a definite set of events which could take place as
a result of the establishment of the conditions S. In individual
cases where the conditions are realized, the events occur, generally, in different ways. Let E be the set of all possible variants ξ₁, ξ₂, … of the outcome of the given events. Some of these variants might in general not occur. We include in set E all the variants which we regard a priori as possible.
3) If the variant of the events which has actually occurred
4 The reader who is interested in the purely mathematical development of
the theory only, need not read this section, since the work following it is based
only upon the axioms in § 1 and makes no use of the present discussion. Here
we limit ourselves to a simple explanation of how the axioms of the theory of
probability arose and disregard the deep philosophical dissertations on the
concept of probability in the experimental world. In establishing the premises
necessary for the applicability of the theory of probability to the world of
actual events, the author has used, in large measure, the work of R. v. Mises,
[1] pp. 21-27.
upon realization of conditions S belongs to the set A (defined in
any way), then we say that the event A has taken place.
Example: Let the complex S of conditions be the tossing of a coin two times. The set of events mentioned in Paragraph 2) consists of the fact that at each toss either a head or a tail may come up. From this it follows that only four different variants (elementary events) are possible, namely: HH, HT, TH, TT. If the "event A" connotes the occurrence of a repetition, then it will consist of the happening of either the first or the fourth of the four elementary events. In this manner, every event may be regarded as a set of elementary events.
4) Under certain conditions, which we shall not discuss here,
we may assume that to an event A which may or may not occur
under conditions S, is assigned a real number P(A) which has
the following characteristics:
(a) One can be practically certain that if the complex of conditions S is repeated a large number of times, n, then if m be the number of occurrences of event A, the ratio m/n will differ very slightly from P(A).
(b) If P(A) is very small, one can be practically certain that
when conditions S are realized only once, the event A would not
occur at all.
The Empirical Deduction of the Axioms. In general, one may assume that the system 𝔉 of the observed events A, B, C, … to which are assigned definite probabilities, form a field containing as an element the set E (Axioms I, II, and the first part of III, postulating the existence of probabilities). It is clear that

§ 4. Immediate Corollaries of the Axioms; Conditional Probabilities; Theorem of Bayes

If P(A) > 0, then the quotient

P_A(B) = P(AB) / P(A)    (5)

is defined to be the conditional probability of the event B under the condition A.

From (5) it follows immediately that

P(AB) = P(A) P_A(B).    (6)
And by induction we obtain the general formula (the Multiplication Theorem):

P(A₁A₂ … A_n) = P(A₁) P_{A₁}(A₂) P_{A₁A₂}(A₃) … P_{A₁A₂…A_{n−1}}(A_n).    (7)

The following theorems follow easily:

P_A(B) ≥ 0,    (8)
P_A(E) = 1,    (9)
P_A(B + C) = P_A(B) + P_A(C).    (10)
Comparing formulae (8)-(10) with Axioms III-V, we find that the system 𝔉 of sets together with the set function P_A(B) (provided A is a fixed set) form a field of probability and therefore, all the above general theorems concerning P(B) hold true for the conditional probability P_A(B) (provided the event A is fixed).

It is also easy to see that

P_A(A) = 1.

From (6) and the analogous formula

P(AB) = P(B) P_B(A)    (11)

we obtain the important formula

P_B(A) = P(A) P_A(B) / P(B),    (12)

which contains, in essence, the Theorem of Bayes.
THE THEOREM ON TOTAL PROBABILITY: Let A₁ + A₂ + … + A_n = E (this assumes that the events A₁, A₂, …, A_n are mutually exclusive) and let X be arbitrary. Then

P(X) = P(A₁) P_{A₁}(X) + P(A₂) P_{A₂}(X) + … + P(A_n) P_{A_n}(X).    (13)

Proof:

X = A₁X + A₂X + … + A_nX;

using (4) we have

P(X) = P(A₁X) + P(A₂X) + … + P(A_nX)

and according to (6) we have at the same time

P(A_iX) = P(A_i) P_{A_i}(X).
THE THEOREM OF BAYES: Let A₁ + A₂ + … + A_n = E and X be arbitrary; then

P_X(A_i) = P(A_i) P_{A_i}(X) / [P(A₁) P_{A₁}(X) + P(A₂) P_{A₂}(X) + … + P(A_n) P_{A_n}(X)],    (14)

i = 1, 2, 3, …, n.
A₁, A₂, …, A_n are often called "hypotheses" and formula (14) is considered as the probability P_X(A_i) of the hypothesis A_i after the occurrence of event X. [P(A_i) then denotes the a priori probability of A_i.]

Proof: From (12) we have

P_X(A_i) = P(A_i) P_{A_i}(X) / P(X).
To obtain the formula (14) it only remains to substitute for the
probability P (X) its value derived from (13) by applying the
theorem on total probability.
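Formulas (13) and (14) are easy to check numerically. A sketch with three hypotheses; the numerical values of P(A_i) and P_{A_i}(X) below are illustrative assumptions, not from the text:

```python
# Total probability (13) and Bayes (14) for a decomposition A1 + A2 + A3 = E.
P_A   = [0.5, 0.3, 0.2]        # a priori probabilities P(A_i); illustrative
P_X_A = [0.1, 0.4, 0.9]        # conditional probabilities P_{A_i}(X); illustrative

# (13): P(X) = sum over i of P(A_i) P_{A_i}(X)
P_X = sum(pa * px for pa, px in zip(P_A, P_X_A))

# (14): P_X(A_i) = P(A_i) P_{A_i}(X) / P(X)
posterior = [pa * px / P_X for pa, px in zip(P_A, P_X_A)]

# The a posteriori probabilities again form a decomposition of certainty:
assert abs(sum(posterior) - 1.0) < 1e-12
assert abs(P_X - 0.35) < 1e-12
```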
§ 5. Independence
The concept of mutual independence of two or more experiments holds, in a certain sense, a central position in the theory of probability. Indeed, as we have already seen, the theory of probability can be regarded from the mathematical point of view as a special application of the general theory of additive set functions. One naturally asks, how did it happen that the theory of
probability developed into a large individual science possessing
its own methods?
In order to answer this question, we must point out the specialization undergone by general problems in the theory of additive set functions when they are proposed in the theory of probability.
The fact that our additive set function P(A) is non-negative
and satisfies the condition P(E) = 1, does not in itself cause new
difficulties. Random variables (see Chap. III) from a mathematical point of view represent merely functions measurable with respect to P(A), while their mathematical expectations are abstract Lebesgue integrals. (This analogy was explained fully for the first time in the work of Frechet6.) The mere introduction of the above concepts, therefore, would not be sufficient to produce a basis for the development of a large new theory.
Historically, the independence of experiments and random
variables represents the very mathematical concept that has given
the theory of probability its peculiar stamp. The classical work
of Laplace, Poisson, Tchebychev, Markov, Liapounov, Mises, and
6 See Frechet [1] and [2].
Bernstein is actually dedicated to the fundamental investigation
of series of independent random variables. Though the latest
dissertations (Markov, Bernstein and others) frequently fail to
assume complete independence, they nevertheless reveal the
necessity of introducing analogous, weaker, conditions, in order
to obtain sufficiently significant results (see in this chapter § 6,
Markov chains).
We thus see, in the concept of independence, at least the germ
of the peculiar type of problem in probability theory. In this
book, however, we shall not stress that fact, for here we are
interested mainly in the logical foundation for the specialized
investigations of the theory of probability.
In consequence, one of the most important problems in the philosophy of the natural sciences is, in addition to the well-known one regarding the essence of the concept of probability itself, to make precise the premises which would make it possible to regard any given real events as independent. This question, however, is beyond the scope of this book.
Let us turn to the definition of independence. Given n experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n), that is, n decompositions

E = A₁^(i) + A₂^(i) + … + A_{r_i}^(i),    i = 1, 2, …, n,

of the basic set E. It is then possible to assign r = r₁r₂…r_n probabilities (in the general case)

p_{q₁q₂…q_n} = P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n)) ≥ 0,

which are entirely arbitrary except for the single condition7 that

Σ_{q₁, q₂, …, q_n} p_{q₁q₂…q_n} = 1.    (1)

DEFINITION I. n experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) are called mutually independent, if for any q₁, q₂, …, q_n the following equation holds true:

P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n)) = P(A_{q₁}^(1)) P(A_{q₂}^(2)) … P(A_{q_n}^(n)).    (2)
7 One may construct a field of probability with arbitrary probabilities subject only to the above-mentioned conditions, as follows: E is composed of r elements ξ_{q₁q₂…q_n}. Let the corresponding elementary probabilities be p_{q₁q₂…q_n}, and finally let A_q^(i) be the set of all ξ_{q₁q₂…q_n} for which q_i = q.
Among the r equations in (2), there are only r − r₁ − r₂ − … − r_n + n − 1 independent equations8.

THEOREM I. If n experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) are mutually independent, then any m of them (m < n), 𝔄^(i₁), 𝔄^(i₂), …, 𝔄^(i_m), are also independent9.

In the case of independence we then have the equations:

P(A_{q₁}^(i₁) A_{q₂}^(i₂) … A_{q_m}^(i_m)) = P(A_{q₁}^(i₁)) P(A_{q₂}^(i₂)) … P(A_{q_m}^(i_m))    (3)

(all i_k must be different).
DEFINITION II. n events A₁, A₂, …, A_n are mutually independent, if the decompositions (trials)

E = A_k + Ā_k    (k = 1, 2, …, n)

are independent.

In this case r₁ = r₂ = … = r_n = 2, r = 2ⁿ; therefore, of the 2ⁿ equations in (2) only 2ⁿ − n − 1 are independent. The necessary and sufficient conditions for the independence of the events A₁, A₂, …, A_n are the following 2ⁿ − n − 1 equations10:

P(A_{i₁} A_{i₂} … A_{i_m}) = P(A_{i₁}) P(A_{i₂}) … P(A_{i_m}),    (4)

m = 2, 3, …, n;  1 ≤ i₁ < i₂ < … < i_m ≤ n.

All of these equations are mutually independent.
In the case n = 2 we obtain from (4) only one condition (2² − 2 − 1 = 1) for the independence of two events A₁ and A₂:

P(A₁A₂) = P(A₁) P(A₂).    (5)

8 Actually, in the case of independence, one may choose arbitrarily only r₁ + r₂ + … + r_n probabilities p_q^(i) = P(A_q^(i)) so as to comply with the n conditions

Σ_q p_q^(i) = 1.

Therefore, in the general case, we have r − 1 degrees of freedom, but in the case of independence only r₁ + r₂ + … + r_n − n.

9 To prove this it is sufficient to show that from the mutual independence of n decompositions follows the mutual independence of the first n − 1. Let us assume that the equations (2) hold. Then

P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_{n−1}}^(n−1)) = Σ_{q_n} P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n))
    = P(A_{q₁}^(1)) P(A_{q₂}^(2)) … P(A_{q_{n−1}}^(n−1)) Σ_{q_n} P(A_{q_n}^(n))
    = P(A_{q₁}^(1)) P(A_{q₂}^(2)) … P(A_{q_{n−1}}^(n−1)).    Q.E.D.

10 See S. N. Bernstein [1] pp. 47-57. However, the reader can easily prove this himself (using mathematical induction).
The system of equations (2) reduces itself, in this case, to three equations besides (5):

P(A₁Ā₂) = P(A₁) P(Ā₂),
P(Ā₁A₂) = P(Ā₁) P(A₂),
P(Ā₁Ā₂) = P(Ā₁) P(Ā₂),

which obviously follow from (5).11

It need hardly be remarked that from the independence of the events A₁, A₂, …, A_n in pairs, i.e. from the relations

P(A_iA_j) = P(A_i) P(A_j)    (i ≠ j)

it does not at all follow that when n > 2 these events are independent12. (For that we need the existence of all equations (4).)
In introducing the concept of independence, no use was made of conditional probability. Our aim has been to explain as clearly as possible, in a purely mathematical manner, the meaning of this concept. Its applications, however, generally depend upon the properties of certain conditional probabilities.

If we assume that all probabilities P(A_q^(i)) are positive, then from the equations (3) it follows13 that

P_{A_{q₁}^(i₁) A_{q₂}^(i₂) … A_{q_{k−1}}^(i_{k−1})}(A_{q_k}^(i_k)) = P(A_{q_k}^(i_k)).    (6)

From the fact that formulas (6) hold, and from the Multiplication Theorem (Formula (7), § 4), follow the formulas (2). We obtain, therefore,
THEOREM II: A necessary and sufficient condition for independence of experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) in the case of positive probabilities P(A_q^(i)) is that the conditional probability of the result A_q^(i) of experiment 𝔄^(i), under the hypothesis that several other tests 𝔄^(i₁), 𝔄^(i₂), …, 𝔄^(i_k) have had definite results A_{q₁}^(i₁), A_{q₂}^(i₂), …, A_{q_k}^(i_k), is equal to the absolute probability P(A_q^(i)).

11 P(A₁Ā₂) = P(A₁) − P(A₁A₂) = P(A₁) − P(A₁) P(A₂) = P(A₁) {1 − P(A₂)} = P(A₁) P(Ā₂), etc.

12 This can be shown by the following simple example (S. N. Bernstein): Let set E be composed of four elements ξ₁, ξ₂, ξ₃, ξ₄; the corresponding elementary probabilities p₁, p₂, p₃, p₄ are each assumed to be ¼, and

A = {ξ₁, ξ₂},  B = {ξ₁, ξ₃},  C = {ξ₁, ξ₄}.

It is easy to compute that

P(A) = P(B) = P(C) = ½,
P(AB) = P(BC) = P(AC) = ¼ = (½)²,
P(ABC) = ¼ ≠ (½)³.

13 To prove it, one must keep in mind the definition of conditional probability (Formula (6), § 4) and substitute for the probabilities of products the products of probabilities according to formula (3).
On the basis of formulas (4) we can prove in an analogous manner the following theorem:

THEOREM III. If all probabilities P(A_k) are positive, then a necessary and sufficient condition for mutual independence of the events A₁, A₂, …, A_n is the satisfaction of the equations

P_{A_{i₁} A_{i₂} … A_{i_k}}(A_i) = P(A_i)    (7)

for any pairwise different i₁, i₂, …, i_k, i.

In the case n = 2 the conditions (7) reduce to two equations:

P_{A₁}(A₂) = P(A₂),
P_{A₂}(A₁) = P(A₁).    (8)

It is easy to see that the first equation in (8) alone is a necessary and sufficient condition for the independence of A₁ and A₂ provided P(A₁) > 0.
§ 6. Conditional Probabilities as Random Variables,
Markov Chains

Let 𝔄 be a decomposition of the fundamental set E:

E = A₁ + A₂ + … + A_r,

and x a real function of the elementary event ξ, which for every set A_q is equal to a corresponding constant a_q. x is then called a random variable, and the sum

E(x) = Σ_q a_q P(A_q)

is called the mathematical expectation of the variable x. The theory of random variables will be developed in Chaps. III and IV. We shall not limit ourselves there merely to those random variables which can assume only a finite number of different values.

A random variable which for every set A_q assumes the value P_{A_q}(B), we shall call the conditional probability of the event B after the given experiment 𝔄 and shall designate it by P_𝔄(B). Two experiments 𝔄^(1) and 𝔄^(2) are independent if, and only if,

P_{𝔄^(1)}(A_q^(2)) = P(A_q^(2)),    q = 1, 2, …, r₂.
Given any decompositions (experiments) 𝔄^(1), 𝔄^(2), …, 𝔄^(n), we shall represent by

𝔄^(1) 𝔄^(2) … 𝔄^(n)

the decomposition of set E into the products

A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n).

Experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) are mutually independent when and only when

P_{𝔄^(1) 𝔄^(2) … 𝔄^(k−1)}(A_q^(k)) = P(A_q^(k)),

k and q being arbitrary14.

DEFINITION: The sequence 𝔄^(1), 𝔄^(2), …, 𝔄^(n), … forms a Markov chain if for arbitrary n and q

P_{𝔄^(1) 𝔄^(2) … 𝔄^(n−1)}(A_q^(n)) = P_{𝔄^(n−1)}(A_q^(n)).

Thus, Markov chains form a natural generalization of sequences of mutually independent experiments. If we set

p_{q₁ q₂}(m, n) = P_{A_{q₁}^(m)}(A_{q₂}^(n)),
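The idea of P_𝔄(B) as a random variable, constant on each set of the decomposition, can be sketched for a finite E. The six-point set, the decomposition, and the event B below are illustrative assumptions:

```python
from fractions import Fraction

# E with six equally probable elementary events.
E = list(range(6))
p = {e: Fraction(1, 6) for e in E}

def P(S):
    return sum(p[e] for e in S)

# A decomposition of E: E = A1 + A2 + A3 (illustrative).
decomposition = [{0, 1}, {2, 3}, {4, 5}]
B = {1, 2, 3}

# The random variable P_A(B): on each set A_q of the decomposition it takes
# the constant value P_{A_q}(B) = P(A_q B) / P(A_q).
def cond_prob_of_B(xi):
    Aq = next(A for A in decomposition if xi in A)
    return P(Aq & B) / P(Aq)

values = [cond_prob_of_B(xi) for xi in E]
# P_A(B) equals 1/2 on A1, 1 on A2, and 0 on A3:
assert values == [Fraction(1, 2), Fraction(1, 2), 1, 1, 0, 0]

# Its mathematical expectation recovers P(B), as in the theorem on
# total probability:
expectation = sum(P(A) * (P(A & B) / P(A)) for A in decomposition)
assert expectation == P(B) == Fraction(1, 2)
```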
Cf., for example, LEBESGUE, Leçons sur l'intégration, 1928, pp. 152-156.
See the previous note.
For a definition of Borel sets in R see HAUSDORFF, Mengenlehre, 1927, pp. 177-181.
F(a₁, a₂, …, a_n) is called the distribution function of the variables x₁, x₂, …, x_n.

The investigation of fields of probability of the above type is sufficient for all classical problems in the theory of probability6.
In particular, a probability function in Rⁿ can be defined thus: We take any non-negative point function f(x₁, x₂, …, x_n), defined in Rⁿ, such that

∫_{−∞}^{+∞} ∫_{−∞}^{+∞} … ∫_{−∞}^{+∞} f(x₁, x₂, …, x_n) dx₁ dx₂ … dx_n = 1,

and set

P(A) = ∫∫…∫_A f(x₁, x₂, …, x_n) dx₁ dx₂ … dx_n.    (5)

f(x₁, x₂, …, x_n) is, in this case, the probability density at the point (x₁, x₂, …, x_n) (cf. Chap. III, § 2).
Another type of probability function in Rⁿ is obtained in the following manner: Let {ξ_i} be a sequence of points of Rⁿ, and let {p_i} be a sequence of non-negative real numbers such that Σ p_i = 1; we then set, as we did in Example 1,

P(A) = Σ′ p_i,

where the summation Σ′ extends over all indices i for which ξ_i belongs to A. The two types of probability functions in Rⁿ mentioned here do not exhaust all possibilities, but are usually considered sufficient for applications of the theory of probability.
Nevertheless, we can imagine problems of interest for applications outside of this classical region in which elementary events are defined by means of an infinite number of coordinates. The corresponding fields of probability we shall study more closely after introducing several concepts needed for this purpose. (Cf. Chap. III, § 3.)
6 Cf., for example, R. v. MISES [1], pp. 13-19. Here the existence of probabilities for "all practically possible" sets of an n-dimensional space is required.
Chapter III
RANDOM VARIABLES
§ 1. Probability Functions
Given a mapping of the set E into a set E′ consisting of any type of elements, i.e., a single-valued function u(ξ) defined on E, whose values belong to E′. To each subset A′ of E′ we shall put into correspondence, as its pre-image in E, the set u⁻¹(A′) of all elements of E which map onto elements of A′. Let 𝔉^(u) be the system of all subsets A′ of E′ whose pre-images belong to the field 𝔉. 𝔉^(u) will then also be a field. If 𝔉 happens to be a Borel field, the same will be true of 𝔉^(u). We now set

P^(u)(A′) = P{u⁻¹(A′)}.    (1)

Since this set-function P^(u), defined on 𝔉^(u), satisfies with respect to the field 𝔉^(u) all of our Axioms I-VI, it represents a probability function on 𝔉^(u). Before turning to the proof of all the facts just stated, we shall formulate the following definition.
DEFINITION. Given a single-valued function u(ξ) of a random event ξ. The function P^(u)(A′), defined by (1), is then called the probability function of u.

Remark 1: In studying fields of probability (𝔉, P), we call the function P(A) simply the probability function, but P^(u)(A′) is called the probability function of u. In the case u(ξ) = ξ, P^(u)(A′) coincides with P(A).
Remark 2: The event u⁻¹(A′) consists of the fact that u(ξ) belongs to A′. Therefore, P^(u)(A′) is the probability of u(ξ) ⊂ A′.

defined on 𝔉′. This probability function is called the n-dimensional probability function of the random variables x₁, x₂, …, x_n.
As follows directly from the definition of a random variable, the field 𝔉′ contains, for each choice of i and a_i (i = 1, 2, …, n), the set of all points in Rⁿ for which x_i < a_i. Therefore 𝔉′ also contains the intersection of the above sets, i.e. the set L_{a₁ a₂ … a_n} of all points of Rⁿ for which all the inequalities x_i < a_i hold (i = 1, 2, …, n)1.
If we now denote as the n-dimensional half-open interval

[a₁, a₂, …, a_n; b₁, b₂, …, b_n)

the set of all points in Rⁿ for which a_i ≤ x_i < b_i, then we see at once that each such interval belongs to the field 𝔉′ since

[a₁, a₂, …, a_n; b₁, b₂, …, b_n) = L_{b₁ b₂ … b_n} − L_{a₁ b₂ … b_n} − L_{b₁ a₂ b₃ … b_n} − … − L_{b₁ b₂ … b_{n−1} a_n}.
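The set identity above can be checked directly on a finite grid for n = 2; the grid size and the bounds a₁, a₂, b₁, b₂ below are illustrative assumptions:

```python
# Check, for n = 2 on a finite grid, the identity
#   [a1, a2; b1, b2) = L_{b1 b2} - L_{a1 b2} - L_{b1 a2},
# where L_{c1 c2} = { (x1, x2) : x1 < c1 and x2 < c2 }.
grid = [(x1, x2) for x1 in range(6) for x2 in range(6)]

def L(c1, c2):
    return {(x1, x2) for (x1, x2) in grid if x1 < c1 and x2 < c2}

a1, a2, b1, b2 = 1, 2, 4, 5            # illustrative bounds

# The half-open interval: a_i <= x_i < b_i for both coordinates.
interval = {(x1, x2) for (x1, x2) in grid
            if a1 <= x1 < b1 and a2 <= x2 < b2}

assert interval == L(b1, b2) - L(a1, b2) - L(b1, a2)
```

Subtracting L_{a₁ b₂} removes the points with x₁ < a₁, and subtracting L_{b₁ a₂} removes those with x₂ < a₂, leaving exactly the half-open interval.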
The Borel extension of the system of all n-dimensional half-open intervals consists of all Borel sets in Rⁿ. From this it follows that in the case of a Borel field of probability, the field 𝔉′ contains all the Borel sets in the space Rⁿ.
THEOREM: In the case of a Borel field of probability each Borel function x = f(x₁, x₂, …, x_n) of a finite number of random variables x₁, x₂, …, x_n is also a random variable.

All we need to prove this is to point out that the set of all points (x₁, x₂, …, x_n) in Rⁿ for which x = f(x₁, x₂, …, x_n) < a is a Borel set. In particular, all finite sums and products of random variables are also random variables.
DEFINITION: The function

F^(x₁, x₂, …, xₙ)(a₁, a₂, …, aₙ) = P^(x₁, x₂, …, xₙ)(L_{a₁ a₂ … aₙ})

is called the n-dimensional distribution function of the random variables x₁, x₂, …, xₙ.

As in the one-dimensional case, we prove that the n-dimensional distribution function F^(x₁, x₂, …, xₙ)(a₁, a₂, …, aₙ) is non-decreasing and continuous on the left in each variable. In analogy to equations (3) and (4) in § 2, we here have
¹ The aᵢ may also assume the infinite values ±∞.
26 III. Random Variables
lim_{aᵢ → −∞} F(a₁, a₂, …, aₙ) = F(a₁, …, aᵢ₋₁, −∞, aᵢ₊₁, …, aₙ) = 0, (7)

lim_{a₁ → +∞, …, aₙ → +∞} F(a₁, a₂, …, aₙ) = F(+∞, +∞, …, +∞) = 1. (8)

The distribution function F^(x₁, x₂, …, xₙ) gives directly the values of P^(x₁, x₂, …, xₙ) only for the special sets L_{a₁ a₂ … aₙ}. If our field, however, is a Borel field, then² P^(x₁, x₂, …, xₙ) is uniquely determined for all Borel sets in Rⁿ by knowledge of the distribution function F^(x₁, x₂, …, xₙ).
If the derivative

f(a₁, a₂, …, aₙ) = ∂ⁿF^(x₁, x₂, …, xₙ)(a₁, a₂, …, aₙ) / ∂a₁ ∂a₂ ⋯ ∂aₙ

exists, we call this derivative the n-dimensional probability density of the random variables x₁, x₂, …, xₙ at the point (a₁, a₂, …, aₙ). If f is also continuous at every point (a₁, a₂, …, aₙ), then F^(x₁, x₂, …, xₙ) is called continuous. For every Borel set A ⊂ Rⁿ, we have the equality

P^(x₁, x₂, …, xₙ)(A) = ∫⋯∫_A f(a₁, a₂, …, aₙ) da₁ da₂ ⋯ daₙ. (9)
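Formula (9) can be illustrated numerically. The sketch below assumes, purely as an example, the continuous density f(a₁, a₂) = e^(−a₁−a₂) on the positive quadrant and integrates it over the Borel set A = [0, 1) × [0, 1):

```python
import math

# Midpoint-rule evaluation of formula (9) for the example density
# f(a1, a2) = exp(-a1 - a2) on the positive quadrant, over the Borel set
# A = [0, 1) x [0, 1).  Both the density and the set are arbitrary choices.
def f(x, y):
    return math.exp(-x - y) if x >= 0 and y >= 0 else 0.0

n = 400
h = 1.0 / n
p = sum(f((i + 0.5) * h, (j + 0.5) * h)
        for i in range(n) for j in range(n)) * h * h

# For this product density the double integral factors into two marginals.
exact = (1.0 - math.exp(-1.0)) ** 2
assert abs(p - exact) < 1e-4
```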
In closing this section we shall make one more remark about
the relationships between the various probability functions and
distribution functions.
Given the substitution

S = (1, 2, …, n; i₁, i₂, …, iₙ),

let S also denote the transformation

x′ₖ = x_{iₖ} (k = 1, 2, …, n)

of the space Rⁿ into itself. It is then obvious that

P^(x_{i₁}, x_{i₂}, …, x_{iₙ})(A) = P^(x₁, x₂, …, xₙ){S⁻¹(A)}. (10)

Now let x′ = pₖ(x) be the "projection" of the space Rⁿ on the space Rᵏ (k < n), so that the point (x₁, x₂, …, xₙ) is mapped onto the point (x₁, x₂, …, xₖ). Then, as a result of Formula (2) in § 1,

P^(x₁, x₂, …, xₖ)(A) = P^(x₁, x₂, …, xₙ){pₖ⁻¹(A)}. (11)
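For discrete distributions the content of these permutation and projection relations reduces to relabelling and summing a probability table. A minimal sketch, with an arbitrary joint table for a pair (x₁, x₂):

```python
# A joint probability table for a pair (x1, x2); the values are arbitrary.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Coordinate permutation: the probability function of (x2, x1) is obtained
# by transporting each mass along the substitution.
swapped = {(j, i): p for (i, j), p in joint.items()}
assert swapped[(1, 0)] == joint[(0, 1)]

# Projection onto the first coordinate: summing over the fibre of the
# projection gives the lower-dimensional probability function.
marginal = {i: sum(p for (a, b), p in joint.items() if a == i) for i in (0, 1)}
assert abs(marginal[0] - 0.4) < 1e-12
assert abs(sum(marginal.values()) - 1.0) < 1e-12
```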
² Cf. § 8, IV in the Second Chapter.
§ 4. Probabilities in Infinite-dimensional Spaces

Let M be an arbitrary set of indices μ, and let E = R^M be the set of all real functions ξ = {x_μ} defined on M. Let p_{μ₁ μ₂ … μₙ} denote the projection ξ → (x_{μ₁}, x_{μ₂}, …, x_{μₙ}) of the space E onto the n-dimensional space Rⁿ. A subset A of E we shall call a cylinder set if it can be represented in the form

A = p⁻¹_{μ₁ μ₂ … μₙ}(A′),

where A′ is a subset of Rⁿ. The class of all cylinder sets coincides, therefore, with the class of all sets which can be defined by relations of the form
¹ Cf. HAUSDORFF, Mengenlehre, 1927, p. 23.
f(x_{μ₁}, x_{μ₂}, …, x_{μₙ}) = 0. (1)
In order to determine an arbitrary cylinder set p⁻¹_{μ₁ μ₂ … μₙ}(A′) by such a relation, we need only take as f a function which equals 0 on A′, but outside of A′ equals unity.
A cylinder set is a Borel cylinder set if the corresponding set A′ is a Borel set. All Borel cylinder sets of the space R^M form a field, which we shall henceforth denote by 𝔉^M ².

The Borel extension of the field 𝔉^M we shall denote, as always, by B𝔉^M. Sets in B𝔉^M we shall call Borel sets of the space R^M.
Later on we shall give a method of constructing and operating with probability functions on 𝔉^M, and consequently, by means of the Extension Theorem, on B𝔉^M also. We obtain in this manner fields of probability sufficient for all purposes in the case that the set M is denumerable. We can therefore handle all questions touching upon a denumerable sequence of random variables. But if M is not denumerable, many simple and interesting subsets of R^M remain outside of B𝔉^M. For example, the set of all elements ξ for which x_μ remains smaller than a fixed constant for all indices μ does not belong to the system B𝔉^M if the set M is non-denumerable.

It is therefore desirable to try whenever possible to put each problem in such a form that the space of all elementary events ξ has only a denumerable set of coordinates.
Let a probability function P(A) be defined on 𝔉^M. We may then regard every coordinate x_μ of the elementary event ξ as a random variable. In consequence, every finite group (x_{μ₁}, x_{μ₂}, …, x_{μₙ}) of these coordinates has an n-dimensional probability function P_{μ₁ μ₂ … μₙ}(A) and a corresponding distribution function F_{μ₁ μ₂ … μₙ}(a₁, a₂, …, aₙ).

² From the above it follows that Borel cylinder sets are Borel sets definable by relations of type (1). Now let A and B be two Borel cylinder sets defined by the relations

f(x_{μ₁}, x_{μ₂}, …, x_{μₖ}) = 0,  g(x_{λ₁}, x_{λ₂}, …, x_{λₘ}) = 0.

Then we can define the sets A + B, AB, and A − B respectively by the relations

f·g = 0,  f² + g² = 0,  f² + φ(g) = 0,

where φ(x) = 0 for x ≠ 0 and φ(0) = 1. If f and g are Borel functions, so also are f·g, f² + g² and f² + φ(g); therefore, A + B, AB and A − B are Borel cylinder sets. Thus we have shown that the system of sets 𝔉^M is a field.

It is obvious that for every Borel cylinder set

A = p⁻¹_{μ₁ μ₂ … μₙ}(A′),

the following equation holds:

P(A) = P_{μ₁ μ₂ … μₙ}(A′),
where A′ is a Borel set of Rⁿ. In this manner, the probability function P is uniquely determined on the field 𝔉^M of all Borel cylinder sets by means of the values of all finite probability functions P_{μ₁ μ₂ … μₙ} for all Borel sets of the corresponding spaces Rⁿ. However, for Borel sets, the values of the probability functions P_{μ₁ μ₂ … μₙ} are uniquely determined by means of the corresponding distribution functions. We have thus proved the following theorem:
The set of all finite-dimensional distribution functions F_{μ₁ μ₂ … μₙ} uniquely determines the probability function P(A) for all sets in 𝔉^M. If P(A) is defined on 𝔉^M, then (according to the extension theorem) it is uniquely determined on B𝔉^M by the values of the distribution functions F_{μ₁ μ₂ … μₙ}.
We may now ask the following. Under what conditions does a system of distribution functions F_{μ₁ μ₂ … μₙ}, given a priori, define a field of probability on 𝔉^M (and, consequently, on B𝔉^M)?

We must first note that every distribution function F_{μ₁ μ₂ … μₙ} must satisfy the conditions given in § 3, III of the second chapter; indeed, this is contained in the very concept of distribution function. Besides, as a result of formulas (13) and (14) in § 2, we have also the following relations:

F_{μ_{i₁} μ_{i₂} … μ_{iₙ}}(a_{i₁}, a_{i₂}, …, a_{iₙ}) = F_{μ₁ μ₂ … μₙ}(a₁, a₂, …, aₙ), (2)

F_{μ₁ μ₂ … μₖ}(a₁, a₂, …, aₖ) = F_{μ₁ μ₂ … μₙ}(a₁, a₂, …, aₖ, +∞, …, +∞), (3)

where k < n and (i₁, i₂, …, iₙ) is an arbitrary permutation of (1, 2, …, n).
These necessary conditions prove also to be sufficient, as will appear from the following theorem.

FUNDAMENTAL THEOREM: Every system of distribution functions F_{μ₁ μ₂ … μₙ}, satisfying the conditions (2) and (3), defines a probability function P(A) on 𝔉^M which satisfies Axioms I–VI. This probability function P(A) can be extended (by the extension theorem) to B𝔉^M also.
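The simplest system satisfying (2) and (3) is that of independent coordinates with a common one-dimensional distribution. The sketch below checks both conditions for a Bernoulli example; the particular distribution is an arbitrary choice:

```python
# Distribution function of a single coordinate: Bernoulli with P{x = 1} = 0.3,
# so F0(a) = P{x < a}.  The choice of F0 is arbitrary.
def F0(a):
    if a <= 0:
        return 0.0
    if a <= 1:
        return 0.7
    return 1.0

# Independent coordinates: F_{mu1...mun}(a1, ..., an) = F0(a1) * ... * F0(an).
def F(*args):
    prod = 1.0
    for a in args:
        prod *= F0(a)
    return prod

INF = float("inf")
# Condition (2): permuting indices and arguments together changes nothing.
assert F(0.5, 2.0) == F(2.0, 0.5)
# Condition (3): setting trailing arguments to +infinity gives the marginal.
assert F(0.5, INF) == F0(0.5)
assert F(0.5, 2.0, INF, INF) == F(0.5, 2.0)
```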
Proof. Given the distribution functions F_{μ₁ μ₂ … μₙ}, satisfying the general conditions of Chap. II, § 3, III and also conditions (2) and (3). Every distribution function F_{μ₁ μ₂ … μₙ} defines uniquely a corresponding probability function P_{μ₁ μ₂ … μₙ} for all Borel sets of Rⁿ (cf. § 3). We shall deal in the future only with Borel sets of Rⁿ and with Borel cylinder sets in E.

For every cylinder set

A = p⁻¹_{μ₁ μ₂ … μₙ}(A′),

we set

P(A) = P_{μ₁ μ₂ … μₙ}(A′). (4)

Since the same cylinder set A can be defined by various sets A′, we must first show that formula (4) yields always the same value for P(A).
Let (x_{μ₁}, x_{μ₂}, …, x_{μₙ}) be a finite system of random variables x_μ. Proceeding from the probability function P_{μ₁ μ₂ … μₙ} of these random variables, we can, in accordance with the rules in § 3, define the probability function P_{μ_{i₁} μ_{i₂} … μ_{iₖ}} of each subsystem (x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}). From equations (2) and (3) it follows that this probability function defined according to § 3 is the same as the function P_{μ_{i₁} μ_{i₂} … μ_{iₖ}} given a priori. We shall now suppose that the cylinder set A is defined by means of

A = p⁻¹_{μ_{i₁} μ_{i₂} … μ_{iₖ}}(A′)

and simultaneously by means of

A = p⁻¹_{μ_{j₁} μ_{j₂} … μ_{jₘ}}(A″),

where all random variables x_{μᵢ} and x_{μⱼ} belong to the system (x_{μ₁}, x_{μ₂}, …, x_{μₙ}), which is obviously not an essential restriction. The conditions

(x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}) ⊂ A′

and

(x_{μ_{j₁}}, x_{μ_{j₂}}, …, x_{μ_{jₘ}}) ⊂ A″

are equivalent. Therefore

P_{μ_{i₁} μ_{i₂} … μ_{iₖ}}(A′) = P_{μ₁ μ₂ … μₙ}{(x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}) ⊂ A′} = P_{μ₁ μ₂ … μₙ}{(x_{μ_{j₁}}, x_{μ_{j₂}}, …, x_{μ_{jₘ}}) ⊂ A″} = P_{μ_{j₁} μ_{j₂} … μ_{jₘ}}(A″),

which proves our statement concerning the uniqueness of the definition of P(A).
Let us now prove that the field of probability (𝔉^M, P) satisfies all the Axioms I–VI. Axiom I requires merely that 𝔉^M be a field. This fact has already been proven above. Moreover, for an arbitrary μ:

E = p⁻¹_μ(R¹),

P(E) = P_μ(R¹) = 1,

which proves that Axioms II and IV apply in this case. Finally, from the definition of P(A) it follows at once that P(A) is non-negative (Axiom III).

It is only slightly more complicated to prove that Axiom V is also satisfied. In order to do so, we investigate two cylinder sets

A = p⁻¹_{μ_{i₁} μ_{i₂} … μ_{iₖ}}(A′)

and

B = p⁻¹_{μ_{j₁} μ_{j₂} … μ_{jₘ}}(B′).

We shall assume that all variables x_{μᵢ} and x_{μⱼ} belong to one inclusive finite system (x_{μ₁}, x_{μ₂}, …, x_{μₙ}). If the sets A and B do not intersect, the relations

(x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}) ⊂ A′

and

(x_{μ_{j₁}}, x_{μ_{j₂}}, …, x_{μ_{jₘ}}) ⊂ B′

are incompatible. Therefore

P(A + B) = P_{μ₁ μ₂ … μₙ}{(x_{μ_{i₁}}, …, x_{μ_{iₖ}}) ⊂ A′ or (x_{μ_{j₁}}, …, x_{μ_{jₘ}}) ⊂ B′}
= P_{μ₁ μ₂ … μₙ}{(x_{μ_{i₁}}, …, x_{μ_{iₖ}}) ⊂ A′} + P_{μ₁ μ₂ … μₙ}{(x_{μ_{j₁}}, …, x_{μ_{jₘ}}) ⊂ B′} = P(A) + P(B),

which concludes our proof.
Only Axiom VI remains. Let

A₁ ⊇ A₂ ⊇ ⋯ ⊇ Aₙ ⊇ ⋯

be a decreasing sequence of cylinder sets satisfying the condition

lim P(Aₙ) = L > 0.

We shall prove that the product of all sets Aₙ is not empty. We may assume, without essentially restricting the problem, that in the definition of the first n cylinder sets Aₖ, only the first n coordinates x_{μₖ} in the sequence

x_{μ₁}, x_{μ₂}, …, x_{μₙ}, …

occur, i.e.

Aₙ = p⁻¹_{μ₁ μ₂ … μₙ}(Bₙ).

For brevity we set

Pₙ(Bₙ) = P_{μ₁ μ₂ … μₙ}(Bₙ);

then, obviously,

Pₙ(Bₙ) = P(Aₙ) ≥ L > 0.
In each set Bₙ it is possible to find a closed bounded set Uₙ such that

Pₙ(Bₙ − Uₙ) ≤ ε/2ⁿ.

From this inequality we have for the set

Vₙ = p⁻¹_{μ₁ μ₂ … μₙ}(Uₙ)

the inequality

P(Aₙ − Vₙ) ≤ ε/2ⁿ.

Let, moreover,

Wₙ = V₁V₂ ⋯ Vₙ. (5)

From (5) it follows that

P(Aₙ − Wₙ) ≤ ε.

Since Wₙ ⊆ Vₙ ⊆ Aₙ, it follows that

P(Wₙ) ≥ P(Aₙ) − ε ≥ L − ε.
If ε is sufficiently small, P(Wₙ) > 0 and Wₙ is not empty. We shall now choose in each set Wₙ a point ξ⁽ⁿ⁾.

§ 6. Equivalent Random Variables; Various Kinds of Convergence

Let

x₁, x₂, …, xₙ, … (1)

be a sequence of random variables. The sequence (1) converges in probability to the random variable x if, for every ε > 0, the probability

P{|xₙ − x| > ε}

tends toward zero as n → ∞⁵.
I. If the sequence (1) converges in probability to x and also to x′, then x and x′ are equivalent. In fact,

P{|x − x′| > 1/m} ≤ P{|xₙ − x| > 1/2m} + P{|xₙ − x′| > 1/2m};

since the last probabilities are as small as we please for a sufficiently large n, it follows that

P{|x − x′| > 1/m} = 0,

and we obtain at once that

P{x ≠ x′} ≤ Σₘ P{|x − x′| > 1/m} = 0.
II. If the sequence ( 1) almost surely converges to x, then it
⁵ This concept is due to Bernoulli; its completely general treatment was introduced by E. E. Slutsky (see [1]).
also converges to x in probability. Let A be the convergence set of the sequence (1); then

1 = P(A) ≤ lim_{n→∞} P{|x_{n+p} − x| < ε, p = 0, 1, 2, …} ≤ lim_{n→∞} P{|xₙ − x| < ε},

from which the convergence in probability follows.
III. For the convergence in probability of the sequence (1) the following condition is both necessary and sufficient: for any ε > 0 there exists an n such that, for every p > 0, the following inequality holds:

P{|x_{n+p} − xₙ| > ε} < ε.
Let F₁(a), F₂(a), …, Fₙ(a), …, F(a) be the distribution functions of the random variables x₁, x₂, …, xₙ, …, x. If the sequence xₙ converges in probability to x, the distribution function F(a) is uniquely determined by knowledge of the functions Fₙ(a). We have, in fact:
THEOREM: If the sequence x₁, x₂, …, xₙ, … converges in probability to x, the corresponding sequence of distribution functions Fₙ(a) converges at each point of continuity of F(a) to the distribution function F(a) of x.
That F(a) is really determined by the Fₙ(a) follows from the fact that F(a), being a monotone function, continuous on the left, is uniquely determined by its values at the points of continuity⁶. To prove the theorem we assume that F is continuous at the point a. Let a′ < a; then in case x < a′, xₙ ≥ a, it is necessary that |xₙ − x| > a − a′. Therefore

lim P(x < a′, xₙ ≥ a) = 0,

F(a′) = P(x < a′) ≤ P(xₙ < a) + P(x < a′, xₙ ≥ a) = Fₙ(a) + P(x < a′, xₙ ≥ a),

and hence

F(a′) ≤ lim inf Fₙ(a). (3)

In an analogous manner, for a″ > a there follows the relation

F(a″) ≥ lim sup Fₙ(a). (4)
⁶ In fact, it has at most only a countable set of discontinuities (see LEBESGUE, Leçons sur l'intégration, 1928, p. 50). Therefore, the points of continuity are everywhere dense, and the value of the function F(a) at a point of discontinuity is determined as the limit of its values at the points of continuity on its left.
Since F(a′) and F(a″) converge to F(a) for a′ → a and a″ → a, it follows from (3) and (4) that

lim Fₙ(a) = F(a),
which proves our theorem.
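The theorem can be watched numerically: taking xₙ = x + 1/n, which converges in probability to x since the difference is deterministic and tends to 0, the empirical distribution functions of xₙ approach that of x at a continuity point. A sketch with x uniform on [0, 1) (an arbitrary example):

```python
import random

# x uniform on [0, 1); x_n = x + 1/n converges to x in probability.
random.seed(1)
xs = [random.random() for _ in range(100000)]

def ecdf(vals, a):
    """Empirical distribution function, an estimate of P{v < a}."""
    return sum(v < a for v in vals) / len(vals)

a = 0.5   # a continuity point of F(a) = a on [0, 1]
fns = {n: ecdf([x + 1.0 / n for x in xs], a) for n in (10, 100, 1000)}

assert abs(fns[1000] - 0.5) < 0.01               # F_n(a) -> F(a) = 0.5
assert abs(fns[10] - 0.5) > abs(fns[1000] - 0.5)  # the error shrinks with n
```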
Chapter IV
MATHEMATICAL EXPECTATIONS¹
§ 1. Abstract Lebesgue Integrals
Let x be a random variable and A a set of 𝔉. Let us form, for a positive λ, the sum

S_λ = Σ_{k = −∞}^{+∞} kλ P{kλ ≤ x < (k + 1)λ, ξ ⊂ A}. (1)

If this series converges absolutely for every λ, then as λ → 0, S_λ tends toward a definite limit, which is by definition the integral

∫_A x P(dE). (2)

In this abstract form the concept of an integral was introduced by Fréchet²; it is indispensable for the theory of probability.
(The reader will see in the following paragraphs that the usual
definition for the conditional mathematical expectation of the
variable x under hypothesis A coincides with the definition of
the integral (2) except for a constant factor.)
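For a finite field of probability the sum (1) can be written out directly; the following sketch (with an arbitrary three-point space) shows S_λ approaching the integral as λ → 0:

```python
# A finite field of probability: three elementary events with point masses.
P = {"e1": 0.2, "e2": 0.3, "e3": 0.5}      # arbitrary example
x = {"e1": -1.3, "e2": 0.4, "e3": 2.7}     # a random variable on E
A = set(P)                                  # integrate over all of E

def S(lam):
    """The sum (1): sum over k of k*lam * P{k*lam <= x < (k+1)*lam, xi in A}."""
    K = int(4.0 / lam) + 1                  # |x| < 4 in this example
    total = 0.0
    for k in range(-K, K):
        band = sum(p for e, p in P.items()
                   if e in A and k * lam <= x[e] < (k + 1) * lam)
        total += k * lam * band
    return total

exact = sum(P[e] * x[e] for e in P)         # the integral over E, here E(x)
assert abs(S(0.001) - exact) < 0.005
```

Each term of (1) weights the lower edge of a band kλ ≤ x < (k + 1)λ by its probability, so S_λ differs from the limit by less than λ.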
We shall give here a brief survey of the most important
properties of the integrals of form (2) . The reader will find their
proofs in every textbook on real variables, although the proofs
are usually carried out only in the case where P (A ) is the Lebesgue
measure of sets in R". The extension of these proofs to the general
case does not entail any new mathematical problem ; for the most
part they remain word for word the same.
I. If a random variable x is integrable on A, then it is integrable on each subset A′ of A belonging to 𝔉.
II. If x is integrable on A and A is decomposed into no
¹ As was stated in § 5 of the third chapter, we are considering in this, as well as in the following chapters, Borel fields of probability only.
² FRÉCHET, Sur l'intégrale d'une fonctionnelle étendue à un ensemble abstrait, Bull. Soc. Math. France v. 43, 1915, p. 248.
more than a countable number of non-intersecting sets Aₙ of 𝔉, then

∫_A x P(dE) = Σₙ ∫_{Aₙ} x P(dE).

III. If x is integrable, |x| is also integrable, and in that case

|∫_A x P(dE)| ≤ ∫_A |x| P(dE).
IV. If in each event ξ the inequalities 0 ≤ y ≤ x hold, then, along with x, y is also integrable³, and in that case

∫_A y P(dE) ≤ ∫_A x P(dE).
V. If m ≤ x ≤ M where m and M are two constants, then

m P(A) ≤ ∫_A x P(dE) ≤ M P(A).
VI. If x and y are integrable, and K and L are two real constants, then Kx + Ly is also integrable, and in this case

∫_A (Kx + Ly) P(dE) = K ∫_A x P(dE) + L ∫_A y P(dE).
VII. If the series

Σₙ ∫_A |xₙ| P(dE)

converges, then the series

Σₙ xₙ = x

converges at each point of set A with the exception of a certain set B for which P(B) = 0. If we set x = 0 everywhere except on A − B, then

∫_A x P(dE) = Σₙ ∫_A xₙ P(dE).
VIII. If x and y are equivalent (P{x ≠ y} = 0), then for every set A of 𝔉

∫_A x P(dE) = ∫_A y P(dE). (3)
³ It is assumed that y is a random variable, i.e., in the terminology of the general theory of integration, measurable with respect to 𝔉.
IX. If (3) holds for every set A of 𝔉, then x and y are equivalent.
From the foregoing definition of an integral we also obtain the following property, which is not found in the usual Lebesgue theory.
X. Let P₁(A) and P₂(A) be two probability functions defined on the same field 𝔉, P(A) = P₁(A) + P₂(A), and let x be integrable on A relative to P₁(A) and P₂(A). Then

∫_A x P(dE) = ∫_A x P₁(dE) + ∫_A x P₂(dE).
XI. Every bounded random variable is integrable.
§ 2. Absolute and Conditional Mathematical Expectations
Let x be a random variable. The integral

E(x) = ∫_E x P(dE)

is called in the theory of probability the mathematical expectation of the variable x. From the properties III, IV, V, VI, VII, VIII, XI, it follows that

I. |E(x)| ≤ E(|x|);
II. E(y) ≤ E(x) if 0 ≤ y ≤ x everywhere;
III. inf(x) ≤ E(x) ≤ sup(x);
IV. E(Kx + Ly) = K E(x) + L E(y);
V. E(Σₙ xₙ) = Σₙ E(xₙ), if the series Σₙ E(|xₙ|) converges;
VI. If x and y are equivalent then
E (x ) = E (y) .
VII. Every bounded random variable has a mathematical
expectation.
From the definition of the integral, we have

E(x) = lim_{λ→0} Σ_{k = −∞}^{+∞} kλ P{kλ ≤ x < (k + 1)λ}
= lim_{λ→0} Σ_{k = −∞}^{+∞} kλ {F((k + 1)λ) − F(kλ)}.
The second line is nothing more than the usual definition of the Stieltjes integral

∫_{−∞}^{+∞} a dF^(x)(a) = E(x). (1)

Formula (1) may therefore serve as a definition of the mathematical expectation E(x).
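Formula (1) lends itself to direct computation once F is given. A sketch with the exponential distribution function F(a) = 1 − e^(−a), a > 0, an arbitrary example whose expectation equals 1:

```python
import math

# F is the exponential distribution function, an arbitrary example; the
# Stieltjes integral of a with respect to F, formula (1), equals E(x) = 1.
def F(a):
    return 1.0 - math.exp(-a) if a > 0 else 0.0

def stieltjes_mean(lam, K=200000):
    # Sum over k of k*lam * (F((k+1)*lam) - F(k*lam)); F vanishes for
    # a <= 0, so only non-negative k contribute, and the tail beyond
    # K*lam = 200 is negligible here.
    return sum(k * lam * (F((k + 1) * lam) - F(k * lam)) for k in range(K))

assert abs(stieltjes_mean(0.001) - 1.0) < 0.005
```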
Now let u be a function of the elementary event ξ, and x be a random variable defined as a single-valued function x = x(u) of u. Then

P{kλ ≤ x < (k + 1)λ} = P^(u){kλ ≤ x(u) < (k + 1)λ},

where P^(u)(A) is the probability function of u. It then follows from the definition of the integral that

∫_E x P(dE) = ∫_{E^(u)} x(u) P^(u)(dE^(u))

and, therefore,

E(x) = ∫_{E^(u)} x(u) P^(u)(dE^(u)), (2)

where E^(u) denotes the set of all possible values of u.
In particular, when u itself is a random variable we have

E(x) = ∫_E x P(dE) = ∫_{R¹} x(u) P^(u)(dR¹) = ∫_{−∞}^{+∞} x(a) dF^(u)(a). (3)

When x(u) is continuous, the last integral in (3) is the ordinary Stieltjes integral. We must note, however, that the integral

∫_{−∞}^{+∞} x(a) dF^(u)(a)

can exist even when the mathematical expectation E(x) does not. For the existence of E(x), it is necessary and sufficient that the integral

∫_{−∞}^{+∞} |x(a)| dF^(u)(a)

be finite⁴.
If u is a point (u₁, u₂, …, uₙ) of the space Rⁿ, then as a result of (2):
⁴ Cf. V. GLIVENKO, Sur les valeurs probables de fonctions, Rend. Accad. Lincei v. 8, 1928, pp. 480–483.
E(x) = ∫∫⋯∫ x(u₁, u₂, …, uₙ) P^(u)(dRⁿ).

§ 3. The Tchebycheff Inequality

Let f(x) be a non-negative function of a real argument x which for x ≥ a never becomes smaller than b > 0. Then for any random variable x

P(x ≥ a) ≤ E{f(x)} / b, (1)

provided the mathematical expectation E{f(x)} exists. For,

E{f(x)} = ∫_E f(x) P(dE) ≥ ∫_{x ≥ a} f(x) P(dE) ≥ b P(x ≥ a),

from which (1) follows at once.
For example, for every positive c,

P(x ≥ a) ≤ E(e^{cx}) / e^{ca}. (2)
Now let f(x) be non-negative, even, and, for positive x, non-decreasing. Then for every random variable x and for any choice of the constant a > 0 the following inequality holds:

P(|x| ≥ a) ≤ E{f(x)} / f(a). (3)

In particular,

P(|x − E(x)| ≥ a) ≤ E{f(x − E(x))} / f(a). (4)
Especially important is the case f(x) = x². We then obtain from (3) and (4):

P(|x| ≥ a) ≤ E(x²) / a², (5)

P(|x − E(x)| ≥ a) ≤ E{x − E(x)}² / a² = σ²(x) / a², (6)

where

σ²(x) = E{x − E(x)}²

is called the variance of the variable x. It is easy to calculate that

σ²(x) = E(x²) − {E(x)}².
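Both inequality (6) and the closing identity can be checked by simulation; the variable below, a shifted normal sample, is an arbitrary example:

```python
import random

# An arbitrary example variable: a normal sample shifted by 0.5.
random.seed(7)
xs = [random.gauss(0.0, 1.0) + 0.5 for _ in range(100000)]
N = len(xs)

Ex = sum(xs) / N
Ex2 = sum(v * v for v in xs) / N
var = sum((v - Ex) ** 2 for v in xs) / N

# The identity sigma^2(x) = E(x^2) - {E(x)}^2 (up to rounding error).
assert abs(var - (Ex2 - Ex * Ex)) < 1e-8

# Inequality (6): P{|x - E(x)| >= a} <= sigma^2(x) / a^2.
a = 2.0
p = sum(abs(v - Ex) >= a for v in xs) / N
assert p <= var / a ** 2
```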
If f(x) is bounded:

|f(x)| ≤ K,

then a lower bound for P(|x| ≥ a) can be found. For

E(f(x)) = ∫_E f(x) P(dE) = ∫_{|x| < a} f(x) P(dE) + ∫_{|x| ≥ a} f(x) P(dE) ≤ f(a) + K P(|x| ≥ a),

and therefore

P(|x| ≥ a) ≥ [E(f(x)) − f(a)] / K. (8)

In the case f(x) = x², we have from (8)

P(|x| ≥ a) ≥ [E(x²) − a²] / K. (9)
§ 4. Some Criteria for Convergence
Let

x₁, x₂, …, xₙ, … (1)

be a sequence of random variables and f(x) be a non-negative, even, and for positive x a monotonically increasing function⁵. Then the following theorems are true:
I. In order that the sequence (1) converge in probability, the following condition is sufficient: for each ε > 0 there exists an n such that for every p > 0, the following inequality holds:

E{f(x_{n+p} − xₙ)} < ε. (2)
II. In order that the sequence (1) converge in probability to the random variable x, the following condition is sufficient:

lim_{n → +∞} E{f(xₙ − x)} = 0. (3)
III. If f(x) is bounded and continuous and f(0) = 0, then conditions I and II are also necessary.

IV. If f(x) is continuous, f(0) = 0, and the totality of all x₁, x₂, …, xₙ, …, x is bounded, then conditions I and II are also necessary.
⁵ Therefore f(x) > 0 if x ≠ 0.
From II and IV, we obtain in particular:

V. In order that sequence (1) converge in probability to x, it is sufficient that

lim E(xₙ − x)² = 0. (4)

If also the totality of all x₁, x₂, …, xₙ, …, x is bounded, then the condition is also necessary.
For proofs of I–IV see Slutsky [1] and Fréchet [1]. However, these theorems follow almost immediately from formulas (3) and (8) of the preceding section.
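Criterion V can be illustrated by simulation: if E(xₙ − x)² → 0, the probabilities P{|xₙ − x| > ε} must fall at least as fast as inequality (5) of § 3 requires. A sketch with normally distributed differences, an arbitrary choice:

```python
import random

# x_n - x is taken normal with variance 1/n, so E(x_n - x)^2 = 1/n -> 0.
# The example distribution is an arbitrary choice.
random.seed(3)
N, eps = 50000, 0.5

def p_exceed(n):
    """Estimate P{|x_n - x| > eps} by simulation."""
    sd = (1.0 / n) ** 0.5
    return sum(abs(random.gauss(0.0, sd)) > eps for _ in range(N)) / N

p10, p1000 = p_exceed(10), p_exceed(1000)
assert p10 > p1000                              # the probability decreases
assert p1000 <= (1.0 / 1000) / eps ** 2 + 0.01  # bound (5) of § 3, with slack
```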
§ 5. Differentiation and Integration of Mathematical Expectations with Respect to a Parameter
Let us put each elementary event ξ into correspondence with a definite real function x(t) of a real variable t. We say that x(t) is a random function if for every fixed t, the variable x(t) is a random variable. The question now arises, under what conditions can the mathematical expectation sign be interchanged with the integration and differentiation signs. The two following theorems, though they do not exhaust the problem, can nevertheless give a satisfactory answer to this question in many simple cases.
THEOREM I: If the mathematical expectation E[x(t)] is finite for any t, and x(t) is always differentiable for any t, while the derivative x′(t) of x(t) with respect to t is always less in absolute value than some constant M, then

d/dt E(x(t)) = E(x′(t)).
THEOREM II: If x(t) always remains less, in absolute value, than some constant K and is integrable in the Riemann sense, then

∫_a^b E(x(t)) dt = E[∫_a^b x(t) dt],

provided E[x(t)] is integrable in the Riemann sense.
Proof of Theorem I. Let us first note that x′(t), as the limit of the random variables

[x(t + h) − x(t)] / h,  h = 1, 1/2, …, 1/n, …,

is also a random variable. Since x′(t) is bounded, the mathematical expectation E[x′(t)] exists (Property VII of mathematical expectation, in § 2). Let us choose a fixed t and denote by A the event

|[x(t + h) − x(t)] / h − x′(t)| > ε.

The probability P(A) tends to zero as h → 0 for every ε > 0. Since

|[x(t + h) − x(t)] / h| ≤ M,  |x′(t)| ≤ M

holds everywhere, and moreover in the case Ā

|[x(t + h) − x(t)] / h − x′(t)| ≤ ε,

then

|[E x(t + h) − E x(t)] / h − E x′(t)| ≤ E|[x(t + h) − x(t)] / h − x′(t)|
= P(A) E_A|[x(t + h) − x(t)] / h − x′(t)| + P(Ā) E_Ā|[x(t + h) − x(t)] / h − x′(t)|
≤ 2M P(A) + ε.
We may choose the ε > 0 arbitrarily, and P(A) is arbitrarily small for any sufficiently small h. Therefore

d/dt E x(t) = lim_{h → 0} [E x(t + h) − E x(t)] / h = E x′(t),

which was to be proved.
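Theorem I can be checked numerically for a concrete random function. The sketch below takes x(t) = sin(wt) with w uniform on [0, 1], so that |x′(t)| = |w cos(wt)| ≤ 1 may serve as M; the expectation over w is approximated on a grid, and both choices are arbitrary:

```python
import math

# Random function x(t) = sin(w t), w uniform on [0, 1]; then
# x'(t) = w cos(w t) and |x'(t)| <= 1, so M = 1 in Theorem I.
# The expectation over w is approximated on a midpoint grid.
ws = [(i + 0.5) / 1000 for i in range(1000)]

def E_x(t):
    return sum(math.sin(w * t) for w in ws) / len(ws)

def E_xp(t):
    return sum(w * math.cos(w * t) for w in ws) / len(ws)

t, h = 1.0, 1e-6
lhs = (E_x(t + h) - E_x(t - h)) / (2 * h)   # d/dt E x(t), central difference
assert abs(lhs - E_xp(t)) < 1e-5            # matches E x'(t)
```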
Proof of Theorem II. Let

Sₙ = h Σ_{k=1}^{n} x(a + kh),  h = (b − a)/n.

Since Sₙ converges to J = ∫_a^b x(t) dt, we can choose for any ε > 0 an N such that from n > N there follows the inequality

P(A) = P{|Sₙ − J| > ε} < ε.

If we set

Sₙ* = h Σ_{k=1}^{n} E x(a + kh) = E(Sₙ),

then

|Sₙ* − E(J)| = |E(Sₙ − J)| ≤ E|Sₙ − J| = P(A) E_A|Sₙ − J| + P(Ā) E_Ā|Sₙ − J| ≤ 2K P(A) + ε ≤ (2K + 1)ε.
Therefore, Sₙ* converges to E(J), from which results the equation

∫_a^b E x(t) dt = lim Sₙ* = E(J).
Theorem II can easily be generalized for double and triple and higher order multiple integrals. We shall give an application of this theorem to one example in geometric probability. Let G be a measurable region of the plane whose shape depends on chance; in other words, let us assign to every elementary event ξ of a field of probability a definite measurable plane region G. We shall denote by J the area of the region G, and by P(x, y) the probability that the point (x, y) belongs to the region G. Then

E(J) = ∫∫ P(x, y) dx dy.

To prove this it is sufficient to note that

J = ∫∫ f(x, y) dx dy,  P(x, y) = E f(x, y),

where f(x, y) is the characteristic function of the region G (f(x, y) = 1 on G and f(x, y) = 0 outside of G)⁶.
⁶ Cf. A. KOLMOGOROV and M. LEONTOVICH, Zur Berechnung der mittleren Brownschen Fläche, Physik. Zeitschr. d. Sowjetunion, v. 4, 1933.
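The closing formula admits a direct check for a simple random region. In the sketch below G is, by arbitrary choice, the disk of random radius R about the origin with R uniform on [0, 1]; then J = πR², E(J) = π/3, and P(x, y) = 1 − r for r = √(x² + y²) ≤ 1:

```python
import math

# G is the disk of random radius R about the origin, R uniform on [0, 1]
# (an arbitrary example).  Then J = pi R^2 and E(J) = pi/3, while the
# probability that a point at distance r = sqrt(x^2 + y^2) <= 1 lies in G
# is P(x, y) = P{R >= r} = 1 - r.
EJ = math.pi / 3.0

# Integrate P(x, y) over the plane in polar coordinates:
# integral over r in [0, 1] of (1 - r) * 2*pi*r dr  (midpoint rule).
n = 100000
h = 1.0 / n
integral = sum((1.0 - (i + 0.5) * h) * 2.0 * math.pi * (i + 0.5) * h
               for i in range(n)) * h
assert abs(integral - EJ) < 1e-6
```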
Chapter V
CONDITIONAL PROBABILITIES AND
MATHEMATICAL EXPECTATIONS
§ 1. Conditional Probabilities
In § 6, Chapter I, we defined the conditional probability, P_𝔄(B), of the event B with respect to trial 𝔄. It was there assumed that 𝔄 allows of only a finite number of different possible results. We can, however, define P_𝔄(B) also for the case of an 𝔄 with an infinite set of possible results, i.e. the case in which the set E is partitioned into an infinite number of non-intersecting subsets. In particular, we obtain such a partitioning if we consider an arbitrary function u of ξ and define as elements of the partition 𝔄_u the sets u = constant. The conditional probability P_{𝔄_u}(B) we also denote by P_u(B).
Any partitioning 𝔄 of the set E can be defined as the partitioning 𝔄_u which is "induced" by a function u of ξ, if one assigns to every ξ, as u(ξ), that set of the partitioning 𝔄 of E which contains ξ.

Two functions u and u′ of ξ determine the same partitioning 𝔄_u = 𝔄_{u′} of the set E if and only if there exists a one-to-one correspondence u′ = f(u) between their domains E^(u) and E^(u′) such that u′(ξ) is identical with f u(ξ). The reader can easily show that the random variables P_u(B) and P_{u′}(B), defined below, are in this case the same. They are thus determined, in fact, by the partition 𝔄_u = 𝔄_{u′} itself.
To define P_u(B) we may use the following equation:

P_{u ⊂ A}(B) = E_{u ⊂ A}(P_u(B)). (1)
It is easy to prove that if the set E^(u) of all possible values of u is finite, equation (1) holds true for any choice of A (when P_u(B) is defined as in § 6, Chap. I). In the general case (in which P_u(B) is not yet defined) we shall prove that there always exists one and only one random variable P_u(B) (except for the matter of equivalence) which is defined as a function of u and which satisfies equation (1) for every choice of A from 𝔉^(u) such that P^(u)(A) > 0. The function P_u(B) of u, thus determined to within equivalence, we call the conditional probability of B with respect to u (or, for a given u). The value of P_u(B) when u = a we shall designate by P_u(a; B).
The proof of the existence and uniqueness of P_u(B). If we multiply (1) by P{u ⊂ A} = P^(u)(A), we obtain, on the left,

P{u ⊂ A} P_{u ⊂ A}(B) = P(B{u ⊂ A}) = P(B u⁻¹(A))

and, on the right,

P{u ⊂ A} E_{u ⊂ A}(P_u(B)) = ∫_{u ⊂ A} P_u(B) P(dE) = ∫_A P_u(B) P^(u)(dE^(u)),

leading to the formula

P(B u⁻¹(A)) = ∫_A P_u(B) P^(u)(dE^(u)); (2)
and conversely (1) follows from (2). In the case P^(u)(A) = 0, in which case (1) is meaningless, equation (2) becomes trivially true. Condition (2) is thus equivalent to (1). In accordance with Property IX of the integral (§ 1, Chap. IV) the random variable x is uniquely defined (except for equivalence) by means of the values of the integral

∫_A x P(dE)

for all sets of 𝔉. Since P_u(B) is a random variable determined on the probability field (𝔉^(u), P^(u)), it follows that formula (2) uniquely determines this variable P_u(B) except for equivalence.
We must still prove the existence of P_u(B). We shall apply here the following theorem of Nikodym¹:

Let 𝔉 be a Borel field, P(A) a non-negative completely additive set function defined on 𝔉 (in the terminology of the theory of probability, a probability function), and let Q(A) be another completely additive set function defined on 𝔉, such that from Q(A) ≠ 0 follows the inequality P(A) > 0. Then there exists a function f(ξ) (in the terminology of the theory of probability, a random variable) which is measurable with respect to 𝔉, and which satisfies, for each set A of 𝔉, the equation
¹ O. NIKODYM, Sur une généralisation des intégrales de M. J. Radon, Fund. Math. v. 15, 1930, p. 168 (Theorem III).
Q(A) = ∫_A f(ξ) P(dE).
In order to apply this theorem to our case, we need to prove, 1°, that

Q(A) = P(B u⁻¹(A))

is a completely additive function on 𝔉^(u), and 2°, that from Q(A) ≠ 0 follows the inequality P^(u)(A) > 0. Firstly, 2° follows from

0 ≤ P(B u⁻¹(A)) ≤ P(u⁻¹(A)) = P^(u)(A).
For the proof of 1° we set

A = Σₙ Aₙ;

then

u⁻¹(A) = Σₙ u⁻¹(Aₙ)

and

B u⁻¹(A) = Σₙ B u⁻¹(Aₙ).

Since P is completely additive, it follows that

P(B u⁻¹(A)) = Σₙ P(B u⁻¹(Aₙ)),

which was to be proved.
From the equation (1) follows an important formula (if we set A = E^(u)):

P(B) = E(P_u(B)). (3)
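Formula (3) is the familiar rule of total probability, and it admits a direct simulation sketch for a two-stage example (three biased coins, with arbitrarily chosen numbers):

```python
import random

# u picks one of three biased coins; B is the event "the coin shows heads".
# All the numbers below are arbitrary example choices.
random.seed(11)
q = [0.2, 0.5, 0.3]        # distribution of u
heads = [0.1, 0.6, 0.9]    # P_u(B) as a function of u

expected = sum(qi * hi for qi, hi in zip(q, heads))   # E(P_u(B)) = 0.59

N = 200000
us = random.choices([0, 1, 2], weights=q, k=N)
hits = sum(random.random() < heads[u] for u in us)
assert abs(hits / N - expected) < 0.01                # P(B) = E(P_u(B))
```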
Now we shall prove the following two fundamental properties
of conditional probability.
THEOREM I. It is almost sure that

0 ≤ P_u(B) ≤ 1. (4)
THEOREM II. If B is decomposed into at most a countable number of sets Bₙ:

B = Σₙ Bₙ,

then the following equality holds almost surely:

P_u(B) = Σₙ P_u(Bₙ). (5)
These two properties of P_u(B) correspond to the two characteristic properties of the probability function P(B): that 0 ≤ P(B) ≤ 1 always, and that P(B) is completely additive. These allow us to carry over many other basic properties of the absolute probability P(B) to the conditional probability P_u(B). However, we must not forget that P_u(B) is, for a fixed set B, a random variable determined uniquely only to within equivalence.
Proof of Theorem I. If we assume, contrary to the assertion to be proved, that on a set M ⊂ E^(u) with P^(u)(M) > 0, the inequality P_u(B) ≥ 1 + ε, ε > 0, holds true, then according to formula (1)

P_{u ⊂ M}(B) = E_{u ⊂ M}(P_u(B)) ≥ 1 + ε > 1,

which is obviously impossible. In the same way we prove that almost surely P_u(B) ≥ 0.
Proof of Theorem II. From the convergence of the series

Σₙ E|P_u(Bₙ)| = Σₙ E(P_u(Bₙ)) = Σₙ P(Bₙ) = P(B),

it follows, by Property V of mathematical expectation (§ 2, Chap. IV), that the series Σₙ P_u(Bₙ) converges almost surely. If A is any set of 𝔉^(u) with P^(u)(A) > 0, then from Property V of mathematical expectation just referred to it follows that for each A of the above kind we have the relation

E_{u ⊂ A}(Σₙ P_u(Bₙ)) = Σₙ E_{u ⊂ A}(P_u(Bₙ)) = Σₙ P_{u ⊂ A}(Bₙ) = P_{u ⊂ A}(B) = E_{u ⊂ A}(P_u(B)),

and from this, equation (5) immediately follows.
To close this section we shall point out two particular cases. If, first, u(ξ) = c (a constant), then P_c(A) = P(A) almost surely. If, however, we set u(ξ) = ξ, then we obtain at once that P_ξ(A) is almost surely equal to one on A and almost surely equal to zero on the complement of A. P_ξ(A) is thus revealed to be the characteristic function of the set A.
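The relation P(B) = E(P_u(B)) lends itself to a direct numerical check. The following sketch (a hypothetical six-point space chosen only for illustration, not part of the original text) computes the conditional probabilities on the atoms of the partition generated by u and averages them with the weights P{u = a}:

```python
# Toy check of P(B) = E(P_u(B)) on a six-point space (illustrative model).
events = range(6)
p = {e: 1.0 / 6 for e in events}          # P is uniform here
u = {e: e // 2 for e in events}           # u partitions E into {0,1}, {2,3}, {4,5}
B = {1, 3, 4}

# conditional probability P_u(B) on each atom {u = a} of the partition
P_u = {}
for a in set(u.values()):
    atom = [e for e in events if u[e] == a]
    P_u[a] = sum(p[e] for e in atom if e in B) / sum(p[e] for e in atom)

P_B = sum(p[e] for e in B)
# E(P_u(B)): weight each conditional value with P{u = a}
E_P_u = sum(P_u[a] * sum(p[e] for e in events if u[e] == a)
            for a in set(u.values()))

print(P_B, E_P_u)   # the two agree
```

The same computation goes through for any finite space and any partitioning function u; only the dictionaries above would change.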
§ 2. Explanation of a Borel Paradox
Let us choose for our basic set E the set of all points on a spherical surface. Our 𝔉 will be the aggregate of all Borel sets of the spherical surface. And finally, our P(A) is to be proportional to the measure of the set A. Let us now choose two diametrically opposite points for our poles, so that each meridian circle will be uniquely defined by the longitude ψ, 0 ≤ ψ < π. Since ψ varies from 0 only to π (in other words, we are considering complete meridian circles, and not merely semicircles), the latitude Θ must vary from −π to +π (and not from −π/2 to +π/2). Borel posed the following problem: Required to determine "the conditional probability distribution" of the latitude Θ, −π ≤ Θ < +π, for a given longitude ψ.
It is easy to calculate that

P_ψ{Θ_1 ≤ Θ < Θ_2} = (1/4) ∫_{Θ_1}^{Θ_2} |cos Θ| dΘ .
The probability distribution of Θ for a given ψ is not uniform. If we assume that the conditional probability distribution of Θ "with the hypothesis that ξ lies on the given meridian circle" must be uniform, then we have arrived at a contradiction.
This shows that the concept of a conditional probability with
regard to an isolated given hypothesis whose probability equals 0
is inadmissible. For we can obtain a probability distribution for Θ on the meridian circle only if we regard this circle as an
element of the decomposition of the entire spherical surface into
meridian circles with the given poles.
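The conditional density (1/4)|cos Θ| can be reproduced by simulation (a modern illustration, not part of the original text): sample points uniformly on the sphere, keep those whose longitude, taken mod π so that one value of ψ labels a whole meridian circle, falls in a narrow wedge, and record the latitude along the full circle.

```python
import math, random

random.seed(1)

# Sample uniform points on the sphere via the Gaussian trick; condition on
# the longitude wedge psi in [0, delta) and record the full-circle latitude
# Theta in (-pi, pi].
delta = 0.1
thetas = []
while len(thetas) < 3000:
    x, y, z = (random.gauss(0.0, 1.0) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    if r < 1e-12:
        continue
    x, y, z = x / r, y / r, z / r
    phi = math.atan2(y, x)
    psi = phi % math.pi                  # longitude of the meridian *circle*
    if psi < delta:
        # the sign of cos(Theta) tells on which half of the circle we are
        sign = 1.0 if phi >= 0.0 else -1.0
        thetas.append(math.atan2(z, sign * math.hypot(x, y)))

# Predicted conditional density: |cos Theta| / 4, hence
# P(-pi/4 <= Theta < pi/4) = (1/4) * 2 * sin(pi/4) ~ 0.354, visibly different
# from the 0.25 that a uniform distribution on the circle would give.
frac = sum(1 for t in thetas if abs(t) < math.pi / 4) / len(thetas)
print(frac)
```

Conditioning instead on a band of constant width in the transverse coordinate (rather than on a longitude wedge) yields the uniform distribution, which is exactly the paradox discussed above: the answer depends on the decomposition, not on the isolated circle.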
§ 3. Conditional Probabilities with Respect to a Random Variable
If x is a random variable and P_x(B) as a function of x is measurable in the Borel sense, then P_x(B) can be defined in an elementary way. For we can rewrite formula (2) in § 1 to look as follows:

P(B) P_B^{(x)}(A) = ∫_A P_x(B) P^{(x)}(dE) .   (1)
In this case we obtain from (1) at once that

P(B) F_B^{(x)}(a) = ∫_{−∞}^{a} P_x(a; B) dF^{(x)}(a) .   (2)
In accordance with a theorem of Lebesgue² it follows from (2) that

P_x(a; B) = P(B) lim_{h→0} [F_B^{(x)}(a + h) − F_B^{(x)}(a)] / [F^{(x)}(a + h) − F^{(x)}(a)] ,   (3)

which is always true except for a set H of points a for which P^{(x)}(H) = 0.

² Lebesgue, l. c., 1928, pp. 301-302.
P_x(a; B) was defined in § 1 except on a set G which is such that P^{(x)}(G) = 0. If we now regard formula (3) as the definition of P_x(a; B) (setting P_x(a; B) = 0 when the limit on the right-hand side of (3) fails to exist), then this new variable satisfies all the requirements of § 1.
If, besides, the probability densities f^{(x)}(a) and f_B^{(x)}(a) exist and if f^{(x)}(a) > 0, then formula (3) becomes

P_x(a; B) = P(B) f_B^{(x)}(a) / f^{(x)}(a) .   (4)
Moreover, from formula (3) it follows that the existence of a limit in (3) and of a probability density f^{(x)}(a) results in the existence of f_B^{(x)}(a). In that case

P(B) f_B^{(x)}(a) ≤ f^{(x)}(a) .   (5)
If P(B) > 0, then from (4) we have

f_B^{(x)}(a) = P_x(a; B) f^{(x)}(a) / P(B) .   (6)
In case f^{(x)}(a) = 0, then according to (5) f_B^{(x)}(a) = 0 and therefore (6) also holds. If, besides, the distribution of x is continuous, we have
P(B) = E(P_x(B)) = ∫_{−∞}^{+∞} P_x(a; B) f^{(x)}(a) da .   (7)
§ 4. Conditional Mathematical Expectations

DEFINITION. Let u be an arbitrary random function of ξ, and let y be a random variable. A random variable E_u(y), representable as a function of u and satisfying, for any set A of 𝔉^{(u)} with P^{(u)}(A) > 0, the condition

E_{u⊂A}(y) = E_{u⊂A}(E_u(y)) ,   (1)
is called (if it exists) the conditional mathematical expectation of the variable y for a known value of u.
If we multiply (1) by P^{(u)}(A), we obtain

∫_{u⊂A} y P(dE) = ∫_A E_u(y) P^{(u)}(dE^{(u)}) .   (2)
Conversely, from (2) follows formula (1). In case P^{(u)}(A) = 0, in which case (1) is meaningless, (2) becomes trivial. In the same manner as in the case of conditional probability (§ 1) we can prove that E_u(y) is determined uniquely, except for equivalence, by (2).
The value of E_u(y) for u = a we shall denote by E_u(a; y). Let us also note that E_u(y), as well as P_u(B), depends only upon the partition 𝔄_u and may be designated by E_{𝔄_u}(y). The existence of E(y) is implied in the definition of E_u(y) (if we set A = E^{(u)}, then E_{u⊂A}(y) = E(y)).
We shall now prove that the existence of E(y) is also sufficient for the existence of E_u(y). For this we only need to prove, by the theorem of Nikodym (§ 1), that the set function

Q(A) = ∫_{u⊂A} y P(dE)

is completely additive on 𝔉^{(u)} and absolutely continuous with respect to P^{(u)}(A). The first property is proved verbatim as in the case of conditional probability (§ 1). The second property, absolute continuity, is contained in the fact that from Q(A) ≠ 0 the inequality P^{(u)}(A) > 0 must follow. If we assume that P^{(u)}(A) = P{u⊂A} = 0, it is clear that

Q(A) = ∫_{u⊂A} y P(dE) = 0 ,

and our second requirement is thus fulfilled.
If in equation (1) we set A = E^{(u)}, we obtain the formula

E(y) = E(E_u(y)) .   (3)

We can show further that almost surely

E_u(ay + bz) = a E_u(y) + b E_u(z) ,   (4)

where a and b are two arbitrary constants. (The proof is left to the reader.)
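The formulas E(y) = E(E_u(y)) and E_u(ay + bz) = a E_u(y) + b E_u(z) can be verified exactly in any finite model; the sketch below (an arbitrary six-point space, chosen only for illustration) uses rational arithmetic so that both sides agree without rounding error.

```python
from fractions import Fraction

# Exact check of formulas (3) and (4) on a six-point space.
P = [Fraction(k, 21) for k in (1, 2, 3, 4, 5, 6)]   # probabilities of events 0..5
u = [0, 0, 1, 1, 2, 2]                              # u partitions them into three atoms
y = [3, -1, 4, 1, -5, 9]
z = [2, 7, 1, 8, 2, 8]

def E(values):
    return sum(v * w for v, w in zip(values, P))

def E_u(values):
    # conditional expectation: on each atom of u, the weighted mean of `values`
    out = []
    for e in range(6):
        atom = [i for i in range(6) if u[i] == u[e]]
        w = sum(P[i] for i in atom)
        out.append(sum(values[i] * P[i] for i in atom) / w)
    return out

# (3): E(y) = E(E_u(y))
assert E(y) == E(E_u(y))
# (4): E_u(2y - 3z) = 2 E_u(y) - 3 E_u(z), pointwise on the space
lhs = E_u([2 * y[i] - 3 * z[i] for i in range(6)])
rhs = [2 * E_u(y)[i] - 3 * E_u(z)[i] for i in range(6)]
assert lhs == rhs
print("ok")
```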
If u and v are two functions of the elementary event ξ, then the couple (u, v) can always be regarded as a function of ξ. The following important equation then holds:

E_u E_{(u,v)}(y) = E_u(y) .   (5)

For, E_u(y) is defined by the relation

E_{u⊂A}(y) = E_{u⊂A}(E_u(y)) .

Therefore we must show that E_u E_{(u,v)}(y) satisfies the equation

E_{u⊂A}(y) = E_{u⊂A}(E_u E_{(u,v)}(y)) .   (6)

From the definition of E_{(u,v)}(y) it follows that

E_{u⊂A}(y) = E_{u⊂A}(E_{(u,v)}(y)) .   (7)

From the definition of E_u E_{(u,v)}(y) it follows, moreover, that

E_{u⊂A}(E_{(u,v)}(y)) = E_{u⊂A}(E_u E_{(u,v)}(y)) .   (8)

Equation (6) results from equations (7) and (8), and thus proves our statement.
If we set y equal to one on B and to zero outside of B, then

E_u(y) = P_u(B) ,   E_{(u,v)}(y) = P_{(u,v)}(B) .

In this case, from formula (5) we obtain the formula

E_u P_{(u,v)}(B) = P_u(B) .   (9)
The conditional mathematical expectation E_u(y) may also be defined directly by means of the corresponding conditional probabilities. To do this we consider the following sums:

S_λ(u) = Σ_{k=−∞}^{+∞} kλ P_u{kλ ≤ y < (k + 1)λ} = Σ_k R_k .   (10)

If E(y) exists, the series (10) almost certainly converges. For we have from formula (3) of § 1

E|R_k| = |kλ| P{kλ ≤ y < (k + 1)λ} ,
and the convergence of the series

Σ_{k=−∞}^{+∞} |kλ| P{kλ ≤ y < (k + 1)λ} = Σ_k E|R_k|

[…]

In this case (1) is equivalent to the relation

P_{v⊂B}(u⊂A) = P(u⊂A)   (3)

and therefore to the relation

E_{v⊂B}(P_v(u⊂A)) = P(u⊂A) .   (4)

On the other hand, it is obvious that equation (4) follows from
(2). Conversely, since P_v(u⊂A) is uniquely determined by (4) to within probability zero, equation (2) follows from (4) almost certainly.
DEFINITION 2: Let M be a set of functions u_μ(ξ). These functions are called mutually independent in their totality if the following condition is satisfied. Let M' and M'' be two non-intersecting subsets of M, and let A' (or A'') be a set from 𝔉 defined by a relation among the u_μ from M' (or M''); then we have

P(A'A'') = P(A') P(A'') .

The aggregate of all u_μ of M' (or of M'') can be regarded as coordinates of some function u' (or u''). Definition 2 requires only the independence of u' and u'' in the sense of Definition 1 for each choice of the non-intersecting sets M' and M''.
If u_1, u_2, ..., u_n are mutually independent, then in all cases

P{u_1⊂A_1, u_2⊂A_2, ..., u_n⊂A_n} = P(u_1⊂A_1) P(u_2⊂A_2) ... P(u_n⊂A_n) ,   (5)

provided the sets A_k belong to the corresponding 𝔉^{(u_k)} (proved by induction). This equation is not in general, however, at all sufficient for the mutual independence of u_1, u_2, ..., u_n. Equation (5) is easily generalized for the case of a countably infinite product.
From the mutual independence of the u_μ in each finite group (u_{μ_1}, u_{μ_2}, ..., u_{μ_n}) it does not necessarily follow that all u_μ are mutually independent. Finally, it is easy to note that the mutual independence of the functions u_μ is in reality a property of the corresponding partitions 𝔄_{u_μ}. Further, if u'_μ are single-valued functions of the corresponding u_μ, then from the mutual independence of the u_μ follows that of the u'_μ.
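The gap between partial forms of independence and mutual independence is visible already in the smallest example. The sketch below (the classical construction z = xy, supplied here for illustration and not taken from the text) exhibits three variables of which every pair is independent while the triple is not.

```python
from itertools import product
from fractions import Fraction

# x and y are independent signs, z = x*y: pairwise independent, not mutual.
half = Fraction(1, 2)
space = [((x, y), Fraction(1, 4)) for x, y in product((-1, 1), repeat=2)]

def prob(pred):
    return sum(p for (xy, p) in space if pred(*xy))

# every pair among (x, y, z) satisfies the product rule:
for f, g in (
    (lambda x, y: x == 1, lambda x, y: y == 1),          # x and y
    (lambda x, y: x == 1, lambda x, y: x * y == 1),      # x and z
    (lambda x, y: y == 1, lambda x, y: x * y == 1),      # y and z
):
    assert prob(lambda x, y: f(x, y) and g(x, y)) == prob(f) * prob(g)

# but the three together do not:
triple = prob(lambda x, y: x == 1 and y == 1 and x * y == 1)
assert triple == Fraction(1, 4) != half ** 3
print("pairwise independent, not mutually independent")
```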
§ 2. Independent Random Variables
If x_1, x_2, ..., x_n are mutually independent random variables, then from equation (2) of the foregoing paragraph follows, in particular, the formula

F^{(x_1, x_2, ..., x_n)}(a_1, a_2, ..., a_n) = F^{(x_1)}(a_1) F^{(x_2)}(a_2) ... F^{(x_n)}(a_n) .   (1)

If in this case the field 𝔉^{(x_1, x_2, ..., x_n)} consists only of Borel sets of the space R^n, then condition (1) is also sufficient for the mutual independence of the variables x_1, x_2, ..., x_n.
Proof. Let x' = (x_{i_1}, x_{i_2}, ..., x_{i_k}) and x'' = (x_{j_1}, x_{j_2}, ..., x_{j_m}) be two non-intersecting subsystems of the variables x_1, x_2, ..., x_n. We must show, on the basis of formula (1), that for every two Borel sets A' and A'' of R^k (or R^m) the following equation holds:

P(x'⊂A', x''⊂A'') = P(x'⊂A') P(x''⊂A'') .   (2)

This follows at once from (1) for sets of the form

A' = {x_{i_1} < a_1, x_{i_2} < a_2, ..., x_{i_k} < a_k} ,
A'' = {x_{j_1} < b_1, x_{j_2} < b_2, ..., x_{j_m} < b_m} .

It can be shown that this property of the sets A' and A'' is preserved under the formation of sums and differences, from which equation (2) follows for all Borel sets.
Now let X = {x_μ} be an arbitrary (in general infinite) aggregate of random variables. If the field 𝔉^{(X)} […]

[…] (σ²(x) > 0 and σ²(y) > 0). The number f² is called the correlation ratio of y with respect to x, and g² the same for x with respect to y (Pearson).
From (5) it further follows that

E(xy) = E(x) E(y) .   (6)

To prove this we apply formula (15) of § 4, Chap. V:

E(xy) = E E_x(xy) = E[x E_x(y)] = E[x E(y)] = E(y) E(x) .

Therefore, in the case of independence

r = [E(xy) − E(x) E(y)] / [σ(x) σ(y)]

is also equal to zero; r, as is well known, is the correlation coefficient of x and y.
If two random variables x and y satisfy equation (6), then they are called uncorrelated. For the sum

S = x_1 + x_2 + ... + x_n ,

where the x_1, x_2, ..., x_n are uncorrelated in pairs, we can easily compute that

σ²(S) = σ²(x_1) + σ²(x_2) + ... + σ²(x_n) .   (7)

In particular, equation (7) holds for the independent variables x_k.
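Equation (7) requires only that the summands be uncorrelated in pairs, not that they be independent. The following exact check (a toy construction chosen for illustration) uses two dependent but uncorrelated variables: with θ uniform on four points, x1 = cos θ and x2 = sin θ.

```python
from fractions import Fraction

# (x1, x2) = (cos theta, sin theta) for theta in {0, pi/2, pi, 3pi/2}.
outcomes = [(1, 0), (0, 1), (-1, 0), (0, -1)]
p = Fraction(1, 4)

def E(f):
    return sum(p * f(x1, x2) for x1, x2 in outcomes)

var = lambda f: E(lambda a, b: f(a, b) ** 2) - E(f) ** 2

# uncorrelated: E(x1*x2) = E(x1)*E(x2) = 0 ...
assert E(lambda a, b: a * b) == E(lambda a, b: a) * E(lambda a, b: b)
# ... yet dependent: P(x1=0, x2=0) = 0 while P(x1=0)*P(x2=0) = 1/4.
# Variance of the sum still equals the sum of the variances, as in (7):
assert var(lambda a, b: a + b) == var(lambda a, b: a) + var(lambda a, b: b)
print("sigma^2(S) = sigma^2(x1) + sigma^2(x2)")
```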
§ 3. The Law of Large Numbers
Random variables s_n of a sequence

s_1, s_2, ..., s_n, ...

are called stable if there exists a numerical sequence

d_1, d_2, ..., d_n, ...

such that for any positive ε

P{|s_n − d_n| ≥ ε}

converges to zero as n → ∞. If all E(s_n) exist and if we may set

d_n = E(s_n) ,

then the stability is normal.
If all the s_n are uniformly bounded, then from

P{|s_n − d_n| ≥ ε} → 0   (n → +∞)   (1)

we obtain the relation

|E(s_n) − d_n| → 0   (n → +∞)

and therefore

P{|s_n − E(s_n)| ≥ ε} → 0   (n → +∞) .   (2)

The stability of a bounded stable sequence is thus necessarily normal.
Let

σ_n² = σ²(s_n) .

According to the Tchebycheff inequality,

P{|s_n − E(s_n)| ≥ ε} ≤ σ_n²/ε² .

Therefore, the Markov condition

σ_n² → 0   (n → +∞)   (3)

is sufficient for normal stability.
If the s_n − E(s_n) are uniformly bounded:

|s_n − E(s_n)| ≤ M ,

then from the inequality (9) in § 3, Chap. IV,

P{|s_n − E(s_n)| ≥ ε} ≥ (σ_n² − ε²)/M² .

Therefore, in this case the Markov condition (3) is also necessary for the stability of the s_n.
If

s_n = (x_1 + x_2 + ... + x_n)/n

and the variables x_n are uncorrelated in pairs, we have

σ_n² = (1/n²) {σ²(x_1) + σ²(x_2) + ... + σ²(x_n)} .

Therefore, in this case, the following condition is sufficient for the normal stability of the arithmetic means s_n:

n² σ_n² = σ²(x_1) + σ²(x_2) + ... + σ²(x_n) = o(n²)   (4)

(Theorem of Tchebycheff). In particular, condition (4) is fulfilled if all the variables x_n are uniformly bounded.
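Tchebycheff's theorem can be watched at work in a small Monte-Carlo experiment (an illustrative sketch; with summands uniform on (−1, 1) one has σ²(s_n) = 1/3n, so the tail probabilities must fall off):

```python
import random

random.seed(2)

# Estimate P(|s_n - E(s_n)| >= eps) for growing n, with bounded independent
# summands uniform on (-1, 1), so that E(s_n) = 0.
def tail_prob(n, eps, trials=2000):
    bad = 0
    for _ in range(trials):
        s = sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n
        if abs(s) >= eps:
            bad += 1
    return bad / trials

probs = [tail_prob(n, 0.1) for n in (10, 100, 1000)]
print(probs)   # decreasing toward zero
```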
This theorem can be generalized for the case of weakly correlated variables x_n. If we assume that the coefficient of correlation r_{mn}¹ of x_m and x_n satisfies the inequality

r_{mn} ≤ c(|n − m|)

and that

C_n = Σ_{k=0}^{n−1} c(k) ,

then a sufficient condition for the normal stability of the arithmetic means s_n is²

C_n = o(n) .   (5)
In the case of independent summands x_n we can state a necessary and sufficient condition for the stability of the arithmetic means s_n. For every x_n there exists a constant m_n (the median of x_n) which satisfies the following conditions:

P(x_n < m_n) ≤ 1/2 ,
P(x_n > m_n) ≤ 1/2 .

¹ It is obvious that r_{nn} = 1 always.
² Cf. A. KHINTCHINE, Sur la loi forte des grandes nombres. C. R. de l'acad. sci. Paris v. 186, 1928, p. 285.
We set

x_{nk} = x_k   if |x_k − m_k| < n ,
x_{nk} = 0   otherwise.

Then the relations

Σ_{k=1}^{n} P{|x_k − m_k| > n} = Σ_{k=1}^{n} P(x_{nk} ≠ x_k) → 0   (n → +∞) ,

(1/n²) Σ_{k=1}^{n} σ²(x_{nk}) → 0   (n → +∞)   (6)

are necessary and sufficient for the stability of the variables s_n³. We may here assume the constants d_n to be equal to the E(s_n*), where s_n* denotes the arithmetic mean of the truncated variables x_{n1}, x_{n2}, ..., x_{nn}, so that in the case where

E(s_n*) − E(s_n) → 0   (n → +∞)

(and only in this case) the stability is normal.
A further generalization of Tchebycheff's theorem is obtained if we assume that the s_n depend in some way upon the results of any n trials,

𝔄_1, 𝔄_2, ..., 𝔄_n ,

so that after each definite outcome of all these n trials s_n assumes a definite value. The general idea of all these theorems, known as the law of large numbers, consists in the fact that if the dependence of the variables s_n upon each separate trial 𝔄_k (k = 1, 2, ..., n) is very small for large n, then the variables s_n are stable. If we regard

β_{nk} = E[E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n)]²

as a reasonable measure of the dependence of the variables s_n upon the trial 𝔄_k, then the above-mentioned general idea of the law of large numbers can be made concrete by the following considerations⁴.

Let

z_{nk} = E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n) .

³ Cf. A. KOLMOGOROV, Über die Summen durch den Zufall bestimmter unabhängiger Grössen, Math. Ann. v. 99, 1928, pp. 309-319 (corrections and notes to this study, v. 102, 1929, pp. 484-488), Theorem VIII and a supplement on p. 318.
⁴ Cf. A. KOLMOGOROV, Sur la loi des grandes nombres. Rend. Accad. Lincei v. 9, 1929, pp. 470-474.
Then

s_n − E(s_n) = z_{n1} + z_{n2} + ... + z_{nn} ,

E(z_{nk}) = E E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n) = E(s_n) − E(s_n) = 0 ,

σ²(z_{nk}) = E(z_{nk}²) = β_{nk} .

We can easily compute also that the random variables z_{nk} (k = 1, 2, ..., n) are uncorrelated. For let i < k; then⁵

E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(z_{ni} z_{nk}) = z_{ni} E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(z_{nk})
  = z_{ni} [E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}} E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n)]
  = z_{ni} [E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n)] = 0

and therefore

E(z_{ni} z_{nk}) = 0 .

We thus have

σ²(s_n) = σ²(z_{n1}) + σ²(z_{n2}) + ... + σ²(z_{nn}) = β_{n1} + β_{n2} + ... + β_{nn} .

Therefore, the condition

β_{n1} + β_{n2} + ... + β_{nn} → 0   (n → +∞)

is sufficient for the normal stability of the variables s_n.
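The decomposition of s_n − E(s_n) into the uncorrelated differences z_{nk} can be verified exactly for Bernoulli trials. In this model E_{𝔄_1...𝔄_k}(s_n) = (x_1 + ... + x_k + (n − k)p)/n, so z_{nk} = (x_k − p)/n and β_{nk} = p(1 − p)/n² (an illustrative computation, not part of the original text):

```python
from fractions import Fraction
from itertools import product

n, p = 4, Fraction(1, 3)      # n Bernoulli(p) trials, enumerated exhaustively

def P(omega):                 # probability of an outcome (x_1, ..., x_n)
    q = Fraction(1)
    for x in omega:
        q *= p if x else (1 - p)
    return q

def E(f):
    return sum(P(w) * f(w) for w in product((0, 1), repeat=n))

s = lambda w: Fraction(sum(w), n)          # the arithmetic mean s_n
z = lambda w, k: (w[k] - p) / n            # z_{n,k+1} in the text's numbering

# the z_{nk} sum to s_n - E(s_n), have mean zero, and are uncorrelated
assert all(sum(z(w, k) for k in range(n)) == s(w) - p
           for w in product((0, 1), repeat=n))
assert all(E(lambda w, k=k: z(w, k)) == 0 for k in range(n))
assert all(E(lambda w, i=i, k=k: z(w, i) * z(w, k)) == 0
           for i in range(n) for k in range(n) if i != k)

# hence sigma^2(s_n) = beta_{n1} + ... + beta_{nn} = p(1-p)/n
beta = [E(lambda w, k=k: z(w, k) ** 2) for k in range(n)]
var_s = E(lambda w: (s(w) - p) ** 2)
assert var_s == sum(beta) == p * (1 - p) / n
print(var_s)
```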
§ 4. Notes on the Concept of Mathematical Expectation
We have defined the mathematical expectation of a random variable x as

E(x) = ∫_E x P(dE) = ∫_{−∞}^{+∞} a dF^{(x)}(a) ,   (1)

where the integral on the right is understood as

∫_{−∞}^{+∞} a dF^{(x)}(a) = lim_{b→−∞, c→+∞} ∫_b^c a dF^{(x)}(a) .

The idea suggests itself to consider the expression

E*(x) = lim_{b→+∞} ∫_{−b}^{+b} a dF^{(x)}(a)   (2)

⁵ Application of formula (15) in § 4, Chap. V.
as a generalized mathematical expectation. We lose in this case, of course, several simple properties of the mathematical expectation. For example, in this case the formula

E(x + y) = E(x) + E(y)

is not always true. In this form the generalization is hardly admissible. We may add, however, that with some restrictive supplementary conditions, definition (2) becomes entirely natural and useful.
We can discuss the problem as follows. Let

x_1, x_2, ..., x_n, ...

be a sequence of mutually independent variables, having the same distribution function F^{(x)}(a) as x, and let

s_n = (x_1 + x_2 + ... + x_n)/n .

The question now arises whether there exists a constant E*(x) such that for every ε > 0

lim_{n→+∞} P(|s_n − E*(x)| > ε) = 0 .   (3)

The answer is: if such a constant E*(x) exists, it is expressed by formula (2). The necessary and sufficient condition that formula (3) hold consists in the existence of the limit (2) and the relation

P(|x| > n) = o(1/n) .   (4)
To prove this we apply the theorem⁶ that condition (4) is necessary and sufficient for the stability of the arithmetic means s_n, where, in the case of stability, we may set

d_n = ∫_{−n}^{+n} a dF^{(x)}(a) .

If there exists a mathematical expectation in the former sense (formula (1)), then condition (4) is always fulfilled⁷. Since in this case E(x) = E*(x), the condition (3) actually does define a generalization of the concept of mathematical expectation. For the generalized mathematical expectation, Properties I-VII

⁶ Cf. A. KOLMOGOROV, Bemerkungen zu meiner Arbeit "Über die Summen zufälliger Grössen." Math. Ann. v. 102, 1929, pp. 484-488, Theorem XII.
⁷ Ibid., Theorem XIII.
(Chap. IV, § 2) still hold; in general, however, the existence of E*|x| does not follow from the existence of E*(x).

To prove that the new concept of mathematical expectation is really more general than the previous one, it is sufficient to give the following example. Set the probability density f^{(x)}(a) equal to

f^{(x)}(a) = C / [(|a| + 2)² ln(|a| + 2)] ,

where the constant C is determined by

∫_{−∞}^{+∞} f^{(x)}(a) da = 1 .

It is easy to compute that in this case condition (4) is fulfilled. Formula (2) gives the value

E*(x) = 0 ,

but the integral

∫_{−∞}^{+∞} |a| dF^{(x)}(a) = ∫_{−∞}^{+∞} |a| f^{(x)}(a) da

diverges.
§ 5. Strong Law of Large Numbers; Convergence of Series
The random variables s_n of the sequence

s_1, s_2, ..., s_n, ...

are strongly stable if there exists a sequence of numbers

d_1, d_2, ..., d_n, ...

such that the random variables

s_n − d_n

almost certainly tend to zero as n → +∞. From strong stability follows, obviously, ordinary stability. If we can choose

d_n = E(s_n) ,

then the strong stability is normal.
In the Tchebycheff case,

s_n = (x_1 + x_2 + ... + x_n)/n ,

where the variables x_n are mutually independent. A sufficient⁸ condition for the normal strong stability of the arithmetic means s_n is the convergence of the series

Σ_{n=1}^{∞} σ²(x_n)/n² .   (1)
This condition is the best in the sense that for any series of constants b_n such that

Σ_{n=1}^{∞} b_n/n² = +∞ ,

we can build a series of mutually independent random variables x_n such that

σ²(x_n) = b_n

and the corresponding arithmetic means s_n will not be strongly stable.
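For uniformly bounded summands the convergence of Σ σ²(x_n)/n² is automatic, and a single long simulated trajectory already shows the means settling down (an illustration with x_n = ±1, not part of the original text):

```python
import random

random.seed(3)

# One sample path: independent x_n = +-1 (mean 0, variance 1), so the series
# sum sigma^2(x_n)/n^2 converges and s_n = (x_1 + ... + x_n)/n should tend to 0.
N = 200000
partial, path = 0.0, []
for n in range(1, N + 1):
    partial += random.choice((-1.0, 1.0))
    path.append(partial / n)

late = path[N // 2:]                       # the tail of the trajectory
m_late = max(abs(v) for v in late)
print(m_late)                              # already uniformly small
```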
If all the x_n have the same distribution function F^{(x)}(a), then the existence of the mathematical expectation

E(x) = ∫_{−∞}^{+∞} a dF^{(x)}(a)

is necessary and sufficient for the strong stability of s_n; the stability in this case is always normal⁹.

Consider now a series Σ_n x_n of mutually independent random variables x_1, x_2, ..., x_n, ..., and set y_n = x_n if |x_n| ≤ 1 and y_n = 0 if |x_n| > 1.

⁸ Cf. A. KOLMOGOROV, Sur la loi forte des grandes nombres, C. R. Acad. Sci. Paris v. 191, 1930, pp. 910-911.
⁹ The proof of this statement has not yet been published.
Then in order that the series Σ_n x_n converge with probability one, it is necessary and sufficient¹⁰ that the following series converge simultaneously:

Σ_{n=1}^{∞} P{|x_n| > 1} ,   Σ_{n=1}^{∞} E(y_n)   and   Σ_{n=1}^{∞} σ²(y_n) .
¹⁰ Cf. A. KHINTCHINE and A. KOLMOGOROV, On the Convergence of Series, Rec. Math. Soc. Moscow, v. 32, 1925, pp. 668-677.
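For summands of the hypothetical form x_n = ±a_n (each sign with probability 1/2, a_n ≤ 1), the first two series vanish and everything rests on Σ a_n². The sketch below checks this deterministically for a_n = 1/n against a_n = 1/√n and follows one sample path of the convergent case:

```python
import math, random

random.seed(4)

def tail_var(a, N):            # sum of a_n^2 over N < n <= 2N
    return sum(a(n) ** 2 for n in range(N + 1, 2 * N + 1))

# a_n = 1/n: sum a_n^2 converges, so the random series converges a.s.;
# a_n = 1/sqrt(n): sum a_n^2 = sum 1/n diverges, so it diverges a.s.
assert tail_var(lambda n: 1.0 / n, 100000) < 1e-4
assert tail_var(lambda n: 1.0 / math.sqrt(n), 100000) > 0.69   # ~ ln 2

# one sample path of the convergent case: partial sums become Cauchy
S = 0.0
checkpoints = {}
for n in range(1, 20001):
    S += random.choice((-1.0, 1.0)) / n
    if n in (10000, 20000):
        checkpoints[n] = S
inc = abs(checkpoints[20000] - checkpoints[10000])
print(inc)   # tiny tail increment
```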
Appendix
ZERO-OR-ONE LAW IN THE THEORY OF PROBABILITY
We have noticed several cases in which certain limiting
probabilities are necessarily equal to zero or one. For example,
the probability of convergence of a series of independent random
variables may assume only these two values¹. We shall prove now
a general theorem including many such cases.
THEOREM: Let x_1, x_2, ..., x_n, ... be any random variables and let f(x_1, x_2, ..., x_n, ...) be a Baire function² of the variables x_1, x_2, ..., x_n, ... such that the conditional probability

P_{x_1, x_2, ..., x_n}{f(x) = 0}

of the relation

f(x_1, x_2, ..., x_n, ...) = 0

remains, when the first n variables x_1, x_2, ..., x_n are known, equal to the absolute probability

P{f(x) = 0}   (1)

for every n. Under these conditions the probability (1) equals zero or one.
In particular, the assumptions of this theorem are fulfilled if the variables x_n are mutually independent and if the value of the function f(x) remains unchanged when only a finite number of variables are changed.
Proof of the Theorem: Let us denote by A the event

f(x) = 0 .

We shall also investigate the field 𝔎 of all events which can be defined through some relations among a finite number of the variables x_n. If the event B belongs to 𝔎, then, according to the conditions of the theorem,

P_B(A) = P(A) .   (2)

In the case P(A) = 0 our theorem is already true. Let now P(A) > 0. Then from (2) follows the formula

P_A(B) = P_B(A) P(B) / P(A) = P(B) ,   (3)

and therefore P(B) and P_A(B) are two completely additive set functions, coinciding on 𝔎; therefore they must remain equal to each other on every set of the Borel extension B𝔎 of the field 𝔎. Therefore, in particular,

P(A) = P_A(A) = 1 ,

which proves our theorem.

¹ Cf. Chap. VI, § 5. The same thing is true of the probability

P{s_n − d_n → 0}

in the strong law of large numbers; at least, when the variables x_n are mutually independent.
² A Baire function is one which can be obtained by successive passages to the limit of sequences of functions, starting with polynomials.
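A finite-n proxy (heuristic only, not a proof) makes the dichotomy visible. The event that the means of independent Bernoulli(p) trials settle above 1/2 is unchanged if finitely many x_n are altered, so by the theorem its probability is 0 or 1; empirically the frequency is pinned at the extremes.

```python
import random

random.seed(5)

# Estimate, for two values of p, the frequency with which the mean of n
# Bernoulli(p) trials exceeds 1/2 (a stand-in for the tail event {lim s_n > 1/2}).
def estimate(p, n=2000, trials=200):
    hits = 0
    for _ in range(trials):
        mean = sum(random.random() < p for _ in range(n)) / n
        hits += mean > 0.5
    return hits / trials

p_high, p_low = estimate(0.7), estimate(0.3)
print(p_high, p_low)   # pinned at the extremes: 1.0 and 0.0
```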
Several other cases in which we can state that certain probabilities can assume only the values one and zero were discovered by P. Lévy. See P. LÉVY, Sur un théorème de M. Khintchine, Bull. des Sci. Math. v. 55, 1931, pp. 145-160, Theorem II.
BIBLIOGRAPHY

BERNSTEIN, S.:
[1]. On the axiomatic foundation of the theory of probability. (In Russian). Mitt. Math. Ges. Charkov, 1917, pp. 209-274.
[2]. Theory of probability, 2nd edition. (In Russian). Moscow, 1927. Government publication RSFSR.

BOREL, E.:
[1]. Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. mat. Palermo Vol. 27 (1909) pp. 247-271.
[2]. Principes et formules classiques, fasc. 1 du tome I du Traité des probabilités par E. BOREL et divers auteurs. Paris: Gauthier-Villars 1925.
[3]. Applications à l'arithmétique et à la théorie des fonctions, fasc. 1 du tome II du Traité des probabilités par E. BOREL et divers auteurs. Paris: Gauthier-Villars 1926.

CANTELLI, F. P.:
[1]. Una teoria astratta del Calcolo delle probabilità. Giorn. Ist. Ital. Attuari Vol. 3 (1932) pp. 257-265.
[2]. Sulla legge dei grandi numeri. Mem. Accad. Lincei Vol. 11 (1916).
[3]. Sulla probabilità come limite della frequenza. Rend. Accad. Lincei Vol. 26 (1917) pp. 39-45.

COPELAND, H.:
[1]. The theory of probability from the point of view of admissible numbers. Ann. Math. Statist. Vol. 3 (1932) pp. 143-156.

DÖRGE, K.:
[1]. Zu der von R. von Mises gegebenen Begründung der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 32 (1930) pp. 232-258.

FRÉCHET, M.:
[1]. Sur la convergence en probabilité. Metron Vol. 8 (1930) pp. 1-48.
[2]. Recherches théoriques modernes, fasc. 3 du tome I du Traité des probabilités par E. BOREL et divers auteurs. Paris: Gauthier-Villars.

KOLMOGOROV, A.:
[1]. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. Vol. 104 (1931) pp. 415-458.
[2]. The general theory of measure and the theory of probability. (In Russian). Sbornik trudow sektii totshnych nauk K.A., Vol. 1 (1929) pp. 8-21.

LÉVY, P.:
[1]. Calcul des probabilités. Paris: Gauthier-Villars.

LOMNICKI, A.:
[1]. Nouveaux fondements du calcul des probabilités. Fundam. Math. Vol. 4 (1923) pp. 34-71.

MISES, R. v.:
[1]. Wahrscheinlichkeitsrechnung. Leipzig u. Wien: Fr. Deuticke 1931.
[2]. Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 5 (1919) pp. 52-99.
[3]. Wahrscheinlichkeitsrechnung, Statistik und Wahrheit. Wien: Julius Springer 1928.
[3']. Probability, Statistics and Truth (translation of above). New York: The MacMillan Company 1939.

REICHENBACH, H.:
[1]. Axiomatik der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 34 (1932) pp. 568-619.

SLUTSKY, E.:
[1]. Über stochastische Asymptoten und Grenzwerte. Metron Vol. 5 (1925) pp. 3-89.
[2]. On the question of the logical foundation of the theory of probability. (In Russian). Westnik Statistiki, Vol. 12 (1922), pp. 18-21.

STEINHAUS, H.:
[1]. Les probabilités dénombrables et leur rapport à la théorie de la mesure. Fundam. Math. Vol. 4 (1923) pp. 286-310.

TORNIER, E.:
[1]. Wahrscheinlichkeitsrechnung und Zahlentheorie. J. reine angew. Math. Vol. 160 (1929) pp. 177-198.
[2]. Grundlagen der Wahrscheinlichkeitsrechnung. Acta math. Vol. 60 (1933) pp. 239-380.
SUPPLEMENTARY BIBLIOGRAPHY
NOTES TO SUPPLEMENTARY BIBLIOGRAPHY
The fundamental work on the measure-theoretic approach to probability theory is A. N. Kolmogorov's Grundbegriffe der Wahrscheinlichkeitsrechnung, of which the present work is an English translation. It is not an overstatement to say that for the past twenty-three years most of the research work in probability has been influenced by this approach, and that the axiomatic theory advanced by Kolmogorov is considered by workers in probability and statistics to be the correct one.
The publication of Kolmogorov's Grundbegriffe initiated a new
era in the theory of probability and its methods ; and the amount
of research generated by the fundamental concepts due to Kolmo
gorov has been very great indeed. In preparing this second edition
of the English translation of Kolmogorov's monograph, it seemed
desirable to give a bibliography that would in some way reflect
the present status and direction of research activity in the theory
of probability.
In recent years many excellent books have appeared. Three of the most outstanding in this group are those by Doob [12], Feller [17], and Loève [54]. Other books dealing with general probability theory and specialized topics in probability are: [2], [3], [6], [7], [9], [19], [23], [26], [27], [28], [34], [39], [41], [42], [47], [49], [50], [67], [70], [72]. Since these books contain many references to the literature, an attempt will be made in this bibliography to list some of the research papers that have appeared in the past few years and several that are in the course of publication.
The model developed by Kolmogorov can be briefly described as follows: In every situation (that is, an experiment, observation, etc.) in which random factors enter, there is an associated probability space or triple (Ω, 𝔖, P), where Ω is an abstract space (the space of elementary events), 𝔖 is a σ-algebra of subsets of Ω (the sets of events), and P(E) is a measure (the probability of the event E) defined for E ∈ 𝔖 and satisfying the condition P(Ω) = 1. The Kolmogorov model has recently been discussed by
Łoś [56], who considers the use of abstract algebras and σ-algebras instead of algebras and σ-algebras of sets. Kolmogorov [44] has also considered the use of metric Boolean algebras in probability. There are many problems, especially in theoretical physics, that do not fit into the Kolmogorov theory, the reason being that these problems involve unbounded measures. Rényi [68] has developed a general axiomatic theory of probability (which contains Kolmogorov's theory as a special case) in which unbounded measures are allowed. The fundamental concept in this theory is the conditional probability of an event. Császár [10] has studied the measure-theoretic structure of the conditional probability spaces that occur in Rényi's theory.
In another direction, examples have been given by various authors which point up the fact that Kolmogorov's theory is too general. Gnedenko and Kolmogorov [27] have introduced a more restricted concept which has been termed a perfect probability space. A perfect probability space is a triple (Ω, 𝔖, P) such that for any real-valued 𝔖-measurable function g and any linear set B for which {ω: g(ω) ∈ B} ∈ 𝔖, there is a Borel set D ⊂ B such that P{ω: g(ω) ∈ D} = P{ω: g(ω) ∈ B}. Recently, Blackwell [5] has introduced a concept that is more restricted than that of a perfect space. The concept introduced is that of a Lusin space. A Lusin space is a pair (Ω, 𝔖) such that (a) 𝔖 is separable, and (b) the range of every real-valued 𝔖-measurable function g on Ω is an analytic set. It has been shown that if (Ω, 𝔖) is a Lusin space and P is any probability measure on 𝔖, then (Ω, 𝔖, P) is a perfect probability space.
In § 6 of Chap. I, Kolmogorov gives the definition of a Markov chain. In recent years the theory of Markov chains and processes has been one of the most active areas of research in probability. An excellent introduction to this theory is given in [17]. Other references are [2], [3], [6], [12], [19], [23], [26], [34], [39], [50], [54], [67], [70], [72]. Two papers of interest are those of Harris and Robbins [29] on the ergodic theory of Markov chains, and Chung [8] on the theory of continuous parameter processes with a denumerable number of states. The paper by Chung unifies and extends the results due to Doob (cf. [12]) and Lévy [51], [52], [53].
A number of workers in probability are utilizing the theory of semi-groups [30] in the study of Markov processes and their structural properties [63]. In this approach, due primarily to Yosida [80], a one-parameter (discrete or continuous) semi-group of operators from a Banach space to itself defines the Markov process. Hille [32] and Kato [38] have used semi-group methods to integrate the Kolmogorov differential equations, and Kendall and Reuter [40] have investigated several pathological cases arising in the theory. Feller [18] and Hille [31] have studied the parabolic differential equations arising in the continuous case. Doob [13] has employed martingale theory in the semi-group approach to one-dimensional diffusion processes. Also, Hunt [33] has studied semi-groups of (probability) measures on Lie groups.
Recently several papers have appeared which are devoted to a more abstract approach to probability and consider random variables with values in a topological space which may have an algebraic structure. In [14], [21], [22], [58], [59], and [61], problems associated with Banach-space-valued random variables are considered; and in [4] similar problems are considered for Orlicz (generalized Lebesgue) spaces. Robbins [69] has considered random variables with values in any compact topological group. Segal [75] has studied the structure of probability algebras and has used this algebraic approach to extend Kolmogorov's theorem concerning the existence of real-valued random variables having any preassigned joint distribution (cf. § 4 of Chap. III). Segal [76, Chap. 3, § 13] has also considered a non-commutative probability theory.

Prohorov [66] has studied convergence properties of probability distributions defined on Banach spaces and other function spaces. These problems have been considered also by LeCam [48] and Parzen [64].
The measure-theoretic definition and basic properties of conditional probabilities and conditional expectations have been given by Kolmogorov (Chap. IV; cf. also [12] and [54]). Using an abstract approach, S. T. C. Moy [60] has considered the properties of conditional expectation as a linear transformation of the space of all extended real-valued measurable functions on a probability space into itself. In [61] she considers the conditional expectation of Banach-space-valued random variables. Nakamura and Turumaru [62] consider an expectation as a given operation of a C*-algebra; and Umegaki [79] considers conditional expectation as a mapping of a space of measurable operators belonging to an integrable class associated with a certain W*-algebra into itself. The work of Umegaki is concerned with the development of a non-commutative probability theory. The results of Segal [74], Dye [15], and others in abstract integration theory are utilized in the above studies. Other papers of interest are [1], [16], [36], and [45].
The L. Schwartz theory of distributions [73] has been utilized
by Gel'fand [24] in the study of generalized stochastic processes;
and by Fortet [20] and Ito [35] in the study of random
distributions.
Several books devoted to the study of limit theorems in
probability are available: [27], [42], [47], and [49]. In addition, [12]
and [54] should be consulted. Research and review papers of
interest are [11], [14], [25], [37], [46], [55], [57], [65], [71],
[77], and [78].
SUPPLEMENTARY BIBLIOGRAPHY

[1] ALDA, V., On Conditional Expectations, Czechoslovak Math. J., Vol. 5 (1955), pp. 503-505.
[2] ARLEY, N., On the Theory of Stochastic Processes and Their Application to the Theory of Cosmic Radiation, Copenhagen, 1943.
[3] BARTLETT, M. S., An Introduction to Stochastic Processes, Cambridge, 1955.
[4] BHARUCHA-REID, A. T., On Random Elements in Orlicz Spaces, (Abstract) Bull. Amer. Math. Soc., Vol. 62 (1956). To appear.
[5] BLACKWELL, D., On a Class of Probability Spaces, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[6] BLANC-LAPIERRE, A., and R. FORTET, Théorie des fonctions aléatoires, Paris, 1953.
[7] BOCHNER, S., Harmonic Analysis and the Theory of Probability, Berkeley and Los Angeles, 1955.
[8] CHUNG, K. L., Foundations of the Theory of Continuous Parameter Markov Chains, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[9] CRAMÉR, H., Mathematical Methods of Statistics, Princeton, 1946.
[10] CSÁSZÁR, Á., Sur la structure des espaces de probabilité conditionnelle, Acta Math. Acad. Sci. Hung., Vol. 6 (1955), pp. 337-361.
[11] DERMAN, C., and H. ROBBINS, The Strong Law of Large Numbers when the First Moment does not Exist, Proc. Nat. Acad. Sci. U.S.A., Vol. 41 (1955), pp. 586-587.
[12] DOOB, J. L., Stochastic Processes, New York, 1953.
[13] DOOB, J. L., Martingales and One-Dimensional Diffusion, Trans. Amer. Math. Soc., Vol. 78 (1955), pp. 168-208.
[14] DOSS, S., Sur le théorème limite central pour des variables aléatoires dans un espace de Banach, Publ. Inst. Statist. Univ. Paris, Vol. 3 (1954), pp. 143-148.
[15] DYE, H. A., The Radon-Nikodym Theorem for Finite Rings of Operators, Trans. Amer. Math. Soc., Vol. 72 (1952), pp. 243-280.
[16] FABIAN, V., A Note on the Conditional Expectations, Czechoslovak Math. J., Vol. 4 (1954), pp. 187-191.
[17] FELLER, W., An Introduction to Probability Theory and Its Applications, New York, 1950.
[18] FELLER, W., Diffusion Processes in One Dimension, Trans. Amer. Math. Soc., Vol. 77 (1954), pp. 1-31.
[19] FORTET, R., Calcul des probabilités, Paris, 1950.
[20] FORTET, R., Random Distributions with an Application to Telephone Engineering, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[21] FORTET, R., and E. MOURIER, Résultats complémentaires sur les éléments aléatoires prenant leurs valeurs dans un espace de Banach, Bull. Sci. Math. (2), Vol. 78 (1954), pp. 14-30.
[22] FORTET, R., and E. MOURIER, Les fonctions aléatoires comme éléments aléatoires dans les espaces de Banach, Stud. Math., Vol. 15 (1955), pp. 62-79.
[23] FRÉCHET, M., Recherches théoriques modernes sur le calcul des probabilités. II. Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles, Paris, 1938.
[24] GEL'FAND, I. M., Generalized Random Processes, Doklady Akad. Nauk SSSR (N.S.), Vol. 100 (1955), pp. 853-856. [In Russian.]
[25] GIHMAN, I. I., Some Limit Theorems for Conditional Distributions, Doklady Akad. Nauk SSSR (N.S.), Vol. 91 (1953), pp. 1003-1006. [In Russian.]
[26] GNEDENKO, B. V., Course in the Theory of Probability, Moscow-Leningrad, 1950. [In Russian.]
[27] GNEDENKO, B. V., and A. N. KOLMOGOROV, Limit Distributions for Sums of Independent Random Variables, Translated by K. L. Chung with an appendix by J. L. Doob, Cambridge, 1954.
[28] HALMOS, P. R., Measure Theory, New York, 1950.
[29] HARRIS, T. E., and H. ROBBINS, Ergodic Theory of Markov Chains Admitting an Infinite Invariant Measure, Proc. Nat. Acad. Sci. U.S.A., Vol. 39 (1953), pp. 860-864.
[30] HILLE, E., Functional Analysis and Semi-Groups, New York, 1948.
[31] HILLE, E., On the Integration Problem for Fokker-Planck's Equation in the Theory of Stochastic Processes, Onzième congrès des math. scand. (1949), pp. 185-194.
[32] HILLE, E., On the Integration of Kolmogoroff's Differential Equations, Proc. Nat. Acad. Sci. U.S.A., Vol. 40 (1954), pp. 20-26.
[33] HUNT, G. A., Semi-Groups of Measures on Lie Groups, Trans. Amer. Math. Soc., Vol. 81 (1956), pp. 264-293.
[34] ITÔ, K., Theory of Probability, Tokyo, 1953.
[35] ITÔ, K., Stationary Random Distributions, Mem. Coll. Sci. Univ. Kyoto, Ser. A Math., Vol. 28 (1954), pp. 209-223.
[36] JIŘINA, M., Conditional Probabilities on Strictly Separable σ-Algebras, Czechoslovak Math. J., Vol. 4 (1954), pp. 372-380. [In Czech.]
[37] KALLIANPUR, G., On a Limit Theorem for Dependent Random Variables, Doklady Akad. Nauk SSSR (N.S.), Vol. 101 (1955), pp. 13-16. [In Russian.]
[38] KATO, T., On the Semi-Groups Generated by Kolmogoroff's Differential Equations, J. Math. Soc. Japan, Vol. 6 (1954), pp. 1-15.
[39] KAWADA, Y., The Theory of Probability, Tokyo, 1952.
[40] KENDALL, D. G., and G. E. H. REUTER, Some Pathological Markov Processes with a Denumerable Infinity of States and the Associated Semigroups of Transformations in l, Proc. Symp. on Stochastic Processes (Amsterdam), 1954. To appear.
[41] KHINTCHINE, A., Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Berlin, 1933. [Reprint, CHELSEA PUBLISHING COMPANY.]
[42] KHINTCHINE, A., Limit Laws of Sums of Independent Random Variables, Moscow-Leningrad, 1938. [In Russian.]
[43] KOLMOGOROV, A., Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933. [The present work is an English translation of this.]
[44] KOLMOGOROV, A., Algèbres de Boole métriques complètes, VI Zjazd Mat. Pols., Warsaw (1948), pp. 21-30.
[45] KOLMOGOROV, A., A Theorem on the Convergence of Conditional Mathematical Expectations and Some of Its Applications, Comptes Rendus du Premier Congrès des Mathématiciens Hongrois (1952), pp. 367-386. [In Russian and Hungarian.]
[46] KOLMOGOROV, A., Some Work of Recent Years in the Field of Limit Theorems in the Theory of Probability, Vestnik Moskov. Univ. Ser. Fiz.-Mat. Estest. Nauk, Vol. 8 (1953), pp. 29-38.
[47] KUNISAWA, K., Limit Theorems in Probability Theory, Tokyo, 1949.
[48] LECAM, L., Convergence in Distribution of Stochastic Processes, Univ. California Publ. Statistics. To appear.
[49] LÉVY, P., Théorie de l'addition des variables aléatoires, Paris, 1937.
[50] LÉVY, P., Processus stochastiques et mouvement Brownien, Paris, 1948.
[51] LÉVY, P., Systèmes markoviens et stationnaires. Cas dénombrable, Ann. Sci. École Norm. Sup., Vol. 68 (1951), pp. 327-401.
[52] LÉVY, P., Complément à l'étude des processus de Markoff, Ann. Sci. École Norm. Sup., Vol. 69 (1952), pp. 203-212.
[53] LÉVY, P., Processus markoviens et stationnaires du cinquième type (infinité dénombrable d'états possibles, paramètre continu), C. R. Acad. Sci. Paris, Vol. 236 (1953), pp. 1680-1682.
[54] LOÈVE, M., Probability Theory, New York, 1955.
[55] LOÈVE, M., Variational Terms and the Central Limit Problem, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[56] ŁOŚ, J., On the Axiomatic Treatment of Probability, Colloq. Math., Vol. 3 (1955), pp. 125-137.
[57] MARSAGLIA, G., Iterated Limits and the Central Limit Theorem for Dependent Variables, Proc. Amer. Math. Soc., Vol. 5 (1954), pp. 987-991.
[58] MOURIER, E., Éléments aléatoires dans un espace de Banach, Ann. Inst. H. Poincaré, Vol. 13 (1953), pp. 161-244.
[59] MOURIER, E., L-Random Elements and L*-Random Elements in Banach Spaces, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[60] MOY, S. T. C., Characterizations of Conditional Expectation as a Transformation on Function Spaces, Pacific J. Math., Vol. 4 (1954), pp. 47-63.
[61] MOY, S. T. C., Conditional Expectations of Banach Space Valued Random Variables and Their Properties, (Abstract) Bull. Amer. Math. Soc., Vol. 62 (1956). To appear.
[62] NAKAMURA, M., and T. TURUMARU, Expectation in an Operator Algebra, Tôhoku Math. J., Vol. 6 (1954), pp. 182-188.
[63] NEVEU, J., Étude des semi-groupes de Markoff, (Thesis) Paris, 1955.
[64] PARZEN, E., Convergence in Distribution and Fourier-Stieltjes Transforms of Random Functions, (Abstract) Ann. Math. Statistics, Vol. 26 (1955), p. 771.
[65] PARZEN, E., A Central Limit Theorem for Multilinear Stochastic Processes, (Abstract) Ann. Math. Statistics, Vol. 27 (1956), p. 206.
[66] PROHOROV, Yu. V., Probability Distributions in Functional Spaces, Uspehi Matem. Nauk (N.S.), Vol. 8 (1953), pp. 165-167. [In Russian.]
[67] RÉNYI, A., The Calculus of Probabilities, Budapest, 1954. [In Hungarian.]
[68] RÉNYI, A., On a New Axiomatic Theory of Probability, Acta Math. Acad. Sci. Hung., Vol. 6 (1955), pp. 285-335.
[69] ROBBINS, H., On the Equidistribution of Sums of Independent Random Variables, Proc. Amer. Math. Soc., Vol. 4 (1953), pp. 786-799.
[70] ROMANOVSKI, V. I., Discrete Markov Chains, Moscow-Leningrad, 1949. [In Russian.]
[71] ROSENBLATT, M., A Central Limit Theorem and a Strong Mixing Condition, Proc. Nat. Acad. Sci. U.S.A., Vol. 42 (1956), pp. 43-47.
[72] SARYMSAKOV, T. A., Elements of the Theory of Markov Processes, Moscow, 1954. [In Russian.]
[73] SCHWARTZ, L., Théorie des distributions, Paris, 1950-51.
[74] SEGAL, I. E., A Non-Commutative Extension of Abstract Integration, Ann. of Math., Vol. 57 (1953), pp. 401-457.
[75] SEGAL, I. E., Abstract Probability Spaces and a Theorem of Kolmogoroff, Amer. J. Math., Vol. 76 (1954), pp. 721-732.
[76] SEGAL, I. E., A Mathematical Approach to Elementary Particles and Their Fields, University of Chicago, 1955. [Mimeographed Lecture Notes.]
[77] TAKANO, K., On Some Limit Theorems of Probability Distributions, Ann. Inst. Statist. Math., Tokyo, Vol. 6 (1954), pp. 87-118.
[78] TSURUMI, S., On the Strong Law of Large Numbers, Tôhoku Math. J., Vol. 7 (1955), pp. 166-170.
[79] UMEGAKI, H., Conditional Expectation in an Operator Algebra, Tôhoku Math. J., Vol. 6 (1954), pp. 177-181.
[80] YOSIDA, K., Operator Theoretical Treatment of Markoff's Process, Proc. Imp. Acad. Japan, Vol. 14 (1938), pp. 363-367.
CHELSEA SCIENTIFIC BOOKS
THEORY OF PROBABILITY
By B. V. GNEDENKO
This textbook, by Russia's leading probabilist, is suitable for
senior undergraduate and first-year graduate courses. It covers,
in highly readable form, a wide range of topics and, by carefully
selected exercises and examples, keeps the reader throughout in
close touch with problems in science and engineering.

The translation has been made from the fourth Russian edition
by Prof. B. D. Seckler. Earlier editions have won wide and
enthusiastic acceptance as a text at many leading colleges and
universities.

"extremely well written . . . suitable for individual study . . .
Gnedenko's book is a milestone in the writing on probability
theory."-Science.

Partial Contents: I. The Concept of Probability (Various
approaches to the definition. Space of Elementary Events.
Classical Definition. Geometrical Probability. Relative Frequency.
Axiomatic construction . . .). II. Sequences of Independent
Trials. III. Markov Chains. IV. Random Variables and
Distribution Functions (Continuous and discrete distributions.
Multidimensional d. functions. Functions of random variables.
Stieltjes integral). V. Numerical Characteristics of Random
Variables (Mathematical expectation. Variance . . . Moments).
VI. Law of Large Numbers (Mass phenomena. Tchebychev's
form of law. Strong law of large numbers . . .). VII.
Characteristic Functions (Properties. Inversion formula and
uniqueness theorem. Helly's theorems. Limit theorems. Char.
functs. for multidimensional random variables . . .). VIII.
Classical Limit Theorem (Liapunov's theorem. Local limit
theorem). IX. Theory of Infinitely Divisible Distribution Laws.
X. Theory of Stochastic Processes (Generalized Markov equation.
Continuous S. processes. Purely discontinuous S. processes.
Kolmogorov-Feller equations. Homogeneous S. processes with
independent increments. Stationary S. process. Stochastic
integral. Spectral theorem of S. processes. Birkhoff-Khinchine
ergodic theorem). XI. Elements of Queueing Theory (General
characterization of the problems. Birth-and-death processes.
Single-server queueing systems. Flows. Elements of the theory
of stand-by systems). XII. Elements of Statistics (Problems.
Variational series. Glivenko's Theorem and Kolmogorov's
criterion. Two-sample problem. Critical region . . . Confidence
limits). TABLES. BIBLIOGRAPHY. ANSWERS TO THE EXERCISES.

-4th ed. Summer, 1967. Approx. 500 pp. 6x9. [132] $9.50
GYROSCOPIC THEORY
By G. GREENHILL

This work is intended to serve as a collection in one place of
the various methods of the theoretical explanation of the motion
of a spinning body, and as a reference for mathematical formulas
required in practical work.

Originally published as a report to the British Advisory
Committee for Aeronautics.

CHAPTER HEADINGS: I. Steady Gyroscopic Motion. II.
Gyroscopic Applications. III. General Unsteady Motion of a Top
or Gyroscope. IV. Geometrical Representation of the Motion of
a Top. V. Algebraical Cases of Top Motion. VI. Numerical
Illustrations and Diagrams. VII. The Spherical Pendulum. VIII.
Motion referred to Moving Origin and Axes. IX. Dynamical
Problems of Steady Motion and Small Oscillation. Fold-out plates.

-1914-67. vi + 277 pp. + Plates. 6½x10¾. [205] $9.50
COMMUTATIVE NORMED RINGS
By I. M. GELFAND, D. A. RAIKOV, and G. E. SHILOV
Translated from the Russian. In the second English edition, to
appear in 1967, Chapter II has been revised in accordance with
a manuscript especially prepared for this edition by Professor
Shilov.

Partial Contents: CHAPS. I AND II. General Theory of
Commutative Normed Rings. III. Ring of Absolutely Integrable
Functions and their Discrete Analogues. IV. Harmonic Analysis
on Commutative Locally Compact Groups. V. Ring of Functions
of Bounded Variation on a Line. VI. Regular Rings. VII. Rings
with Uniform Convergence. VIII. Normed Rings with an
Involution and their Representations. IX. Decomposition of a
Normed Ring into a Direct Sum of Ideals. HISTORICO-
BIBLIOGRAPHICAL NOTES. BIBLIOGRAPHY.

-2nd ed. 1967. 306 pp. 6x9. [170] $6.50
LES INTÉGRALES DE STIELTJES et leurs Applications
aux Problèmes de la Physique Mathématique
By N. GUNTHER
The present work is a reprint of Vol. I of the publications of
the V. A. Steklov Institute of Mathematics, in Moscow. The
text is in French.

-1932-49. 498 pp. 5⅜x8. [63] $6.50