FOUNDATIONS
OF THE
THEORY OF PROBABILITY
BY
A. N. KOLMOGOROV
Second English Edition
TRANSLATION EDITED BY
NATHAN MORRISON
WITH AN ADDED BIBLIOGRAPHY BY
A. T. BHARUCHA-REID
UNIVERSITY OF OREGON
CHELSEA PUBLISHING COMPANY
NEW YORK
COPYRIGHT 1950 BY CHELSEA PUBLISHING COMPANY
COPYRIGHT © 1956, CHELSEA PUBLISHING COMPANY
LIBRARY OF CONGRESS CATALOGUE CARD NUMBER 56-11512
PRINTED IN THE UNITED STATES OF AMERICA
EDITOR'S NOTE
In the preparation of this English translation of Professor
Kolmogorov's fundamental work, the original German monograph
Grundbegriffe der Wahrscheinlichkeitsrechnung, which appeared
in the Ergebnisse Der Mathematik in 1933, and also a Russian
translation by G. M. Bavli published in 1936 have been used.
It is a pleasure to acknowledge the invaluable assistance of
two friends and former colleagues, Mrs. Ida Rhodes and Mr.
D. V. Varley, and also of my niece, Gizella Gross.
Thanks are also due to Mr. Roy Kuebler who made available
for comparison purposes his independent English translation of
the original German monograph.
Nathan Morrison
PREFACE
The purpose of this monograph is to give an axiomatic
foundation for the theory of probability. The author set himself
the task of putting in their natural place, among the general
notions of modern mathematics, the basic concepts of probability
theory-concepts which until recently were considered to be quite
peculiar.
This task would have been a rather hopeless one before the
introduction of Lebesgue's theories of measure and integration.
However, after Lebesgue's publication of his investigations, the
analogies between measure of a set and probability of an event,
and between integral of a function and mathematical expectation
of a random variable, became apparent. These analogies allowed
of further extensions; thus, for example, various properties of
independent random variables were seen to be in complete analogy
with the corresponding properties of orthogonal functions. But
if probability theory was to be based on the above analogies, it
still was necessary to make the theories of measure and integration independent of the geometric elements which were in the
foreground with Lebesgue. This has been done by Frechet.
While a conception of probability theory based on the above
general viewpoints has been current for some time among certain
mathematicians, there was lacking a complete exposition of the
whole system, free of extraneous complications. (Cf., however,
the book by Frechet, [2] in the bibliography.)
I wish to call attention to those points of the present exposition which are outside the above-mentioned range of ideas familiar to the specialist. They are the following: Probability distributions in infinite-dimensional spaces (Chapter III, § 4); differentiation and integration of mathematical expectations with respect to a parameter (Chapter IV, § 5); and especially the theory of conditional probabilities and conditional expectations (Chapter V). It should be emphasized that these new problems arose, of necessity, from some perfectly concrete physical problems.1
1 Cf., e.g., the paper by M. Leontovich quoted in footnote 6 on p. 46; also the joint paper by the author and M. Leontovich, Zur Statistik der kontinuierlichen Systeme und des zeitlichen Verlaufes der physikalischen Vorgänge, Phys. Jour. of the USSR, Vol. 3, 1933, pp. 35-63.
The sixth chapter contains a survey, without proofs, of some
results of A. Khinchine and the author of the limitations on the applicability of the ordinary and of the strong law of large numbers. The bibliography contains some recent works which should
be of interest from the point of view of the foundations of the
subject.
I wish to express my warm thanks to Mr. Khinchine, who
has read carefully the whole manuscript and proposed several
improvements.
Kljasma near Moscow, Easter 1933.
A. Kolmogorov
CONTENTS

Page
EDITOR'S NOTE ..... iii
PREFACE ..... v

I. ELEMENTARY THEORY OF PROBABILITY
§ 1. Axioms ..... 2
§ 2. The relation to experimental data ..... 3
§ 3. Notes on terminology ..... 5
§ 4. Immediate corollaries of the axioms; conditional probabilities; Theorem of Bayes ..... 6
§ 5. Independence ..... 8
§ 6. Conditional probabilities as random variables; Markov chains ..... 12

II. INFINITE PROBABILITY FIELDS
§ 1. Axiom of Continuity ..... 14
§ 2. Borel fields of probability ..... 16
§ 3. Examples of infinite fields of probability ..... 18

III. RANDOM VARIABLES
§ 1. Probability functions ..... 21
§ 2. Definition of random variables and of distribution functions ..... 22
§ 3. Multi-dimensional distribution functions ..... 24
§ 4. Probabilities in infinite-dimensional spaces ..... 27
§ 5. Equivalent random variables; various kinds of convergence ..... 33

IV. MATHEMATICAL EXPECTATIONS
§ 1. Abstract Lebesgue integrals ..... 37
§ 2. Absolute and conditional mathematical expectations ..... 39
§ 3. The Tchebycheff inequality ..... 42
§ 4. Some criteria for convergence ..... 43
§ 5. Differentiation and integration of mathematical expectations with respect to a parameter ..... 44

V. CONDITIONAL PROBABILITIES AND MATHEMATICAL EXPECTATIONS
§ 1. Conditional probabilities ..... 47
§ 2. Explanation of a Borel paradox ..... 50
§ 3. Conditional probabilities with respect to a random variable ..... 51
§ 4. Conditional mathematical expectations ..... 52

VI. INDEPENDENCE; THE LAW OF LARGE NUMBERS
§ 1. Independence ..... 57
§ 2. Independent random variables ..... 58
§ 3. The Law of Large Numbers ..... 61
§ 4. Notes on the concept of mathematical expectation ..... 64
§ 5. The Strong Law of Large Numbers; convergence of a series ..... 66

APPENDIX. Zero-or-one law in the theory of probability ..... 69
BIBLIOGRAPHY ..... 73
NOTES TO SUPPLEMENTARY BIBLIOGRAPHY ..... 77
SUPPLEMENTARY BIBLIOGRAPHY ..... 81
Chapter I
ELEMENTARY THEORY OF PROBABILITY
We define as elementary theory of probability that part of
the theory in which we have to deal with probabilities of only a
finite number of events. The theorems which we derive here can
be applied also to the problems connected with an infinite number
of random events. However, when the latter are studied, essentially new principles are used. Therefore the only axiom of the mathematical theory of probability which deals particularly with the case of an infinite number of random events is not introduced until the beginning of Chapter II (Axiom VI).
The theory of probability, as a mathematical discipline, can
and should be developed from axioms in exactly the same way
as Geometry and Algebra. This means that after we have defined
the elements to be studied and their basic relations, and have
stated the axioms by which these relations are to be governed,
all further exposition must be based exclusively on these axioms,
independent of the usual concrete meaning of these elements and
their relations.
In accordance with the above, in § 1 the concept of a field of
probabilities is defined as a system of sets which satisfies certain
conditions. What the elements of this set represent is of no importance in the purely mathematical development of the theory of probability (cf. the introduction of basic geometric concepts in the Foundations of Geometry by Hilbert, or the definitions of groups, rings and fields in abstract algebra).
Every axiomatic (abstract) theory admits, as is well known,
of an unlimited number of concrete interpretations besides those
from which it was derived. Thus we find applications in fields of
science which have no relation to the concepts of random event
and of probability in the precise meaning of these words.
The postulational basis of the theory of probability can be
established by different methods in respect to the selection of
axioms as well as in the selection of basic concepts and relations.
However, if our aim is to achieve the utmost simplicity both in
the system of axioms and in the further development of the
theory, then the postulational concepts of a random event and
its probability seem the most suitable. There are other postulational systems of the theory of probability, particularly those in
which the concept of probability is not treated as one of the basic
concepts, but is itself expressed by means of other concepts.1
However, in that case, the aim is different, namely, to tie up as
closely as possible the mathematical theory with the empirical
development of the theory of probability.
§ 1. Axioms2
Let E be a collection of elements ξ, η, ζ, …, which we shall call elementary events, and 𝔉 a set of subsets of E; the elements of the set 𝔉 will be called random events.

I. 𝔉 is a field3 of sets.
II. 𝔉 contains the set E.
III. To each set A in 𝔉 is assigned a non-negative real number P(A). This number P(A) is called the probability of the event A.
IV. P(E) equals 1.
V. If A and B have no element in common, then

P(A + B) = P(A) + P(B).

A system of sets, 𝔉, together with a definite assignment of numbers P(A), satisfying Axioms I-V, is called a field of probability.

Our system of Axioms I-V is consistent. This is proved by the following example. Let E consist of the single element ξ and let 𝔉 consist of E and the null set 0. P(E) is then set equal to 1 and P(0) equals 0.
1 For example, R. von Mises [1] and [2] and S. Bernstein [1].
2 The reader who wishes from the outset to give a concrete meaning to the following axioms is referred to § 2.
3 Cf. HAUSDORFF, Mengenlehre, 1927, p. 78. A system of sets is called a field if the sum, product, and difference of two sets of the system also belong to the same system. Every non-empty field contains the null set 0. Using Hausdorff's notation, we designate the product of A and B by AB; the sum by A + B in the case where AB = 0; and in the general case by A + B; the difference of A and B by A − B. The set E − A, which is the complement of A, will be denoted by Ā. We shall assume that the reader is familiar with the fundamental rules of operations of sets and their sums, products, and differences. All subsets of E will be designated by Latin capitals.
Our system of axioms is not, however, complete, for in various
problems in the theory of probability different fields of proba
bility have to be examined.
The Construction of Fields of Probability. The simplest fields of probability are constructed as follows. We take an arbitrary finite set E = {ξ₁, ξ₂, …, ξ_k} and an arbitrary set {p₁, p₂, …, p_k} of non-negative numbers with the sum p₁ + p₂ + … + p_k = 1. 𝔉 is taken as the set of all subsets in E, and we put

P{ξ_{i₁}, ξ_{i₂}, …, ξ_{iλ}} = p_{i₁} + p_{i₂} + … + p_{iλ}.

In such cases, p₁, p₂, …, p_k are called the probabilities of the elementary events ξ₁, ξ₂, …, ξ_k, or simply elementary probabilities. In this way are derived all possible finite fields of probability in which 𝔉 consists of the set of all subsets of E. (The field of probability is called finite if the set E is finite.) For further examples see Chap. II, § 3.
§ 2. The Relation to Experimental Data4
We apply the theory of probability to the actual world of
experiments in the following manner:
1) There is assumed a complex of conditions, S, which allows
of any number of repetitions.
2) We study a definite set of events which could take place as
a result of the establishment of the conditions S. In individual
cases where the conditions are realized, the events occur, generally, in different ways. Let E be the set of all possible variants ξ₁, ξ₂, … of the outcome of the given events. Some of these variants might in general not occur. We include in set E all the variants which we regard a priori as possible.
3) If the variant of the events which has actually occurred
4 The reader who is interested in the purely mathematical development of
the theory only, need not read this section, since the work following it is based
only upon the axioms in § 1 and makes no use of the present discussion. Here
we limit ourselves to a simple explanation of how the axioms of the theory of
probability arose and disregard the deep philosophical dissertations on the
concept of probability in the experimental world. In establishing the premises
necessary for the applicability of the theory of probability to the world of
actual events, the author has used, in large measure, the work of R. v. Mises,
[1] pp. 21-27.
upon realization of conditions S belongs to the set A (defined in
any way), then we say that the event A has taken place.
Example: Let the complex S of conditions be the tossing of a coin two times. The set of events mentioned in Paragraph 2) consists of the fact that at each toss either a head or a tail may come up. From this it follows that only four different variants (elementary events) are possible, namely: HH, HT, TH, TT. If the "event A" connotes the occurrence of a repetition, then it will consist of the happening of either the first or the fourth of the four elementary events. In this manner, every event may be regarded as a set of elementary events.
4) Under certain conditions, which we shall not discuss here,
we may assume that to an event A which may or may not occur
under conditions S, is assigned a real number P(A) which has
the following characteristics:
(a) One can be practically certain that if the complex of conditions S is repeated a large number of times, n, then if m be the number of occurrences of event A, the ratio m/n will differ very slightly from P(A).
(b) If P(A) is very small, one can be practically certain that
when conditions S are realized only once, the event A would not
occur at all.
The Empirical Deduction of the Axioms. In general, one may assume that the system 𝔉 of the observed events A, B, C, … to which are assigned definite probabilities, form a field containing as an element the set E (Axioms I, II, and the first part of III, postulating the existence of probabilities). It is clear that

§ 4. Immediate Corollaries of the Axioms; Conditional Probabilities; Theorem of Bayes

If P(A) > 0, then the quotient

P_A(B) = P(AB) / P(A)    (5)

is defined to be the conditional probability of the event B under the condition A.

From (5) it follows immediately that

P(AB) = P(A) P_A(B).    (6)
And by induction we obtain the general formula (the Multiplication Theorem):

P(A₁A₂ … A_n) = P(A₁) P_{A₁}(A₂) P_{A₁A₂}(A₃) … P_{A₁A₂…A_{n−1}}(A_n).    (7)

The following theorems follow easily:

P_A(B) ≥ 0,    (8)
P_A(E) = 1,    (9)
P_A(B + C) = P_A(B) + P_A(C).    (10)
Comparing formulae (8)-(10) with Axioms III-V, we find that the system 𝔉 of sets together with the set function P_A(B) (provided A is a fixed set) form a field of probability and therefore, all the above general theorems concerning P(B) hold true for the conditional probability P_A(B) (provided the event A is fixed).

It is also easy to see that

P_A(A) = 1.

From (6) and the analogous formula

P(AB) = P(B) P_B(A)    (11)

we obtain the important formula

P_B(A) = P(A) P_A(B) / P(B),    (12)

which contains, in essence, the Theorem of Bayes.
THE THEOREM ON TOTAL PROBABILITY: Let A₁ + A₂ + … + A_n = E (this assumes that the events A₁, A₂, …, A_n are mutually exclusive) and let X be arbitrary. Then

P(X) = P(A₁) P_{A₁}(X) + P(A₂) P_{A₂}(X) + … + P(A_n) P_{A_n}(X).    (13)

Proof:

X = A₁X + A₂X + … + A_nX;

using (4) we have

P(X) = P(A₁X) + P(A₂X) + … + P(A_nX)

and according to (6) we have at the same time

P(A_iX) = P(A_i) P_{A_i}(X).
THE THEOREM OF BAYES: Let A₁ + A₂ + … + A_n = E and X be arbitrary; then

P_X(A_i) = P(A_i) P_{A_i}(X) / [P(A₁) P_{A₁}(X) + P(A₂) P_{A₂}(X) + … + P(A_n) P_{A_n}(X)],    (14)

i = 1, 2, 3, …, n.
A₁, A₂, …, A_n are often called "hypotheses" and formula (14) is considered as the probability P_X(A_i) of the hypothesis A_i after the occurrence of event X. [P(A_i) then denotes the a priori probability of A_i.]

Proof: From (12) we have

P_X(A_i) = P(A_i) P_{A_i}(X) / P(X).
To obtain the formula (14) it only remains to substitute for the
probability P (X) its value derived from (13) by applying the
theorem on total probability.
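Formulas (13) and (14) are easy to check numerically. A sketch with three hypotheses; the numerical values of P(A_i) and P_{A_i}(X) below are illustrative assumptions, not from the text:

```python
# Total probability (13) and Bayes (14) for a decomposition A1 + A2 + A3 = E.
P_A   = [0.5, 0.3, 0.2]        # a priori probabilities P(A_i); illustrative
P_X_A = [0.1, 0.4, 0.9]        # conditional probabilities P_{A_i}(X); illustrative

# (13): P(X) = sum over i of P(A_i) P_{A_i}(X)
P_X = sum(pa * px for pa, px in zip(P_A, P_X_A))

# (14): P_X(A_i) = P(A_i) P_{A_i}(X) / P(X)
posterior = [pa * px / P_X for pa, px in zip(P_A, P_X_A)]

# The a posteriori probabilities again form a decomposition of certainty:
assert abs(sum(posterior) - 1.0) < 1e-12
assert abs(P_X - 0.35) < 1e-12
```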
§ 5. Independence
The concept of mutual independence of two or more experiments holds, in a certain sense, a central position in the theory of probability. Indeed, as we have already seen, the theory of probability can be regarded from the mathematical point of view as a special application of the general theory of additive set functions. One naturally asks, how did it happen that the theory of
probability developed into a large individual science possessing
its own methods?
In order to answer this question, we must point out the specialization undergone by general problems in the theory of additive set functions when they are proposed in the theory of probability.
The fact that our additive set function P(A) is non-negative
and satisfies the condition P(E) = 1, does not in itself cause new
difficulties. Random variables (see Chap. III) from a mathematical point of view represent merely functions measurable with respect to P(A), while their mathematical expectations are abstract Lebesgue integrals. (This analogy was explained fully for the first time in the work of Frechet6.) The mere introduction of the above concepts, therefore, would not be sufficient to produce a basis for the development of a large new theory.
Historically, the independence of experiments and random
variables represents the very mathematical concept that has given
the theory of probability its peculiar stamp. The classical work
of Laplace, Poisson, Tchebychev, Markov, Liapounov, Mises, and
6 See Frechet [1] and [2].
Bernstein is actually dedicated to the fundamental investigation
of series of independent random variables. Though the latest
dissertations (Markov, Bernstein and others) frequently fail to
assume complete independence, they nevertheless reveal the
necessity of introducing analogous, weaker, conditions, in order
to obtain sufficiently significant results (see in this chapter § 6,
Markov chains).
We thus see, in the concept of independence, at least the germ
of the peculiar type of problem in probability theory. In this
book, however, we shall not stress that fact, for here we are
interested mainly in the logical foundation for the specialized
investigations of the theory of probability.
In consequence, one of the most important problems in the philosophy of the natural sciences is, in addition to the well-known one regarding the essence of the concept of probability itself, to make precise the premises which would make it possible to regard any given real events as independent. This question, however, is beyond the scope of this book.
Let us turn to the definition of independence. Given n experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n), that is, n decompositions

E = A₁^(i) + A₂^(i) + … + A_{r_i}^(i),    i = 1, 2, …, n,

of the basic set E. It is then possible to assign r = r₁r₂…r_n probabilities (in the general case)

p_{q₁q₂…q_n} = P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n)) ≥ 0,

which are entirely arbitrary except for the single condition7 that

Σ_{q₁, q₂, …, q_n} p_{q₁q₂…q_n} = 1.    (1)

DEFINITION I. n experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) are called mutually independent, if for any q₁, q₂, …, q_n the following equation holds true:

P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n)) = P(A_{q₁}^(1)) P(A_{q₂}^(2)) … P(A_{q_n}^(n)).    (2)
7 One may construct a field of probability with arbitrary probabilities subject only to the above-mentioned conditions, as follows: E is composed of r elements ξ_{q₁q₂…q_n}. Let the corresponding elementary probabilities be p_{q₁q₂…q_n}, and finally let A_q^(i) be the set of all ξ_{q₁q₂…q_n} for which q_i = q.
Among the r equations in (2), there are only r − r₁ − r₂ − … − r_n + n − 1 independent equations8.

THEOREM I. If n experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) are mutually independent, then any m of them (m < n), 𝔄^(i₁), 𝔄^(i₂), …, 𝔄^(i_m), are also independent9.

In the case of independence we then have the equations:

P(A_{q₁}^(i₁) A_{q₂}^(i₂) … A_{q_m}^(i_m)) = P(A_{q₁}^(i₁)) P(A_{q₂}^(i₂)) … P(A_{q_m}^(i_m))    (3)

(all i_k must be different).
DEFINITION II. n events A₁, A₂, …, A_n are mutually independent, if the decompositions (trials)

E = A_k + Ā_k    (k = 1, 2, …, n)

are independent.

In this case r₁ = r₂ = … = r_n = 2, r = 2ⁿ; therefore, of the 2ⁿ equations in (2) only 2ⁿ − n − 1 are independent. The necessary and sufficient conditions for the independence of the events A₁, A₂, …, A_n are the following 2ⁿ − n − 1 equations10:

P(A_{i₁} A_{i₂} … A_{i_m}) = P(A_{i₁}) P(A_{i₂}) … P(A_{i_m}),    (4)

m = 2, 3, …, n;  1 ≤ i₁ < i₂ < … < i_m ≤ n.

All of these equations are mutually independent.
In the case n = 2 we obtain from (4) only one condition (2² − 2 − 1 = 1) for the independence of two events A₁ and A₂:

P(A₁A₂) = P(A₁) P(A₂).    (5)

8 Actually, in the case of independence, one may choose arbitrarily only r₁ + r₂ + … + r_n probabilities p_q^(i) = P(A_q^(i)) so as to comply with the n conditions

Σ_q p_q^(i) = 1.

Therefore, in the general case, we have r − 1 degrees of freedom, but in the case of independence only r₁ + r₂ + … + r_n − n.

9 To prove this it is sufficient to show that from the mutual independence of n decompositions follows the mutual independence of the first n − 1. Let us assume that the equations (2) hold. Then

P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_{n−1}}^(n−1)) = Σ_{q_n} P(A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n))
    = P(A_{q₁}^(1)) P(A_{q₂}^(2)) … P(A_{q_{n−1}}^(n−1)) Σ_{q_n} P(A_{q_n}^(n))
    = P(A_{q₁}^(1)) P(A_{q₂}^(2)) … P(A_{q_{n−1}}^(n−1)).    Q.E.D.

10 See S. N. Bernstein [1] pp. 47-57. However, the reader can easily prove this himself (using mathematical induction).
The system of equations (2) reduces itself, in this case, to three equations besides (5):

P(A₁Ā₂) = P(A₁) P(Ā₂),
P(Ā₁A₂) = P(Ā₁) P(A₂),
P(Ā₁Ā₂) = P(Ā₁) P(Ā₂),

which obviously follow from (5).11

It need hardly be remarked that from the independence of the events A₁, A₂, …, A_n in pairs, i.e. from the relations

P(A_iA_j) = P(A_i) P(A_j)    (i ≠ j)

it does not at all follow that when n > 2 these events are independent12. (For that we need the existence of all equations (4).)
In introducing the concept of independence, no use was made of conditional probability. Our aim has been to explain as clearly as possible, in a purely mathematical manner, the meaning of this concept. Its applications, however, generally depend upon the properties of certain conditional probabilities.

If we assume that all probabilities P(A_q^(i)) are positive, then from the equations (3) it follows13 that

P_{A_{q₁}^(i₁) A_{q₂}^(i₂) … A_{q_{k−1}}^(i_{k−1})}(A_{q_k}^(i_k)) = P(A_{q_k}^(i_k)).    (6)

From the fact that formulas (6) hold, and from the Multiplication Theorem (Formula (7), § 4), follow the formulas (2). We obtain, therefore,
THEOREM II: A necessary and sufficient condition for independence of experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) in the case of positive probabilities P(A_q^(i)) is that the conditional probability of the result A_q^(i) of experiment 𝔄^(i), under the hypothesis that several other tests 𝔄^(i₁), 𝔄^(i₂), …, 𝔄^(i_k) have had definite results A_{q₁}^(i₁), A_{q₂}^(i₂), …, A_{q_k}^(i_k), is equal to the absolute probability P(A_q^(i)).

11 P(A₁Ā₂) = P(A₁) − P(A₁A₂) = P(A₁) − P(A₁) P(A₂) = P(A₁) {1 − P(A₂)} = P(A₁) P(Ā₂), etc.

12 This can be shown by the following simple example (S. N. Bernstein): Let set E be composed of four elements ξ₁, ξ₂, ξ₃, ξ₄; the corresponding elementary probabilities p₁, p₂, p₃, p₄ are each assumed to be ¼, and

A = {ξ₁, ξ₂},  B = {ξ₁, ξ₃},  C = {ξ₁, ξ₄}.

It is easy to compute that

P(A) = P(B) = P(C) = ½,
P(AB) = P(BC) = P(AC) = ¼ = (½)²,
P(ABC) = ¼ ≠ (½)³.

13 To prove it, one must keep in mind the definition of conditional probability (Formula (6), § 4) and substitute for the probabilities of products the products of probabilities according to formula (3).
On the basis of formulas (4) we can prove in an analogous manner the following theorem:

THEOREM III. If all probabilities P(A_k) are positive, then a necessary and sufficient condition for mutual independence of the events A₁, A₂, …, A_n is the satisfaction of the equations

P_{A_{i₁} A_{i₂} … A_{i_k}}(A_i) = P(A_i)    (7)

for any pairwise different i₁, i₂, …, i_k, i.

In the case n = 2 the conditions (7) reduce to two equations:

P_{A₁}(A₂) = P(A₂),
P_{A₂}(A₁) = P(A₁).    (8)

It is easy to see that the first equation in (8) alone is a necessary and sufficient condition for the independence of A₁ and A₂ provided P(A₁) > 0.
§ 6. Conditional Probabilities as Random Variables,
Markov Chains

Let 𝔄 be a decomposition of the fundamental set E:

E = A₁ + A₂ + … + A_r,

and x a real function of the elementary event ξ, which for every set A_q is equal to a corresponding constant a_q. x is then called a random variable, and the sum

E(x) = Σ_q a_q P(A_q)

is called the mathematical expectation of the variable x. The theory of random variables will be developed in Chaps. III and IV. We shall not limit ourselves there merely to those random variables which can assume only a finite number of different values.

A random variable which for every set A_q assumes the value P_{A_q}(B), we shall call the conditional probability of the event B after the given experiment 𝔄 and shall designate it by P_𝔄(B). Two experiments 𝔄^(1) and 𝔄^(2) are independent if, and only if,

P_{𝔄^(1)}(A_q^(2)) = P(A_q^(2)),    q = 1, 2, …, r₂.
Given any decompositions (experiments) 𝔄^(1), 𝔄^(2), …, 𝔄^(n), we shall represent by

𝔄^(1) 𝔄^(2) … 𝔄^(n)

the decomposition of set E into the products

A_{q₁}^(1) A_{q₂}^(2) … A_{q_n}^(n).

Experiments 𝔄^(1), 𝔄^(2), …, 𝔄^(n) are mutually independent when and only when

P_{𝔄^(1) 𝔄^(2) … 𝔄^(k−1)}(A_q^(k)) = P(A_q^(k)),

k and q being arbitrary14.

DEFINITION: The sequence 𝔄^(1), 𝔄^(2), …, 𝔄^(n), … forms a Markov chain if for arbitrary n and q

P_{𝔄^(1) 𝔄^(2) … 𝔄^(n−1)}(A_q^(n)) = P_{𝔄^(n−1)}(A_q^(n)).

Thus, Markov chains form a natural generalization of sequences of mutually independent experiments. If we set

p_{q₁ q₂}(m, n) = P_{A_{q₁}^(m)}(A_{q₂}^(n)),
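The idea of P_𝔄(B) as a random variable, constant on each set of the decomposition, can be sketched for a finite E. The six-point set, the decomposition, and the event B below are illustrative assumptions:

```python
from fractions import Fraction

# E with six equally probable elementary events.
E = list(range(6))
p = {e: Fraction(1, 6) for e in E}

def P(S):
    return sum(p[e] for e in S)

# A decomposition of E: E = A1 + A2 + A3 (illustrative).
decomposition = [{0, 1}, {2, 3}, {4, 5}]
B = {1, 2, 3}

# The random variable P_A(B): on each set A_q of the decomposition it takes
# the constant value P_{A_q}(B) = P(A_q B) / P(A_q).
def cond_prob_of_B(xi):
    Aq = next(A for A in decomposition if xi in A)
    return P(Aq & B) / P(Aq)

values = [cond_prob_of_B(xi) for xi in E]
# P_A(B) equals 1/2 on A1, 1 on A2, and 0 on A3:
assert values == [Fraction(1, 2), Fraction(1, 2), 1, 1, 0, 0]

# Its mathematical expectation recovers P(B), as in the theorem on
# total probability:
expectation = sum(P(A) * (P(A & B) / P(A)) for A in decomposition)
assert expectation == P(B) == Fraction(1, 2)
```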
Cf., for example, LEBESGUE, Leçons sur l'intégration, 1928, pp. 152-156.
See the previous note.
For a definition of Borel sets in R see HAUSDORFF, Mengenlehre, 1927, pp. 177-181.
F(a₁, a₂, …, a_n) is called the distribution function of the variables x₁, x₂, …, x_n.

The investigation of fields of probability of the above type is sufficient for all classical problems in the theory of probability6.
In particular, a probability function in Rⁿ can be defined thus: We take any non-negative point function f(x₁, x₂, …, x_n), defined in Rⁿ, such that

∫_{−∞}^{+∞} ∫_{−∞}^{+∞} … ∫_{−∞}^{+∞} f(x₁, x₂, …, x_n) dx₁ dx₂ … dx_n = 1,

and set

P(A) = ∫∫…∫_A f(x₁, x₂, …, x_n) dx₁ dx₂ … dx_n.    (5)

f(x₁, x₂, …, x_n) is, in this case, the probability density at the point (x₁, x₂, …, x_n) (cf. Chap. III, § 2).
Another type of probability function in Rⁿ is obtained in the following manner: Let {ξ_i} be a sequence of points of Rⁿ, and let {p_i} be a sequence of non-negative real numbers such that Σ p_i = 1; we then set, as we did in Example 1,

P(A) = Σ′ p_i,

where the summation Σ′ extends over all indices i for which ξ_i belongs to A. The two types of probability functions in Rⁿ mentioned here do not exhaust all possibilities, but are usually considered sufficient for applications of the theory of probability.
Nevertheless, we can imagine problems of interest for applications outside of this classical region in which elementary events are defined by means of an infinite number of coordinates. The corresponding fields of probability we shall study more closely after introducing several concepts needed for this purpose. (Cf. Chap. III, § 3.)
6 Cf., for example, R. v. MISES [1], pp. 13-19. Here the existence of probabilities for "all practically possible" sets of an n-dimensional space is required.
Chapter III
RANDOM VARIABLES
§ 1. Probability Functions
Given a mapping of the set E into a set E′ consisting of any type of elements, i.e., a single-valued function u(ξ) defined on E, whose values belong to E′. To each subset A′ of E′ we shall put into correspondence, as its pre-image in E, the set u⁻¹(A′) of all elements of E which map onto elements of A′. Let 𝔉^(u) be the system of all subsets A′ of E′ whose pre-images belong to the field 𝔉. 𝔉^(u) will then also be a field. If 𝔉 happens to be a Borel field, the same will be true of 𝔉^(u). We now set

P^(u)(A′) = P{u⁻¹(A′)}.    (1)

Since this set-function P^(u), defined on 𝔉^(u), satisfies with respect to the field 𝔉^(u) all of our Axioms I-VI, it represents a probability function on 𝔉^(u). Before turning to the proof of all the facts just stated, we shall formulate the following definition.
DEFINITION. Given a single-valued function u(ξ) of a random event ξ. The function P^(u)(A′), defined by (1), is then called the probability function of u.

Remark 1: In studying fields of probability (𝔉, P), we call the function P(A) simply the probability function, but P^(u)(A′) is called the probability function of u. In the case u(ξ) = ξ, P^(u)(A′) coincides with P(A).
Remark 2: The event u⁻¹(A′) consists of the fact that u(ξ) belongs to A′. Therefore, P^(u)(A′) is the probability of u(ξ) ⊂ A′.

defined on 𝔉′. This probability function is called the n-dimensional probability function of the random variables x₁, x₂, …, x_n.
As follows directly from the definition of a random variable, the field 𝔉′ contains, for each choice of i and a_i (i = 1, 2, …, n), the set of all points in Rⁿ for which x_i < a_i. Therefore 𝔉′ also contains the intersection of the above sets, i.e. the set L_{a₁ a₂ … a_n} of all points of Rⁿ for which all the inequalities x_i < a_i hold (i = 1, 2, …, n)1.
If we now denote as the n-dimensional half-open interval

[a₁, a₂, …, a_n; b₁, b₂, …, b_n)

the set of all points in Rⁿ for which a_i ≤ x_i < b_i, then we see at once that each such interval belongs to the field 𝔉′ since

[a₁, a₂, …, a_n; b₁, b₂, …, b_n) = L_{b₁ b₂ … b_n} − L_{a₁ b₂ … b_n} − L_{b₁ a₂ b₃ … b_n} − … − L_{b₁ b₂ … b_{n−1} a_n}.
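The set identity above can be checked directly on a finite grid for n = 2; the grid size and the bounds a₁, a₂, b₁, b₂ below are illustrative assumptions:

```python
# Check, for n = 2 on a finite grid, the identity
#   [a1, a2; b1, b2) = L_{b1 b2} - L_{a1 b2} - L_{b1 a2},
# where L_{c1 c2} = { (x1, x2) : x1 < c1 and x2 < c2 }.
grid = [(x1, x2) for x1 in range(6) for x2 in range(6)]

def L(c1, c2):
    return {(x1, x2) for (x1, x2) in grid if x1 < c1 and x2 < c2}

a1, a2, b1, b2 = 1, 2, 4, 5            # illustrative bounds

# The half-open interval: a_i <= x_i < b_i for both coordinates.
interval = {(x1, x2) for (x1, x2) in grid
            if a1 <= x1 < b1 and a2 <= x2 < b2}

assert interval == L(b1, b2) - L(a1, b2) - L(b1, a2)
```

Subtracting L_{a₁ b₂} removes the points with x₁ < a₁, and subtracting L_{b₁ a₂} removes those with x₂ < a₂, leaving exactly the half-open interval.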
The Borel extension of the system of all n-dimensional half-open intervals consists of all Borel sets in Rⁿ. From this it follows that in the case of a Borel field of probability, the field 𝔉′ contains all the Borel sets in the space Rⁿ.
THEOREM: In the case of a Borel field of probability each Borel function x = f(x₁, x₂, …, x_n) of a finite number of random variables x₁, x₂, …, x_n is also a random variable.

All we need to prove this is to point out that the set of all points (x₁, x₂, …, x_n) in Rⁿ for which x = f(x₁, x₂, …, x_n) < a is a Borel set. In particular, all finite sums and products of random variables are also random variables.
DEFINITION: The function

F^(x₁, x₂, …, xₙ)(a₁, a₂, …, aₙ) = P^(x₁, x₂, …, xₙ)(L_{a₁ a₂ … aₙ})

is called the n-dimensional distribution function of the random variables x₁, x₂, …, xₙ.

As in the one-dimensional case, we prove that the n-dimensional distribution function F^(x₁, x₂, …, xₙ)(a₁, a₂, …, aₙ) is non-decreasing and continuous on the left in each variable. In analogy to equations (3) and (4) in § 2, we here have
¹ The aᵢ may also assume the infinite values ±∞.
26 III. Random Variables
lim_{aᵢ → −∞} F(a₁, a₂, …, aₙ) = F(a₁, …, aᵢ₋₁, −∞, aᵢ₊₁, …, aₙ) = 0, (7)

lim_{a₁ → +∞, …, aₙ → +∞} F(a₁, a₂, …, aₙ) = F(+∞, +∞, …, +∞) = 1. (8)

The distribution function F^(x₁, x₂, …, xₙ) gives directly the values of P^(x₁, x₂, …, xₙ) only for the special sets L_{a₁ a₂ … aₙ}. If our field, however, is a Borel field, then² P^(x₁, x₂, …, xₙ) is uniquely determined for all Borel sets in Rⁿ by knowledge of the distribution function F^(x₁, x₂, …, xₙ).
If the derivative

f(a₁, a₂, …, aₙ) = ∂ⁿF^(x₁, x₂, …, xₙ)(a₁, a₂, …, aₙ) / ∂a₁ ∂a₂ ⋯ ∂aₙ

exists, we call this derivative the n-dimensional probability density of the random variables x₁, x₂, …, xₙ at the point (a₁, a₂, …, aₙ). If f is also continuous at every point (a₁, a₂, …, aₙ), then F^(x₁, x₂, …, xₙ) is called continuous. For every Borel set A ⊂ Rⁿ, we have the equality

P^(x₁, x₂, …, xₙ)(A) = ∫⋯∫_A f(a₁, a₂, …, aₙ) da₁ da₂ ⋯ daₙ. (9)
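Formula (9) can be illustrated numerically. The sketch below assumes, purely as an example, the continuous density f(a₁, a₂) = e^(−a₁−a₂) on the positive quadrant and integrates it over the Borel set A = [0, 1) × [0, 1):

```python
import math

# Midpoint-rule evaluation of formula (9) for the example density
# f(a1, a2) = exp(-a1 - a2) on the positive quadrant, over the Borel set
# A = [0, 1) x [0, 1).  Both the density and the set are arbitrary choices.
def f(x, y):
    return math.exp(-x - y) if x >= 0 and y >= 0 else 0.0

n = 400
h = 1.0 / n
p = sum(f((i + 0.5) * h, (j + 0.5) * h)
        for i in range(n) for j in range(n)) * h * h

# For this product density the double integral factors into two marginals.
exact = (1.0 - math.exp(-1.0)) ** 2
assert abs(p - exact) < 1e-4
```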
In closing this section we shall make one more remark about
the relationships between the various probability functions and
distribution functions.
Given the substitution

S = (1, 2, …, n; i₁, i₂, …, iₙ),

let S also denote the transformation

x′ₖ = x_{iₖ} (k = 1, 2, …, n)

of the space Rⁿ into itself. It is then obvious that

P^(x_{i₁}, x_{i₂}, …, x_{iₙ})(A) = P^(x₁, x₂, …, xₙ){S⁻¹(A)}. (10)

Now let x′ = pₖ(x) be the "projection" of the space Rⁿ on the space Rᵏ (k < n), so that the point (x₁, x₂, …, xₙ) is mapped onto the point (x₁, x₂, …, xₖ). Then, as a result of Formula (2) in § 1,

P^(x₁, x₂, …, xₖ)(A) = P^(x₁, x₂, …, xₙ){pₖ⁻¹(A)}. (11)
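For discrete distributions the content of these permutation and projection relations reduces to relabelling and summing a probability table. A minimal sketch, with an arbitrary joint table for a pair (x₁, x₂):

```python
# A joint probability table for a pair (x1, x2); the values are arbitrary.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Coordinate permutation: the probability function of (x2, x1) is obtained
# by transporting each mass along the substitution.
swapped = {(j, i): p for (i, j), p in joint.items()}
assert swapped[(1, 0)] == joint[(0, 1)]

# Projection onto the first coordinate: summing over the fibre of the
# projection gives the lower-dimensional probability function.
marginal = {i: sum(p for (a, b), p in joint.items() if a == i) for i in (0, 1)}
assert abs(marginal[0] - 0.4) < 1e-12
assert abs(sum(marginal.values()) - 1.0) < 1e-12
```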
² Cf. § 8, IV in the Second Chapter.
§ 4. Probabilities in Infinite-dimensional Spaces

Let M be an arbitrary set of indices μ, and let E = R^M be the set of all real functions ξ = {x_μ} defined on M. Let p_{μ₁ μ₂ … μₙ} denote the projection ξ → (x_{μ₁}, x_{μ₂}, …, x_{μₙ}) of the space E onto the n-dimensional space Rⁿ. A subset A of E we shall call a cylinder set if it can be represented in the form

A = p⁻¹_{μ₁ μ₂ … μₙ}(A′),

where A′ is a subset of Rⁿ. The class of all cylinder sets coincides, therefore, with the class of all sets which can be defined by relations of the form
¹ Cf. HAUSDORFF, Mengenlehre, 1927, p. 23.
f(x_{μ₁}, x_{μ₂}, …, x_{μₙ}) = 0. (1)
In order to determine an arbitrary cylinder set p⁻¹_{μ₁ μ₂ … μₙ}(A′) by such a relation, we need only take as f a function which equals 0 on A′, but outside of A′ equals unity.
A cylinder set is a Borel cylinder set if the corresponding set A′ is a Borel set. All Borel cylinder sets of the space R^M form a field, which we shall henceforth denote by 𝔉^M ².

The Borel extension of the field 𝔉^M we shall denote, as always, by B𝔉^M. Sets in B𝔉^M we shall call Borel sets of the space R^M.
Later on we shall give a method of constructing and operating with probability functions on 𝔉^M, and consequently, by means of the Extension Theorem, on B𝔉^M also. We obtain in this manner fields of probability sufficient for all purposes in the case that the set M is denumerable. We can therefore handle all questions touching upon a denumerable sequence of random variables. But if M is not denumerable, many simple and interesting subsets of R^M remain outside of B𝔉^M. For example, the set of all elements ξ for which x_μ remains smaller than a fixed constant for all indices μ does not belong to the system B𝔉^M if the set M is non-denumerable.

It is therefore desirable to try whenever possible to put each problem in such a form that the space of all elementary events ξ has only a denumerable set of coordinates.
Let a probability function P(A) be defined on 𝔉^M. We may then regard every coordinate x_μ of the elementary event ξ as a random variable. In consequence, every finite group (x_{μ₁}, x_{μ₂}, …, x_{μₙ}) of these coordinates has an n-dimensional probability function P_{μ₁ μ₂ … μₙ}(A) and a corresponding distribution function F_{μ₁ μ₂ … μₙ}(a₁, a₂, …, aₙ).

² From the above it follows that Borel cylinder sets are Borel sets definable by relations of type (1). Now let A and B be two Borel cylinder sets defined by the relations

f(x_{μ₁}, x_{μ₂}, …, x_{μₖ}) = 0,  g(x_{λ₁}, x_{λ₂}, …, x_{λₘ}) = 0.

Then we can define the sets A + B, AB, and A − B respectively by the relations

f·g = 0,  f² + g² = 0,  f² + φ(g) = 0,

where φ(x) = 0 for x ≠ 0 and φ(0) = 1. If f and g are Borel functions, so also are f·g, f² + g² and f² + φ(g); therefore, A + B, AB and A − B are Borel cylinder sets. Thus we have shown that the system of sets 𝔉^M is a field.

It is obvious that for every Borel cylinder set

A = p⁻¹_{μ₁ μ₂ … μₙ}(A′),

the following equation holds:

P(A) = P_{μ₁ μ₂ … μₙ}(A′),
where A′ is a Borel set of Rⁿ. In this manner, the probability function P is uniquely determined on the field 𝔉^M of all Borel cylinder sets by means of the values of all finite probability functions P_{μ₁ μ₂ … μₙ} for all Borel sets of the corresponding spaces Rⁿ. However, for Borel sets, the values of the probability functions P_{μ₁ μ₂ … μₙ} are uniquely determined by means of the corresponding distribution functions. We have thus proved the following theorem:
The set of all finite-dimensional distribution functions F_{μ₁ μ₂ … μₙ} uniquely determines the probability function P(A) for all sets in 𝔉^M. If P(A) is defined on 𝔉^M, then (according to the extension theorem) it is uniquely determined on B𝔉^M by the values of the distribution functions F_{μ₁ μ₂ … μₙ}.
We may now ask the following. Under what conditions does a system of distribution functions F_{μ₁ μ₂ … μₙ}, given a priori, define a field of probability on 𝔉^M (and, consequently, on B𝔉^M)?

We must first note that every distribution function F_{μ₁ μ₂ … μₙ} must satisfy the conditions given in § 3, III of the second chapter; indeed, this is contained in the very concept of distribution function. Besides, as a result of formulas (13) and (14) in § 2, we have also the following relations:

F_{μ_{i₁} μ_{i₂} … μ_{iₙ}}(a_{i₁}, a_{i₂}, …, a_{iₙ}) = F_{μ₁ μ₂ … μₙ}(a₁, a₂, …, aₙ), (2)

F_{μ₁ μ₂ … μₖ}(a₁, a₂, …, aₖ) = F_{μ₁ μ₂ … μₙ}(a₁, a₂, …, aₖ, +∞, …, +∞), (3)

where k < n and (i₁, i₂, …, iₙ) is an arbitrary permutation of (1, 2, …, n).
These necessary conditions prove also to be sufficient, as will appear from the following theorem.

FUNDAMENTAL THEOREM: Every system of distribution functions F_{μ₁ μ₂ … μₙ}, satisfying the conditions (2) and (3), defines a probability function P(A) on 𝔉^M which satisfies Axioms I–VI. This probability function P(A) can be extended (by the extension theorem) to B𝔉^M also.
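The simplest system satisfying (2) and (3) is that of independent coordinates with a common one-dimensional distribution. The sketch below checks both conditions for a Bernoulli example; the particular distribution is an arbitrary choice:

```python
# Distribution function of a single coordinate: Bernoulli with P{x = 1} = 0.3,
# so F0(a) = P{x < a}.  The choice of F0 is arbitrary.
def F0(a):
    if a <= 0:
        return 0.0
    if a <= 1:
        return 0.7
    return 1.0

# Independent coordinates: F_{mu1...mun}(a1, ..., an) = F0(a1) * ... * F0(an).
def F(*args):
    prod = 1.0
    for a in args:
        prod *= F0(a)
    return prod

INF = float("inf")
# Condition (2): permuting indices and arguments together changes nothing.
assert F(0.5, 2.0) == F(2.0, 0.5)
# Condition (3): setting trailing arguments to +infinity gives the marginal.
assert F(0.5, INF) == F0(0.5)
assert F(0.5, 2.0, INF, INF) == F(0.5, 2.0)
```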
Proof. Given the distribution functions F_{μ₁ μ₂ … μₙ}, satisfying the general conditions of Chap. II, § 3, III and also conditions (2) and (3). Every distribution function F_{μ₁ μ₂ … μₙ} defines uniquely a corresponding probability function P_{μ₁ μ₂ … μₙ} for all Borel sets of Rⁿ (cf. § 3). We shall deal in the future only with Borel sets of Rⁿ and with Borel cylinder sets in E.

For every cylinder set

A = p⁻¹_{μ₁ μ₂ … μₙ}(A′),

we set

P(A) = P_{μ₁ μ₂ … μₙ}(A′). (4)

Since the same cylinder set A can be defined by various sets A′, we must first show that formula (4) yields always the same value for P(A).
Let (x_{μ₁}, x_{μ₂}, …, x_{μₙ}) be a finite system of random variables x_μ. Proceeding from the probability function P_{μ₁ μ₂ … μₙ} of these random variables, we can, in accordance with the rules in § 3, define the probability function P_{μ_{i₁} μ_{i₂} … μ_{iₖ}} of each subsystem (x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}). From equations (2) and (3) it follows that this probability function defined according to § 3 is the same as the function P_{μ_{i₁} μ_{i₂} … μ_{iₖ}} given a priori. We shall now suppose that the cylinder set A is defined by means of

A = p⁻¹_{μ_{i₁} μ_{i₂} … μ_{iₖ}}(A′)

and simultaneously by means of

A = p⁻¹_{μ_{j₁} μ_{j₂} … μ_{jₘ}}(A″),

where all random variables x_{μᵢ} and x_{μⱼ} belong to the system (x_{μ₁}, x_{μ₂}, …, x_{μₙ}), which is obviously not an essential restriction. The conditions

(x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}) ⊂ A′

and

(x_{μ_{j₁}}, x_{μ_{j₂}}, …, x_{μ_{jₘ}}) ⊂ A″

are equivalent. Therefore

P_{μ_{i₁} μ_{i₂} … μ_{iₖ}}(A′) = P_{μ₁ μ₂ … μₙ}{(x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}) ⊂ A′} = P_{μ₁ μ₂ … μₙ}{(x_{μ_{j₁}}, x_{μ_{j₂}}, …, x_{μ_{jₘ}}) ⊂ A″} = P_{μ_{j₁} μ_{j₂} … μ_{jₘ}}(A″),

which proves our statement concerning the uniqueness of the definition of P(A).
Let us now prove that the field of probability (𝔉^M, P) satisfies all the Axioms I–VI. Axiom I requires merely that 𝔉^M be a field. This fact has already been proven above. Moreover, for an arbitrary μ:

E = p⁻¹_μ(R¹),

P(E) = P_μ(R¹) = 1,

which proves that Axioms II and IV apply in this case. Finally, from the definition of P(A) it follows at once that P(A) is non-negative (Axiom III).

It is only slightly more complicated to prove that Axiom V is also satisfied. In order to do so, we investigate two cylinder sets

A = p⁻¹_{μ_{i₁} μ_{i₂} … μ_{iₖ}}(A′)

and

B = p⁻¹_{μ_{j₁} μ_{j₂} … μ_{jₘ}}(B′).

We shall assume that all variables x_{μᵢ} and x_{μⱼ} belong to one inclusive finite system (x_{μ₁}, x_{μ₂}, …, x_{μₙ}). If the sets A and B do not intersect, the relations

(x_{μ_{i₁}}, x_{μ_{i₂}}, …, x_{μ_{iₖ}}) ⊂ A′

and

(x_{μ_{j₁}}, x_{μ_{j₂}}, …, x_{μ_{jₘ}}) ⊂ B′

are incompatible. Therefore

P(A + B) = P_{μ₁ μ₂ … μₙ}{(x_{μ_{i₁}}, …, x_{μ_{iₖ}}) ⊂ A′ or (x_{μ_{j₁}}, …, x_{μ_{jₘ}}) ⊂ B′}
= P_{μ₁ μ₂ … μₙ}{(x_{μ_{i₁}}, …, x_{μ_{iₖ}}) ⊂ A′} + P_{μ₁ μ₂ … μₙ}{(x_{μ_{j₁}}, …, x_{μ_{jₘ}}) ⊂ B′} = P(A) + P(B),

which concludes our proof.
Only Axiom VI remains. Let

A₁ ⊇ A₂ ⊇ ⋯ ⊇ Aₙ ⊇ ⋯

be a decreasing sequence of cylinder sets satisfying the condition

lim P(Aₙ) = L > 0.

We shall prove that the product of all sets Aₙ is not empty. We may assume, without essentially restricting the problem, that in the definition of the first n cylinder sets Aₖ, only the first n coordinates x_{μₖ} in the sequence

x_{μ₁}, x_{μ₂}, …, x_{μₙ}, …

occur, i.e.

Aₙ = p⁻¹_{μ₁ μ₂ … μₙ}(Bₙ).

For brevity we set

Pₙ(Bₙ) = P_{μ₁ μ₂ … μₙ}(Bₙ);

then, obviously,

Pₙ(Bₙ) = P(Aₙ) ≥ L > 0.
In each set Bₙ it is possible to find a closed bounded set Uₙ such that

Pₙ(Bₙ − Uₙ) ≤ ε/2ⁿ.

From this inequality we have for the set

Vₙ = p⁻¹_{μ₁ μ₂ … μₙ}(Uₙ)

the inequality

P(Aₙ − Vₙ) ≤ ε/2ⁿ.

Let, moreover,

Wₙ = V₁V₂ ⋯ Vₙ. (5)

From (5) it follows that

P(Aₙ − Wₙ) ≤ ε.

Since Wₙ ⊆ Vₙ ⊆ Aₙ, it follows that

P(Wₙ) ≥ P(Aₙ) − ε ≥ L − ε.
If ε is sufficiently small, P(Wₙ) > 0 and Wₙ is not empty. We shall now choose in each set Wₙ a point ξ⁽ⁿ⁾.

§ 6. Equivalent Random Variables; Various Kinds of Convergence

Let

x₁, x₂, …, xₙ, … (1)

be a sequence of random variables. The sequence (1) converges in probability to the random variable x if, for every ε > 0, the probability

P{|xₙ − x| > ε}

tends toward zero as n → ∞⁵.
I. If the sequence (1) converges in probability to x and also to x′, then x and x′ are equivalent. In fact,

P{|x − x′| > 1/m} ≤ P{|xₙ − x| > 1/2m} + P{|xₙ − x′| > 1/2m};

since the last probabilities are as small as we please for a sufficiently large n, it follows that

P{|x − x′| > 1/m} = 0,

and we obtain at once that

P{x ≠ x′} ≤ Σₘ P{|x − x′| > 1/m} = 0.
II. If the sequence ( 1) almost surely converges to x, then it
⁵ This concept is due to Bernoulli; its completely general treatment was introduced by E. E. Slutsky (see [1]).
also converges to x in probability. Let A be the convergence set of the sequence (1); then

1 = P(A) ≤ lim_{n→∞} P{|x_{n+p} − x| < ε, p = 0, 1, 2, …} ≤ lim_{n→∞} P{|xₙ − x| < ε},

from which the convergence in probability follows.
III. For the convergence in probability of the sequence (1) the following condition is both necessary and sufficient: for any ε > 0 there exists an n such that, for every p > 0, the following inequality holds:

P{|x_{n+p} − xₙ| > ε} < ε.
Let F₁(a), F₂(a), …, Fₙ(a), …, F(a) be the distribution functions of the random variables x₁, x₂, …, xₙ, …, x. If the sequence xₙ converges in probability to x, the distribution function F(a) is uniquely determined by knowledge of the functions Fₙ(a). We have, in fact:
THEOREM: If the sequence x₁, x₂, …, xₙ, … converges in probability to x, the corresponding sequence of distribution functions Fₙ(a) converges at each point of continuity of F(a) to the distribution function F(a) of x.
That F(a) is really determined by the Fₙ(a) follows from the fact that F(a), being a monotone function, continuous on the left, is uniquely determined by its values at the points of continuity⁶. To prove the theorem we assume that F is continuous at the point a. Let a′ < a; then in case x < a′, xₙ ≥ a, it is necessary that |xₙ − x| > a − a′. Therefore

lim P(x < a′, xₙ ≥ a) = 0,

F(a′) = P(x < a′) ≤ P(xₙ < a) + P(x < a′, xₙ ≥ a) = Fₙ(a) + P(x < a′, xₙ ≥ a),

and hence

F(a′) ≤ lim inf Fₙ(a). (3)

In an analogous manner, for a″ > a there follows the relation

F(a″) ≥ lim sup Fₙ(a). (4)
⁶ In fact, it has at most only a countable set of discontinuities (see LEBESGUE, Leçons sur l'intégration, 1928, p. 50). Therefore, the points of continuity are everywhere dense, and the value of the function F(a) at a point of discontinuity is determined as the limit of its values at the points of continuity on its left.
Since F(a′) and F(a″) converge to F(a) for a′ → a and a″ → a, it follows from (3) and (4) that

lim Fₙ(a) = F(a),
which proves our theorem.
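The theorem can be watched numerically: taking xₙ = x + 1/n, which converges in probability to x since the difference is deterministic and tends to 0, the empirical distribution functions of xₙ approach that of x at a continuity point. A sketch with x uniform on [0, 1) (an arbitrary example):

```python
import random

# x uniform on [0, 1); x_n = x + 1/n converges to x in probability.
random.seed(1)
xs = [random.random() for _ in range(100000)]

def ecdf(vals, a):
    """Empirical distribution function, an estimate of P{v < a}."""
    return sum(v < a for v in vals) / len(vals)

a = 0.5   # a continuity point of F(a) = a on [0, 1]
fns = {n: ecdf([x + 1.0 / n for x in xs], a) for n in (10, 100, 1000)}

assert abs(fns[1000] - 0.5) < 0.01               # F_n(a) -> F(a) = 0.5
assert abs(fns[10] - 0.5) > abs(fns[1000] - 0.5)  # the error shrinks with n
```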
Chapter IV
MATHEMATICAL EXPECTATIONS¹
§ 1. Abstract Lebesgue Integrals
Let x be a random variable and A a set of 𝔉. Let us form, for a positive λ, the sum

S_λ = Σ_{k = −∞}^{+∞} kλ P{kλ ≤ x < (k + 1)λ, ξ ⊂ A}. (1)

If this series converges absolutely for every λ, then as λ → 0, S_λ tends toward a definite limit, which is by definition the integral

∫_A x P(dE). (2)

In this abstract form the concept of an integral was introduced by Fréchet²; it is indispensable for the theory of probability.
(The reader will see in the following paragraphs that the usual
definition for the conditional mathematical expectation of the
variable x under hypothesis A coincides with the definition of
the integral (2) except for a constant factor.)
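For a finite field of probability the sum (1) can be written out directly; the following sketch (with an arbitrary three-point space) shows S_λ approaching the integral as λ → 0:

```python
# A finite field of probability: three elementary events with point masses.
P = {"e1": 0.2, "e2": 0.3, "e3": 0.5}      # arbitrary example
x = {"e1": -1.3, "e2": 0.4, "e3": 2.7}     # a random variable on E
A = set(P)                                  # integrate over all of E

def S(lam):
    """The sum (1): sum over k of k*lam * P{k*lam <= x < (k+1)*lam, xi in A}."""
    K = int(4.0 / lam) + 1                  # |x| < 4 in this example
    total = 0.0
    for k in range(-K, K):
        band = sum(p for e, p in P.items()
                   if e in A and k * lam <= x[e] < (k + 1) * lam)
        total += k * lam * band
    return total

exact = sum(P[e] * x[e] for e in P)         # the integral over E, here E(x)
assert abs(S(0.001) - exact) < 0.005
```

Each term of (1) weights the lower edge of a band kλ ≤ x < (k + 1)λ by its probability, so S_λ differs from the limit by less than λ.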
We shall give here a brief survey of the most important
properties of the integrals of form (2) . The reader will find their
proofs in every textbook on real variables, although the proofs
are usually carried out only in the case where P (A ) is the Lebesgue
measure of sets in R". The extension of these proofs to the general
case does not entail any new mathematical problem ; for the most
part they remain word for word the same.
I. If a random variable x is integrable on A, then it is integrable on each subset A′ of A belonging to 𝔉.
II. If x is integrable on A and A is decomposed into no
¹ As was stated in § 5 of the third chapter, we are considering in this, as well as in the following chapters, Borel fields of probability only.
² FRÉCHET, Sur l'intégrale d'une fonctionnelle étendue à un ensemble abstrait, Bull. Soc. Math. France v. 43, 1915, p. 248.
more than a countable number of non-intersecting sets Aₙ of 𝔉, then

∫_A x P(dE) = Σₙ ∫_{Aₙ} x P(dE).

III. If x is integrable, |x| is also integrable, and in that case

|∫_A x P(dE)| ≤ ∫_A |x| P(dE).
IV. If in each event ξ the inequalities 0 ≤ y ≤ x hold, then, along with x, y is also integrable³, and in that case

∫_A y P(dE) ≤ ∫_A x P(dE).
V. If m ≤ x ≤ M where m and M are two constants, then

m P(A) ≤ ∫_A x P(dE) ≤ M P(A).
VI. If x and y are integrable, and K and L are two real constants, then Kx + Ly is also integrable, and in this case

∫_A (Kx + Ly) P(dE) = K ∫_A x P(dE) + L ∫_A y P(dE).
VII. If the series

Σₙ ∫_A |xₙ| P(dE)

converges, then the series

Σₙ xₙ = x

converges at each point of set A with the exception of a certain set B for which P(B) = 0. If we set x = 0 everywhere except on A − B, then

∫_A x P(dE) = Σₙ ∫_A xₙ P(dE).
VIII. If x and y are equivalent (P{x ≠ y} = 0), then for every set A of 𝔉

∫_A x P(dE) = ∫_A y P(dE). (3)
³ It is assumed that y is a random variable, i.e., in the terminology of the general theory of integration, measurable with respect to 𝔉.
IX. If (3) holds for every set A of 𝔉, then x and y are equivalent.
From the foregoing definition of an integral we also obtain the following property, which is not found in the usual Lebesgue theory.
X. Let P₁(A) and P₂(A) be two probability functions defined on the same field 𝔉, P(A) = P₁(A) + P₂(A), and let x be integrable on A relative to P₁(A) and P₂(A). Then

∫_A x P(dE) = ∫_A x P₁(dE) + ∫_A x P₂(dE).
XI. Every bounded random variable is integrable.
§ 2. Absolute and Conditional Mathematical Expectations
Let x be a random variable. The integral

E(x) = ∫_E x P(dE)

is called in the theory of probability the mathematical expectation of the variable x. From the properties III, IV, V, VI, VII, VIII, XI, it follows that

I. |E(x)| ≤ E(|x|);
II. E(y) ≤ E(x) if 0 ≤ y ≤ x everywhere;
III. inf(x) ≤ E(x) ≤ sup(x);
IV. E(Kx + Ly) = K E(x) + L E(y);
V. E(Σₙ xₙ) = Σₙ E(xₙ), if the series Σₙ E(|xₙ|) converges;
VI. If x and y are equivalent then
E (x ) = E (y) .
VII. Every bounded random variable has a mathematical
expectation.
From the definition of the integral, we have

E(x) = lim_{λ→0} Σ_{k = −∞}^{+∞} kλ P{kλ ≤ x < (k + 1)λ}
= lim_{λ→0} Σ_{k = −∞}^{+∞} kλ {F((k + 1)λ) − F(kλ)}.
The second line is nothing more than the usual definition of the Stieltjes integral

∫_{−∞}^{+∞} a dF^(x)(a) = E(x). (1)

Formula (1) may therefore serve as a definition of the mathematical expectation E(x).
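Formula (1) lends itself to direct computation once F is given. A sketch with the exponential distribution function F(a) = 1 − e^(−a), a > 0, an arbitrary example whose expectation equals 1:

```python
import math

# F is the exponential distribution function, an arbitrary example; the
# Stieltjes integral of a with respect to F, formula (1), equals E(x) = 1.
def F(a):
    return 1.0 - math.exp(-a) if a > 0 else 0.0

def stieltjes_mean(lam, K=200000):
    # Sum over k of k*lam * (F((k+1)*lam) - F(k*lam)); F vanishes for
    # a <= 0, so only non-negative k contribute, and the tail beyond
    # K*lam = 200 is negligible here.
    return sum(k * lam * (F((k + 1) * lam) - F(k * lam)) for k in range(K))

assert abs(stieltjes_mean(0.001) - 1.0) < 0.005
```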
Now let u be a function of the elementary event ξ, and x be a random variable defined as a single-valued function x = x(u) of u. Then

P{kλ ≤ x < (k + 1)λ} = P^(u){kλ ≤ x(u) < (k + 1)λ},

where P^(u)(A) is the probability function of u. It then follows from the definition of the integral that

∫_E x P(dE) = ∫_{E^(u)} x(u) P^(u)(dE^(u))

and, therefore,

E(x) = ∫_{E^(u)} x(u) P^(u)(dE^(u)), (2)

where E^(u) denotes the set of all possible values of u.
In particular, when u itself is a random variable we have

E(x) = ∫_E x P(dE) = ∫_{R¹} x(u) P^(u)(dR¹) = ∫_{−∞}^{+∞} x(a) dF^(u)(a). (3)

When x(u) is continuous, the last integral in (3) is the ordinary Stieltjes integral. We must note, however, that the integral

∫_{−∞}^{+∞} x(a) dF^(u)(a)

can exist even when the mathematical expectation E(x) does not. For the existence of E(x), it is necessary and sufficient that the integral

∫_{−∞}^{+∞} |x(a)| dF^(u)(a)

be finite⁴.
If u is a point (u₁, u₂, …, uₙ) of the space Rⁿ, then as a result of (2):
⁴ Cf. V. GLIVENKO, Sur les valeurs probables de fonctions, Rend. Accad. Lincei v. 8, 1928, pp. 480–483.
E(x) = ∫∫⋯∫ x(u₁, u₂, …, uₙ) P^(u)(dRⁿ).

§ 3. The Tchebycheff Inequality

Let f(x) be a non-negative function of a real argument x which for x ≥ a never becomes smaller than b > 0. Then for any random variable x

P(x ≥ a) ≤ E{f(x)} / b, (1)

provided the mathematical expectation E{f(x)} exists. For,

E{f(x)} = ∫_E f(x) P(dE) ≥ ∫_{x ≥ a} f(x) P(dE) ≥ b P(x ≥ a),

from which (1) follows at once.
For example, for every positive c,

P(x ≥ a) ≤ E(e^{cx}) / e^{ca}. (2)
Now let f(x) be non-negative, even, and, for positive x, non-decreasing. Then for every random variable x and for any choice of the constant a > 0 the following inequality holds:

P(|x| ≥ a) ≤ E{f(x)} / f(a). (3)

In particular,

P(|x − E(x)| ≥ a) ≤ E{f(x − E(x))} / f(a). (4)
Especially important is the case f(x) = x². We then obtain from (3) and (4):

P(|x| ≥ a) ≤ E(x²) / a², (5)

P(|x − E(x)| ≥ a) ≤ E{x − E(x)}² / a² = σ²(x) / a², (6)

where

σ²(x) = E{x − E(x)}²

is called the variance of the variable x. It is easy to calculate that

σ²(x) = E(x²) − {E(x)}².
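Both inequality (6) and the closing identity can be checked by simulation; the variable below, a shifted normal sample, is an arbitrary example:

```python
import random

# An arbitrary example variable: a normal sample shifted by 0.5.
random.seed(7)
xs = [random.gauss(0.0, 1.0) + 0.5 for _ in range(100000)]
N = len(xs)

Ex = sum(xs) / N
Ex2 = sum(v * v for v in xs) / N
var = sum((v - Ex) ** 2 for v in xs) / N

# The identity sigma^2(x) = E(x^2) - {E(x)}^2 (up to rounding error).
assert abs(var - (Ex2 - Ex * Ex)) < 1e-8

# Inequality (6): P{|x - E(x)| >= a} <= sigma^2(x) / a^2.
a = 2.0
p = sum(abs(v - Ex) >= a for v in xs) / N
assert p <= var / a ** 2
```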
If f(x) is bounded:

|f(x)| ≤ K,

then a lower bound for P(|x| ≥ a) can be found. For

E(f(x)) = ∫_E f(x) P(dE) = ∫_{|x| < a} f(x) P(dE) + ∫_{|x| ≥ a} f(x) P(dE) ≤ f(a) + K P(|x| ≥ a),

and therefore

P(|x| ≥ a) ≥ [E(f(x)) − f(a)] / K. (8)

In the case f(x) = x², we have from (8)

P(|x| ≥ a) ≥ [E(x²) − a²] / K. (9)
§ 4. Some Criteria for Convergence
Let

x₁, x₂, …, xₙ, … (1)

be a sequence of random variables and f(x) be a non-negative, even, and for positive x a monotonically increasing function⁵. Then the following theorems are true:
I. In order that the sequence (1) converge in probability, the following condition is sufficient: for each ε > 0 there exists an n such that for every p > 0, the following inequality holds:

E{f(x_{n+p} − xₙ)} < ε. (2)
II. In order that the sequence (1) converge in probability to the random variable x, the following condition is sufficient:

lim_{n → +∞} E{f(xₙ − x)} = 0. (3)
III. If f(x) is bounded and continuous and f(0) = 0, then conditions I and II are also necessary.

IV. If f(x) is continuous, f(0) = 0, and the totality of all x₁, x₂, …, xₙ, …, x is bounded, then conditions I and II are also necessary.
⁵ Therefore f(x) > 0 if x ≠ 0.
From II and IV, we obtain in particular:

V. In order that sequence (1) converge in probability to x, it is sufficient that

lim E(xₙ − x)² = 0. (4)

If also the totality of all x₁, x₂, …, xₙ, …, x is bounded, then the condition is also necessary.
For proofs of I–IV see Slutsky [1] and Fréchet [1]. However, these theorems follow almost immediately from formulas (3) and (8) of the preceding section.
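Criterion V can be illustrated by simulation: if E(xₙ − x)² → 0, the probabilities P{|xₙ − x| > ε} must fall at least as fast as inequality (5) of § 3 requires. A sketch with normally distributed differences, an arbitrary choice:

```python
import random

# x_n - x is taken normal with variance 1/n, so E(x_n - x)^2 = 1/n -> 0.
# The example distribution is an arbitrary choice.
random.seed(3)
N, eps = 50000, 0.5

def p_exceed(n):
    """Estimate P{|x_n - x| > eps} by simulation."""
    sd = (1.0 / n) ** 0.5
    return sum(abs(random.gauss(0.0, sd)) > eps for _ in range(N)) / N

p10, p1000 = p_exceed(10), p_exceed(1000)
assert p10 > p1000                              # the probability decreases
assert p1000 <= (1.0 / 1000) / eps ** 2 + 0.01  # bound (5) of § 3, with slack
```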
§ 5. Differentiation and Integration of Mathematical Expectations with Respect to a Parameter
Let us put each elementary event ξ into correspondence with a definite real function x(t) of a real variable t. We say that x(t) is a random function if for every fixed t, the variable x(t) is a random variable. The question now arises, under what conditions can the mathematical expectation sign be interchanged with the integration and differentiation signs. The two following theorems, though they do not exhaust the problem, can nevertheless give a satisfactory answer to this question in many simple cases.
THEOREM I: If the mathematical expectation E[x(t)] is finite for any t, and x(t) is always differentiable for any t, while the derivative x′(t) of x(t) with respect to t is always less in absolute value than some constant M, then

d/dt E(x(t)) = E(x′(t)).
THEOREM II: If x(t) always remains less, in absolute value, than some constant K and is integrable in the Riemann sense, then

∫_a^b E(x(t)) dt = E[∫_a^b x(t) dt],

provided E[x(t)] is integrable in the Riemann sense.
Proof of Theorem I. Let us first note that x′(t), as the limit of the random variables

[x(t + h) − x(t)] / h,  h = 1, 1/2, …, 1/n, …,

is also a random variable. Since x′(t) is bounded, the mathematical expectation E[x′(t)] exists (Property VII of mathematical expectation, in § 2). Let us choose a fixed t and denote by A the event

|[x(t + h) − x(t)] / h − x′(t)| > ε.

The probability P(A) tends to zero as h → 0 for every ε > 0. Since

|[x(t + h) − x(t)] / h| ≤ M,  |x′(t)| ≤ M

holds everywhere, and moreover in the case Ā

|[x(t + h) − x(t)] / h − x′(t)| ≤ ε,

then

|[E x(t + h) − E x(t)] / h − E x′(t)| ≤ E|[x(t + h) − x(t)] / h − x′(t)|
= P(A) E_A|[x(t + h) − x(t)] / h − x′(t)| + P(Ā) E_Ā|[x(t + h) − x(t)] / h − x′(t)|
≤ 2M P(A) + ε.
We may choose the ε > 0 arbitrarily, and P(A) is arbitrarily small for any sufficiently small h. Therefore

d/dt E x(t) = lim_{h → 0} [E x(t + h) − E x(t)] / h = E x′(t),

which was to be proved.
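Theorem I can be checked numerically for a concrete random function. The sketch below takes x(t) = sin(wt) with w uniform on [0, 1], so that |x′(t)| = |w cos(wt)| ≤ 1 may serve as M; the expectation over w is approximated on a grid, and both choices are arbitrary:

```python
import math

# Random function x(t) = sin(w t), w uniform on [0, 1]; then
# x'(t) = w cos(w t) and |x'(t)| <= 1, so M = 1 in Theorem I.
# The expectation over w is approximated on a midpoint grid.
ws = [(i + 0.5) / 1000 for i in range(1000)]

def E_x(t):
    return sum(math.sin(w * t) for w in ws) / len(ws)

def E_xp(t):
    return sum(w * math.cos(w * t) for w in ws) / len(ws)

t, h = 1.0, 1e-6
lhs = (E_x(t + h) - E_x(t - h)) / (2 * h)   # d/dt E x(t), central difference
assert abs(lhs - E_xp(t)) < 1e-5            # matches E x'(t)
```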
Proof of Theorem II. Let

Sₙ = h Σ_{k=1}^{n} x(a + kh),  h = (b − a)/n.

Since Sₙ converges to J = ∫_a^b x(t) dt, we can choose for any ε > 0 an N such that from n > N there follows the inequality

P(A) = P{|Sₙ − J| > ε} < ε.

If we set

Sₙ* = h Σ_{k=1}^{n} E x(a + kh) = E(Sₙ),

then

|Sₙ* − E(J)| = |E(Sₙ − J)| ≤ E|Sₙ − J| = P(A) E_A|Sₙ − J| + P(Ā) E_Ā|Sₙ − J| ≤ 2K P(A) + ε ≤ (2K + 1)ε.
Therefore, Sₙ* converges to E(J), from which results the equation

∫_a^b E x(t) dt = lim Sₙ* = E(J).
Theorem II can easily be generalized for double and triple and higher order multiple integrals. We shall give an application of this theorem to one example in geometric probability. Let G be a measurable region of the plane whose shape depends on chance; in other words, let us assign to every elementary event ξ of a field of probability a definite measurable plane region G. We shall denote by J the area of the region G, and by P(x, y) the probability that the point (x, y) belongs to the region G. Then

E(J) = ∫∫ P(x, y) dx dy.

To prove this it is sufficient to note that

J = ∫∫ f(x, y) dx dy,  P(x, y) = E f(x, y),

where f(x, y) is the characteristic function of the region G (f(x, y) = 1 on G and f(x, y) = 0 outside of G)⁶.
⁶ Cf. A. KOLMOGOROV and M. LEONTOVICH, Zur Berechnung der mittleren Brownschen Fläche, Physik. Zeitschr. d. Sowjetunion, v. 4, 1933.
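The closing formula admits a direct check for a simple random region. In the sketch below G is, by arbitrary choice, the disk of random radius R about the origin with R uniform on [0, 1]; then J = πR², E(J) = π/3, and P(x, y) = 1 − r for r = √(x² + y²) ≤ 1:

```python
import math

# G is the disk of random radius R about the origin, R uniform on [0, 1]
# (an arbitrary example).  Then J = pi R^2 and E(J) = pi/3, while the
# probability that a point at distance r = sqrt(x^2 + y^2) <= 1 lies in G
# is P(x, y) = P{R >= r} = 1 - r.
EJ = math.pi / 3.0

# Integrate P(x, y) over the plane in polar coordinates:
# integral over r in [0, 1] of (1 - r) * 2*pi*r dr  (midpoint rule).
n = 100000
h = 1.0 / n
integral = sum((1.0 - (i + 0.5) * h) * 2.0 * math.pi * (i + 0.5) * h
               for i in range(n)) * h
assert abs(integral - EJ) < 1e-6
```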
Chapter V
CONDITIONAL PROBABILITIES AND
MATHEMATICAL EXPECTATIONS
§ 1. Conditional Probabilities
In § 6, Chapter I, we defined the conditional probability, P_𝔄(B), of the event B with respect to trial 𝔄. It was there assumed that 𝔄 allows of only a finite number of different possible results. We can, however, define P_𝔄(B) also for the case of an 𝔄 with an infinite set of possible results, i.e. the case in which the set E is partitioned into an infinite number of non-intersecting subsets. In particular, we obtain such a partitioning if we consider an arbitrary function u of ξ and define as elements of the partition 𝔄_u the sets u = constant. The conditional probability P_{𝔄_u}(B) we also denote by P_u(B).
Any partitioning 𝔄 of the set E can be defined as the partitioning 𝔄_u which is "induced" by a function u of ξ, if one assigns to every ξ, as u(ξ), that set of the partitioning 𝔄 of E which contains ξ.

Two functions u and u′ of ξ determine the same partitioning 𝔄_u = 𝔄_{u′} of the set E if and only if there exists a one-to-one correspondence u′ = f(u) between their domains E^(u) and E^(u′) such that u′(ξ) is identical with f u(ξ). The reader can easily show that the random variables P_u(B) and P_{u′}(B), defined below, are in this case the same. They are thus determined, in fact, by the partition 𝔄_u = 𝔄_{u′} itself.
To define P_u(B) we may use the following equation:

P_{u ⊂ A}(B) = E_{u ⊂ A}(P_u(B)). (1)
It is easy to prove that if the set E^(u) of all possible values of u is finite, equation (1) holds true for any choice of A (when P_u(B) is defined as in § 6, Chap. I). In the general case (in which P_u(B) is not yet defined) we shall prove that there always exists one and only one random variable P_u(B) (except for the matter of equivalence) which is defined as a function of u and which satisfies equation (1) for every choice of A from 𝔉^(u) such that P^(u)(A) > 0. The function P_u(B) of u, thus determined to within equivalence, we call the conditional probability of B with respect to u (or, for a given u). The value of P_u(B) when u = a we shall designate by P_u(a; B).
The proof of the existence and uniqueness of P_u(B). If we multiply (1) by P{u ⊂ A} = P^(u)(A), we obtain, on the left,

P{u ⊂ A} P_{u ⊂ A}(B) = P(B{u ⊂ A}) = P(B u⁻¹(A))

and, on the right,

P{u ⊂ A} E_{u ⊂ A}(P_u(B)) = ∫_{u ⊂ A} P_u(B) P(dE) = ∫_A P_u(B) P^(u)(dE^(u)),

leading to the formula

P(B u⁻¹(A)) = ∫_A P_u(B) P^(u)(dE^(u)); (2)
and conversely (1) follows from (2). In the case P^(u)(A) = 0, in which case (1) is meaningless, equation (2) becomes trivially true. Condition (2) is thus equivalent to (1). In accordance with Property IX of the integral (§ 1, Chap. IV) the random variable x is uniquely defined (except for equivalence) by means of the values of the integral

∫_A x P(dE)

for all sets of 𝔉. Since P_u(B) is a random variable determined on the probability field (𝔉^(u), P^(u)), it follows that formula (2) uniquely determines this variable P_u(B) except for equivalence.
We must still prove the existence of P_u(B). We shall apply here the following theorem of Nikodym¹:

Let 𝔉 be a Borel field, P(A) a non-negative completely additive set function defined on 𝔉 (in the terminology of the theory of probability, a probability function), and let Q(A) be another completely additive set function defined on 𝔉, such that from Q(A) ≠ 0 follows the inequality P(A) > 0. Then there exists a function f(ξ) (in the terminology of the theory of probability, a random variable) which is measurable with respect to 𝔉, and which satisfies, for each set A of 𝔉, the equation
¹ O. NIKODYM, Sur une généralisation des intégrales de M. J. Radon, Fund. Math. v. 15, 1930, p. 168 (Theorem III).
Q(A) = ∫_A f(ξ) P(dE).
In order to apply this theorem to our case, we need to prove, 1°, that

Q(A) = P(B u⁻¹(A))

is a completely additive function on 𝔉^(u), and 2°, that from Q(A) ≠ 0 follows the inequality P^(u)(A) > 0. Firstly, 2° follows from

0 ≤ P(B u⁻¹(A)) ≤ P(u⁻¹(A)) = P^(u)(A).
For the proof of 1° we set

A = Σₙ Aₙ;

then

u⁻¹(A) = Σₙ u⁻¹(Aₙ)

and

B u⁻¹(A) = Σₙ B u⁻¹(Aₙ).

Since P is completely additive, it follows that

P(B u⁻¹(A)) = Σₙ P(B u⁻¹(Aₙ)),

which was to be proved.
From the equation (1) follows an important formula (if we set A = E^(u)):

P(B) = E(P_u(B)). (3)
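Formula (3) is the familiar rule of total probability, and it admits a direct simulation sketch for a two-stage example (three biased coins, with arbitrarily chosen numbers):

```python
import random

# u picks one of three biased coins; B is the event "the coin shows heads".
# All the numbers below are arbitrary example choices.
random.seed(11)
q = [0.2, 0.5, 0.3]        # distribution of u
heads = [0.1, 0.6, 0.9]    # P_u(B) as a function of u

expected = sum(qi * hi for qi, hi in zip(q, heads))   # E(P_u(B)) = 0.59

N = 200000
us = random.choices([0, 1, 2], weights=q, k=N)
hits = sum(random.random() < heads[u] for u in us)
assert abs(hits / N - expected) < 0.01                # P(B) = E(P_u(B))
```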
Now we shall prove the following two fundamental properties
of conditional probability.
THEOREM I. It is almost sure that

0 ≤ P_u(B) ≤ 1. (4)
THEOREM II. If B is decomposed into at most a countable number of sets Bₙ:

B = Σₙ Bₙ,

then the following equality holds almost surely:

P_u(B) = Σₙ P_u(Bₙ). (5)
These two properties of P_u(B) correspond to the two characteristic properties of the probability function P(B): that 0 ≤ P(B) ≤ 1 always, and that P(B) is completely additive. These allow us to carry over many other basic properties of the absolute probability P(B) to the conditional probability P_u(B). However, we must not forget that P_u(B) is, for a fixed set B, a random variable determined uniquely only to within equivalence.
Proof of Theorem I. If we assume, contrary to the assertion to be proved, that on a set M ⊂ E^(u) with P^(u)(M) > 0, the inequality P_u(B) ≥ 1 + ε, ε > 0, holds true, then according to formula (1)

P_{u ⊂ M}(B) = E_{u ⊂ M}(P_u(B)) ≥ 1 + ε > 1,

which is obviously impossible. In the same way we prove that almost surely P_u(B) ≥ 0.
Proof of Theorem II. From the convergence of the series

Σₙ E|P_u(Bₙ)| = Σₙ E(P_u(Bₙ)) = Σₙ P(Bₙ) = P(B),

it follows, by Property V of mathematical expectation (§ 2, Chap. IV), that the series Σₙ P_u(Bₙ) converges almost surely. If A is any set of 𝔉^(u) with P^(u)(A) > 0, then from Property V of mathematical expectation just referred to it follows that for each A of the above kind we have the relation

E_{u ⊂ A}(Σₙ P_u(Bₙ)) = Σₙ E_{u ⊂ A}(P_u(Bₙ)) = Σₙ P_{u ⊂ A}(Bₙ) = P_{u ⊂ A}(B) = E_{u ⊂ A}(P_u(B)),

and from this, equation (5) immediately follows.
To close this section we shall point out two particular cases. If, first, u(ξ) = c (a constant), then P_c(A) = P(A) almost surely. If, however, we set u(ξ) = ξ, then we obtain at once that P_ξ(A) is almost surely equal to one on A and almost surely equal to zero on the complement of A. P_ξ(A) is thus revealed to be the characteristic function of the set A.
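The relation P(B) = E(P_u(B)) lends itself to a direct numerical check. The following sketch (a hypothetical six-point space chosen only for illustration, not part of the original text) computes the conditional probabilities on the atoms of the partition generated by u and averages them with the weights P{u = a}:

```python
# Toy check of P(B) = E(P_u(B)) on a six-point space (illustrative model).
events = range(6)
p = {e: 1.0 / 6 for e in events}          # P is uniform here
u = {e: e // 2 for e in events}           # u partitions E into {0,1}, {2,3}, {4,5}
B = {1, 3, 4}

# conditional probability P_u(B) on each atom {u = a} of the partition
P_u = {}
for a in set(u.values()):
    atom = [e for e in events if u[e] == a]
    P_u[a] = sum(p[e] for e in atom if e in B) / sum(p[e] for e in atom)

P_B = sum(p[e] for e in B)
# E(P_u(B)): weight each conditional value with P{u = a}
E_P_u = sum(P_u[a] * sum(p[e] for e in events if u[e] == a)
            for a in set(u.values()))

print(P_B, E_P_u)   # the two agree
```

The same computation goes through for any finite space and any partitioning function u; only the dictionaries above would change.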
§ 2. Explanation of a Borel Paradox
Let us choose for our basic set E the set of all points on a spherical surface. Our 𝔉 will be the aggregate of all Borel sets of the spherical surface. And finally, our P(A) is to be proportional to the measure of the set A. Let us now choose two diametrically opposite points for our poles, so that each meridian circle will be uniquely defined by the longitude ψ, 0 ≤ ψ < π. Since ψ varies from 0 only to π (in other words, we are considering complete meridian circles, and not merely semicircles), the latitude Θ must vary from −π to +π (and not from −π/2 to +π/2). Borel posed the following problem: Required to determine "the conditional probability distribution" of the latitude Θ, −π ≤ Θ < +π, for a given longitude ψ.
It is easy to calculate that

P_ψ{Θ_1 ≤ Θ < Θ_2} = (1/4) ∫_{Θ_1}^{Θ_2} |cos Θ| dΘ .
The probability distribution of Θ for a given ψ is not uniform. If we assume that the conditional probability distribution of Θ "with the hypothesis that ξ lies on the given meridian circle" must be uniform, then we have arrived at a contradiction.
This shows that the concept of a conditional probability with
regard to an isolated given hypothesis whose probability equals 0
is inadmissible. For we can obtain a probability distribution for Θ on the meridian circle only if we regard this circle as an
element of the decomposition of the entire spherical surface into
meridian circles with the given poles.
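The conditional density (1/4)|cos Θ| can be reproduced by simulation (a modern illustration, not part of the original text): sample points uniformly on the sphere, keep those whose longitude, taken mod π so that one value of ψ labels a whole meridian circle, falls in a narrow wedge, and record the latitude along the full circle.

```python
import math, random

random.seed(1)

# Sample uniform points on the sphere via the Gaussian trick; condition on
# the longitude wedge psi in [0, delta) and record the full-circle latitude
# Theta in (-pi, pi].
delta = 0.1
thetas = []
while len(thetas) < 3000:
    x, y, z = (random.gauss(0.0, 1.0) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    if r < 1e-12:
        continue
    x, y, z = x / r, y / r, z / r
    phi = math.atan2(y, x)
    psi = phi % math.pi                  # longitude of the meridian *circle*
    if psi < delta:
        # the sign of cos(Theta) tells on which half of the circle we are
        sign = 1.0 if phi >= 0.0 else -1.0
        thetas.append(math.atan2(z, sign * math.hypot(x, y)))

# Predicted conditional density: |cos Theta| / 4, hence
# P(-pi/4 <= Theta < pi/4) = (1/4) * 2 * sin(pi/4) ~ 0.354, visibly different
# from the 0.25 that a uniform distribution on the circle would give.
frac = sum(1 for t in thetas if abs(t) < math.pi / 4) / len(thetas)
print(frac)
```

Conditioning instead on a band of constant width in the transverse coordinate (rather than on a longitude wedge) yields the uniform distribution, which is exactly the paradox discussed above: the answer depends on the decomposition, not on the isolated circle.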
§ 3. Conditional Probabilities with Respect to a Random Variable
If x is a random variable and P_x(B) as a function of x is measurable in the Borel sense, then P_x(B) can be defined in an elementary way. For we can rewrite formula (2) in § 1 to look as follows:

P(B) P_B^{(x)}(A) = ∫_A P_x(B) P^{(x)}(dE) .   (1)
In this case we obtain from (1) at once that

P(B) F_B^{(x)}(a) = ∫_{−∞}^{a} P_x(a; B) dF^{(x)}(a) .   (2)
In accordance with a theorem of Lebesgue² it follows from (2) that

P_x(a; B) = P(B) lim_{h→0} [F_B^{(x)}(a + h) − F_B^{(x)}(a)] / [F^{(x)}(a + h) − F^{(x)}(a)] ,   (3)

which is always true except for a set H of points a for which P^{(x)}(H) = 0.

² Lebesgue, l. c., 1928, pp. 301-302.
P_x(a; B) was defined in § 1 except on a set G which is such that P^{(x)}(G) = 0. If we now regard formula (3) as the definition of P_x(a; B) (setting P_x(a; B) = 0 when the limit on the right-hand side of (3) fails to exist), then this new variable satisfies all the requirements of § 1.
If, besides, the probability densities f^{(x)}(a) and f_B^{(x)}(a) exist and if f^{(x)}(a) > 0, then formula (3) becomes

P_x(a; B) = P(B) f_B^{(x)}(a) / f^{(x)}(a) .   (4)
Moreover, from formula (3) it follows that the existence of a limit in (3) and of a probability density f^{(x)}(a) results in the existence of f_B^{(x)}(a). In that case

P(B) f_B^{(x)}(a) ≤ f^{(x)}(a) .   (5)
If P(B) > 0, then from (4) we have

f_B^{(x)}(a) = P_x(a; B) f^{(x)}(a) / P(B) .   (6)
In case f^{(x)}(a) = 0, then according to (5) f_B^{(x)}(a) = 0 and therefore (6) also holds. If, besides, the distribution of x is continuous, we have
P(B) = E(P_x(B)) = ∫_{−∞}^{+∞} P_x(a; B) f^{(x)}(a) da .   (7)
§ 4. Conditional Mathematical Expectations

DEFINITION. Let u be an arbitrary random function of ξ, and let y be a random variable. A random variable E_u(y), representable as a function of u and satisfying, for any set A of 𝔉^{(u)} with P^{(u)}(A) > 0, the condition

E_{u⊂A}(y) = E_{u⊂A}(E_u(y)) ,   (1)
is called (if it exists) the conditional mathematical expectation of the variable y for a known value of u.
If we multiply (1) by P^{(u)}(A), we obtain

∫_{u⊂A} y P(dE) = ∫_A E_u(y) P^{(u)}(dE^{(u)}) .   (2)
Conversely, from (2) follows formula (1). In case P^{(u)}(A) = 0, in which case (1) is meaningless, (2) becomes trivial. In the same manner as in the case of conditional probability (§ 1) we can prove that E_u(y) is determined uniquely, except for equivalence, by (2).
The value of E_u(y) for u = a we shall denote by E_u(a; y). Let us also note that E_u(y), as well as P_u(B), depends only upon the partition 𝔄_u and may be designated by E_{𝔄_u}(y). The existence of E(y) is implied in the definition of E_u(y) (if we set A = E^{(u)}, then E_{u⊂A}(y) = E(y)).
We shall now prove that the existence of E(y) is also sufficient for the existence of E_u(y). For this we only need to prove, by the theorem of Nikodym (§ 1), that the set function

Q(A) = ∫_{u⊂A} y P(dE)

is completely additive on 𝔉^{(u)} and absolutely continuous with respect to P^{(u)}(A). The first property is proved verbatim as in the case of conditional probability (§ 1). The second property, absolute continuity, is contained in the fact that from Q(A) ≠ 0 the inequality P^{(u)}(A) > 0 must follow. If we assume that P^{(u)}(A) = P{u⊂A} = 0, it is clear that

Q(A) = ∫_{u⊂A} y P(dE) = 0 ,

and our second requirement is thus fulfilled.
If in equation (1) we set A = E^{(u)}, we obtain the formula

E(y) = E(E_u(y)) .   (3)

We can show further that almost surely

E_u(ay + bz) = a E_u(y) + b E_u(z) ,   (4)

where a and b are two arbitrary constants. (The proof is left to the reader.)
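The formulas E(y) = E(E_u(y)) and E_u(ay + bz) = a E_u(y) + b E_u(z) can be verified exactly in any finite model; the sketch below (an arbitrary six-point space, chosen only for illustration) uses rational arithmetic so that both sides agree without rounding error.

```python
from fractions import Fraction

# Exact check of formulas (3) and (4) on a six-point space.
P = [Fraction(k, 21) for k in (1, 2, 3, 4, 5, 6)]   # probabilities of events 0..5
u = [0, 0, 1, 1, 2, 2]                              # u partitions them into three atoms
y = [3, -1, 4, 1, -5, 9]
z = [2, 7, 1, 8, 2, 8]

def E(values):
    return sum(v * w for v, w in zip(values, P))

def E_u(values):
    # conditional expectation: on each atom of u, the weighted mean of `values`
    out = []
    for e in range(6):
        atom = [i for i in range(6) if u[i] == u[e]]
        w = sum(P[i] for i in atom)
        out.append(sum(values[i] * P[i] for i in atom) / w)
    return out

# (3): E(y) = E(E_u(y))
assert E(y) == E(E_u(y))
# (4): E_u(2y - 3z) = 2 E_u(y) - 3 E_u(z), pointwise on the space
lhs = E_u([2 * y[i] - 3 * z[i] for i in range(6)])
rhs = [2 * E_u(y)[i] - 3 * E_u(z)[i] for i in range(6)]
assert lhs == rhs
print("ok")
```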
If u and v are two functions of the elementary event ξ, then the couple (u, v) can always be regarded as a function of ξ. The following important equation then holds:

E_u E_{(u,v)}(y) = E_u(y) .   (5)

For, E_u(y) is defined by the relation

E_{u⊂A}(y) = E_{u⊂A}(E_u(y)) .

Therefore we must show that E_u E_{(u,v)}(y) satisfies the equation

E_{u⊂A}(y) = E_{u⊂A}(E_u E_{(u,v)}(y)) .   (6)

From the definition of E_{(u,v)}(y) it follows that

E_{u⊂A}(y) = E_{u⊂A}(E_{(u,v)}(y)) .   (7)

From the definition of E_u E_{(u,v)}(y) it follows, moreover, that

E_{u⊂A}(E_{(u,v)}(y)) = E_{u⊂A}(E_u E_{(u,v)}(y)) .   (8)

Equation (6) results from equations (7) and (8), and thus proves our statement.
If we set y equal to one on B and to zero outside of B, then

E_u(y) = P_u(B) ,   E_{(u,v)}(y) = P_{(u,v)}(B) .

In this case, from formula (5) we obtain the formula

E_u P_{(u,v)}(B) = P_u(B) .   (9)
The conditional mathematical expectation E_u(y) may also be defined directly by means of the corresponding conditional probabilities. To do this we consider the following sums:

S_λ(u) = Σ_{k=−∞}^{+∞} kλ P_u{kλ ≤ y < (k + 1)λ} = Σ_k R_k .   (10)

If E(y) exists, the series (10) almost certainly converges. For we have from formula (3) of § 1

E|R_k| = |kλ| P{kλ ≤ y < (k + 1)λ} ,
and the convergence of the series

Σ_{k=−∞}^{+∞} |kλ| P{kλ ≤ y < (k + 1)λ} = Σ_k E|R_k|

[…]

In this case (1) is equivalent to the relation

P_{v⊂B}(u⊂A) = P(u⊂A)   (3)

and therefore to the relation

E_{v⊂B}(P_v(u⊂A)) = P(u⊂A) .   (4)

On the other hand, it is obvious that equation (4) follows from
(2). Conversely, since P_v(u⊂A) is uniquely determined by (4) to within probability zero, equation (2) follows from (4) almost certainly.
DEFINITION 2: Let M be a set of functions u_μ(ξ). These functions are called mutually independent in their totality if the following condition is satisfied. Let M' and M'' be two non-intersecting subsets of M, and let A' (or A'') be a set from 𝔉 defined by a relation among the u_μ from M' (or M''); then we have

P(A'A'') = P(A') P(A'') .

The aggregate of all u_μ of M' (or of M'') can be regarded as coordinates of some function u' (or u''). Definition 2 requires only the independence of u' and u'' in the sense of Definition 1 for each choice of the non-intersecting sets M' and M''.
If u_1, u_2, ..., u_n are mutually independent, then in all cases

P{u_1⊂A_1, u_2⊂A_2, ..., u_n⊂A_n} = P(u_1⊂A_1) P(u_2⊂A_2) ... P(u_n⊂A_n) ,   (5)

provided the sets A_k belong to the corresponding 𝔉^{(u_k)} (proved by induction). This equation is not in general, however, at all sufficient for the mutual independence of u_1, u_2, ..., u_n. Equation (5) is easily generalized for the case of a countably infinite product.
From the mutual independence of the u_μ in each finite group (u_{μ_1}, u_{μ_2}, ..., u_{μ_n}) it does not necessarily follow that all u_μ are mutually independent. Finally, it is easy to note that the mutual independence of the functions u_μ is in reality a property of the corresponding partitions 𝔄_{u_μ}. Further, if u'_μ are single-valued functions of the corresponding u_μ, then from the mutual independence of the u_μ follows that of the u'_μ.
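The gap between partial forms of independence and mutual independence is visible already in the smallest example. The sketch below (the classical construction z = xy, supplied here for illustration and not taken from the text) exhibits three variables of which every pair is independent while the triple is not.

```python
from itertools import product
from fractions import Fraction

# x and y are independent signs, z = x*y: pairwise independent, not mutual.
half = Fraction(1, 2)
space = [((x, y), Fraction(1, 4)) for x, y in product((-1, 1), repeat=2)]

def prob(pred):
    return sum(p for (xy, p) in space if pred(*xy))

# every pair among (x, y, z) satisfies the product rule:
for f, g in (
    (lambda x, y: x == 1, lambda x, y: y == 1),          # x and y
    (lambda x, y: x == 1, lambda x, y: x * y == 1),      # x and z
    (lambda x, y: y == 1, lambda x, y: x * y == 1),      # y and z
):
    assert prob(lambda x, y: f(x, y) and g(x, y)) == prob(f) * prob(g)

# but the three together do not:
triple = prob(lambda x, y: x == 1 and y == 1 and x * y == 1)
assert triple == Fraction(1, 4) != half ** 3
print("pairwise independent, not mutually independent")
```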
§ 2. Independent Random Variables
If x_1, x_2, ..., x_n are mutually independent random variables, then from equation (2) of the foregoing paragraph follows, in particular, the formula

F^{(x_1, x_2, ..., x_n)}(a_1, a_2, ..., a_n) = F^{(x_1)}(a_1) F^{(x_2)}(a_2) ... F^{(x_n)}(a_n) .   (1)

If in this case the field 𝔉^{(x_1, x_2, ..., x_n)} consists only of Borel sets of the space R^n, then condition (1) is also sufficient for the mutual independence of the variables x_1, x_2, ..., x_n.
Proof. Let x' = (x_{i_1}, x_{i_2}, ..., x_{i_k}) and x'' = (x_{j_1}, x_{j_2}, ..., x_{j_m}) be two non-intersecting subsystems of the variables x_1, x_2, ..., x_n. We must show, on the basis of formula (1), that for every two Borel sets A' and A'' of R^k (or R^m) the following equation holds:

P(x'⊂A', x''⊂A'') = P(x'⊂A') P(x''⊂A'') .   (2)

This follows at once from (1) for sets of the form

A' = {x_{i_1} < a_1, x_{i_2} < a_2, ..., x_{i_k} < a_k} ,
A'' = {x_{j_1} < b_1, x_{j_2} < b_2, ..., x_{j_m} < b_m} .

It can be shown that this property of the sets A' and A'' is preserved under the formation of sums and differences, from which equation (2) follows for all Borel sets.
Now let X = {x_μ} be an arbitrary (in general infinite) aggregate of random variables. If the field 𝔉^{(X)} […]

[…] (σ²(x) > 0 and σ²(y) > 0). The number f² is called the correlation ratio of y with respect to x, and g² the same for x with respect to y (Pearson).
From (5) it further follows that

E(xy) = E(x) E(y) .   (6)

To prove this we apply formula (15) of § 4, Chap. V:

E(xy) = E E_x(xy) = E[x E_x(y)] = E[x E(y)] = E(y) E(x) .

Therefore, in the case of independence

r = [E(xy) − E(x) E(y)] / [σ(x) σ(y)]

is also equal to zero; r, as is well known, is the correlation coefficient of x and y.
If two random variables x and y satisfy equation (6), then they are called uncorrelated. For the sum

S = x_1 + x_2 + ... + x_n ,

where the x_1, x_2, ..., x_n are uncorrelated in pairs, we can easily compute that

σ²(S) = σ²(x_1) + σ²(x_2) + ... + σ²(x_n) .   (7)

In particular, equation (7) holds for the independent variables x_k.
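Equation (7) requires only that the summands be uncorrelated in pairs, not that they be independent. The following exact check (a toy construction chosen for illustration) uses two dependent but uncorrelated variables: with θ uniform on four points, x1 = cos θ and x2 = sin θ.

```python
from fractions import Fraction

# (x1, x2) = (cos theta, sin theta) for theta in {0, pi/2, pi, 3pi/2}.
outcomes = [(1, 0), (0, 1), (-1, 0), (0, -1)]
p = Fraction(1, 4)

def E(f):
    return sum(p * f(x1, x2) for x1, x2 in outcomes)

var = lambda f: E(lambda a, b: f(a, b) ** 2) - E(f) ** 2

# uncorrelated: E(x1*x2) = E(x1)*E(x2) = 0 ...
assert E(lambda a, b: a * b) == E(lambda a, b: a) * E(lambda a, b: b)
# ... yet dependent: P(x1=0, x2=0) = 0 while P(x1=0)*P(x2=0) = 1/4.
# Variance of the sum still equals the sum of the variances, as in (7):
assert var(lambda a, b: a + b) == var(lambda a, b: a) + var(lambda a, b: b)
print("sigma^2(S) = sigma^2(x1) + sigma^2(x2)")
```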
§ 3. The Law of Large Numbers
Random variables s_n of a sequence

s_1, s_2, ..., s_n, ...

are called stable if there exists a numerical sequence

d_1, d_2, ..., d_n, ...

such that for any positive ε

P{|s_n − d_n| ≥ ε}

converges to zero as n → ∞. If all E(s_n) exist and if we may set

d_n = E(s_n) ,

then the stability is normal.
If all the s_n are uniformly bounded, then from

P{|s_n − d_n| ≥ ε} → 0   (n → +∞)   (1)

we obtain the relation

|E(s_n) − d_n| → 0   (n → +∞)

and therefore

P{|s_n − E(s_n)| ≥ ε} → 0   (n → +∞) .   (2)

The stability of a bounded stable sequence is thus necessarily normal.
Let

σ_n² = σ²(s_n) .

According to the Tchebycheff inequality,

P{|s_n − E(s_n)| ≥ ε} ≤ σ_n²/ε² .

Therefore, the Markov condition

σ_n² → 0   (n → +∞)   (3)

is sufficient for normal stability.
If the s_n − E(s_n) are uniformly bounded:

|s_n − E(s_n)| ≤ M ,

then from the inequality (9) in § 3, Chap. IV,

P{|s_n − E(s_n)| ≥ ε} ≥ (σ_n² − ε²)/M² .

Therefore, in this case the Markov condition (3) is also necessary for the stability of the s_n.
If

s_n = (x_1 + x_2 + ... + x_n)/n

and the variables x_n are uncorrelated in pairs, we have

σ_n² = (1/n²) {σ²(x_1) + σ²(x_2) + ... + σ²(x_n)} .

Therefore, in this case, the following condition is sufficient for the normal stability of the arithmetic means s_n:

n² σ_n² = σ²(x_1) + σ²(x_2) + ... + σ²(x_n) = o(n²)   (4)

(Theorem of Tchebycheff). In particular, condition (4) is fulfilled if all the variables x_n are uniformly bounded.
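Tchebycheff's theorem can be watched at work in a small Monte-Carlo experiment (an illustrative sketch; with summands uniform on (−1, 1) one has σ²(s_n) = 1/3n, so the tail probabilities must fall off):

```python
import random

random.seed(2)

# Estimate P(|s_n - E(s_n)| >= eps) for growing n, with bounded independent
# summands uniform on (-1, 1), so that E(s_n) = 0.
def tail_prob(n, eps, trials=2000):
    bad = 0
    for _ in range(trials):
        s = sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n
        if abs(s) >= eps:
            bad += 1
    return bad / trials

probs = [tail_prob(n, 0.1) for n in (10, 100, 1000)]
print(probs)   # decreasing toward zero
```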
This theorem can be generalized for the case of weakly correlated variables x_n. If we assume that the coefficient of correlation r_{mn}¹ of x_m and x_n satisfies the inequality

r_{mn} ≤ c(|n − m|)

and that

C_n = Σ_{k=0}^{n−1} c(k) ,

then a sufficient condition for the normal stability of the arithmetic means s_n is²

C_n = o(n) .   (5)
In the case of independent summands x_n we can state a necessary and sufficient condition for the stability of the arithmetic means s_n. For every x_n there exists a constant m_n (the median of x_n) which satisfies the following conditions:

P(x_n < m_n) ≤ 1/2 ,
P(x_n > m_n) ≤ 1/2 .

¹ It is obvious that r_{nn} = 1 always.
² Cf. A. KHINTCHINE, Sur la loi forte des grandes nombres. C. R. de l'acad. sci. Paris v. 186, 1928, p. 285.
We set

x_{nk} = x_k   if |x_k − m_k| < n ,
x_{nk} = 0   otherwise.

Then the relations

Σ_{k=1}^{n} P{|x_k − m_k| > n} = Σ_{k=1}^{n} P(x_{nk} ≠ x_k) → 0   (n → +∞) ,

(1/n²) Σ_{k=1}^{n} σ²(x_{nk}) → 0   (n → +∞)   (6)

are necessary and sufficient for the stability of the variables s_n³. We may here assume the constants d_n to be equal to the E(s_n*), where s_n* denotes the arithmetic mean of the truncated variables x_{n1}, x_{n2}, ..., x_{nn}, so that in the case where

E(s_n*) − E(s_n) → 0   (n → +∞)

(and only in this case) the stability is normal.
A further generalization of Tchebycheff's theorem is obtained if we assume that the s_n depend in some way upon the results of any n trials,

𝔄_1, 𝔄_2, ..., 𝔄_n ,

so that after each definite outcome of all these n trials s_n assumes a definite value. The general idea of all these theorems, known as the law of large numbers, consists in the fact that if the dependence of the variables s_n upon each separate trial 𝔄_k (k = 1, 2, ..., n) is very small for large n, then the variables s_n are stable. If we regard

β_{nk} = E[E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n)]²

as a reasonable measure of the dependence of the variables s_n upon the trial 𝔄_k, then the above-mentioned general idea of the law of large numbers can be made concrete by the following considerations⁴.

Let

z_{nk} = E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n) .

³ Cf. A. KOLMOGOROV, Über die Summen durch den Zufall bestimmter unabhängiger Grössen, Math. Ann. v. 99, 1928, pp. 309-319 (corrections and notes to this study, v. 102, 1929, pp. 484-488), Theorem VIII and a supplement on p. 318.
⁴ Cf. A. KOLMOGOROV, Sur la loi des grandes nombres. Rend. Accad. Lincei v. 9, 1929, pp. 470-474.
Then

s_n − E(s_n) = z_{n1} + z_{n2} + ... + z_{nn} ,

E(z_{nk}) = E E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n) = E(s_n) − E(s_n) = 0 ,

σ²(z_{nk}) = E(z_{nk}²) = β_{nk} .

We can easily compute also that the random variables z_{nk} (k = 1, 2, ..., n) are uncorrelated. For let i < k; then⁵

E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(z_{ni} z_{nk}) = z_{ni} E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(z_{nk})
  = z_{ni} [E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}} E_{𝔄_1 𝔄_2 ... 𝔄_k}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n)]
  = z_{ni} [E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n) − E_{𝔄_1 𝔄_2 ... 𝔄_{k−1}}(s_n)] = 0

and therefore

E(z_{ni} z_{nk}) = 0 .

We thus have

σ²(s_n) = σ²(z_{n1}) + σ²(z_{n2}) + ... + σ²(z_{nn}) = β_{n1} + β_{n2} + ... + β_{nn} .

Therefore, the condition

β_{n1} + β_{n2} + ... + β_{nn} → 0   (n → +∞)

is sufficient for the normal stability of the variables s_n.
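The decomposition of s_n − E(s_n) into the uncorrelated differences z_{nk} can be verified exactly for Bernoulli trials. In this model E_{𝔄_1...𝔄_k}(s_n) = (x_1 + ... + x_k + (n − k)p)/n, so z_{nk} = (x_k − p)/n and β_{nk} = p(1 − p)/n² (an illustrative computation, not part of the original text):

```python
from fractions import Fraction
from itertools import product

n, p = 4, Fraction(1, 3)      # n Bernoulli(p) trials, enumerated exhaustively

def P(omega):                 # probability of an outcome (x_1, ..., x_n)
    q = Fraction(1)
    for x in omega:
        q *= p if x else (1 - p)
    return q

def E(f):
    return sum(P(w) * f(w) for w in product((0, 1), repeat=n))

s = lambda w: Fraction(sum(w), n)          # the arithmetic mean s_n
z = lambda w, k: (w[k] - p) / n            # z_{n,k+1} in the text's numbering

# the z_{nk} sum to s_n - E(s_n), have mean zero, and are uncorrelated
assert all(sum(z(w, k) for k in range(n)) == s(w) - p
           for w in product((0, 1), repeat=n))
assert all(E(lambda w, k=k: z(w, k)) == 0 for k in range(n))
assert all(E(lambda w, i=i, k=k: z(w, i) * z(w, k)) == 0
           for i in range(n) for k in range(n) if i != k)

# hence sigma^2(s_n) = beta_{n1} + ... + beta_{nn} = p(1-p)/n
beta = [E(lambda w, k=k: z(w, k) ** 2) for k in range(n)]
var_s = E(lambda w: (s(w) - p) ** 2)
assert var_s == sum(beta) == p * (1 - p) / n
print(var_s)
```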
§ 4. Notes on the Concept of Mathematical Expectation
We have defined the mathematical expectation of a random variable x as

E(x) = ∫_E x P(dE) = ∫_{−∞}^{+∞} a dF^{(x)}(a) ,   (1)

where the integral on the right is understood as

∫_{−∞}^{+∞} a dF^{(x)}(a) = lim_{b→−∞, c→+∞} ∫_b^c a dF^{(x)}(a) .

The idea suggests itself to consider the expression

E*(x) = lim_{b→+∞} ∫_{−b}^{+b} a dF^{(x)}(a)   (2)

⁵ Application of formula (15) in § 4, Chap. V.
as a generalized mathematical expectation. We lose in this case, of course, several simple properties of the mathematical expectation. For example, in this case the formula

E(x + y) = E(x) + E(y)

is not always true. In this form the generalization is hardly admissible. We may add, however, that with some restrictive supplementary conditions, definition (2) becomes entirely natural and useful.
We can discuss the problem as follows. Let

x_1, x_2, ..., x_n, ...

be a sequence of mutually independent variables, having the same distribution function F^{(x)}(a) as x, and let

s_n = (x_1 + x_2 + ... + x_n)/n .

The question now arises whether there exists a constant E*(x) such that for every ε > 0

lim_{n→+∞} P(|s_n − E*(x)| > ε) = 0 .   (3)

The answer is: if such a constant E*(x) exists, it is expressed by formula (2). The necessary and sufficient condition that formula (3) hold consists in the existence of the limit (2) and the relation

P(|x| > n) = o(1/n) .   (4)
To prove this we apply the theorem⁶ that condition (4) is necessary and sufficient for the stability of the arithmetic means s_n, where, in the case of stability, we may set

d_n = ∫_{−n}^{+n} a dF^{(x)}(a) .

If there exists a mathematical expectation in the former sense (formula (1)), then condition (4) is always fulfilled⁷. Since in this case E(x) = E*(x), the condition (3) actually does define a generalization of the concept of mathematical expectation. For the generalized mathematical expectation, Properties I-VII

⁶ Cf. A. KOLMOGOROV, Bemerkungen zu meiner Arbeit "Über die Summen zufälliger Grössen." Math. Ann. v. 102, 1929, pp. 484-488, Theorem XII.
⁷ Ibid., Theorem XIII.
(Chap. IV, § 2) still hold; in general, however, the existence of E*|x| does not follow from the existence of E*(x).

To prove that the new concept of mathematical expectation is really more general than the previous one, it is sufficient to give the following example. Set the probability density f^{(x)}(a) equal to

f^{(x)}(a) = C / [(|a| + 2)² ln(|a| + 2)] ,

where the constant C is determined by

∫_{−∞}^{+∞} f^{(x)}(a) da = 1 .

It is easy to compute that in this case condition (4) is fulfilled. Formula (2) gives the value

E*(x) = 0 ,

but the integral

∫_{−∞}^{+∞} |a| dF^{(x)}(a) = ∫_{−∞}^{+∞} |a| f^{(x)}(a) da

diverges.
§ 5. Strong Law of Large Numbers; Convergence of Series
The random variables s_n of the sequence

s_1, s_2, ..., s_n, ...

are strongly stable if there exists a sequence of numbers

d_1, d_2, ..., d_n, ...

such that the random variables

s_n − d_n

almost certainly tend to zero as n → +∞. From strong stability follows, obviously, ordinary stability. If we can choose

d_n = E(s_n) ,

then the strong stability is normal.
In the Tchebycheff case,

s_n = (x_1 + x_2 + ... + x_n)/n ,

where the variables x_n are mutually independent. A sufficient⁸ condition for the normal strong stability of the arithmetic means s_n is the convergence of the series

Σ_{n=1}^{∞} σ²(x_n)/n² .   (1)
This condition is the best in the sense that for any series of constants b_n such that

Σ_{n=1}^{∞} b_n/n² = +∞ ,

we can build a series of mutually independent random variables x_n such that

σ²(x_n) = b_n

and the corresponding arithmetic means s_n will not be strongly stable.
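For uniformly bounded summands the convergence of Σ σ²(x_n)/n² is automatic, and a single long simulated trajectory already shows the means settling down (an illustration with x_n = ±1, not part of the original text):

```python
import random

random.seed(3)

# One sample path: independent x_n = +-1 (mean 0, variance 1), so the series
# sum sigma^2(x_n)/n^2 converges and s_n = (x_1 + ... + x_n)/n should tend to 0.
N = 200000
partial, path = 0.0, []
for n in range(1, N + 1):
    partial += random.choice((-1.0, 1.0))
    path.append(partial / n)

late = path[N // 2:]                       # the tail of the trajectory
m_late = max(abs(v) for v in late)
print(m_late)                              # already uniformly small
```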
If all the x_n have the same distribution function F^{(x)}(a), then the existence of the mathematical expectation

E(x) = ∫_{−∞}^{+∞} a dF^{(x)}(a)

is necessary and sufficient for the strong stability of s_n; the stability in this case is always normal⁹.

Consider now a series Σ_n x_n of mutually independent random variables x_1, x_2, ..., x_n, ..., and set y_n = x_n if |x_n| ≤ 1 and y_n = 0 if |x_n| > 1.

⁸ Cf. A. KOLMOGOROV, Sur la loi forte des grandes nombres, C. R. Acad. Sci. Paris v. 191, 1930, pp. 910-911.
⁹ The proof of this statement has not yet been published.
Then in order that the series Σ_n x_n converge with probability one, it is necessary and sufficient¹⁰ that the following series converge simultaneously:

Σ_{n=1}^{∞} P{|x_n| > 1} ,   Σ_{n=1}^{∞} E(y_n)   and   Σ_{n=1}^{∞} σ²(y_n) .
¹⁰ Cf. A. KHINTCHINE and A. KOLMOGOROV, On the Convergence of Series, Rec. Math. Soc. Moscow, v. 32, 1925, pp. 668-677.
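For summands of the hypothetical form x_n = ±a_n (each sign with probability 1/2, a_n ≤ 1), the first two series vanish and everything rests on Σ a_n². The sketch below checks this deterministically for a_n = 1/n against a_n = 1/√n and follows one sample path of the convergent case:

```python
import math, random

random.seed(4)

def tail_var(a, N):            # sum of a_n^2 over N < n <= 2N
    return sum(a(n) ** 2 for n in range(N + 1, 2 * N + 1))

# a_n = 1/n: sum a_n^2 converges, so the random series converges a.s.;
# a_n = 1/sqrt(n): sum a_n^2 = sum 1/n diverges, so it diverges a.s.
assert tail_var(lambda n: 1.0 / n, 100000) < 1e-4
assert tail_var(lambda n: 1.0 / math.sqrt(n), 100000) > 0.69   # ~ ln 2

# one sample path of the convergent case: partial sums become Cauchy
S = 0.0
checkpoints = {}
for n in range(1, 20001):
    S += random.choice((-1.0, 1.0)) / n
    if n in (10000, 20000):
        checkpoints[n] = S
inc = abs(checkpoints[20000] - checkpoints[10000])
print(inc)   # tiny tail increment
```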
Appendix
ZERO-OR-ONE LAW IN THE THEORY OF PROBABILITY
We have noticed several cases in which certain limiting
probabilities are necessarily equal to zero or one. For example,
the probability of convergence of a series of independent random
variables may assume only these two values¹. We shall prove now
a general theorem including many such cases.
THEOREM: Let x_1, x_2, ..., x_n, ... be any random variables and let f(x_1, x_2, ..., x_n, ...) be a Baire function² of the variables x_1, x_2, ..., x_n, ... such that the conditional probability

P_{x_1, x_2, ..., x_n}{f(x) = 0}

of the relation

f(x_1, x_2, ..., x_n, ...) = 0

remains, when the first n variables x_1, x_2, ..., x_n are known, equal to the absolute probability

P{f(x) = 0}   (1)

for every n. Under these conditions the probability (1) equals zero or one.
In particular, the assumptions of this theorem are fulfilled if the variables x_n are mutually independent and if the value of the function f(x) remains unchanged when only a finite number of variables are changed.
Proof of the Theorem: Let us denote by A the event

f(x) = 0 .

We shall also investigate the field 𝔎 of all events which can be defined through some relations among a finite number of the variables x_n. If the event B belongs to 𝔎, then, according to the conditions of the theorem,

P_B(A) = P(A) .   (2)

In the case P(A) = 0 our theorem is already true. Let now P(A) > 0. Then from (2) follows the formula

P_A(B) = P_B(A) P(B) / P(A) = P(B) ,   (3)

and therefore P(B) and P_A(B) are two completely additive set functions, coinciding on 𝔎; therefore they must remain equal to each other on every set of the Borel extension B𝔎 of the field 𝔎. Therefore, in particular,

P(A) = P_A(A) = 1 ,

which proves our theorem.

¹ Cf. Chap. VI, § 5. The same thing is true of the probability

P{s_n − d_n → 0}

in the strong law of large numbers; at least, when the variables x_n are mutually independent.
² A Baire function is one which can be obtained by successive passages to the limit of sequences of functions, starting with polynomials.
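A finite-n proxy (heuristic only, not a proof) makes the dichotomy visible. The event that the means of independent Bernoulli(p) trials settle above 1/2 is unchanged if finitely many x_n are altered, so by the theorem its probability is 0 or 1; empirically the frequency is pinned at the extremes.

```python
import random

random.seed(5)

# Estimate, for two values of p, the frequency with which the mean of n
# Bernoulli(p) trials exceeds 1/2 (a stand-in for the tail event {lim s_n > 1/2}).
def estimate(p, n=2000, trials=200):
    hits = 0
    for _ in range(trials):
        mean = sum(random.random() < p for _ in range(n)) / n
        hits += mean > 0.5
    return hits / trials

p_high, p_low = estimate(0.7), estimate(0.3)
print(p_high, p_low)   # pinned at the extremes: 1.0 and 0.0
```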
Several other cases in which we can state that certain probabilities can assume only the values one and zero were discovered by P. Lévy. See P. LÉVY, Sur un théorème de M. Khintchine, Bull. des Sci. Math. v. 55, 1931, pp. 145-160, Theorem II.
BIBLIOGRAPHY

BERNSTEIN, S.:
[1]. On the axiomatic foundation of the theory of probability. (In Russian). Mitt. Math. Ges. Charkov, 1917, pp. 209-274.
[2]. Theory of probability, 2nd edition. (In Russian). Moscow, 1927. Government publication RSFSR.

BOREL, E.:
[1]. Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. mat. Palermo Vol. 27 (1909) pp. 247-271.
[2]. Principes et formules classiques, fasc. 1 du tome I du Traité des probabilités par E. BOREL et divers auteurs. Paris: Gauthier-Villars 1925.
[3]. Applications à l'arithmétique et à la théorie des fonctions, fasc. 1 du tome II du Traité des probabilités par E. BOREL et divers auteurs. Paris: Gauthier-Villars 1926.

CANTELLI, F. P.:
[1]. Una teoria astratta del Calcolo delle probabilità. Giorn. Ist. Ital. Attuari Vol. 3 (1932) pp. 257-265.
[2]. Sulla legge dei grandi numeri. Mem. Accad. Lincei Vol. 11 (1916).
[3]. Sulla probabilità come limite della frequenza. Rend. Accad. Lincei Vol. 26 (1917) pp. 39-45.

COPELAND, H.:
[1]. The theory of probability from the point of view of admissible numbers. Ann. Math. Statist. Vol. 3 (1932) pp. 143-156.

DÖRGE, K.:
[1]. Zu der von R. von Mises gegebenen Begründung der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 32 (1930) pp. 232-258.

FRÉCHET, M.:
[1]. Sur la convergence en probabilité. Metron Vol. 8 (1930) pp. 1-48.
[2]. Recherches théoriques modernes, fasc. 3 du tome I du Traité des probabilités par E. BOREL et divers auteurs. Paris: Gauthier-Villars.

KOLMOGOROV, A.:
[1]. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. Vol. 104 (1931) pp. 415-458.
[2]. The general theory of measure and the theory of probability. (In Russian). Sbornik trudow sektii totshnych nauk K.A., Vol. 1 (1929) pp. 8-21.

LÉVY, P.:
[1]. Calcul des probabilités. Paris: Gauthier-Villars.

LOMNICKI, A.:
[1]. Nouveaux fondements du calcul des probabilités. Fundam. Math. Vol. 4 (1923) pp. 34-71.

MISES, R. v.:
[1]. Wahrscheinlichkeitsrechnung. Leipzig u. Wien: Fr. Deuticke 1931.
[2]. Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 5 (1919) pp. 52-99.
[3]. Wahrscheinlichkeitsrechnung, Statistik und Wahrheit. Wien: Julius Springer 1928.
[3']. Probability, Statistics and Truth (translation of above). New York: The MacMillan Company 1939.

REICHENBACH, H.:
[1]. Axiomatik der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 34 (1932) pp. 568-619.

SLUTSKY, E.:
[1]. Über stochastische Asymptoten und Grenzwerte. Metron Vol. 5 (1925) pp. 3-89.
[2]. On the question of the logical foundation of the theory of probability. (In Russian). Westnik Statistiki, Vol. 12 (1922), pp. 18-21.

STEINHAUS, H.:
[1]. Les probabilités dénombrables et leur rapport à la théorie de la mesure. Fundam. Math. Vol. 4 (1923) pp. 286-310.

TORNIER, E.:
[1]. Wahrscheinlichkeitsrechnung und Zahlentheorie. J. reine angew. Math. Vol. 160 (1929) pp. 177-198.
[2]. Grundlagen der Wahrscheinlichkeitsrechnung. Acta math. Vol. 60 (1933) pp. 239-380.
SUPPLEMENTARY BIBLIOGRAPHY
NOTES TO SUPPLEMENTARY BIBLIOGRAPHY
The fundamental work on the measure-theoretic approach to probability theory is A. N. Kolmogorov's Grundbegriffe der Wahrscheinlichkeitsrechnung, of which the present work is an English translation. It is not an overstatement to say that for the past twenty-three years most of the research work in probability has been influenced by this approach, and that the axiomatic theory advanced by Kolmogorov is considered by workers in probability and statistics to be the correct one.
The publication of Kolmogorov's Grundbegriffe initiated a new
era in the theory of probability and its methods ; and the amount
of research generated by the fundamental concepts due to Kolmo
gorov has been very great indeed. In preparing this second edition
of the English translation of Kolmogorov's monograph, it seemed
desirable to give a bibliography that would in some way reflect
the present status and direction of research activity in the theory
of probability.
In recent years many excellent books have appeared. Three of the most outstanding in this group are those by Doob [12], Feller [17], and Loève [54]. Other books dealing with general probability theory and specialized topics in probability are: [2], [3], [6], [7], [9], [19], [23], [26], [27], [28], [34], [39], [41], [42], [47], [49], [50], [67], [70], [72]. Since these books contain many references to the literature, an attempt will be made in this bibliography to list some of the research papers that have appeared in the past few years and several that are in the course of publication.
The model developed by Kolmogorov can be briefly described as follows: In every situation (that is, an experiment, observation, etc.) in which random factors enter, there is an associated probability space or triple (Ω, 𝔖, P), where Ω is an abstract space (the space of elementary events), 𝔖 is a σ-algebra of subsets of Ω (the sets of events), and P(E) is a measure (the probability of the event E) defined for E ∈ 𝔖 and satisfying the condition P(Ω) = 1. The Kolmogorov model has recently been discussed by
Łoś [56], who considers the use of abstract algebras and σ-algebras instead of algebras and σ-algebras of sets. Kolmogorov [44] has also considered the use of metric Boolean algebras in probability. There are many problems, especially in theoretical physics, that do not fit into the Kolmogorov theory, the reason being that these problems involve unbounded measures. Rényi [68] has developed a general axiomatic theory of probability (which contains Kolmogorov's theory as a special case) in which unbounded measures are allowed. The fundamental concept in this theory is the conditional probability of an event. Császár [10] has studied the measure-theoretic structure of the conditional probability spaces that occur in Rényi's theory.
In another direction, examples have been given by various authors which point up the fact that Kolmogorov's theory is too general. Gnedenko and Kolmogorov [27] have introduced a more restricted concept which has been termed a perfect probability space. A perfect probability space is a triple (Ω, 𝔖, P) such that for any real-valued 𝔖-measurable function g and any linear set B for which {ω: g(ω) ∈ B} ∈ 𝔖, there is a Borel set D ⊂ B such that P{ω: g(ω) ∈ D} = P{ω: g(ω) ∈ B}. Recently, Blackwell [5] has introduced a concept that is more restricted than that of a perfect space. The concept introduced is that of a Lusin space. A Lusin space is a pair (Ω, 𝔖) such that (a) 𝔖 is separable, and (b) the range of every real-valued 𝔖-measurable function g on Ω is an analytic set. It has been shown that if (Ω, 𝔖) is a Lusin space and P is any probability measure on 𝔖, then (Ω, 𝔖, P) is a perfect probability space.
In § 6 of Chap. I, Kolmogorov gives the definition of a Markov chain. In recent years the theory of Markov chains and processes has been one of the most active areas of research in probability. An excellent introduction to this theory is given in [17]. Other references are [2], [3], [6], [12], [19], [23], [26], [34], [39], [50], [54], [67], [70], [72]. Two papers of interest are those of Harris and Robbins [29] on the ergodic theory of Markov chains, and Chung [8] on the theory of continuous parameter processes with a denumerable number of states. The paper by Chung unifies and extends the results due to Doob (cf. [12]) and Lévy [51], [52], [53].
A number of workers in probability are utilizing the theory of semi-groups [30] in the study of Markov processes and their structural properties [63]. In this approach, due primarily to Yosida [80], a one-parameter (discrete or continuous) semi-group of operators from a Banach space to itself defines the Markov process. Hille [32] and Kato [38] have used semi-group methods to integrate the Kolmogorov differential equations, and Kendall and Reuter [40] have investigated several pathological cases arising in the theory. Feller [18] and Hille [31] have studied the parabolic differential equations arising in the continuous case. Doob [13] has employed martingale theory in the semi-group approach to one-dimensional diffusion processes. Also, Hunt [33] has studied semi-groups of (probability) measures on Lie groups.
Recently several papers have appeared which are devoted to a more abstract approach to probability and consider random variables with values in a topological space which may have an algebraic structure. In [14], [21], [22], [58], [59], and [61], problems associated with Banach-space-valued random variables are considered; and in [4] similar problems are considered for Orlicz (generalized Lebesgue) spaces. Robbins [69] has considered random variables with values in any compact topological group. Segal [75] has studied the structure of probability algebras and has used this algebraic approach to extend Kolmogorov's theorem concerning the existence of real-valued random variables having any preassigned joint distribution (cf. § 4 of Chap. III). Segal [76, Chap. 3, § 13] has also considered a non-commutative probability theory.

Prohorov [66] has studied convergence properties of probability distributions defined on Banach spaces and other function spaces. These problems have been considered also by LeCam [48] and Parzen [64].
The measure-theoretic definition and basic properties of conditional probabilities and conditional expectations have been given by Kolmogorov (Chap. IV; cf. also [12] and [54]). Using an abstract approach, S. T. C. Moy [60] has considered the properties of conditional expectation as a linear transformation of the space of all extended real-valued measurable functions on a probability space into itself. In [61] she considers the conditional expectation of Banach-space-valued random variables. Nakamura and Turumaru [62] consider an expectation as a given operation of a C*-algebra; and Umegaki [79] considers conditional expectation as a mapping of a space of measurable operators belonging to an integrable class associated with a certain W*-algebra into itself. The work of Umegaki is concerned with the development of a non-commutative probability theory. The results of Segal [74], Dye [15], and others in abstract integration theory are utilized in the above studies. Other papers of interest are [1], [16], [36], and [45].
The L. Schwartz theory of distributions [73] has been utilized
by Gel'fand [24] in the study of generalized stochastic processes;
and by Fortet [20] and Ito [35] in the study of random
distributions.
Several books devoted to the study of limit theorems in
probability are available: [27], [42], [47], and [49]. In addition, [12]
and [54] should be consulted. Research and review papers of
interest are [11], [14], [25], [37], [46], [55], [57], [65], [71],
[77], and [78].
SUPPLEMENTARY BIBLIOGRAPHY

[1] ALDA, V., On Conditional Expectations, Czechoslovak Math. J., Vol. 5 (1955), pp. 503-505.
[2] ARLEY, N., On the Theory of Stochastic Processes and Their Application to the Theory of Cosmic Radiation, Copenhagen, 1943.
[3] BARTLETT, M. S., An Introduction to Stochastic Processes, Cambridge, 1955.
[4] BHARUCHA-REID, A. T., On Random Elements in Orlicz Spaces, (Abstract) Bull. Amer. Math. Soc., Vol. 62 (1956). To appear.
[5] BLACKWELL, D., On a Class of Probability Spaces, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[6] BLANC-LAPIERRE, A., and R. FORTET, Théorie des fonctions aléatoires, Paris, 1953.
[7] BOCHNER, S., Harmonic Analysis and the Theory of Probability, Berkeley and Los Angeles, 1955.
[8] CHUNG, K. L., Foundations of the Theory of Continuous Parameter Markov Chains, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[9] CRAMÉR, H., Mathematical Methods of Statistics, Princeton, 1946.
[10] CSÁSZÁR, Á., Sur la structure des espaces de probabilité conditionnelle, Acta Math. Acad. Sci. Hung., Vol. 6 (1955), pp. 337-361.
[11] DERMAN, C., and H. ROBBINS, The Strong Law of Large Numbers when the First Moment does not Exist, Proc. Nat. Acad. Sci. U.S.A., Vol. 41 (1955), pp. 586-587.
[12] DOOB, J. L., Stochastic Processes, New York, 1953.
[13] DOOB, J. L., Martingales and One-Dimensional Diffusion, Trans. Amer. Math. Soc., Vol. 78 (1955), pp. 168-208.
[14] DOSS, S., Sur le théorème limite central pour des variables aléatoires dans un espace de Banach, Publ. Inst. Statist. Univ. Paris, Vol. 3 (1954), pp. 143-148.
[15] DYE, H. A., The Radon-Nikodym Theorem for Finite Rings of Operators, Trans. Amer. Math. Soc., Vol. 72 (1952), pp. 243-280.
[16] FABIAN, V., A Note on the Conditional Expectations, Czechoslovak Math. J., Vol. 4 (1954), pp. 187-191.
[17] FELLER, W., An Introduction to Probability Theory and Its Applications, New York, 1950.
[18] FELLER, W., Diffusion Processes in One Dimension, Trans. Amer. Math. Soc., Vol. 77 (1954), pp. 1-31.
[19] FORTET, R., Calcul des probabilités, Paris, 1950.
[20] FORTET, R., Random Distributions with an Application to Telephone Engineering, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[21] FORTET, R., and E. MOURIER, Résultats complémentaires sur les éléments aléatoires prenant leurs valeurs dans un espace de Banach, Bull. Sci. Math. (2), Vol. 78 (1954), pp. 14-30.
[22] FORTET, R., and E. MOURIER, Les fonctions aléatoires comme éléments aléatoires dans les espaces de Banach, Stud. Math., Vol. 15 (1955), pp. 62-79.
[23] FRÉCHET, M., Recherches théoriques modernes sur le calcul des probabilités. II. Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles, Paris, 1938.
[24] GEL'FAND, I. M., Generalized Random Processes, Doklady Akad. Nauk SSSR (N.S.), Vol. 100 (1955), pp. 853-856. [In Russian.]
[25] GIHMAN, I. I., Some Limit Theorems for Conditional Distributions, Doklady Akad. Nauk SSSR (N.S.), Vol. 91 (1953), pp. 1003-1006. [In Russian.]
[26] GNEDENKO, B. V., Course in the Theory of Probability, Moscow-Leningrad, 1950. [In Russian.]
[27] GNEDENKO, B. V., and A. N. KOLMOGOROV, Limit Distributions for Sums of Independent Random Variables, Translated by K. L. Chung with an appendix by J. L. Doob, Cambridge, 1954.
[28] HALMOS, P. R., Measure Theory, New York, 1950.
[29] HARRIS, T. E., and H. ROBBINS, Ergodic Theory of Markov Chains Admitting an Infinite Invariant Measure, Proc. Nat. Acad. Sci. U.S.A., Vol. 39 (1953), pp. 860-864.
[30] HILLE, E., Functional Analysis and Semi-Groups, New York, 1948.
[31] HILLE, E., On the Integration Problem for Fokker-Planck's Equation in the Theory of Stochastic Processes, Onzième congrès des math. scand. (1949), pp. 185-194.
[32] HILLE, E., On the Integration of Kolmogoroff's Differential Equations, Proc. Nat. Acad. Sci. U.S.A., Vol. 40 (1954), pp. 20-26.
[33] HUNT, G. A., Semi-Groups of Measures on Lie Groups, Trans. Amer. Math. Soc., Vol. 81 (1956), pp. 264-293.
[34] ITÔ, K., Theory of Probability, Tokyo, 1953.
[35] ITÔ, K., Stationary Random Distributions, Mem. Coll. Sci. Univ. Kyoto, Ser. A Math., Vol. 28 (1954), pp. 209-223.
[36] JIŘINA, M., Conditional Probabilities on Strictly Separable σ-Algebras, Czechoslovak Math. J., Vol. 4 (1954), pp. 372-380. [In Czech.]
[37] KALLIANPUR, G., On a Limit Theorem for Dependent Random Variables, Doklady Akad. Nauk SSSR (N.S.), Vol. 101 (1955), pp. 13-16. [In Russian.]
[38] KATO, T., On the Semi-Groups Generated by Kolmogoroff's Differential Equations, J. Math. Soc. Japan, Vol. 6 (1954), pp. 1-15.
[39] KAWADA, Y., The Theory of Probability, Tokyo, 1952.
[40] KENDALL, D. G., and G. E. H. REUTER, Some Pathological Markov Processes with a Denumerable Infinity of States and the Associated Semigroups of Transformations in l, Proc. Symp. on Stochastic Processes (Amsterdam), 1954. To appear.
[41] KHINTCHINE, A., Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Berlin, 1933. [Reprint, CHELSEA PUBLISHING COMPANY.]
[42] KHINTCHINE, A., Limit Laws of Sums of Independent Random Variables, Moscow-Leningrad, 1938. [In Russian.]
[43] KOLMOGOROV, A., Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933. [The present work is an English translation of this.]
[44] KOLMOGOROV, A., Algèbres de Boole métriques complètes, VI Zjazd Mat. Pols., Warsaw (1948), pp. 21-30.
[45] KOLMOGOROV, A., A Theorem on the Convergence of Conditional Mathematical Expectations and Some of Its Applications, Comptes Rendus du Premier Congrès des Mathématiciens Hongrois (1952), pp. 367-386. [In Russian and Hungarian.]
[46] KOLMOGOROV, A., Some Work of Recent Years in the Field of Limit Theorems in the Theory of Probability, Vestnik Moskov. Univ. Ser. Fiz.-Mat. Estest. Nauk, Vol. 8 (1953), pp. 29-38.
[47] KUNISAWA, K., Limit Theorems in Probability Theory, Tokyo, 1949.
[48] LECAM, L., Convergence in Distribution of Stochastic Processes, Univ. California Publ. Statistics. To appear.
[49] LÉVY, P., Théorie de l'addition des variables aléatoires, Paris, 1937.
[50] LÉVY, P., Processus stochastiques et mouvement Brownien, Paris, 1948.
[51] LÉVY, P., Systèmes markoviens et stationnaires. Cas dénombrable, Ann. Sci. École Norm. Sup., Vol. 68 (1951), pp. 327-401.
[52] LÉVY, P., Complément à l'étude des processus de Markoff, Ann. Sci. École Norm. Sup., Vol. 69 (1952), pp. 203-212.
[53] LÉVY, P., Processus markoviens et stationnaires du cinquième type (infinité dénombrable d'états possibles, paramètre continu), C. R. Acad. Sci. Paris, Vol. 236 (1953), pp. 1680-1682.
[54] LOÈVE, M., Probability Theory, New York, 1955.
[55] LOÈVE, M., Variational Terms and the Central Limit Problem, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[56] ŁOŚ, J., On the Axiomatic Treatment of Probability, Colloq. Math., Vol. 3 (1955), pp. 125-137.
[57] MARSAGLIA, G., Iterated Limits and the Central Limit Theorem for Dependent Variables, Proc. Amer. Math. Soc., Vol. 5 (1954), pp. 987-991.
[58] MOURIER, E., Éléments aléatoires dans un espace de Banach, Ann. Inst. H. Poincaré, Vol. 13 (1953), pp. 161-244.
[59] MOURIER, E., L-Random Elements and L*-Random Elements in Banach Spaces, Proc. Third Berkeley Symposium on Math. Statistics and Probability, Vol. 2 (1956). To appear.
[60] MOY, S. T. C., Characterizations of Conditional Expectation as a Transformation on Function Spaces, Pacific J. Math., Vol. 4 (1954), pp. 47-63.
[61] MOY, S. T. C., Conditional Expectations of Banach Space Valued Random Variables and Their Properties, (Abstract) Bull. Amer. Math. Soc., Vol. 62 (1956). To appear.
[62] NAKAMURA, M., and T. TURUMARU, Expectation in an Operator Algebra, Tôhoku Math. J., Vol. 6 (1954), pp. 182-188.
[63] NEVEU, J., Étude des semi-groupes de Markoff, (Thesis) Paris, 1955.
[64] PARZEN, E., Convergence in Distribution and Fourier-Stieltjes Transforms of Random Functions, (Abstract) Ann. Math. Statistics, Vol. 26 (1955), p. 771.
[65] PARZEN, E., A Central Limit Theorem for Multilinear Stochastic Processes, (Abstract) Ann. Math. Statistics, Vol. 27 (1956), p. 206.
[66] PROHOROV, Yu. V., Probability Distributions in Functional Spaces, Uspehi Matem. Nauk (N.S.), Vol. 8 (1953), pp. 165-167. [In Russian.]
[67] RÉNYI, A., The Calculus of Probabilities, Budapest, 1954. [In Hungarian.]
[68] RÉNYI, A., On a New Axiomatic Theory of Probability, Acta Math. Acad. Sci. Hung., Vol. 6 (1955), pp. 285-335.
[69] ROBBINS, H., On the Equidistribution of Sums of Independent Random Variables, Proc. Amer. Math. Soc., Vol. 4 (1953), pp. 786-799.
[70] ROMANOVSKI, V. I., Discrete Markov Chains, Moscow-Leningrad, 1949. [In Russian.]
[71] ROSENBLATT, M., A Central Limit Theorem and a Strong Mixing Condition, Proc. Nat. Acad. Sci. U.S.A., Vol. 42 (1956), pp. 43-47.
[72] SARYMSAKOV, T. A., Elements of the Theory of Markov Processes, Moscow, 1954. [In Russian.]
[73] SCHWARTZ, L., Théorie des distributions, Paris, 1950-51.
[74] SEGAL, I. E., A Non-Commutative Extension of Abstract Integration, Ann. of Math., Vol. 57 (1953), pp. 401-457.
[75] SEGAL, I. E., Abstract Probability Spaces and a Theorem of Kolmogoroff, Amer. J. Math., Vol. 76 (1954), pp. 721-732.
[76] SEGAL, I. E., A Mathematical Approach to Elementary Particles and Their Fields, University of Chicago, 1955. [Mimeographed Lecture Notes.]
[77] TAKANO, K., On Some Limit Theorems of Probability Distributions, Ann. Inst. Statist. Math., Tokyo, Vol. 6 (1954), pp. 87-118.
[78] TSURUMI, S., On the Strong Law of Large Numbers, Tôhoku Math. J., Vol. 7 (1955), pp. 166-170.
[79] UMEGAKI, H., Conditional Expectation in an Operator Algebra, Tôhoku Math. J., Vol. 6 (1954), pp. 177-181.
[80] YOSIDA, K., Operator Theoretical Treatment of Markoff's Process, Proc. Imp. Acad. Japan, Vol. 14 (1938), pp. 363-367.
CHELSEA SCIENTIFIC BOOKS
THEORY OF PROBABILITY
By B. V. GNEDENKO
This textbook, by Russia's leading probabilist, is suitable for
senior undergraduate and first-year graduate courses. It covers,
in highly readable form, a wide range of topics and, by carefully
selected exercises and examples, keeps the reader throughout in
close touch with problems in science and engineering.

The translation has been made from the fourth Russian edition
by Prof. B. D. Seckler. Earlier editions have won wide and
enthusiastic acceptance as a text at many leading colleges and
universities.

"extremely well written . . . suitable for individual study . . .
Gnedenko's book is a milestone in the writing on probability
theory."-Science.

Partial Contents: I. The Concept of Probability (Various
approaches to the definition. Space of Elementary Events.
Classical Definition. Geometrical Probability. Relative Frequency.
Axiomatic construction . . .). II. Sequences of Independent
Trials. III. Markov Chains. IV. Random Variables and
Distribution Functions (Continuous and discrete distributions.
Multidimensional d. functions. Functions of random variables.
Stieltjes integral). V. Numerical Characteristics of Random
Variables (Mathematical expectation. Variance . . . Moments).
VI. Law of Large Numbers (Mass phenomena. Tchebychev's
form of law. Strong law of large numbers . . .). VII.
Characteristic Functions (Properties. Inversion formula and
uniqueness theorem. Helly's theorems. Limit theorems. Char.
functs. for multidimensional random variables . . .). VIII.
Classical Limit Theorem (Liapunov's theorem. Local limit
theorem). IX. Theory of Infinitely Divisible Distribution Laws.
X. Theory of Stochastic Processes (Generalized Markov equation.
Continuous S. processes. Purely discontinuous S. processes.
Kolmogorov-Feller equations. Homogeneous S. processes with
independent increments. Stationary S. process. Stochastic
integral. Spectral theorem of S. processes. Birkhoff-Khinchine
ergodic theorem). XI. Elements of Queueing Theory (General
characterization of the problems. Birth-and-death processes.
Single-server queueing systems. Flows. Elements of the theory
of stand-by systems). XII. Elements of Statistics (Problems.
Variational series. Glivenko's Theorem and Kolmogorov's
criterion. Two-sample problem. Critical region . . . Confidence
limits). TABLES. BIBLIOGRAPHY. ANSWERS TO THE EXERCISES.

-4th ed. Summer, 1967. Approx. 500 pp. 6x9. [132] $9.50
GYROSCOPIC THEORY
By G. GREENHILL

This work is intended to serve as a collection in one place of
the various methods of the theoretical explanation of the motion
of a spinning body, and as a reference for mathematical formulas
required in practical work.

Originally published as a report to the British Advisory
Committee for Aeronautics.

CHAPTER HEADINGS: I. Steady Gyroscopic Motion. II.
Gyroscopic Applications. III. General Unsteady Motion of a Top
or Gyroscope. IV. Geometrical Representation of the Motion of
a Top. V. Algebraical Cases of Top Motion. VI. Numerical
Illustrations and Diagrams. VII. The Spherical Pendulum. VIII.
Motion referred to Moving Origin and Axes. IX. Dynamical
Problems of Steady Motion and Small Oscillation. Fold-out plates.

-1914-67. vi + 277 pp. + Plates. 6½x10¾. [205] $9.50
COMMUTATIVE NORMED RINGS
By I. M. GELFAND, D. A. RAIKOV, and G. E. SHILOV
Translated from the Russian. In the second English edition, to
appear in 1967, Chapter II has been revised in accordance with
a manuscript especially prepared for this edition by Professor
Shilov.

Partial Contents: CHAPS. I AND II. General Theory of
Commutative Normed Rings. III. Ring of Absolutely Integrable
Functions and their Discrete Analogues. IV. Harmonic Analysis
on Commutative Locally Compact Groups. V. Ring of Functions
of Bounded Variation on a Line. VI. Regular Rings. VII. Rings
with Uniform Convergence. VIII. Normed Rings with an
Involution and their Representations. IX. Decomposition of a
Normed Ring into a Direct Sum of Ideals. HISTORICO-
BIBLIOGRAPHICAL NOTES. BIBLIOGRAPHY.

-2nd ed. 1967. 306 pp. 6x9. [170] $6.50
LES INTÉGRALES DE STIELTJES et leurs Applications
aux Problèmes de la Physique Mathématique
By N. GUNTHER
The present work is a reprint of Vol. I of the publications of
the V. A. Steklov Institute of Mathematics, in Moscow. The
text is in French.

-1932-49. 498 pp. 5⅜x8. [63] $6.50