Evolution of teaching the probability theory based on textbook by

The paper studies the changes that the course of probability theory has undergone from the end of the 19th century to the present, based on an analysis of the textbook The Probability Theory by Vasyl P. Ermakov, published in 1878. To establish the competence of the textbook's author, the biography and creative development of V. P. Ermakov, a famous mathematician and Corresponding Member of the St. Petersburg Academy of Sciences, are briefly reviewed. He worked at the Department of Pure Mathematics at Kyiv University, where he received the title of Honored Professor, headed the Department of Higher Mathematics at the Kyiv Polytechnic Institute, published the Journal of Elementary Mathematics, and was one of the founders of the Kyiv Physics and Mathematics Society. The paper contains a comparative analysis of The Probability Theory and modern educational literature. V. P. Ermakov's textbook uses only the classical definition of probability. It does not contain such concepts as a random variable or a distribution function; however, it uses mathematical expectation. V. P. Ermakov insists on excluding from probability theory the concept of moral expectation, which was accepted in the science of that time. The textbook consists of a preface, five chapters, a synopsis containing the statements [...] has made significant progress since the end of the 19th century. Basic concepts are formulated more rigorously; research methods have developed significantly; new sections have appeared.

for the higher grades, where, in addition to problems on combinatorics, there were also simple problems on life insurance (Mačák, 2005, pp. 14-15).
Until the beginning of the 20th century, when interest in actuarial mathematics and mathematical statistics grew, probability theory was taught only sporadically at universities in the Czech Republic and Austria. In some technical universities in the middle of the 19th century, "political arithmetic" was studied (Bilová, Mazliak, & Šišma, 2006, pp. 3-4). It is believed that Doppler and Matzka were the first to teach probability theory in higher educational institutions of the Czech Republic, in the middle of the 19th century (Mačák, 2005, p. 14). In the 1870s, Panek taught a course at the Czech Polytechnic University that covered absolute, relative, and complex probability, geometric probability, Bernoulli's theorem and Poisson's theorem, objective and subjective expectations, posterior probability, Bayes' formula, Laplace's theorem, insurance, a historical overview of the computation of probabilities, and the method of least squares (Bilová, Mazliak, & Šišma, 2006, pp. 4-5).
In Berlin between 1829 and 1850, Dirichlet gave nine courses of lectures on probability theory and its application to the theory of errors, in particular to the method of least squares. These lectures dealt with problems related to the duration of the game, Stirling's formula, the probability of hypotheses, Bernoulli's theorem, the central limit theorem, estimates of the distribution of medians, and geometric probability (Fischer, 1994, pp. 40-42).
In the territory of the Russian Empire, a course of probability theory was first taught in 1829 by Zygmund Revkovsky at Vilnius University. There was no textbook in Russian at that time; therefore, he not only developed the course himself but also introduced his own terminology. In 1849, Professor Bayer began teaching probability theory at Kharkiv University. In 1850, V. Ya. Bunyakovsky began to give a course in probability theory at Moscow University, and in 1857 such a course appeared at St. Petersburg University. In other Russian universities in the middle of the 19th century, this course was not given (Kletska, 2016, p. 125).
In Russia, the first textbook on probability theory, The Foundations of the Mathematical Probability Theory (1846), was written by V. Ya. Bunyakovsky. This textbook covered not only the theory but also the history of the emergence and development of probability theory. It explained how to solve problems and indicated many practical applications. The first textbook on probability theory in Ukraine was published in 1878 by Prof. V. P. Ermakov, a student and colleague of Vaschenko-Zakharchenko (Kletska, 2016, pp. 125-126).
The purpose of this paper is to study the changes that the course of probability theory has undergone since the publication of V. P. Ermakov's textbook The Probability Theory (Ermakov, 1879) to the present.

Research methods.
The paper uses a comparative analysis of The Probability Theory by V. P. Ermakov and textbooks on probability theory for students of higher educational institutions published in the second half of the 20th century and at the beginning of the 21st century.

Results and discussion.
Review of the biography of V. P. Ermakov. Before proceeding to discuss the textbook and to compare it with other literature, we should make sure of the competence of the author.
Vasyl Petrovych Ermakov was an outstanding mathematician, the author of about 150 papers, Corresponding Member of the St. Petersburg Academy of Sciences, Professor at Kyiv University, and the first Head of the Department of Higher Mathematics at Kyiv Polytechnic Institute. He worked in areas such as the theory of series, variational calculus, differential equations, the theory of special functions, algebra, and the theory of numbers. In particular, he is credited with one of the most important results in the theory of series: a very sensitive criterion for the convergence of series with integral terms.
He received his primary education at the parochial school where his father taught. He then studied at Gomel Gymnasium and then at Chernihiv Gymnasium. In 1864, he entered Kyiv University in the Mathematical Department of the Faculty of Physics and Mathematics (Dobrovolsky, 1981, p. 6).
In 1871, Ermakov introduced a new test for the convergence of series, which bears his name and made him known to the mathematical community. In 1873, he defended his thesis on "The General Theory of Integration of Linear Differential Equations of Higher Orders with Partial Derivatives and Constant Coefficients". In 1874, Ermakov was elected Assistant Professor of the Department of Pure Mathematics at Kyiv University. In addition, from 1871 he taught for several years at the Kyiv Women's Gymnasium; from 1874 to 1880, he was a teacher at a military gymnasium and also taught geometry at the Higher Women's Courses (Dobrovolsky, 1981, pp. 19, 23-25). [...] (Gratsianskaia, 1956, pp. 675-679).
On March 16, 1922, Vasyl Petrovych Ermakov died. A. P. Psheborskyi recalled, "V. P. as a Professor enjoyed deep respect and great love. In his relations with students, he was sometimes harsh and rude, but, at the same time, extremely gentle and forgiving, so a lot, or rather everything, was forgiven 'Vasyl', as the students called him among themselves in my time. Everyone knew that he would always find warm support from V. P. in his scientific work" (Psheborskyi, 1922).
At Kyiv University and KPI, Ermakov taught such subjects as probability theory, partial differential equations, ordinary differential equations, the theory of vectors in the plane, analytical geometry, spherical trigonometry, etc. Based on these courses, he wrote a large number of textbooks, the hallmarks of which were clarity and ease of presentation (Gratsianskaia, 1956, p. 676).
The Probability Theory ("Теорія вѣроятностей") by Ermakov is considered to be the first textbook on the probability theory in Ukraine. It was first published in 1878 in Izvestia of Kyiv University; a year later, it came out as a separate edition; and 23 years later, it was republished using lithography (Gnedenko & Gikhman, 1956, p. 485).
Thus, V. P. Ermakov's textbook is worth a detailed study. Using it as an example, let us analyze how the teaching of probability theory has changed from that time to the present.
Using definitions of probability.
Modern probability theory is based on the axiomatic approach of A. N. Kolmogorov and thus relies heavily on set theory and measure theory. In particular, the mathematical expectation is defined as a Lebesgue integral.
The axiomatic definition is used in educational literature of the late 20th and early 21st century for students of physics and mathematics (Shiryaev, 1979; Kartashov, 2008; Gnedenko, 1988; Seno, 2007). Other textbooks limit themselves to the classical, geometric, and statistical definitions of probability.
In Ermakov's book, only the classical definition is used. The axiomatic approach, of course, could not be mentioned, since it was developed later. The statistical definition is not used either.
http://www.hst-journal.com Історія науки і техніки, 2021, том 11, випуск 2 History of science and technology, 2021, vol. 11, issue 2
The geometric definition is not mentioned in the book, although it was known by that time. In particular, it was included in the curriculum of the course that Dirichlet taught in 1838 (Fischer, 1994, p. 42). But does Ermakov really not consider problems involving an infinite number of elementary events? He gives the following problem, "The rod is broken at random into three parts; what is the probability that these three pieces can make a triangle?" Vasyl Petrovych emphasizes that in this problem both the number of all elementary events (which he calls cases) and the number of favorable ones are infinite. However, he does not introduce additional definitions and instead solves the problem by passing to the limit. He proposes to split the rod into $2n$ equal parts and to assume that the rod can break only at the points of division. Denoting the lengths of the resulting pieces by $x$, $y$, and $z = 2n - x - y$, he obtains the inequalities
$x < n, \quad y < n, \quad x + y > n.$ (1)
Analyzing the number of solutions of this system in natural numbers, he obtains the probability $\frac{n-2}{4n-2}$ for the discrete model and then, passing to the limit, finds the required probability $\frac{1}{4}$ (Ermakov, 1879, pp. 29-30). Thus, unlike modern authors, V. P. Ermakov limited himself in his textbook to the classical definition of probability alone.
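The discrete model described above can be checked by direct enumeration. The following Python sketch is only an illustration in modern terms, not Ermakov's own computation: the function name `triangle_probability` is hypothetical, and the sketch assumes that all three pieces have positive integer length and that every admissible pair of break points is equally likely.

```python
from fractions import Fraction


def triangle_probability(n):
    """Classical probability that three pieces of a rod of length 2n
    (broken only at integer division points) can form a triangle.
    Requires n >= 2 so that at least one way of breaking exists."""
    total = 0
    favorable = 0
    for x in range(1, 2 * n - 1):        # first piece: 1 .. 2n-2
        for y in range(1, 2 * n - x):    # second piece: leaves z >= 1
            total += 1
            # Inequalities (1): each piece is shorter than half the rod.
            if x < n and y < n and x + y > n:
                favorable += 1
    return Fraction(favorable, total)


for n in (3, 10, 100):
    print(n, triangle_probability(n), float(triangle_probability(n)))
```

Under these assumptions the enumeration agrees with the closed form $\frac{n-2}{4n-2}$, which tends to $\frac{1}{4}$ as $n$ grows.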
The structure and subject of the textbook.
Textbooks on probability theory of the late 20th and early 21st century differ greatly depending on the audience for which they are intended. Thus, textbooks for students of physics and mathematics may contain sections such as Markov chains (for example, Shiryaev, 1979; Kartashov, 2008; Gnedenko, 1988; Seno, 2007), martingales (Shiryaev, 1979; Kartashov, 2008), stationary sequences (Shiryaev, 1979), etc. Usually, however, elementary probability theory, which includes the classical and geometric definitions of probability, is considered first (Shiryaev, 1979; Gnedenko, 1988; Seno, 2007; Gmurman, 2004). Then the axiomatic definition is introduced (in the literature for students of physics and mathematics, for example, Shiryaev, 1979; Kartashov, 2008; Gnedenko, 1988; Seno, 2007), followed by the concept of a random variable and related concepts such as the distribution function, distribution density, mathematical expectation, and characteristic function. Further, the authors move on to more complex sections, which may include limit theorems, elements of the theory of random processes, etc. Often textbooks contain mathematical statistics along with probability theory (Kartashov, 2008; Gnedenko, 1988; Seno, 2007; Gmurman, 2004; Kushlyk-Dyvulska, Polishchuk, Orel, & Shtabaliuk, 2014; Sliusarchuk, 2005; Klesov, 2010; Yezhov, 2001; Rabyk, 2004; Gorban & Snizhko, 1999).
V. P. Ermakov's textbook contains only probability theory, without mathematical statistics. It consists of a preface, five chapters, a synopsis, and problems with answers. The first chapter is devoted to combinatorics and Newton's binomial theorem. In the second chapter, the concept of the probability of an event is introduced, and techniques for determining the probabilities of the union and intersection (in modern terminology) of events are considered.
In the third chapter, the probabilities of events in repeated trials are examined, Bernoulli's theorem is proved, and the concept of mathematical expectation is introduced. The fourth chapter is devoted to conditional probabilities. The fifth chapter discusses the Jordan method and its application. The synopsis of probability theory formulates, in less than six pages, the basic concepts and results of the book. Finally, at the end of the textbook, there are 61 problems for independent solution, each with an answer or a hint.
Thus, this textbook does not even contain the concept of a random variable and, consequently, has no distribution functions or characteristic functions, but it does contain the concept of mathematical expectation.
According to the modern definition, the mathematical expectation of a random variable is introduced as the Lebesgue integral with respect to the probability measure. It is usually denoted by M or E.
Since students of many specialties do not study the Lebesgue integral, in their textbooks, the authors have to avoid this concept and introduce separate definitions of the mathematical expectation for discrete and continuous random variables (Gmurman, 2004;Kushlyk-Dyvulska, Polishchuk, Orel, & Shtabaliuk, 2014;Klesov, 2010;Yezhov, 2001;Rabyk, 2004;Gorban & Snizhko, 1999). In particular, for a discrete random variable, the mathematical expectation is the sum of the products of all its values by the corresponding probabilities.
It is this approach that Ermakov uses in his book. Instead of a random variable, he considers "a certain quantity x that can acquire different values, depending on which of the events will occur during observation". And the mathematical expectation is introduced as "the sum of the products of the probability of each event that can occur during a given observation, by the value acquired by the unknown when this event occurs" (Ermakov, 1879, pp. 64-65).
In addition to the simplest properties of the mathematical expectation, Ermakov considers the simplest version of the conditional mathematical expectation, although he does not use this term. He pays great attention to examples of harmless (fair) games, that is, those in which the mathematical expectation of the player's payoff is equal to zero.
Ermakov also mentions the concept of moral expectation, which was used in the probability theory of that time. It was suggested by Daniel Bernoulli to address questions about the harmlessness of games. Bernoulli and subsequent researchers reasoned as follows: if a poor man and a rich man can receive the same amount of money with the same probabilities, then, although the mathematical expectations of their profits are equal, the moral expectations are different, since in reality the moral satisfaction of the poor man is much greater than that of the rich one. That is, when the moral expectation is calculated, the player's wealth is also taken into account. Ermakov insists on removing the concept of moral expectation from probability theory because, firstly, it is difficult to establish an exact measure of moral pleasure and, secondly, if a game is harmless for one player according to the theory of moral expectation, it is disadvantageous for his opponent (Ermakov, 1879, p. 78). Indeed, the concept of moral expectation has disappeared from probability theory and is not found in modern textbooks.
Ermakov's book also lacks the concept of variance, as well as that of a random variable itself.
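The definition of mathematical expectation as a sum of products of probabilities and values, and the notion of a game with zero expected payoff, can be sketched in a few lines of Python. The function name `expectation` and the two games below are hypothetical illustrations and are not taken from Ermakov's textbook.

```python
from fractions import Fraction


def expectation(cases):
    """Mathematical expectation of a quantity given as (probability, value)
    pairs, one pair for each event that can occur in the observation."""
    cases = list(cases)
    if sum(p for p, _ in cases) != 1:
        raise ValueError("the probabilities of all events must sum to 1")
    return sum(p * v for p, v in cases)


# Hypothetical game 1: win 1 or lose 1 on the toss of a fair coin.
coin_game = [(Fraction(1, 2), 1), (Fraction(1, 2), -1)]

# Hypothetical game 2: win 5 if a die shows six, lose 1 otherwise.
die_game = [(Fraction(1, 6), 5), (Fraction(5, 6), -1)]

print(expectation(coin_game), expectation(die_game))
```

Both games have zero expected payoff for the player, so both are harmless in the sense used here.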
Let us consider the contents of each chapter of the book in more detail. As mentioned, the first chapter is about combinatorics. It deals with permutations and combinations. The notation, results, and methods of proof do not differ much from modern ones, except that the notion of factorial is not introduced. Some properties are proved in several ways, but the geometric interpretation of combinations is not used. Further, Ermakov considers Newton's binomial theorem. He not only states the binomial formula and proves it, together with some properties of binomial coefficients, in two ways, but also considers a binomial series with a negative integer exponent, proves its convergence, and finds its sum. He calls the corresponding formula Newton's formula with a negative exponent. The first chapter also derives complex auxiliary formulas for binomial coefficients, which are used in other sections.
The second chapter introduces the concept of probability using the classical definition. The concept of an event is introduced intuitively, "Everything that happens in nature is called a phenomenon. Each phenomenon leads to many cases; in some of these cases, one event occurs, in others another one" (Ermakov, 1879, p. 21).
No operations on events are formally introduced, but the probability that two events occur together is investigated. In particular, for the intersection of independent events the textbook states, "The probability of a complex event consisting of the simultaneous occurrence of several independent events is equal to the product of the probabilities of simple events" (Ermakov, 1879, p. 34). The concept of independent events is introduced as follows: events are considered independent of each other if "the probability of each event does not depend on whether other events have occurred or not" (Ermakov, 1879, p. 40). For comparison, in modern probability theory, two events $A$ and $B$ are called independent if
$P(A \cap B) = P(A) \cdot P(B).$ (2)
As for the probability addition theorem, the textbook by V. P. Ermakov states, "The probability that one of several events will occur is equal to the sum of the probabilities of these events" (Ermakov, 1879, p. 118). This statement is incorrect in general, as it holds only for incompatible events. For two compatible events $A$ and $B$, the probability that at least one of them will occur is given by the formula
$P(A \cup B) = P(A) + P(B) - P(A \cap B).$
The textbook also considers the probabilities that an event will occur for the first time in a given trial or will not occur at all in a series of trials. Conditional probabilities are already used in practice, although this term is not used either, "The probability of a complex event, consisting of the occurrence of any events, is equal to the product of several factors, the first factor of which expresses the probability of the first event, the second factor expresses the probability of the second event calculated on the assumption that the first event has already occurred, the third factor expresses the probability of the third event calculated under the assumption that the first two events have already occurred, etc." (Ermakov, 1879, p. 117).
The third chapter is devoted to events in repeated trials and to the mathematical expectation, which has already been discussed in detail above. Both Bernoulli's scheme and Bernoulli's theorem are considered.
In modern textbooks, especially those designed for students of physics and mathematics, much attention is paid to limit theorems. Limit theorems include various forms of the law of large numbers and the central limit theorem. Textbooks for students of physics and mathematics, such as (Shiryaev, 1979, p. 385), also consider the law of the iterated logarithm, which occupies an intermediate position between these theorems.
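The difference between the addition rule for incompatible events and the general rule for compatible events can be seen in a small example that uses the classical definition of probability. The Python sketch below is only an illustration: the two-dice experiment and the names `cases`, `prob`, `A`, and `B` are assumptions made for this example and do not come from the textbook.

```python
from fractions import Fraction
from itertools import product

# All 36 equally possible cases for a throw of two dice.
cases = set(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favorable cases divided by all cases."""
    return Fraction(len(event), len(cases))


A = {c for c in cases if c[0] == 6}   # the first die shows six
B = {c for c in cases if c[1] == 6}   # the second die shows six

naive_sum = prob(A) + prob(B)                 # valid only if A and B are incompatible
corrected = prob(A) + prob(B) - prob(A & B)   # general addition formula

print(prob(A | B), naive_sum, corrected)
print(prob(A & B) == prob(A) * prob(B))       # A and B are independent
```

Here $A$ and $B$ are compatible, so the plain sum of the probabilities overestimates the probability that at least one of them occurs, while the general formula gives the value obtained by direct counting.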
Bernoulli's theorem was first proved by Jacob Bernoulli at the end of the 17th century. The textbook uses the method proposed by Chebyshev to prove a more general result, from which Bernoulli's theorem follows as a special case. It is for the proof of this theorem that the auxiliary formulas derived in the first chapter are used. Ermakov formulates it as follows, "The probability that the ratio of the number showing how many times an event occurred in m trials to the total number of trials differs from the probability of the event by an amount not exceeding [...], lies in the range from [...] to 1, where p is the probability of the event, and q=1-p" (Ermakov, 1879, pp. 118-119). From this, he concludes that, as the number of trials increases, this ratio approaches the probability of the event. Thus, his textbook contains the law of large numbers with a proof.
The central limit theorem, however, is not mentioned in Ermakov's book, and it could not be, since the book does not use the very concept of the distribution of a random variable.
The fourth chapter examines the probabilities that are called conditional in modern terminology. In effect, the formula of total probability and Bayes' formula are given, and the simplest kind of conditional mathematical expectation is also considered.
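The content of Bernoulli's theorem can be illustrated numerically. The Python sketch below computes the exact probability that the relative frequency of an event in m trials differs from p by at most a given amount and compares it with the lower bound $1 - \frac{pq}{m\varepsilon^2}$ from Chebyshev's inequality in its standard modern form. This bound is an assumption made for illustration and is not taken from Ermakov's text; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_within(m, p, eps):
    """Exact probability that |k/m - p| <= eps, where k is the number of
    occurrences of an event of probability p in m independent trials."""
    q = 1 - p
    total = Fraction(0)
    for k in range(m + 1):
        if abs(Fraction(k, m) - p) <= eps:
            total += comb(m, k) * p**k * q**(m - k)
    return total


def chebyshev_lower_bound(m, p, eps):
    """Standard Chebyshev bound 1 - pq / (m * eps^2); may be negative."""
    return 1 - p * (1 - p) / (m * eps**2)


p, eps = Fraction(1, 2), Fraction(1, 10)
for m in (10, 100, 1000):
    exact = prob_within(m, p, eps)
    bound = chebyshev_lower_bound(m, p, eps)
    print(m, float(exact), float(bound))
```

For a fixed deviation, both the exact probability and the bound approach 1 as the number of trials grows, which is the statement of the law of large numbers in Bernoulli's form.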
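In modern notation, the two formulas mentioned here can be sketched as follows. The Python example is a hypothetical illustration with two urns; the numbers and the function names are assumptions and are not taken from the textbook.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Formula of total probability: sum of P(H_i) * P(E | H_i)."""
    return sum(h * l for h, l in zip(priors, likelihoods))


def bayes_posteriors(priors, likelihoods):
    """Bayes' formula: P(H_i | E) for every hypothesis H_i."""
    evidence = total_probability(priors, likelihoods)
    return [h * l / evidence for h, l in zip(priors, likelihoods)]


# Hypothetical example: an urn is chosen at random from two urns.
# Urn 1 holds 3 white and 1 black ball, urn 2 holds 1 white and 3 black.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(3, 4), Fraction(1, 4)]   # P(white | urn)

print(total_probability(priors, likelihoods))    # probability of a white ball
print(bayes_posteriors(priors, likelihoods))     # P(urn | white ball drawn)
```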
The fifth and last chapter is unusual for the modern reader. It considers the following class of problems: "Suppose some experiment leads to several events; from these events we specifically choose the following: $A_1, A_2, A_3, \ldots$ (5) and show how to solve such questions: 1) How likely is it that r events (5) will appear? 2) How likely is it that there will be r or more events (5)? 3) How likely is it that none of the events (5) will occur? 4) How large is the average number of events (5) that have appeared?" (Ermakov, 1879, p. 100).
V. P. Ermakov notes that such problems are quite diverse. Many of them had been solved, but a separate method was used for each. He then presents the general method proposed by Jordan. Based on the results of the first chapter, including the formulas for the binomial series, answers to all the above questions are given; namely, the probability that r events from class (5) will occur is [...], where [...] is the sum of the probabilities of all possible events consisting in the occurrence of r selected events from class (5) (Ermakov, 1879, p. 103). The rest of the questions are answered in a similar way. The application of these formulas is demonstrated on a large number of problems.
The textbook contains a large number of worked problems. It also ends with a collection of problems, for which hints and answers are given. Interestingly, some of these problems have already been discussed in detail in the main part of the book.
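A formula of this kind can be illustrated in modern notation. The Python sketch below uses the form in which Jordan's formula is usually stated today, $P(\text{exactly } r) = \sum_{k \ge r} (-1)^{k-r} \binom{k}{r} S_k$, where $S_k$ is the sum of the probabilities of the joint occurrence of every choice of $k$ of the selected events. It is an assumption that this coincides with the form given in the textbook. The example (fixed points of a random permutation of four elements) and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def s_k(events, space_size, k):
    """Sum of the probabilities that k chosen events all occur,
    taken over every choice of k events (S_0 is 1 by convention)."""
    if k == 0:
        return Fraction(1)
    return sum(
        (Fraction(len(set.intersection(*chosen)), space_size)
         for chosen in combinations(events, k)),
        Fraction(0),
    )


def jordan_exactly(events, space_size, r):
    """Probability that exactly r of the events occur (modern form)."""
    n = len(events)
    return sum(
        ((-1) ** (k - r) * comb(k, r) * s_k(events, space_size, k)
         for k in range(r, n + 1)),
        Fraction(0),
    )


# Hypothetical example: event i means "element i stays in place"
# in a random permutation of four elements.
perms = list(permutations(range(4)))
events = [{p for p in perms if p[i] == i} for i in range(4)]

for r in range(5):
    print(r, jordan_exactly(events, len(perms), r))
```

In this example $S_1$ equals the average number of the selected events that occur, which corresponds to the fourth question of the class of problems described above.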
Discussion. The Probability Theory by V. P. Ermakov was one of the most modern textbooks at the time of its publication (Kletska, 2016, p. 126).
The situation with educational literature in the Russian Empire at the end of the 19th century was very difficult. The course of probability theory was just being introduced into university programs, and lecturers and students had only the textbook by Bunyakovsky, which could not be called elementary. True, in 1857 The Application of the Probability Theory in Calculating Observations and Geodetic Measurements by A. N. Savych was published, but it pursued other goals and could not serve as a textbook on probability theory. Therefore, the appearance of the textbook by V. P. Ermakov was an outstanding event (Gnedenko & Gikhman, 1956, pp. 484-485).
B. V. Gnedenko and I. I. Gikhman (1956, p. 486) believed that the textbook contained many logical and methodological mistakes and that much was taken as obvious although in fact it required detailed consideration. Nevertheless, they regarded it as a step forward compared to other courses, as the author tried to acquaint the reader with the main achievements of that time, illustrating them with many examples. They value this textbook while pointing out the following shortcomings. When defining probability as the ratio of the number of favorable cases to the number of all possible ones, Ermakov does not indicate that such a definition requires all cases to be equally possible. When formulating the multiplication theorem, he does not indicate that it holds only for independent events, and the very concept of independence appears only 6 pages later. The addition theorem is formulated without requiring the incompatibility of events. The independence requirement is also absent from the formulation of Bernoulli's theorem.

Conclusions.
From the end of the 19th century to the beginning of the 21st century, the course of probability theory underwent great changes, and these changes have affected even the very concept of probability. An axiomatic approach has appeared, the definition has become clearer and more formalized, and insufficiently clear concepts such as moral expectation have disappeared. Clear axiomatics made it possible to introduce such concepts as a random variable and a distribution function and to apply powerful research methods, for example, the method of characteristic functions. Because many new sections have appeared, some results and methods are no longer studied, as happened, in particular, with the Jordan method.
Funding.