Making decisions



Other articles in the series:

Decision making

Decision making. An example



This work concerns the security of information systems in which serious information decisions are made; such systems can be divided into three types:



  • firstly, information retrieval systems (ISS), information-measuring systems (IIS), and others;
  • secondly, transceiver systems: data transmission systems (DTS), request-response systems (ZOS), and others;
  • thirdly, … .


In all such systems, management is an essential phenomenon, process, and activity; its components include organizing the system, allocating resources (planning), making decisions, and communication.



It is difficult to name an area of activity in which decisions are not made from time to time. This has always been so, is so now, and will remain so in the future. A person will not lift a finger without first making a decision to do so. This is not always a conscious act, but it is exactly so.



Here we will focus on the theory of choice and decision making, which studies mathematical models of decision making and their properties. For a long time the science of decision making evolved, one might say, one-sidedly. The classical scheme is covered by statistical theory, built on the risk function and on errors of the first and second kind.



This approach to decision making has played a positive role, and its applicability is not denied today, although it is limited by the principles of rationality. The approach is not without drawbacks. There is a well-known catch phrase, attributed to a classic of statistical theory (Gosset, who wrote under the pseudonym Student), about three types of lies: deliberate lies, unintentional lies, and statistics.



Another direction of decision-making theory, the algebraic one, appeared somewhat later, but it has proved harder to understand (and, as a consequence, to apply). This approach is based on the theory of partial order relations and on its particular case, preference relations. I recently wrote about this, but the publication was received, to put it mildly, without approval.



I see the harm of this practice in the fact that such an attitude toward a publication, by readers who have the opportunity to give negative marks, slows down other readers and discourages them from getting acquainted with it, since they rely on someone else's opinion.



Perhaps after a short time the hotheads cooled down; nothing offensive was said in the publication, but someone apparently took my remarks personally. Even the educational literature on the second approach is very limited, and although monographs exist, they are difficult to digest, which holds back the development of the approach.



When dealing with information security (IS), it is desirable to see the whole range of its problems and tasks, and among them the task of information security management, in particular selection and decision making, occupies an important place.



In general, here I return to the theory of relations and its applications, one of which is the mechanism of decision making and the results of decision-making theory. In this publication I will set out the main provisions of the theory, and in the next I will give an example showing the computational aspects and details. First I will name the main subject elements of the statistical approach in decision-making theory, and then briefly describe them:



  • risk function (RF); errors, kinds of errors;
  • initial set of alternatives (IMA);
  • optimality principle (OP);
  • decision maker (DM);
  • selection function (FV);
  • utility function (FP);
  • decision criteria.



Decision making and ways to minimize risk



A decision is always made in a situation of choice involving losses, chance, and certain risks that it is desirable to minimize. If there is no choice, then there is nothing to decide: one acts in the only possible way or does nothing at all, as the single alternative dictates.



The rationale and purpose of risk minimization is to apply effective safeguards so that the residual risk in the system becomes acceptable.

Risk minimization assumes answering three questions: identifying the areas where the risk is unacceptably large; selecting the most effective means of protection; and evaluating the safeguards to determine whether the residual risk in the system is acceptable.
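To make the three questions concrete, here is a minimal sketch in Python; the area names, risk values, and safeguard effects are invented for illustration and are not taken from any particular methodology.

```python
# A minimal sketch of the three risk-minimization questions, with invented numbers.
acceptable = 0.10
current_risk = {"perimeter": 0.30, "database": 0.05, "workstations": 0.20}
safeguard_reduction = {"perimeter": 0.5, "database": 0.5, "workstations": 0.8}

for area, risk in current_risk.items():
    if risk > acceptable:                                   # question 1: risk too large?
        residual = risk * (1 - safeguard_reduction[area])   # question 2: apply a safeguard
        verdict = "acceptable" if residual <= acceptable else "still too high"
        print(f"{area}: {risk:.2f} -> residual {residual:.2f} ({verdict})")  # question 3
```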



Scientific research uses hypotheses: they are put forward, formulated, verified, and confirmed or refuted; this is the natural way of research. Hypotheses can differ greatly in content, in how they are formulated, and in how they are tested. An important class is statistical hypotheses, which are formulated either with respect to the form of the distribution law of a random variable, or with respect to the parameters of this law, or with respect to the rank ordering of the values of the random variable.



Hypotheses formulated with respect to probabilistic, statistical, and rank values are checked and evaluated using various statistical techniques and criteria. The results of testing and evaluating statistical hypotheses make it possible to draw qualitative conclusions about the phenomena under study, for example, about the degree of closeness of the empirical distribution law of a random variable to the theoretical normal or Poisson law.



Null and alternative hypotheses. Usually the null hypothesis H0 is an assumption about the form of the probability distribution law of a random variable, or about a parameter of such a law, or about the rank sequence. The other hypothesis, H1, is called the alternative.



An example. Let the hypothesis H0 be that the random variable obeys the Poisson distribution law or the normal distribution law. The alternative hypothesis H1 is that the random variable obeys neither the Poisson law nor the normal law. There may be several alternative hypotheses; the hypothesis H1 acts as a negation of H0.
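As an illustration of how such a hypothesis is tested in practice, here is a minimal sketch using SciPy on synthetic data, with the normal law taken as H0. Strictly speaking, estimating the parameters from the same sample calls for a corrected test (e.g., Lilliefors), so this is a rough check only.

```python
# A minimal sketch (synthetic data) of testing a distributional hypothesis H0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)   # the observed data

# H0: the sample comes from a normal law; parameters estimated from the sample.
mu, sigma = sample.mean(), sample.std(ddof=1)
stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))

alpha = 0.05   # admissible probability of a type I error
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} >= {alpha}: no grounds to reject H0")
```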



Testing the truth of hypotheses is always performed on a random sample. But the sample is limited (finite) and therefore cannot perfectly reflect the probability distribution law in the general population. There is always a risk that a "bad" sample will give completely false information about the merits of the case. Thus, there is always a chance of arriving at a false decision.



Applying one of the criteria for statistical testing of hypotheses leads to one of four situations:

- the null hypothesis H0 is accepted, and it is correct (accordingly, the false alternative hypothesis H1 is rejected);

- the null hypothesis H0 is rejected, and it is false (accordingly, the correct alternative hypothesis H1 is accepted);

- the null hypothesis H0 is rejected, although it is true (accordingly, the false hypothesis H1 is accepted);

- the null hypothesis H0 is accepted, although it is false (accordingly, the true alternative hypothesis H1 is rejected).

The first two situations represent a correct decision, and the last two a wrong one.



Errors of the first and second kind.

An error of the first kind (α1) is the decision to reject the hypothesis H0 although it is correct (the third situation, often referred to as "missing the target").

An error of the second kind (α2) is the decision to accept the null hypothesis H0 although it is false (referred to as a "false alarm").





Errors of the first and second kind can differ in significance, and then the choice of which hypothesis to take as the null hypothesis H0 in the problem at hand becomes important. The error of the first kind should be assigned to the error that is more important to avoid: it is better to send a correct decision for further refinement than to accept a wrong one.
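The trade-off between the two kinds of errors can be seen in a small Monte Carlo experiment; the hypotheses, sample size, and decision threshold below are invented for illustration.

```python
# A Monte Carlo sketch (invented setup) of the alpha/beta trade-off.
import numpy as np

rng = np.random.default_rng(1)
n, trials, threshold = 25, 10_000, 0.4   # rule: reject H0 when the sample mean > threshold

# Under H0 the observations are N(0, 1); under H1 they are N(1, 1).
alpha = np.mean([rng.normal(0.0, 1.0, n).mean() > threshold for _ in range(trials)])
beta = np.mean([rng.normal(1.0, 1.0, n).mean() <= threshold for _ in range(trials)])
print(f"type I error ~ {alpha:.4f}, type II error ~ {beta:.4f}")
# Raising the threshold lowers alpha but raises beta, and vice versa.
```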



Let there be an event represented by the vector S = S(x1, x2, …, xn) in n-dimensional space that can belong to only one of two sets, V1 or V2. Of interest is a method that, from the study of the event represented by this vector, would answer, with minimal error probability, the question of which of the two sets V1 or V2 the event under study (or the vector corresponding to it) should be assigned to.



In other words, the method must classify the event and end with a decision to assign it to a specific class. In the process of making such a decision, errors of two kinds are theoretically possible, and these are exactly the errors of the first and second kind. Two hypotheses are put forward:



H0 (S ∈ V1) - the hypothesis that the event S belongs to the set V1, and

H1 (S ∈ V2) - the hypothesis that the event S belongs to the set V2.



We will assume that an error of the first kind occurs when the hypothesis H0 (S ∈ V1) is rejected although it is valid, and an error of the second kind occurs when the hypothesis H0 (S ∈ V1) is accepted while the hypothesis H1 (S ∈ V2) is true (1).

Usually the null hypothesis H0 is an assumption about the phenomenon under study. The other hypothesis, H1, is called the alternative.



There may be several alternative hypotheses, and all of them act as a negation of the null one.

Hypothesis testing is always performed on a random sample, but in an experiment the sample is always finite and therefore cannot perfectly reflect the probability distribution in the general population.



There is always a risk that a "bad" sample will give completely false information about the essence of the case; hence there is always a chance of arriving at a false decision. An error of the first kind is often referred to as "missing the target", and an error of the second kind as a "false alarm".



In conflict situations, the principle of maximum efficiency remains fully valid. The specificity of a conflict is the uncertainty of the situation, which gives rise to risk. Consequently, the general principle of rational behavior in a conflict is maximum efficiency at an acceptable risk (or achieving efficiency not lower than a specified level with minimal operational risk). The concept of risk is far from unambiguous.



Analysis of the various events and possibilities allows one to find a rule that determines the decision for each point of the considered n-dimensional space. Indeed, if the observed event is a threat manifesting itself in the form of an attack A = A(x1, x2, …, xn) (2), which must be assigned to one of the two images (classes) V1 or V2, then the situation is the one that arises in pattern recognition.



Consider the probability that a threat (attack) S = S(x1, x2, …, xn) appears, given that its image belongs to the class V1. This probability, which characterizes the density of the images (members) of the class V1, is called the conditional probability density in the class V1 and is denoted φ(x1, …, xn / V1), or φ(Xn / V1) (3).



The notation for the conditional probability density in the class V2 is introduced similarly: φ(Xn / V2) (4).

The probability of a "false alarm", i.e., of deciding that there is an attack belonging to the class V1 while in reality the attack belongs to the class V2, is written as

Pβ = φ(V2) ∫RV1 φ(Xn / V2) dXn, (5)

where φ(V2) is the prior probability of an attack by an object from the class V2.



Similarly, the probability of "missing the target" can be written as

Pα = φ(V1) ∫RV2 φ(Xn / V1) dXn, (6)

where φ(V1) is the prior probability of an attack by an object from the class V1, and

RV1, RV2 are the areas of the space corresponding to the classes V1 and V2.



Of practical interest is a decision rule that would minimize the risk W, or the average cost of making a decision, determined by the formula W = α1·Pα + α2·Pβ (7), where α1 is the weight of a type I error and α2 is the weight of a type II error.



Considering that the areas RV1 and RV2 together form the entire space of possible values, and that the integral of a probability density over the entire space is equal to unity, we obtain

W = α1·φ(V1) + ∫RV1 [α2·φ(V2)·φ(Xn / V2) − α1·φ(V1)·φ(Xn / V1)] dXn. (8)

This approach can be interpreted as follows. The problem of choosing the optimal decision reduces to dividing the space of attack images into two areas RV1 and RV2 so that the risk W is minimal. From the expression for W it is clear that for this purpose the region RV1 should be chosen so that the integral in (8) takes the largest negative value.



For this, the region RV1 must include exactly those points at which the integrand is negative, and outside RV1 there must be no points where the integrand is negative, i.e.

α2·φ(V2)·φ(Xn / V2) − α1·φ(V1)·φ(Xn / V1) < 0 for Xn ∈ RV1. (9)



From relation (9) we easily obtain the following decision rule: S ∈ V1 if

φ(Xn / V1) / φ(Xn / V2) > θ = α2·φ(V2) / (α1·φ(V1)), (10)

which consists in comparing the ratio of the probability densities with a certain threshold θ, constant for given values of the weights α1 and α2. This rule belongs to the class of Bayesian rules, and the ratio of the probability densities is called the likelihood (similarity) coefficient.



In the case α1 = α2 and φ(V1) = φ(V2), the threshold θ is obviously equal to one, and here everything is more or less clear. Problems arise with the left side of the decision rule (10): the conditional probability densities φ(Xn / V1) and φ(Xn / V2) are supposed to be known.



In fact, this is not the case. Moreover, obtaining them analytically or even numerically presents significant difficulties. Therefore, most often one settles for approximate values, determining the relative frequency with which attacks by an object of the class V1 occur. The limited sample is processed appropriately, and the unknown distributions are estimated from the results of this processing.
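Here is a minimal sketch of decision rule (10), assuming one-dimensional Gaussian conditional densities and illustrative priors φ(V1), φ(V2) and weights α1, α2; in practice, as just noted, the densities would have to be estimated from limited samples.

```python
# A minimal sketch of decision rule (10) with assumed Gaussian class densities.
from scipy.stats import norm

density_v1 = norm(loc=0.0, scale=1.0)   # assumed phi(Xn / V1)
density_v2 = norm(loc=2.0, scale=1.0)   # assumed phi(Xn / V2)
p_v1, p_v2 = 0.6, 0.4                   # assumed priors phi(V1), phi(V2)
a1, a2 = 1.0, 1.0                       # weights of type I and type II errors

theta = (a2 * p_v2) / (a1 * p_v1)       # the threshold from rule (10)

def classify(x: float) -> str:
    """Assign x to V1 when the density ratio exceeds the threshold theta."""
    return "V1" if density_v1.pdf(x) / density_v2.pdf(x) > theta else "V2"

for x in (-0.5, 1.0, 2.5):
    print(x, "->", classify(x))
```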



The initial set of alternatives (options) Ω is given by the situation, constraints, resources, and other conditions. The set Ω needs to be ordered. Definition. A non-strict ordering is a binary relation that is reflexive, transitive, and antisymmetric.



If such a binary relation is irreflexive instead, the ordering is called strict. If any two alternatives are comparable in an ordering, the ordering is called linear, or perfect. If not all alternatives are comparable, the ordering is called partial. A preference relation is a special case of an ordering.



The optimality principle defines the concept of better alternatives through a mapping φ: Ω → E1. The corresponding property of alternatives is called a criterion; the number φ(x) is the assessment of the alternative x by the criterion, and E1 is the criterion space, in which the coordinates of points are quantitative estimates according to the corresponding criteria.



Central to the theory is the general decision-making problem, in which both the set of alternatives Ω and the optimality principle may be unknown. When the alternatives are known, a choice problem arises; when, in addition, the optimality principle is known, a general optimization problem arises.



Definition. The decision maker (DM) is the subject of a decision, endowed with certain powers and responsible for the consequences of the adopted and implemented management decision.



This is a person (or a group of persons) who has a goal that serves as the motive for posing a decision-making problem and searching for its solution.

The decision maker's preference is a binary relation defined on the set of alternatives that describes his preferences, for example on the basis of pairwise comparisons.



Definition. The risk function describes the risk, or possible loss (damage), when choosing a particular alternative. Risk is the mathematical expectation of the loss function incurred by the decision; it is a quantitative assessment of the consequences of a decision. Risk minimization is the main optimality criterion in decision theory.



According to the theory of statistical decisions, it is required to find a rule that minimizes the risk r, or the average cost of making a decision, determined by the formula r = δα·Pα + δβ·Pβ, where δα is the cost (weight) of a type I error and δβ is the cost of a type II error.



Definition. The choice function C serves as the mathematical expression of the optimality principle and is a mapping that assigns to each X ⊆ Ω a subset C(X) ⊆ X [8, p. 32].

Let a set of options (alternatives) be Ω = {xi, i = 1(1)4}.



Consider the choice function C on this set Ω: C(xi) = xi; C(xi, xj) = xk, where k = min(i, j); C(xi, xj, xk) = {xi, xj, xk} − xr, where r = max(i, j, k); C(Ω) = x1.

This function can be represented in logical form by a table in which β(X) encodes the presented set of alternatives and β(C(X)) the result of the choice in logical (Boolean) variables. (The table itself is not reproduced here.)

The essence of making a decision consists in choosing a suitable alternative.



Definition. A functional U(x) is a function that can be used to represent preferences on a certain set of feasible alternatives. A function U(x) defined on an ordered set X is called a utility function if for all x, y ∈ X: x *> y <=> U(x) ≥ U(y).



If the set of alternatives X contains a small number of them, then by defining a binary preference relation on this set, that is, by ordering the alternatives, it is easy to choose a suitable one.



With a large variety of alternatives, ordering them becomes a laborious process. The difficulty is surmountable when preferences can be measured and replaced with numerical indicators of quality.



Questions of representing preferences in the form of numerical functions belong to the mathematical theory of utility.

If a utility function exists, then to find the optimal solution (the maximal alternative according to the given preference) it suffices to find the maximum of the function U(x) on X, for which one can use classical mathematical analysis or optimization methods.
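For example, a minimal sketch with an invented single-variable utility and the feasible set X = [0, 4]:

```python
# A minimal sketch: maximizing an invented utility U on the feasible set X = [0, 4].
from scipy.optimize import minimize_scalar

U = lambda x: -(x - 3.0) ** 2 + 5.0   # illustrative utility function
res = minimize_scalar(lambda x: -U(x), bounds=(0.0, 4.0), method="bounded")
print(f"best alternative x* = {res.x:.3f}, utility U(x*) = {U(res.x):.3f}")
```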



Theorem (existence of a utility function). If a strict preference (*>) is given on an infinite set X, then for a utility function to exist it is necessary and sufficient that X contain a countable subset that is dense in the order.



Definition. A set A is called order-dense in X if for any x, y ∈ X \ A with x < y there exists z ∈ A such that x < z < y.

Let V be any monotonically increasing function of U(x); then V[U(x)] will also be a utility function.



Further, if the preference is not a perfect (linear) ordering, the existence theorem for a utility function with x *> y => U(x) ≥ U(y) can still be proved, but with a limitation. This is natural, since any function generates a perfect ordering and therefore cannot carry full information about the initial preference.

A simpler utility function is a linear one, U(α′x + β′y) = α′U(x) + β′U(y), in which α′ and β′ are constants.



Theorem (existence of a linear utility function). If the set X and the ordering (*>) satisfy the conditions:

- the set of alternatives X is a convex set of a vector space;

- the preference on the set of alternatives is continuous;

- mixtures composed of indifferent alternatives are indifferent,

then there exists a real linear function U(x) such that for all x, y ∈ X: x *> y <=> U(x) ≥ U(y).



In practice, the two-dimensional case, with variables x and y, is of interest. In this case the utility function takes the CES (constant elasticity of substitution) form

U(x, y) = (α·x^p + β·y^p)^(1/p).

Special cases are obtained for different values of the parameter p.



If p = 1, the function is linear and describes perfect substitutes. In this case the marginal rate of substitution is equal to the ratio of the parameters α/β:

U(x, y) = α·x + β·y.



If p → −∞, the Leontief function is obtained, which describes perfect complements; the marginal rate of substitution in this case is infinite:

U(x, y) = min(αx, βy).



As p β†’ 0, the Cobb-Douglas function is obtained if we impose the additional condition Ξ± + Ξ² = 1

U(x,y)=(xΞ±Β·yΞ²).



Decision making process modeling



The concept of a model has become so familiar in modern science that the need to clarify its content is no longer felt. In practice, models, procedures, schemes, and methods of decision making are often confused and no longer distinguished from one another. The possibilities of modeling preferences greatly exceed those of a person, and the capabilities of a model often turn out to be richer than reality.



One should speak of a decision-making model only in connection with a specific decision-making problem (DMP) to be solved. This means that a class of basic preference structures has been selected within which the search for the best solution will be carried out.



Different models for solving the same DMP will differ precisely in the principles underlying them. We assume that a set of initial preference structures (relations) is considered, given in matrix form, for example as matrices of pairwise comparisons. A certain DMP is investigated on this set, and we say that a model for solving the stated DMP is given on the set of initial structures.



Rather strict requirements are imposed on decision-making models: correctness, adequacy, completeness, universality, etc.

Correctness in the mathematical sense is determined by the existence of a solution, its uniqueness, and its stability.



Adequacy means correspondence to the original, i.e., the correct reflection in the model of the modeled principles and features of the decision-making process. The differences between the normative (prescriptive) and descriptive approaches are significant.

In the first, a priori assumptions dominate about what the general principles should be; they are formulated as axioms that the developed decision-making models must satisfy.



In the second, the features of the developed models are described not axiomatically but attributively, through a system of properties, each of which is meaningfully interpreted by the decision maker and seems to him reasonable and, to one degree or another, desirable.



Completeness of a model means that the principles underlying decision making should be reflected not only accurately but also sufficiently fully.

The universality of a model is determined by the possibility of applying it to a wide class of initial preference structures.



Statistical decision making methods



The decision-making problem is formulated as follows.

There are m + 1 states S0, S1, …, Sm of the object of research, forming a complete group of incompatible events; the prior probabilities of the states are, respectively, p0, p1, …, pm, and p0 + p1 + … + pm = 1.



For each of the states, the following are given:

- the likelihood functions Wn(x1, …, xn / Sj), j = 0(1)m;

- the set of decisions γ1, γ2, …, γm;

- the loss functions Πjk = Π(Sj, γk), j = 0(1)m, k = 1(1)m;

- the quality criterion f(P) for choosing a decision, associated with the loss function.



It is required to determine the rule δ(γ / x1, …, xn), best in the sense of the accepted criterion, for using the observations x1, x2, …, xn to make a decision.

Correspondences are easily established: the samples x1, x2, …, xn correspond to the set E, and the probability measure P corresponds to the likelihood functions Wn(x1, …, xn / Sj), j = 0(1)m.



To set preferences on the set P in the sense of the accepted criteria means to define the decision-making rule under those criteria.

Criteria in the theory of statistical decisions are used depending on the completeness of the initial information. Consider the following set of criteria:

- Bayesian;

- maximum posterior probability;

- maximum likelihood;

- minimax;

- Neyman-Pearson;

- Wald.



Each method is based on a criterion for choosing an alternative. In accordance with the named criteria, decision rules are formulated for the problem. The criteria themselves are compared by the quality of their decision rules, for example by the conditional risk function rj, which represents the average loss for a given state Sj.



Definition. The Bayesian rule (criterion) is the rule for making the optimal decision that minimizes the average risk function. The minimum value of the average risk function is called the Bayesian risk. A numerical sketch of the rule is given after the list of required inputs below.



The use of this criterion assumes the presence of:

- the loss functions Π(Sj, γk);

- the conditional probability distribution functions of the sample values

Wn(x1, …, xn / Sj), j = 0(1)m;

- the prior probability distribution of the states p0, …, pm.
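A toy numerical sketch of the Bayes rule for a single observed sample: the priors, loss matrix Π(Sj, γk), and likelihood values are invented, and the rule picks the decision with the smallest posterior expected loss, which is equivalent to minimizing the average risk.

```python
# A toy sketch of the Bayes rule for one observed sample x.
import numpy as np

priors = np.array([0.5, 0.3, 0.2])         # p_j for the states S0, S1, S2 (invented)
loss = np.array([[0, 1, 4],                # Pi(S_j, gamma_k), rows = states (invented)
                 [2, 0, 1],
                 [5, 2, 0]])
likelihood = np.array([0.10, 0.40, 0.30])  # W_n(x / S_j) at the observed x (invented)

posterior = priors * likelihood
posterior /= posterior.sum()               # P(S_j / x) by Bayes' theorem

expected_loss = posterior @ loss           # average loss of each decision gamma_k
print("posterior:", posterior.round(3), "-> choose gamma_%d" % int(np.argmin(expected_loss)))
```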



Definition. A special case of the Bayesian criterion is the minimax rule for choosing a decision: the Bayes rule under the least favorable prior probability distribution p(j) of the states Sj.



When the prior distribution of the states is unknown, a special criterion for the quality of decision making is established using only the conditional risk function rj.



The interpretation is as follows. There is a set K of decision-making rules; for each of them, the maximum value of the conditional risk over all possible states Sj of the object of research is determined. Of these values, the smallest is then selected.



This ensures that the losses (on average) will not exceed a certain value r*. Generally speaking, this rule is a very cautious criterion.



Definition. The maximum posterior probability criterion for the states Sj with an observed sample x1, …, xn is the criterion of the form

P(Sj / x1, …, xn) = pj·Wn(x1, …, xn / Sj) / Σk pk·Wn(x1, …, xn / Sk) → max over j.

In this case, the hypothesis accepted is the one regarding the state Sj, j = 1(1)m, for which the posterior probability is maximal.



This criterion is used when the prior distribution of the states Sj is known but there is no justification for the amounts of the losses Πjk. In this situation, a partition of the sample space is performed: to the region Gk are assigned those samples x1, …, xn for which, for all j ≠ k,

P(Sk / x1, …, xn) ≥ P(Sj / x1, …, xn).

The decision criterion is the maximum of the posterior probability.



Definition. The maximum likelihood criterion is a special case of the maximum posterior probability criterion in the absence of prior information about the distribution of the state probabilities and about possible losses, under the assumption that all states are equally probable, i.e., pi = (m + 1)^(−1).



According to this criterion, upon analysis and observation of the sample x1, …, xn, the hypothesis accepted is the one regarding the state Sj for which the likelihood function Wn(x1, …, xn / Sj) is greater than the other likelihood functions Wn(x1, …, xn / Sk), k = 0, 1, …, j − 1, j + 1, …, m.
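A minimal sketch of the maximum likelihood criterion, assuming three states with Gaussian likelihoods (illustrative parameters):

```python
# A minimal sketch of the maximum likelihood criterion with three assumed states.
import numpy as np
from scipy.stats import norm

states = [norm(0, 1), norm(2, 1), norm(4, 1)]   # assumed likelihood models W_n(. / S_j)
x = np.array([1.8, 2.4, 2.1])                   # the observed sample

log_likelihoods = [d.logpdf(x).sum() for d in states]
print("accept the hypothesis about the state S_%d" % int(np.argmax(log_likelihoods)))
```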



Now let us consider the situation with two alternatives, which is often encountered in practice.

The decision-making problem is somewhat simplified: under any of the previously considered criteria it reduces to calculating the ratio of the likelihood functions for the observed sample x1, …, xn and comparing the result with a predetermined threshold C* (or with two thresholds, C0 and C1), i.e.

Wn(x1, …, xn / S1) / Wn(x1, …, xn / S0) ≥ C*.

When this inequality is satisfied, the decision γ1 is made, indicating that the object of research is in the state S1. The opposite inequality corresponds to the state S0, and the other decision, γ0, is made.



The threshold value C* is determined by the criterion used. In the case of the Bayes criterion

C* = (p0·(Π01 − Π00)) / (p1·(Π10 − Π11)),

where

p0 (p1) are, respectively, the prior probabilities of occurrence of the events S0 (S1);

Π01 (Π10) are the losses when the event S0 (S1) occurs and the decision γ1 (γ0) is made, and Π00, Π11 are the losses under correct decisions.



With the maximum posterior probability criterion the formula simplifies to

C* = p0 / p1,

and for the maximum likelihood criterion it becomes the constant C* = 1.

When the minimax criterion is used, the threshold is calculated by the same formula, in which, instead of p0 and p1, one substitutes the values of the prior probabilities at which the average risk takes its maximum value.



Definition. The Neyman-Pearson criterion is the rule for choosing an alternative in which the threshold value is determined from a given value of the probability of a type I error (α).



A type I error occurs when the sample falls into the critical region G1 although the phenomenon under study is in the state S0, i.e., the hypothesis H0 is correct and yet is rejected.



A type II error occurs when the sample falls into the admissible region G0 although the phenomenon under study is in the state S1, i.e., the false hypothesis H0 is accepted. To determine the threshold value, the following integral equation must be solved for C* at the given α:

∫[C*, ∞) W10(y) dy = α,

where W10(y) is the one-dimensional distribution density of the likelihood ratio under the hypothesis H0.



In turn, the probability of a type II error β is determined from the analogous integral equation β = ∫[0, C*] W11(y) dy, where W11(y) is the one-dimensional distribution density of the likelihood ratio under the hypothesis H1.
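A minimal sketch of the Neyman-Pearson construction for a single observation, testing H0: N(0, 1) against H1: N(1, 1); here the likelihood ratio is monotone in x, so thresholding x itself is equivalent to thresholding the ratio:

```python
# A minimal Neyman-Pearson sketch: H0: N(0,1) vs H1: N(1,1), one observation.
from scipy.stats import norm

alpha = 0.05                      # prescribed probability of a type I error
x_crit = norm(0, 1).isf(alpha)    # reject H0 when x > x_crit; P(x > x_crit | H0) = alpha
beta = norm(1, 1).cdf(x_crit)     # the resulting probability of a type II error
print(f"critical value = {x_crit:.3f}, beta = {beta:.3f}")
```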



Definition. The Wald criterion is a rule for choosing a decision in which the ratio of the likelihood functions is compared with two thresholds, C0 and C1. The precise determination of the thresholds C0 and C1 is fraught with significant mathematical difficulties.

Conclusion



The paper gives a brief overview of the capabilities of the existing theory of statistical decision making. The main elements and components of the theory, its applications, and its models are identified, and brief descriptions of the named elements are given.



In educational terms, it is important to know that such a theory exists and, when the need to make decisions arises and is recognized, to turn to its foundations. I would like to note that in this area, as in the area of education, everyone (especially parents) considers themselves quite competent.



But it is precisely a consequence of upbringing that alcoholism and drug addiction flourish among young people, and a consequence of under-education that the decisions made lead us to what we now have in our country.



I do not exclude that once again someone will appear and say that the conclusion is off topic.



Bibliography
1. …, 1973. 172 pp.

2. …, 1996. 192 pp.

3. …, 2005. 144 pp.

4. …, 2000.

5. …, 1976. 248 pp.

6. …, 1978. 352 pp.

7. …, 1976. 416 pp.

8. …, 1982.

9. …, 1987. 608 pp.

10. …, 1969.

11. …, 1962.

12. …, 2003. 352 pp.

13. CCEB-96/011. Common Criteria for Information Technology Security Evaluation. Part 1. Version 1.0, 1996-01-31.

14. …, 1989. 288 pp.

15. …, 1998. 416 pp.

16. …, 1953.

17. …, 1950. 806 pp.

18. …, 1971. 288 pp.

19. FIPS PUB 191. Guideline for the Analysis of Local Area Network Security. November 9, 1994.

21. Katzke, Stuart W., PhD. "A Framework for Computer Security Risk Management". NIST, October 1992.

22. …, 1977. 302 pp.

23. …, 1972. 117 pp.

24. …, 1969. 160 pp.

25. …, 1973. 64 pp.

26. …, 1970.

27. …, 1995. 120 pp.

28. …, 1967.

29. …, 1977.

30. …, 1985.

31. …, 1989.

32. …, 1997. 376 pp.