Personal or Social? How to achieve cooperation in a multi-agent environment

Hey! My name is Dmitry, and I want to tell you about our article “Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments”, which was recently accepted at the AAMAS (A*) conference.

In this paper, we explore how a group of agents in a mixed environment can be trained to pursue their own goals without interfering with each other, or even while helping each other. We analyzed several existing solutions and proposed our own. This post is high-level; the technical details are in the article.

Who are we

My name is Dmitry Ivanov. I am a third-year graduate student in economics at HSE in St. Petersburg. I work in the Agent Systems and Reinforcement Learning group at JetBrains Research, as well as at the International Laboratory for Game Theory and Decision Making at HSE.

, 1 “ ” — -, , . JetBrains Research, -- .

, : , . , . — (. 1).

Fig. 1. The prisoner's dilemma.

. , : , . , 3 . , 2 . , , , 4 . : , , .. . .

— , (Peysakhovich and Lerer, 2017). , . . , — ‘Cooperate’ ‘Defect’. , . Sequential Social Dilemma (Leibo et al., 2017), , , .

, , — ( , ?) , . , ? : ?

: , (Rashid et al., 2018). : , . . (SW = Social Welfare):

$SW(r) = \sum_i r_i$
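To make the formula concrete, here is a minimal sketch of social welfare as a plain sum of per-agent rewards. The function name `social_welfare` and the two reward vectors are hypothetical and chosen only for illustration; they are not taken from the paper. The example shows one property that follows directly from the definition: SW depends only on the total, not on how the total is split between agents.

```python
from typing import Sequence


def social_welfare(rewards: Sequence[float]) -> float:
    """SW(r) = sum_i r_i: the sum of all agents' rewards."""
    return float(sum(rewards))


# Hypothetical reward vectors for two agents (illustrative numbers only).
equal_split = [2.0, 2.0]
unequal_split = [4.0, 0.0]

# Both vectors have the same total, so SW cannot tell them apart.
print(social_welfare(equal_split))
print(social_welfare(unequal_split))
```

Both calls return the same value, even though in the second case one agent receives nothing.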

SW , , , (). — , . , . “” ? (. 1). , , Defect-Cooperate Cooperate-Cooperate: 4 , , ! , , SW , — , . , ,

, : , VDN, QMIX, COMA . , credit assignment reward disentanglement — , . — . SW , SW — . — , , .

Cooperative Reward Shaping

— , , . , , , λ:

( ) (Peysakhovich and Lerer, 2017; Lerer and Peysakhovich, 2019; Durugkar et al., 2020), , Cooperative Reward Shaping (CRS). . , “ ”. , , credit assignment. , .

, : , credit assignment . : , , — . , . , — — . — QMIX COMA!

? , . , . , SW -, . . , , BAROCCO — ?

. , — Eldorado (. 2). . — 1000 , +1. , -1. , . , . , .

Fig. 2. The Eldorado environment.

BAROCCO : selfish ( ), CRS ( ), COMA ( + credit assignment, ). , . , .
BAROCCO , .. λ. , , .

Fig. 3. Results in Eldorado. Panels: life expectancy (total for 2 agents) and Gini index (lower = fairer).
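The Gini index named in the figure measures how unevenly a quantity is spread across agents. Below is a minimal sketch of one standard way to compute it, the mean absolute difference over all pairs divided by twice the mean. Whether the paper uses exactly this variant is an assumption, and the function name `gini_index` and the lifetimes in the example are hypothetical.

```python
from typing import Sequence


def gini_index(values: Sequence[float]) -> float:
    """Gini index of non-negative values: 0 means a perfectly equal split.

    Computed as sum_i sum_j |x_i - x_j| / (2 * n * sum_i x_i).
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # Convention for empty or all-zero input.
    abs_diff_sum = sum(abs(x - y) for x in values for y in values)
    return abs_diff_sum / (2 * n * total)


# Hypothetical lifetimes of two agents (illustrative numbers only).
print(gini_index([1000, 1000]))  # equal lifetimes
print(gini_index([1500, 500]))   # one agent lives three times longer
print(gini_index([2000, 0]))     # one agent gets everything
```

With this variant and two agents the index ranges from 0 (equal lifetimes) to 0.5 (one agent gets everything), so two outcomes with the same total lifetime can still differ sharply in fairness.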

BAROCCO ( ), 1000 2000 . , ( ) , : , . , , . , .
BAROCCO , , . , , - .
CRS COMA . Eldorado , . - , ( 1000 ), , , . , , .
, λ ( ) . 0.5. .

λ. , , -, ( ), -, — . , . , reciprocity (), (Eccles et al., 2019; Lerer and Peysakhovich, 2019). , , . , .

: . , , . , , , , .
