Hey! My name is Dmitry, and I want to tell you about our paper “Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments”, which was recently accepted to AAMAS (an A*-ranked conference).
In this paper, we explore how a group of agents in mixed environments can be trained to pursue their own goals without interfering with each other, or even while helping each other. We analyzed several existing solutions and proposed our own. This post is deliberately high-level; the technical details are in the article.
Who we are
My name is Dmitry Ivanov, and I am a third-year graduate student in economics at HSE in St. Petersburg. I work in the Agent Systems and Reinforcement Learning group at JetBrains Research, as well as at the International Laboratory for Game Theory and Decision Making at HSE.
, 1 “ ” — -, , . JetBrains Research, -- .
, : , . , . — (. 1).
. , : , . , 3 . , 2 . , , , 4 . : , , .. . .
— , (Peysakhovich and Lerer, 2017). , . . , — ‘Cooperate’ ‘Defect’. , . Sequential Social Dilemma (Leibo et al., 2017), , , .
, , — ( , ?) , . , ? : ?
: , (Rashid et al., 2018). : , . . (SW = Social Welfare):
SW , , , (). — , . , . “” ? (. 1). , , Defect-Cooperate Cooperate-Cooperate: 4 , , ! , , SW , — , . , ,
, : , VDN, QMIX, COMA . , credit assignment reward disentanglement — , . — . SW , SW — . — , , .
Cooperative Reward Shaping
— , , . , , , λ:
( ) (Peysakhovich and Lerer, 2017; Lerer and Peysakhovich, 2019; Durugkar et al., 2020), , Cooperative Reward Shaping (CRS). . , “ ”. , , credit assignment. , .
, : , credit assignment . : , , — . , . , — — . — QMIX COMA!
? , . , . , SW -, . . , , BAROCCO — ?
. , — Eldorado (. 2). . — 1000 , +1. , -1. , . , . , .
:
BAROCCO : selfish ( ), CRS ( ), COMA ( + credit assignment, ). , . , .
BAROCCO , .. λ. , , .
. 3. Eldorado. — . CRS BAROCCO λ=1 , . Selfish - , λ=0, BAROCCO CRS . — λ BAROCCO. — , — , . — .
:
BAROCCO ( ), 1000 2000 . , ( ) , : , . , , . , .
BAROCCO , , . , , - .
CRS COMA . Eldorado , . - , ( 1000 ), , , . , , .
, λ ( ) . 0.5. .
λ. , , -, ( ), -, — . , . , reciprocity (), (Eccles et al., 2019; Lerer and Peysakhovich, 2019). , , . , .
: . , , . , , , , .