Face control for lemons and Martian strawberries: how I got an internship at Rosselkhozbank after taking part in a DS competition

There are probably few people left who have not heard of hackathons and Data Science competitions. I first heard about them six months ago. Having entered every competition I came across (and even won something), I could not pass up AgroCode 2020, organized by the Russian Agricultural Bank. I finished among the top participants in several tracks, and in one of them I took a prize. Thanks to these results, I became a Data Science specialist at the Center for the Development of Financial Technologies of the Russian Agricultural Bank. Read on to find out how I did it.

The main agro-coding of the country

To begin with, a few words about the event itself. AgroCode 2020 brought together many people who care about new technologies in agriculture. It consisted of several activities:

  1. Agro Data Science Cup, a data analysis competition with 2 tasks:

  2. Agro Hack 6 :

      , , . 10 .

  3. Agro Idea, .

, , . , , , . . DS- . -10, - 2 !

. 17 .


: , ID , , ( ) 365 .

Dataset for the vegetation index problem

The metric was the F1 score from sklearn (with average="weighted").
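To make the weighting concrete, here is a minimal pure-Python sketch of a support-weighted F1: the per-class F1 scores are averaged with weights proportional to the number of true samples in each class. This is intended to mirror what sklearn's f1_score computes with average="weighted". The crop labels are made up for illustration and are not taken from the competition data.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1  # weight each class by its support
    return total / len(y_true)

# Made-up labels, for illustration only
y_true = ["wheat", "wheat", "wheat", "corn", "corn", "soy"]
y_pred = ["wheat", "wheat", "corn", "corn", "corn", "corn"]
result = weighted_f1(y_true, y_pred)
```

Because of the weighting, a rare class that is never predicted correctly (here "soy") lowers the score less than a frequent one would.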

, . : , . .

? ?

, , NDVI โ€”

NDVI = \frac{NIR - RED}{NIR + RED}

Computing it requires 4 channels: RGB plus near-infrared. Here RED is the red channel and NIR is the near-infrared channel.
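As an illustration of the NDVI formula, here is a minimal NumPy sketch that computes the index per pixel from two band arrays. The function name, the zero-denominator handling, and the sample values are assumptions made for the example; they do not come from the competition data.

```python
import numpy as np

def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - RED) / (NIR + RED)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    # Assumption: pixels where both bands are zero get NDVI = 0
    safe = np.where(denom == 0, 1.0, denom)
    return np.where(denom == 0, 0.0, (nir - red) / safe)

# Made-up 2x2 band values, for illustration only
nir_band = [[0.5, 0.2], [0.0, 0.8]]
red_band = [[0.1, 0.2], [0.0, 0.2]]
index = ndvi(nir_band, red_band)
```

For non-negative band values the result lies between -1 and 1, with higher values usually indicating denser green vegetation.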

Image of a field with NDVI computed


  • -, 45 , 279 . : - , () , .

  • -, , - ( - ). , .

  • -, . , , . .

. , , . - , , . , .

โ€ฆ , , . , . .

ID, . . - . . : 2 4 .


Distribution of crops in training data

, . - KFold StratifiedKFold , . . , . , . -. .
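Since KFold and StratifiedKFold are both mentioned here, the following sketch shows how they differ on imbalanced classes. It is only an illustration with made-up labels (90 samples of one class, 10 of another) and is not the author's validation scheme.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # features do not matter for the split itself

def minority_counts(splitter):
    """Number of class-1 samples in each validation fold."""
    return [int(y[val_idx].sum()) for _, val_idx in splitter.split(X, y)]

plain = minority_counts(KFold(n_splits=5, shuffle=False))
stratified = minority_counts(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=17)
)
```

Without shuffling, plain KFold puts every minority sample into the last fold, while StratifiedKFold keeps the class ratio the same in each fold.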

, , CatBoost. , , , :

params = {
  'iterations': 2000,
  'depth': 6,
  'early_stopping_rounds': 500,
  'l2_leaf_reg': 5,
  'bagging_temperature': 1,
  'random_seed': 17,
  'class_names': classes,  # list of crop class labels, defined elsewhere
  'auto_class_weights': 'Balanced',  # compensate for class imbalance
  'eval_metric': 'TotalF1',
  'loss_function': 'MultiClassOneVsAll',
  'task_type': 'GPU',
  'devices': '0:1',
  'verbose': 2000
}

"Balanced" "MultiClassOneVsAll". . . , , , random_seed . - . , , . , , .
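To illustrate what the "Balanced" setting is meant to do, here is a small sketch of the weighting rule as described in the CatBoost documentation: each class gets the weight (size of the largest class) / (size of this class). It assumes all objects have unit weight. The helper name and the labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: largest class count divided by this class's count."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}

# Made-up label distribution, for illustration only
labels = ["wheat"] * 6 + ["corn"] * 3 + ["soy"] * 1
weights = balanced_class_weights(labels)
```

The rarest class receives the largest weight, so errors on it cost more during training.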

Predicted crop distribution in test data

18 . , , . , 18 2 . , - . โ€” .

The final leaderboard of the first task

: . . , , , .

( ) . , .

Which lemon will you choose?

. โ€” 1056 1056 .jpg. . , , . . : https://www.kaggle.com/maciejadamiak/lemons-quality-controldataset


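from tensorflow import keras

def score(y_true, y_preds):
  # Align ground truth and predictions row by row on image_id
  table = y_true.merge(y_preds, left_on='image_id', right_on='image_id')
  m = keras.metrics.AUC(curve='ROC')
  # Assuming image_id is the first column: columns 1-9 are true labels, the rest are predictions
  m.update_state(table.iloc[:, 1:10], table.iloc[:, 10:])
  return m.result().numpy()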
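The column slicing in score depends on how pandas lays out the merged table. The sketch below reproduces only that step on made-up data. The label column names are hypothetical, and image_id is assumed to be the first column in both frames.

```python
import numpy as np
import pandas as pd

# Hypothetical names for the 9 label columns
label_cols = [f"label_{i}" for i in range(9)]

rng = np.random.default_rng(17)
y_true = pd.DataFrame(rng.integers(0, 2, size=(3, 9)), columns=label_cols)
y_true.insert(0, "image_id", ["a.jpg", "b.jpg", "c.jpg"])

# Predictions arrive in a different row order
y_preds = pd.DataFrame(rng.random(size=(3, 9)), columns=label_cols)
y_preds.insert(0, "image_id", ["c.jpg", "a.jpg", "b.jpg"])

# The merge aligns rows by image_id; clashing column names get _x / _y suffixes
table = y_true.merge(y_preds, on="image_id")
true_part = table.iloc[:, 1:10]   # 9 ground-truth columns
pred_part = table.iloc[:, 10:]    # 9 predicted columns
```

The merge makes the metric independent of the row order in the submitted file, as long as every image_id appears in both tables.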


csv-. .py , -. , . , .

20 . , . ? .

, , . , . , , , , .

. , .

from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(
  brightness_range=[0.5, 1],
)


The architecture is a VGG16 backbone followed by AveragePooling, a fully connected (Dense) layer, and Dropout.

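from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten

# VGG16 backbone without its classification head; weights=None means no pretrained weights
model = VGG16(weights=None, include_top=False, input_shape=[image_size, image_size, 3])
x = AveragePooling2D(pool_size=(2, 2))(model.output)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
# 9 independent sigmoid outputs, one per label
output = Dense(9, activation='sigmoid')(x)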
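To get a feel for how large the Flatten output is in this head, here is a rough sketch of the shape arithmetic. It assumes the standard VGG16 layout (five 2x2 max-pooling stages, 512 channels in the last block) followed by the 2x2 average pooling shown above. The image sizes are hypothetical, since the value of image_size is not given.

```python
def flattened_features(image_size, n_pool_stages=5, channels=512):
    """Length of the vector that Flatten() feeds into Dense(256)."""
    side = image_size
    for _ in range(n_pool_stages):
        side //= 2          # each VGG16 max-pool halves the side (rounding down)
    side //= 2              # AveragePooling2D(pool_size=(2, 2))
    return side * side * channels

def dense_params(n_inputs, units=256):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * units + units

features_224 = flattened_features(224)   # hypothetical input size
features_256 = flattened_features(256)   # hypothetical input size
params_224 = dense_params(features_224)
```

The extra average pooling cuts the Dense layer's input size (and so its weight count) roughly fourfold compared with flattening the backbone output directly.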

, , .

KFold -. , , .

9 , .

Final leaderboard for the second task

. โ€œ โ€ . , . .

Picture from our presentation


  1. , , .

  2. โ€“ .

  3. ( , , , ..).

5 , , , .

3 . .

Dataset with sensor readings
Dataset with yield values

: -.

, , , .

, :

  • , 19 30 , 23 25 . .

  • . , .

  • - . , .

  • 7.

, . . , . :

Temperature regime in the greenhouse

: - , - , - ( ).

-? . , , . , , โ€” . :


, , , . , , Agro Hack :) ( , ).

, ? - , ! , .


, , , , , , .

