Face control for lemons and Martian strawberries: how I got an internship at Rosselkhozbank after taking part in a DS competition

I suspect there are few people left who haven't heard of hackathons and Data Science competitions. I first heard of them six months ago. After taking part in every one I came across (and even winning a few), I could not pass up AgroCode 2020, organized by the Russian Agricultural Bank. I made it into the top participants in several tracks and won a prize in one of them. Thanks to these results, I became a Data Science specialist at the Russian Agricultural Bank's Center for the Development of Financial Technologies. Read on to see how I did it.

The country's main agro-coding event

To begin with, a few words about the event itself. AgroCode 2020 brought together many people who care about new technologies in agriculture. It consisted of several activities:

  1. Agro Data Science Cup, a data analysis competition with 2 tasks:

    • classifying crop types from vegetation index data;
    • assessing lemon quality from photos.





  2. Agro Hack, a hackathon with 6 tasks:

    • .





      , , . 10 .





  3. Agro Idea, .





, , . , , , . . DS- . -10, - 2 !





. 17 .





?

: , ID , , ( ) 365 .





Dataset for the vegetation index problem

The metric was the F1 score from sklearn (with average="weighted").
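To make the metric concrete, here is a dependency-free sketch of weighted F1. The function computes F1 = 2TP / (2TP + FP + FN) for each class and averages the per-class scores, weighting each by the number of true samples in that class. It should agree with sklearn's `f1_score(y_true, y_pred, average="weighted")`, because a class that appears only in the predictions has zero support and therefore zero weight. The function name and the toy labels are illustrative.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp  # true samples of this class that were predicted as something else
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n > 0
        score += f1 * n / total
    return score


# Usage with three toy classes, two samples each
print(weighted_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```

Weighting by support lets frequent crops dominate the score. This is why class balance in validation and training matters for this task.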





, . : , . .





? ?

One such vegetation index is NDVI (Normalized Difference Vegetation Index):





NDVI = \frac{NIR - RED}{NIR + RED}





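Satellite images have 4 channels: RGB plus near-infrared. Here RED is the reflectance in the red band and NIR is the reflectance in the near-infrared band.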
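A minimal NumPy sketch of the formula follows. It assumes the bands arrive as arrays of reflectances or raw pixel values; NDVI is scale-invariant, so either works. The helper `ndvi_from_image` and its channel order (R, G, B, NIR) are hypothetical assumptions, not something the dataset is known to use.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Per-pixel NDVI = (NIR - RED) / (NIR + RED); pixels where NIR + RED == 0 get 0."""
    nir = nir.astype(np.float64)  # avoid integer overflow/truncation for uint8 bands
    red = red.astype(np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def ndvi_from_image(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: assumes a (H, W, 4) image with channels ordered R, G, B, NIR."""
    return ndvi(image[..., 3], image[..., 0])


# Usage: healthy vegetation reflects much more NIR than red light, giving a high NDVI
print(ndvi(np.array([0.5, 0.2]), np.array([0.1, 0.2])))
```

The result always lies in [-1, 1]. The explicit `where=` guard avoids division-by-zero warnings on black (all-zero) pixels.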





Image of a field with NDVI computed

?

  • -, 45 , 279 . : - , () , .





  • -, , - ( - ). , .





  • -, . , , . .





. , , . - , , . , .





โ€ฆ , , . , . .





ID, . . - . . : 2 4 .





?

Distribution of crops in training data

, . - KFold StratifiedKFold , . . , . , . -. .
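The reason to prefer StratifiedKFold over plain KFold is that every fold keeps roughly the class proportions of the whole training set. That matters when some crops are rare, as in the distribution above. Below is a simplified pure-Python sketch of stratified fold assignment. It is not sklearn's exact algorithm, and the function names and seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=17):
    """Assign each sample a fold id so every fold gets a near-equal share of each class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [0] * len(labels)
    start = 0  # rotate the starting fold so leftover samples do not all land in fold 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[i] = (start + j) % n_splits
        start = (start + len(indices)) % n_splits
    return folds


def split(labels, n_splits=5, seed=17):
    """Yield (train_idx, val_idx) pairs, one per fold, in the spirit of StratifiedKFold.split."""
    folds = stratified_folds(labels, n_splits, seed)
    for k in range(n_splits):
        train_idx = [i for i, f in enumerate(folds) if f != k]
        val_idx = [i for i, f in enumerate(folds) if f == k]
        yield train_idx, val_idx


# Usage with hypothetical crop labels: every validation fold gets 2 "wheat" and 1 "soy"
labels = ["wheat"] * 10 + ["soy"] * 5
for train_idx, val_idx in split(labels):
    print(sorted(labels[i] for i in val_idx))
```

With plain KFold on sorted or clustered data, a rare class can end up entirely in one fold. The model would then be validated on a class it never saw during training.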





I used CatBoost with the following parameters:





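from catboost import CatBoostClassifier

# `classes` is the list of crop labels from the training data (defined earlier)
params = {
  'iterations': 2000,
  'depth': 6,
  'early_stopping_rounds': 500,           # stop if the eval metric has not improved for 500 iterations
  'l2_leaf_reg': 5,                       # L2 regularization of leaf values
  'bagging_temperature': 1,               # intensity of the Bayesian bootstrap
  'random_seed': 17,
  'class_names': classes,
  'auto_class_weights': 'Balanced',       # rarer classes get proportionally larger weights
  'eval_metric': 'TotalF1',               # weighted F1 over all classes by default
  'loss_function': 'MultiClassOneVsAll',
  'task_type': 'GPU',
  'devices': '0:1',                       # train on GPUs 0 and 1
  'verbose': 2000
}

model = CatBoostClassifier(**params)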



Note the "Balanced" class weights and the "MultiClassOneVsAll" loss. . . , , , random_seed . - . , , . , , .
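For reference, CatBoost's documentation for `auto_class_weights` describes "Balanced" weights (with unit sample weights) as the size of the largest class divided by the size of each class. Under that rule, a class half as common as the largest one gets weight 2. The snippet below only illustrates that formula and does not call CatBoost. The crop names and counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight for class k = (count of the largest class) / (count of class k)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


labels = ["wheat"] * 100 + ["corn"] * 50 + ["soy"] * 25
print(balanced_class_weights(labels))  # {'wheat': 1.0, 'corn': 2.0, 'soy': 4.0}
```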





Predicted crop distribution in test data

18 . , , . , 18 2 . , - . - .





The final leaderboard of the first task

: . . , , , .





( ) . , .





Which lemon will you choose?
?

The data consisted of 1056 × 1056 .jpg images. The dataset is available on Kaggle: https://www.kaggle.com/maciejadamiak/lemons-quality-controldataset





The metric was ROC-AUC, computed as follows:





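from tensorflow import keras

def score(y_true, y_preds):
  # Align ground truth and predictions by image_id
  table = y_true.merge(y_preds, on='image_id')
  m = keras.metrics.AUC(curve='ROC')
  # Columns 1-9: true labels for the 9 classes; columns 10 onwards: predicted probabilities
  m.update_state(table.iloc[:, 1:10], table.iloc[:, 10:])
  return m.result().numpy()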
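With its default `multi_label=False`, `keras.metrics.AUC` flattens the 9 label columns into one long binary problem. The result is effectively a micro-averaged ROC-AUC, and Keras approximates the curve with a fixed grid of thresholds (200 by default). The sketch below computes the same micro-averaged quantity exactly, using the rank interpretation of AUC: the probability that a random positive cell scores higher than a random negative one, with ties counting as one half. It is meant for sanity-checking numbers, so small differences from the Keras value are expected. The function name is illustrative.

```python
def micro_roc_auc(y_true_rows, y_score_rows):
    """Exact ROC-AUC over all (row, class) cells flattened into one binary problem."""
    y = [t for row in y_true_rows for t in row]
    s = [p for row in y_score_rows for p in row]
    pos = [p for t, p in zip(y, s) if t == 1]
    neg = [p for t, p in zip(y, s) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # O(P * N) pairwise comparison; fine for a sanity check
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Usage: two images, two classes, every positive ranked above every negative
print(micro_roc_auc([[1, 0], [0, 1]], [[0.9, 0.2], [0.3, 0.8]]))
```

Passing `multi_label=True` to `keras.metrics.AUC` would instead compute one AUC per class and average them, which can differ noticeably when classes are imbalanced.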



.





csv-. .py , -. , . , .





20 . , . ? .





, , . , . , , , , .





. , .





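from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random augmentations applied to training images on the fly
aug = ImageDataGenerator(
  rotation_range=40,           # rotate by up to ±40 degrees
  width_shift_range=0.1,       # shift horizontally by up to 10% of the width
  height_shift_range=0.1,      # shift vertically by up to 10% of the height
  brightness_range=[0.5, 1],   # scale brightness by a factor in [0.5, 1]
  shear_range=0.2,             # shear angle in degrees
  channel_shift_range=0.2,     # random shift added to channel values
  zoom_range=0.2,              # zoom in or out by up to 20%
  horizontal_flip=True,
  vertical_flip=True,
  fill_mode="nearest"          # fill pixels exposed by a transform with the nearest edge pixel
)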



?

I used VGG16 as the backbone, followed by AveragePooling, a fully connected (Dense) layer and Dropout.





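from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten
from tensorflow.keras.models import Model

image_size = 224  # hypothetical value; the input size used is not stated
base = VGG16(weights=None, include_top=False, input_shape=(image_size, image_size, 3))  # trained from scratch
x = AveragePooling2D(pool_size=(2, 2))(base.output)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(9, activation='sigmoid')(x)  # independent probability for each of the 9 classes
model = Model(inputs=base.input, outputs=output)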



, , .





KFold -. , , .





9 , .





Final leaderboard for the second task

. โ€œ โ€ . , . .





Picture from our presentation

:





  1. , , .





  2. – .





  3. ( , , , ..).





5 , , , .





3 . .





Dataset with sensor readings
Dataset with yield values

: -.





, , , .





, :





  • , 19 30 , 23 25 . .





  • . , .





  • - . , .





  • 7.





, . . , . :





Temperature regime in the greenhouse

: - , - , - ( ).





-? . , , . , , - . :





?

, , , . , , Agro Hack :) ( , ).





, ? - , ! , .





!





, , , , , , .
