Why test the code of a machine learning article? Let's analyze an example

In this article, I'll use an example to show how inattention to code can lead to incorrect results in data analysis.





There is a great course [1] in which students learn to do machine learning research. By the end of the semester each student prepares a paper, and the lectures explain along the way how to do this. Preparing a paper typically involves running an experiment on data, and by the end of the semester several papers are usually ready for submission to a journal.





So, in the MLDev project, we are helping to create a template repository for the students of this course. Such a template will let them start working on a paper faster and spend less time learning the various tools they need.





While I was looking through the student projects from spring 2020, I noticed an unusual graph in the work “Analysis of the properties of an ensemble of locally approximating models” [2]. The graph with the experimental results clearly shows a gap in the values of the presented dependence, yet given the choice of the initial data and the properties of this dependence, there should be no gap in that place.





It seemed worth checking why the work produced such an unexpected result.





Repeating the experiment

To find out where the gap comes from, I decided to repeat the experiment. Fortunately, the authors published the code of the experiment along with the paper, so it can be run again and the result compared with the graph from the paper.





In the experiment, two samples are generated from two linear models with an added noise feature, a mixture of two local models is trained on the combined data, and the correlation between the predictions of the two models is computed. This is repeated for different values of the noise parameter sigma, and the resulting correlation is plotted as a function of sigma.





The code of the experiment is published in the project repository on GitHub [2]. It is a Jupyter notebook, so it can be opened and executed step by step.





The experiment runs in Google Colab [4] and uses the MixtureLib library, which is published on GitHub [3]. MixtureLib implements mixtures of local models trained with the EM algorithm. First we install the library and import the classes used in the experiment.









!git clone https://github.com/andriygav/MixtureLib.git
!python3 -m pip install MixtureLib/src/.

import numpy as np
import torch

from mixturelib.localmodels import EachModelLinear
from mixturelib.hypermodels import HyperExpertNN, HyperModelDirichlet
from mixturelib.mixture import MixtureEM



Next, the notebook defines the list of noise values sigma over which the experiment iterates. Note that the list is not sorted; we will come back to this later.





correlations = []
sigmas = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,
         1.0,0.03,0.04,0.05,0.06,0.07,0.08,0.09,
         0.15,0.25,0.35,0.45,0.55,0.65,0.75,0.85,
         0.95,0.11,0.12,0.13,0.14,0.15,0.22,0.27,
         0.32,0.37,0.42,0.47,0.52,0.57,0.62,0.67,
         0.72,0.77,0.82,0.87,0.92,0.97]
      
      



The local models are linear models, EachModelLinear. They are combined into a mixture that is trained with the EM algorithm implemented in the MixtureEM class, and the gating hypermodel that assigns objects to the local models is a neural network, HyperExpertNN. Below, the two local models, the hypermodel and the mixture are created, and the random seed is fixed for reproducibility. Note that all of these objects are created once, before the loop over sigma.





torch.random.manual_seed(42)
first_model = EachModelLinear(input_dim=2)
second_model = EachModelLinear(input_dim=2)
list_of_models = [first_model, second_model]

HpMd = HyperExpertNN(input_dim=2, hidden_dim=5,
                     output_dim=2, epochs=1000)
mixture = MixtureEM(HyperParameters={'beta': 1.},
                    HyperModel=HpMd,
                    ListOfModels=list_of_models,
                    model_type='sample')



For each value of sigma, two samples of 200 points are generated from two linear dependencies, a noise feature with standard deviation sigma is appended to each, and the two samples are stacked into a single training set.





for i, sigma in enumerate(sigmas):
    # generate the two samples; f1 and f2 are the true linear
    # dependencies defined earlier in the notebook
    x1 = np.random.normal(0, 1, (200, 1))
    x2 = np.random.normal(0, 1, (200, 1))
    y1 = np.array([f1(x) for x in x1])
    y2 = np.array([f2(x) for x in x2])
    s1 = np.random.normal(0, sigma, 200).reshape((200, 1))
    s2 = np.random.normal(0, sigma, 200).reshape((200, 1))
    X1 = np.hstack([x1, s1])
    X2 = np.hstack([s2, x2])
    X = np.vstack([X1, X2])
    Y = np.hstack([y1, y2])
    real_second_w = np.array([[10.], [0.]])
    real_first_w = np.array([[0.], [50.]])
    X_tr = torch.FloatTensor(X)
    Y_tr = torch.FloatTensor(Y).view([-1, 1])

    # training and evaluation
    # …
      
      







On each iteration, the mixture is trained on the generated data, the weights of the two local models are extracted, and the Pearson correlation between the predictions of the two models on the whole sample is computed and stored.





for i, sigma in enumerate(sigmas):
    # data generation
    # ...

    # fit the mixture and extract the weights of the local models
    torch.random.manual_seed(42)
    mixture.fit(X_tr, Y_tr)
    predicted_first_w = mixture.ListOfModels[0].W.numpy()
    predicted_second_w = mixture.ListOfModels[1].W.numpy()
    weights = []
    weights.append([predicted_first_w[0][0], predicted_first_w[1][0]])
    weights.append([predicted_second_w[0][0], predicted_second_w[1][0]])
    # predictions of the two local models on the whole sample
    # and the correlation between them
    Y1 = X.dot(weights[0])
    Y2 = X.dot(weights[1])
    correlations.append(cor_Pearson(Y1, Y2))
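
cor_Pearson is a helper defined elsewhere in the original notebook; its implementation is not shown here. A minimal sketch, assuming it simply computes the Pearson correlation coefficient of two prediction vectors, could look like this.

import numpy as np

def cor_Pearson(a, b):
    # Pearson correlation coefficient between two 1-D prediction vectors
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    return np.corrcoef(a, b)[0, 1]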

      
      



After running the notebook, I did not get the same graph as in the paper. What is worse, the result turned out to depend on the order in which the values of sigma are processed: running the loop over the sigma list in the original order and in reverse order gives different graphs. With the same data generation and the same fixed seed, the two runs should coincide, but they do not.





Figure: calculation over the sigma list in the original (forward) order
Figure: calculation over the sigma list in reverse order

The two graphs are clearly different, although the data, the code and the random seed are the same; the only thing that changed is the order of the sigma values. This means that the result of each iteration depends on the iterations that came before it.





The reason is that the mixture object is created once, before the loop. On every iteration, fit() continues training the same model from its current state instead of starting from scratch, so the state accumulated on the previous values of sigma leaks into the next ones. As a result, the graph depends on the order of the list, and the gap in the published graph is an artifact of this leakage rather than a property of the data.
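
A quick way to see this is a sketch that reuses the mixture and the X_tr, Y_tr left over from the loop above: call fit() twice on the same data and compare the weights. If fit() restarted training from scratch, the two results would coincide.

torch.random.manual_seed(42)
mixture.fit(X_tr, Y_tr)
w_first = mixture.ListOfModels[0].W.clone()

torch.random.manual_seed(42)
mixture.fit(X_tr, Y_tr)   # same seed, same data, but training continues from the previous state
w_second = mixture.ListOfModels[0].W.clone()

# If fit() reinitialized the model, this would print True;
# here the weights are expected to differ.
print(torch.allclose(w_first, w_second))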





Is this a bug in the experiment or in the library? Let's look at the fit() method of the MixtureEM class. In MixtureLib, fit() does not reinitialize the model: it runs the EM iterations starting from whatever state the model is already in. The documentation of fit() does not make this behavior explicit, so it is easy to use the method in a way the library authors did not intend.

For comparison, scikit-learn has a well-known convention: fit() trains the model from scratch and discards any previous state, while partial_fit() continues training on new data (see the scikit-learn documentation). If fit() in MixtureEM followed the same convention, the experiment code would have behaved the way its authors expected.
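
A small sketch of the scikit-learn convention, using SGDRegressor and synthetic data purely as an illustration (this model is not part of the original experiment): repeated fit() calls give the same result, while partial_fit() keeps updating the existing coefficients.

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=100)

model = SGDRegressor(random_state=0)
model.fit(X, y)                            # trains from scratch
w_first = model.coef_.copy()
model.fit(X, y)                            # trains from scratch again
print(np.allclose(w_first, model.coef_))   # expected: True, previous state is discarded

model.partial_fit(X, y)                    # continues training from the current coefficients
print(np.allclose(w_first, model.coef_))   # expected: False, the state has moved on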





How could the experiment be fixed? The simplest way is to create the local models and the mixture inside the loop, so that every value of sigma is processed by a freshly initialized model and nothing leaks between iterations; a sketch of such a loop is shown below. With a fresh model on every iteration and a fixed seed, the result no longer depends on the order of the sigma list.
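
A minimal sketch of such a corrected loop, assuming the data generation stays exactly as above; the only change is that the local models, the hypermodel and the mixture are constructed anew for every sigma.

correlations = []
for sigma in sigmas:
    # generate X, Y, X_tr, Y_tr for this sigma exactly as above
    # ...

    # fresh models and a fresh mixture on every iteration
    torch.random.manual_seed(42)
    first_model = EachModelLinear(input_dim=2)
    second_model = EachModelLinear(input_dim=2)
    HpMd = HyperExpertNN(input_dim=2, hidden_dim=5,
                         output_dim=2, epochs=1000)
    mixture = MixtureEM(HyperParameters={'beta': 1.},
                        HyperModel=HpMd,
                        ListOfModels=[first_model, second_model],
                        model_type='sample')

    mixture.fit(X_tr, Y_tr)
    predicted_first_w = mixture.ListOfModels[0].W.numpy()
    predicted_second_w = mixture.ListOfModels[1].W.numpy()
    Y1 = X.dot([predicted_first_w[0][0], predicted_first_w[1][0]])
    Y2 = X.dot([predicted_second_w[0][0], predicted_second_w[1][0]])
    correlations.append(cor_Pearson(Y1, Y2))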





Another option would be to change the library itself so that fit() reinitializes the model before training, or at least to state the current behavior of fit() explicitly in the documentation.





In any case, here is what this story suggests.





  • Publish the code of your experiments together with the paper. Without the published notebook and library, it would have been impossible to find out where the strange graph came from.





  • Test the code of the experiment, not just the text of the paper. Even a simple check that rerunning the experiment reproduces the published graph would have caught this error.





  • Check that the result does not depend on things it should not depend on. Would the graph stay the same if the values of sigma were processed in a different order? A sketch of such a check is shown after this list.





  • Be careful with objects that keep state between calls. If a model is created once and fit() is called on it many times, make sure that is really what you intend.





  • Read the documentation of the libraries you use, and when the behavior of a method such as fit() is not documented, check it experimentally.
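
To make the last points concrete, here is a sketch of a sanity test one could add to the experiment repository. run_experiment is a hypothetical helper that wraps the corrected loop above and returns the list of correlations for the given sigma values.

import numpy as np

def test_result_does_not_depend_on_sigma_order():
    # running the experiment forward and backward over the same sigma values
    # must give the same correlation for each sigma
    forward = run_experiment(sigmas)
    backward = run_experiment(list(reversed(sigmas)))
    assert np.allclose(forward, list(reversed(backward)), atol=1e-6)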





[1] Course “My First Scientific Paper” - https://m1p.org





[2] Repository of the project “Analysis of the properties of an ensemble of locally approximating models” - https://github.com/Intelligent-Systems-Phystech/2020_Project-51





[3] The MixtureLib library - https://github.com/andriygav/MixtureLib





[4] The experiment notebook in Google Colab - https://colab.research.google.com/drive/1DZoJN32EpTZVSi2N3BduRCRf-ZST8snP#scrollTo=1JopTLX4eMnX





PS Of course, we contacted the authors of the paper and of the MixtureLib library and discussed what we found. The error was confirmed. The paper was written back in March, so by December it is already hard to reconstruct exactly how the original graph was obtained and how the experiment was run. One can guess that the experiment was carried out in parts, especially given the absence of graphs in the original notebook and this list of sigma values:





sigmas = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,
          1.0,0.03,0.04,0.05,0.06,0.07,0.08,0.09,
          0.15,0.25,0.35,0.45,0.55,0.65,0.75,0.85,
          0.95,0.11,0.12,0.13,0.14,0.15,0.22,0.27,
          0.32,0.37,0.42,0.47,0.52,0.57,0.62,0.67,
          0.72,0.77,0.82,0.87,0.92,0.97]
      
      









