Reformer: The Efficient Transformer



Understanding sequentially organized data, be it language, music, or video, is difficult, especially when it depends heavily on the surrounding context. For example, if a person or an object disappears from view in a video recording and reappears much later, many models will forget how it looked. In the language domain, long short-term memory (LSTM) neural networks cover enough context to translate sentence by sentence. Here the context window (i.e., the span of data the model takes into account when translating) covers from tens to about a hundred words. The more recent Transformer model not only improved the quality of sentence-by-sentence translation, but can be used to generate entire Wikipedia articles by summarizing multiple documents. This is possible because the Transformer enlarged the context window to thousands of words. Such a large context window also makes it possible to apply the Transformer beyond text, to pixels or musical notes, on the basis of which images or music can be generated.



However, extending the Transformer to even larger context windows runs into limitations. The power of the Transformer comes from attention, the mechanism by which it considers all possible pairs of words within the context window in order to understand the connections between them. For a text of 100 thousand words, this would require evaluating 100 thousand × 100 thousand word pairs, i.e., 10 billion pairs at every step, which is impractical. Another problem lies in the standard practice of storing the output of every model layer. For applications with large context windows, the memory needed to store the outputs of multiple layers quickly becomes prohibitively large (from gigabytes in models with a few layers to terabytes in models with thousands of layers). As a result, realistic Transformer models with many layers can only be applied to a few paragraphs of text or to generating short pieces of music.
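To make the scale of the attention problem concrete, here is a quick back-of-the-envelope calculation; the float32 score size and the single attention matrix are simplifying assumptions for illustration:

```python
# Rough cost of full self-attention on the 100K-word example above.
seq_len = 100_000                       # tokens in the context window
pairs = seq_len * seq_len               # every token attends to every token
print(f"{pairs:,} pairs per attention layer")        # 10,000,000,000 pairs

bytes_per_score = 4                     # one float32 attention score per pair
print(f"{pairs * bytes_per_score / 1e9:.0f} GB for a single attention matrix")
# -> 40 GB, far beyond a single 16 GB accelerator, for just one layer
```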



Reformer is a Transformer model designed to handle context windows of up to 1 million words on a single accelerator using only 16 GB of memory. The Reformer combines two techniques that address the attention and memory problems limiting the Transformer on long context windows: locality-sensitive hashing (LSH), which reduces the complexity of attending over long sequences, and reversible residual layers, which use the available memory more efficiently.





The Attention Problem

The first challenge in applying a Transformer model to a very long sequence is how to handle the attention layer. LSH accomplishes this by computing a hash function that matches similar vectors together, instead of searching through all possible pairs of vectors. For example, in a translation task, where each vector in the first layer of the network represents a word (with even larger contexts in subsequent layers), vectors corresponding to the same word in different languages are likely to get the same hash. In the figure below, different colors depict different hashes, and similar words have the same color. Once the hashes are assigned, the sequence is rearranged so that elements with the same hash come together, and is then divided into segments (chunks) to enable parallel processing. Attention is applied within these much shorter chunks (and their adjacent neighbors, to cover the overflow), which greatly reduces the computational load. A minimal sketch of this scheme follows the figure below.






Locality-sensitive hashing: the Reformer takes in a sequence of keys, where each key is a vector representing an individual word (or pixel, in the case of images) in the first layer and larger contexts in subsequent layers. LSH is applied to the sequence, after which the keys are sorted by their hash and chunked. Attention is applied only within a single chunk and its immediate neighbors.
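A minimal numpy sketch of this idea, assuming random-projection (angular) hashing, a single hash round, and a fixed chunk size. The function names and shapes are illustrative; this is not the authors' trax implementation, which adds masking, multiple hash rounds, and multi-head attention:

```python
import numpy as np

def lsh_buckets(vectors, n_buckets, rng):
    """Angular LSH: similar vectors tend to land in the same bucket.

    Project each vector onto random directions and pick the nearest of
    n_buckets "corners" (concatenating [xR, -xR] yields all corners).
    """
    R = rng.normal(size=(vectors.shape[-1], n_buckets // 2))
    rotated = vectors @ R
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def lsh_attention(x, n_buckets=8, chunk_size=16, seed=0):
    """Attend only within same-hash chunks plus one neighboring chunk."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    order = np.argsort(lsh_buckets(x, n_buckets, rng), kind="stable")
    xs = x[order]                         # sort: similar items become adjacent
    out = np.zeros_like(xs)
    for start in range(0, n, chunk_size):
        q = xs[start:start + chunk_size]  # queries of this chunk
        # keys/values: this chunk plus the previous one, to catch overflow
        ctx = xs[max(0, start - chunk_size):start + chunk_size]
        scores = q @ ctx.T / np.sqrt(x.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[start:start + chunk_size] = weights @ ctx
    return out[np.argsort(order)]         # restore the original positions
```

Instead of n² pairwise scores, each position now interacts with at most 2 × chunk_size others, so the cost grows roughly linearly with sequence length.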





The Memory Problem

While LSH solves the attention problem, a memory problem remains. A single layer of the network often requires up to a few gigabytes of memory and usually fits on a single GPU, so even a model working with long sequences could be executed if it had only one layer. But when training a multi-layer model with gradient descent, the activations of every layer have to be stored for use in the backward pass. A typical Transformer model has a dozen or more layers, so memory runs out quickly when it is used to cache the values of each of those layers.
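A rough estimate of the activation cache alone, with illustrative sequence length, hidden size, and depth (real models also store much larger feed-forward and attention intermediates):

```python
# Back-of-the-envelope activation memory for training a multi-layer model.
seq_len    = 64_000        # tokens (illustrative)
d_model    = 1024          # hidden size (illustrative)
n_layers   = 12            # typical Transformer depth
bytes_each = 4             # float32

per_layer = seq_len * d_model * bytes_each    # activations of one layer
total     = per_layer * n_layers              # all cached for the backward pass
print(f"{per_layer / 1e9:.2f} GB per layer, {total / 1e9:.1f} GB total")
# -> ~0.26 GB per layer, ~3.1 GB total just for residual activations;
#    intermediates of the feed-forward blocks are several times larger still.
```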



The second novel approach implemented in the Reformer is to recompute the input of each layer on demand during backpropagation, rather than storing it in memory. This is accomplished with reversible layers, in which the activations of the last layer of the network are used to recover the activations of any intermediate layer, which amounts to running the network in reverse. In a typical residual network, each layer of the stack keeps adding to the vectors that pass through the network. Reversible layers instead maintain two sets of activations for each layer. One follows the standard procedure just described and is progressively updated from one layer to the next; the other captures only the changes to the first. Thus, to run the network in reverse, one simply subtracts the activations applied at each layer. A small sketch of such a block follows the figure below.






Reversible layers: (a) in a standard residual network, the activations of each layer are used to update the inputs to the next layer; (b) in a reversible network, two sets of activations are maintained, only one of which is updated after each layer; (c) this approach makes it possible to run the network in reverse and recover all intermediate values.
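A minimal sketch of one reversible residual block with the RevNet-style coupling the Reformer builds on (in the Reformer, F is the attention sub-layer and G the feed-forward sub-layer). The stand-in functions and shapes here are illustrative assumptions:

```python
import numpy as np

def rev_block_forward(x1, x2, F, G):
    # Two activation streams; each sub-layer updates only one of them.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    # Subtract the updates in reverse order to recover the inputs,
    # so activations never need to be cached for the backward pass.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

rng = np.random.default_rng(0)
W_f, W_g = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
F = lambda x: np.tanh(x @ W_f)      # stand-in for the attention sub-layer
G = lambda x: np.tanh(x @ W_g)      # stand-in for the feed-forward sub-layer

x1, x2 = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
y1, y2 = rev_block_forward(x1, x2, F, G)
r1, r2 = rev_block_inverse(y1, y2, F, G)
print(np.allclose(x1, r1) and np.allclose(x2, r2))   # True: inputs recovered
```

Because the inputs can be recomputed from the outputs, memory usage no longer grows with the number of layers during training.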



Applications of the Reformer



The novel application of these two approaches makes the Reformer highly efficient, enabling it to process text sequences of up to 1 million words on a single accelerator using only 16 GB of memory. Because the Reformer is so efficient, it can be applied directly to data whose context windows are far larger than in virtually all current state-of-the-art text datasets. Perhaps the Reformer's ability to handle such volumes will spur the community to create them.



One area where there is no shortage of long-context data is images. In a Colab notebook, we show how the Reformer can be used to "complete" partial images. Starting from the image fragments shown in the top row of the figure below, the Reformer generates the full images pixel by pixel (see the bottom row).






Top: image fragments fed to the Reformer as input. Bottom: "completed" full-frame images. The original images are from the Imagenet64 dataset.



While the Reformer's application to images and video shows great potential, its application to text is even more exciting. The Reformer can process entire novels at once, on a single device: processing all of "Crime and Punishment" in a single training example is demonstrated in a Colab notebook. In the future, when there are more long-form text datasets to train on, techniques such as the Reformer may make it possible to generate long, coherent compositions.





Conclusion

We believe the Reformer provides the basis for the future use of Transformer models, both for long texts and for applications outside natural language processing. Following our tradition of open research, we have already begun exploring how to apply it to even longer sequences and how to improve the handling of positional encodings. Read the Reformer paper (selected for oral presentation at ICLR 2020), explore our code, and develop your own ideas in the Colab notebook. Few long-context datasets are widely used in deep learning yet, although in the real world long context is everywhere. Perhaps you will find a new application for the Reformer; start with the Colab notebook and share your problems and questions with us.





  • Authors of the original post — Nikita Kitaev, Łukasz Kaiser


