We have published modern STT models comparable in quality to Google's



We have finally published our set of high-quality pre-trained speech recognition models (comparable in quality to Google's premium models) for the following languages:



  • English;
  • German;
  • Spanish.


You can find our models in our repository, along with examples and quality and speed metrics. We also tried to make getting started with our models as simple as possible: we posted examples on Colab and checkpoints for PyTorch, ONNX and TensorFlow. The models can also be loaded via TorchHub.



Model             PyTorch  ONNX  TensorFlow  Quality  Colab
English (en_v1)   ✓        ✓     ✓           link     Open in Colab
German (de_v1)    ✓        ✓     ✓           link     Open in Colab
Spanish (es_v1)   ✓        ✓     ✓           link     Open in Colab
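To give a concrete feel for the TorchHub route mentioned above, here is a minimal sketch of batch transcription. It assumes the `snakers4/silero-models` hub repository exposes a `silero_stt` entry point that returns a model, a decoder and a tuple of helpers (`read_batch`, `split_into_batches`, `read_audio`, `prepare_model_input`), as described in the repository README at release time. The wrapper `transcribe` and the constant `SUPPORTED_LANGUAGES` are hypothetical names introduced for illustration; check the repository for the current interface before relying on it.

```python
# Minimal sketch (assumption): batch transcription with the Silero STT models
# loaded through torch.hub. The hub entry point and helper names follow the
# silero-models README at release time; `transcribe` is a hypothetical wrapper.

SUPPORTED_LANGUAGES = ("en", "de", "es")  # en_v1, de_v1, es_v1


def transcribe(paths, language="en", batch_size=10):
    """Return one transcript string per audio file in `paths`."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    # Imported lazily so the language check above runs without PyTorch installed.
    import torch

    device = torch.device("cpu")
    # Downloads the hub repository and the model checkpoint on first use.
    model, decoder, utils = torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_stt",
        language=language,
        device=device,
    )
    # read_audio is not needed here, so it is discarded.
    read_batch, split_into_batches, _read_audio, prepare_model_input = utils

    transcripts = []
    for batch in split_into_batches(paths, batch_size=batch_size):
        # Read the audio files and pack them into a single model input tensor.
        model_input = prepare_model_input(read_batch(batch), device=device)
        output = model(model_input)
        # The decoder turns each per-file output into a text transcript.
        transcripts.extend(decoder(example.cpu()) for example in output)
    return transcripts
```

For example, `transcribe(["speech.wav"], language="de")` would fetch the German model on first use and return a list with one transcript (the file name is a placeholder). Validating the language before importing PyTorch keeps a typo from triggering a slow import or download.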


Why is this important



Speech recognition has traditionally had high barriers to entry for a number of reasons:



  • Data is difficult to collect;
  • Annotating a comparable unit of data is much more expensive than in computer vision;
  • Compute requirements are high and the technology has been outdated.


Here is a list of common problems faced by existing speech recognition solutions prior to our release:



  • Research in this area is usually done with enormous computing power;







