We have finally published our set of high-quality pre-trained speech recognition models (comparable in quality to Google's premium models) for the following languages:
- English;
- German;
- Spanish.
You can find the models in our repository, along with examples and quality and speed metrics. We also tried to make getting started with our models as simple as possible: we published examples on Colab and checkpoints for PyTorch, ONNX and TensorFlow. The models can also be loaded via TorchHub.
Language | PyTorch | ONNX | TensorFlow | Quality | Colab
---|---|---|---|---|---
English (en_v1) | ✓ | ✓ | ✓ | link |
German (de_v1) | ✓ | ✓ | ✓ | link |
Spanish (es_v1) | ✓ | ✓ | ✓ | link |
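The TorchHub route mentioned above can be sketched roughly as follows. This is a minimal sketch, not the official example: it assumes the models live in the `snakers4/silero-models` GitHub repository under the hub entry point `silero_stt`, and that this entry point returns a `(model, decoder, utils)` tuple whose `utils` holds `read_batch`, `split_into_batches`, `read_audio` and `prepare_model_input`, as that repository's README describes. The names `PUBLISHED_MODELS`, `load_stt` and `transcribe` are hypothetical helpers introduced only for this illustration.

```python
# Minimal sketch of loading the published STT models through TorchHub.
# Assumptions (not confirmed by this post): hub repo "snakers4/silero-models",
# entry point "silero_stt", return value (model, decoder, utils).

# Hypothetical mapping from language code to the model versions in the table.
PUBLISHED_MODELS = {
    "en": "en_v1",
    "de": "de_v1",
    "es": "es_v1",
}


def load_stt(language: str, device: str = "cpu"):
    """Download (or reuse the cached) STT model for `language` via TorchHub."""
    if language not in PUBLISHED_MODELS:
        raise ValueError(
            f"unsupported language {language!r}; "
            f"expected one of {sorted(PUBLISHED_MODELS)}"
        )
    import torch  # imported lazily so the language check works without PyTorch

    # torch.hub.load forwards extra keyword arguments to the hub entry point.
    model, decoder, utils = torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_stt",
        language=language,
        device=torch.device(device),
    )
    return model, decoder, utils


def transcribe(audio_paths, language: str = "en", device: str = "cpu"):
    """Return one transcript per audio file (sketch; see assumptions above)."""
    import torch

    model, decoder, utils = load_stt(language, device)
    # Assumed tuple layout: (read_batch, split_into_batches, read_audio, prepare_model_input).
    read_batch, _, _, prepare_model_input = utils
    model_input = prepare_model_input(read_batch(audio_paths), device=torch.device(device))
    with torch.no_grad():
        output = model(model_input)
    # The decoder turns each example's model output into text.
    return [decoder(example.cpu()) for example in output]
```

For instance, `transcribe(["speech.wav"], language="de")` would fetch the German model on first use and return a list with one transcript; `speech.wav` stands in for any local recording. The language check runs before PyTorch is imported, so an unsupported code fails fast without touching the network.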
Why it matters
Speech recognition has traditionally had high barriers to entry for a number of reasons:
- Data is difficult to collect;
- Annotating a comparable unit of data is much more expensive than in computer vision;
- Compute requirements are high, and the underlying technology is often outdated.
Here is a list of common problems faced by existing speech recognition solutions prior to our release:
- Research in this area usually relies on enormous amounts of compute.