We have finally published our set of high-quality pre-trained speech recognition models (i.e. comparable in quality to Google's premium models) for the following languages:

English;
German;
Spanish;

You can find our models in our repository, along with examples and metrics for quality and speed. We also tried to make getting started with our models as simple as possible: we posted examples on Colab and checkpoints for PyTorch, ONNX and TensorFlow. The models can also be loaded via TorchHub.

Model	PyTorch	ONNX	TensorFlow	Quality
English (en_v1)	✓	✓	✓	link
German (de_v1)	✓	✓	✓	link
Spanish (es_v1)	✓	✓	✓	link
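
Loading a model through TorchHub could look roughly like the sketch below. The repository name `snakers4/silero-models` and the entry point name `silero_stt` are assumptions that do not come from this post, so check them against the repository before relying on them. The helper `hub_load_kwargs` is hypothetical and exists only to keep the example tidy. The language codes are taken from the model names in the table above.

```python
# Language codes inferred from the model names en_v1 / de_v1 / es_v1.
SUPPORTED_LANGUAGES = {"en", "de", "es"}


def hub_load_kwargs(language: str, device: str = "cpu") -> dict:
    """Build keyword arguments for torch.hub.load (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "repo_or_dir": "snakers4/silero-models",  # assumed repository name
        "model": "silero_stt",                     # assumed hub entry point
        "language": language,
        "device": device,
    }


def load_stt_model(language: str = "en", device: str = "cpu"):
    """Fetch the model via TorchHub and return whatever the entry point provides."""
    import torch  # imported lazily so the helper above works without PyTorch

    kwargs = hub_load_kwargs(language, device)
    kwargs["device"] = torch.device(device)
    # Extra keyword arguments (language, device) are forwarded to the entry point.
    return torch.hub.load(**kwargs)
```

Calling `load_stt_model("de")` needs PyTorch installed and network access on the first run, since TorchHub downloads the repository and the checkpoint. The Colab examples remain the reference for how to feed audio to the loaded model.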

Why this is important

Speech recognition has traditionally had high barriers to entry for a number of reasons:

Data is difficult to collect;
Annotating a comparable unit of data is much more expensive than in computer vision;
Computing power requirements are high and the technologies in use are often outdated;

Here is a list of common problems faced by existing speech recognition solutions prior to our release:

Research in this area is usually done with enormous computing power;

GitHub
Quality metrics
Examples on Colab
