A Ruby script that recognizes speech in a video file using the Yandex SpeechKit service (long audio)

To begin with, I only recently started getting into IT in general and Ruby in particular, and I was given this task as a test assignment for an internship position. I'll say up front that there is still something to polish and improve, but overall the code works.





Still, perhaps my experience will be useful to someone, so here is a detailed description of how this script was written. IMPORTANT: my operating system is Fedora 32, and I use the Bundler that comes pre-installed on the system. So if you also use a Linux-like system, read on.





The task: there is a video file in mp4 format; you need to write a script in pure Ruby that converts this file to audio, sends it to the Yandex SpeechKit service and, once the response arrives, creates a text file.





Before starting work, you should carefully study the Yandex documentation for pitfalls and nuances such as the audio formats SpeechKit accepts (by the way, there are only two of them: OggOpus and LPCM).





Preparatory stage:





Now we can draw up a work plan:





  1. Convert the file from mp4 format to audio using the ffmpeg utility





  2. Send the resulting file to the Yandex Object Storage bucket





  3. Send the address of the file in the bucket to SpeechKit





  4. Get the response and convert it to a text file





Next, we will go through the steps, explaining the interesting (and not always obvious) parts.





1. Converting a file from mp4 format to audio using the ffmpeg utility





To convert the video file, install ffmpeg on the system:





sudo dnf install ffmpeg







And, if you haven't done so yet, put the video file to be converted into the folder of our small project (in my case, test_task).





In the same folder, create a Ruby file (for example, run.rb) in which we will write the script:





touch run.rb







Ruby offers several ways to run bash commands: system, exec, popen and backticks ` ` (https://www.rubyguides.com/2018/12/ruby-system/)
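As a rough illustration of how these options differ, here is a small sketch. The `echo` command is only a stand-in I chose for the example; it is not part of the script itself.

```ruby
# Backticks run the command in a subshell and return its standard output as a String.
output = `echo hello`

# system runs the command and returns true on a zero exit status,
# false on a non-zero one, and nil if the command could not be started.
# The command's output is not returned; here it is redirected to the null device.
ok = system("echo", "hello", out: File::NULL)

# IO.popen opens a pipe to the command, so its output can be read like a stream.
piped = IO.popen(["echo", "hello"]) { |io| io.read }

# exec is not called here: it replaces the current Ruby process with the
# command, so nothing after it in the script would run.

puts output.inspect
puts ok.inspect
puts piped.inspect
```

Backticks are convenient when the output matters, while system is the simpler choice when only success or failure matters.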





I used backticks ` `:





`ffmpeg -i test.mp4 -vn -acodec libopus audio.ogg`





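Where:

test.mp4 is the input video file.

'-vn' disables the video stream, so only the audio goes into the output.

libopus is the audio codec, chosen because SpeechKit accepts the OggOpus format.

audio.ogg is the name of the output file (in ogg format).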
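Backticks do not tell the script whether ffmpeg actually succeeded. A minimal sketch of the same conversion through system, with the exit status checked, might look like this. The helper names `ffmpeg_args` and `convert_to_ogg` are my own, and the `-y` flag (overwrite the output file without asking) is an addition that the original command does not have.

```ruby
# Builds the ffmpeg argument list. Passing the arguments as separate strings
# avoids shell quoting problems with unusual file names.
# -y (an assumption, not in the original command) overwrites an existing output file.
def ffmpeg_args(input, output)
  ["ffmpeg", "-y", "-i", input, "-vn", "-acodec", "libopus", output]
end

# Runs the conversion and raises if ffmpeg fails or is not installed.
# convert_to_ogg is a hypothetical helper name.
def convert_to_ogg(input, output)
  ok = system(*ffmpeg_args(input, output))
  raise "ffmpeg failed for #{input}" unless ok
  output
end

# Usage:
#   convert_to_ogg("test.mp4", "audio.ogg")
```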










2. Sending the resulting file to the Yandex Object Storage bucket





The audio file first has to be uploaded to a bucket in Yandex Object Storage.





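Yandex Object Storage has an HTTP API compatible with Amazon S3, so tools written for Amazon S3 can be used to work with it.

For Amazon S3 there is the aws-sdk-s3 gem, which also works with Yandex Object Storage.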





Let's install aws-sdk-s3. Create a Gemfile with the following contents:





 source 'https://rubygems.org'
 gem 'aws-sdk-s3'
      
      



After adding gem 'aws-sdk-s3', install it:





bundle install







In run.rb, require the gem:





require 'aws-sdk-s3'







Next, the script needs keys to access the cloud. A static access key is the kind used for Object Storage and Message Queue.





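So that the keys are not written directly in the code, I used the dotenv gem: the keys go into a .env file and are loaded as environment variables.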





To install it, add to the Gemfile:





gem 'dotenv'







Run:





bundle install







And require it in run.rb:





require 'dotenv/load'







Create a .env file and put the keys in it:





AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXX

AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXX
      
      



Now configure aws with these keys:





 Aws.config.update(

   region: 'ru-central1',

   credentials: Aws::Credentials.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'])

 )
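If a key is missing from .env, the value passed to Aws::Credentials is simply nil and the error only shows up later, at upload time. A small sketch of checking the variables up front could look like this. The helper name `require_env!` is my own invention for this example.

```ruby
# Returns the values of the required variables, or raises with a list of the
# missing ones. `env` defaults to ENV; a plain Hash can be passed in for testing.
# require_env! is a hypothetical helper name.
def require_env!(names, env = ENV)
  missing = names.reject { |name| env[name] && !env[name].empty? }
  unless missing.empty?
    raise KeyError, "missing environment variables: #{missing.join(', ')}"
  end
  names.map { |name| [name, env[name]] }.to_h
end

# Usage, before Aws.config.update:
#   keys = require_env!(%w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY])
```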
      
      



The region and credentials are now set globally, so the client only needs the Yandex Object Storage endpoint:

 s3 = Aws::S3::Client.new(endpoint: "https://storage.yandexcloud.net")
      
      



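Now open the audio file and upload it to the bucket. (I also print the response with puts to check that the upload went through.)

 File.open('audio.ogg', 'rb') do |file|
   # 'rb' opens the audio in binary mode
   result = s3.put_object(
     bucket: 'teststask',
     key: 'audio.ogg',
     body: file
   )
   puts result
 end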
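The upload can also be wrapped in a small method, which makes it easier to reuse and to check without touching the cloud. This is only a sketch: `upload_file` is a hypothetical helper name, and `client` is assumed to be the Aws::S3::Client created earlier (anything that responds to put_object will do).

```ruby
# Uploads a local file through any client that responds to put_object.
# upload_file is a hypothetical helper name, not part of aws-sdk-s3.
def upload_file(client, bucket:, key:, path:)
  raise ArgumentError, "no such file: #{path}" unless File.file?(path)

  # 'rb' reads the audio as binary, so no newline or encoding conversion is applied.
  File.open(path, "rb") do |file|
    client.put_object(bucket: bucket, key: key, body: file)
  end
end

# Usage:
#   upload_file(s3, bucket: 'teststask', key: 'audio.ogg', path: 'audio.ogg')
```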
      
      



Run run.rb (for example, with ruby run.rb) to check that the upload works.





3. Sending the address of the file in the bucket to SpeechKit





For HTTP requests I used the httparty gem (https://github.com/jnunemaker/httparty/blob/master/examples/basic.rb)










To install it, add to the Gemfile:





gem 'httparty'







Run:





bundle install







And require it in run.rb:





require 'httparty'












SpeechKit needs a link to the file in the bucket. The link has the form: https://storage.yandexcloud.net/<bucket-name>/<path-to-file>





In my case it looks like this:





https://storage.yandexcloud.net/teststask/audio.ogg
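Rather than typing the address by hand, it can be assembled from the same bucket name and key that were used for the upload. A minimal sketch follows; `object_url` and the `STORAGE_HOST` constant are names I made up for the example.

```ruby
# Base address of Yandex Object Storage, as used in this article.
STORAGE_HOST = "https://storage.yandexcloud.net"

# Joins the bucket name and the object key into the object's address.
# object_url is a hypothetical helper; it does not escape special characters in the key.
def object_url(bucket, key)
  "#{STORAGE_HOST}/#{bucket}/#{key}"
end

puts object_url("teststask", "audio.ogg")
```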












Now send a POST request to SpeechKit.





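The request has to be authorized, so I also put an API key into the .env file (as API_KEY).

For authorization I used an API key (SpeechKit accepts either an API key or an IAM token).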





Put the request parameters into options.





 options = {
   headers: {"Authorization" => "Api-Key #{ENV['API_KEY']}"},
   body: {
     "config" => {
       "specification" => {
         "languageCode" => "ru-RU"
       }
     },
     "audio" => {
       "uri" => "https://storage.yandexcloud.net/teststask/audio.ogg"
     }
   }.to_json
 }
      
      







 response = HTTParty.post('https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize', options).to_h
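The request parameters can also be built by a small method, so that the key, the file address and the language are not hard-coded. This is a sketch under my own naming: `recognition_options` is a hypothetical helper, and the field names are the ones from the request body in this article. It uses the standard json library explicitly.

```ruby
require 'json'

# Builds the options hash for the long audio recognition request.
# recognition_options is a hypothetical helper name.
def recognition_options(api_key, audio_uri, language_code: "ru-RU")
  {
    headers: { "Authorization" => "Api-Key #{api_key}" },
    body: JSON.generate({
      "config" => { "specification" => { "languageCode" => language_code } },
      "audio" => { "uri" => audio_uri }
    })
  }
end

# Usage with the request from the article:
#   options = recognition_options(ENV['API_KEY'], "https://storage.yandexcloud.net/teststask/audio.ogg")
#   response = HTTParty.post('https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize', options).to_h
```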
      
      







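The response contains the id of the recognition operation. Recognition takes some time, so the result has to be requested separately.

That request needs only the authorization header: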





 option = {

    headers: {"Authorization" => "Api-Key #{ENV['API_KEY']}"}

 }
      
      



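Then send GET requests to the operations address until recognition is finished. (The operation id, #{response['id']}, is substituted into the address "https://operation.api.cloud.yandex.net/operations/#{response['id']}".)

The loop waits 2 seconds between requests.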





 done = false
 until done
   yandex_answer = HTTParty.get("https://operation.api.cloud.yandex.net/operations/#{response['id']}", option).to_h
   puts yandex_answer
   done = yandex_answer['done']
   sleep 2 unless done  # wait before asking again
 end
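The loop above never gives up, so if the operation gets stuck the script hangs. A possible variant with a limit on the number of attempts is sketched below. `wait_for_operation` is a hypothetical helper name, and the default of 150 attempts is an arbitrary value I picked; the block is expected to perform the GET request and return the parsed response.

```ruby
# Calls the block until it returns a hash whose 'done' value is truthy,
# sleeping `interval` seconds between attempts, and raises after `max_attempts`.
# wait_for_operation is a hypothetical helper name.
def wait_for_operation(interval: 2, max_attempts: 150)
  max_attempts.times do
    answer = yield
    return answer if answer['done']
    sleep interval
  end
  raise "operation did not finish after #{max_attempts} attempts"
end

# Usage with the request from the article:
#   yandex_answer = wait_for_operation do
#     HTTParty.get("https://operation.api.cloud.yandex.net/operations/#{response['id']}", option).to_h
#   end
```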
      
      







4. Getting the response and converting it to a text file





The response arrives as a Ruby hash, so the recognized text can be pulled out of it like this:





 yandex_array = yandex_answer["response"]["chunks"]
 yandex_text = yandex_array.map { |elem| elem["alternatives"].first["text"] }
      
      







yandex_text.uniq!  # uniq! returns nil when nothing is removed, so print the array itself
pp yandex_text
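If the operation ends with an error, the answer has no "response" key and the indexing above fails with NoMethodError. A more forgiving sketch using Hash#dig is shown below. `extract_text` is a hypothetical helper name, and the sample hash is invented data that only mimics the shape of the answer used in this article.

```ruby
# Collects the first alternative of every chunk, skips chunks without text
# and drops repeated phrases. Hash#dig returns nil instead of raising when
# a key is missing. extract_text is a hypothetical helper name.
def extract_text(answer)
  chunks = answer.dig("response", "chunks") || []
  chunks.map { |chunk| chunk.dig("alternatives", 0, "text") }.compact.uniq
end

# Invented sample data in the same shape as the answer above.
sample = {
  "done" => true,
  "response" => {
    "chunks" => [
      { "alternatives" => [{ "text" => "hello" }] },
      { "alternatives" => [{ "text" => "hello" }] },
      { "alternatives" => [{ "text" => "world" }] }
    ]
  }
}

pp extract_text(sample)
```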







Create a text file with a bash command:





`touch test.txt`







And write the text to it:





File.open("test.txt", 'w') { |file| file.write(":#{yandex_text.join(' ')}") }
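For what it's worth, opening a file in 'w' mode already creates it, so the touch step is optional. The same write can be sketched with File.write, which opens, writes and closes the file in one call. `save_text` is a hypothetical helper name.

```ruby
# Writes the phrases to a text file, separated by single spaces.
# File.write creates the file if it does not exist and closes it afterwards.
# save_text is a hypothetical helper name.
def save_text(path, phrases)
  File.write(path, phrases.join(' '))
end

# Usage:
#   save_text("test.txt", yandex_text)
```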







In total, we have three files: .env (with environment variables), Gemfile (with three gems: httparty, aws-sdk-s3, dotenv), run.rb (with code).





Voila, you have a small script that turns your video into text.







