To begin with, I only recently started learning IT in general and Ruby in particular, and I was given this assignment as a test task for an internship position. I will say in advance that there is still something to smooth out and improve, but in general the code works.
However, my experience may be useful to someone, so here is a detailed description of how I wrote this script. IMPORTANT: my operating system is Fedora 32, and I use the Bundler that comes pre-installed with the system. So if you also use a Linux-like system, read on.
The essence of the task: there is a video file in mp4 format, and you need to write a script in pure Ruby that converts this file to audio, sends it to the Yandex SpeechKit service and, having received the answer, creates a text file.
Before starting work, you should carefully study the Yandex documentation for pitfalls and nuances such as the audio formats Yandex accepts (by the way, there are only two of them: OggOpus and LPCM).
Preparatory stage:
Now you can proceed to drawing up a work plan:
Convert the file from mp4 format to audio using the ffmpeg utility
Send the resulting file to a Yandex Object Storage bucket
Send the address of the file in the bucket to SpeechKit
Get the answer and convert it to a text file
Next, we will go through these points one by one, explaining the interesting (and not always obvious) places.
1. Converting a file from mp4 format to audio using the ffmpeg utility
To convert the video file, first install ffmpeg on the system:
sudo dnf install ffmpeg
And, if you haven't done so yet, put the video file to be converted into the folder of our small project (in my case it is test_task).
In the same folder, create a Ruby file (for example, run.rb), in which we will write the script:
touch run.rb
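Ruby offers several ways to run shell (bash) commands from a script: system, exec, popen, backticks (` `) and others (https://www.rubyguides.com/2018/12/ruby-system/)
We will use backticks (` `):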
`ffmpeg -i test.mp4 -vn -acodec libopus audio.ogg`
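Here:
test.mp4 is the input video file.
-vn tells ffmpeg to drop the video stream, so only the audio is kept.
libopus is the audio codec; it is needed because SpeechKit accepts the OggOpus format.
audio.ogg is the name of the output file (in the ogg container).
After the script runs, audio.ogg should appear in the project folder.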
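Backticks do not report a failed command, so a broken or missing ffmpeg goes unnoticed. As a minimal sketch (the helper names ffmpeg_command and convert_to_ogg are my own, not part of the original script), the same conversion can be run through system with a list of arguments and an explicit check of the result:

```ruby
# Sketch only: helper names are hypothetical, not from the original script.

# Builds the same ffmpeg argument list that is used in this article.
def ffmpeg_command(input, output)
  ['ffmpeg', '-i', input, '-vn', '-acodec', 'libopus', output]
end

# Runs ffmpeg and raises if it fails or cannot be started.
def convert_to_ogg(input, output)
  # system returns true on success, false on a non-zero exit status,
  # and nil if the command could not be executed at all.
  ok = system(*ffmpeg_command(input, output))
  raise "ffmpeg failed for #{input}" unless ok

  output
end

# Usage:
#   convert_to_ogg('test.mp4', 'audio.ogg')
```

Passing the arguments as separate strings means no shell is involved, so file names with spaces need no extra quoting.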
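2. Sending the file to the Yandex Object Storage bucket
To do this, you first need an account in Yandex Object Storage and a bucket created in it.
Yandex Object Storage has an HTTP API compatible with Amazon S3, so tools written for Amazon S3 can be used with it.
For Amazon S3 there is the aws-sdk-s3 gem, and it also works with Yandex Object Storage.
Let's install aws-sdk-s3. Create a Gemfile with the following content: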
source 'https://rubygems.org'
gem 'aws-sdk-s3'
After adding gem 'aws-sdk-s3' to the Gemfile, install it:
bundle install
At the top of run.rb, require the gem:
require 'aws-sdk-s3'
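To work with the API you need static access keys.
How to get them is described in the Yandex documentation section on Object Storage and Message Queue.
So that the keys do not end up in the code, we will use the dotenv gem: the keys are stored in a .env file and loaded from there as environment variables.
To do this, add to the Gemfile: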
gem 'dotenv'
Install it:
bundle install
And require it in run.rb:
require 'dotenv/load'
Create a .env file in the project folder and put the keys in it:
AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXX
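A typo in .env only shows up later as an authorization error, so it can help to check the variables right after loading them. A small sketch of such a check (REQUIRED_KEYS and missing_keys are hypothetical names introduced for illustration):

```ruby
# Hypothetical helper, not part of the original script.
# Names of the variables that must be present in the environment.
REQUIRED_KEYS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY].freeze

# Returns the names of required variables that are missing or blank.
# env can be ENV or any Hash, which makes the helper easy to try out.
def missing_keys(env = ENV, required = REQUIRED_KEYS)
  required.select { |name| env[name].to_s.strip.empty? }
end

# Usage (after require 'dotenv/load'):
#   missing = missing_keys
#   abort "Missing in .env: #{missing.join(', ')}" unless missing.empty?
```

The list can be extended with API_KEY, which the script reads later for SpeechKit.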
Now configure aws with the region and the keys:
Aws.config.update(
region: 'ru-central1',
credentials: Aws::Credentials.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'])
)
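Here:
region: 'ru-central1' is the Yandex Cloud region,
credentials: Aws::Credentials.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY']) passes the keys loaded from .env.
Then create the client, pointing it at the Yandex Object Storage endpoint: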
s3 = Aws::S3::Client.new(endpoint: "https://storage.yandexcloud.net")
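Now upload the audio file to the bucket. (I print the response with puts to see the result of the upload.)
# Open in binary mode ('rb'), since audio.ogg is not a text file
File.open('audio.ogg', 'rb') do |file|
  upload_response = s3.put_object({
    bucket: 'teststask',
    key: 'audio.ogg',
    body: file
  })
  puts upload_response
end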
Run run.rb and check that the file has appeared in the bucket.
3. Sending the file address to SpeechKit
To send HTTP requests we will use the httparty gem (https://github.com/jnunemaker/httparty/blob/master/examples/basic.rb)
To do this, add to the Gemfile:
gem 'httparty'
Install it:
bundle install
And require it in run.rb:
require 'httparty'
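Now we need the address of the file in the bucket. It is built according to the pattern: https://storage.yandexcloud.net/<bucket-name>/<path-to-file>
In my case it is: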
https://storage.yandexcloud.net/teststask/audio.ogg
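The address can also be assembled in code instead of being typed by hand. A minimal sketch, assuming the pattern above (the helper object_url and the constant STORAGE_HOST are illustrative names of my own):

```ruby
require 'erb'

# Hypothetical helper, not part of the original script.
STORAGE_HOST = 'https://storage.yandexcloud.net'

# Builds the address of an object from the bucket name and the object key.
def object_url(bucket, key)
  # Escape each path segment separately so that '/' inside the key
  # stays a separator, while spaces and similar characters are encoded.
  escaped_key = key.split('/').map { |part| ERB::Util.url_encode(part) }.join('/')
  "#{STORAGE_HOST}/#{ERB::Util.url_encode(bucket)}/#{escaped_key}"
end

# Usage:
#   object_url('teststask', 'audio.ogg')
```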
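Now send a POST request to SpeechKit.
For authorization you need an API key (SpeechKit accepts either an API key or an IAM token).
Put the request parameters into options.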
options = {
headers: {"Authorization" => "Api-Key #{ENV['API_KEY']}"},
body: {
"config" => {
"specification" => {
"languageCode" => "ru-RU"
}
},
"audio" => {
"uri" => "https://storage.yandexcloud.net/teststask/audio.ogg"
}
}.to_json
}
response = HTTParty.post('https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize', options).to_h
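In response we get the operation id, which is used to request the recognition result.
For that request only the authorization header is needed: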
option = {
headers: {"Authorization" => "Api-Key #{ENV['API_KEY']}"}
}
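Recognition takes some time, so we request the operation status in a loop until it is done (the id is substituted via #{response['id']} into "https://operation.api.cloud.yandex.net/operations/#{response['id']}").
The loop pauses for 2 seconds between requests.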
done = false
until done
yandex_answer = HTTParty.get("https://operation.api.cloud.yandex.net/operations/#{response['id']}", option).to_h
puts yandex_answer
done = yandex_answer['done']
sleep 2
end
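The loop above never ends if the operation is never marked as done (for example, when the request is rejected). A sketch of a bounded variant follows; poll_until_done is a hypothetical helper, and the 'error' field checked in it is my assumption about how a failed operation is reported:

```ruby
# Hypothetical helper, not part of the original script.
# Calls the given block until the returned hash has 'done' => true,
# waiting `interval` seconds between attempts.
def poll_until_done(max_attempts: 150, interval: 2)
  max_attempts.times do |attempt|
    answer = yield
    # Assumption: a failed operation carries an 'error' field.
    raise "Recognition failed: #{answer['error']}" if answer['error']
    return answer if answer['done']

    # No need to wait after the last attempt.
    sleep interval if attempt < max_attempts - 1
  end
  raise "Operation not finished after #{max_attempts} attempts"
end

# Usage with the request from this article:
#   yandex_answer = poll_until_done do
#     HTTParty.get("https://operation.api.cloud.yandex.net/operations/#{response['id']}", option).to_h
#   end
```

Taking the request as a block keeps the helper independent of httparty, so it can be tried with canned answers.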
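4. Getting the answer and converting it to a text file
The answer arrives as a Ruby hash, from which we need to extract the text. It is done like this: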
yandex_array = yandex_answer["response"]["chunks"]
yandex_text = []
yandex_array.each do |elem|
yandex_text << elem["alternatives"].first["text"]
end
# uniq! returns nil when there were no duplicates, so print the array separately
yandex_text.uniq!
pp yandex_text
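The same extraction can be written as one function that tolerates missing keys. This is a sketch: extract_text is a hypothetical name, and the sample hash only imitates the structure used above (response, chunks, alternatives, text) with made-up values:

```ruby
# Hypothetical helper, not part of the original script.
# Collects the first alternative of every chunk, dropping duplicates.
def extract_text(answer)
  # dig returns nil instead of raising when a key is absent.
  chunks = answer.dig('response', 'chunks') || []
  chunks
    .map { |chunk| chunk.dig('alternatives', 0, 'text') }
    .compact
    .uniq
end

# Made-up sample that imitates the shape of the answer.
sample_answer = {
  'done' => true,
  'response' => {
    'chunks' => [
      { 'alternatives' => [{ 'text' => 'hello' }] },
      { 'alternatives' => [{ 'text' => 'hello' }] },
      { 'alternatives' => [{ 'text' => 'world' }] }
    ]
  }
}

sample_text = extract_text(sample_answer).join(' ')
```

Unlike uniq!, the non-destructive uniq always returns an array, so the result can be passed straight to join.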
Create a text file with a bash command:
`touch test.txt`
And write the recognized text into it:
File.open("test.txt", 'w') { |file| file.write(":#{yandex_text.join(' ')}") }
In total, we have three files: .env (with environment variables), Gemfile (with three gems: httparty, aws-sdk-s3, dotenv) and run.rb (with the code).
Voila, you have a little script that converts your video to text.