Speech Recognition on ODROID

n2fan
Posts: 10
Joined: Fri Jan 08, 2021 6:47 pm
languages_spoken: english
ODROIDs: N2
Has thanked: 0
Been thanked: 1 time
Contact:

Re: Speech Recognition on ODROID

Post by n2fan »

The server works fine on the N2; I was able to decode at most 3 processes in parallel in real time.
Last edited by n2fan on Tue Jan 12, 2021 7:21 pm, edited 1 time in total.

nshmyrev
Posts: 39
Joined: Sat Dec 12, 2020 10:14 pm
languages_spoken: english
Has thanked: 12 times
Been thanked: 8 times
Contact:

Re: Speech Recognition on ODROID

Post by nshmyrev »

> thanks, will try the server today. did you make some tests how many channels can be processed in parallel on Intel hardware or any other devices/processors?

A modern 8-core i7 server handles up to 20 processes in parallel; the Odroid will handle fewer, of course.

> vosk-server/client-samples/python/tts-test.py is it a text to speech engine included?

It is not fully functional; you can try https://github.com/TensorSpeech/TensorFlowTTS instead.

Post by n2fan »

I must revise my earlier statement.
It's not that simple with the server.

Current state: when the system runs continuously, the Odroid can't process even one channel in real time. My impression is that processing speed depends on the audio: with a lot of noise or music in the background, performance drops, and it is faster when there is only speech without background noise. Not all cores are fully utilised by the server.

It may also be that other processes have an impact, so that processing is not uniform. I would be very grateful for other opinions/impressions. If TTS is added, the Odroid is overloaded (current state).
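
One way to put a number on "real time" is the real-time factor: processing time divided by audio duration, where a value below 1.0 means faster than real time. A minimal sketch using only the Python standard library; `decode` is a hypothetical placeholder for whatever actually runs the recognition (for example a function that sends the file to the server and waits for the final result), not part of Vosk.

```python
import time
import wave


def audio_duration_seconds(wav_path):
    """Return the duration of a WAV file in seconds, read from its header."""
    with wave.open(wav_path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def real_time_factor(wav_path, decode):
    """Time decode(wav_path) and divide by the audio duration.

    `decode` is a hypothetical callable standing in for the recognizer.
    A result below 1.0 means the file was processed faster than real time.
    """
    start = time.perf_counter()
    decode(wav_path)
    elapsed = time.perf_counter() - start
    return elapsed / audio_duration_seconds(wav_path)
```

Running this on a clean recording and on a noisy one of similar length would show whether the slowdown really follows the audio content rather than other load on the board.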

Post by nshmyrev »

n2fan wrote:
Tue Jan 12, 2021 8:28 pm
current state: if the system is constantly running, Odroid can't even process one channel in real time. I have the impression that the processing speed depends on the audio. If there is a lot of noise or music in the background, performance goes down. It works faster if there is only speech without background noise. All cores are not 100% utilised with the server.
Which model is that? With the lightweight model it should run in real time.

Post by n2fan »

I tested with the bigger model (vosk-model-small-en-us-0.15, vosk-model-en-us-daanzu-20200905-lgraph) because the smaller model did not work:

odroid@odroid:~/vosk/vosk-server/vosk-server/websocket$ python3 ./asr_server.py /home/odroid/vosk/vosk-api/python/example/model
LOG (VoskAPI:ReadDataFiles():vosk/model.cc:194) Decoding params beam=10 max-active=3000 lattice-beam=2
LOG (VoskAPI:ReadDataFiles():vosk/model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.0968211 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():vosk/model.cc:221) Loading i-vector extractor from /home/odroid/vosk/vosk-api/python/example/model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():vosk/model.cc:251) Loading HCL and G from /home/odroid/vosk/vosk-api/python/example/model/graph/HCLr.fst /home/odroid/vosk/vosk-api/python/example/model/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():vosk/model.cc:273) Loading winfo /home/odroid/vosk/vosk-api/python/example/model/graph/phones/word_boundary.int
ERROR (VoskAPI:MaybeCreateResampler():online-feature.cc:99) Sampling frequency mismatch, expected 16000, got 8000
Perhaps you want to use the options --allow_{upsample,downsample}
terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError
Aborted (core dumped)
odroid@odroid:~/vosk/vosk-server/vosk-server/websocket$

ffmpeg delivers the audio at 16000 Hz:

Output #0, wav, to '/tmp/voice.wav':
Metadata:
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.35.100 pcm_s16le
av_interleaved_write_frame(): Broken pipe

I got the same result with the included test file: python3 ./test.py ./test16k.wav

Post by nshmyrev »

n2fan wrote:
Tue Jan 12, 2021 9:55 pm
I tested with the bigger model (vosk-model-small-en-us-0.15, vosk-model-en-us-daanzu-20200905-lgraph) because smaller model has not worked;
ERROR (VoskAPI:MaybeCreateResampler():online-feature.cc:99) Sampling frequency mismatch, expected 16000, got 8000
Perhaps you want to use the options --allow_{upsample,downsample}
terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError
Aborted (core dumped)
odroid@odroid:~/vosk/vosk-server/vosk-server/websocket$
Run like this:

VOSK_SAMPLE_RATE=16000 ./asr_server.py
Or edit the default value of VOSK_SAMPLE_RATE from 8000 to 16000 inside asr_server.py.
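
To catch this kind of mismatch before the server aborts, the WAV header can be compared against the rate the server is configured with. A minimal sketch using only the Python standard library; `expected_rate` and `check_wav` are illustrative names, the fallback of 8000 simply mirrors the default mentioned above rather than anything read from asr_server.py, and the mono/16-bit checks are an assumption about what the recognizer expects.

```python
import os
import wave


def expected_rate(default=8000):
    """Read VOSK_SAMPLE_RATE from the environment, falling back to `default`."""
    return int(os.environ.get("VOSK_SAMPLE_RATE", default))


def check_wav(wav_path, rate):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(
                "sample rate is %d Hz, expected %d Hz" % (wf.getframerate(), rate)
            )
        # Assumption: the recognizer wants mono, 16-bit PCM input.
        if wf.getnchannels() != 1:
            problems.append("file has %d channels, expected mono" % wf.getnchannels())
        if wf.getsampwidth() != 2:
            problems.append("samples are not 16-bit")
    return problems
```

Calling `check_wav("test16k.wav", expected_rate())` without VOSK_SAMPLE_RATE set would then report the 16000 vs. 8000 difference up front.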

Post by n2fan »

I used VOSK_SAMPLE_RATE=16000 and it works fine. Many thanks for the support.
Still curious:
- How big is the difference in recognition between the smaller and the bigger model? Maybe you can point me to a publication that is understandable for people who are not voice recognition experts?
- Is it true that the processor load depends on the sound environment, e.g. that music or a noisy background requires more resources/processor power?
- Is it possible to recognize non-speech events, e.g. a dog in the apartment, a bell or telephone ringing, music/speech from radio or TV (differentiating them from external sounds), a washing machine running in the kitchen, etc.?

Post by nshmyrev »

- how big is a difference by recognition between smaller and bigger model? May be you can point me to a publication that is understandable for not voice recognition experts?
Error rates for the models are listed on the model page: https://alphacephei.com/vosk/models

For relatively clean but complex speech, the small model's error rate is 10.38%, meaning about 1 in 10 words is not recognized properly; the big Daanzu model's error rate is 9.28%. That is not very different, to be honest, but the gap should be larger in more complex conditions.

To get an idea of the accuracy for your particular use case, you need to record a test database.
- is it true that the processor load is dependent on the sound environment, e.g. by music or noisy background it requires more resources/processor power
Yes, noise takes more time to analyze accurately.
- is it possible to recognize not voice recognition events, e.g. dog in apartment, bell or telephone ring, music/speech from radio or tv (differentiation to external sounds), working of wash machine in the kitchen etc?
In theory it is possible; in practice it is a long way from being practical. There are competitions and specialized software doing that right now, but they are very far from integration with Vosk or any other speech recognition toolkit.
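
For scoring a self-recorded test set, the usual measure is the word error rate: substitutions, deletions and insertions divided by the number of reference words. This assumes the figures quoted above are word error rates, as is usual for such model tables. A minimal sketch in plain Python, with `word_error_rate` as an illustrative name and no text normalization (case, punctuation) applied:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("turn on the kitchen light", "turn of the kitchen light")` gives 0.2: one wrong word out of five.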

mad_ady
Posts: 9049
Joined: Wed Jul 15, 2015 5:00 pm
languages_spoken: english
ODROIDs: XU4, C1+, C2, C4, N1, N2, H2, Go, Go Advance
Location: Bucharest, Romania
Has thanked: 595 times
Been thanked: 573 times
Contact:

Re: Speech Recognition on ODROID

Post by mad_ady »

Do you think such recognition tasks could benefit from local NPUs? I think that NPUs will be the next big thing for SBCs, but without a standardized and accessible API, they may not be as useful...

Post by nshmyrev »

mad_ady wrote:
Thu Jan 14, 2021 3:15 am
Do you think such recognition tasks could benefit from local NPUs? I think that NPUs will be the next big thing for SBCs, but without a standardized and accessible API, they may not be as useful...
Yes, NPU will help a lot. Yes, standard API and integrations are big issues here. Most NPUs are not easily accessible.

L67GS
Posts: 340
Joined: Wed Apr 22, 2020 3:02 pm
languages_spoken: English, Jibberish, Pig Latin
ODROIDs: XU4, C1+,(3) C0's, and a whole big pile of accessories, VU7A Plus,, ect....
Has thanked: 105 times
Been thanked: 50 times
Contact:

Re: Speech Recognition on ODROID

Post by L67GS »

Can't a voice model be trained by the user to increase accuracy for a given dialect?

joerg
Posts: 1248
Joined: Tue Apr 01, 2014 2:14 am
languages_spoken: german, english, español
ODROIDs: C1, C1+, C2, N1, N2, C4
Location: Germany
Has thanked: 79 times
Been thanked: 163 times
Contact:

Re: Speech Recognition on ODROID

Post by joerg »

I have the PS3 Eye Cam now and have taken out the PCB.
The good thing is that the C1 kernel starts the driver:

[112790.073860] usb 1-1.2: new high-speed USB device number 4 using dwc_otg
[112790.250242] gspca_main: v2.14.0 registered
[112790.260711] gspca_main: ov534-2.14.0 probing 1415:2000
[112792.224838] usbcore: registered new interface driver ov534
[112792.309699] usbcore: registered new interface driver snd-usb-audio
And with arecord I can record my voice:

sudo arecord -D hw:1,0 -f S16_LE -c 4 -r 16000 -d 15 test.wav
Note that the channel count -c 4 must be given. Otherwise arecord fails with the error: arecord: set_params:1349: Channels count non available.
I get a file with 1 stereo and 2 mono channels:
Bildschirmfoto vom 2021-01-14 18-37-48.png
The audio quality is much better than with the regular webcam I used before. :)
There is only a little background noise where I sit. I can't publish my terrible voice here for the whole world to hear. :lol:
Now I have to learn how to bring it all together to get speech recognition connected to Home Assistant.
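
Since the recording has four channels and the recognizer is normally fed mono audio (an assumption worth checking against the Vosk version in use), one channel can be pulled out before recognition. A minimal sketch using only the Python standard library; `extract_channel` is an illustrative name, and it assumes 16-bit samples as produced by the -f S16_LE option above.

```python
import wave


def extract_channel(src_path, dst_path, channel=0):
    """Write one channel of a 16-bit multi-channel WAV file to a mono WAV file."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    if sample_width != 2:
        raise ValueError("expected 16-bit samples")
    if not 0 <= channel < n_channels:
        raise ValueError("channel index out of range")
    # Samples are interleaved: each frame holds one sample per channel.
    frame_size = sample_width * n_channels
    offset = channel * sample_width
    mono = bytearray()
    for start in range(0, len(frames), frame_size):
        mono += frames[start + offset:start + offset + sample_width]
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(rate)
        dst.writeframes(bytes(mono))
    return dst_path
```

Something like `extract_channel("test.wav", "mono.wav", channel=0)` would then give a file that can be tried with the recognizer; which of the four channels sounds best is something to find out by listening.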

Post by nshmyrev »

L67GS wrote:
Thu Jan 14, 2021 8:14 am
Can't a voice model be trained by the user to increase accuracy for a given dialect?
We do not support it right now. In theory, again, it is possible, but it is probably outside the project scope. Actually, it should be accurate for any voice with good audio quality, without any training (just like Alexa).

Post by L67GS »

joerg wrote:
Fri Jan 15, 2021 2:48 am
I have the PS3 Eye Cam now and dismounted the pcb.
The good thing is that the C1 kernel starts the driver:

[112790.073860] usb 1-1.2: new high-speed USB device number 4 using dwc_otg
[112790.250242] gspca_main: v2.14.0 registered
[112790.260711] gspca_main: ov534-2.14.0 probing 1415:2000
[112792.224838] usbcore: registered new interface driver ov534
[112792.309699] usbcore: registered new interface driver snd-usb-audio
And with arecord I can record my voice:

sudo arecord -D hw:1,0 -f S16_LE -c 4 -r 16000 -d 15 test.wav
Note that it must be given the channel count -c 4. If not, arecord gives error arecord: set_params:1349: Channels count non available.
I get a file with 1 stereo and 2 mono channels:
Bildschirmfoto vom 2021-01-14 18-37-48.png
The quality of audio is much better than with regular webcam I used before. :)
I have only small background noises here where I sit. I can't publicate my terrible voice here world hearable. :lol:
Now I have to learn, how to bring all together to have a speech recognition connected to homeassistant.
Mine arrived today and I tested them both (Cheese on a desktop) and dismantled one. I like that those microphones will be easy to desolder and pull back closer to the board.

Thank you for doing the hard part on the C1; I haven't even tried one on an SBC yet.
nshmyrev wrote:
Fri Jan 15, 2021 4:14 am
L67GS wrote:
Thu Jan 14, 2021 8:14 am
Can't a voice model be trained by the user to increase accuracy for a given dialect?
We do not support it right now. In theory again it is possible but probably outside of the project scope. Actually it should be accurate with good voice quality for any voice without any training (just like Alexa)
I thought Vosk ran the Kaldi engine so it would be possible to train a language model.

Post by nshmyrev »

L67GS wrote:
Fri Jan 15, 2021 8:27 am
I thought Vosk ran the Kaldi engine so it would be possible to train a language model.
It is possible, however, the process is not trivial yet.
