NXC: "Speech"-Recognition
Posted: 05 Feb 2011, 18:37
by HaWe
hi folks,
a fun little program that introduces speech recognition on the NXT.
Actually - it's of course not speech recognition, it's more "Rhythm Detection".
Construction: Sound Sensor at port S2.
http://www.mindstormsforum.de/viewtopic.php?f=70&t=6386
Re: NXC: "Speech"-"Recognition"
Posted: 06 Feb 2011, 15:08
by HaWe
As the pattern of my sound recordings is an oscillation of different noise levels (noise vibration), I got the idea to use a Fast Fourier Transform (FFT) to characterize my SoundRec[400] array.
Unfortunately I have no experience with FFTs at all, and the underlying maths is quite nebulous to me.
IIUC, an FFT approximates a vibration by a sum of sine waves of different frequencies
(f1, f2, f3,..., each frequency (resp. wavelength) twice as long as the previous one),
each term multiplied by a specific coefficient.
FFT(t) = c1*sin(f1(t)) + c2*sin(f2(t)) + c3*sin(f3(t)) +...+ cn*sin(fn(t))
As my RecordLenght consists of 400 samples, I suppose the frequencies (resp. wavelengths) could be
f1=1
f2=2
f3=4
f4=8
f5=16
f6=32
f7=64
f8=128
f9=256
That (at least up to f16) should fit, so I have to handle n=9(-16?) terms with 9(-16?) frequencies and 9(-16?) coefficients for 400 noise level samples.
Can anybody tell me how to implement a FFT algorithm for these conditions?
Re: NXC: "Speech"-"Recognition"
Posted: 07 Feb 2011, 21:34
by kvols
Hi doc
I wrote one in Lejos some time ago, and it works pretty well under the given circumstances (small processor, very limited amount of space, coarse sampling frequency etc.). There are lots of FFT algorithms out there, but you'll probably need to do some translation.
Google for FFT numerical recipes:
http://www.google.com/search?q=FFT+numerical+recipes
There is some explanation here:
http://en.wikipedia.org/wiki/Fast_Fourier_transform
Best of luck!
Povl
Re: NXC: "Speech"-"Recognition"
Posted: 08 Feb 2011, 00:29
by gloomyandy
For the talk we gave about leJOS at JavaOne a year or so ago, Roger created a demo that used "speech recognition". It wasn't as sophisticated as what Doc has planned but it worked pretty well and had a number of people fooled until we told them how it worked. A video of our test (and backup if we had problems on the day) is here:
http://www.youtube.com/watch?v=sjPzcmWSfQs
Some clips from the actual talk are here:
http://www.youtube.com/watch?v=fJD6vGHKLTQ
The voice demo starts about 4:30 into the clip.
Andy
Re: NXC: "Speech"-"Recognition"
Posted: 08 Feb 2011, 08:07
by HaWe
well, what was your algorithm like?
Mine is based on the sum of the squared deviations between loudness patterns (least squares), and it works quite well, as you may have observed. Note that the Lego Sound Sensor doesn't detect frequencies but just loudness oscillations (dBA) - nevertheless the recognition works (in a well-defined sub-population of rhythmically concise spoken words)!
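A least-squares comparison of this kind could look roughly like the following C sketch. It is not the actual NXC program: the function names, the flat row-major layout of the stored patterns, and the absence of any normalisation or time alignment are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared differences between a recording and one stored pattern. */
long squared_deviation(const int *rec, const int *pattern, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        long d = (long)rec[i] - (long)pattern[i];
        sum += d * d;
    }
    return sum;
}

/* Index of the stored pattern with the smallest squared deviation.
   patterns holds count patterns of n samples each, one after another. */
size_t best_match(const int *rec, const int *patterns, size_t count, size_t n)
{
    assert(count > 0);
    size_t best = 0;
    long best_err = squared_deviation(rec, patterns, n);
    for (size_t p = 1; p < count; ++p) {
        long err = squared_deviation(rec, patterns + p * n, n);
        if (err < best_err) {
            best_err = err;
            best = p;
        }
    }
    return best;
}
```

The recognised word is then simply the one whose stored loudness pattern has the smallest error against the new recording.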
But something like a FT or FFT seems to be even more promising. Any ideas for a FT or FFT with 10 (max 20) terms (coefficients, frequencies)...?
I'm not a programmer and not a mathematician, and I already googled a lot but didn't find something suitable.
Re: NXC: "Speech"-"Recognition"
Posted: 08 Feb 2011, 08:37
by gloomyandy
Hi Doc,
Sorry, I'm not sure how the algorithm worked. It was Roger's demo so I'll drop him some mail to find out....
Andy
Re: NXC: "Speech"-"Recognition"
Posted: 08 Feb 2011, 18:38
by HaWe
new version with oscillograph (revised version) :)
Re: NXC: "Speech"-"Recognition"
Posted: 08 Feb 2011, 19:31
by mightor
new version with oscillograph (revised version)
Are the graphs with or without a German accent?
This is pretty cool stuff.
- Xander
Re: NXC: "Speech"-"Recognition"
Posted: 08 Feb 2011, 19:35
by HaWe
accent?
what is "accent" ?
;)
Re: NXC: "Speech"-"Recognition"
Posted: 09 Feb 2011, 08:48
by HaWe
Hi,
what do you think: what would be the best way to transfer all those sound arrays as a file to the PC,
e.g. 10 samples of each of 6 spoken words = 60 arrays[400],
in order to process the data on an external computer (with Excel or an ANSI C++ program)?
I think a text file with a separation of all numbers by ";" would be ok.
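As a rough sketch of that file format, the C function below turns one sample array into a single line of numbers separated by ";". The name format_samples and the one-array-per-line layout are assumptions for illustration. It uses standard C on the PC side, so the NXT side would need the equivalent NXC file calls instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Writes n values into buf as "v0;v1;...;vN-1".
   Returns the text length, or -1 if buf is too small. */
int format_samples(char *buf, size_t size, const int *samples, size_t n)
{
    assert(buf != NULL && size > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        /* every value except the first gets a leading separator */
        int w = snprintf(buf + used, size - used,
                         i ? ";%d" : "%d", samples[i]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

With one array per line, the 60 recordings would give a 60-line text file that Excel can open as a semicolon-separated table.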