Spect version 1.0
AUG 93
by Kevin C. Fandre
***************
About:
Spect is a sampling and spectral analysis program written for the Gravis
Ultrasound. It utilizes i386 32 bit instructions for speed but since no
protected mode instructions are used it should run in 8086 virtual mode, ie it
won't interfere with memory managers and other such protected mode programs.
It will not, however, run under windows but who needs to since this program
was written for speed, speed, speed and that precludes windows.
So, at minimum, you must have a 386 or better computer, a VGA compatible
video card, and a Gravis Ultrasound. A math coprocessor is NOT required
and the presence of one will not affect the speed of the program since spect
uses only integer instructions.
***************
Disclaimer:
I am not responsible for any damages whatsoever whether it be eye strain,
migraines, tidal waves, heart failure, or otherwise caused by prolonged
viewing of this program. Loss of data, hardware failures, stock crashes,
and everything else caused by this program are solely the responsibility of
the user. (I always wanted to write one of those.)
***************
Public Domain:
Use of this program is free and I will not try to make you feel guilty for
using it and not paying for it. It may be distributed freely for non-profit
only and I reserve all rights to this program. However, if you use this
program very frequently, use it for *non-private* purposes, or would like to
spur me on to further develop this program and others based on FFT analysis,
you may show your interest by sending a donation commensurate with your level
of usage of this program to:
Kevin C. Fandre
1726 Leonard Dr.
Jacksonville, AR 72706
Such donations are strictly non-obligatory but I would strongly encourage
those who use this program for professional purposes, such as in education
or as a demo in a commercial store for instance, to send in a donation.
Donations in the range of ten to fifteen dollars per copy are quite
acceptable. Please include any comments or suggestions you might have and,
most especially, please describe to what use you are putting this program.
This will help me to better gauge your needs and improve upon this program.
In the future I was planning on adding:
-Window smoothing to clear up spectrums and improve definition
-Adding various kinds of digital filters
-Support for other sound cards or a home built ADC kit
-Support for Gravis' own 16bit daughter card
-Adding mouse and window support and other user niceties
-Adding oscilliscopes
-Adding wavelet analysis
-?(Whatever you want)
***************
Comments:
Please tell me what you think! I would love to hear from you. Send your
comments to either the above address or my email address:
good till May of 94
***************
To test:
Spect is configurable from the command line but for a quick test try
running spect without any command line arguments. This will set up
spect to sample from both the line in and microphone channels in stereo
at 8000 Hz. These parameters should give adequate results and fast screen
updates even on a slow computer but aliasing caused by too low a sampling
rate will cause artifacts(unwanted lines) to appear if there is an abundance
of frequencies higher than 4000 Hz. This means that if you are analyzing
rock music for instance, cymbal crashes will look like static superimposed
on the lower frequencies. The only way to avoid this is to switch to a
higher sampling rate( up to 44100 Hz on the Ultrasound) but this has its
negative side-effects also. But more on this later. First I will briefly
describe what the Fast Fourier Transform(FFT) does and how it does it and
this will give you a better understanding of how to configure spect to give
you optimum performance.
***************
The FFT:
The FFT is based on the Discrete Fourier Transform which is a discrete
extension of the Fourier integral, an integral which can be interpreted as
the continuous representation of the spectral components of any given
function. The Fourier integral itself derives from the Fourier series which
is capable of describing any function over an interval in terms of a
*unique* infinite summation of sines and cosines whose periods are integer
multiples of each other. Well, that was a mouthful but basically it is
possible to describe a function, and sampled data over a range constitutes
a function, as a summation of sine waves. The mathematics of all this was
worked out by the mega-geniuses of the past(Fourier,Laplace, and Euler)
and boils down to this:
1 N-1 jk
X(j) = - SUM x(k)W
N k=0
where,
N = number of points
W = exp(2*pi*i)/N
X(j) - N complex transformed points
x(k) - N complex samples
The points which are graphed on the screen are merely |X(j)| which come
from the sample points x(k). Since we are dealing with real data the
complex parts of the samples are just set to zero.
Now, as you can see the number of operations that must be gone through
is proportional to N^2, N summations by N multiply-adds. Up until 1963
this is the way spectrums were done until Cooley-Tukey came up with a faster
alternative that better utilizes intermediate results. Their original
paper described a method which worked with any number of samples so long
as N was not prime and came to be know as the Fast Fourier Transform. A
special case arises when the number of samples is a power of 2(or a power of
5 or any number for that matter). Spect uses just such an algorithm
and runs in a time proportional to N*log(N). The time savings over
N^2 is enormous.
Its difficult to describe in words how the FFT algorithm utilizes
intermediate results to reduce computation time. Basically, it amounts
to splitting up the set of samples into two odd and even sets, applying
the summation detailed above except that exponents are all multiplied by two.
With the resulting two sets any desired X(j) can be computed by adding an
element from the even results with an appropriate element from the set of odd
results multiplied by W raised to a fractional power depending on which X(j)
is being computed. The neat part is that this procedure can be applied
recursively up to log2(N) times if N is a power of 2 until we produce N sets
of one element each which are nothing more than the sample data themselves.
Then the recursion stops and we can work our way backwards using the simple
combining rules to produce an X(j) using only big O log2(n) operations to
do so. Over N X(j), this is proportional to N*log2(n) total compute time.
The algorithm used by Spect I owe to who knows who but it dispenses with
the stack pushing overhead involved in recursion and utilizes the patterns
evident in doing all the substitutions of the combining rules into themselves
that was affected by the recursion anyway. The only draw back is that the
results are in a scrambled order but a simple table lookup alleviates the
headache of unscrambling them. The code was written using 32bit integer
instructions, utilizes table lookups wherever possible, and squeezes every
last CPU cycle out of the i386.
**************
Command line Arguments:
-? or -h : list options
-l :turn on the UltraSound line in
-q :put in quite mode, turn of output
-m :enable microphone input
-L :turn on the logarithmic scale
-f#:allows you to specify the frequency
-s :tell spect to sample stereo data
Currently file playing is not supported but will be in future versions.
Typing a filename will work though, but there will be no sound, only
graphics. It was there for diagnostic purposes and I decided to leave it.
**************
How to Use:
From within the program, typing F1 will print the help screen.
Here's an example of typical command line parameters:
spect -f16000 -lsL
This tells spect to use a logarithmic scale and sample in stereo from the
line in at 16000 Hz. If you dont use either the l switch or the m switch you
will be recording dead silence so you must use at minimum one of these
switches. The frequency defaults to 8000Hz but can be specified as desired.
Now press the "+" key once. Notice that this screen is identical
to the screen that results when spect is run without any command line
arguments only that is will be a little bit cleaner because aliasing effects
are reduced. Also you can press the PgUp key to see the other half of the
transformed samples since pressing the "+" key doubles the number of
samples to 512.
Doing an FFT transform on real data produces only N/2 relevant points. The
other points are just a mirror image and so are discarded by spect. The
number of bars displayed by spect is 128 so the minimum number of samples
required to produce 128 significant transform points is 256. This is the
minimum. Spect is capable of utilizing up to 2048 points but at the cost
of execution speed. However, what you do get is increased resolution.
It is possible to dynamically increase and decrease the number of samples
computed from within the program by pressing the + key to increase the
number of samples and the - key to decrease the number of samples.
Fortunately, increasing the number of samples to 2048 dosent slow the
computer to a crawl for the reasons described above. A fast 486 helps
when using a lot of samples and you will need them when sampling at higher
frequencies unless you are willing to sacrifice a lot of detail.
Since the display is limited to 128 bars, when zoomed in it is possible to
shift the display left and right one bar at a time by using the cursor keys
and 128 bars at a time by using the PgUp and PgDwn keys. The frequencies
in hertz listed along the bottom of the screen will automatically adjust to
reflect the new range represented by the group of 128 bars currently on screen.
A logarithmic scale is provided to more closely represent the way the ear
perceives sound(decibels are a logarithmic scale afterall). The logarithmic
scale tends to more clearly discriminate fainter signals and deemphasizes the
differences between stronger signals. You can select this scale option
from the command line with the -L(case matters) switch. From within the
program it possible to switch between linear and logarithmic scales with
the "/" key.
To determine the spectral resolution of the transform just divide the
sampling rate by the total number of samples. Thus for the default 8000Hz
the spectral resolution under a 256 point transform is 31.25Hz, meaning that
each bar is 31.25 Hz greater than the next. You can halve this number by
pressing the "+" key and doubling the number of samples. The maximum
observable frequency is always one half of the sampling rate as per the
Nyquist frequency law.
When you experiment with Spect try to pick out the harmonics in various
signals. Guitar solos really stand out and voices look really beautiful.
After starring at the patterns that voices produce you can get a glimmer
of how one might go about writing a voice recognition program since the
patterns are fairly regular. If you cant see the patterns too clearly
then try reducing the sampling frequency. Most of the action, in music at
least, occurs below 4000Hz. The higher harmonics are there of course but
they are fainter and not as interesting to look at and can be done away with
by choosing a lower sampling rate. Only, take care that the higher harmonics
are not too strong or they will fold over onto the lower frequencies and give
false lines if the sampling rate is too low. If you have more horsepower at
your disposal try doubling the sampling rate and doubling the zoom. You will
get the same screen; it will be a little slower though, but much cleaner.
Also, make sure the volume of your input source is set high enough since the
Ultrasound's ADC is not all that sensitive.
Keep in mind that because the Ultrasound only records in 8 bits and lacks
significant dynamic range it will take some fine control of the volume control
to get enough volume to make all of the harmonics show up while at the same
time prevent clipping. You don't want clipping since that will transform
into false lines that aren't in your source. A little won't hurt too much
though...
If you have access to a signal generator and test it out on spect don't be
surprised if you don't get nice sharp bars for every frequency tried. Since
the DFT works on a limited number of points inaccuracies are bound to
show up in the form of broadening. You can see what I mean by just whistling
into a microphone. You will see a nice sharp peak that broadens as it nears
the bottom of the screen. This is due to the inherent inaccuracies involved
in representing a continuous signal by a finite number of discrete points.
Finding the best combination of speed and accuracy by choosing the frequency
and number of samples is a matter of trial and error. In general having
a fast computer will allow you greater flexibility in choosing higher
sampling rates and larger samples while still maintaining decent refresh
speeds. A really fast computer is not essential though. I developed this
program on a 386-20 and it runs pretty fast as far as I can tell(9 frames per
second with a 256 point transform!). A fast 486 should run over three times as
fast I imagine and that will provide near realtime performance with a 256 point
transform. True realtime performance is not possible since it takes a finite
amount of time to record a block of data but with delays in the hundredths of
seconds it should be virtually instantaneous to the eye. Experiment and have
fun!
Oh, one last thing. The graphics look great in the dark with the stereo on.