Monday, January 4, 2010

Sampled Audio Synthesis











Sampled audio is encoded as a series of samples in a byte array, which is sent through a SourceDataLine to the mixer. In previous examples, the contents of the byte array came from an audio file, though you saw that audio effects can manipulate and even add to the array. In sampled audio synthesis, the application generates the byte array data without requiring any audio input. Potentially, any sound can be generated at runtime.


Audio is a mix of sine waves, each one representing a tone or a note. A pure note is a single sine wave with a fixed amplitude and frequency (or pitch). Frequency can be defined as the number of sine waves that pass a given point in a second. The higher the frequency, the higher the note's pitch; the higher the amplitude, the louder the note.


Before I go further, it helps to introduce the usual naming scheme for notes; it's easier to talk about note names than note frequencies.



Note Names


Note names are derived from the piano keyboard, which has a mix of black and white keys, shown in Figure 10-1.


Keys are grouped into octaves, each octave consisting of 12 consecutive white and black keys. The white keys are labeled with the letters A to G and an octave number.



Figure 10-1. Part of the piano keyboard



For example, the note named C4 is the white key closest to the center of the keyboard, often referred to as middle C. The 4 means that the key is in the fourth octave, counting from the left of the keyboard.


A black key is labeled with the letter of the preceding white key and a sharp (#). For instance, the black key following C4 is known as C#4.


A note to musicians: for simplicity's sake, I'll be ignoring flats in this discussion.



Figure 10-2 shows the keyboard fragment of Figure 10-1 again but labeled with note names. I've assumed that the first white key is C4.



Figure 10-2. Piano keyboard with note names



Figure 10-2 utilizes the C Major scale, where the letters appear in the order C, D, E, F, G, A, and B.


There's a harmonic minor scale that starts at A, but I won't be using it in these examples.



After B4, the fifth octave begins, starting with C5 and repeating the same sequence as in the fourth octave. Before C4 is the third octave, which ends with B3.


Having introduced the names of these notes, it's possible to start talking about their associated frequencies or pitches. Table 10-1 gives the approximate frequencies for the C4 Major scale (the notes from C4 to B4).


Table 10-1. Frequencies for the C4 major scale

Note name    Frequency (in Hz)
C4           261.63
C#4          277.18
D4           293.66
D#4          311.13
E4           329.63
F4           349.23
F#4          369.99
G4           392.00
G#4          415.30
A4           440.00
A#4          466.16
B4           493.88



When I move to the next octave, the frequencies double for all the notes; for instance, C5 will be 523.26 Hz. The preceding octave contains frequencies that are halved, so C3 will be 130.82 Hz.
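This doubling per octave, combined with the 12 semitones in each octave, gives the standard equal-temperament formula for any note's frequency: multiply a reference pitch (A4 = 440 Hz) by 2 raised to the semitone distance divided by 12. Here's a minimal sketch; the class and method names are my own, not from NotesSynth:

```java
// Sketch: derive a note's frequency from its semitone distance to A4 (440 Hz).
// Each octave doubles the frequency and spans 12 semitones, so
// frequency = 440 * 2^(semitones/12).
public class NoteFreq
{
  public static double frequency(int semitonesFromA4)
  { return 440.0 * Math.pow(2.0, semitonesFromA4 / 12.0); }

  public static void main(String[] args)
  {
    System.out.println(frequency(0));    // A4:  440.0
    System.out.println(frequency(-9));   // C4:  ~261.63, matching Table 10-1
    System.out.println(frequency(12));   // A5:  880.0 (one octave up doubles)
  }
}
```

Note how the doubling rule falls out of the formula: adding 12 semitones multiplies the result by 2^(12/12) = 2.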


A table showing all piano note names and their frequencies can be found at http://www.phys.unsw.edu.au/~jw/notes.html. It includes the corresponding MIDI numbers, which I consider later in this chapter.




Playing a Note


A note can be played by generating its associated frequency and providing an amplitude for loudness. But how can this approach be implemented in terms of a byte array suitable for a SourceDataLine?


A pure note is a single sine wave, with a specified amplitude and frequency, and this sine wave can be represented by a series of samples stored in a byte array. The idea is shown in Figure 10-3.


This is a simple form of analog-to-digital conversion. So, how is the frequency converted into a given number of samples, i.e., how many samples should the sine wave contain?



Figure 10-3. From single note to samples



A SourceDataLine is set up to accept a specified audio format, which includes a sample rate. For example, a sample rate of 21,000 causes 21,000 samples to reach the mixer every second. The frequency of a note, e.g., 300 Hz, means that 300 copies of that note's sine wave must reach the mixer per second.


The number of samples required to represent a single note follows from these two rates:



samples/note = (samples/second) / (notes/sec)
samples/note = sample rate / frequency



For the previous example, a single note would need 21,000/300 = 70 samples. In other words, the sine wave must consist of 70 samples. This approach is implemented in sendNote( ) in the NotesSynth.java application, which is explained next.
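The samples/note calculation can be checked with a few lines of code; this is my own small helper, written just to illustrate the arithmetic above:

```java
// Sketch: samples needed for one cycle of a note's sine wave,
// following the samples/note = sample rate / frequency relation.
public class SamplesPerNote
{
  public static int samplesPerWave(int sampleRate, int freq)
  { return (int) Math.round((double) sampleRate / freq); }

  public static void main(String[] args)
  {
    System.out.println(samplesPerWave(21000, 300));  // 70, as in the text
    System.out.println(samplesPerWave(22050, 250));  // 88: NotesSynth's worst case
  }
}
```

The second call shows the largest wave NotesSynth will generate: its lowest note (250 Hz) at its 22,050 Hz sample rate needs 88 samples per cycle.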




Synthesizing Notes


NotesSynth generates simple sounds at runtime without playing a clip. The current version outputs an increasing pitch sequence, repeated nine times, each time increasing a bit faster and with decreasing volume.


NotesSynth.java is stored in SoundExamps/SynthSound/.



Here is the main( ) method:



public static void main(String[] args)
{ createOutput( );
  play( );
  System.exit(0);   // necessary for J2SE 1.4.2 or earlier
}



createOutput( ) opens a SourceDataLine that accepts stereo, signed PCM audio, utilizing 16 bits per sample in little-endian format. Consequently, 4 bytes must be used for each frame (2 bytes per 16-bit sample, times 2 channels):



// globals
private static int SAMPLE_RATE = 22050;   // no. of samples/sec

private static AudioFormat format = null;
private static SourceDataLine line = null;


private static void createOutput( )
{
  format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                        SAMPLE_RATE, 16, 2, 4, SAMPLE_RATE, false);
  /* SAMPLE_RATE   // samples/sec
     16            // sample size in bits; values range over -2^15 to 2^15-1
     2             // no. of channels, stereo here
     4             // frame size in bytes (2 bytes/sample * 2 channels)
     SAMPLE_RATE   // same as frames/sec
     false         // little-endian */

  System.out.println("Audio format: " + format);

  try {
    DataLine.Info info =
          new DataLine.Info(SourceDataLine.class, format);
    if (!AudioSystem.isLineSupported(info)) {
      System.out.println("Line does not support: " + format);
      System.exit(0);
    }
    line = (SourceDataLine) AudioSystem.getLine(info);
    line.open(format);
  }
  catch (Exception e) {
    System.out.println(e.getMessage( ));
    System.exit(0);
  }
}  // end of createOutput( )



play( ) creates a buffer large enough for the samples, plays the pitch sequence using sendNote( ), and then closes the line:



private static void play( )
{
  // calculate a size for the byte buffer holding a note
  int maxSize = (int) Math.round(
                  (SAMPLE_RATE * format.getFrameSize( ))/MIN_FREQ);
              // the frame size is 4 bytes
  byte[] samples = new byte[maxSize];

  line.start( );

  /* Generate an increasing pitch sequence, repeated 9 times, each
     time increasing a bit faster, and with the volume decreasing */
  double volume;
  for (int step = 1; step < 10; step++)
    for (int freq = MIN_FREQ; freq < MAX_FREQ; freq += step) {
      volume = 1.0 - (step/10.0);
      sendNote(freq, volume, samples);
    }

  // wait until all data is played, then close the line
  line.drain( );
  line.stop( );
  line.close( );
}  // end of play( )



maxSize must be big enough to store the largest number of samples for a generated note, which occurs when the note frequency is the smallest. Therefore, the MIN_FREQ value (250 Hz) is divided into SAMPLE_RATE.



Creating samples

sendNote( ) translates a frequency and amplitude into a series of samples representing that note's sine wave. The samples are stored in a byte array and sent along the SourceDataLine to the mixer:



// globals
private static double MAX_AMPLITUDE = 32760;  // max loudness
      // actual max is 2^15-1 (32767), since I'm using
      // PCM signed 16-bit

// frequency (pitch) range for the notes
private static int MIN_FREQ = 250;
private static int MAX_FREQ = 2000;
// Middle C (C4) has a frequency of 261.63 Hz; see Table 10-1


private static void sendNote(int freq, double volLevel, byte[] samples)
{
  if ((volLevel < 0.0) || (volLevel > 1.0)) {
    System.out.println("Volume level should be between 0 and 1; using 0.9");
    volLevel = 0.9;
  }
  double amplitude = volLevel * MAX_AMPLITUDE;

  int numSamplesInWave = (int) Math.round(((double) SAMPLE_RATE)/freq);
  int idx = 0;
  for (int i = 0; i < numSamplesInWave; i++) {
    double sine = Math.sin(((double) i/numSamplesInWave) * 2.0 * Math.PI);
    int sample = (int) (sine * amplitude);
    // left sample of stereo
    samples[idx + 0] = (byte) (sample & 0xFF);         // low byte
    samples[idx + 1] = (byte) ((sample >> 8) & 0xFF);  // high byte
    // right sample of stereo (identical to left)
    samples[idx + 2] = (byte) (sample & 0xFF);
    samples[idx + 3] = (byte) ((sample >> 8) & 0xFF);
    idx += 4;
  }

  // send out the samples (the single note)
  int offset = 0;
  while (offset < idx)
    offset += line.write(samples, offset, idx - offset);
}  // end of sendNote( )



numSamplesInWave is obtained by using the calculation described above, which is to divide the note frequency into the sample rate.


A sine wave value is obtained with Math.sin( ) and split into two bytes since 16-bit samples are being used. The little-endian format determines that the low-order byte is stored first, followed by the high-order one. Stereo means that I must supply two bytes for the left speaker, and two for the right; in my case, the data are the same for both.
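The byte splitting can be seen in isolation with a small round-trip sketch of my own (the class and method names are not part of NotesSynth); it packs a signed 16-bit sample into two little-endian bytes and reassembles it, as sendNote( ) does per channel:

```java
// Sketch: a signed 16-bit sample split into two little-endian bytes,
// then recombined. Matches the masking and shifting in sendNote( ).
public class SamplePacking
{
  public static byte[] pack(int sample)
  { return new byte[] { (byte) (sample & 0xFF),           // low byte first
                        (byte) ((sample >> 8) & 0xFF) };  // then high byte
  }

  public static int unpack(byte[] b)
  { // b[1] sign-extends on promotion, restoring negative samples;
    // b[0] is masked so it contributes only its unsigned low 8 bits
    return (b[1] << 8) | (b[0] & 0xFF);
  }

  public static void main(String[] args)
  {
    System.out.println(unpack(pack(32760)));    // 32760
    System.out.println(unpack(pack(-32760)));   // -32760
  }
}
```

The mask in unpack( ) matters: without `& 0xFF`, a low byte of 0x80 or above would sign-extend and corrupt the high byte.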




Extending NotesSynth

A nice addition to NotesSynth would be to allow the user to specify notes with note names (e.g., C4, F#6) and translate them into frequencies before calling sendNote( ). Additionally, play( ) is hardwired to output the same tones every time it's executed. It would be easy to have it read a notes file, perhaps written using note names, to play different tunes.
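One way the note-name translation might look is sketched below. This is a hypothetical helper, not part of NotesSynth; the class name, the method name noteToFreq( ), and the parsing rules (a letter, an optional #, then an octave digit) are my own assumptions. It uses A4 = 440 Hz as the reference pitch:

```java
// Hypothetical sketch: translate a note name such as "C4" or "F#6"
// into its frequency, relative to A4 = 440 Hz.
public class NoteNames
{
  private static final String[] NAMES =
    { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

  public static double noteToFreq(String name)
  {
    int sharp = (name.indexOf('#') >= 0) ? 1 : 0;
    String letter = name.substring(0, 1 + sharp);
    int octave = Integer.parseInt(name.substring(1 + sharp));

    int idx = -1;           // position of the letter within the octave
    for (int i = 0; i < NAMES.length; i++)
      if (NAMES[i].equals(letter))
        idx = i;

    // semitone distance from A4 (A is the 9th semitone of octave 4)
    int semis = (octave - 4)*12 + idx - 9;
    return 440.0 * Math.pow(2.0, semis/12.0);
  }

  public static void main(String[] args)
  { System.out.println(noteToFreq("C4"));    // ~261.63, matching Table 10-1
    System.out.println(noteToFreq("F#6"));   // ~1479.98
  }
}
```

The result would be rounded to an int before being passed to sendNote( ), since sendNote( ) takes an integer frequency.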


Another important missing element is timing. Each note is played immediately after the previous note. It would be better to permit periods of silence as well.
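Silence is simpler to generate than it might appear: in signed PCM, a rest is just a run of zero-valued samples. Here's a minimal sketch of that idea; silenceBytes( ) is my own name, and the 4 bytes/frame assumption matches the NotesSynth audio format (16-bit stereo):

```java
// Hypothetical sketch for rests: build a zero-filled byte array
// covering the requested duration. In signed PCM, all-zero bytes
// are digital silence.
public class Silence
{
  public static byte[] silenceBytes(int sampleRate, double seconds)
  {
    int numFrames = (int) Math.round(sampleRate * seconds);
    return new byte[numFrames * 4];   // 4 bytes/frame; Java arrays start zero-filled
  }
}
```

The resulting array would be sent with line.write( ), using the same offset loop as in sendNote( ).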


Consider these challenges rather than deficiencies; it would be easy to add this functionality to NotesSynth.











