Before the Color, the Game Boy Pocket and Game Boy Light were released, but they weren’t hugely different from the DMG; they were minor revisions rather than new handhelds. Despite the Color's release, the Game Boy DMG (Dot Matrix Game) continued to be sold until 2003, after the release of the Game Boy Advance.
This documents the behavior of Game Boy sound; details which aren't relevant to the observable behavior have been omitted unless they clarify understanding. It is aimed at answering all questions about exact operation, rather than describing how to use sound effectively in Game Boy programs. Values in hexadecimal (base 16) are generally written with a $ prefix. Bits are numbered from 0 to 7, where bit N has a weight of 2^N. A nibble is 4 bits, half a byte. Obscure behavior is described separately to increase clarity elsewhere.
The Game Boy has four sound channels: two square waves with adjustable duty, a programmable wave table, and a noise generator. Each has some kind of frequency (pitch) control. The first square channel also has an automatic frequency sweep unit to help with sound effects. The squares and noise each have a volume envelope unit to help with fading notes and sound effects, while the wave channel has only limited manual volume control. Each channel has a length counter that can silence the channel after a preset time, to handle note durations. Each channel can be individually panned to the far left, center, or far right. The master volume of the left and right outputs can also be adjusted.
Different versions of the Game Boy sound hardware have slightly different behavior. The following models have been tested:
Sound registers are mapped to $FF10-$FF3F in memory. Each channel has five logical registers, NRx0-NRx4, though some don't use NRx0. The value written to bits marked with '-' has no effect. Reference to the value in a register means the last value written to it.
Each channel has a frequency timer which clocks a waveform generator. The waveform's volume is adjusted and fed to the mixer. The mixer converts each channel's waveform into an electrical signal and outputs this to the left and/or right channels. Finally, a master volume control adjusts the left and right outputs. The channels have the following units that are connected from left to right:
The mixer has a separate DAC for each channel, followed by on/off controls for left and right outputs. The left/right outputs from each channel are then added together and fed to the left/right master volume controls.
In general, all units in the channels are always running. For example, even if a channel is silent, several units will still be calculating values even though they aren't used.
A timer generates an output clock every N input clocks, where N is the timer's period. If a timer's rate is given as a frequency, its period is 4194304/frequency in Hz. Each timer has an internal counter that is decremented on each input clock. When the counter becomes zero, it is reloaded with the period and an output clock is generated.
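The timer described above can be sketched as follows. This is a minimal illustration rather than a hardware-exact model; the names Timer, period_from_hz and CLOCK_HZ are made up for this example, and a period of zero is not handled.

```python
CLOCK_HZ = 4194304  # input clock rate used in the text

def period_from_hz(rate_hz):
    """Timer period in input clocks for a timer rate given in Hz."""
    return CLOCK_HZ // rate_hz

class Timer:
    def __init__(self, period):
        self.period = period
        self.counter = period  # internal counter, decremented on each input clock

    def clock(self):
        """Apply one input clock; return True if an output clock is generated."""
        self.counter -= 1
        if self.counter == 0:
            self.counter = self.period  # reload with the period
            return True
        return False

# A 512 Hz timer has a period of 8192 input clocks.
frame_timer = Timer(period_from_hz(512))
output_clocks = sum(frame_timer.clock() for _ in range(4 * 8192))
```

After 4 * 8192 input clocks, output_clocks holds the number of output clocks generated, one per period.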
The frame sequencer generates low frequency clocks for the modulation units. It is clocked by a 512 Hz timer.
A length counter disables a channel when it decrements to zero. It contains an internal counter and enabled flag. Writing a byte to NRx1 loads the counter with 64-data (256-data for wave channel). The counter can be reloaded at any time.
A channel is said to be disabled when the internal enabled flag is clear. When a channel is disabled, its volume unit receives 0, otherwise its volume unit receives the output of the waveform generator. Other units besides the length counter can enable/disable the channel as well.
Each length counter is clocked at 256 Hz by the frame sequencer. When clocked while enabled by NRx4 and the counter is not zero, it is decremented. If it becomes zero, the channel is disabled.
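A length counter following the description above might look like the sketch below. The class and attribute names are hypothetical, length_data stands for the length value written through NRx1 (which bits of the register hold it is not covered here), and the NRx4 trigger and obscure behavior are left out.

```python
class LengthCounter:
    def __init__(self, max_length=64):
        self.max_length = max_length   # 64, or 256 for the wave channel
        self.counter = 0               # internal counter
        self.length_enabled = False    # length enable, controlled via NRx4
        self.channel_enabled = True    # stand-in for the channel's enabled flag

    def write_length(self, length_data):
        """NRx1 write: load the counter with max_length - data."""
        self.counter = self.max_length - length_data

    def clock(self):
        """256 Hz clock from the frame sequencer."""
        if self.length_enabled and self.counter != 0:
            self.counter -= 1
            if self.counter == 0:
                self.channel_enabled = False  # reaching zero disables the channel

lc = LengthCounter(64)
lc.write_length(62)        # counter = 64 - 62 = 2
lc.length_enabled = True
lc.clock()                 # counter = 1, channel still enabled
lc.clock()                 # counter = 0, channel disabled
```

At 256 Hz, a counter loaded with N silences the channel after N/256 seconds, so the longest duration is 0.25 seconds for 64 and 1 second for 256.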
A volume envelope has a volume counter and an internal timer clocked at 64 Hz by the frame sequencer. When the timer generates a clock and the envelope period is not zero, a new volume is calculated by adding or subtracting (as set by NRx2) one from the current volume. If this new volume is within the 0 to 15 range, the volume is updated; otherwise it is left unchanged and no further automatic increments/decrements are made to the volume until the channel is triggered again.
When the waveform input is zero the envelope outputs zero, otherwise it outputs the current volume.
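The envelope rules above can be sketched as below. The names are hypothetical, the constructor stands in for a trigger, and two simplifications are assumed: the internal timer is not modeled at all when the period is zero, and the NRx2 write effects are ignored.

```python
class VolumeEnvelope:
    def __init__(self, initial_volume, add_mode, period):
        self.volume = initial_volume  # 0..15
        self.add_mode = add_mode      # True: add one, False: subtract one
        self.period = period          # envelope period from NRx2
        self.timer = period           # internal timer
        self.active = True            # cleared once the volume would leave 0..15

    def clock(self):
        """64 Hz clock from the frame sequencer."""
        if self.period == 0:
            return                    # simplification: no updates with period 0
        self.timer -= 1
        if self.timer > 0:
            return
        self.timer = self.period      # timer generated a clock; reload it
        if not self.active:
            return
        new_volume = self.volume + (1 if self.add_mode else -1)
        if 0 <= new_volume <= 15:
            self.volume = new_volume
        else:
            self.active = False       # no further automatic updates

    def output(self, waveform_input):
        """Zero when the waveform input is zero, otherwise the current volume."""
        return self.volume if waveform_input else 0

env = VolumeEnvelope(initial_volume=14, add_mode=True, period=1)
env.clock()   # volume becomes 15
env.clock()   # 16 is out of range: volume stays 15, updates stop
```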
Writing to NRx2 causes obscure effects on the volume that differ on different Game Boy models (see obscure behavior).
A square channel's frequency timer period is set to (2048-frequency)*4. Four duty cycles are available, each waveform taking 8 frequency timer clocks to cycle through:
The first square channel has a frequency sweep unit, controlled by NR10. This has a timer, internal enabled flag, and frequency shadow register. It can periodically adjust square 1's frequency up or down.
During a trigger event, several things occur:
Frequency calculation consists of taking the value in the frequency shadow register, shifting it right by sweep shift, optionally negating the value, and summing this with the frequency shadow register to produce a new frequency. What is done with this new frequency depends on the context.
The overflow check simply calculates the new frequency and if this is greater than 2047, square 1 is disabled.
The sweep timer is clocked at 128 Hz by the frame sequencer. When it generates a clock and the sweep's internal enabled flag is set and the sweep period is not zero, a new frequency is calculated and the overflow check is performed. If the new frequency is 2047 or less and the sweep shift is not zero, this new frequency is written back to the shadow frequency and square 1's frequency in NR13 and NR14, then frequency calculation and overflow check are run AGAIN immediately using this new value, but this second new frequency is not written back.
Square 1's frequency can be modified via NR13 and NR14 while sweep is active, but the shadow frequency won't be affected so the next time the sweep updates the channel's frequency this modification will be lost.
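The sweep calculation, overflow check, and write-back can be sketched like this. The sketch assumes the sweep timer is handled elsewhere and calls timer_clock() whenever it generates a clock; the class and its attribute names are hypothetical, the constructor state is an assumption standing in for a trigger, and obscure behavior is not modeled.

```python
def sweep_calculate(shadow, shift, negate):
    """New frequency: shadow plus (or minus) shadow shifted right by the sweep shift."""
    delta = shadow >> shift
    return shadow - delta if negate else shadow + delta

class Sweep:
    def __init__(self, frequency, shift, negate, period):
        self.shadow = frequency       # frequency shadow register
        self.frequency = frequency    # stands for the value in NR13/NR14
        self.shift = shift
        self.negate = negate
        self.period = period
        self.enabled = True           # sweep's internal enabled flag
        self.channel_enabled = True   # square 1's enabled flag

    def write_frequency(self, value):
        """NR13/NR14 write: changes the channel frequency, not the shadow."""
        self.frequency = value

    def timer_clock(self):
        """Called when the 128 Hz sweep timer generates a clock."""
        if not self.enabled or self.period == 0:
            return
        new = sweep_calculate(self.shadow, self.shift, self.negate)
        if new > 2047:                # overflow check
            self.channel_enabled = False
            return
        if self.shift != 0:
            self.shadow = new         # write back to the shadow register
            self.frequency = new      # and to NR13/NR14
            # Second calculation and overflow check; result is not written back.
            if sweep_calculate(self.shadow, self.shift, self.negate) > 2047:
                self.channel_enabled = False

s = Sweep(frequency=1024, shift=2, negate=False, period=1)
s.timer_clock()          # 1024 + 256 = 1280 written back
s.write_frequency(100)   # manual change; shadow is still 1280
s.timer_clock()          # 1280 + 320 = 1600 overwrites the manual value
```

The second timer_clock() call shows the manual NR13/NR14 modification being lost, since the update is computed from the shadow register.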
The noise channel's frequency timer period is set by a base divisor shifted left some number of bits.
The linear feedback shift register (LFSR) generates a pseudo-random bit sequence. It has a 15-bit shift register with feedback. When clocked by the frequency timer, the low two bits (0 and 1) are XORed, all bits are shifted right by one, and the result of the XOR is put into the now-empty high bit. If width mode is 1 (NR43), the XOR result is ALSO put into bit 6 AFTER the shift, resulting in a 7-bit LFSR. The waveform output is bit 0 of the LFSR, INVERTED.
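One LFSR step as described above can be written as a small function. The function names are made up for this sketch, and the starting value $7FFF used in the example is an assumption, since the initial state is not given in this paragraph.

```python
def lfsr_step(lfsr, width_mode):
    """Advance the 15-bit LFSR by one frequency timer clock."""
    xor = (lfsr ^ (lfsr >> 1)) & 1        # XOR of bits 0 and 1
    lfsr = (lfsr >> 1) | (xor << 14)      # shift right, XOR result into bit 14
    if width_mode:
        # Width mode 1: the XOR result also replaces bit 6 after the shift.
        lfsr = (lfsr & ~(1 << 6)) | (xor << 6)
    return lfsr

def lfsr_output(lfsr):
    """Waveform output: bit 0 of the LFSR, inverted."""
    return (~lfsr) & 1

state = 0x7FFF                            # assumed starting state
bits = []
for _ in range(8):
    state = lfsr_step(state, width_mode=False)
    bits.append(lfsr_output(state))
```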
The wave channel plays a 32-entry wave table made up of 4-bit samples. Each byte encodes two samples, the first in the high bits. The wave channel has a sample buffer and position counter.
The wave channel's frequency timer period is set to (2048-frequency)*2. When the timer generates a clock, the position counter is advanced one sample in the wave table, looping back to the beginning when it goes past the end, then a sample is read into the sample buffer from this NEW position.
The DAC receives the current value from the upper/lower nibble of the sample buffer, shifted right by the volume control.
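The position counter and sample buffer can be sketched as follows. The names are hypothetical, the sample buffer is assumed to hold the whole byte containing the current sample, and volume_shift is passed directly as a shift amount because the mapping from the volume control register to a shift is not given here.

```python
def wave_timer_period(frequency):
    """Frequency timer period of the wave channel, in input clocks."""
    return (2048 - frequency) * 2

class WaveChannel:
    def __init__(self, wave_ram):
        self.wave_ram = wave_ram   # 16 bytes, two 4-bit samples per byte
        self.position = 0          # position counter, 0..31
        self.sample_buffer = 0

    def timer_clock(self):
        """Advance one sample, then read the byte at the NEW position."""
        self.position = (self.position + 1) % 32
        self.sample_buffer = self.wave_ram[self.position // 2]

    def dac_input(self, volume_shift):
        """Current nibble of the sample buffer, shifted right by the volume control."""
        if self.position % 2 == 0:
            nibble = self.sample_buffer >> 4     # first sample: high bits
        else:
            nibble = self.sample_buffer & 0x0F   # second sample: low bits
        return nibble >> volume_shift

ch = WaveChannel(bytes.fromhex("0123456789abcdeffedcba9876543210"))
ch.timer_clock()                 # position 1: low nibble of $01
first = ch.dac_input(0)
ch.timer_clock()                 # position 2: high nibble of $23
second = ch.dac_input(1)
```

Because the position is advanced before the read, the first sample fetched after starting from position 0 is sample 1, not sample 0.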
Wave RAM can only be properly accessed when the channel is disabled (see obscure behavior).
Writing a value to NRx4 with bit 7 set causes the following things to occur:
Note that if the channel's DAC is off, after the above actions occur the channel will be immediately disabled again.
Each channel has a 4-bit digital-to-analog converter (DAC). This converts the input value to a proportional output voltage. An input of 0 generates -1.0 and an input of 15 generates +1.0, using arbitrary voltage units.
DAC power is controlled by the upper 5 bits of NRx2 (top bit of NR30 for wave channel). If these bits are not all clear, the DAC is on, otherwise it's off and outputs 0 volts. Also, any time the DAC is off the channel is kept disabled (but turning the DAC back on does NOT enable the channel).
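The DAC mapping and power rule can be expressed as a short sketch. The function names are hypothetical, and a linear mapping between the two stated end points is assumed.

```python
def dac_output(value, dac_on=True):
    """Map a 4-bit input (0..15) linearly onto -1.0..+1.0; 0 volts when off."""
    if not dac_on:
        return 0.0
    return value / 7.5 - 1.0

def dac_on_from_nrx2(nrx2):
    """Square/noise channels: DAC is on unless the upper 5 bits of NRx2 are all clear."""
    return (nrx2 & 0xF8) != 0

def dac_on_from_nr30(nr30):
    """Wave channel: DAC power is the top bit of NR30."""
    return (nr30 & 0x80) != 0
```

For example, writing $08 to NRx2 leaves the DAC on with an initial volume of zero, while $07 turns it off.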
Each channel's DAC output goes to a pair of on/off switches for the left and right channels before they are sent to the left/right mixers. A mixer simply adds the voltages from each channel together. These left/right switches are controlled by NR51. When a switch is off, the mixer receives 0 volts.
The Vin bits of NR50 control mixing of the Vin signal from the cartridge, allowing extra sound hardware.
The mixed left/right signals go to the left/right master volume controls. These multiply the signal by (volume+1). The volume step relative to the channel DAC is such that a single channel enabled via NR51 playing at a volume of 2 with a master volume of 7 is about as loud as that channel playing at a volume of 15 with a master volume of 0, since 2*(7+1) = 16 is close to 15*(0+1) = 15.
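Mixing and master volume for one output side can be sketched as below. The names are hypothetical, Vin is left out, and the DAC is assumed to map 0..15 linearly onto -1.0..+1.0.

```python
def dac_output(value):
    """Assumed linear DAC: 0 -> -1.0, 15 -> +1.0."""
    return value / 7.5 - 1.0

def mix_side(dac_voltages, switches, master_volume):
    """Sum the channels whose NR51 switch is on, then scale by (volume + 1)."""
    total = sum(v for v, on in zip(dac_voltages, switches) if on)
    return total * (master_volume + 1)

def peak_to_peak(channel_volume, master_volume):
    """Swing of one channel alternating between 0 and channel_volume."""
    swing = dac_output(channel_volume) - dac_output(0)
    return swing * (master_volume + 1)

quiet_channel_loud_master = peak_to_peak(2, 7)    # about 2.13
loud_channel_quiet_master = peak_to_peak(15, 0)   # 2.0
```

The two results differ by a factor of 16/15, which matches the "about as loud" comparison in the text.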
NR52 controls power to the sound hardware. When powered off, all registers (NR10-NR51) are instantly written with zero and any writes to those registers are ignored while power remains off (except on the DMG, where length counters are unaffected by power and can still be written while off). When powered on, the frame sequencer is reset so that the next step will be 0, the square duty units are reset to the first step of the waveform, and the wave channel's sample buffer is reset to 0.
Power state does not affect wave memory, which can always be read/written. It also does not affect the 512 Hz timer that feeds the frame sequencer.
When the Game Boy is switched on (before the internal boot ROM executes), the values in the wave table depend on the model. On the DMG, they are somewhat random, though the particular pattern is generally the same for each individual Game Boy unit. The game R-Type doesn't initialize wave RAM and thus relies on these. One set of values is
On the Game Boy Color, the values are consistently
Reading NR52 yields the current power status and each channel's active status, which is set by an NRx4 write and cleared by the length counter, the square 1 sweep unit, or a write that disables that channel's DAC.
Wave RAM reads back as the last value written.
When an NRxx register is read back, the last written value ORed with the following is returned:
That is, the channel length counters, frequencies, and unused bits always read back as set to all 1s.
The cartridge connector includes a sound input called Vin. When enabled via NR50, it is mixed in before the master volume controls. On the DMG and MGB, 0.847 volts is equivalent to 0 on a channel DAC and 3.710 volts is equivalent to 15, with other values linearly distributed between those voltages. On the CGB, the range is 1.920 volts to 2.740 volts, roughly a quarter of the DMG range, so sound fed to the CGB's Vin is significantly louder. When nothing is connected to Vin, it naturally floats at the middle voltage (silence).
Other models behave differently, especially the DMG units which have crazy behavior in some cases. The only useful consistent behavior is using add mode with a period of zero in order to increment the volume by 1. That is, write $V8 to NRx2 to set the initial volume to V before triggering the channel, then write $08 to NRx2 to increment the volume as the sound plays (repeat 15 times to decrement the volume by 1). This allows manual volume control on all units tested.
The charge factor can be calculated for any output sampling rate as 0.999958^(4194304/rate). So if you were applying high_pass() at 44100 Hz, you'd use a charge factor of 0.996.
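The charge factor formula can be checked with a few lines. The function name charge_factor is made up for this sketch; high_pass() itself is not reproduced here.

```python
def charge_factor(sample_rate_hz):
    """Per-sample charge factor for a given output sampling rate."""
    return 0.999958 ** (4194304 / sample_rate_hz)

factor_44100 = charge_factor(44100)   # rounds to 0.996, as stated in the text
```

At the full 4194304 Hz clock rate the exponent is 1, so the factor is 0.999958 itself.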
This summarizes differences I've found among the models tested.
Wave RAM access:
Wave channel re-trigger without disabling first (via NR30):
Length counters and power off:
Length clocking on NRx4:
Volume changes on NRx2 write: