JUKEBOX.SI

Revision as of 14:06, 18 April 2019 by MattKC

JUKEBOX.SI is an SI file containing 60 audio tracks of music and radio dialogue. With a few exceptions[citation needed], all of Lego Island's music is stored in JUKEBOX.SI.

Details

All audio in JUKEBOX.SI is uncompressed mono PCM. Most tracks are 11025 Hz, 16-bit, but a handful are 22050 Hz, 8-bit. Each track starts with a header specifying its format (sample rate, sample size, byte rate, etc.).

Because SI files follow RIFF conventions, each track's header is Microsoft WAV compatible and can be transplanted directly into a WAV file's header. However, because SI files are chunked and interleaved, the PCM data cannot be copied out directly without noticeable clicks and glitches in the output.

JUKEBOX.SI does not actually interleave its audio with any other data, making extraction easier than from most SI files. However, the audio is still separated into chunks, and each chunk's 22-byte header (which is not PCM data) causes the aforementioned clicks and glitches when the data is extracted directly. These chunk headers must be removed to extract a clean copy of the audio.

Technical Information

Music appears to begin with an MxDa identifier and is split into MxCh chunks. The MxDa header contains information about the PCM audio in the MxCh chunks. The first MxCh appears to hold information about the remaining chunks in the MxDa structure.

All multi-byte values are little-endian, as is normal for RIFF-based files.
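As a small illustration of this byte order, the sketch below decodes little-endian integers with Python's standard struct module. The helper names read_u16_le and read_u32_le and the sample bytes are invented for this example and do not come from the SI format itself.

```python
import struct


def read_u16_le(buf: bytes, offset: int = 0) -> int:
    """Decode an unsigned 16-bit little-endian integer at `offset`."""
    return struct.unpack_from("<H", buf, offset)[0]


def read_u32_le(buf: bytes, offset: int = 0) -> int:
    """Decode an unsigned 32-bit little-endian integer at `offset`."""
    return struct.unpack_from("<I", buf, offset)[0]


# Made-up sample: the least significant byte comes first,
# so 11 2B 00 00 is 0x00002B11, which is 11025 in decimal.
sample = bytes([0x11, 0x2B, 0x00, 0x00])
print(read_u32_le(sample))
```

The same two helpers would cover every integer field described on this page, since all of them are either 2 or 4 bytes wide.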

Extracting Audio

  • Audio streams can be located in JUKEBOX.SI by searching for " WAV" (note the leading space).
    • A few bytes before the " WAV" is the original filename of the WAV file before it was imported into the SI file, should you wish to retrieve that too.
  • A few bytes later is "LIST", which appears to specify an array (or "list") of chunks that make up one audio track. The next 4 bytes are a 32-bit integer giving the total size of this "LIST", in other words the number of upcoming bytes of the SI file that belong to this particular audio track.
  • The first MxCh after the "LIST" contains WAV-compatible header data, most of which can be transplanted directly into a WAV file (see below for details).
  • Every MxCh after this one contains PCM audio data (formatted according to the header data in the first MxCh). Each MxCh has a 22-byte header that must be stripped out when extracting. After the 4-byte "MxCh" identifier, the header contains a 4-byte integer giving the size of the chunk in bytes, not counting the 8 bytes taken up by the "MxCh" identifier and the size integer itself. All data after the 22-byte header is PCM audio and is exactly "chunk size - 14" bytes long (14 being the 22-byte header minus its first 8 bytes).
  • Each MxCh's data can be dumped until you reach the end of the "LIST" size extracted above. At that point the end of the track has been reached and the process must be repeated to extract the next track.

Header

NOTE: This information is incomplete and requires more research and information.

As mentioned above, the first MxCh in a "LIST" contains solely header data. Most of this data is completely compatible with the specification for WAV.

Field | Offset | Description
MxDa | 0 | Identifier
MxCh | 4 | Chunk Header
Chunk Size | 8 | 4-byte Integer
Sub-Chunk Size | 22 | 4-byte Integer - The remaining size of this chunk after this value
Audio Format | 26 | 2-byte Integer - 1 = PCM, others indicate some form of compression
Number of Channels | 28 | 2-byte Integer - 1 = Mono, 2 = Stereo
Sample Rate | 30 | 4-byte Integer
Byte Rate | 34 | 4-byte Integer - equal to Sample Rate * Number of Channels * Bits per Sample / 8
Bytes per Sample | 38 | 2-byte Integer - equal to Number of Channels * Bits per Sample / 8
Bits per Sample per Channel | 40 | 2-byte Integer - 8 = 8-bit, 16 = 16-bit, etc.
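As a rough sketch of how the fields above might be read, the example below unpacks the 16 bytes that start at "Audio Format" (offset 26 from the "MxDa" identifier) as one packed little-endian structure and checks the two derived fields against their formulas. The WaveFormat type, the function names, and the sample block are assumptions made for illustration; the block is synthetic and describes an 11025 Hz, 16-bit mono track.

```python
import struct
from typing import NamedTuple

FORMAT_OFFSET = 26  # offset of "Audio Format" from the "MxDa" identifier


class WaveFormat(NamedTuple):
    audio_format: int      # 1 = PCM
    channels: int          # 1 = mono, 2 = stereo
    sample_rate: int
    byte_rate: int
    bytes_per_sample: int
    bits_per_sample: int


def parse_format(block: bytes) -> WaveFormat:
    """Read the six WAV-compatible fields from a block starting at "MxDa"."""
    if block[0:4] != b"MxDa" or block[4:8] != b"MxCh":
        raise ValueError("block does not start with MxDa followed by MxCh")
    # Two 2-byte, two 4-byte, then two 2-byte integers: 16 bytes in total.
    return WaveFormat(*struct.unpack_from("<HHIIHH", block, FORMAT_OFFSET))


def is_consistent(fmt: WaveFormat) -> bool:
    """Check the derived fields against the formulas in the table."""
    frame_bytes = fmt.channels * fmt.bits_per_sample // 8
    return (fmt.bytes_per_sample == frame_bytes
            and fmt.byte_rate == fmt.sample_rate * frame_bytes)


# Synthetic header block: the bytes between the chunk size and the
# sub-chunk size are zero-filled because their meaning is not covered here.
block = (b"MxDa" + b"MxCh" + struct.pack("<I", 14 + 16) + bytes(10)
         + struct.pack("<I", 16)
         + struct.pack("<HHIIHH", 1, 1, 11025, 22050, 2, 16))

fmt = parse_format(block)
print(fmt, is_consistent(fmt))
```

A consistency check like this is a cheap way to confirm that the offsets line up before any audio is written out.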

Transplanting the Header

Comparing against the WAV file format header specification shows that the 16 bytes from "Audio Format" to "Bits per Sample per Channel" are identical. These make up most of the WAV header data (apart from the file size and data chunk size, which cannot be determined from here) and can be transplanted directly, which makes extraction easier and ensures that the sample rate and sample size of the extracted audio are correct.

Note that the MxCh header contains a few more bytes after "Bits per Sample per Channel", so its "Sub-Chunk Size" is larger than that of a typical WAV file. These extra bytes should be ignored and not transplanted; if they are transplanted anyway, the "Sub-Chunk Size" should be transplanted too (or at least increased to account for them).
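A minimal sketch of the transplant, assuming the 16 format bytes and the headerless PCM data have already been extracted: the function below wraps them in a standard RIFF/WAVE container, writing a "fmt " sub-chunk size of 16 because the extra MxCh bytes are left out. The name build_wav and the sample values are hypothetical, and the result is read back with Python's standard wave module only as a sanity check.

```python
import io
import struct
import wave


def build_wav(fmt_bytes: bytes, pcm: bytes) -> bytes:
    """Wrap 16 transplanted format bytes and raw PCM in a WAV container."""
    if len(fmt_bytes) != 16:
        raise ValueError("expected the 16 bytes from Audio Format "
                         "to Bits per Sample per Channel")
    # RIFF chunks are padded to an even length; the pad byte is not
    # counted in the data chunk's own size field.
    pad = b"\x00" if len(pcm) % 2 else b""
    fmt_chunk = b"fmt " + struct.pack("<I", 16) + fmt_bytes
    data_chunk = b"data" + struct.pack("<I", len(pcm)) + pcm + pad
    # The RIFF size covers "WAVE" plus both chunks.
    riff_size = 4 + len(fmt_chunk) + len(data_chunk)
    return (b"RIFF" + struct.pack("<I", riff_size) + b"WAVE"
            + fmt_chunk + data_chunk)


# Synthetic input: 11025 Hz, 16-bit mono format bytes and 200 bytes of silence.
fmt_bytes = struct.pack("<HHIIHH", 1, 1, 11025, 22050, 2, 16)
pcm = bytes(200)
wav_bytes = build_wav(fmt_bytes, pcm)

# Read the result back to confirm the header is understood.
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    params = (reader.getnchannels(), reader.getsampwidth(),
              reader.getframerate(), reader.getnframes())
print(params)
```

The file size and data chunk size are computed here from the extracted PCM length, since those two values cannot be taken from the MxCh header.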