This is partial documentation for the AOL file cabinet database as used by the MacOS version of AOL. The intent of this is to have enough information to recover all of the e-mail inside the file so that it can be imported into another program. Using this info, I was able to recover over three megs of mail out of an old cabinet file, by generating an mbox file and copying the result to my IMAP server.
First of all, this file format is completely different from the format used by the PC version of AOL. Second, the basic format of this file doesn't seem to have changed since at least version 3.0 of the Mac AOL software. I was able to read an old file cabinet file from AOL 3.0 using the AOL 9.0 software.
Information leading to e-mail is marked in boldface.
Update: 2005-12-30 - Since I wrote this, an official way has appeared to not only get mail out of your old cabinet files, but also to send and receive AOL mail directly from within Mail.app! If you want to bring your AOL mail into this millennium, you should get AOL Service Assistant right now!
00000009 - AOL Version?
0021A3F0 - Free List?
00004440 - Master List?
00000015 - ???
000007B1 - ???
00000400 - ???
The data from 0018 to 00FF is uninitialized and may even contain the MacBinary header from some random downloaded file.
Only the Master List is important for rescuing data from this file.
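To make the header layout concrete, here is a minimal Python sketch that unpacks the six longwords listed above. It assumes the values are stored big-endian (which is how the hex dumps read, and was the norm on 68k/PowerPC Macs) and that the header starts at offset 0. The field names are hypothetical labels based on the guesses above, not official names.

```python
import struct
from collections import namedtuple

# Hypothetical field names; the source only guesses at what these values mean.
CabinetHeader = namedtuple(
    "CabinetHeader",
    "version free_list master_list unknown1 unknown2 unknown3",
)

def parse_cabinet_header(data):
    """Unpack the first 24 bytes (six big-endian longwords) of a cabinet file."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for the header")
    return CabinetHeader(*struct.unpack(">6I", data[:24]))

# Sample bytes rebuilt from the values shown above.
sample = bytes.fromhex("00000009 0021A3F0 00004440 00000015 000007B1 00000400")
header = parse_cabinet_header(sample)
print("Master List at offset", hex(header.master_list))
```

On a real file you would pass the first 24 bytes read from the cabinet instead of `sample`, then seek to `header.master_list`.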
The data blocks start out with an 8-byte header made up of four 16-bit words:
4B41 - Flags. I cannot figure out what this means, but the high nibble sometimes changes depending on the block type.
0000 - Apparently always 0000
0001 - Apparently always 0001
0002 - The number of entries in this block (not always used, depending on the block type)
Note that there is no way to know the exact size of a block without knowing what kind of block it is. Some blocks contain nothing but a list of block addresses (in which case the size is the number of entries * 4); others contain a list of block addresses with object IDs (in which case the size is the number of entries * 8). Some entry types are very complicated.
In smaller entry types, the low nibble (the "1" in 4B41 above) seems to correspond to the number of 8-byte blocks following the header.
There are apparently no more than 32 entries in a block, so when a block of pointers fills up, the software must re-balance the entries into a new block.
Not all blocks in the file have headers. Some are just raw data. These should have a data length in the referring block.
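The block header described above can be sketched in Python as follows. This is only an illustration under the same assumptions as the text: big-endian words, and a payload size that depends on context the caller has to supply. The names `BlockHeader`, `word1`, `word2` and `pointer_list_size` are made up for this example.

```python
import struct
from collections import namedtuple

# word1 and word2 are the "apparently always 0000 / 0001" fields.
BlockHeader = namedtuple("BlockHeader", "flags word1 word2 entry_count")

def parse_block_header(data, offset=0):
    """Unpack the four big-endian 16-bit words that open a headered block."""
    return BlockHeader(*struct.unpack_from(">4H", data, offset))

def low_nibble(flags):
    """Lowest 4 bits of the flags word (the '1' in 4B41)."""
    return flags & 0x000F

def pointer_list_size(header, with_object_ids=False):
    """Payload size in bytes for the two simple list layouts only.

    The caller must already know which layout applies; the header
    alone does not say.
    """
    return header.entry_count * (8 if with_object_ids else 4)

hdr = parse_block_header(bytes.fromhex("4B41 0000 0001 0002"))
print(hex(hdr.flags), hdr.entry_count, low_nibble(hdr.flags), pointer_list_size(hdr))
```

For this header, two 4-byte pointers make 8 bytes of payload, which is one 8-byte unit and agrees with the low nibble of 1.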
00004440: 4B41 0000 0001 0002 - header for a block with two entries
          00000100 - pointer to an index block
          000044C8 - pointer to an index block
So far, I know that the second index block leads to my saved e-mail. The first index block apparently leads to Favorites lists.
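Following the Master List could look roughly like the sketch below. It assumes the block is a plain pointer list (header plus 4-byte big-endian addresses) as in the dump above, and it fakes a file image by padding with zero bytes up to offset 0x4440; `read_pointer_block` is a hypothetical helper name.

```python
import struct

def read_pointer_block(data, offset):
    """Return the block addresses stored in a pointer-only block at `offset`."""
    _flags, _w1, _w2, count = struct.unpack_from(">4H", data, offset)
    # The pointers follow the 8-byte header directly, 4 bytes each.
    return list(struct.unpack_from(">%dI" % count, data, offset + 8))

# Stand-in for a real cabinet file: zero padding, then the block dumped above.
block = bytes.fromhex("4B41 0000 0001 0002 00000100 000044C8")
image = bytes(0x4440) + block

favorites_index, mail_index = read_pointer_block(image, 0x4440)
print(hex(favorites_index), hex(mail_index))
```

The order of the two names simply mirrors the observation above that the first index block leads to Favorites and the second to saved e-mail; other cabinet files may differ.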
0002 - ???
04 - ???
0000002B - object ID?
00000000 - object ID?
0000DC00 - pointer to list of object IDs?
00000000 - ???
The 04 may be the count of records within each entry, but I can't confirm this because every entry I have seen has four records. Each record is 16 bytes long:
00000007 - object ID?
00100B58 - block pointer
0000002C - object ID?
6E756C6C - the word "null", which must mean that this field is null (duh)
So far, I know that the block pointer in the first record of the second entry leads to my saved e-mail. Everything else seems to be for mail folder names.
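A 16-byte record like the one above can be split into its four longwords with a few lines of Python. This sketch assumes big-endian values; the field names (`id1`, `pointer`, `id2`, `tail`) are invented labels, since the meaning of most fields is still a guess.

```python
import struct
from collections import namedtuple

IndexRecord = namedtuple("IndexRecord", "id1 pointer id2 tail")

NULL_MARKER = 0x6E756C6C  # the ASCII bytes "null" read as one longword

def parse_index_record(data, offset=0):
    """Unpack one 16-byte record into four big-endian longwords."""
    return IndexRecord(*struct.unpack_from(">4I", data, offset))

def tail_is_null(record):
    """True when the last field holds the literal text 'null'."""
    return record.tail == NULL_MARKER

rec = parse_index_record(bytes.fromhex("00000007 00100B58 0000002C 6E756C6C"))
print(hex(rec.pointer), tail_is_null(rec))
```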
00100B58: 4A01 0000 0001 0002 0003EA10 00194ED8
00194ED8: 0A02 0000 0001 001E ... 00244718 ...
00244718: 8B02 0000 0001 0020 ... 00242E48 000003DE ...
00242E48: C202 0000 0001 0000 - block header with zero entries
          03B4 - object ID of folder which contains this message
          00244A30 - pointer to subject line text for display in overview lists
          0000000F - length of subject line text
          00000133 - ???
          0000 - ???
          002EE560 - pointer to message body data
          00000C16 - length of message body data
          00245F28 - pointer to message sender/recipient text for display in overview lists
          00000027 - length of message sender/recipient text
          B0103248 - message date? this is apparently in some weird non-Unix epoch
          00000000 - ???
          00000000 - ???
          00000000 - ???
          00000000 - ???
          00000002 - ???
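The message header block can be unpacked in one step, as in the sketch below. It assumes the fields are packed back to back in the order listed, big-endian, with no padding (64 bytes in total). The date conversion is a guess that is not in the original notes: classic Mac OS counted seconds from 1904-01-01, and reading the sample value that way gives a date in 1997, which at least looks plausible for mail of that era. The dictionary keys are made-up names.

```python
import struct
from datetime import datetime, timedelta

# 4 header words, folder ID, 3 longwords, 1 word, 5 longwords, 5 unknown longwords
MESSAGE_HEADER = struct.Struct(">4HH3IH5I5I")

def parse_message_header(data, offset=0):
    """Unpack the message header block, assuming the packed layout above."""
    f = MESSAGE_HEADER.unpack_from(data, offset)
    return {
        "flags": f[0],
        "folder_id": f[4],
        "subject_ptr": f[5],
        "subject_len": f[6],
        "body_ptr": f[9],
        "body_len": f[10],
        "sender_ptr": f[11],
        "sender_len": f[12],
        "date_raw": f[13],
    }

def mac_epoch_guess(raw):
    """ASSUMPTION: treat the value as seconds since 1904-01-01 (classic Mac OS)."""
    return datetime(1904, 1, 1) + timedelta(seconds=raw)

sample = bytes.fromhex(
    "C202 0000 0001 0000 03B4 00244A30 0000000F 00000133 0000"
    " 002EE560 00000C16 00245F28 00000027 B0103248"
    " 00000000 00000000 00000000 00000000 00000002"
)
msg = parse_message_header(sample)
print(hex(msg["body_ptr"]), msg["body_len"], mac_epoch_guess(msg["date_raw"]))
```

With a real file, the body data would then be the `body_len` bytes starting at file offset `body_ptr`. If the dates come out wrong on your own mail, the epoch guess is the first thing to question.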
0004 - record type
0000000E - length of record data
52653A20 4E6F2053 75626A65 6374 - record data, in this case 'Re: No Subject'
Here are some record types:
0004 - subject line
0005 - sender, e-mail address + real name
0006 - date (same as in message header block) - this field may not always be present, especially in mail you sent!
0007 - possibly the same date with a different epoch
0009 - To/CC address, apparently starts with an extra byte to indicate type. 00 seems to be To:, 01 and 02 both seem to be CC:, but I don't know what the difference between them is. There may be multiple of these records, all consecutive, and sorted by type.
000A - message text (see next section)
000C - attachment
    AAAAAAAA - file size
    BBBBBBBB - ???
    CCCCCCCC - ???
    DD - file name length
    EEEEEEEE - file name (may include "(xxxxxx bytes)" at end)
0012 - sender, shortened to 30 or so characters with "..." at end
0013 - ??? (contains no data)
0014 - sender, e-mail address only
0015 - recipient, apparently your e-mail address
The message text record ALWAYS comes last.
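Walking the type/length/data records described above might look like this sketch. It assumes each record is a big-endian 16-bit type followed by a 32-bit length and then exactly that many data bytes, with records stored back to back. The name table only covers types listed above, and the second sample record uses a made-up address.

```python
import struct

# Subset of the record types listed above; the labels are informal.
RECORD_NAMES = {
    0x0004: "subject",
    0x0005: "sender",
    0x0006: "date",
    0x0009: "to_cc",
    0x000A: "message_text",
    0x000C: "attachment",
    0x0014: "sender_address",
    0x0015: "recipient",
}

def iter_records(body):
    """Yield (record_type, data) pairs from message body data."""
    pos = 0
    while pos < len(body):
        if pos + 6 > len(body):
            raise ValueError("truncated record header at offset %d" % pos)
        rtype, length = struct.unpack_from(">HI", body, pos)
        data = body[pos + 6:pos + 6 + length]
        if len(data) != length:
            raise ValueError("truncated record data at offset %d" % pos)
        yield rtype, data
        pos += 6 + length

# First record is the example above; the second is invented for illustration.
subject = bytes.fromhex("0004 0000000E 52653A20 4E6F2053 75626A65 6374")
address = b"someone@example.com"
sample = subject + struct.pack(">HI", 0x0014, len(address)) + address

records = list(iter_records(sample))
for rtype, data in records:
    print(RECORD_NAMES.get(rtype, hex(rtype)), data)
```

The walker raises on truncated input rather than guessing, because a wrong length usually means the starting offset was wrong.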
AAAA - hex word of record type
BBBBBBBB - hex longword of record data length (including 03)
CCCCCCCC - hex longword, depends on record type (in message text records, this is the offset of the current block in the message text)
XXXX - data, of length BBBBBBBB - 1
03 - end of block
Record types go from FFFF to at least as high as 0013. Some of them contain formatting (0008 usually contains the word "Courier", for instance, indicating it has something to do with font selection), but the one we want is 0002. This contains message text records.
0002 - hex word of record type
BBBBBBBB - hex longword of record data length
CCCCCCCC - hex longword of offset within entire message
00 - 00 byte
DDDDDD - 1-3 hex ASCII bytes containing length of text (usually BBBB-5)
2C - comma
XXXX - message text, apparently always broken between lines
03 - end of block
Once all the 0002 blocks are parsed out, and the junk before the comma skipped, and the 03 ignored, you have your e-mail!
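The extraction steps above can be sketched in Python as follows. The sketch assumes the input is the data of a message text record, made of sub-records laid out as described: a 16-bit type, a 32-bit length that counts the trailing 03, a 32-bit extra value, then the data and the 03 byte, all big-endian. The sample bytes are invented to match that layout, `make_record` exists only to build them, and decoding as MacRoman is a guess about the character set.

```python
import struct

END_OF_BLOCK = 0x03
TEXT_RECORD = 0x0002

def iter_text_records(data):
    """Yield (type, extra, payload) for each sub-record, minus the trailing 03."""
    pos = 0
    while pos < len(data):
        if pos + 10 > len(data):
            raise ValueError("truncated record header at offset %d" % pos)
        rtype, length, extra = struct.unpack_from(">HII", data, pos)
        chunk = data[pos + 10:pos + 10 + length]
        if length == 0 or len(chunk) != length or chunk[-1] != END_OF_BLOCK:
            raise ValueError("malformed record at offset %d" % pos)
        yield rtype, extra, chunk[:-1]
        pos += 10 + length

def extract_message_text(data):
    """Join the text of all 0002 records, skipping everything up to the comma."""
    parts = []
    for rtype, _offset, payload in iter_text_records(data):
        if rtype == TEXT_RECORD:
            comma = payload.index(b",")
            parts.append(payload[comma + 1:])
    return b"".join(parts)

def make_record(rtype, extra, payload):
    """Test helper: build one sub-record in the layout described above."""
    return struct.pack(">HII", rtype, len(payload) + 1, extra) + payload + b"\x03"

sample = (
    make_record(0x0008, 0, b"Courier")               # formatting record, ignored
    + make_record(0x0002, 0, b"\x00" + b"5,Hello")
    + make_record(0x0002, 5, b"\x00" + b"6, world")
)
text = extract_message_text(sample)
print(text.decode("mac_roman"))  # ASSUMPTION: text is MacRoman-encoded
```

The length digits before the comma are skipped rather than interpreted, since the notes above only say to discard them.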
AAAAAAAA - object ID or pointer to C-string text
BBBBBBBB - object ID or pointer to C-string text
CC - length of title
DDDDDDDD - title of item
I have not done much research into these.
It's ugly and will probably need some work to be useful. At the very least, you will need to change the "#define cabName" to point to your own cabinet file.
(NOTE: the "../" in cabName was because Xcode builds and runs your binary in a subdirectory.)