I'm currently trying to fully understand how CHD files and their respective versions work and how to verify and convert them correctly.
I have two different versions of area51.chd: one from a 0.78 reference set (CHD v3) and one from 0.161 (CHD v5). Looking at both with chdman 0.199, I get the following output:
0.78:
chdman - MAME Compressed Hunks of Data (CHD) manager 0.199 (mame0199)
Input file: area51.chd
File Version: 3
Logical size: 1,281,982,464 bytes
Hunk Size: 4,096 bytes
Total Hunks: 312,984
Unit Size: 512 bytes
Total Units: 2,503,872
Compression: zlib (Deflate)
CHD size: 542,757,759 bytes
Ratio: 42.3%
SHA1: 9ea749404c9a5d44f407cdb8803293ec0d61410d
Metadata: Tag='GDDD' Index=0 Length=35 bytes
CYLS:2484,HEADS:16,SECS:63,BPS:512.
0.161:
chdman - MAME Compressed Hunks of Data (CHD) manager 0.199 (mame0199)
Input file: area51.chd
File Version: 5
Logical size: 1,281,982,464 bytes
Hunk Size: 4,096 bytes
Total Hunks: 312,984
Unit Size: 512 bytes
Total Units: 2,503,872
Compression: lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size: 497,632,790 bytes
Ratio: 38.8%
SHA1: 3b303bc37e206a6d7339352c869f050d04186f11
Data SHA1: 9ea749404c9a5d44f407cdb8803293ec0d61410d
Metadata: Tag='GDDD' Index=0 Length=35 bytes
CYLS:2484,HEADS:16,SECS:63,BPS:512.
The SHA1 checksum of the old file is identical to the "Data SHA1" checksum of the newer one.
I extracted both CHDs with:
chdman extractraw -i area51.chd -o area51.raw
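To check an extracted image against the hashes chdman reports, the raw file can be hashed directly. This is only a minimal sketch: it assumes that "Data SHA1" is the plain SHA-1 of exactly the bytes that extractraw writes, which I have not confirmed from the source. The names sha1_of_file and matches_data_sha1 are made up for this example.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_data_sha1(path, reported_sha1):
    """Compare a file's SHA-1 with a hash copied from chdman's output.

    Assumption: the reported hash covers exactly the bytes stored in the file.
    """
    return sha1_of_file(path) == reported_sha1.strip().lower()
```

For example, matches_data_sha1("area51.raw", "9ea749404c9a5d44f407cdb8803293ec0d61410d") would show whether the extracted image agrees with the "Data SHA1" line, if the assumption above holds.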
For testing purposes I tried to do the conversion from CHD v3 to v5 myself, again using chdman 0.199:
chdman copy -i area51.chd -o ./test/area51.chd
Now the interesting part, which I don't quite understand. Here is chdman's info output for the newly converted v5 file:
chdman - MAME Compressed Hunks of Data (CHD) manager 0.199 (mame0199)
Input file: area51.chd
File Version: 5
Logical size: 1,281,982,464 bytes
Hunk Size: 4,096 bytes
Total Hunks: 312,984
Unit Size: 512 bytes
Total Units: 2,503,872
Compression: lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size: 497,632,790 bytes
Ratio: 38.8%
SHA1: 4b2fc53072606d1f400a35d58b8018bdb184a459
Data SHA1: 9ea749404c9a5d44f407cdb8803293ec0d61410d
Metadata: Tag='GDDD' Index=0 Length=35 bytes
CYLS:2484,HEADS:16,SECS:63,BPS:512.
For comparison, the "official" 0.161 file again:
SHA1: 3b303bc37e206a6d7339352c869f050d04186f11
Data SHA1: 9ea749404c9a5d44f407cdb8803293ec0d61410d
"Data SHA1" matches, "SHA1" is different. Why?
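Comparing the hash lines by eye is error-prone, so here is a small sketch that pulls them out of two saved chdman info outputs and compares them. It assumes the output consists of "Key: value" lines like those pasted above. parse_chdman_info and compare_hashes are hypothetical helper names, not part of chdman.

```python
def parse_chdman_info(text):
    """Collect 'Key: value' lines from chdman info output into a dict.

    Only the first occurrence of each key is kept. Lines without a colon,
    such as the banner line, are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields


def compare_hashes(info_a, info_b):
    """Report, per hash line, whether both outputs contain it with the same value."""
    a = parse_chdman_info(info_a)
    b = parse_chdman_info(info_b)
    result = {}
    for name in ("SHA1", "Data SHA1"):
        # A hash missing from either output (v3 has no "Data SHA1") counts as no match.
        result[name] = a.get(name) is not None and a.get(name) == b.get(name)
    return result
```

Fed with the 0.161 output and the output of my own conversion, this should give False for "SHA1" and True for "Data SHA1".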
I found some information in the comments on https://github.com/mamedev/mame/blob/ma ... util/chd.h and compared the two files with a hex editor.
There's a difference between Bytes 85 and 105:
"Official" 0.161: 3B 30 3B C3 7E 20 6A 6D 73 39 35 2C 86 9F 05 0D 04 18 6F 11
updated from 0.78 by me: 4B 2F C5 30 72 60 6D 1F 40 0A 35 D5 8B 80 18 BD B1 84 A4 59
and another one in Byte 129:
"Official" 0.161: 01
updated from 0.78 by me: 00
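The hex editor comparison can also be scripted. The sketch below lists the runs of differing bytes in the first part of two files. Offsets are 0-based, so a hex editor that counts from 1 will show each position one higher. diff_ranges and diff_file_prefixes are names invented for this example.

```python
def diff_ranges(data_a, data_b):
    """Return inclusive (start, end) 0-based offset ranges where the inputs differ.

    Only the common length of both inputs is compared.
    """
    ranges = []
    start = None
    common = min(len(data_a), len(data_b))
    for offset in range(common):
        if data_a[offset] != data_b[offset]:
            if start is None:
                start = offset  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, offset - 1))
            start = None
    if start is not None:
        ranges.append((start, common - 1))
    return ranges


def diff_file_prefixes(path_a, path_b, length=256):
    """Compare only the first `length` bytes of two files."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return diff_ranges(a.read(length), b.read(length))
```

Note that a byte which happens to be equal inside an otherwise different field splits the run in two. The two 20-byte values above both have 35 as their eleventh byte, so they would be reported as two runs rather than one.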
I don't quite understand what "combined raw+meta SHA1" means. Is it the checksum of the raw, uncompressed data combined with the header and some other meta information? If so, that would explain why the checksums don't match: the one differing bit in Byte 129 would be enough.
But what is that byte? If I understand the comments correctly and didn't miscount the bytes, it's "UINT48 datastart", "offset of first block". Why would that be different and does it have any effect at all? I'm unfortunately not familiar enough with C/C++ to figure this out by myself.
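To avoid miscounting, a byte position can be mapped to a header field in code. This sketch assumes the V5 header layout as I read it from the comments in chd.h: big-endian fields and a fixed header of 124 bytes. It also assumes the hex editor counts from 1, so "Byte 129" would be offset 128. V5_FIELDS, field_at and parse_v5_header are names made up for this example.

```python
import struct

# Assumed V5 header layout, taken from the comments in chd.h: (offset, size, name).
V5_FIELDS = [
    (0, 8, "tag"),
    (8, 4, "length"),
    (12, 4, "version"),
    (16, 16, "compressors[4]"),
    (32, 8, "logicalbytes"),
    (40, 8, "mapoffset"),
    (48, 8, "metaoffset"),
    (56, 4, "hunkbytes"),
    (60, 4, "unitbytes"),
    (64, 20, "rawsha1"),
    (84, 20, "sha1"),
    (104, 20, "parentsha1"),
]
V5_HEADER_LENGTH = 124


def field_at(offset):
    """Return the header field covering a 0-based offset, or None if past the header."""
    for start, size, name in V5_FIELDS:
        if start <= offset < start + size:
            return name
    return None


def parse_v5_header(header):
    """Decode the fixed-size fields of a V5 header from its first 124 bytes."""
    if len(header) < V5_HEADER_LENGTH or header[:8] != b"MComprHD":
        raise ValueError("not a CHD header")
    length, version = struct.unpack_from(">II", header, 8)
    if version != 5:
        raise ValueError("expected a V5 header, got version %d" % version)
    logical, mapoffset, metaoffset, hunkbytes, unitbytes = struct.unpack_from(
        ">QQQII", header, 32
    )
    return {
        "length": length,
        "logicalbytes": logical,
        "mapoffset": mapoffset,
        "metaoffset": metaoffset,
        "hunkbytes": hunkbytes,
        "unitbytes": unitbytes,
        "rawsha1": header[64:84].hex(),
        "sha1": header[84:104].hex(),
        "parentsha1": header[104:124].hex(),
    }
```

With this layout, field_at(128) returns None, meaning offset 128 would lie past the fixed header. Comparing that offset with the mapoffset and metaoffset values from parse_v5_header should show which structure the byte actually belongs to.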
I also figured out that if I take a different approach by going v3 => v4 => v5 (original 0.78 v3 CHD => chdman 0.145 ("update") (creates v4) => chdman 0.199 ("copy") (creates v5)) instead of directly going v3 => v5, I end up with an identical CHD file to the "original" 0.161.
Which one of those is the correct and/or recommended way?
And a bonus question:
As there is information about "combined raw+meta SHA1 of parent" in the header: Are there actually split CHDs in existence or is that just an implemented feature which was/is never actually used?