Converting between CHD versions & question about "combined raw+meta SHA1"

cpw83 · Wed Jul 04, 2018 12:43 pm

Hi,

I'm currently trying to fully understand how CHD files and their respective versions work and how to verify and convert them correctly.

I have two different versions of area51.chd: one from a 0.78 reference set (CHD v3) and one from 0.161 (CHD v5). Inspecting both with chdman 0.199 gives the following output:

0.78:

chdman - MAME Compressed Hunks of Data (CHD) manager 0.199 (mame0199)
Input file:   area51.chd
File Version: 3
Logical size: 1,281,982,464 bytes
Hunk Size:    4,096 bytes
Total Hunks:  312,984
Unit Size:    512 bytes
Total Units:  2,503,872
Compression:  zlib (Deflate)
CHD size:     542,757,759 bytes
Ratio:        42.3%
SHA1:         9ea749404c9a5d44f407cdb8803293ec0d61410d
Metadata:     Tag='GDDD'  Index=0  Length=35 bytes
              CYLS:2484,HEADS:16,SECS:63,BPS:512.

0.161:

chdman - MAME Compressed Hunks of Data (CHD) manager 0.199 (mame0199)
Input file:   area51.chd
File Version: 5
Logical size: 1,281,982,464 bytes
Hunk Size:    4,096 bytes
Total Hunks:  312,984
Unit Size:    512 bytes
Total Units:  2,503,872
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     497,632,790 bytes
Ratio:        38.8%
SHA1:         3b303bc37e206a6d7339352c869f050d04186f11
Data SHA1:    9ea749404c9a5d44f407cdb8803293ec0d61410d
Metadata:     Tag='GDDD'  Index=0  Length=35 bytes
              CYLS:2484,HEADS:16,SECS:63,BPS:512.

The SHA1 checksum of the old file is identical to the "Data SHA1" checksum of the newer one.

Extracting both CHDs with

 chdman extractraw -i area51.chd -o area51.raw

results in two identical files, both with a SHA1 checksum of 9ea749404c9a5d44f407cdb8803293ec0d61410d.
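For anyone who wants to repeat the check on the extracted files, here is a minimal sketch of how the SHA1 of a large raw image can be computed with Python's standard hashlib. The function name `sha1_of_file` and the chunk size are just my choices for illustration, not anything chdman provides.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha1_of_file("area51.raw")` on each extracted file and comparing the two strings is equivalent to the comparison described above.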

For testing purposes I tried to do the conversion from CHD v3 to v5 myself, again using chdman 0.199:

chdman copy -i area51.chd -o ./test/area51.chd

Now comes the part I don't quite understand. Here's chdman's info output for the newly converted v5 file:

chdman - MAME Compressed Hunks of Data (CHD) manager 0.199 (mame0199)
Input file:   area51.chd
File Version: 5
Logical size: 1,281,982,464 bytes
Hunk Size:    4,096 bytes
Total Hunks:  312,984
Unit Size:    512 bytes
Total Units:  2,503,872
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     497,632,790 bytes
Ratio:        38.8%
SHA1:         4b2fc53072606d1f400a35d58b8018bdb184a459
Data SHA1:    9ea749404c9a5d44f407cdb8803293ec0d61410d
Metadata:     Tag='GDDD'  Index=0  Length=35 bytes
              CYLS:2484,HEADS:16,SECS:63,BPS:512.

In comparison to the "original" 0.161 CHD v5:

SHA1:         3b303bc37e206a6d7339352c869f050d04186f11
Data SHA1:    9ea749404c9a5d44f407cdb8803293ec0d61410d

"Data SHA1" matches, "SHA1" is different. Why?

I found some information in the comments on https://github.com/mamedev/mame/blob/ma ... util/chd.h and compared the two files with a hex editor.

There's a 20-byte difference between bytes 85 and 105:

"Official" 0.161:          3B 30 3B C3 7E 20 6A 6D 73 39 35 2C 86 9F 05 0D 04 18 6F 11

updated from 0.78 by me:   4B 2F C5 30 72 60 6D 1F 40 0A 35 D5 8B 80 18 BD B1 84 A4 59

According to the source code comments in chd.h, this is the "combined raw+meta SHA1" checksum, which is what chdman simply calls "SHA1" for v5 CHDs. The only other difference between the two files is byte 129:

"Official" 0.161:          01

updated from 0.78 by me:   00

So apart from the different SHA1 in the header, the two files differ in only one byte (in fact, in only one bit).
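The hex editor comparison can also be reproduced with a short script. This is only a rough sketch: `differing_offsets` and `bits_changed` are hypothetical helper names, and the offsets it reports are 0-based, so they may be off by one from numbers read off a hex editor that counts from 1.

```python
def differing_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return the 0-based offsets at which two files differ.

    If one file is longer, every offset past the end of the shorter
    file is reported as differing too.
    """
    offsets = []
    position = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break
            if chunk_a != chunk_b:
                common = min(len(chunk_a), len(chunk_b))
                offsets.extend(
                    position + i for i in range(common) if chunk_a[i] != chunk_b[i]
                )
                longest = max(len(chunk_a), len(chunk_b))
                offsets.extend(range(position + common, position + longest))
            position += max(len(chunk_a), len(chunk_b))
    return offsets


def bits_changed(byte_a, byte_b):
    """Count how many bits differ between two byte values (0-255)."""
    return bin(byte_a ^ byte_b).count("1")
```

For the two values above, `bits_changed(0x01, 0x00)` gives 1, which matches the single-bit difference.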

I don't quite understand what "combined raw+meta SHA1" means. Is it the checksum of the raw, uncompressed data combined with the header and some other kind of meta information? If so, that would explain why the checksums don't match: that one different bit in byte 129 would change the result.

But what is that byte? If I understand the comments correctly and didn't miscount the bytes, it's "UINT48 datastart", "offset of first block". Why would that be different and does it have any effect at all? I'm unfortunately not familiar enough with C/C++ to figure this out by myself.
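To avoid counting bytes by hand, here is a small sketch that pulls the fields out of a v5 header. Everything about the layout is an assumption based on my reading of the chd.h comments: a 124-byte big-endian header with the tag at offset 0, length and version at 8 and 12, four compressor tags from 16, logicalbytes/mapoffset/metaoffset from 32, hunkbytes/unitbytes from 56, and the three 20-byte SHA1 fields at 64, 84 and 104. The function name `parse_v5_header` is made up for this example.

```python
import struct

# Assumed V5 header size, taken from my reading of the chd.h comments.
V5_HEADER_LENGTH = 124


def parse_v5_header(raw):
    """Decode the assumed CHD v5 header layout from the first 124 bytes of a file."""
    if len(raw) < V5_HEADER_LENGTH or raw[:8] != b"MComprHD":
        raise ValueError("not a CHD header")
    length, version = struct.unpack_from(">II", raw, 8)
    if version != 5:
        raise ValueError("expected a v5 header, got version %d" % version)
    logicalbytes, mapoffset, metaoffset = struct.unpack_from(">QQQ", raw, 32)
    hunkbytes, unitbytes = struct.unpack_from(">II", raw, 56)
    return {
        "length": length,
        "version": version,
        # Each compressor slot is a 4-byte tag such as b"lzma".
        "compressors": [raw[o:o + 4] for o in range(16, 32, 4)],
        "logicalbytes": logicalbytes,
        "mapoffset": mapoffset,
        "metaoffset": metaoffset,
        "hunkbytes": hunkbytes,
        "unitbytes": unitbytes,
        "rawsha1": raw[64:84].hex(),      # what chdman prints as "Data SHA1"
        "sha1": raw[84:104].hex(),        # combined raw+meta SHA1
        "parentsha1": raw[104:124].hex(),
    }
```

Usage would be something like `parse_v5_header(open("area51.chd", "rb").read(124))`. If the header really is 124 bytes long, byte 129 would sit outside it, and the mapoffset and metaoffset values should help place what is actually stored there.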

I also found that if I go v3 => v4 => v5 instead of directly v3 => v5, I end up with a CHD file identical to the "original" 0.161 one. The steps were: original 0.78 v3 CHD => chdman 0.145 "update" (creates v4) => chdman 0.199 "copy" (creates v5).

Which one of those is the correct and/or recommended way?

And a bonus question:

Since the header has a field for the "combined raw+meta SHA1 of parent": do split CHDs actually exist, or is that just an implemented feature that was never actually used?