Svn checksum mismatch while updating eclipse
Another option might be to implement a blocksize-independent checksum, for cases where block sizes differ. Currently the file checksum works by taking the CRC32 of every 512-byte chunk of a block, combining these with MD5 into a single checksum for the block, then combining the block checksums with MD5 into a single checksum for the file.
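To make the block-size dependence concrete, here is a minimal Python sketch of the scheme as described above (CRC32 per 512-byte chunk, MD5 over the CRCs of a block, MD5 over the block digests). The function names are made up for illustration, and the byte layout is an assumption, so the output will not match what HDFS actually reports.

```python
import hashlib
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size per CRC32, as described in the comment above


def block_checksum(block: bytes, bytes_per_checksum: int = BYTES_PER_CHECKSUM) -> bytes:
    """MD5 over the CRC32 of each chunk in a single block (illustrative layout)."""
    md5 = hashlib.md5()
    for offset in range(0, len(block), bytes_per_checksum):
        crc = zlib.crc32(block[offset:offset + bytes_per_checksum])
        md5.update(crc.to_bytes(4, "big"))  # assumption: 4-byte big-endian CRCs
    return md5.digest()


def file_checksum(data: bytes, block_size: int) -> str:
    """MD5 over the per-block MD5 digests; block boundaries affect the result."""
    md5 = hashlib.md5()
    for offset in range(0, len(data), block_size):
        md5.update(block_checksum(data[offset:offset + block_size]))
    return md5.hexdigest()


# 4096 bytes of sample content, identical for every call below.
data = b"".join(i.to_bytes(4, "big") for i in range(1024))

same_a = file_checksum(data, block_size=1024)
same_b = file_checksum(data, block_size=1024)
other = file_checksum(data, block_size=2048)

print(same_a == same_b)  # same content, same block size
print(same_a == other)   # same content, different block size
```

The two calls with a block size of 1024 agree, while the call with 2048 hashes a different sequence of block digests, which is the mismatch discussed in this thread.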
If instead the client could retrieve the list of CRC32s directly from the datanode, then it could combine them into a blocksize-independent checksum (so long as blockSize is a multiple of bytesPerChecksum and bytesPerChecksum is the same between the filesystems, which is ordinarily the case).
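A rough sketch of that idea, in Python: if the client sees the raw per-chunk CRC32s, it can hash them in file order and ignore block boundaries. Everything here is hypothetical (there is no claim that a datanode exposes such a list); the blocks are simulated by slicing a byte string.

```python
import hashlib
import zlib
from typing import Iterable, List


def chunk_crcs(block: bytes, bytes_per_checksum: int = 512) -> List[int]:
    """CRC32 of each chunk of one block (stands in for what a datanode would return)."""
    return [
        zlib.crc32(block[offset:offset + bytes_per_checksum])
        for offset in range(0, len(block), bytes_per_checksum)
    ]


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Simulate how a file would be laid out in blocks of the given size."""
    return [data[offset:offset + block_size] for offset in range(0, len(data), block_size)]


def flat_checksum(blocks: Iterable[bytes], bytes_per_checksum: int = 512) -> str:
    """MD5 over all chunk CRCs in file order, with no per-block digest in between."""
    md5 = hashlib.md5()
    for block in blocks:
        for crc in chunk_crcs(block, bytes_per_checksum):
            md5.update(crc.to_bytes(4, "big"))
    return md5.hexdigest()


data = b"".join(i.to_bytes(4, "big") for i in range(1024))  # 4096 sample bytes

small_blocks = flat_checksum(split_blocks(data, 1024))
large_blocks = flat_checksum(split_blocks(data, 2048))
misaligned = flat_checksum(split_blocks(data, 768))  # not a multiple of 512

print(small_blocks == large_blocks)
print(small_blocks == misaligned)
```

The first comparison holds because both block sizes are multiples of the chunk size, so the chunk boundaries line up. The block size of 768 breaks that condition and produces a different CRC sequence, which is why the parenthetical caveat above matters.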
I wasn't convinced that combining CRC32-checksums together to form a higher-level checksum could be correct.
(Thanks for the explanation.) Yep, that should take care of #2 (above), but not #1.
When copying files between two clusters with different default block sizes, the copy fails with a checksum mismatch, even though the files have identical contents.
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3412//console
Rather, it would be safer to advise people who attempt to copy files with different block sizes to specify either -pb or -skipCrc.
The reason is that on HDFS, a file's checksum is unfortunately a function of the block size of the file. = Block Size(), since the checksums are guaranteed to differ in this case.
Edit: I've modified the fix to warn the user (instead of skipping the checksum check). The code now fails the copy and suggests that the user either use -pb to preserve the block size or consider -skipCrc (and forgo copy validation entirely).
Here are the results of testing the latest attachment against trunk revision .
So better documentation, warnings and error messages might suffice.
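For illustration only, the behaviour described in the edit (fail the copy on a mismatch, and hint at -pb or -skipCrc when the block sizes differ) might look roughly like the Python sketch below. The function, exception and parameter names are hypothetical and are not taken from the actual patch; the option names are quoted as they appear in this thread.

```python
class CopyValidationError(Exception):
    """Hypothetical error raised when a copied file fails checksum validation."""


def verify_copy(source_checksum: str, target_checksum: str,
                source_block_size: int, target_block_size: int,
                skip_crc: bool = False) -> None:
    """Fail on a checksum mismatch; add a hint when block sizes differ."""
    if skip_crc:
        return  # validation explicitly skipped by the user
    if source_checksum == target_checksum:
        return
    message = "Checksum mismatch between source and target."
    if source_block_size != target_block_size:
        # Differing block sizes can produce a mismatch even for identical contents.
        message += (
            " Source and target differ in block size."
            " Use -pb to preserve block sizes during copy,"
            " or -skipCrc to skip the checksum check entirely."
        )
    raise CopyValidationError(message)


try:
    verify_copy("aaaa", "bbbb", source_block_size=64 << 20, target_block_size=128 << 20)
    hint = ""
except CopyValidationError as err:
    hint = str(err)

print(hint)
```

Keeping the check as a hard failure, rather than silently skipping it, means a genuine corruption is never masked; the hint only explains the most likely benign cause.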