Add Compression best practices guide #52968
alinpahontu2912 wants to merge 5 commits into dotnet:main from
rzikm left a comment
It's getting better; a few additional comments.
| ## Data integrity | ||
| ZIP entries include a CRC-32 checksum that you can use to verify data hasn't been corrupted or tampered with. |
I think you should also mention TAR CRC in the first paragraph; otherwise, users might assume that this section is ZIP-only and skip the rest of it.
Pull request overview
Adds a new guidance article under File and stream I/O that explains how to work with ZIP and TAR archives in .NET, with a focus on API selection, safe extraction patterns, and operational considerations.
Changes:
- Adds a new best-practices article for ZIP and TAR archives, including security guidance for untrusted input.
- Adds a new C# snippet project and a consolidated `Program.cs` containing the referenced code regions.
- Links the new article from `docs/fundamentals/toc.yml`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/standard/io/zip-tar-best-practices.md | New best-practices guide covering API choice, trusted vs. untrusted extraction, memory/perf, platform differences, and encryption notes. |
| docs/standard/io/snippets/zip-tar-best-practices/csharp/Project.csproj | New snippet project targeting net11.0 for compiling the article snippets. |
| docs/standard/io/snippets/zip-tar-best-practices/csharp/Program.cs | Adds the C# snippet implementations referenced by the article. |
| docs/fundamentals/toc.yml | Adds a TOC entry pointing to the new best-practices article. |
| Both ZIP and TAR formats include CRC-32 checksums that you can use to verify data hasn't been corrupted or tampered with. | ||
| Starting with .NET 11, the runtime validates CRC-32 checksums automatically when reading ZIP entries. When you read an entry's data stream to completion, the runtime compares the computed CRC of the decompressed data against the checksum stored in the archive. If they don't match, an `InvalidDataException` is thrown. .NET 11 also validates CRC-32 checksums in TAR entry headers. | ||
| > [!NOTE] | ||
| > In prior versions of .NET, no CRC validation was performed on read. The runtime computed CRC values when writing entries (for storage in the archive), but never verified them during extraction. If you're targeting a runtime older than .NET 11, be aware that corrupt or tampered entries are silently accepted. |
The “Data integrity” section incorrectly says TAR archives include CRC-32 checksums and that .NET 11 validates CRC-32 in TAR headers. TAR uses a header checksum (not CRC-32). Consider updating this section to describe ZIP CRC-32 separately from TAR header checksum validation (for example, align wording with docs/core/compatibility/core-libraries/11/tar-checksum-validation.md).
Suggested change:
| ZIP and TAR archives use different integrity checks. ZIP stores CRC-32 values for entry data. TAR stores a header checksum for each entry header. | |
| Starting with .NET 11, the runtime validates ZIP CRC-32 values automatically when reading ZIP entries. When you read an entry's data stream to completion, the runtime compares the computed CRC-32 value of the decompressed data against the value stored in the archive. If the values don't match, an `InvalidDataException` is thrown. Starting with .NET 11, the runtime also validates TAR header checksums when reading TAR entry headers. | |
| > [!NOTE] | |
| > In versions earlier than .NET 11, the runtime didn't validate ZIP CRC-32 values on read. The runtime computed CRC-32 values when writing ZIP entries for storage in the archive, but didn't verify them during extraction. If you target a runtime earlier than .NET 11, corrupt or tampered ZIP entries might be accepted silently. | |
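To illustrate the suggested wording, here is a minimal C# sketch that drains each ZIP entry so the runtime gets a chance to compare CRC-32 values. It assumes .NET 11 or later, where the guide says a mismatch surfaces as `InvalidDataException`; `FindCorruptEntries` is a hypothetical helper, not one of the article's snippets. On earlier runtimes the same loop may still catch malformed compressed data, but it won't flag a CRC-32 mismatch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: reads every entry to the end and returns the names
// of entries whose data the runtime rejected.
static List<string> FindCorruptEntries(Stream archiveStream)
{
    var corrupt = new List<string>();
    using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read, leaveOpen: true);
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        try
        {
            using Stream data = entry.Open();
            // Per the guide, .NET 11+ compares the computed CRC-32 with the
            // stored value only after the stream is read to completion,
            // so drain the whole stream instead of reading a prefix.
            data.CopyTo(Stream.Null);
        }
        catch (InvalidDataException)
        {
            // Covers CRC-32 mismatches (.NET 11+) and malformed compressed data.
            corrupt.Add(entry.FullName);
        }
    }
    return corrupt;
}
```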
| // uncompressed size. Note: this value is read from the archive header | ||
| // and could be spoofed by a malicious archive — for defense in depth, | ||
| // also monitor actual bytes read during decompression (see the zip | ||
| // bomb section for a streaming size check example). |
The comment here says entry.Length “could be spoofed” and recommends monitoring actual decompressed bytes, but in modern .NET ZipArchiveEntry.Open() truncates decompressed output to the entry’s declared uncompressed size (see includes/core-changes/corefx/3.0/ziparchiveentry-and-inconsistent-entry-sizes.md). This note is misleading and also references a “streaming size check example” that isn’t present in the article. Reword or remove this guidance so it matches current runtime behavior and the examples provided.
Suggested change:
| // uncompressed size. In modern .NET, entry.Open() won't produce more | |
| // than entry.Length bytes, so checking the declared size matches the | |
| // runtime behavior that this sample relies on. | |
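A minimal sketch of the pre-extraction check this comment relies on: compare each declared `entry.Length` and the running total against limits and reject the archive before writing anything. `IsWithinLimits` and its limit parameters are hypothetical; pick values that fit your workload. The check is only as strong as the truncation behavior described in the review comment, so it assumes a modern .NET runtime.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: checks declared sizes before any entry is extracted.
// entry.Length is the uncompressed size recorded in the archive; per the
// review comment, entry.Open() won't yield more bytes than that in modern .NET.
static bool IsWithinLimits(ZipArchive archive, long maxEntryBytes, long maxTotalBytes, int maxEntries)
{
    if (archive.Entries.Count > maxEntries)
        return false;

    long total = 0;
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        if (entry.Length > maxEntryBytes)
            return false;

        // Each addend is capped by maxEntryBytes, so the sum stays bounded.
        total += entry.Length;
        if (total > maxTotalBytes)
            return false;
    }
    return true;
}
```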
| string destPath = Path.Join(destDir, entry.Name); | ||
| using var fileStream = File.Create(destPath); | ||
| entry.DataStream.CopyTo(fileStream); |
TarStreamingRead writes entries directly to Path.Join(destDir, entry.Name) without creating parent directories. For common TAR entries with nested paths, File.Create(destPath) will throw because the directory doesn’t exist. Consider creating the parent directory first (and, if the sample might be used with untrusted input, also reuse the path validation approach shown earlier in the article).
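One way to address this comment, sketched in C#: resolve each entry path, reject anything that escapes the destination, create the parent directory, and only then copy the data. `ExtractTarFiles` is a hypothetical name rather than the article's `TarStreamingRead`, and the sketch deliberately extracts only regular files, skipping directories, links, and other entry types instead of handling them.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Hypothetical helper: extracts regular files from a TAR stream into destDir,
// rejecting entries whose paths resolve outside destDir.
static void ExtractTarFiles(Stream tarStream, string destDir)
{
    string fullDest = Path.GetFullPath(destDir);
    // Append a separator so "/out" doesn't accept a sibling such as "/out-evil".
    string destPrefix = Path.EndsInDirectorySeparator(fullDest)
        ? fullDest
        : fullDest + Path.DirectorySeparatorChar;

    using var reader = new TarReader(tarStream, leaveOpen: true);
    TarEntry? entry;
    while ((entry = reader.GetNextEntry()) is not null)
    {
        // Simplification: only regular files are extracted.
        if (entry.EntryType is not (TarEntryType.RegularFile or TarEntryType.V7RegularFile))
            continue;

        string destPath = Path.GetFullPath(Path.Join(fullDest, entry.Name));
        if (!destPath.StartsWith(destPrefix, StringComparison.Ordinal))
            throw new IOException($"Entry '{entry.Name}' resolves outside the destination directory.");

        // Nested names such as "dir/sub/file.txt" need their parent directory first.
        Directory.CreateDirectory(Path.GetDirectoryName(destPath)!);

        using var fileStream = File.Create(destPath);
        // DataStream is only valid until the next GetNextEntry() call, so copy now.
        entry.DataStream?.CopyTo(fileStream);
    }
}
```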
cc also @GrabYourPitchforks and @blowdart for wording
Summary
Add a guide explaining how to best work with Zip and Tar archives in .NET.