
Add Compression best practices guide #52968

Open
alinpahontu2912 wants to merge 5 commits into dotnet:main from alinpahontu2912:zip_tar_bestpractices


@alinpahontu2912 (Member) commented Apr 10, 2026

Summary

Add a guide explaining how to best work with Zip and Tar archives in .NET.


Internal previews

| 📄 File | 🔗 Preview link |
| --- | --- |
| docs/fundamentals/toc.yml | docs/fundamentals/toc |
| docs/standard/io/zip-tar-best-practices.md | Best practices for working with ZIP and TAR archives in .NET |

@rzikm (Member) left a comment

It's getting better; a few additional comments.


## Data integrity

ZIP entries include a CRC-32 checksum that you can use to verify data hasn't been corrupted or tampered with.

I think you should also mention TAR CRC in the first paragraph; otherwise, users might assume that this section is ZIP-only and skip the rest of it.

@alinpahontu2912 requested a review from rzikm April 16, 2026 13:13
@alinpahontu2912 marked this pull request as ready for review April 23, 2026 08:26
@alinpahontu2912 requested a review from adegeo as a code owner April 23, 2026 08:26
Copilot AI review requested due to automatic review settings April 23, 2026 08:26
@alinpahontu2912 requested a review from a team as a code owner April 23, 2026 08:26
Copilot AI (Contributor) left a comment

Pull request overview

Adds a new guidance article under File and stream I/O that explains how to work with ZIP and TAR archives in .NET, with a focus on API selection, safe extraction patterns, and operational considerations.

Changes:

  • Adds a new best-practices article for ZIP and TAR archives, including security guidance for untrusted input.
  • Adds a new C# snippet project and a consolidated Program.cs containing the referenced code regions.
  • Links the new article from docs/fundamentals/toc.yml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| docs/standard/io/zip-tar-best-practices.md | New best-practices guide covering API choice, trusted vs. untrusted extraction, memory/perf, platform differences, and encryption notes. |
| docs/standard/io/snippets/zip-tar-best-practices/csharp/Project.csproj | New snippet project targeting net11.0 for compiling the article snippets. |
| docs/standard/io/snippets/zip-tar-best-practices/csharp/Program.cs | Adds the C# snippet implementations referenced by the article. |
| docs/fundamentals/toc.yml | Adds a TOC entry pointing to the new best-practices article. |

Comment on lines +195 to +200
Both ZIP and TAR formats include CRC-32 checksums that you can use to verify data hasn't been corrupted or tampered with.

Starting with .NET 11, the runtime validates CRC-32 checksums automatically when reading ZIP entries. When you read an entry's data stream to completion, the runtime compares the computed CRC of the decompressed data against the checksum stored in the archive. If they don't match, an `InvalidDataException` is thrown. .NET 11 also validates CRC-32 checksums in TAR entry headers.

> [!NOTE]
> In prior versions of .NET, no CRC validation was performed on read. The runtime computed CRC values when writing entries (for storage in the archive), but never verified them during extraction. If you're targeting a runtime older than .NET 11, be aware that corrupt or tampered entries are silently accepted.
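A minimal C# sketch of the read-to-completion behavior described in this section: it drains each entry's stream into `Stream.Null` and treats an `InvalidDataException` as a failed entry. `FindCorruptZipEntry` is a hypothetical helper name, not a .NET API. On runtimes earlier than .NET 11 the same loop still runs, but, per the note above, a CRC-32 mismatch isn't reported there.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: returns the name of the first entry that fails to read,
// or null if every entry reads to completion. Reading to the end is what lets
// the runtime compare the computed CRC-32 with the stored value (.NET 11+).
static string? FindCorruptZipEntry(Stream zipStream)
{
    using var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        try
        {
            using Stream data = entry.Open();
            data.CopyTo(Stream.Null); // discard the bytes; only the full read matters
        }
        catch (InvalidDataException)
        {
            // CRC-32 mismatches surface here; malformed compressed data can too.
            return entry.FullName;
        }
    }
    return null;
}

// Usage: build a one-entry archive in memory, then verify it.
using var ms = new MemoryStream();
using (var writer = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
{
    using var sw = new StreamWriter(writer.CreateEntry("hello.txt").Open());
    sw.Write("hello");
}
ms.Position = 0;
Console.WriteLine(FindCorruptZipEntry(ms) ?? "all entries read cleanly");
```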

Copilot AI Apr 23, 2026


The “Data integrity” section incorrectly says TAR archives include CRC-32 checksums and that .NET 11 validates CRC-32 in TAR headers. TAR uses a header checksum (not CRC-32). Consider updating this section to describe ZIP CRC-32 separately from TAR header checksum validation (for example, align wording with docs/core/compatibility/core-libraries/11/tar-checksum-validation.md).

Suggested change
ZIP and TAR archives use different integrity checks. ZIP stores CRC-32 values for entry data. TAR stores a header checksum for each entry header.

Starting with .NET 11, the runtime validates ZIP CRC-32 values automatically when reading ZIP entries. When you read an entry's data stream to completion, the runtime compares the computed CRC-32 value of the decompressed data against the value stored in the archive. If the values don't match, an `InvalidDataException` is thrown. Starting with .NET 11, the runtime also validates TAR header checksums when reading TAR entry headers.

> [!NOTE]
> In versions earlier than .NET 11, the runtime didn't validate ZIP CRC-32 values on read. The runtime computed CRC-32 values when writing ZIP entries for storage in the archive, but didn't verify them during extraction. If you target a runtime earlier than .NET 11, corrupt or tampered ZIP entries might be accepted silently.
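To make the ZIP/TAR distinction concrete, the following sketch computes a TAR header checksum the way the ustar format defines it: the sum of all 512 header bytes, with the 8-byte checksum field at offset 148 counted as ASCII spaces, compared against the octal value stored in that field. The helper names are hypothetical, and the code only illustrates the format; it isn't how the runtime implements its own validation.

```csharp
using System;
using System.Formats.Tar;
using System.IO;
using System.Text;

// Hypothetical helper: unsigned sum of the 512 header bytes, with the
// checksum field itself (bytes 148-155) counted as ASCII spaces.
static int ComputeTarHeaderChecksum(byte[] header)
{
    int sum = 0;
    for (int i = 0; i < 512; i++)
    {
        sum += (i >= 148 && i < 156) ? (byte)' ' : header[i];
    }
    return sum;
}

// Hypothetical helper: the stored checksum is octal ASCII padded with NULs and/or spaces.
static int ReadStoredTarChecksum(byte[] header) =>
    Convert.ToInt32(Encoding.ASCII.GetString(header, 148, 8).Trim('\0', ' '), 8);

// Usage: write one ustar entry with TarWriter and check its first 512-byte block.
using var tarStream = new MemoryStream();
using (var writer = new TarWriter(tarStream, TarEntryFormat.Ustar, leaveOpen: true))
{
    writer.WriteEntry(new UstarTarEntry(TarEntryType.RegularFile, "hello.txt"));
}
byte[] firstBlock = tarStream.ToArray()[..512];
Console.WriteLine(ComputeTarHeaderChecksum(firstBlock) == ReadStoredTarChecksum(firstBlock));
```

Unlike ZIP's CRC-32, this checksum covers only the header block, so it can't detect corruption in the entry's data.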

Comment on lines +98 to +101
// uncompressed size. Note: this value is read from the archive header
// and could be spoofed by a malicious archive — for defense in depth,
// also monitor actual bytes read during decompression (see the zip
// bomb section for a streaming size check example).

Copilot AI Apr 23, 2026


The comment here says `entry.Length` “could be spoofed” and recommends monitoring actual decompressed bytes, but in modern .NET `ZipArchiveEntry.Open()` truncates decompressed output to the entry’s declared uncompressed size (see includes/core-changes/corefx/3.0/ziparchiveentry-and-inconsistent-entry-sizes.md). This note is misleading and also references a “streaming size check example” that isn’t present in the article. Reword or remove this guidance so it matches current runtime behavior and the examples provided.

Suggested change
// uncompressed size. In modern .NET, entry.Open() won't produce more
// than entry.Length bytes, so checking the declared size matches the
// runtime behavior that this sample relies on.
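A sketch of the declared-size check that this suggested comment refers to, assuming a caller-chosen limit: it sums `ZipArchiveEntry.Length` across entries and stops before the total passes `maxTotalBytes`. `DeclaredSizeWithinLimit` and the 512-byte limit in the usage lines are illustrative, not values from the article, and the check leans on the review's point that `entry.Open()` won't yield more than `entry.Length` bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: true if the sum of declared uncompressed sizes
// (ZipArchiveEntry.Length) stays within maxTotalBytes.
static bool DeclaredSizeWithinLimit(ZipArchive archive, long maxTotalBytes)
{
    long total = 0;
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Compare before adding so a huge declared Length can't overflow the running total.
        if (entry.Length > maxTotalBytes - total)
        {
            return false;
        }
        total += entry.Length;
    }
    return true;
}

// Usage: one 1,024-byte entry checked against a 512-byte limit.
using var zipStream = new MemoryStream();
using (var writer = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
{
    using Stream entryStream = writer.CreateEntry("data.bin").Open();
    entryStream.Write(new byte[1024]);
}
zipStream.Position = 0;
using var readArchive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);
Console.WriteLine(DeclaredSizeWithinLimit(readArchive, maxTotalBytes: 512));
```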

Comment on lines +257 to +259
string destPath = Path.Join(destDir, entry.Name);
using var fileStream = File.Create(destPath);
entry.DataStream.CopyTo(fileStream);

Copilot AI Apr 23, 2026


`TarStreamingRead` writes entries directly to `Path.Join(destDir, entry.Name)` without creating parent directories. For common TAR entries with nested paths, `File.Create(destPath)` will throw because the directory doesn’t exist. Consider creating the parent directory first (and, if the sample might be used with untrusted input, also reusing the path validation approach shown earlier in the article).
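One way to address both points in this comment, sketched under assumptions: a hypothetical `ExtractTarSafely` helper (not the article's `TarStreamingRead`) that creates parent directories before `File.Create` and rejects entries whose full path resolves outside the destination. It handles only directory and regular-file entries and skips links and other entry types.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Hypothetical helper (not the article's TarStreamingRead): streams a TAR archive
// into destDir, creating parent directories and rejecting entries that resolve
// outside destDir.
static void ExtractTarSafely(Stream tarStream, string destDir)
{
    string root = Path.GetFullPath(destDir);
    // Append a separator so "/out" can't prefix-match a sibling such as "/out-evil".
    string rootPrefix = Path.EndsInDirectorySeparator(root) ? root : root + Path.DirectorySeparatorChar;

    using var reader = new TarReader(tarStream, leaveOpen: true);
    TarEntry? entry;
    while ((entry = reader.GetNextEntry()) is not null)
    {
        string destPath = Path.GetFullPath(Path.Join(root, entry.Name));
        if (!destPath.StartsWith(rootPrefix, StringComparison.Ordinal))
        {
            throw new IOException($"Entry '{entry.Name}' resolves outside the destination directory.");
        }

        if (entry.EntryType == TarEntryType.Directory)
        {
            Directory.CreateDirectory(destPath);
        }
        else if (entry.EntryType is TarEntryType.RegularFile or TarEntryType.V7RegularFile)
        {
            // File.Create doesn't create missing directories, so create the parent first.
            Directory.CreateDirectory(Path.GetDirectoryName(destPath)!);
            using FileStream fileStream = File.Create(destPath);
            entry.DataStream?.CopyTo(fileStream);
        }
        // Symbolic links, hard links, and other entry types are skipped in this sketch.
    }
}
```

Called as `ExtractTarSafely(stream, "out")`, a nested entry such as `a/b/c.txt` lands in `out/a/b/c.txt`, while an entry named `../evil.txt` raises an `IOException`. The ordinal prefix comparison is case-sensitive; on Windows, `StringComparison.OrdinalIgnoreCase` might be the better fit.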

@rzikm (Member) commented Apr 24, 2026

cc also @GrabYourPitchforks and @blowdart for wording
