Deterministic, offline, map-based anonymization of IT infrastructure data in text files.
logmask is a Python CLI tool designed for MSP operational security. When engineers need to paste logs, configs, or transcripts into external tools (Claude, vendor support portals, community forums), they need to strip infrastructure identifiers first. logmask scans files for those identifiers, builds a persistent translation map, and performs single-pass replacement using an Aho-Corasick automaton. The tool is bidirectional: anonymize out, reveal back.
MSP engineers routinely share diagnostic data with third parties. Every log file, every config export, every support ticket risks leaking internal infrastructure topology: IP ranges, hostnames, domain names, user principal names, Active Directory SIDs. Manual redaction is slow, inconsistent, and error-prone.
logmask treats anonymization as a deterministic mapping problem rather than a search-and-destroy exercise. A persistent CSV map links each real identifier to a fake replacement. The same input with the same map always produces byte-identical output. Maps are human-readable and auditable; engineers can open them in Excel, hand-edit entries, and share them across teams.
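The README does not say how fake values are generated. The following minimal sketch shows one way to preserve determinism. It assumes hypothetical per-type formats (`HOST-0001` for hostnames, addresses from the RFC 5737 documentation range 192.0.2.0/24 for IPv4) and uses a plain dict in place of the CSV map. Existing entries are reused verbatim, and unseen values are numbered in sorted order, so the result does not depend on which file was scanned first. `extend_map` and `FAKE_FORMATS` are illustrative names, not logmask's API.

```python
# Hypothetical sketch: not logmask's actual fake-value generator.
# The map is modeled as {original_value: (id_type, fake_value)}.

FAKE_FORMATS = {
    "hostname": "HOST-{:04d}",
    "ipv4": "192.0.2.{}",  # RFC 5737 TEST-NET-1; a real generator needs overflow handling
}


def extend_map(existing, detected):
    """Return a new map with fakes for unseen values; existing entries are kept as-is."""
    mapping = dict(existing)
    # Continue numbering after the count of fakes already issued per type.
    counters = {}
    for id_type, _fake in mapping.values():
        counters[id_type] = counters.get(id_type, 0) + 1
    # Sorting makes assignment independent of the order values were discovered in.
    for id_type, value in sorted(set(detected)):
        if value in mapping:
            continue
        counters[id_type] = counters.get(id_type, 0) + 1
        mapping[value] = (id_type, FAKE_FORMATS[id_type].format(counters[id_type]))
    return mapping


first = extend_map({}, [("hostname", "SQL-PROD-03"), ("ipv4", "10.0.1.50"), ("hostname", "DC01")])
again = extend_map({}, [("hostname", "DC01"), ("hostname", "SQL-PROD-03"), ("ipv4", "10.0.1.50")])
assert first == again  # same inputs, same map, regardless of discovery order
print(first["DC01"][1], first["SQL-PROD-03"][1], first["10.0.1.50"][1])  # HOST-0001 HOST-0002 192.0.2.1
```

Sorting before assignment is the choice that matters here. Numbering in discovery order would make the map depend on filesystem enumeration order. The sketch assumes existing fakes were issued sequentially and does not check whether a fake collides with a real address.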
The replacement engine uses an Aho-Corasick automaton for single-pass, longest-match-wins substitution. Overlapping identifiers are therefore handled correctly without multiple passes or ordering dependencies. Examples include a hostname embedded in an FQDN, or an IP address that also appears as the network address of a CIDR string such as 192.168.1.0/24.
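logmask builds its automaton with pyahocorasick. To show the longest-match-wins idea without that dependency, here is a minimal pure-Python Aho-Corasick sketch. It is a teaching version, not logmask's `replacer.py`. It collects every match in one scan and keeps the leftmost-longest non-overlapping ones. Reveal reuses the same routine with an inverted map. The function names and the sample map are illustrative.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a collection of non-empty patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: each failure link points at the longest proper suffix
    # of the state's path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Single left-to-right scan yielding (start, pattern) for every occurrence."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i - len(pat) + 1, pat


def replace_longest(text, mapping):
    """Replace keys of `mapping` with their values; the leftmost-longest match wins."""
    matches = sorted(find_matches(text, build_automaton(mapping)),
                     key=lambda m: (m[0], -len(m[1])))
    pieces, cursor = [], 0
    for start, pat in matches:
        if start < cursor:
            continue  # overlaps a match that was already chosen
        pieces.append(text[cursor:start])
        pieces.append(mapping[pat])
        cursor = start + len(pat)
    pieces.append(text[cursor:])
    return "".join(pieces)  # output is never re-scanned, so fakes are not re-replaced


def invert(mapping):
    """Reveal needs a one-to-one map: each fake must identify exactly one original."""
    inverse = {fake: real for real, fake in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("duplicate fake values; reveal would be ambiguous")
    return inverse


fakes = {
    "SQL-PROD-03": "HOST-0001",
    "SQL-PROD-03.contoso.local": "host-0001.example.test",
    "contoso.local": "example.test",
}
line = "ping SQL-PROD-03.contoso.local and SQL-PROD-03"
masked = replace_longest(line, fakes)
print(masked)  # ping host-0001.example.test and HOST-0001
assert replace_longest(masked, invert(fakes)) == line  # round trip
```

The FQDN entry wins over both the bare hostname and the domain suffix because it starts at the same position and is longer. A per-pattern `str.replace` loop would instead depend on the order in which the loop ran.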
| Constraint | Detail |
|---|---|
| No build toolchain on endpoints | All dependencies install via pip install from pre-built wheels. No C/Rust compilation. No admin elevation. |
| Windows-first | Primary target: Entra-joined Windows 10/11 endpoints. Works in standard user context. |
| Offline execution | Zero network calls at runtime. No cloud APIs, no telemetry, no update checks. |
| Deterministic | Same input + same map = byte-identical output. Every time. |
| Human-readable maps | CSV format. Engineers can open, audit, and hand-edit maps in Excel/Notepad. |
| Milestone | Status | Description |
|---|---|---|
| Core build | ✅ Complete | All modules implemented, parsers working |
| Unit tests | ✅ Complete | Parsers, map engine, replacer, roundtrip |
| Post-review fixes | ✅ Complete | Bug documentation, code review applied |
| Known bug fixes | ⬜ Planned | IPv4 collision check, FQDN suffix leakage, lazy-loading consistency |
| Scanner/CLI tests | ⬜ Planned | Unit test coverage for scanner.py and cli.py |
| Real-world validation | ⬜ Planned | Testing against production log corpus |
| PyPI release | ⬜ Future | Package and publish |
The tool processes text files with comprehensive identifier detection across eight pattern types. The core replacement engine is correct and deterministic. Known issues are documented in AGENTS.md and marked inline in source.
| Type | Pattern Target | Example |
|---|---|---|
| ipv4 | RFC1918 private IPs | 10.0.1.50, 192.168.100.10 |
| cidr | Subnet notation | 192.168.1.0/24, 10.0.0.0/16 |
| hostname | NetBIOS and FQDNs | SQL-PROD-03, server.contoso.local |
| upn | User Principal Names | jsmith@contoso.com |
| guid | Entra object IDs, Azure GUIDs | a1b2c3d4-e5f6-7890-abcd-ef1234567890 |
| sid | Windows Security Identifiers | S-1-5-21-123-456-789-1001 |
| mac | MAC addresses | AA:BB:CC:11:22:33, 11-22-33-44-55-66 |
| unc | UNC paths | \\\\FILESVR\\Finance$ |
Five modules, no framework. Parsers are internal callables in a dictionary registry.
| Component | Module | Purpose |
|---|---|---|
| CLI | cli.py | argparse: 6 commands (init, scan, anonymize, reveal, map show, map add) |
| Scanner | scanner.py | Discovery engine: runs parsers, deduplicates, filters collisions |
| Parsers | parsers/ | Registry of detection functions, one per identifier type |
| Map Engine | map_engine.py | CSV map CRUD, scope merge (global + project), fake value generation |
| Replacer | replacer.py | Aho-Corasick automaton build + single-pass replace + reveal |
| Models | models.py | Frozen dataclasses: DetectedIdentifier, MapEntry, Config |
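The parser registry and the scanner's deduplication step can be sketched in a few lines. This minimal illustration assumes hypothetical field names (`id_type`, `value`) on `DetectedIdentifier` and two simplified regex parsers. It shows how frozen dataclasses let the scanner deduplicate with a plain set. It omits the collision filtering the scanner also performs.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DetectedIdentifier:
    """Hypothetical fields; frozen instances are hashable, so a set deduplicates them."""
    id_type: str
    value: str


UPN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
GUID = re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b")

# Registry: identifier type -> callable returning the matched strings.
PARSERS: dict[str, Callable[[str], list[str]]] = {
    "upn": UPN.findall,
    "guid": GUID.findall,
}


def scan_text(text):
    """Run every registered parser, deduplicate, and return a stable ordering."""
    found = {
        DetectedIdentifier(id_type, value)
        for id_type, parser in PARSERS.items()
        for value in parser(text)
    }
    return sorted(found, key=lambda d: (d.id_type, d.value))


log = ("jsmith@contoso.com signed in; object a1b2c3d4-e5f6-7890-abcd-ef1234567890; "
       "jsmith@contoso.com signed out")
for hit in scan_text(log):
    print(hit)  # two hits: one guid, one upn (the repeated UPN is deduplicated)
```

Adding an identifier type in this pattern means adding one entry to `PARSERS`; the scanner loop does not change.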
Translation maps are CSV files with two scope levels that merge at runtime:
| Scope | Location | Purpose |
|---|---|---|
| Global | %USERPROFILE%\.logmask\global_map.csv | MSP-wide constants (jump servers, monitoring hosts, corporate domain) |
| Project | ./.logmask/project_map.csv | Client-specific identifiers for this diagnostic bundle |
The project map overrides the global map when both contain the same original_value key. The merge happens at load time and never mutates either source file.
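The override rule can be illustrated with the standard library alone. logmask itself uses pandas for this step. This minimal sketch keys rows on `original_value`, the merge key named above. The `fake_value` column and the helper names `read_map` and `merge_scopes` are assumptions for this example.

```python
import csv
import io


def read_map(stream):
    """Index CSV rows by original_value."""
    return {row["original_value"]: row for row in csv.DictReader(stream)}


def merge_scopes(global_map, project_map):
    """Project entries override global ones; neither input dict is modified."""
    merged = dict(global_map)
    merged.update(project_map)
    return merged


# Column names other than original_value are assumptions for this example.
global_csv = io.StringIO(
    "original_value,fake_value\n"
    "JUMP-01,HOST-9001\n"
    "contoso.local,example.test\n"
)
project_csv = io.StringIO(
    "original_value,fake_value\n"
    "contoso.local,client.example\n"
    "SQL-PROD-03,HOST-0001\n"
)
merged = merge_scopes(read_map(global_csv), read_map(project_csv))
print({k: row["fake_value"] for k, row in merged.items()})
# {'JUMP-01': 'HOST-9001', 'contoso.local': 'client.example', 'SQL-PROD-03': 'HOST-0001'}
```

With pandas, the same "last one wins" rule could be expressed as `pd.concat([global_df, project_df]).drop_duplicates(subset="original_value", keep="last")`.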
logmask-python-library/
├── 📁 assets/ # Repository images
├── 📁 docs/
│   ├── 📁 documentation-standards/ # Templates, tagging strategy
│   └── 📄 logmask-buidl-spec-v1.md # Authoritative build specification
├── 📁 internal-files/ # Working documents
├── 📁 shared/ # Cross-project utilities
├── 📁 spec/ # Specifications
├── 📁 src/logmask/ # Source (PEP 621 src layout)
│   ├── 📄 cli.py # argparse CLI
│   ├── 📄 scanner.py # Discovery engine
│   ├── 📄 map_engine.py # CSV map CRUD, scope merge, fake generation
│   ├── 📄 replacer.py # Aho-Corasick automaton + single-pass replace
│   ├── 📄 models.py # Frozen dataclasses (data contracts)
│   └── 📁 parsers/ # Detection registry
├── 📁 staging/ # Staged work
├── 📁 tests/ # Unit tests
├── 📁 work-logs/ # Development history
├── 📄 AGENTS.md # Agent context
├── 📄 CLAUDE.md # Pointer to AGENTS.md
├── 📄 pyproject.toml # PEP 621 project config
├── 📄 LICENSE # MIT (code)
└── 📄 LICENSE-DATA # CC BY 4.0 (documentation)
- Python 3.10 or higher
- pip (no admin elevation required)
- Windows 10/11 (primary target) or any OS with Python 3.10+
# Clone the repository
git clone https://github.com/radioastronomyio/logmask-python-library.git
cd logmask-python-library
# Install in development mode
pip install -e ".[dev]"

All dependencies have pre-built Windows wheels on PyPI; no compiler toolchain is required:
| Package | Purpose |
|---|---|
| pyahocorasick >= 2.3.0 | Aho-Corasick automaton (C extension) |
| pandas | CSV map load/merge/write |
| rich | Terminal table output |
# Initialize a project (creates .logmask/ with empty map)
logmask init --client "Acme Corp"
# Scan files for infrastructure identifiers
logmask scan ./logs --ext .log .txt .json
# Anonymize: replace real values with fakes
logmask anonymize ./logs --out ./anonymized_logs
# Reveal: reverse the anonymization
logmask reveal ./anonymized_logs --out ./revealed_logs
# Inspect the translation map
logmask map show --scope project

# Run all tests
pytest
# Run with coverage
pytest --cov=src/logmask
# Run specific test file
pytest tests/test_parsers.py

Code is licensed under the MIT License. See LICENSE for details.
Documentation and non-code content is licensed under CC BY 4.0. See LICENSE-DATA for details.
- pyahocorasick for efficient multi-pattern matching
- Anthropic for Claude and the agent ecosystem that motivated this tool
Last Updated: 2026-03-29 | v0.1.0 Alpha | Core Build Complete

