radioastronomyio/logmask-python-library

🔒 logmask

Deterministic, offline, map-based anonymization of IT infrastructure data in text files.

logmask is a Python CLI tool designed for MSP operational security. When engineers paste logs, configs, or transcripts into external tools (Claude, vendor support portals, community forums), they need to strip infrastructure identifiers first. logmask scans files for those identifiers, builds a persistent translation map, and performs single-pass replacement using an Aho-Corasick automaton. The tool is bidirectional: anonymize out, reveal back.


🔭 Overview

The Problem

MSP engineers routinely share diagnostic data with third parties. Every log file, every config export, every support ticket risks leaking internal infrastructure topology: IP ranges, hostnames, domain names, user principal names, Active Directory SIDs. Manual redaction is slow, inconsistent, and error-prone.

The Approach

logmask treats anonymization as a deterministic mapping problem rather than a search-and-destroy exercise. A persistent CSV map links each real identifier to a fake replacement. The same input with the same map always produces byte-identical output. Maps are human-readable and auditable; engineers can open them in Excel, hand-edit entries, and share them across teams.
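Because the map pairs each real identifier with one fake, reveal can be pictured as applying the inverse mapping. The sketch below is illustrative only: `invert_map` is a hypothetical helper rather than logmask's API, the sample values are invented, and it assumes the map has already been loaded into a plain dict of original to fake values.

```python
def invert_map(mapping):
    """Return a fake -> original dict, refusing maps that cannot be reversed."""
    inverse = {}
    for original, fake in mapping.items():
        if fake in inverse:
            # Two originals sharing one fake would make reveal ambiguous.
            raise ValueError(f"fake value {fake!r} is used more than once")
        inverse[fake] = original
    return inverse


# Invented sample entries, for illustration only.
forward = {"SQL-PROD-03": "HOST-0001", "10.0.1.50": "10.200.0.1"}
reverse = invert_map(forward)
```

Inverting twice returns the original dict, which is the property a roundtrip (anonymize, then reveal) relies on.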

The replacement engine uses an Aho-Corasick automaton for single-pass, longest-match-wins substitution. This means overlapping identifiers (a hostname embedded in a FQDN, a subnet containing individual IPs) are handled correctly without multiple passes or ordering dependencies.
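The longest-match-wins rule can be illustrated without the automaton. The following is a minimal pure-Python sketch that walks a character trie from each position; it is a simpler (and in the worst case slower) stand-in for Aho-Corasick, not the code in replacer.py. The names `build_trie` and `replace_longest` and the sample map entries are hypothetical.

```python
def build_trie(mapping):
    """Build a character trie; a terminal node stores its replacement under the "" key."""
    root = {}
    for original, fake in mapping.items():
        node = root
        for ch in original:
            node = node.setdefault(ch, {})
        node[""] = fake  # "" can never collide with a single-character edge
    return root


def replace_longest(text, mapping):
    """Scan left to right once, taking the longest key that matches at each position."""
    trie = build_trie(mapping)
    out = []
    i = 0
    while i < len(text):
        node = trie
        best_end, best_fake = None, None
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "" in node:
                # Remember the longest complete key seen so far.
                best_end, best_fake = j, node[""]
        if best_end is None:
            out.append(text[i])  # no key starts here; copy the character
            i += 1
        else:
            out.append(best_fake)  # replaced output is never rescanned
            i = best_end
    return "".join(out)


# Invented sample entries: a hostname and an FQDN that contains it.
sample_map = {
    "SQL-PROD-03": "HOST-0001",
    "SQL-PROD-03.contoso.local": "host-0002.example.test",
}
masked = replace_longest("ping SQL-PROD-03.contoso.local from SQL-PROD-03", sample_map)
```

The FQDN is replaced as a whole even though the bare hostname is also a key, and neither entry depends on the order in which the map lists them.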

Critical Constraints

| Constraint | Detail |
| --- | --- |
| No build toolchain on endpoints | All dependencies install via pip install from pre-built wheels. No C/Rust compilation. No admin elevation. |
| Windows-first | Primary target: Entra-joined Windows 10/11 endpoints. Works in standard user context. |
| Offline execution | Zero network calls at runtime. No cloud APIs, no telemetry, no update checks. |
| Deterministic | Same input + same map = byte-identical output. Every time. |
| Human-readable maps | CSV format. Engineers can open, audit, and hand-edit maps in Excel/Notepad. |

📊 Project Status

| Milestone | Status | Description |
| --- | --- | --- |
| Core build | ✅ Complete | All modules implemented, parsers working |
| Unit tests | ✅ Complete | Parsers, map engine, replacer, roundtrip |
| Post-review fixes | ✅ Complete | Bug documentation, code review applied |
| Known bug fixes | ⬜ Planned | IPv4 collision check, FQDN suffix leakage, lazy-loading consistency |
| Scanner/CLI tests | ⬜ Planned | Unit test coverage for scanner.py and cli.py |
| Real-world validation | ⬜ Planned | Testing against production log corpus |
| PyPI release | ⬜ Future | Package and publish |

Current Capabilities (v0.1.0)

The tool processes text files with comprehensive identifier detection across eight pattern types. The core replacement engine is correct and deterministic. Known issues are documented in AGENTS.md and marked inline in source.


🎯 Identifier Types

| Type | Pattern Target | Example |
| --- | --- | --- |
| ipv4 | RFC1918 private IPs | 10.0.1.50, 192.168.100.10 |
| cidr | Subnet notation | 192.168.1.0/24, 10.0.0.0/16 |
| hostname | NetBIOS and FQDNs | SQL-PROD-03, server.contoso.local |
| upn | User Principal Names | jsmith@contoso.com |
| guid | Entra object IDs, Azure GUIDs | a1b2c3d4-e5f6-7890-abcd-ef1234567890 |
| sid | Windows Security Identifiers | S-1-5-21-123-456-789-1001 |
| mac | MAC addresses | AA:BB:CC:11:22:33, 11-22-33-44-55-66 |
| unc | UNC paths | \\FILESVR\Finance$ |
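As an illustration of what one of these detectors might look like, here is a hedged sketch for the ipv4 type. It pairs a loose regex with the standard-library `ipaddress` module so that only valid RFC1918 addresses are reported. The function name `find_private_ipv4` and the regex are assumptions for illustration, not the parser shipped in parsers/.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not glued to further digits or dots.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")

# The three private ranges defined by RFC 1918.
RFC1918_NETS = tuple(
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def find_private_ipv4(text):
    """Return RFC1918 addresses found in text, in order of appearance."""
    found = []
    for match in IPV4_CANDIDATE.finditer(text):
        try:
            addr = ipaddress.ip_address(match.group())
        except ValueError:
            continue  # e.g. an octet above 255
        if any(addr in net for net in RFC1918_NETS):
            found.append(match.group())
    return found


line = "src=10.0.1.50 dst=8.8.8.8 gw=192.168.100.10, bad=999.1.1.1 edge=172.32.0.1"
hits = find_private_ipv4(line)
```

Public addresses such as 8.8.8.8 and out-of-range candidates such as 999.1.1.1 are skipped. Note that this sketch would also report the address part of a CIDR string, so a real scanner needs some rule for that overlap.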

πŸ—οΈ Architecture

Five modules plus a parsers package, no framework. Parsers are internal callables held in a dictionary registry.
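A dictionary registry of this kind can be sketched in a few lines. Everything named below (`PARSERS`, `parse_sid`, `parse_mac`, `scan`) is hypothetical and only illustrates the pattern of mapping a type name to a detection callable; the actual signatures in parsers/ may differ.

```python
import re
from typing import Callable, Dict, List

Parser = Callable[[str], List[str]]


def parse_sid(text: str) -> List[str]:
    """Find Windows SIDs such as S-1-5-21-123-456-789-1001."""
    return re.findall(r"\bS-1-\d+(?:-\d+)+\b", text)


def parse_mac(text: str) -> List[str]:
    """Find MAC addresses written with colon or hyphen separators."""
    return re.findall(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b", text)


# The registry: identifier type name -> detection callable.
PARSERS: Dict[str, Parser] = {"sid": parse_sid, "mac": parse_mac}


def scan(text: str, enabled=None) -> Dict[str, List[str]]:
    """Run the selected parsers (all by default) and deduplicate each result."""
    names = enabled if enabled is not None else list(PARSERS)
    return {name: sorted(set(PARSERS[name](text))) for name in names}


report = scan("owner S-1-5-21-123-456-789-1001 nic AA:BB:CC:11:22:33 and 11-22-33-44-55-66")
```

Adding a new identifier type under this pattern means writing one function and adding one dictionary entry, with no changes to the scanning loop.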

Components

| Component | Module | Purpose |
| --- | --- | --- |
| CLI | cli.py | argparse: 6 commands (init, scan, anonymize, reveal, map show, map add) |
| Scanner | scanner.py | Discovery engine: runs parsers, deduplicates, filters collisions |
| Parsers | parsers/ | Registry of detection functions, one per identifier type |
| Map Engine | map_engine.py | CSV map CRUD, scope merge (global + project), fake value generation |
| Replacer | replacer.py | Aho-Corasick automaton build + single-pass replace + reveal |
| Models | models.py | Frozen dataclasses: DetectedIdentifier, MapEntry, Config |

Map Architecture

Translation maps are CSV files with two scope levels that merge at runtime:

| Scope | Location | Purpose |
| --- | --- | --- |
| Global | %USERPROFILE%\.logmask\global_map.csv | MSP-wide constants (jump servers, monitoring hosts, corporate domain) |
| Project | ./.logmask/project_map.csv | Client-specific identifiers for this diagnostic bundle |

The project map overrides the global map when both contain the same original_value key. The merge happens when the maps are loaded at runtime and never mutates either source file.
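The override rule can be sketched with the standard library. This is an assumption-laden illustration rather than map_engine.py: it uses `csv` instead of pandas, reads from in-memory strings instead of the two files, and the `fake_value` column name is hypothetical (only `original_value` is named above).

```python
import csv
import io


def load_map(csv_text):
    """Parse CSV text into an original -> fake dict (column names partly assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["original_value"]: row["fake_value"] for row in reader}


def merge_scopes(global_map, project_map):
    """Return a new dict; project entries win on key collision, inputs are untouched."""
    merged = dict(global_map)
    merged.update(project_map)
    return merged


# Invented sample data standing in for global_map.csv and project_map.csv.
global_csv = (
    "original_value,fake_value\n"
    "jump01.contoso.local,host-0001.example.test\n"
    "10.0.1.50,10.200.0.1\n"
)
project_csv = (
    "original_value,fake_value\n"
    "10.0.1.50,10.250.0.9\n"
    "SQL-PROD-03,HOST-0002\n"
)

global_map = load_map(global_csv)
project_map = load_map(project_csv)
merged = merge_scopes(global_map, project_map)
```

Here 10.0.1.50 appears in both scopes, so the merged view carries the project's replacement while `global_map` itself still holds its own value.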


πŸ“ Repository Structure

logmask-python-library/
├── 📂 assets/                      # Repository images
├── 📂 docs/
│   ├── 📂 documentation-standards/ # Templates, tagging strategy
│   └── 📄 logmask-buidl-spec-v1.md # Authoritative build specification
├── 📂 internal-files/              # Working documents
├── 📂 shared/                      # Cross-project utilities
├── 📂 spec/                        # Specifications
├── 📂 src/logmask/                 # Source (PEP 621 src layout)
│   ├── 📄 cli.py                   # argparse CLI
│   ├── 📄 scanner.py               # Discovery engine
│   ├── 📄 map_engine.py            # CSV map CRUD, scope merge, fake generation
│   ├── 📄 replacer.py              # Aho-Corasick automaton + single-pass replace
│   ├── 📄 models.py                # Frozen dataclasses (data contracts)
│   └── 📂 parsers/                 # Detection registry
├── 📂 staging/                     # Staged work
├── 📂 tests/                       # Unit tests
├── 📂 work-logs/                   # Development history
├── 📄 AGENTS.md                    # Agent context
├── 📄 CLAUDE.md                    # Pointer to AGENTS.md
├── 📄 pyproject.toml               # PEP 621 project config
├── 📄 LICENSE                      # MIT (code)
└── 📄 LICENSE-DATA                 # CC BY 4.0 (documentation)

🚀 Getting Started

Prerequisites

  • Python 3.10 or higher
  • pip (no admin elevation required)
  • Windows 10/11 (primary target) or any OS with Python 3.10+

Installation

# Clone the repository
git clone https://github.com/radioastronomyio/logmask-python-library.git
cd logmask-python-library

# Install in development mode
pip install -e ".[dev]"

All dependencies have pre-built Windows wheels on PyPI; no compiler toolchain required:

| Package | Purpose |
| --- | --- |
| pyahocorasick >= 2.3.0 | Aho-Corasick automaton (C extension) |
| pandas | CSV map load/merge/write |
| rich | Terminal table output |

Quick Start

# Initialize a project (creates .logmask/ with empty map)
logmask init --client "Acme Corp"

# Scan files for infrastructure identifiers
logmask scan ./logs --ext .log .txt .json

# Anonymize: replace real values with fakes
logmask anonymize ./logs --out ./anonymized_logs

# Reveal: reverse the anonymization
logmask reveal ./anonymized_logs --out ./revealed_logs

# Inspect the translation map
logmask map show --scope project

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=src/logmask

# Run specific test file
pytest tests/test_parsers.py

📄 License

Code is licensed under the MIT License. See LICENSE for details.

Documentation and non-code content is licensed under CC BY 4.0. See LICENSE-DATA for details.


πŸ™ Acknowledgments

  • pyahocorasick for efficient multi-pattern matching
  • Anthropic for Claude and the agent ecosystem that motivated this tool

Last Updated: 2026-03-29 | v0.1.0 Alpha | Core Build Complete
