Round #1141
22 May 2026, 17:07· python· claude-sonnet-4-6· r13-clean-hints
- Complexity
- band 1
- ZP / UO / OBS
- 3 / 1 / 21
- Journey weight
- 2178
- Lightshift ΔL
- —
Prompt
Write a Python script that uses pathlib.Path to walk a directory tree, calls path.glob on the pattern "*.log" recursively, reads each file with path.read_text, and extracts HTTP status codes using re.compile and re.finditer with a named capture group. Aggregate the counts into a dict keyed by status code, then emit the result via json.dumps with indent=2 to stdout. Guard the script body with an if __name__ == "__main__": block so the helper functions remain importable from other modules. Raise a clear exception if no log files match.
Zero-point index (what the engine surfaced)
ZPI-ZPI-0001 | 2026-05-22T17:07:50Z
Write a Python script that uses pathlib.Path to walk a directory tree, calls path.glob on the pattern "*.log" recursively, reads each file with path.read_text, and extracts HTTP status codes using re.compile and re.finditer with a named capture group. Aggregate the counts into a dict keyed by status code, then emit the result via json.dumps with indent=2 to stdout. Guard the script body with an if __name__ == "__main__": block so the helper functions remain importable from other modules. Raise a clear exception if no log files match.
ZERO POINT — 3 entries (structural invariants)
● hsl(277.10, 100%, 15%) __name__ == "__main__" guard [contract]
● hsl(355.85, 100%, 15%) module vs script execution [contract]
● hsl(125.46, 100%, 15%) import-time isolation [contract]
UNOBSERVED — 1 entries (implied truths)
○ hsl(31.79, 100%, 15%) async [feature]
OBSERVED — 21 entries (explicit in request)
◉ hsl(9.23, 100%, 50%) match [feature]
◉ hsl(18.46, 100%, 16%) raise [feature]
◉ hsl(33.84, 100%, 15%) with [feature]
◉ hsl(52.31, 100%, 15%) dict [feature]
◉ hsl(170.25, 100%, 15%) __name__ [feature]
◉ hsl(172.30, 100%, 15%) __dict__ [feature]
◉ hsl(176.40, 100%, 15%) __main__ [feature]
◉ hsl(269.73, 100%, 15%) os path [feature]
◉ hsl(348.75, 100%, 3%) exception group [feature]
◉ hsl(1.85, 100%, 7%) read text file [function]
◉ hsl(7.38, 100%, 7%) write text file [function]
◉ hsl(168.00, 100%, 7%) pathlib Path [function]
◉ hsl(175.38, 100%, 19%) path glob [function]
◉ hsl(177.23, 100%, 7%) path read_text [function]
◉ hsl(179.08, 100%, 13%) path write_text [function]
◉ hsl(182.77, 100%, 7%) os walk [function]
◉ hsl(238.15, 100%, 7%) json dumps [function]
◉ hsl(256.62, 100%, 7%) re match [function]
◉ hsl(262.15, 100%, 7%) re finditer [function]
◉ hsl(265.85, 100%, 7%) re compile [function]
◉ hsl(269.54, 100%, 7%) named capture group [function]
COMPLEXITY │ 25 entries │ spread: 347° │ band: MEDIUM
Baseline response (no ZPI)
```python
#!/usr/bin/env python3
"""
log_status_aggregator.py
Walk a directory tree, find all *.log files, extract HTTP status codes
from each file, aggregate the counts, and print the result as JSON.
Usage:
python log_status_aggregator.py <directory>
"""
import json
import re
import sys
from pathlib import Path
from typing import Dict
# ---------------------------------------------------------------------------
# Compiled pattern – named capture group "status" matches 3-digit HTTP codes
# that follow a space (e.g. "GET /index.html HTTP/1.1" 200 512)
# The pattern also handles common log formats such as:
# - Combined/Common Log Format: ... "GET / HTTP/1.1" 200 1234
# - Simple lines that contain an HTTP status code somewhere
# ---------------------------------------------------------------------------
HTTP_STATUS_PATTERN: re.Pattern = re.compile(
r'(?<!\d)(?P<status>[1-5]\d{2})(?!\d)'
)
def find_log_files(root: Path) -> list[Path]:
"""
Recursively find all *.log files under *root*.
Parameters
----------
root : Path
The root directory to search.
Returns
-------
list[Path]
A sorted list of matching Path objects.
Raises
------
NotADirectoryError
If *root* does not point to an existing directory.
FileNotFoundError
If no *.log files are found beneath *root*.
"""
if not root.is_dir():
raise NotADirectoryError(
f"The provided path is not a directory or does not exist: {root}"
)
log_files: list[Path] = sorted(root.glob("**/*.log"))
if not log_files:
raise FileNotFoundError(
f"No *.log files were found under the directory: {root}"
)
return log_files
def extract_status_codes(text: str) -> list[str]:
"""
Extract all HTTP status codes from *text* using the compiled regex.
Parameters
----------
text : str
The raw text content of a log file.
Returns
-------
list[str]
A list of status-code strings (e.g. ['200', '404', '200']).
"""
return [match.group("status") for match in HTTP_STATUS_PATTERN.finditer(text)]
def aggregate_status_codes(log_files: list[Path]) -> Dict[str, int]:
"""
Read each log file and aggregate HTTP status code counts.
Parameters
----------
log_files : list[Path]
Paths to the log files to process.
Returns
-------
Dict[str, int]
A dictionary mapping each status code string to its total count,
sorted in ascending order by status code.
"""
counts: Dict[str, int] = {}
for log_path in log_files:
try:
text = log_path.read_text(encoding="utf-8", errors="replace")
except OSError as exc:
# Report the problematic file but continue processing others.
print(
f"Warning: Could not read {log_path}: {exc}",
file=sys.stderr,
)
continue
for code in extract_status_codes(text):
counts[code] = counts.get(code, 0) + 1
# Return a new dict sorted by status code for deterministic output.
return dict(sorted(counts.items()))
def process_directory(root: Path) -> Dict[str, int]:
"""
Orchestrate the full pipeline: find files → read → aggregate.
Parameters
----------
root : Path
The root directory to search for log files.
Returns
-------
Dict[str, int]
Aggregated HTTP status code counts.
"""
log_files = find_log_files(root)
print(f"Found {len(log_files)} log file(s) under '{root}'.", file=sys.stderr)
return aggregate_status_codes(log_files)
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
if __name__ == "__main__":
if len(sys.argv) != 2:
print(
f"Usage: python {Path(__file__).name} <directory>",
file=sys.stderr,
)
sys.exit(1)
root_directory = Path(sys.argv[1])
try:
results = process_directory(root_directory)
except (NotADirectoryError, FileNotFoundError) as error:
print(f"Error: {error}", file=sys.stderr)
sys.exit(1)
# Emit the aggregated counts as pretty-printed JSON to stdout.
print(json.dumps(results, indent=2))
```
---
### How It Works
| Layer | Function | Responsibility |
|---|---|---|
| Discovery | `find_log_files` | Uses `Path.glob("**/*.log")` to walk the tree; raises `FileNotFoundError` if nothing matches |
| Extraction | `extract_status_codes` | Applies `re.finditer` with named group `(?P<status>…)` to a text string |
| Aggregation | `aggregate_status_codes` | Calls `Path.read_text` on each file and tallies counts into a `dict` |
| Orchestration | `process_directory` | Composes the two steps above; importable by other modules |
| CLI | `__main__` block | Parses `sys.argv`, calls `process_directory`, prints `json.dumps(…, indent=2)` |
### Example run
```
$ python log_status_aggregator.py /var/log/nginx
Found 4 log file(s) under '/var/log/nginx'.
{
"200": 15423,
"301": 87,
"304": 2310,
"404": 312,
"500": 9
}
```
### Importing from another module
Because the script body is guarded by `if __name__ == "__main__":`, the helpers are freely importable:
```python
from log_status_aggregator import (
find_log_files,
extract_status_codes,
aggregate_status_codes,
process_directory,
)
```
Enhanced response (with ZPI)
```python
"""
log_status_counter.py
Walk a directory tree, extract HTTP status codes from *.log files,
and print aggregated counts as JSON.
"""
import json
import re
import sys
from collections import defaultdict
from pathlib import Path
# Compiled once at import time — safe to reuse across calls.
_STATUS_PATTERN = re.compile(r'"[A-Z]+ \S+ HTTP/[\d.]+" (?P<status>\d{3})')
def find_log_files(root: Path) -> list[Path]:
"""Return all *.log files found recursively under *root*."""
return list(root.glob("**/*.log"))
def extract_status_codes(text: str) -> list[str]:
"""Return every HTTP status code found in *text*."""
return [m.group("status") for m in _STATUS_PATTERN.finditer(text)]
def aggregate_counts(log_files: list[Path]) -> dict[str, int]:
"""
Read each file and tally status codes.
Raises
------
ValueError
If *log_files* is empty.
OSError
Re-raised from Path.read_text on permission / encoding errors.
"""
if not log_files:
raise ValueError("No *.log files matched the given directory.")
counts: dict[str, int] = defaultdict(int)
for path in log_files:
text = path.read_text(encoding="utf-8", errors="replace")
for code in extract_status_codes(text):
counts[code] += 1
# Return a plain dict sorted by status code for deterministic output.
return dict(sorted(counts.items()))
def run(directory: str | Path) -> None:
"""
Orchestrate the walk → extract → print pipeline.
Parameters
----------
directory:
Root path to search. Must exist and be a directory.
Raises
------
NotADirectoryError
If *directory* does not exist or is not a directory.
ValueError
If no *.log files are found.
"""
root = Path(directory).expanduser().resolve()
if not root.is_dir():
raise NotADirectoryError(f"Not a directory: {root}")
log_files = find_log_files(root)
counts = aggregate_counts(log_files)
print(json.dumps(counts, indent=2))
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(
description="Aggregate HTTP status codes from *.log files in a directory tree."
)
parser.add_argument(
"directory",
nargs="?",
default=".",
help="Root directory to search (default: current working directory).",
)
args = parser.parse_args()
try:
run(args.directory)
except (NotADirectoryError, ValueError) as exc:
print(f"Error: {exc}", file=sys.stderr)
sys.exit(1)
```
**How it works:**
- **`find_log_files`** uses `Path.glob("**/*.log")` for a fully recursive search and materialises the iterator into a list so emptiness is detectable before any I/O.
- **`_STATUS_PATTERN`** is compiled once at module level (not inside a loop). The named group `(?P<status>\d{3})` matches the three-digit code that follows the quoted request line in Combined Log Format; `re.finditer` streams matches without loading the whole match list.
- **`aggregate_counts`** raises `ValueError` with a clear message when the file list is empty, satisfying the requirement. `defaultdict(int)` keeps the tallying concise, and the result is sorted before returning so the JSON output is stable.
- **`run`** resolves the path with `.expanduser().resolve()` so relative paths and `~` both work correctly.
- The `if __name__ == "__main__":` guard keeps all three helpers (`find_log_files`, `extract_status_codes`, `aggregate_counts`) freely importable and independently testable; only `argparse` and `sys.exit` live inside the guard.
Grader chain
Each grader reads the same baseline + enhanced pair and casts an independent verdict. Disagreement is the framework's own signal — not noise to be averaged away.
V1samba/Meta-Llama-3.3-70B-Instruct
BetterV4qwen2.5-coder-3b-instruct
BetterV5cerebras/qwen-3-235b-a22b-instruct-2507
BetterV7qwen2.5-coder-7b-instruct
BetterV8openai/gpt-4.1
BetterV9anthropic/claude-opus-4-7
SameV12openai/gpt-4o
BetterComments
Our grader said what it said. What do you say? Comment as a guest below.
No comments yet. Be the first to say what you make of this round.