Canonical Benchmark Specification

Purpose

The 0.1.0 scientific benchmark for pyEUVTools should not depend solely on a previously generated GX-style response SAV whose provenance is incomplete.

The canonical benchmark set should instead be a family of raw IDL AIA temperature-response artifacts produced directly from aia_get_response with fully recorded generation settings.

Effective SSW state model

For compatibility work, the important distinction is between requested keyword state and effective SSW response state.

Keyword states a user can request:

  1. no evenorm, no chiantifix
  2. evenorm only
  3. chiantifix requested without evenorm
  4. evenorm and chiantifix

Effective SSW states that matter scientifically:

  1. raw: no evenorm, no chiantifix
  2. evenorm: evenorm applied, chiantifix not applied
  3. evenorm_chiantifix: both applied

The nominal chiantifix-only request is not an independent scientific state. In SSW, requesting chiantifix for a temperature response forces evenorm on, so that request normalizes to the evenorm_chiantifix state.
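This request-to-effective-state normalization can be sketched in Python (the function name and the state labels as return values are illustrative conventions for this document, not part of any SSW API):

```python
def effective_state(evenorm: bool, chiantifix: bool) -> str:
    """Map a requested (evenorm, chiantifix) keyword pair to the
    effective SSW response state.

    Requesting chiantifix for a temperature response forces evenorm
    on, so the chiantifix-only request collapses into
    evenorm_chiantifix rather than forming an independent state.
    """
    if chiantifix:
        # SSW forces evenorm on whenever chiantifix is requested.
        return "evenorm_chiantifix"
    if evenorm:
        return "evenorm"
    return "raw"
```

Note that the four requested states map onto only three effective states, which is why the benchmark ladder below has three scientific rungs plus one optional behavior check.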

Benchmark ladder

The benchmark ladder should be built in this order:

  1. raw baseline: no evenorm, no chiantifix
  2. evenorm-only benchmark
  3. evenorm_chiantifix benchmark
  4. optional chiantifix-only request artifact as a behavior check demonstrating that SSW normalizes it to evenorm_chiantifix

Canonical IDL calls

Raw baseline:

response = aia_get_response(timedepend_date=obs_time_vms, /temperature, /dn, evenorm=0, chiantifix=0)

evenorm only:

response = aia_get_response(timedepend_date=obs_time_vms, /temperature, /dn, /evenorm, chiantifix=0)

evenorm_chiantifix:

response = aia_get_response(timedepend_date=obs_time_vms, /temperature, /dn, /evenorm, /chiantifix)

The benchmark script may wrap these calls for logging, metadata capture, and file writing, but the scientific source object for parity should be the direct IDL response structure in each case.

The repository draft script is:

  • scripts/idl/GenerateCanonicalAIABenchmark.pro

Initial benchmark date

The initial benchmark date for 0.1.0 should be pinned to:

  • 2025-11-26T15:34:31.400

This keeps the first raw benchmark aligned with the existing SunCAST test-model epoch already used elsewhere in the workspace, while replacing the incomplete legacy fixture provenance with a direct and reproducible IDL source artifact.

Required benchmark artifacts

For 0.1.0, the no-correction raw benchmark artifact is the primary baseline.

The evenorm-only and evenorm_chiantifix artifacts are strongly recommended follow-on references for isolating correction-layer differences.

The derived GX-style compatibility artifact is explicitly deferred until after the first scientific release target is met.

1. Raw baseline artifact

This is the primary scientific baseline. It should contain the direct IDL aia_get_response output for the no-correction state plus a provenance structure.

2. Correction-layer follow-on artifacts

These artifacts should use the same observation time and generator contract, but with correction keywords varied one layer at a time:

  • evenorm
  • evenorm_chiantifix

An optional chiantifix-request artifact may also be generated to document that SSW normalizes this request to the evenorm_chiantifix effective state.

Recommended saved variables:

  • raw_response: the direct IDL response returned by aia_get_response
  • metadata: a provenance structure with the required fields listed below

Deferred artifact: GX-style compatibility layer

GX-style derived artifacts may be added later, but they are not part of the minimum benchmark contract for validating the Python implementation or issuing the first public release.

Required provenance fields

The metadata structure for a benchmark artifact must record at least:

  • instrument
  • obs_time
  • timedepend_date
  • evenorm
  • chiantifix
  • requested_state
  • effective_state
  • idl_version
  • ssw_root or another stable SSW context identifier
  • generator
  • generation_time_utc
  • source_effarea_file
  • source_emissivity_file
  • response_units
  • warnings_observed

Strongly recommended additional fields:

  • effarea_version
  • emiss_version
  • source_model
  • notes
  • benchmark_role, for example raw_reference
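A minimal Python-side check of this metadata contract might look like the following sketch. The field names come from the required list above; the set name and the validation helper are illustrative, not an existing pyEUVTools API:

```python
# Required provenance fields from the benchmark contract.
# "ssw_root" may be replaced by another stable SSW context identifier.
REQUIRED_FIELDS = {
    "instrument", "obs_time", "timedepend_date", "evenorm", "chiantifix",
    "requested_state", "effective_state", "idl_version", "ssw_root",
    "generator", "generation_time_utc", "source_effarea_file",
    "source_emissivity_file", "response_units", "warnings_observed",
}

def missing_provenance_fields(metadata: dict) -> list[str]:
    """Return the required provenance fields absent from a metadata
    mapping, sorted for stable error reporting."""
    return sorted(REQUIRED_FIELDS - metadata.keys())
```

A benchmark loader could refuse an artifact whenever `missing_provenance_fields` returns a non-empty list, making the provenance contract enforceable rather than advisory.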

Warning capture

The benchmark generation process should record whether IDL emitted floating-point warnings such as divide-by-zero, underflow, overflow, or illegal operand.

These warnings do not automatically invalidate the benchmark, but they are part of the provenance and must be recorded so the benchmark can be regenerated and audited consistently.
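If the IDL console output is captured as text during generation, the warnings_observed field could be populated with a simple scan like the sketch below. The patterns follow the warning classes named above and are assumptions about log phrasing (real IDL messages such as "Floating divide by 0" may word these differently), so they should be adjusted against actual captured logs:

```python
import re

# Floating-point warning classes from the provenance contract.
# Pattern wording is an assumption; verify against real IDL output.
WARNING_PATTERNS = {
    "divide_by_zero": re.compile(r"divide by (?:zero|0)", re.IGNORECASE),
    "underflow": re.compile(r"underflow", re.IGNORECASE),
    "overflow": re.compile(r"overflow", re.IGNORECASE),
    "illegal_operand": re.compile(r"illegal operand", re.IGNORECASE),
}

def warnings_observed(idl_log_text: str) -> list[str]:
    """Scan captured IDL console output and return the warning
    classes seen, suitable for the warnings_observed metadata field."""
    return [name for name, pattern in WARNING_PATTERNS.items()
            if pattern.search(idl_log_text)]
```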

Benchmark decision rule

For 0.1.0, scientific parity should be evaluated against the raw no-correction benchmark artifact first.

After the raw baseline is matched, the evenorm layer should be validated, and only then should the chiantifix layer be added.

The derived GX-style artifact is a downstream compatibility target, not the primary scientific reference.

Python parity target

The intended order of implementation is:

  1. Reproduce the raw IDL AIA temperature-response structure in Python.
  2. Add the evenorm correction layer and validate it separately.
  3. Add the chiantifix correction layer in the same post-fold place where SSW applies it.
  4. Document any remaining scientific mismatches explicitly.
  5. Release 0.1.0 once the raw benchmark parity target and provenance requirements are satisfied.
  6. Add GX-style compatibility artifacts later as a follow-on milestone.
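The raw-parity gate in step 1 can be sketched as a relative-difference comparison (pure Python for clarity; the helper names and the default tolerance are illustrative assumptions, and the actual 0.1.0 tolerance should come from the benchmark decision rule):

```python
def max_relative_difference(python_values, idl_values, floor=1e-30):
    """Maximum element-wise relative difference between two response
    curves, with a small floor to avoid dividing by zero where the
    IDL response is numerically zero."""
    return max(
        abs(p - i) / max(abs(i), floor)
        for p, i in zip(python_values, idl_values)
    )

def passes_raw_parity(python_values, idl_values, rtol=1e-6):
    # Illustrative gate against the raw no-correction benchmark;
    # the release tolerance is a project decision, not this constant.
    return max_relative_difference(python_values, idl_values) <= rtol
```

The same gate can then be reused unchanged for the evenorm and evenorm_chiantifix layers, which keeps the ladder validation uniform across correction states.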

Interim status

The currently vendored canonical baseline artifact is the raw no-correction flavor and lives in:

  • benchmark-data/aia/20251126T153431/aia_raw_response_20251126T153431_raw.sav

The previously vendored evenorm_chiantifix artifact remains useful as the next correction-layer reference and still lives alongside it:

  • benchmark-data/aia/20251126T153431/aia_raw_response_20251126T153431.sav

The older pyGXrender-test-data AIA response fixture remains useful for legacy structural checks, but it is no longer the primary scientific release benchmark.

Expected generator contract

The canonical IDL benchmark script should:

  1. Accept a fixed observation time.
  2. Call aia_get_response with explicit temperature, DN, and correction-state settings.
  3. Save the raw response structure.
  4. Save full provenance metadata, including requested and applied state.
  5. Emit or record checksums for the benchmark artifacts.

The first draft of this contract is implemented in:

  • scripts/idl/GenerateCanonicalAIABenchmark.pro

Repository policy

Because the canonical AIA benchmark fixture is small, it is reasonable to vendor benchmark artifacts directly in pyEUVTools for reproducible testing, as long as provenance and checksums are tracked alongside them.

Generated comparison outputs, caches, timing logs, and benchmark summaries are not part of that canonical benchmark-data contract. Those derived workflow artifacts should live under backend-specific directories in benchmark-results/ so preserved fiasco reference runs and later hybrid runs do not overwrite one another by default.

The initial vendored raw baseline checksum is tracked alongside the artifact in:

  • benchmark-data/aia/20251126T153431/aia_raw_response_20251126T153431_raw.sha256