DatasetModel

============

In-memory model orchestrating:

  • The dataset AP matrix (systems × topics).

  • Streaming NSGA-II execution for BEST/WORST and sampling for AVERAGE.

  • Compact caches so aggregation/info queries don’t retain large populations.

Output artifacts

  • -Fun: objective rows (K, correlation) streamed during runs.

  • -Var: genotype rows as Base64 masks ("B64:<...>") streamed with -Fun.

    • BEST/WORST → only when the per-K representative improves (monotone).

    • AVERAGE → exactly one row per K.

  • -Top: top solutions per cardinality (≤ 10 per K), maintained via block replacement.
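
The "B64:<...>" payload layout is not spelled out on this page. As a rough illustration only, the sketch below packs a boolean topic mask into bytes and Base64-encodes it with the JDK encoder. The helper names `encodeMask` and `decodeMask` and the bit order (least significant bit first within each byte) are assumptions, not the library's documented format.

```kotlin
import java.util.Base64

// Hypothetical helpers; the LSB-first bit order within each byte is an assumption.
fun encodeMask(mask: BooleanArray): String {
    val bytes = ByteArray((mask.size + 7) / 8)
    for (i in mask.indices) {
        if (mask[i]) {
            // Set bit (i % 8) of byte (i / 8).
            bytes[i / 8] = (bytes[i / 8].toInt() or (1 shl (i % 8))).toByte()
        }
    }
    return "B64:" + Base64.getEncoder().encodeToString(bytes)
}

// Decodes back to a mask of exactly numberOfTopics entries; missing bits read as false.
fun decodeMask(encoded: String, numberOfTopics: Int): BooleanArray {
    require(encoded.startsWith("B64:")) { "Missing B64: prefix" }
    val bytes = Base64.getDecoder().decode(encoded.removePrefix("B64:"))
    return BooleanArray(numberOfTopics) { i ->
        i / 8 < bytes.size && ((bytes[i / 8].toInt() shr (i % 8)) and 1) == 1
    }
}
```

Storing one short string per K instead of a full solution object is what keeps the streamed -Var rows and the in-memory caches small.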

Branches

  • AVERAGE:

    • For each K = 1..N, sample numberOfRepetitions subsets.

    • Correlate vs full-set mean vector, stream exactly one FUN/VAR row.

    • Percentiles computed from the same samples. TOP unused.

  • BEST/WORST:

    • NSGA-II objective encoding (internal):

      • BEST → obj0 = +K, obj1 = -corr

      • WORST → obj0 = -K, obj1 = +corr

    • Per generation: select representative per K (natural scale), stream on improvement, and update a persistent TOP pool.
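
The sign conventions listed for BEST/WORST can be sketched as a pair of conversions between the natural scale (K, correlation) and the internal objectives. This assumes the optimizer minimizes both objectives, which is consistent with the signs given but is not stated on this page. `Target`, `toInternal`, `toNatural`, and `improves` are hypothetical names used only for illustration.

```kotlin
enum class Target { BEST, WORST }

// Natural-scale (K, correlation) -> internal objectives, following the listed signs.
fun toInternal(target: Target, k: Int, corr: Double): DoubleArray = when (target) {
    Target.BEST -> doubleArrayOf(k.toDouble(), -corr)
    Target.WORST -> doubleArrayOf(-k.toDouble(), corr)
}

// Internal objectives -> natural-scale (K, correlation).
fun toNatural(target: Target, objectives: DoubleArray): Pair<Int, Double> = when (target) {
    Target.BEST -> Pair(objectives[0].toInt(), -objectives[1])
    Target.WORST -> Pair((-objectives[0]).toInt(), objectives[1])
}

// True when a candidate correlation should replace the current per-K representative
// (compared on the natural scale): higher is better for BEST, lower for WORST.
fun improves(target: Target, candidate: Double, current: Double?): Boolean = when {
    current == null -> true
    target == Target.BEST -> candidate > current
    else -> candidate < current
}
```

Comparing on the natural scale keeps the "stream on improvement" rule readable: the per-K correlation only ever rises for BEST and only ever falls for WORST.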

RAM-focused design

  • After load/expansion, call sealData to drop boxed AP maps and keep dense PrecomputedData.

  • During runs, cache only:

    • per-K representative correlation: corrByK

    • per-K representative mask as Base64: repMaskB64ByK

  • TOP pool uses lightweight entries (TopEntry) without holding full solutions.
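
A minimal sketch of what such per-run caches might look like is shown below. Only the map names `corrByK` and `repMaskB64ByK` come from the description above. The class `RepresentativeCache`, its `offer` and `clear` methods, and the `higherIsBetter` flag are assumptions made for illustration.

```kotlin
// Hypothetical stand-in for the per-run caches; only corrByK and repMaskB64ByK are names from the text.
class RepresentativeCache(private val higherIsBetter: Boolean) {
    val corrByK = HashMap<Int, Double>()
    val repMaskB64ByK = HashMap<Int, String>()

    // Stores the candidate only if it beats the cached representative for k.
    // Returns true when the entry was stored, i.e. when a row would be streamed.
    fun offer(k: Int, corr: Double, maskB64: String): Boolean {
        val current = corrByK[k]
        val better = current == null || (if (higherIsBetter) corr > current else corr < current)
        if (better) {
            corrByK[k] = corr
            repMaskB64ByK[k] = maskB64
        }
        return better
    }

    // Drops everything once writers and aggregations are done with the model.
    fun clear() {
        corrByK.clear()
        repMaskB64ByK.clear()
    }
}
```

Each K costs one Double and one short string here, so memory grows with the number of cardinalities rather than with population size or generation count.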

Constructors

constructor()

Properties

Boxed AP rows only during load/expansion. Cleared by sealData.

Wall-clock computing time in milliseconds (last run).

Full-set mean AP per system (boxed for API compatibility).

AVERAGE branch percentiles (materialized at the end).

Kept for API compatibility; no longer populated to avoid per-topic maps.

Functions

Drop per-run caches (representative masks, correlation map, TOP pools). Call after writers have closed and aggregations have consumed the model.

Drop percentile lists (often large) once final CSV/Parquet tables have been written.

fun expandSystems(expansionCoefficient: Int, trueNumberOfSystems: Int, randomizedAveragePrecisions: Map<String, DoubleArray>, randomizedSystemLabels: Array<String>)

Expand systems either by reverting to an original prefix (if fewer than trueNumberOfSystems) or by appending randomized systems with their AP rows and labels.

fun expandTopics(expansionCoefficient: Int, randomizedAveragePrecisions: Map<String, DoubleArray>, randomizedTopicLabels: Array<String>)

Expand topics by appending randomized columns to AP rows and labels.

Find cached correlation for a cardinality (if present).

fun isTopicInASolutionOfCardinality(topicLabel: String, cardinality: Double): Boolean

O(1) presence lookup using the cached representative mask per K (decoded on demand).
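
As a rough illustration of such a lookup, the sketch below resolves the topic label to a column index through a hash map and then reads one entry of the per-K mask. The class `PresenceIndex`, its constructor parameters, and the choice to return false for an unknown label or an uncached K are assumptions. The masks are taken as already decoded, so the on-demand Base64 decoding step is left out.

```kotlin
// Hypothetical model fragment: every name except the function signature is an assumption.
class PresenceIndex(topicLabels: Array<String>, private val maskByK: Map<Int, BooleanArray>) {
    // Label -> column index, built once so each query is a constant-time hash lookup.
    private val indexByLabel: Map<String, Int> =
        topicLabels.withIndex().associate { (i, label) -> label to i }

    fun isTopicInASolutionOfCardinality(topicLabel: String, cardinality: Double): Boolean {
        val mask = maskByK[cardinality.toInt()] ?: return false   // no representative cached for K
        val index = indexByLabel[topicLabel] ?: return false      // unknown topic label
        return index < mask.size && mask[index]
    }
}
```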

fun loadData(datasetPath: String)

Load dataset AP matrix from a CSV file.

Return the presence mask for K encoded as "B64:<base64>" (always sized to current numberOfTopics).

Return a copy of the representative presence mask for K, or null if none cached.

fun retrieveMaskForCardinalitySized(cardinality: Double, expectedSize: Int): BooleanArray

Return the presence mask for K sized exactly to expectedSize.
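
One plausible way to produce a mask of an exact size is to pad with false or truncate, which the Kotlin standard library's `BooleanArray.copyOf(newSize)` does in one call. The helper name `sizeMask` is hypothetical, and whether the real function pads, truncates, or rejects a size mismatch is not stated on this page.

```kotlin
// Hypothetical helper: pad with false or truncate so the result has exactly expectedSize entries.
fun sizeMask(mask: BooleanArray, expectedSize: Int): BooleanArray {
    require(expectedSize >= 0) { "expectedSize must be non-negative" }
    // copyOf returns a new array, so callers never alias the cached mask.
    return mask.copyOf(expectedSize)
}
```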

fun sealData()

Seal the dataset before solving.

fun solve(parameters: Parameters, out: SendChannel<ProgressEvent>? = null): Triple<List<BinarySolution>, List<BinarySolution>, Triple<String, String, Long>>

Run the experiment (AVERAGE | BEST | WORST) and stream progress events to consumers.