DatasetModel

class DatasetModel

DatasetModel
============

In-memory model orchestrating:

* The dataset AP matrix (systems × topics).
* Streaming NSGA-II execution for BEST/WORST and sampling for AVERAGE.
* Compact caches so aggregation/info queries don’t retain large populations.

Output artifacts

* -Fun: objective rows (K, correlation) streamed during runs.
* -Var: genotype rows as Base64 masks ("B64:<...>") streamed with -Fun.
  * BEST/WORST → only when the per-K representative improves (monotone).
  * AVERAGE → exactly one row per K.
* -Top: top solutions per cardinality (≤ 10 per K), maintained via block replacement.
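
The "B64:<...>" genotype rows can be pictured with a minimal sketch. The bit layout is an assumption made for illustration (bit i, least significant bit first within each byte, marks topic i), and `encodeMaskB64` and `decodeMaskB64` are hypothetical helper names, not members of DatasetModel.

```kotlin
import java.util.Base64

// Assumption: topic i is stored in bit (i % 8) of byte (i / 8), LSB first.
// The real on-disk layout used by the -Var writer may differ.
fun encodeMaskB64(mask: BooleanArray): String {
    val packed = ByteArray((mask.size + 7) / 8)
    for (i in mask.indices) {
        if (mask[i]) {
            packed[i / 8] = (packed[i / 8].toInt() or (1 shl (i % 8))).toByte()
        }
    }
    return "B64:" + Base64.getEncoder().encodeToString(packed)
}

// Inverse of encodeMaskB64; positions beyond the packed bytes decode to false.
fun decodeMaskB64(text: String, numberOfTopics: Int): BooleanArray {
    require(text.startsWith("B64:")) { "missing B64: prefix" }
    val packed = Base64.getDecoder().decode(text.removePrefix("B64:"))
    return BooleanArray(numberOfTopics) { i ->
        i / 8 < packed.size && ((packed[i / 8].toInt() shr (i % 8)) and 1) == 1
    }
}
```

Under this layout a mask of 50 topics occupies 7 bytes before Base64 encoding, which is what keeps the per-K cache small compared with holding full solutions.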

Branches

* AVERAGE:
  * For each K = 1..N, sample numberOfRepetitions subsets.
  * Correlate each subset against the full-set mean vector and stream exactly one FUN/VAR row per K.
  * Percentiles are computed from the same samples. TOP is unused.
* BEST/WORST:
  * NSGA-II objective encoding (internal):
    * BEST → obj0 = +K, obj1 = -corr
    * WORST → obj0 = -K, obj1 = +corr
  * Per generation: select the representative per K (natural scale), stream it on improvement, and update a persistent TOP pool.
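
The BEST/WORST encoding and the "stream on improvement" rule can be sketched as follows. This assumes the optimizer minimizes both objectives, which is consistent with the signs listed above but is not stated by the source; `Target`, `encodeObjectives`, `naturalCorrelation`, and `RepresentativeTracker` are hypothetical names used only for illustration.

```kotlin
enum class Target { BEST, WORST }

// Internal objectives for a subset of k topics with correlation corr
// (assumption: both objectives are minimized by the optimizer).
fun encodeObjectives(target: Target, k: Int, corr: Double): DoubleArray = when (target) {
    Target.BEST -> doubleArrayOf(k.toDouble(), -corr)
    Target.WORST -> doubleArrayOf(-k.toDouble(), corr)
}

// Map the internal correlation objective back to its natural scale.
fun naturalCorrelation(target: Target, objectives: DoubleArray): Double = when (target) {
    Target.BEST -> -objectives[1]
    Target.WORST -> objectives[1]
}

// Keeps one representative correlation per K and reports whether a
// candidate improves on it (higher for BEST, lower for WORST).
class RepresentativeTracker(private val target: Target) {
    private val corrByK = HashMap<Int, Double>()

    fun offer(k: Int, corr: Double): Boolean {
        val current = corrByK[k]
        val better = current == null ||
            (if (target == Target.BEST) corr > current else corr < current)
        if (better) corrByK[k] = corr
        return better
    }
}
```

A caller would write a FUN/VAR row only when `offer` returns true, which is what makes the streamed sequence for each K monotone.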

RAM-focused design

* After load/expansion, call sealData to drop boxed AP maps and keep dense PrecomputedData.
* During runs, cache only:
  * per-K representative correlation: corrByK
  * per-K representative mask as Base64: repMaskB64ByK
* TOP pool uses lightweight entries (TopEntry) without holding full solutions.
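
As an illustration of a TOP pool built from lightweight entries, the sketch below keeps at most 10 entries per K and replaces the whole per-K block on each update. The fields of `TopEntry` and the reading of "block replacement" as "merge, re-rank, and swap the per-K list" are assumptions; the actual class may differ.

```kotlin
// Hypothetical lightweight entry: no reference to a full solution object.
data class TopEntry(val k: Int, val corr: Double, val maskB64: String)

class TopPool(private val bestFirst: Boolean, private val limit: Int = 10) {
    private val byK = HashMap<Int, List<TopEntry>>()

    // Merge candidates into the block for k, drop duplicate masks,
    // re-rank, and replace the stored block with the top `limit` entries.
    fun update(k: Int, candidates: List<TopEntry>) {
        val merged = (byK[k].orEmpty() + candidates).distinctBy { it.maskB64 }
        val ordered = if (bestFirst) merged.sortedByDescending { it.corr }
                      else merged.sortedBy { it.corr }
        byK[k] = ordered.take(limit)
    }

    fun entries(k: Int): List<TopEntry> = byK[k].orEmpty()
}
```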

Constructors

DatasetModel

constructor()

Properties

averagePrecisions

var averagePrecisions: LinkedHashMap<String, DoubleArray>

Boxed AP rows, held only during load/expansion and cleared by sealData.

computingTime

var computingTime: Long

Wall-clock computing time in milliseconds (last run).

correlationMethod

var correlationMethod: String

currentExecution

var currentExecution: Int

datasetName

var datasetName: String

expansionCoefficient

var expansionCoefficient: Int

meanAveragePrecisions

var meanAveragePrecisions: DoubleArray

Full-set mean AP per system (boxed for API compatibility).
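
A minimal sketch of how such a vector could be derived from AP rows keyed by system label, assuming one row per system with one AP value per topic (`meanPerSystem` is a hypothetical helper, not part of the class):

```kotlin
// One mean per system, in the map's iteration order (insertion order
// for a LinkedHashMap). Each row holds that system's AP on every topic.
fun meanPerSystem(averagePrecisions: Map<String, DoubleArray>): DoubleArray =
    averagePrecisions.values.map { row -> row.average() }.toDoubleArray()
```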

numberOfIterations

var numberOfIterations: Int

numberOfRepetitions

var numberOfRepetitions: Int

numberOfSystems

var numberOfSystems: Int

numberOfTopics

var numberOfTopics: Int

percentiles

var percentiles: LinkedHashMap<Int, List<Double>>

AVERAGE branch percentiles (materialized at the end).
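
For illustration, a percentile over the correlations sampled for one K might be computed as below. The nearest-rank method is an assumption; the source does not say which percentile definition is used, and `percentile` is a hypothetical helper.

```kotlin
import kotlin.math.ceil

// Nearest-rank percentile: the smallest sample such that at least
// p percent of the samples are less than or equal to it.
fun percentile(samples: List<Double>, p: Int): Double {
    require(samples.isNotEmpty()) { "no samples" }
    require(p in 1..100) { "p must be in 1..100" }
    val sorted = samples.sorted()
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceIn(1, sorted.size)
    return sorted[rank - 1]
}
```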

populationSize

var populationSize: Int

systemLabels

var systemLabels: Array<String>

targetToAchieve

var targetToAchieve: String

topicDistribution

var topicDistribution: LinkedHashMap<String, MutableMap<Double, Boolean>>

Kept for API compatibility; no longer populated to avoid per-topic maps.

topicLabels

var topicLabels: Array<String>

Functions

clearAfterSerialization

fun clearAfterSerialization()

Drop per-run caches (representative masks, correlation map, TOP pools). Call after writers have closed and aggregations have consumed the model.

clearPercentiles

fun clearPercentiles()

Drop percentile lists (often large) once final CSV/Parquet tables have been written.

expandSystems

fun expandSystems(expansionCoefficient: Int, trueNumberOfSystems: Int, randomizedAveragePrecisions: Map<String, DoubleArray>, randomizedSystemLabels: Array<String>)

Expand systems either by reverting to a prefix of the original systems (if fewer than the true number, trueNumberOfSystems) or by appending randomized systems together with their AP rows and labels.

expandTopics

fun expandTopics(expansionCoefficient: Int, randomizedAveragePrecisions: Map<String, DoubleArray>, randomizedTopicLabels: Array<String>)

Expand topics by appending randomized columns to AP rows and labels.
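
Appending columns can be sketched as a row-wise concatenation. The sketch assumes that randomizedAveragePrecisions is keyed by system label and holds only the new columns for each system; `appendTopicColumns` is a hypothetical helper, not the method itself.

```kotlin
// Returns new rows in which each system's AP row is extended with its
// randomized columns; fails fast if a system has no new values.
fun appendTopicColumns(
    rows: Map<String, DoubleArray>,
    extra: Map<String, DoubleArray>
): LinkedHashMap<String, DoubleArray> {
    val out = LinkedHashMap<String, DoubleArray>()
    for ((system, row) in rows) {
        val added = extra[system] ?: error("no randomized AP values for $system")
        out[system] = row + added
    }
    return out
}

// Topic labels grow in the same order as the appended columns.
fun appendTopicLabels(labels: Array<String>, extra: Array<String>): Array<String> =
    labels + extra
```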

findCorrelationForCardinality

fun findCorrelationForCardinality(cardinality: Double): Double?

Return the cached correlation for a cardinality, or null if none is cached.

isTopicInASolutionOfCardinality

fun isTopicInASolutionOfCardinality(topicLabel: String, cardinality: Double): Boolean

O(1) presence lookup using the cached representative mask per K (decoded on demand).

loadData

fun loadData(datasetPath: String)

Load dataset AP matrix from a CSV file.

retrieveMaskB64ForCardinality

fun retrieveMaskB64ForCardinality(cardinality: Double): String

Return the presence mask for K encoded as "B64:<base64>" (always sized to the current numberOfTopics).

retrieveMaskForCardinality

fun retrieveMaskForCardinality(cardinality: Double): BooleanArray?

Return a copy of the representative presence mask for K, or null if none cached.

retrieveMaskForCardinalitySized

fun retrieveMaskForCardinalitySized(cardinality: Double, expectedSize: Int): BooleanArray

Return the presence mask for K sized exactly to expectedSize.
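
One way to picture the sizing step, assuming a shorter mask is padded with false, a longer one is truncated, and a missing mask yields all false (none of which is confirmed by the source; `sizedMask` is a hypothetical helper):

```kotlin
// copyOf(newSize) pads with false or truncates; the result is always a
// fresh array, so callers cannot mutate the cached mask.
fun sizedMask(mask: BooleanArray?, expectedSize: Int): BooleanArray =
    mask?.copyOf(expectedSize) ?: BooleanArray(expectedSize)
```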

sealData

fun sealData()

Seal the dataset before solving: drop the boxed AP maps and keep only the dense PrecomputedData.

solve

fun solve(parameters: Parameters, out: SendChannel<ProgressEvent>? = null): Triple<List<BinarySolution>, List<BinarySolution>, Triple<String, String, Long>>

Run the experiment (AVERAGE | BEST | WORST) and stream progress events to consumers.