ParquetView

ParquetView

Streaming‑first Parquet writer for FUN, VAR, and TOP tables, plus generic “final” tables (Aggregated/Info). Mirrors CSVView semantics.

Responsibilities

  • Resolve deterministic output paths via ViewPaths.

  • Stream or snapshot write:

  • FUN: (K, Correlation)

  • VAR: (K, TopicsB64) where topics are a Base64‑packed boolean mask.

  • TOP (BEST/WORST only): (K, Correlation, TopicsB64).

  • Write arbitrary final tables (header‑driven UTF‑8 schema).

Platform

  • Compression: -Dnbs.parquet.codec=SNAPPY|GZIP|UNCOMPRESSED; default SNAPPY (non‑Windows), GZIP (Windows).

  • SNAPPY tempdir auto‑provisioning when needed.

  • Windows without winutils.exe: falls back to a pure‑NIO OutputFile.

Ordering

  • On close: FUN/VAR rows sorted by (K asc, corr asc) except WORST (K asc, corr desc).

Safety

  • Toggle: -Dnbs.parquet.enabled=false makes methods no‑ops.

  • Internal buffers cleared on closeStreams.

Constructors

Link copied to clipboard
constructor()

Functions

Link copied to clipboard

Flush all buffered rows to Parquet and clear internal state.

Link copied to clipboard
fun getAggregatedDataParquetPath(model: DatasetModel, isTargetAll: Boolean = false): String

Build the aggregated data Parquet path.

Link copied to clipboard
fun getInfoParquetPath(model: DatasetModel, isTargetAll: Boolean = false): String

Build the info Parquet path.

Link copied to clipboard

Buffer one streamed FUN/VAR row (Topics normalized to B64:) for later write.

Link copied to clipboard

Merge/replace per‑K TOP blocks from incoming CSV lines.

Link copied to clipboard
fun printSnapshot(model: DatasetModel, allSolutions: List<BinarySolution>, topSolutions: List<BinarySolution>, actualTarget: String)

Write a full snapshot of FUN/VAR (and TOP when applicable) to Parquet.

Link copied to clipboard
fun writeTable(rows: List<Array<String>>, outPath: String)

Write a header‑driven UTF‑8 table to Parquet.