Parquet View
ParquetView
Streaming‑first Parquet writer for FUN, VAR, and TOP tables, plus generic “final” tables (Aggregated/Info). Mirrors CSVView semantics.
Responsibilities
Resolve deterministic output paths via ViewPaths.
Stream or snapshot write:
  FUN: (K, Correlation).
  VAR: (K, TopicsB64), where topics are a Base64‑packed boolean mask.
  TOP (BEST/WORST only): (K, Correlation, TopicsB64).
Write arbitrary final tables (header‑driven UTF‑8 schema).
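The TopicsB64 column can be illustrated with a small sketch of a Base64‑packed boolean mask. The class name TopicMaskCodec, the bit order (least significant bit first within each byte), and the need to pass the mask length to the decoder are assumptions made for illustration; the actual packing used by ParquetView is not specified here.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of ParquetView: packs a boolean mask into
// bytes (8 flags per byte) and encodes the bytes as standard Base64.
class TopicMaskCodec {

    // Assumed bit order: flag i goes to bit (i % 8) of byte (i / 8).
    static String encode(boolean[] mask) {
        byte[] packed = new byte[(mask.length + 7) / 8];
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                packed[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return Base64.getEncoder().encodeToString(packed);
    }

    // The mask length is passed explicitly because padding bits in the
    // last byte cannot be told apart from "false" flags.
    static boolean[] decode(String b64, int length) {
        byte[] packed = Base64.getDecoder().decode(b64);
        boolean[] mask = new boolean[length];
        for (int i = 0; i < length; i++) {
            mask[i] = (packed[i / 8] & (1 << (i % 8))) != 0;
        }
        return mask;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, true, false, false, false, false, true};
        String b64 = encode(mask);
        System.out.println("encoded: " + b64);
        System.out.println("round trip ok: " + Arrays.equals(mask, decode(b64, mask.length)));
    }
}
```

Packing eight flags per byte before Base64 encoding keeps the column short even for masks with many topics, at the cost of needing the topic count to decode.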
Platform
Compression: -Dnbs.parquet.codec=SNAPPY|GZIP|UNCOMPRESSED; default SNAPPY (non‑Windows), GZIP (Windows).
SNAPPY tempdir auto‑provisioning when needed.
Windows without winutils.exe: falls back to a pure‑NIO OutputFile.
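The codec selection described above could be resolved roughly as follows. This is a minimal sketch: the class ParquetCodecResolver and its enum are hypothetical, the OS check via the os.name system property is an assumption, and how the real writer treats an unrecognized value is not documented (here it simply throws).

```java
import java.util.Locale;

// Hypothetical resolver, not part of ParquetView: picks a codec from the
// nbs.parquet.codec system property, with an OS-dependent default.
class ParquetCodecResolver {

    enum Codec { SNAPPY, GZIP, UNCOMPRESSED }

    // configured: raw property value (may be null); osName: value of os.name.
    static Codec resolve(String configured, String osName) {
        if (configured != null && !configured.trim().isEmpty()) {
            // Assumption: unknown names fail fast with IllegalArgumentException.
            return Codec.valueOf(configured.trim().toUpperCase(Locale.ROOT));
        }
        boolean windows = osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? Codec.GZIP : Codec.SNAPPY;
    }

    public static void main(String[] args) {
        Codec codec = resolve(System.getProperty("nbs.parquet.codec"),
                              System.getProperty("os.name"));
        System.out.println("codec: " + codec);
    }
}
```

Running with `-Dnbs.parquet.codec=GZIP` on any platform would select GZIP in this sketch; with the property unset, the result depends on the operating system.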
Ordering
On close: FUN/VAR rows sorted by (K asc, corr asc), except WORST (K asc, corr desc).
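The ordering rule can be expressed as a comparator chain. The sketch below assumes a hypothetical Row type with an integer K and a double correlation, and a boolean flag marking the WORST case; none of these names come from ParquetView itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the on-close ordering, not ParquetView code.
class RowOrdering {

    static final class Row {
        final int k;
        final double correlation;

        Row(int k, double correlation) {
            this.k = k;
            this.correlation = correlation;
        }
    }

    // K ascending always; correlation ascending, or descending for WORST.
    static Comparator<Row> comparatorFor(boolean worst) {
        Comparator<Row> byK = Comparator.comparingInt(r -> r.k);
        Comparator<Row> byCorr = Comparator.comparingDouble(r -> r.correlation);
        return byK.thenComparing(worst ? byCorr.reversed() : byCorr);
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(2, 0.4));
        rows.add(new Row(1, 0.9));
        rows.add(new Row(1, 0.7));
        rows.sort(comparatorFor(true));
        for (Row r : rows) {
            System.out.println(r.k + "," + r.correlation);
        }
    }
}
```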
Safety
Toggle: -Dnbs.parquet.enabled=false makes methods no‑ops.
Internal buffers cleared on closeStreams.
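A guard of this kind might look like the sketch below. GuardedBuffer is a hypothetical stand-in for the writer: it assumes that any value other than "false" (case-insensitive) leaves Parquet output enabled, and it only models the buffer clearing, not the actual Parquet write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the writer's safety behavior, not ParquetView code.
class GuardedBuffer {

    private final List<String> rows = new ArrayList<>();

    // Assumption: output is enabled unless the property is exactly "false".
    static boolean enabled() {
        String value = System.getProperty("nbs.parquet.enabled", "true");
        return !"false".equalsIgnoreCase(value.trim());
    }

    // Becomes a no-op when the toggle is off.
    void append(String row) {
        if (!enabled()) {
            return;
        }
        rows.add(row);
    }

    int size() {
        return rows.size();
    }

    // A real writer would flush rows to Parquet here before clearing.
    void closeStreams() {
        rows.clear();
    }

    public static void main(String[] args) {
        GuardedBuffer buffer = new GuardedBuffer();
        buffer.append("1,0.5");
        System.out.println("buffered: " + buffer.size());
        buffer.closeStreams();
        System.out.println("after close: " + buffer.size());
    }
}
```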
Functions
Flush all buffered rows to Parquet and clear internal state.
Build the aggregated data Parquet path.
Build the info Parquet path.
Buffer one streamed FUN/VAR row (Topics normalized to B64:) for later write.
Merge/replace per‑K TOP blocks from incoming CSV lines.
Write a full snapshot of FUN/VAR (and TOP when applicable) to Parquet.
Write a header‑driven UTF‑8 table to Parquet.
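The per‑K merge/replace of TOP blocks listed above can be sketched as a map keyed by K in which each incoming batch replaces the whole block for every K it mentions. The class TopBlockMerger, the assumed line layout (K as the first comma-separated field), and the choice to skip lines whose first field is not an integer (such as a header) are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of per-K block replacement, not ParquetView code.
class TopBlockMerger {

    // Blocks kept sorted by K; each block is the list of CSV lines for that K.
    private final Map<Integer, List<String>> blocks = new TreeMap<>();

    void merge(List<String> csvLines) {
        Map<Integer, List<String>> incoming = new LinkedHashMap<>();
        for (String line : csvLines) {
            String trimmed = line.trim();
            int comma = trimmed.indexOf(',');
            if (comma < 0) {
                continue; // blank or malformed line
            }
            int k;
            try {
                k = Integer.parseInt(trimmed.substring(0, comma).trim());
            } catch (NumberFormatException e) {
                continue; // e.g. a header line such as "K,Correlation,TopicsB64"
            }
            incoming.computeIfAbsent(k, key -> new ArrayList<>()).add(trimmed);
        }
        // Every K present in the batch replaces its previous block entirely.
        blocks.putAll(incoming);
    }

    // Flattened view of all blocks in ascending K order.
    List<String> snapshot() {
        List<String> all = new ArrayList<>();
        for (List<String> block : blocks.values()) {
            all.addAll(block);
        }
        return all;
    }

    public static void main(String[] args) {
        TopBlockMerger merger = new TopBlockMerger();
        merger.merge(List.of("K,Correlation,TopicsB64", "1,0.9,AQ==", "2,0.8,Ag=="));
        merger.merge(List.of("1,0.95,Aw=="));
        merger.snapshot().forEach(System.out::println);
    }
}
```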