Build the per-owner RawFileTable CSV (provider file inventory, post-archive safe).
buildRawFileTable.Rd
For each station, reads raw.owner/record.json and emits one row
per provider file (one per ComponentID x FileID). Handles both states:
- raw.owner/record.json present on disk (not yet archived).
- raw.owner.tar.gz archive (extracts record.json via streaming tar -xzOf to avoid touching disk).
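The archived case can be illustrated with a small, self-contained sketch. This is not the package's internal code: the staging directory name and the use of `system2()` are assumptions made for illustration, and it presumes a `tar` binary on the PATH that supports `-O` (write member to stdout), as GNU tar and bsdtar do.

```r
# Sketch only: stage a tiny raw.owner/ directory, archive it, then read
# record.json straight from the archive without extracting it to disk.
stage <- file.path(tempdir(), "gmsp-tar-sketch")   # hypothetical location
unlink(stage, recursive = TRUE)
dir.create(file.path(stage, "raw.owner"), recursive = TRUE)
writeLines('{"Station": {"StationID": "S1"}}',
           file.path(stage, "raw.owner", "record.json"))

# Create raw.owner.tar.gz next to the staged directory.
archive <- file.path(stage, "raw.owner.tar.gz")
system2("tar", c("-czf", shQuote(archive), "-C", shQuote(stage), "raw.owner"))

# -O sends the named member to stdout; stdout = TRUE captures it as lines.
json_lines <- system2(
  "tar",
  c("-xzOf", shQuote(archive), "raw.owner/record.json"),
  stdout = TRUE
)
record <- jsonlite::fromJSON(paste(json_lines, collapse = "\n"),
                             simplifyVector = FALSE)
record$Station$StationID
```

Because the member is streamed to stdout, no temporary copy of record.json is written, which is the point of the `-O` flag here.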
Details
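Schema (columns, in output order): OwnerID, EventID, StationID,
ComponentID, FileID, NP, dt, Fs, Units, HP, LP, isArray.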
PGA is intentionally not emitted here. Pre-parse PGAs from
record.json are in heterogeneous provider units; canonical
post-parse PGAs (in mm/s^2) live in RawIntensityTable.<Owner>.csv.
Canonical-schema fields that a provider does not supply are emitted as
typed NA columns, so the output schema stays the same for every owner.
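A minimal sketch of how typed NA filling might work is shown below. The names `canonical_defaults` and `fill_canonical()` are hypothetical, and the column types are assumptions chosen for illustration; they are not taken from the package source.

```r
# Hypothetical canonical columns with typed NA defaults (types are assumed).
canonical_defaults <- list(
  ComponentID = NA_character_,
  FileID      = NA_character_,
  NP          = NA_integer_,
  dt          = NA_real_,
  Units       = NA_character_,
  HP          = NA_real_
)

# Start from the typed defaults and overwrite only the fields the provider
# actually supplied; the result always has the canonical names and order.
fill_canonical <- function(component, defaults = canonical_defaults) {
  out <- defaults
  present <- intersect(names(defaults), names(component))
  for (nm in present) {
    if (!is.null(component[[nm]])) out[[nm]] <- component[[nm]]
  }
  out
}

# A provider record that lacks dt, Units and HP.
row <- fill_canonical(list(ComponentID = "H1", FileID = "H1.txt", NP = 4L))
str(row)
```

Keeping the NA values typed means that rows from different providers can be bound together without column-type conflicts.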
isArray = nComponentID > 3 (heuristic per legacy convention).
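The heuristic can be sketched with data.table as follows. This assumes that nComponentID means the number of distinct ComponentID values per event and station; the table `files` and its contents are made up for illustration.

```r
library(data.table)

# Made-up inventory: S1 has three components, S2 has six.
files <- data.table(
  EventID     = "E1",
  StationID   = c(rep("S1", 3), rep("S2", 6)),
  ComponentID = c("H1", "H2", "UP", paste0("C", 1:6))
)

# Assumed reading of nComponentID: distinct ComponentIDs per event/station.
files[, isArray := uniqueN(ComponentID) > 3, by = .(EventID, StationID)]
unique(files[, .(StationID, isArray)])
```

Under this reading, a standard three-component station is never flagged, while any station with four or more components is treated as an array.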
Examples
root <- file.path(tempdir(), "gmsp-raw-file-example")
index <- file.path(tempdir(), "gmsp-raw-file-index")
unlink(c(root, index), recursive = TRUE)
dir.create(file.path(root, "AAA", "E1", "S1", "raw.owner"),
           recursive = TRUE)
dir.create(index)
record <- list(
  Event = list(EventID = "E1"),
  Station = list(StationID = "S1"),
  Record = list(
    list(ComponentID = "H1", FileID = "H1.txt", NP = 4, dt = 0.01,
         Fs = 100, Units = "cm", HP = NA, LP = NA),
    list(ComponentID = "H2", FileID = "H2.txt", NP = 4, dt = 0.01,
         Fs = 100, Units = "cm", HP = NA, LP = NA),
    list(ComponentID = "UP", FileID = "UP.txt", NP = 4, dt = 0.01,
         Fs = 100, Units = "cm", HP = NA, LP = NA)
  )
)
jsonlite::write_json(
  record,
  file.path(root, "AAA", "E1", "S1", "raw.owner", "record.json"),
  auto_unbox = TRUE
)
suppressMessages(buildRawFileTable(root, index, owners = "AAA"))
data.table::fread(file.path(index, "RawFileTable.AAA.csv"))
#>    OwnerID EventID StationID ComponentID FileID    NP    dt    Fs  Units     HP
#>     <char>  <char>    <char>      <char> <char> <int> <num> <int> <char> <lgcl>
#> 1:     AAA      E1        S1          H1 H1.txt     4  0.01   100     cm     NA
#> 2:     AAA      E1        S1          H2 H2.txt     4  0.01   100     cm     NA
#> 3:     AAA      E1        S1          UP UP.txt     4  0.01   100     cm     NA
#>        LP isArray
#>    <lgcl>  <lgcl>
#> 1:     NA   FALSE
#> 2:     NA   FALSE
#> 3:     NA   FALSE