Parquet Schema Viewer: Inspect Columns and Types
Show the column tree, physical types, logical types, and repetition of any Parquet file.
Upload a Parquet file and the page reads its footer to reconstruct the schema. Each column appears as a node in a tree, annotated with the physical type from the spec (INT32, BYTE_ARRAY, and so on), its logical type when present (STRING, TIMESTAMP, DECIMAL), and the repetition (REQUIRED, OPTIONAL, REPEATED). Nested groups indent their children, so structs and lists are easy to read.
The Parquet schema is the contract between the writer and any reader. It tells you what columns exist, what physical bytes encode each value, and how a value is wrapped when it can be null, repeated, or part of a struct. The spec defines physical types (BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY) and a separate layer of logical annotations like STRING, DATE, TIMESTAMP, DECIMAL, and UUID that explain how those bytes should be interpreted.
This viewer parses only the footer, the metadata block at the end of every Parquet file. That keeps the operation fast even on large files because no row data is decoded. The footer stores the schema as a flat list of elements in depth-first order, and each element's num_children count says how many of the following elements belong under it. The viewer uses that count to reattach children to their parents. The resulting nesting follows the Dremel record-shredding model, which is how Parquet represents repeated and optional fields.
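The rebuild from a flat list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the viewer's actual code: the `SchemaElement` dataclass is a simplified, hypothetical stand-in for the Thrift struct in the footer, and `build_tree` and `render` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SchemaElement:
    # Simplified stand-in for one footer schema entry (assumption for this sketch).
    name: str
    repetition: Optional[str] = None     # REQUIRED / OPTIONAL / REPEATED; usually unset on the root
    physical_type: Optional[str] = None  # set on leaf columns only
    num_children: int = 0                # greater than 0 marks a group


@dataclass
class Node:
    element: SchemaElement
    children: list = field(default_factory=list)


def build_tree(flat):
    """Rebuild the schema tree from the flat, depth-first element list."""
    pos = 0

    def read_node():
        nonlocal pos
        element = flat[pos]
        pos += 1
        node = Node(element)
        # The next num_children subtrees in the list belong to this node.
        for _ in range(element.num_children):
            node.children.append(read_node())
        return node

    root = read_node()
    if pos != len(flat):
        raise ValueError("num_children does not account for every element")
    return root


def render(node, depth=0):
    """Return one indented text line per node."""
    e = node.element
    label = e.physical_type or "group"
    rep = f" {e.repetition}" if e.repetition else ""
    lines = [f"{'  ' * depth}{e.name}: {label}{rep}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


flat_schema = [
    SchemaElement("schema", num_children=2),
    SchemaElement("id", "REQUIRED", "INT64"),
    SchemaElement("address", "OPTIONAL", num_children=2),
    SchemaElement("city", "OPTIONAL", "BYTE_ARRAY"),
    SchemaElement("zip", "OPTIONAL", "INT32"),
]
tree = build_tree(flat_schema)
print("\n".join(render(tree)))
```

Because the list is in depth-first order, a single forward pass is enough: no parent pointers or lookups are needed, only the child counts.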
Use it before writing a Spark or DuckDB query against an unfamiliar dataset to confirm exact column names, to catch surprises like an int64 nanosecond timestamp where you expected milliseconds, or to verify whether a field is OPTIONAL versus REQUIRED before relying on its nullability.
1. Open the file
Pick a .parquet file. Only the footer is read, so even multi-gigabyte files load in seconds.
2. Read the tree
The root sits at the top. Each column shows its name, physical type, logical or converted type, and repetition.
3. Drill into nested fields
Struct and list columns are indented under their parents. DECIMAL columns also show precision and scale.
Confirm column names before querying
Avoid the back-and-forth of writing a SELECT, hitting an unknown-column error caused by a typo, and reopening the file in a notebook.
Check timestamp units
Writers store TIMESTAMP values in milliseconds, microseconds, or nanoseconds. The unit in the logical annotation tells you which.
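To see why the unit matters, here is a small Python sketch that decodes the same stored INT64 under each unit. The function name `decode_timestamp` is made up for this example, and the sketch assumes values counted from the Unix epoch in UTC. Python's `datetime` has microsecond resolution, so NANOS values are truncated.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Ticks per second for each TIMESTAMP unit.
TICKS_PER_SECOND = {"MILLIS": 1_000, "MICROS": 1_000_000, "NANOS": 1_000_000_000}


def decode_timestamp(raw: int, unit: str) -> datetime:
    """Interpret a raw INT64 as time since the Unix epoch in the given unit."""
    ticks = TICKS_PER_SECOND[unit]
    # Convert to whole microseconds; sub-microsecond precision is dropped.
    micros = raw * 1_000_000 // ticks
    return EPOCH + timedelta(microseconds=micros)


raw = 1_700_000_000_000  # one stored integer, three possible readings
for unit in ("MILLIS", "MICROS", "NANOS"):
    print(unit, decode_timestamp(raw, unit).isoformat())
```

Read as MILLIS the value lands in November 2023. Read as MICROS or NANOS it lands in January 1970, which is the typical symptom of a unit mismatch.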
Validate a producer's contract
When a pipeline emits Parquet for downstream consumers, verify the schema matches the agreed shape.
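A contract check can be as simple as comparing two mappings. The sketch below assumes you have copied the schema shown by the viewer into a dict keyed by column path; the `diff_schemas` helper and the tuple layout (physical type, logical type, repetition) are illustrative choices, not part of any Parquet library.

```python
def diff_schemas(expected: dict, actual: dict) -> list:
    """Compare two {column_path: (physical, logical, repetition)} mappings."""
    problems = []
    for path, want in expected.items():
        got = actual.get(path)
        if got is None:
            problems.append(f"missing column: {path}")
        elif got != want:
            problems.append(f"{path}: expected {want}, found {got}")
    # Columns the producer added without agreement.
    for path in actual.keys() - expected.keys():
        problems.append(f"unexpected column: {path}")
    return sorted(problems)


expected = {
    "id": ("INT64", None, "REQUIRED"),
    "created_at": ("INT64", "TIMESTAMP(MICROS)", "OPTIONAL"),
}
actual = {
    "id": ("INT64", None, "OPTIONAL"),                        # nullability changed
    "created_at": ("INT64", "TIMESTAMP(NANOS)", "OPTIONAL"),  # unit changed
    "debug": ("BYTE_ARRAY", "STRING", "OPTIONAL"),            # extra column
}
problems = diff_schemas(expected, actual)
for line in problems:
    print(line)
```

Comparing the full triple catches drift that a name-only check would miss, such as a column that silently became nullable.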
Pick the right Arrow type
When building a reader, the physical and logical type pair maps cleanly to an Arrow datatype.
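As a rough illustration, the lookup might look like the Python sketch below. It covers only a handful of common pairs, returns type names as strings in the style pyarrow prints them, and does not call any Arrow library. The function `arrow_type_name` and its parameters are hypothetical, and DECIMAL handling assumes a precision of at most 38.

```python
UNIT_SUFFIX = {"MILLIS": "ms", "MICROS": "us", "NANOS": "ns"}

# Physical types with no logical annotation.
PLAIN = {
    "BOOLEAN": "bool",
    "INT32": "int32",
    "INT64": "int64",
    "FLOAT": "float",
    "DOUBLE": "double",
    "BYTE_ARRAY": "binary",
}


def arrow_type_name(physical, logical=None, unit=None, precision=None, scale=None):
    """Return an Arrow-style type name for a (physical, logical) pair.

    Illustrative subset only; unsupported pairs raise ValueError.
    """
    if logical is None and physical in PLAIN:
        return PLAIN[physical]
    if (physical, logical) == ("BYTE_ARRAY", "STRING"):
        return "string"
    if (physical, logical) == ("INT32", "DATE"):
        return "date32[day]"
    if (physical, logical) == ("INT64", "TIMESTAMP") and unit in UNIT_SUFFIX:
        return f"timestamp[{UNIT_SUFFIX[unit]}]"
    if logical == "DECIMAL" and precision is not None and precision <= 38:
        return f"decimal128({precision}, {scale})"
    raise ValueError(f"no mapping in this sketch for {physical}/{logical}")


print(arrow_type_name("INT64"))
print(arrow_type_name("INT64", "TIMESTAMP", unit="MICROS"))
print(arrow_type_name("FIXED_LEN_BYTE_ARRAY", "DECIMAL", precision=18, scale=4))
```

The same INT64 physical type yields either a plain integer or a timestamp depending on the annotation, which is why both halves of the pair are needed.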
Does this read any row data?
No. Only the footer is parsed. Row groups stay untouched, which is why even large files open quickly.
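For the curious, locating the footer takes only the last 8 bytes of the file: a 4-byte little-endian metadata length followed by the magic bytes PAR1. The Python sketch below shows that arithmetic on a synthetic byte string. `footer_range` is a name invented here, and the sketch handles only the plain PAR1 layout.

```python
import struct

MAGIC = b"PAR1"


def footer_range(tail: bytes, file_size: int):
    """Return (start, end) byte offsets of the footer metadata.

    `tail` must hold at least the last 8 bytes of the file.
    """
    if len(tail) < 8 or tail[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing trailing PAR1 magic)")
    # 4-byte little-endian length of the metadata block.
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    end = file_size - 8
    start = end - footer_len
    if start < len(MAGIC):
        raise ValueError("footer length exceeds file size")
    return start, end


# Synthetic file: leading magic, placeholder row data, placeholder metadata, length, magic.
metadata = b"M" * 120
fake_file = MAGIC + b"\x00" * 1000 + metadata + struct.pack("<I", len(metadata)) + MAGIC

start, end = footer_range(fake_file[-8:], len(fake_file))
print(start, end)
```

Only the bytes between `start` and `end` need to be read and decoded, so the cost depends on the size of the schema and metadata, not on the amount of row data.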
Is the file uploaded?
No. Parsing happens in your browser via hyparquet. The file is never sent over the network.
What is the difference between physical and logical type?
Physical type is the on-disk encoding, like INT64. Logical type is the semantic meaning, like TIMESTAMP with a MICROS unit (TIMESTAMP_MICROS in the older converted-type naming) or DECIMAL(18,4). One physical type can back many logical types.
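DECIMAL is a handy example of the split. The stored value is an unscaled integer, and the annotation's scale says where the decimal point goes. The Python sketch below decodes both an integer-backed and a byte-array-backed DECIMAL; byte arrays hold the unscaled value as big-endian two's complement. The helper names are invented for this example.

```python
from decimal import Decimal


def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """Apply a DECIMAL scale to an unscaled integer without rounding."""
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a tuple is exact and ignores the context precision.
    return Decimal((sign, digits, -scale))


def decode_decimal_bytes(raw: bytes, scale: int) -> Decimal:
    """Decode a byte-array-backed DECIMAL (big-endian two's complement)."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return decode_decimal(unscaled, scale)


stored = 1234567  # what an INT64 column physically holds
print(stored)                     # read with no annotation
print(decode_decimal(stored, 4))  # read as DECIMAL(18,4)

raw = (-1234567).to_bytes(8, byteorder="big", signed=True)
print(decode_decimal_bytes(raw, 4))
```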
Why do some columns show no logical type?
Plain numeric columns, such as a raw INT64 or DOUBLE, often carry no logical annotation because the physical type alone fully describes the value.
What does REPEATED mean?
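It marks a field that can occur multiple times per row, which is how Parquet encodes lists. When the repeated group wraps an OPTIONAL child element, you get a list whose elements can be null. The list itself is nullable when the outer group that carries the LIST annotation is OPTIONAL.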
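Repetition also determines the maximum definition and repetition levels of a column, the two small integers the Dremel model stores alongside each value. The Python sketch below computes them from the repetitions along a column's path. `max_levels` is a name chosen for this example, and the path excludes the root.

```python
def max_levels(path_repetitions):
    """Return (max_definition_level, max_repetition_level) for a leaf column.

    `path_repetitions` lists the repetition of each field from just below
    the root down to the leaf.
    """
    # Every field that may be absent (OPTIONAL or REPEATED) adds a definition level.
    max_def = sum(1 for r in path_repetitions if r != "REQUIRED")
    # Every REPEATED field adds a repetition level.
    max_rep = sum(1 for r in path_repetitions if r == "REPEATED")
    return max_def, max_rep


# Standard three-level list:
#   optional group tags (LIST) { repeated group list { optional binary element (STRING) } }
print(max_levels(["OPTIONAL", "REPEATED", "OPTIONAL"]))

# A plain required column carries no levels at all.
print(max_levels(["REQUIRED"]))
```

For the list column, the definition level separates four cases: 0 means the list is null, 1 means it is empty, 2 means the element is null, and 3 means the element is present.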