JSON to Parquet Converter: Build .parquet from JSON Arrays
Convert a JSON array of objects into a Parquet file with inferred column types, locally in your browser.
Paste or upload a JSON array. The converter samples the first 100 records to derive the column list and pick a type per column: DOUBLE for numbers, BOOLEAN for booleans, STRING for everything else. It then writes a Snappy-compressed Parquet file with hyparquet-writer that loads cleanly in pandas, DuckDB, Spark, and Arrow.
API responses, MongoDB exports, log dumps, and configuration snapshots usually arrive as JSON. The moment you want to load that data into an analytical engine or share it as a compact, well-typed file, Parquet is the right destination: columnar layout, Snappy compression, and a schema that downstream tools respect.
The converter parses the input, requires a top-level array, and unions the keys from the first 100 objects to build a column list. For each column it picks a type from a small set: DOUBLE if all observed values are numbers, BOOLEAN if they're all booleans, STRING otherwise. Mixed columns fall back to STRING so the file always writes successfully; nested objects and arrays in a STRING column are serialized with JSON.stringify and can be re-parsed on the read side. Missing keys become nulls.
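The inference rule above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the converter's own code: the function names are hypothetical, and two details are assumptions the page does not spell out (null values are ignored when picking a type, and non-string values in a STRING column are serialized as compact JSON). Values beyond the sample that contradict the inferred type are not handled here.

```python
import json

SAMPLE_SIZE = 100  # the page says the first 100 records define the schema


def classify(value):
    """Map one JSON value to the type it would vote for."""
    # bool must be tested first: in Python, True and False are also ints.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "DOUBLE"
    return "STRING"


def infer_schema(records, sample_size=SAMPLE_SIZE):
    """Union the sampled keys and pick DOUBLE, BOOLEAN, or STRING per column."""
    observed = {}  # column -> set of kinds seen, in first-seen key order
    for record in records[:sample_size]:
        for key, value in record.items():
            kinds = observed.setdefault(key, set())
            if value is not None:  # assumption: nulls do not vote
                kinds.add(classify(value))
    # Exactly one kind keeps that type; mixed or all-null columns become STRING.
    return {
        key: next(iter(kinds)) if len(kinds) == 1 else "STRING"
        for key, kinds in observed.items()
    }


def to_cell(value, column_type):
    """Convert one value for its column; missing keys arrive here as None."""
    if value is None:
        return None
    if column_type == "DOUBLE":
        return float(value)
    if column_type == "STRING" and not isinstance(value, str):
        # Compact separators mimic JSON.stringify output (assumption).
        return json.dumps(value, separators=(",", ":"))
    return value


def to_rows(records, schema):
    """Build one row per record, with None where a key is missing."""
    return [[to_cell(r.get(col), kind) for col, kind in schema.items()] for r in records]


records = [
    {"id": 1, "active": True, "tags": ["a", "b"], "score": 1},
    {"id": 2, "active": False, "score": "n/a"},
]
schema = infer_schema(records)
rows = to_rows(records, schema)
```

In this example the score column holds a number in one record and a string in the other, so it falls back to STRING, and the second record's missing tags key becomes a null.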
Writing is handled by hyparquet-writer, which assembles row groups, applies the default Snappy codec, and produces a Parquet 2.x file. The output downloads with the input filename's extension swapped to .parquet. Because everything happens in the browser tab, sensitive responses or PII never leave your machine.
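The filename rule can be illustrated with a small Python sketch. The helper name and the fallback name used for pasted input (which has no filename) are assumptions for illustration only; the page does not say what name pasted input receives.

```python
from pathlib import PurePosixPath


def parquet_name(input_name, default="data.parquet"):
    """Swap the last extension of input_name for .parquet."""
    # Pasted input has no filename; the default here is hypothetical.
    if not input_name:
        return default
    return str(PurePosixPath(input_name).with_suffix(".parquet"))


name = parquet_name("events.json")
```

Only the final extension is replaced, so a name such as export.2024.json would keep its middle segment under this sketch.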
1. Load the JSON
Upload a .json file or paste the array text into the box. The input must be a top-level array of objects.
2. Convert
Click Convert to Parquet. The first 100 records define the schema and each column gets a DOUBLE, BOOLEAN, or STRING type.
3. Download .parquet
The output card shows row count, column count, and file size. Download to save the .parquet file.
Persist an API dump
Collect the pages of a paginated API into a single JSON array, then convert the result to Parquet for cheap, queryable storage.
Compact MongoDB exports
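mongoexport writes one JSON document per line by default; run it with --jsonArray to get the top-level array this converter expects. Converting to Parquet shrinks the file and makes it readable in analytical tooling.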
Prep test data for a data lake
Seed an S3 bucket with a Parquet file built from a JSON fixture you already have in version control.
Hand off telemetry events
Convert collected event JSON to Parquet so a data analyst can load it into DuckDB with a single read_parquet call.
Does my JSON go to a server?
No. Parsing, schema inference, and Parquet writing all run in this browser tab. There is no upload step.
What types are inferred?
Numbers become DOUBLE, booleans become BOOLEAN, everything else becomes STRING. Nested objects and arrays in a STRING column are serialized with JSON.stringify.
Why only the first 100 records for schema?
Walking every record before writing would double the memory cost on large inputs. A 100-row sample is enough for most homogeneous datasets; columns with mixed types fall back to STRING, which always writes.
Does it support NDJSON?
The expected input is a top-level JSON array. For NDJSON, wrap your file: put '[' at the start, join the records with commas, and put ']' at the end. A future revision may accept newline-delimited input directly.
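For files too large to wrap by hand, the NDJSON wrapping can be scripted. The following is a minimal Python sketch; the function name is hypothetical, and it assumes one complete JSON value per non-blank line.

```python
import json


def ndjson_to_array_text(ndjson_text):
    """Wrap newline-delimited JSON records into one top-level JSON array."""
    # Split on "\n" only; strip() also removes a trailing "\r" from CRLF files.
    records = [line.strip() for line in ndjson_text.split("\n") if line.strip()]
    for record in records:
        json.loads(record)  # validate each line; raises json.JSONDecodeError
    return "[" + ",".join(records) + "]"


array_text = ndjson_to_array_text('{"a": 1}\n{"a": 2}\n')
```

Validating each line first means a malformed record fails here, with a clear error, rather than later inside the converter.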
Can I round-trip back to the same JSON?
For columns kept as DOUBLE or BOOLEAN, yes. STRING columns that hold serialized JSON parse back with JSON.parse on the read side, so nested structures survive the round-trip with one extra step. Keys that were missing from a record come back as nulls rather than as absent keys.
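The extra read-side step might look like the Python sketch below, which uses json.loads as the counterpart of JSON.parse. It assumes the rows have already been read from the Parquet file into dictionaries, and the function and column names are hypothetical. The caller lists which STRING columns hold serialized JSON, because parsing every string column would also convert plain strings such as "123" into numbers.

```python
import json


def restore_nested(rows, json_columns):
    """Parse serialized JSON back into objects and arrays for the named columns."""
    restored = []
    for row in rows:
        out = dict(row)  # copy so the input rows are left untouched
        for col in json_columns:
            value = out.get(col)
            if isinstance(value, str):
                try:
                    out[col] = json.loads(value)
                except ValueError:
                    # Not valid JSON: keep the plain string as it was written.
                    pass
        restored.append(out)
    return restored


rows = [
    {"id": 1.0, "tags": '["a","b"]', "note": "hi"},
    {"id": 2.0, "tags": None, "note": "plain"},
]
restored = restore_nested(rows, ["tags"])
```

Nulls are left alone, so a key that was missing in the original record stays null after this step.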