Dataset README Generator

Generate a Markdown README for a CSV or JSON dataset, with schema, sample rows, and basic statistics.

First use downloads about 1.2 MB: SheetJS, which reads and writes Excel (.xlsx, .xls), OpenDocument (.ods), and CSV files. It is loaded only on tools that work with spreadsheets and is cached after first use.

Drop a CSV or JSON file here or click to choose

Accepts .csv, .json, and .tsv

Dataset name

Source (optional)

Description

About the Dataset README Generator

Drop in a CSV or JSON file and get back a ready-to-publish README.md describing the dataset. The generator infers column types from a sample, counts nulls, picks an example value per column, and renders sample rows so anyone reading the README knows what to expect before they download the data.

A good dataset README answers three questions at a glance: what fields are in the file, what they look like, and how big the file is. This tool produces that README from any CSV or JSON file in seconds.

For CSV files, SheetJS parses the file, and the generator samples the header row and the first 100 data rows. Each column gets a type inferred from its samples (number, date, boolean, or string), a representative non-empty value, and a count of how many of the sampled rows had that column empty. For JSON files, the array is parsed natively, the union of all keys across the first 100 entries becomes the schema, and the same type and null inference runs over the sampled values.
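The JSON path described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the names `summarizeColumns`, `ColumnSummary`, and `SAMPLE_SIZE` are hypothetical, and treating `undefined`, `null`, and the empty string as "empty" is an assumption.

```typescript
// Hypothetical sketch: build a schema from the first 100 entries of a JSON array.
type Row = Record<string, unknown>;

interface ColumnSummary {
  name: string;
  example: string;    // first non-empty value seen in the sample, as text
  emptyCount: number; // sampled rows where the column is missing or empty
}

const SAMPLE_SIZE = 100;

// Assumption: missing keys, null, and "" all count as empty.
function isEmptyValue(value: unknown): boolean {
  return value === undefined || value === null || value === "";
}

function summarizeColumns(rows: Row[]): ColumnSummary[] {
  const sample = rows.slice(0, SAMPLE_SIZE);

  // Union of keys across the sample, kept in first-seen order.
  const names: string[] = [];
  const seen = new Set<string>();
  for (const row of sample) {
    for (const key of Object.keys(row)) {
      if (!seen.has(key)) {
        seen.add(key);
        names.push(key);
      }
    }
  }

  return names.map((name) => {
    let example = "";
    let emptyCount = 0;
    for (const row of sample) {
      const value = row[name];
      if (isEmptyValue(value)) {
        emptyCount++;
      } else if (example === "") {
        example = String(value);
      }
    }
    return { name, example, emptyCount };
  });
}
```

A key that appears in only some entries still becomes a column; the entries that lack it simply add to that column's empty count.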

The generated Markdown follows a familiar shape: a title from the name you provide, optional description and source sections, a statistics block (row count, column count, file size), a schema table, and a sample-rows table built from the first five entries. Copy it straight into a Kaggle dataset description, a Hugging Face dataset card, a GitHub repo README, or your team's data catalog. Edit the name, description, and source fields in place to refine the output.
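As an illustration of that shape, the sketch below assembles a README string from already-computed values. The interface and function names (`ReadmeInput`, `renderReadme`, `tableCell`) and the exact heading text are assumptions made for this example; the tool's real output may be worded and formatted differently.

```typescript
// Hypothetical sketch of the README layout: title, optional sections,
// statistics, schema table, and a sample-rows table of at most five rows.
interface ReadmeColumn {
  name: string;
  type: string;
  example: string;
  emptyCount: number;
}

interface ReadmeInput {
  name: string;
  description?: string;
  source?: string;
  rowCount: number;
  fileSizeBytes: number;
  columns: ReadmeColumn[];
  sampleRows: Record<string, unknown>[];
}

// Escape pipes and flatten newlines so a value cannot break the table.
function tableCell(value: unknown): string {
  const text = value === undefined || value === null ? "" : String(value);
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function tableRow(cells: string[]): string {
  return `| ${cells.join(" | ")} |`;
}

function renderReadme(input: ReadmeInput): string {
  const lines: string[] = [`# ${input.name}`, ""];
  if (input.description) lines.push(input.description, "");
  if (input.source) lines.push("## Source", "", input.source, "");

  lines.push(
    "## Statistics",
    "",
    `- Rows: ${input.rowCount}`,
    `- Columns: ${input.columns.length}`,
    `- File size: ${input.fileSizeBytes} bytes`,
    "",
  );

  lines.push("## Schema", "");
  lines.push(tableRow(["Column", "Type", "Example", "Empty (sampled)"]));
  lines.push(tableRow(["---", "---", "---", "---"]));
  for (const col of input.columns) {
    lines.push(tableRow([tableCell(col.name), col.type, tableCell(col.example), String(col.emptyCount)]));
  }

  const names = input.columns.map((col) => col.name);
  lines.push("", "## Sample rows", "");
  lines.push(tableRow(names.map(tableCell)));
  lines.push(tableRow(names.map(() => "---")));
  for (const row of input.sampleRows.slice(0, 5)) {
    lines.push(tableRow(names.map((n) => tableCell(row[n]))));
  }
  return lines.join("\n") + "\n";
}
```

Escaping the pipe character matters because a raw `|` inside a cell would otherwise be read as a column separator by most Markdown renderers.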

How to use the Dataset README Generator

1. Choose a CSV or JSON file
The file is parsed in your browser. CSV files use SheetJS; JSON files are expected to be an array of objects at the top level.

2. Add a name, description, and source
Fill in the dataset name and a short description. Optionally add a source URL or citation. These fields appear in the generated Markdown.

3. Copy or download README.md
The README updates live with schema table, sample rows, and statistics. Copy the Markdown to the clipboard or download it as a file.
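Step 1 expects JSON input to be a top-level array of objects. A check along these lines would enforce that; `parseJsonDataset` is a hypothetical name, and the error messages are illustrative rather than what the tool actually shows.

```typescript
// Hypothetical sketch: accept only a top-level JSON array of plain objects.
function parseJsonDataset(text: string): Record<string, unknown>[] {
  const parsed: unknown = JSON.parse(text); // throws SyntaxError on invalid JSON
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array at the top level");
  }
  parsed.forEach((entry: unknown, index: number) => {
    // Arrays and null are typeof "object" too, so rule them out explicitly.
    if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
      throw new Error(`Entry ${index} is not an object`);
    }
  });
  return parsed as Record<string, unknown>[];
}
```

A file shaped like `{"rows": [...]}` would be rejected under this check, so the array would need to be extracted first.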

Common use cases

Publish a dataset on Kaggle or Hugging Face

Drop the file in, fill in the description, and paste the generated Markdown straight into the dataset card editor.

Document an internal data drop

Generate a quick README beside a CSV in a shared folder so teammates know its columns and shape without opening the file.

Spot schema issues early

Check inferred types and null counts to catch columns that mix numbers and strings, or that are mostly empty, before importing the data anywhere.

Bootstrap a data catalog entry

Paste the schema table into an internal wiki or data catalog so the dataset shows up with field-level docs from day one.

Frequently asked questions

Is my data sent anywhere?

No. The file is read with FileReader and parsed in your browser using SheetJS for CSV or the built-in JSON parser. Only the first 100 rows are sampled, and nothing is uploaded. Closing the tab discards everything.

How accurate is the type inference?

Types are inferred from up to 100 sampled values per column. A column is called a number when at least 90% of non-empty samples parse as numeric, and similarly for dates and booleans. Mixed columns fall back to string.
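The 90% rule can be sketched like this. Only the threshold, the 100-value sample, and the string fallback come from the description above; the order in which types are tried, the accepted boolean spellings, and the ISO-style date check are assumptions, and `inferType` is a hypothetical name.

```typescript
// Hypothetical sketch of threshold-based type inference for one column.
type ColumnType = "number" | "date" | "boolean" | "string";

const TYPE_THRESHOLD = 0.9; // share of non-empty samples that must match
const MAX_SAMPLES = 100;

function looksNumeric(text: string): boolean {
  return text.trim() !== "" && Number.isFinite(Number(text));
}

// Assumption: only "true" and "false" (any case) count as booleans.
function looksBoolean(text: string): boolean {
  return /^(true|false)$/i.test(text.trim());
}

// Assumption: only values starting with YYYY-MM-DD that Date.parse accepts count as dates.
function looksDate(text: string): boolean {
  const trimmed = text.trim();
  return /^\d{4}-\d{2}-\d{2}/.test(trimmed) && !Number.isNaN(Date.parse(trimmed));
}

function inferType(samples: unknown[]): ColumnType {
  const values = samples
    .slice(0, MAX_SAMPLES)
    .filter((v) => v !== undefined && v !== null && v !== "")
    .map(String);
  if (values.length === 0) return "string";

  const share = (test: (text: string) => boolean): number =>
    values.filter(test).length / values.length;

  if (share(looksNumeric) >= TYPE_THRESHOLD) return "number";
  if (share(looksBoolean) >= TYPE_THRESHOLD) return "boolean";
  if (share(looksDate) >= TYPE_THRESHOLD) return "date";
  return "string"; // mixed or unrecognized columns
}
```

Under these assumptions, a column with nine numeric values and one `n/a` still counts as a number, while one with eight numeric values and two `n/a` entries falls back to string.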

Why only 100 rows?

Sampling keeps the tool fast on large files and is good enough for documentation. The README still reports the true total row count from the full parse; only type inference and the sample rows come from the first 100.

Does it support Parquet or Excel files?

Not yet. The current build accepts CSV, TSV, and JSON arrays of objects. Parquet and Excel may be added later once the in-browser readers settle on a stable API.

Can I edit the generated Markdown?

Yes. The output preview is a regular text area. Copy it out, paste into your editor, and adjust anything: rewrite the description, reorder columns, add example queries, or tweak the wording before publishing.
