CSV Encoding Detector and UTF-8 Converter
Detect the character encoding of a CSV file and convert it to UTF-8 with a byte order mark.
Upload a CSV and the tool inspects the raw bytes to guess the encoding using chardet, decodes with TextDecoder, and re-encodes the content as UTF-8 with a BOM so spreadsheet software opens it correctly.
CSV files exported from older systems often use legacy encodings, either single-byte ones such as Windows-1252 and ISO-8859-1 or multi-byte ones such as Shift_JIS and GB18030. When those files are opened in software that expects UTF-8, accented or non-Latin characters appear as garbled text (mojibake). This tool reads the raw bytes of an uploaded file, runs them through chardet to identify the most likely encoding, decodes the bytes with the browser's TextDecoder, and re-encodes the text as UTF-8.
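To make the problem concrete, here is a minimal sketch of what happens to a Windows-1252 byte when it is read as UTF-8, and how decoding with the right label recovers the text. The sample bytes and variable names are illustrative only and are not taken from the tool's source.

```typescript
// "café" as a Windows-1252 system would save it: é is the single byte 0xE9.
const legacyBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Wrong charset: 0xE9 is not a complete UTF-8 sequence, so the decoder
// substitutes U+FFFD (the replacement character) for it.
const decodedAsUtf8 = new TextDecoder("utf-8").decode(legacyBytes);

// Right charset: the original text comes back intact.
const decodedAsCp1252 = new TextDecoder("windows-1252").decode(legacyBytes);

// TextEncoder always produces UTF-8; é becomes the two bytes 0xC3 0xA9.
const transcoded = new TextEncoder().encode(decodedAsCp1252);
```

The re-encoded array is one byte longer than the input because é needs two bytes in UTF-8.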
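The download includes a UTF-8 byte order mark (the three bytes EF BB BF). Microsoft Excel relies on the BOM to recognize UTF-8 files and display non-Latin characters correctly. Not every consumer strips it, though: awk, many command-line tools, and Python's plain utf-8 codec treat the BOM as part of the first field. If a tool mishandles it, read the file with a BOM-aware option (for example Python's utf-8-sig codec) or remove the first three bytes.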
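Adding the BOM amounts to prepending three fixed bytes to the UTF-8 output. The sketch below shows one way to do that; the helper name `encodeWithBom` is hypothetical and not necessarily how the tool implements it.

```typescript
// The UTF-8 encoding of U+FEFF, used as a byte order mark.
const UTF8_BOM = new Uint8Array([0xef, 0xbb, 0xbf]);

// Encode text as UTF-8 and prepend the BOM.
function encodeWithBom(text: string): Uint8Array {
  const body = new TextEncoder().encode(text);
  const out = new Uint8Array(UTF8_BOM.length + body.length);
  out.set(UTF8_BOM, 0);
  out.set(body, UTF8_BOM.length);
  return out;
}
```

In a browser, the resulting array could be wrapped in a Blob and offered through a download link.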
If the automatic detection picks the wrong encoding (this can happen with very short files or files whose bytes are valid in several encodings), the override dropdown lets you re-decode with a specific charset. The byte counts before and after conversion are shown so you can sanity-check the result. A modest increase is normal, because the BOM adds three bytes and each non-ASCII character from a single-byte encoding takes two or three bytes in UTF-8. An unexpectedly large change usually indicates that the wrong source encoding was used.
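A check along these lines can be sketched by comparing byte counts and counting U+FFFD replacement characters after decoding. The `DecodeReport` shape and `checkDecode` name are assumptions made for illustration; the page does not say how the tool computes its figures.

```typescript
interface DecodeReport {
  bytesBefore: number;      // size of the original file
  bytesAfter: number;       // size of the UTF-8 text, excluding the BOM
  replacementChars: number; // U+FFFD count; above zero suggests a bad decode
}

function checkDecode(bytes: Uint8Array, label: string): DecodeReport {
  const text = new TextDecoder(label).decode(bytes);
  return {
    bytesBefore: bytes.length,
    bytesAfter: new TextEncoder().encode(text).length,
    replacementChars: text.split("\uFFFD").length - 1,
  };
}

// "café" saved as Windows-1252, checked against two candidate encodings.
const sample = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
const asUtf8 = checkDecode(sample, "utf-8");
const asCp1252 = checkDecode(sample, "windows-1252");
```

Here the UTF-8 attempt reports at least one replacement character, while the Windows-1252 attempt reports none and grows by a single byte. A count of zero does not prove the encoding is right: single-byte encodings accept almost any byte, so a wrong single-byte guess can decode without errors and still show the wrong letters.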
- 1
Upload the file
Choose a CSV file. The raw bytes are loaded into the browser as a Uint8Array, ready for inspection.
- 2
Detect the encoding
chardet analyses byte patterns and returns the most likely encoding with a confidence score. The top match is used to decode the file.
- 3
Download as UTF-8
The decoded text is encoded as UTF-8 with a byte order mark and offered as a download. The original file on disk is never modified.
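The three steps above can be sketched as a single function. The detector is passed in as a parameter because this example does not reproduce chardet's API; `naiveDetector` is a deliberately crude stand-in (strict UTF-8, else Windows-1252), and every name here is hypothetical.

```typescript
interface Guess { encoding: string; confidence: number }
type Detector = (bytes: Uint8Array) => Guess[];

// Stand-in for a real detector: accept UTF-8 if the bytes are valid,
// otherwise fall back to Windows-1252. This is NOT how chardet works.
const naiveDetector: Detector = (bytes) => {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return [{ encoding: "utf-8", confidence: 1 }];
  } catch {
    return [{ encoding: "windows-1252", confidence: 0.5 }];
  }
};

function convertToUtf8WithBom(
  bytes: Uint8Array, // step 1: e.g. new Uint8Array(await file.arrayBuffer())
  detect: Detector,
  override?: string, // optional label chosen by the user
): { encoding: string; output: Uint8Array } {
  // Step 2: take the highest-confidence guess unless an override is given.
  const best = detect(bytes).slice().sort((a, b) => b.confidence - a.confidence)[0];
  const encoding = override ?? best?.encoding ?? "utf-8";
  const text = new TextDecoder(encoding).decode(bytes);

  // Step 3: re-encode as UTF-8 and prepend the BOM (EF BB BF).
  const body = new TextEncoder().encode(text);
  const output = new Uint8Array(3 + body.length);
  output.set([0xef, 0xbb, 0xbf], 0);
  output.set(body, 3);
  return { encoding, output };
}

const converted = convertToUtf8WithBom(new Uint8Array([0x61, 0x2c, 0xe9]), naiveDetector);
```

Because the function only reads the input array and returns a new one, the original bytes are left untouched, which mirrors the promise that the file on disk is never modified.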
Fix Excel exports with broken accents
Convert Windows-1252 exports from older Excel installations into UTF-8 so accented French, Spanish, or German names display correctly.
Prepare CSVs for Postgres COPY
Postgres COPY decodes the file using the client encoding unless an ENCODING option is given. Convert files to UTF-8 first so the import does not fail on the first non-ASCII byte.
Open Japanese CSVs in modern editors
Files saved as Shift_JIS from legacy POS software become readable when transcoded to UTF-8, which is the default for VS Code and most editors.
Override an incorrect guess
Very short files can confuse encoding detection. Pick the correct encoding from the override list to redo the conversion.
Is my file uploaded anywhere?
No. Detection and conversion both run in your browser using chardet and the standard TextDecoder API. The file never leaves your device.
Why does the output include a BOM?
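Microsoft Excel uses the UTF-8 byte order mark to recognize UTF-8 encoded CSV files. Many other programs skip the BOM automatically, but some command-line tools and parsers treat it as part of the first field. If that happens, strip the first three bytes (EF BB BF) or use a BOM-aware reader.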
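If a downstream program turns out not to skip the BOM, it can be removed by checking the first three bytes. This is a minimal sketch; `hasUtf8Bom` and `stripUtf8Bom` are hypothetical helper names, not part of the tool.

```typescript
// True if the data starts with the UTF-8 BOM bytes EF BB BF.
function hasUtf8Bom(bytes: Uint8Array): boolean {
  return bytes.length >= 3 &&
    bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
}

// Returns a view of the data without the BOM, or the input itself if none is present.
function stripUtf8Bom(bytes: Uint8Array): Uint8Array {
  return hasUtf8Bom(bytes) ? bytes.subarray(3) : bytes;
}
```

`subarray` returns a view onto the same buffer rather than a copy, so stripping the BOM this way stays cheap even for large files.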
What if detection picks the wrong encoding?
Use the override dropdown to re-decode with a specific encoding. chardet relies on statistical patterns, which can be inconclusive for very short or unusual files.
Which encodings can the browser decode?
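TextDecoder supports the encodings listed in the WHATWG Encoding Standard, including UTF-8, UTF-16LE and UTF-16BE, most ISO-8859 variants, the Windows-125x family, Shift_JIS, EUC-JP, EUC-KR, GB18030, Big5, and KOI8-R. The standard treats the label ISO-8859-1 as an alias for Windows-1252, so files declared as Latin-1 are decoded as Windows-1252.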
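Whether a given label is supported can be probed at runtime, since the TextDecoder constructor throws for labels it does not recognize. The helper name below is hypothetical.

```typescript
// Returns the canonical encoding name for a label, or null if the
// runtime's TextDecoder rejects it.
function canonicalEncodingName(label: string): string | null {
  try {
    return new TextDecoder(label).encoding;
  } catch {
    return null;
  }
}

const known = canonicalEncodingName("utf8");               // "utf-8"
const unknown = canonicalEncodingName("no-such-charset");  // null
```

The `encoding` property reports the canonical name rather than the label passed in, which is why an alias such as "utf8" comes back as "utf-8". In standards-conforming browsers, "latin1" and "iso-8859-1" should likewise report "windows-1252".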
Will conversion change my data?
Conversion only changes the byte representation of characters, not the characters themselves. If the source encoding is identified correctly, the visible content of every cell is preserved exactly; the only addition is the byte order mark at the start of the file.