Filewisp

How to fix CSV mojibake (garbled text): Excel, UTF-8, and Shift_JIS

You open a CSV in Excel and the text comes out garbled. The cause is a mismatch between character encodings, typically UTF-8 vs Shift_JIS. Here is why it happens, how to fix it on the spot, and how to hand off files that open cleanly on the other end.

Garbled text is an encoding mismatch

A CSV is just a text file, and nothing in the format records which character encoding it was written in (at most, a UTF-8 file may start with a byte order mark, or BOM). When the side that wrote the file and the side that opens it assume different encodings, the same bytes get read as different characters, and you get garbled text.
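Since the file does not declare its encoding, software has to guess. A common heuristic, sketched here in Python under the assumption that the file is either UTF-8 or cp932 (Python's name for the Windows flavor of Shift_JIS), checks for a UTF-8 byte order mark, then tries a strict UTF-8 decode, and otherwise falls back to cp932. `guess_encoding` is a hypothetical helper name. Short cp932 byte runs can occasionally happen to be valid UTF-8, so treat the answer as a guess rather than proof.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Guess between UTF-8 and cp932 (hypothetical helper; assumes only these two)."""
    # A UTF-8 BOM (EF BB BF) is a strong signal; "utf-8-sig" strips it on decode.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Fallback assumption: a Japanese legacy export.
        return "cp932"


sample = "id,city\n1,東京\n"
print(guess_encoding(sample.encode("utf-8")))                    # utf-8
print(guess_encoding(codecs.BOM_UTF8 + sample.encode("utf-8")))  # utf-8-sig
print(guess_encoding(sample.encode("cp932")))                    # cp932
```

Pure-ASCII files come back as `utf-8`, which is harmless because ASCII bytes mean the same thing in both encodings.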

In many languages, the two usual suspects are UTF-8 and a legacy local encoding (for Japanese, Shift_JIS or its Windows variant cp932). Open a UTF-8 CSV in a tool that assumes the legacy encoding, or vice versa, and only the non-ASCII text breaks, because both encodings store ASCII characters with the same bytes. Digits, commas, and English headers look fine while accented or non-Latin characters garble: that is the telltale sign.
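To see both failure modes, this Python sketch encodes a small made-up CSV one way and decodes it the other way, using Python's `cp932` codec as the legacy encoding. Passing `errors="replace"` stops the decoder from raising, so bytes it cannot decode show up as U+FFFD replacement characters instead.

```python
# Sample CSV text (illustrative data): ASCII header plus Japanese city names.
original = "id,city\n1,東京\n2,大阪\n"

# Case 1: the file was written as UTF-8 but a cp932-assuming tool reads it.
utf8_bytes = original.encode("utf-8")
utf8_read_as_cp932 = utf8_bytes.decode("cp932", errors="replace")

# Case 2: the file was written as cp932 but a UTF-8-assuming tool reads it.
cp932_bytes = original.encode("cp932")
cp932_read_as_utf8 = cp932_bytes.decode("utf-8", errors="replace")

print(utf8_read_as_cp932)  # header and digits intact, city names turn into unrelated characters
print(cp932_read_as_utf8)  # header and digits intact, city names become U+FFFD marks
```

In both cases the bytes on disk are fine; decoding `utf8_bytes` as UTF-8 gives back the original text exactly.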

The classic 'breaks in Excel' case

For Excel, line up the encoding before opening rather than fighting it after

Excel on Windows, older versions especially, reads a double-clicked CSV that has no BOM using the system's legacy code page (cp932 on Japanese Windows). So a UTF-8 CSV exported by a web service or program comes out garbled. That is Excel's default behavior, not a corrupted file.

There are two ways around it. One is to import via Data → 'From Text/CSV' and choose UTF-8 as the file origin (listed as '65001: Unicode (UTF-8)') in the import dialog. The other is to re-save the file in a form Excel reads correctly: UTF-8 with a BOM, or the legacy encoding.

Fixing UTF-8 vs Shift_JIS

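To fix it locally, open the CSV in an editor like Notepad or VS Code and first make sure the text displays correctly (in VS Code, use 'Reopen with Encoding' if it does not), then re-save it with an explicit encoding. Re-saving while the text is still garbled would lock the garbling into the file. If you want Excel to open it cleanly by double-click, saving as UTF-8 with a BOM is the safest choice and works in most environments.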
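The same re-save can be scripted. This Python sketch uses a hypothetical helper `resave_as_utf8_bom` and made-up file names: it reads the file with the encoding it was actually written in and writes it back with Python's `utf-8-sig` codec, which prepends the BOM. Opening both files with `newline=""` leaves the original line endings untouched.

```python
import os
import tempfile


def resave_as_utf8_bom(src: str, dst: str, source_encoding: str) -> None:
    """Re-save a text file as UTF-8 with BOM (hypothetical helper)."""
    # Decode with the encoding the file was really written in; a wrong guess here
    # either raises an error or bakes garbled characters into the new file.
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    # "utf-8-sig" writes the EF BB BF byte order mark before the content.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


# Usage with throwaway files (names are illustrative).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "export.csv")
dst_path = os.path.join(workdir, "export_for_excel.csv")

with open(src_path, "w", encoding="utf-8", newline="") as f:
    f.write("id,city\r\n1,東京\r\n")

resave_as_utf8_bom(src_path, dst_path, source_encoding="utf-8")

with open(dst_path, "rb") as f:
    saved = f.read()
print(saved[:3])  # b'\xef\xbb\xbf'
```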

If instead you are feeding a legacy system that expects Shift_JIS, export in that encoding, keeping in mind that it cannot represent every character (emoji, for example, have no Shift_JIS code). Confirming up front which encoding the recipient expects saves several rounds of re-saving.
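Before exporting for such a system, it helps to check which characters will not survive. The Python sketch below assumes the recipient's "Shift_JIS" means Microsoft's cp932 variant, which is common on Windows but worth confirming; `find_unencodable` and `export_cp932` are hypothetical helpers. Raising an error instead of silently replacing characters keeps any data loss visible.

```python
def find_unencodable(text: str, encoding: str = "cp932") -> list[str]:
    """Return the distinct characters in text that the target encoding cannot represent."""
    bad: list[str] = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def export_cp932(text: str, path: str) -> None:
    """Write text as cp932, refusing to write if anything would be lost (hypothetical helper)."""
    bad = find_unencodable(text)
    if bad:
        raise ValueError(f"not representable in cp932: {bad}")
    with open(path, "w", encoding="cp932", newline="") as f:
        f.write(text)


print(find_unencodable("id,note\n1,東京 😀\n"))  # ['😀']
```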

Delimiters, line breaks, and quotes break too

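Beyond encoding, CSV columns are prone to drifting. When a cell contains a comma or a line break and is not wrapped in double quotes ("), a parser splits it into extra columns or rows and the column count stops matching. A double quote inside a quoted cell has to be doubled ("") for the same reason. Address fields and free-text notes are common offenders.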
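A real CSV writer handles this quoting for you. In this Python sketch with made-up rows, the standard `csv` module quotes only the fields that need it (its default `QUOTE_MINIMAL`) and doubles the embedded quotes, so a CSV-aware reader gets back exactly three columns per record. A naive split on line breaks and commas, shown for contrast, does not.

```python
import csv
import io

rows = [
    ["id", "address", "note"],
    ["1", "1-2-3 Chiyoda, Tokyo", 'Line one\nLine two, with "quotes"'],
]

# Write: fields containing commas, quotes, or line breaks get wrapped in quotes.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()
print(text)

# Read back with a CSV parser: every record keeps its three columns.
parsed = list(csv.reader(io.StringIO(text, newline="")))

# Naive parsing, for contrast: splitting lines on commas drifts the columns.
naive = [line.split(",") for line in text.splitlines()]
print(len(parsed), [len(r) for r in parsed])  # 2 [3, 3]
print(len(naive), [len(r) for r in naive])    # 3 [3, 4, 2]
```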

Sometimes the delimiter is a tab or semicolon rather than a comma. If everything lands in one column, or columns shift badly, suspect the delimiter and quoting, not just the encoding.
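Python's standard `csv.Sniffer` can guess the delimiter from a sample of the file. This sketch restricts it to comma, semicolon, and tab and falls back to a comma when the sample is ambiguous; the helper name `sniff_delimiter` and that fallback are assumptions. On real files, a sample of several lines gives the sniffer more to go on than a single line.

```python
import csv


def sniff_delimiter(sample: str, fallback: str = ",") -> str:
    """Guess a CSV delimiter from a text sample (hypothetical helper)."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
        return dialect.delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot settle on a delimiter.
        return fallback


print(repr(sniff_delimiter("id;name;city\n1;Alice;Paris\n2;Bob;Lyon\n")))  # ';'
print(repr(sniff_delimiter("id\tname\n1\tAlice\n2\tBob\n")))                # '\t'
```

Once you know the delimiter, pass it to `csv.reader(..., delimiter=...)` instead of assuming a comma.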

Route through JSON or Parquet to clean up

Passing through JSON makes encoding and delimiter mess easier to straighten out

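Rather than hand-patching a broken CSV, it is often faster to route it through another format. Convert CSV to JSON to see each record as named fields, which makes drifted columns obvious, or convert to Parquet for a data platform: Parquet stores a typed schema alongside the data and stores text as UTF-8, so the types and encoding are pinned down.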
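For a rough idea of what the CSV-to-JSON step does, here is a minimal Python sketch using only the standard library. It is not Filewisp's implementation: `csv_to_json` is a hypothetical helper, it expects text that has already been decoded with the right encoding, every value stays a string because CSV carries no types, and `ensure_ascii=False` keeps non-ASCII text readable instead of turning it into `\uXXXX` escapes.

```python
import csv
import io
import json


def csv_to_json(text: str, delimiter: str = ",") -> str:
    """Convert already-decoded CSV text to a JSON array of objects (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    records = list(reader)  # one dict per row, keyed by the header line
    return json.dumps(records, ensure_ascii=False, indent=2)


print(csv_to_json("id,city\n1,東京\n2,大阪\n"))
```

If a row has more fields than the header, `DictReader` collects the extras in a list under the key `None`, which JSON serializes as `"null"`; rows with too few fields get `null` values. Either one points straight at the drifted rows.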

Our CSV to JSON and CSV to Parquet tools run in the browser, so even CSVs with customer or internal data convert without leaving your device — handy when you just want to see what is inside.

Bottom line: line up the encoding first

CSV garbling is almost always an encoding mismatch, such as UTF-8 vs Shift_JIS. Match the encoding the recipient expects, and for Excel re-save as UTF-8 with a BOM. If columns drift, check the delimiter and quoting too.

CSV ⇄ JSON and CSV ⇄ Parquet conversions are free, signup-free, and in-browser on Filewisp — useful when you need to fix the shape without corrupting the contents.