Parquet vs CSV for AWS, BigQuery, and Spark
Why Parquet cuts AWS Athena and BigQuery costs, when CSV still makes more sense, and how to convert between the two formats directly in your browser.
Why Parquet matters on AWS
AWS Athena charges per terabyte scanned. If your data sits in S3 as CSV, every query reads every column of every row, including the columns it doesn't need. Parquet stores data column by column and compresses each column independently, so Athena reads only the columns a query actually references. The same dataset is often 5–10x smaller in Parquet than in CSV, and that difference shows up directly on your AWS bill.
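To make the column-pruning argument concrete, the sketch below compares a row-oriented CSV blob with a toy columnar layout in which each column is compressed on its own with zlib. This is not Parquet: real Parquet files add row groups, per-column encodings, and metadata on top of the idea. The order data is made up. The sketch only shows why a query that touches one column reads far fewer bytes from a columnar file.

```python
import csv
import io
import zlib

# Hypothetical sample data: 10,000 order records with four columns.
REGIONS = ["us-east", "eu-west", "ap-south"]
rows = [
    {"order_id": i, "region": REGIONS[i % 3], "amount": (i * 37) % 500,
     "note": f"order {i} placed via web checkout"}
    for i in range(10_000)
]
columns = list(rows[0])


def row_oriented(rows: list[dict]) -> bytes:
    """CSV-style layout: every row is written out in full, one after another."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def column_oriented(rows: list[dict]) -> dict[str, bytes]:
    """Toy columnar layout: each column is stored and compressed independently."""
    return {
        col: zlib.compress("\n".join(str(r[col]) for r in rows).encode("utf-8"))
        for col in columns
    }


csv_blob = row_oriented(rows)
chunks = column_oriented(rows)

# SELECT SUM(amount) has to read the whole CSV blob, but only one chunk here.
csv_scan = len(csv_blob)
columnar_all = sum(len(chunk) for chunk in chunks.values())
columnar_scan = len(chunks["amount"])

# Decompress just the column the query needs and aggregate it.
amounts = [int(v) for v in zlib.decompress(chunks["amount"]).decode("utf-8").split("\n")]
total = sum(amounts)

print(f"CSV, full scan:          {csv_scan:>9,} bytes")
print(f"Columnar, all columns:   {columnar_all:>9,} bytes")
print(f"Columnar, 'amount' only: {columnar_scan:>9,} bytes")
print(f"SUM(amount) = {total}")
```

The exact byte counts depend on the data. For this sample, though, the ordering holds: compressing columns separately already beats the raw CSV, and reading a single column is only a fraction of that.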
If you run ETL jobs with AWS Glue, keeping data in Parquet makes them faster and cheaper too: Glue reads less data, finishes sooner, and bills you for fewer DPU-hours.
Redshift, BigQuery, and Spark
BigQuery's on-demand pricing also charges by bytes scanned, so the same logic applies when you query Parquet files in Cloud Storage as external tables: smaller scans mean lower per-query costs. Redshift Spectrum works the same way when reading from S3-backed external tables.
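Because Athena, BigQuery on-demand, and Redshift Spectrum all bill on bytes scanned, the savings come down to simple arithmetic. Below is a back-of-the-envelope estimator. The $5-per-TB price, the 5x compression ratio, and the 25% column fraction are assumptions chosen for illustration, not quoted figures. Plug in your own numbers and your provider's current pricing.

```python
TB = 10**12  # assumption: billing uses decimal terabytes


def query_cost_usd(bytes_scanned: float, usd_per_tb: float) -> float:
    """Cost of one query billed only on bytes scanned; ignores per-query minimums."""
    return bytes_scanned / TB * usd_per_tb


def parquet_bytes_scanned(csv_bytes: float, compression_ratio: float,
                          column_fraction: float) -> float:
    """Rough estimate: Parquet shrinks the data, and the query reads only some columns."""
    return csv_bytes / compression_ratio * column_fraction


# Hypothetical inputs: a 2 TB CSV dataset that is 5x smaller as Parquet,
# and a query that touches a quarter of the columns.
USD_PER_TB = 5.00  # assumed list price; check your provider's current pricing
csv_size = 2 * TB
parquet_scan = parquet_bytes_scanned(csv_size, compression_ratio=5, column_fraction=0.25)

csv_cost = query_cost_usd(csv_size, USD_PER_TB)
parquet_cost = query_cost_usd(parquet_scan, USD_PER_TB)
print(f"CSV query:     ${csv_cost:.2f}")
print(f"Parquet query: ${parquet_cost:.2f}")
```

With these made-up inputs, the estimate falls from $10.00 to $0.50 per query. The column-fraction term assumes columns are roughly equal in size, which real tables rarely are, so treat the result as an order-of-magnitude guide.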
For Spark jobs, including those running on EMR, Parquet input cuts the time spent parsing and deserializing data, and Spark can skip columns a job never reads. If you run daily jobs against large S3 datasets, switching from CSV to Parquet is one of the easier wins.
When CSV is still the right call
Parquet is a binary format, so you can't open it in Excel or a text editor. If you're sharing data with someone who won't query it through Athena or Spark, CSV is usually more practical: analysts who work in spreadsheets, BI tools, or simple import wizards will have an easier time with it.
CSV also wins for quick manual inspection. If you've pulled a Parquet file from S3 and just want to check a few values, converting it to CSV and opening it locally is often faster than setting up a query.
Convert in your browser, no setup needed
Both the Parquet-to-CSV and CSV-to-Parquet converters on this site run entirely in your browser; nothing is uploaded to a server. Because the file never leaves your machine, they are safe to use with production or proprietary data.
Grab a Parquet file from S3, convert it to CSV here, and open it in Excel, all in under a minute. Or convert a local CSV to Parquet before uploading it to S3, so your Athena queries are cheaper from day one.