CSV Format Reference (RFC 4180)

A technical reference for the CSV file format — structure, quoting rules, encoding, dialect differences, and common errors.

What Is CSV?

CSV (Comma-Separated Values) is a plain-text format for tabular data. Each line represents a row; commas separate columns within each row. It predates the web by decades — the format traces back to early mainframe data exchange in the 1970s — but it remains a universal interchange format because of its simplicity: every spreadsheet, database, and data analysis tool can read and write it.

Despite its age, CSV never had a formal standard until October 2005, when Yakov Shafranovich published RFC 4180, "Common Format and MIME Type for Comma-Separated Values (CSV) Files." RFC 4180 is an Informational RFC, not a standard-track document, but it codified the de facto conventions that implementations had converged on over decades.

CSV Structure

An RFC 4180-compliant CSV file has these components:

name,email,role
Alice,[email protected],admin
Bob,[email protected],editor
Carol,[email protected],viewer

Delimiters

Despite "comma" in the name, CSV files use a variety of field separators. The delimiter is not stored in the file itself — the parser must either be told or must guess.

DelimiterNameMIME TypeCommon Use
,Comma (CSV)text/csvUniversal default. RFC 4180 standard.
\tTab (TSV)text/tab-separated-valuesData with commas in values. Bioinformatics, data science pipelines.
;Semicolontext/csvEuropean locales where comma is the decimal separator (e.g., 3,14).
|Pipetext/csvReadable alternative. Common in log files and legacy mainframe exports.

Quoting Rules

RFC 4180 defines specific rules for when and how fields must be quoted:

  1. Fields containing the delimiter must be quoted. A field like Boston, MA needs quotes because the comma would otherwise split it into two columns.
  2. Fields containing double-quotes must be quoted, and every literal double-quote inside must be escaped by doubling it. The value She said "hello" becomes "She said ""hello""" in the CSV.
  3. Fields containing line breaks (CRLF, LF, or CR) must be quoted. A multi-line address stored in one field is valid CSV when quoted.
  4. Fields may always be quoted — quoting a field that doesn't strictly need it is valid and sometimes safer.

Quoting Examples

Raw ValueCSV RepresentationRule Applied
hellohelloNo special chars — unquoted is fine.
Boston, MA"Boston, MA"Contains comma — must quote.
He said "hi""He said ""hi"""Contains quotes — must quote and double them.
Line 1\nLine 2"Line 1\nLine 2"Contains line break — must quote.

Encoding

RFC 4180 does not specify a character encoding. For modern use:

CSV Dialects

"CSV" is not one format. Different applications have different defaults and quirks:

AspectRFC 4180Microsoft ExcelGoogle Sheets
Default delimiterCommaVaries by region (comma US, semicolon EU)Comma
Line endingsCRLFCRLF on WindowsLF
Encoding defaultUnspecifiedLocale code page (requires BOM for UTF-8)UTF-8 (always)
QuotingRFC 4180 rulesRFC 4180 plus auto-detectionRFC 4180 with tolerance for unquoted commas
Header rowOptionalAssumed presentAssumed present

Excel exports for European locales use semicolons by default because the comma is reserved as the decimal separator. Google Sheets always uses commas and UTF-8. The practical implication: a CSV generated by Google Sheets may open as a single column in European Excel unless the user imports it through the Data Import wizard.

Common CSV Errors

Frequently Asked Questions

Why does Excel show my CSV in one column?

Excel shows all data in one column when it fails to detect the correct delimiter. This happens when: (1) you open by double-clicking and Excel's default delimiter (semicolon in European locales, comma in US) doesn't match your file; (2) the file has no .csv extension; or (3) the CSV uses a non-standard delimiter like pipe. Fix it by using Excel's Data Import wizard: Data → From Text/CSV, then select the correct delimiter in the preview window.

Should I use comma or semicolon as my CSV delimiter?

Use comma unless you have a specific reason not to. Comma is the RFC 4180 standard and works in every spreadsheet application worldwide. Use semicolon only when targeting European Excel users — in locales where comma is the decimal separator (3,14 for π), Excel defaults to semicolon-delimited CSVs. If your data contains commas within field values, use tab-delimited (TSV) instead of semicolon — it's more universally compatible.

How do I handle commas inside my data fields?

Wrap the entire field in double quotes. A value like Boston, MA 02108 must appear as "Boston, MA 02108" in the CSV file. The outer quotes tell the parser that commas inside belong to the value, not the column structure. If the field value itself contains double quotes, double them: She said "hello" becomes "She said ""hello""".

Related Guides