CSV Format Reference (RFC 4180)
A technical reference for the CSV file format — structure, quoting rules, encoding, dialect differences, and common errors.
What Is CSV?
CSV (Comma-Separated Values) is a plain-text format for tabular data. Each line represents a row; commas separate columns within each row. It predates the web by decades — the format traces back to early mainframe data exchange in the 1970s — but it remains a universal interchange format because of its simplicity: every spreadsheet, database, and data analysis tool can read and write it.
Despite its age, CSV never had a formal standard until October 2005, when Yakov Shafranovich published RFC 4180, "Common Format and MIME Type for Comma-Separated Values (CSV) Files." RFC 4180 is an Informational RFC, not a standard-track document, but it codified the de facto conventions that implementations had converged on over decades.
CSV Structure
An RFC 4180-compliant CSV file has these components:
- Records: Each line in the file is one record (row). Records are separated by CRLF (
\r\n). The last record may or may not end with CRLF. - Header: The first record is typically a header row containing field names — one name per column. A header is optional but strongly recommended.
- Fields: Within each record, fields are separated by commas. Spaces are considered part of the field value and should not appear around the comma delimiter.
- Consistent field count: Each record should contain the same number of fields as the header row. RFC 4180 does not strictly require this, but virtually all parsers either reject or silently mishandle inconsistent counts.
name,email,role
Alice,[email protected],admin
Bob,[email protected],editor
Carol,[email protected],viewer
Delimiters
Despite "comma" in the name, CSV files use a variety of field separators. The delimiter is not stored in the file itself — the parser must either be told or must guess.
| Delimiter | Name | MIME Type | Common Use |
|---|---|---|---|
, | Comma (CSV) | text/csv | Universal default. RFC 4180 standard. |
\t | Tab (TSV) | text/tab-separated-values | Data with commas in values. Bioinformatics, data science pipelines. |
; | Semicolon | text/csv | European locales where comma is the decimal separator (e.g., 3,14). |
| | Pipe | text/csv | Readable alternative. Common in log files and legacy mainframe exports. |
Quoting Rules
RFC 4180 defines specific rules for when and how fields must be quoted:
- Fields containing the delimiter must be quoted. A field like
Boston, MAneeds quotes because the comma would otherwise split it into two columns. - Fields containing double-quotes must be quoted, and every literal double-quote inside must be escaped by doubling it. The value
She said "hello"becomes"She said ""hello"""in the CSV. - Fields containing line breaks (CRLF, LF, or CR) must be quoted. A multi-line address stored in one field is valid CSV when quoted.
- Fields may always be quoted — quoting a field that doesn't strictly need it is valid and sometimes safer.
Quoting Examples
| Raw Value | CSV Representation | Rule Applied |
|---|---|---|
hello | hello | No special chars — unquoted is fine. |
Boston, MA | "Boston, MA" | Contains comma — must quote. |
He said "hi" | "He said ""hi""" | Contains quotes — must quote and double them. |
Line 1\nLine 2 | "Line 1\nLine 2" | Contains line break — must quote. |
Encoding
RFC 4180 does not specify a character encoding. For modern use:
- UTF-8 is the recommended encoding. It covers all Unicode characters and is the default for web applications, Google Sheets, and modern Excel (2019+).
- BOM for Excel compatibility: Microsoft Excel on Windows requires a Byte Order Mark (
\uFEFF) at the beginning of a UTF-8 CSV file to recognize it as UTF-8. Without the BOM, Excel assumes a locale-specific 8-bit encoding (Windows-1252 for US/UK, other code pages elsewhere) and non-ASCII characters render as gibberish. - ASCII fallback: If you only use characters in the ASCII range (0-127), any encoding works — UTF-8, ASCII, Latin-1, or Windows-1252 all produce identical bytes. This is why English-only CSV files rarely have encoding problems.
CSV Dialects
"CSV" is not one format. Different applications have different defaults and quirks:
| Aspect | RFC 4180 | Microsoft Excel | Google Sheets |
|---|---|---|---|
| Default delimiter | Comma | Varies by region (comma US, semicolon EU) | Comma |
| Line endings | CRLF | CRLF on Windows | LF |
| Encoding default | Unspecified | Locale code page (requires BOM for UTF-8) | UTF-8 (always) |
| Quoting | RFC 4180 rules | RFC 4180 plus auto-detection | RFC 4180 with tolerance for unquoted commas |
| Header row | Optional | Assumed present | Assumed present |
Excel exports for European locales use semicolons by default because the comma is reserved as the decimal separator. Google Sheets always uses commas and UTF-8. The practical implication: a CSV generated by Google Sheets may open as a single column in European Excel unless the user imports it through the Data Import wizard.
Common CSV Errors
- Unescaped commas in unquoted fields. The most frequent CSV bug. A field containing
Boston, MAwithout quotes silently becomes two columns. All downstream columns shift right and the row has one too many fields. - Inconsistent column counts. Row 5 has 8 fields while the header has 7. Some parsers truncate, some pad with empty cells, some throw errors. There is no standard recovery behavior.
- Trailing delimiters. A trailing comma at the end of a line (
Alice,30,) creates a ghost empty column. This often happens when code builds CSV lines by joining with commas without checking for trailing elements. - Mixed line endings. A file with LF endings on some lines and CRLF on others. Common when CSV files are concatenated from different sources or edited on both Windows and Unix systems. Most modern parsers handle this gracefully, but older tools may see stray CR characters in field values.
- BOM on the wrong line. The BOM must be the very first bytes of the file. If a CSV has UTF-8 BOM on line 10, it's useless — Excel has already guessed the encoding by then.
Frequently Asked Questions
Why does Excel show my CSV in one column?
Excel shows all data in one column when it fails to detect the correct delimiter. This happens when: (1) you open by double-clicking and Excel's default delimiter (semicolon in European locales, comma in US) doesn't match your file; (2) the file has no .csv extension; or (3) the CSV uses a non-standard delimiter like pipe. Fix it by using Excel's Data Import wizard: Data → From Text/CSV, then select the correct delimiter in the preview window.
Should I use comma or semicolon as my CSV delimiter?
Use comma unless you have a specific reason not to. Comma is the RFC 4180 standard and works in every spreadsheet application worldwide. Use semicolon only when targeting European Excel users — in locales where comma is the decimal separator (3,14 for π), Excel defaults to semicolon-delimited CSVs. If your data contains commas within field values, use tab-delimited (TSV) instead of semicolon — it's more universally compatible.
How do I handle commas inside my data fields?
Wrap the entire field in double quotes. A value like Boston, MA 02108 must appear as "Boston, MA 02108" in the CSV file. The outer quotes tell the parser that commas inside belong to the value, not the column structure. If the field value itself contains double quotes, double them: She said "hello" becomes "She said ""hello""".