Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Python CSV

CSV

What is a CSV file?

  • CSV stands for Comma Separated Values

  • A CSV file is similar to the values you might put in an Excel file. Though in Excel each cell has both a value and a format (and maybe more) attributes. A CSV file only contains values.

  • A CSV file has rows and in each row there are values separated by a comma.

  • In some cases the separator is some other character. e.g. a semic-colon (;), a pipeline (|) or a TAB character. (The last one is also referred to a TSV file where TSV stands for TAB Separated Values.

  • There are a number of other variations, so the csv-reading and writing librariers usually provide options to handle these variations.

  • Sometimes all the lines hold values. Sometimes the first line acts as the list of column-names.

CSV file without title row

  • Some of the figures in Snow White in Hungarian.

{% embed include file="src/examples/csv/snowwhite.csv)

CSV file with header

  • This CSV file contains information about the members of the Monthy Python show.
  • The first row contains the titles of the columns.

{% embed include file="src/examples/csv/monty_python.csv)

Read CSV file into lists

import sys
import csv

if len(sys.argv) != 2:
    sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
    exit()

filename = sys.argv[1]
with open(filename) as fh:
    rd = csv.reader(fh)

    for row in rd:
        print(row)

python examples/csv/read_csv.py  example/snowwhite.csv

CSV with newlines missing closing quote

{% embed include file="src/examples/csv/with_newlines_error.csv)

CSV to dictionary

  • DictReader
import sys
import csv

if len(sys.argv) != 2:
    sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
    exit()

filename = sys.argv[1]

with open(filename) as fh:
    rd = csv.DictReader(fh, delimiter=',')
    for row in rd:
        print(row)
{'lname': 'Chapman', 'born': '8 January 1941', 'fname': 'Graham'}
{'lname': 'Idle', 'born': '29 March 1943', 'fname': 'Eric'}
{'lname': 'Gilliam', 'born': '22 November 1940', 'fname': 'Terry'}
{'lname': 'Jones', 'born': '1 February 1942', 'fname': 'Terry'}
{'lname': 'Cleese', 'born': '27 October 1939', 'fname': 'John'}
{'lname': 'Palin', 'born': '5 May 1943', 'fname': 'Michael'}

CSV Attributes

  • delimiter
  • doublequote
  • escapechar
  • lineterminator
  • quotechar
  • quoting
  • skipinitialspace
  • strict

CSV dialects

  • list_dialects

The csv module defines a number of "dialects", sets of attributes.

import csv

for dialect_name in csv.list_dialects():
    print(dialect_name)
    dialect = csv.get_dialect(dialect_name)
    for attribute_name in [
            'delimiter',
            'doublequote',
            'escapechar',
            'lineterminator',
            'quotechar',
            'quoting',
            'skipinitialspace',
            'strict',
        ]:
        attr = getattr(dialect, attribute_name)
        if attr == '\t':
            attr = '\\t'
        if attr == '\r\n':
            attr = '\\r\\n'
        print("  {:16} '{}'".format(attribute_name, attr))

excel
  delimiter        ','
  doublequote      'True'
  escapechar       'None'
  lineterminator   '\r\n'
  quotechar        '"'
  quoting          '0'
  skipinitialspace 'False'
  strict           'False'
excel-tab
  delimiter        '\t'
  doublequote      'True'
  escapechar       'None'
  lineterminator   '\r\n'
  quotechar        '"'
  quoting          '0'
  skipinitialspace 'False'
  strict           'False'
unix
  delimiter        ','
  doublequote      'True'
  escapechar       'None'
  lineterminator   '
'
  quotechar        '"'
  quoting          '1'
  skipinitialspace 'False'
  strict           'False'

Dialects of CSV files. See also: csv

Reading CSV the naive way

  • split

  • This is not recommended as it will fail in some cases. See next page!

Tudor;Vidor;10;Hapci
Szundi;Morgo;7;Szende
Kuka;Hofeherke;100;Kiralyno
Boszorkany;Herceg;9;Meselo
import sys, csv

if len(sys.argv) != 2:
    sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
    exit()

filename = sys.argv[1]
count = 0
with open(filename) as fh:
    for line in fh:
        line = line.rstrip("\n")
        row = line.split(';')
        #print(row)
        count += int(row[2])

print("Total: {}".format(count))
python examples/csv/read_csv_split.py examples/csv/plain.csv

CSV with quotes and newlines

Tudor;Vidor;10;Hapci
Szundi;Morgo;7;Szende
Kuka;"Hofeherke; alma";100;Kiralyno
Boszorkany;Herceg;9;Meselo
Tudor;Vidor;10;Hapci
Szundi;Morgo;7;Szende
Kuka;"Hofeherke; 
alma";100;Kiralyno
Boszorkany;Herceg;9;Meselo

Reading a CSV file

  • csv
  • reader
import sys
import csv

if len(sys.argv) != 2:
    sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
    exit()

filename = sys.argv[1]
count = 0
with open(filename) as fh:
    rd = csv.reader(fh,
        delimiter=';',
        #strict=True,
    )

    for row in rd:
        print(row)
        count += int(row[2])

print("Total: {}".format(count))

python examples/csv/read_csv.py examples/csv/plain.csv

Exercise: CSV as dictionary of dictionaries

Create a script called monty_python_dictionary_of_dictionaries.py that given a file like the CSV file of Monty Python troupe (examples/csv/monty_python.csv" %}, will create a dictionary where we can look up information about them based on the first name. For example:


filename = 'examples/csv/monty_python.csv'

people = read_csv_file(filename)

print(people["Graham"]["lname"])  # Champman
print(people["John"]["born"])     # 27 October 1939
print(people["Michael"])
     # {'lname': 'Palin', 'born': '5 May 1943', 'fname': 'Michael'}
print(people["Terry"]["lname"])  # Gilliam

Exercise: CSV as dictionary of tuples of dictionaries

Create a script called monty_python_dictionary_of_tuples.py that given a file like the CSV file of Monty Python troupe (examples/csv/monty_python.csv), will create a dictionary where we can look up information about them based on the first name and last name. For example:

filename = 'examples/csv/monty_python.csv'
people = read_csv_file(filename)
#print(people)

print(people[("Graham", "Chapman")])
  # {'fname': 'Graham', 'lname': 'Chapman', 'born': '8 January 1941'}

print(people[("Michael", "Palin")])
  # {'fname': 'Michael', 'lname': 'Palin', 'born': '5 May 1943'}

Exercise: count row length in csv files

  • Write a script called csv_column_count.py that given a CSV file will tell if all the rows have the same length or if some of them are different.
  • Show which ones are different.
  • Try it on examples/csv/plain.csv and on examples/csv/uneven.csv

Solution: CSV as dictionary of dictionaries

import csv
import sys

def read_csv_file(filename):
    name_of = {}
    with open(filename) as fh:
        rd = csv.DictReader(fh, delimiter=',')
        for row in rd:
            name_of[ row['fname'] ] = row
    print(name_of)
    return name_of

filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
    filename = sys.argv[1]

people = read_csv_file(filename)
print(people["Graham"]["lname"])  # Champman
print(people["John"]["born"])     # 27 October 1939
print(people["Michael"])
     # {'lname': 'Palin', 'born': '5 May 1943', 'fname': 'Michael'}
print(people["Terry"]["lname"])  # Gilliam

Solution: CSV as dictionary of tuples of dictionaries

Create a script called monty_python_dictionary_of_tuples.py that given a file like the CSV file of Monty Python troupe (examples/csv/monty_python.csv" %}, will create a dictionary where we can look up information about them based on the first name and last name. For example:

import csv
import sys

def read_csv_file(filename):
    name_of = {}
    with open(filename) as fh:
        rd = csv.DictReader(fh, delimiter=',')
        for row in rd:
            name_of[ (row['fname'], row['lname']) ] = row
    return name_of

filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
    filename = sys.argv[1]

people = read_csv_file(filename)
#print(people)

print(people[("Graham", "Chapman")])
  # {'fname': 'Graham', 'lname': 'Chapman', 'born': '8 January 1941'}

print(people[("Michael", "Palin")])
  # {'fname': 'Michael', 'lname': 'Palin', 'born': '5 May 1943'}

Solution: count row length in csv files

import csv
import sys
from collections import defaultdict

def check_rows(filename):
    rows = []
    widthes = defaultdict(int)
    with open(filename) as fh:
        rd = csv.reader(fh, delimiter=';')
        for row in rd:
            width = len(row)
            rows.append(width)
            widthes[width] += 1

    #print(widthes)
    if len(widthes.keys()) > 1:
        print("Not all the rows have the same number of cells")
        cell_counts = sorted(widthes.keys(), key=lambda x: widthes[x], reverse=True)
        print(f"Most common number of cells is {cell_counts[0]} with {widthes[ cell_counts[0] ]} rows")
        for count in cell_counts[1:]:
            print(f"  Cells: {count}")
            print(f"  Rows:")
            for row, cells in enumerate(rows):
                if cells == count:
                    print(f"    {row}")
    else:
        values = list(widthes.values())
        print(f"All rows have the same number of cells: {values[0]}")

if len(sys.argv) != 2:
    exit(f"Usage: {sys.argv[0]} FILENAME")

filename = sys.argv[1]

check_rows(filename)