Python CSV
CSV
What is a CSV file?
-
CSV stands for Comma Separated Values
-
A CSV file is similar to the values you might put in an Excel file. Though in Excel each cell has both a
valueand aformat(and maybe more) attributes. A CSV file only contains values. -
A CSV file has rows and in each row there are values separated by a comma.
-
In some cases the separator is some other character. e.g. a semic-colon (
;), a pipeline (|) or a TAB character. (The last one is also referred to a TSV file where TSV stands for TAB Separated Values. -
There are a number of other variations, so the csv-reading and writing librariers usually provide options to handle these variations.
-
Sometimes all the lines hold values. Sometimes the first line acts as the list of column-names.
CSV file without title row
- Some of the figures in Snow White in Hungarian.
{% embed include file="src/examples/csv/snowwhite.csv)
CSV file with header
- This CSV file contains information about the members of the Monthy Python show.
- The first row contains the titles of the columns.
{% embed include file="src/examples/csv/monty_python.csv)
Read CSV file into lists
import sys
import csv
if len(sys.argv) != 2:
sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
exit()
filename = sys.argv[1]
with open(filename) as fh:
rd = csv.reader(fh)
for row in rd:
print(row)
python examples/csv/read_csv.py example/snowwhite.csv
CSV with newlines missing closing quote
{% embed include file="src/examples/csv/with_newlines_error.csv)
CSV to dictionary
- DictReader
import sys
import csv
if len(sys.argv) != 2:
sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
exit()
filename = sys.argv[1]
with open(filename) as fh:
rd = csv.DictReader(fh, delimiter=',')
for row in rd:
print(row)
{'lname': 'Chapman', 'born': '8 January 1941', 'fname': 'Graham'}
{'lname': 'Idle', 'born': '29 March 1943', 'fname': 'Eric'}
{'lname': 'Gilliam', 'born': '22 November 1940', 'fname': 'Terry'}
{'lname': 'Jones', 'born': '1 February 1942', 'fname': 'Terry'}
{'lname': 'Cleese', 'born': '27 October 1939', 'fname': 'John'}
{'lname': 'Palin', 'born': '5 May 1943', 'fname': 'Michael'}
CSV Attributes
- delimiter
- doublequote
- escapechar
- lineterminator
- quotechar
- quoting
- skipinitialspace
- strict
CSV dialects
- list_dialects
The csv module defines a number of "dialects", sets of attributes.
import csv
for dialect_name in csv.list_dialects():
print(dialect_name)
dialect = csv.get_dialect(dialect_name)
for attribute_name in [
'delimiter',
'doublequote',
'escapechar',
'lineterminator',
'quotechar',
'quoting',
'skipinitialspace',
'strict',
]:
attr = getattr(dialect, attribute_name)
if attr == '\t':
attr = '\\t'
if attr == '\r\n':
attr = '\\r\\n'
print(" {:16} '{}'".format(attribute_name, attr))
excel
delimiter ','
doublequote 'True'
escapechar 'None'
lineterminator '\r\n'
quotechar '"'
quoting '0'
skipinitialspace 'False'
strict 'False'
excel-tab
delimiter '\t'
doublequote 'True'
escapechar 'None'
lineterminator '\r\n'
quotechar '"'
quoting '0'
skipinitialspace 'False'
strict 'False'
unix
delimiter ','
doublequote 'True'
escapechar 'None'
lineterminator '
'
quotechar '"'
quoting '1'
skipinitialspace 'False'
strict 'False'
Dialects of CSV files. See also: csv
Reading CSV the naive way
-
split
-
This is not recommended as it will fail in some cases. See next page!
Tudor;Vidor;10;Hapci
Szundi;Morgo;7;Szende
Kuka;Hofeherke;100;Kiralyno
Boszorkany;Herceg;9;Meselo
import sys, csv
if len(sys.argv) != 2:
sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
exit()
filename = sys.argv[1]
count = 0
with open(filename) as fh:
for line in fh:
line = line.rstrip("\n")
row = line.split(';')
#print(row)
count += int(row[2])
print("Total: {}".format(count))
python examples/csv/read_csv_split.py examples/csv/plain.csv
CSV with quotes and newlines
Tudor;Vidor;10;Hapci
Szundi;Morgo;7;Szende
Kuka;"Hofeherke; alma";100;Kiralyno
Boszorkany;Herceg;9;Meselo
Tudor;Vidor;10;Hapci
Szundi;Morgo;7;Szende
Kuka;"Hofeherke;
alma";100;Kiralyno
Boszorkany;Herceg;9;Meselo
Reading a CSV file
- csv
- reader
import sys
import csv
if len(sys.argv) != 2:
sys.stderr.write("Usage: {} FILENAME\n".format(sys.argv[0]))
exit()
filename = sys.argv[1]
count = 0
with open(filename) as fh:
rd = csv.reader(fh,
delimiter=';',
#strict=True,
)
for row in rd:
print(row)
count += int(row[2])
print("Total: {}".format(count))
python examples/csv/read_csv.py examples/csv/plain.csv
Exercise: CSV as dictionary of dictionaries
Create a script called monty_python_dictionary_of_dictionaries.py that given a file like the CSV file of Monty Python troupe (examples/csv/monty_python.csv" %}, will create a dictionary where we can look up information about them based on the first name. For example:
filename = 'examples/csv/monty_python.csv'
people = read_csv_file(filename)
print(people["Graham"]["lname"]) # Champman
print(people["John"]["born"]) # 27 October 1939
print(people["Michael"])
# {'lname': 'Palin', 'born': '5 May 1943', 'fname': 'Michael'}
print(people["Terry"]["lname"]) # Gilliam
Exercise: CSV as dictionary of tuples of dictionaries
Create a script called monty_python_dictionary_of_tuples.py that given a file like the CSV file of Monty Python troupe (examples/csv/monty_python.csv), will create a dictionary where we can look up information about them based on the first name and last name. For example:
filename = 'examples/csv/monty_python.csv'
people = read_csv_file(filename)
#print(people)
print(people[("Graham", "Chapman")])
# {'fname': 'Graham', 'lname': 'Chapman', 'born': '8 January 1941'}
print(people[("Michael", "Palin")])
# {'fname': 'Michael', 'lname': 'Palin', 'born': '5 May 1943'}
Exercise: count row length in csv files
- Write a script called csv_column_count.py that given a CSV file will tell if all the rows have the same length or if some of them are different.
- Show which ones are different.
- Try it on
examples/csv/plain.csvand onexamples/csv/uneven.csv
Solution: CSV as dictionary of dictionaries
import csv
import sys
def read_csv_file(filename):
name_of = {}
with open(filename) as fh:
rd = csv.DictReader(fh, delimiter=',')
for row in rd:
name_of[ row['fname'] ] = row
print(name_of)
return name_of
filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
filename = sys.argv[1]
people = read_csv_file(filename)
print(people["Graham"]["lname"]) # Champman
print(people["John"]["born"]) # 27 October 1939
print(people["Michael"])
# {'lname': 'Palin', 'born': '5 May 1943', 'fname': 'Michael'}
print(people["Terry"]["lname"]) # Gilliam
Solution: CSV as dictionary of tuples of dictionaries
Create a script called monty_python_dictionary_of_tuples.py that given a file like the CSV file of Monty Python troupe (examples/csv/monty_python.csv" %}, will create a dictionary where we can look up information about them based on the first name and last name. For example:
import csv
import sys
def read_csv_file(filename):
name_of = {}
with open(filename) as fh:
rd = csv.DictReader(fh, delimiter=',')
for row in rd:
name_of[ (row['fname'], row['lname']) ] = row
return name_of
filename = 'examples/csv/monty_python.csv'
if len(sys.argv) == 2:
filename = sys.argv[1]
people = read_csv_file(filename)
#print(people)
print(people[("Graham", "Chapman")])
# {'fname': 'Graham', 'lname': 'Chapman', 'born': '8 January 1941'}
print(people[("Michael", "Palin")])
# {'fname': 'Michael', 'lname': 'Palin', 'born': '5 May 1943'}
Solution: count row length in csv files
import csv
import sys
from collections import defaultdict
def check_rows(filename):
rows = []
widthes = defaultdict(int)
with open(filename) as fh:
rd = csv.reader(fh, delimiter=';')
for row in rd:
width = len(row)
rows.append(width)
widthes[width] += 1
#print(widthes)
if len(widthes.keys()) > 1:
print("Not all the rows have the same number of cells")
cell_counts = sorted(widthes.keys(), key=lambda x: widthes[x], reverse=True)
print(f"Most common number of cells is {cell_counts[0]} with {widthes[ cell_counts[0] ]} rows")
for count in cell_counts[1:]:
print(f" Cells: {count}")
print(f" Rows:")
for row, cells in enumerate(rows):
if cells == count:
print(f" {row}")
else:
values = list(widthes.values())
print(f"All rows have the same number of cells: {values[0]}")
if len(sys.argv) != 2:
exit(f"Usage: {sys.argv[0]} FILENAME")
filename = sys.argv[1]
check_rows(filename)