Python is a powerful programming language that offers many libraries and tools to handle data in various formats. One such format is CSV (Comma-Separated Values), which is commonly used for storing and exchanging tabular data between different applications.
Python provides several libraries for working with CSV files, including the built-in `csv` module. In this article, we'll explore how to use the `csv` module to read and write CSV files in Python, along with some useful tips and tricks.
Types of CSV Files
There are different types of CSV files, depending on the delimiters used to separate the values. The most common types are:
- Comma-separated values (CSV)
- Tab-separated values (TSV)
- Pipe-separated values (PSV)
In a CSV file, each line represents a row of data, and the values in each row are separated by commas. In a TSV file, the values are separated by tabs, and in a PSV file, they are separated by pipes.
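Because the three formats differ only in the delimiter, the same `csv.reader()` call can parse all of them. The sketch below is illustrative only: the sample strings are made up, and `io.StringIO` stands in for an open file so the example runs without any files on disk.

```python
import csv
import io

# Hypothetical sample data: the same two rows in three delimiter styles
samples = {
    ',': 'Name,City\nAchinta,Kolkata\n',
    '\t': 'Name\tCity\nAchinta\tKolkata\n',
    '|': 'Name|City\nAchinta|Kolkata\n',
}

parsed = {}
for delimiter, text in samples.items():
    # io.StringIO behaves like a file opened in text mode
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    parsed[delimiter] = list(reader)

for delimiter, rows in parsed.items():
    print(repr(delimiter), rows)
```

All three inputs produce the same list of rows; only the `delimiter` argument changes.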
Reading CSV Files
To read a CSV file in Python, we can use the `csv.reader()` function from the `csv` module.
Here's an example:
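```python
import csv

# newline='' lets the csv module handle line endings itself, as its documentation recommends
with open('example.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
```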
In this example, we're opening a CSV file named `example.csv` using a context manager (the `with` statement) to ensure that the file is properly closed after reading. We then create a reader object with the `csv.reader()` function and iterate over each row in the file with a `for` loop.
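One detail worth knowing: `csv.reader()` does not convert types, so every value, including numbers, comes back as a string. A minimal sketch, using hypothetical contents in an `io.StringIO` object instead of `example.csv`:

```python
import csv
import io

# Hypothetical file contents standing in for example.csv
text = 'Name,Age\nAchinta,35\nDiganta,30\n'

rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['Name', 'Age'], ['Achinta', '35'], ['Diganta', '30']]

# Convert explicitly when numeric values are needed
ages = [int(age) for _, age in rows[1:]]
print(ages)  # [35, 30]
```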
By default, `csv.reader()` assumes that the values are separated by commas. However, we can specify a different delimiter by passing a `delimiter` argument to `csv.reader()`.
```python
import csv

with open('example.tsv', newline='') as tsvfile:
    reader = csv.reader(tsvfile, delimiter='\t')
    for row in reader:
        print(row)
```
In this example, we're opening a TSV file named `example.tsv` and specifying the tab character as the delimiter with the `delimiter='\t'` argument.
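With a tab delimiter, commas are ordinary characters, so a value such as `Santi, Jr.` stays in a single field. A short sketch with hypothetical data read from `io.StringIO` rather than `example.tsv`:

```python
import csv
import io

# Hypothetical TSV contents; the comma in the second row is not a separator here
text = 'Name\tCity\nSanti, Jr.\tMumbai\n'

rows = list(csv.reader(io.StringIO(text), delimiter='\t'))
print(rows)  # [['Name', 'City'], ['Santi, Jr.', 'Mumbai']]
```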
Writing CSV Files
To write data to a CSV file in Python, we can use the `csv.writer()` function from the `csv` module.
Here's an example:
```python
import csv

data = [
    ['Name', 'Age', 'City'],
    ['Achinta', '35', 'Kolkata'],
    ['Diganta', '30', 'Bengaluru'],
    ['Santi', '40', 'Mumbai'],
]

# newline='' stops the file object from translating line endings;
# csv.writer writes its own line terminator
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(data)
```
In this example, we're creating a list of lists called `data` that represents the rows of a CSV file. We then open a new file named `output.csv` in write mode using a context manager, create a writer object with the `csv.writer()` function, and use its `writerows()` method to write all the rows to the file.
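To see exactly what `writerows()` produces, we can write to an in-memory buffer instead of `output.csv`. In this sketch `io.StringIO` is used purely for demonstration. Note that the writer ends each row with `\r\n` by default, which is why files should be opened with `newline=''`: otherwise the platform's own newline translation can alter those line endings.

```python
import csv
import io

data = [
    ['Name', 'Age', 'City'],
    ['Achinta', '35', 'Kolkata'],
]

buffer = io.StringIO(newline='')  # in-memory stand-in for output.csv
writer = csv.writer(buffer)
writer.writerows(data)

output = buffer.getvalue()
print(repr(output))  # 'Name,Age,City\r\nAchinta,35,Kolkata\r\n'
```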
By default, `csv.writer()` separates values with commas. However, we can specify a different delimiter by passing a `delimiter` argument to `csv.writer()`.
```python
import csv

data = [['Name', 'Age', 'Gender'],
        ['Achinta', 35, 'Male'],
        ['Diganta', 30, 'Male'],
        ['Santi', 40, 'Female']]

with open('output.tsv', 'w', newline='') as tsvfile:
    # Non-string values such as 35 are converted with str() when written
    writer = csv.writer(tsvfile, delimiter='\t')
    writer.writerows(data)
```
In this example, we're creating a TSV file named `output.tsv` by specifying the tab character as the delimiter with the `delimiter='\t'` argument.
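The TSV example mixes strings and integers; `csv.writer()` converts non-string values with `str()` before writing them. A quick in-memory check, using `io.StringIO` in place of `output.tsv`, shows the result:

```python
import csv
import io

data = [['Name', 'Age'], ['Achinta', 35], ['Santi', 40]]

buffer = io.StringIO(newline='')  # in-memory stand-in for output.tsv
csv.writer(buffer, delimiter='\t').writerows(data)

print(repr(buffer.getvalue()))  # 'Name\tAge\r\nAchinta\t35\r\nSanti\t40\r\n'
```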
Handling Headers
CSV files often include a header row that provides the names of the columns. When reading a CSV file, we can use the built-in `next()` function to consume the header row before looping over the data.
```python
import csv

with open('example.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    headers = next(reader)  # the first row holds the column names
    for row in reader:
        print(row)
```
In this example, we're reading a CSV file named `example.csv` and using `next()` to skip the header row and store it in a variable called `headers`. We then iterate over the remaining rows with a `for` loop.
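Once the header row is stored in `headers`, it can be paired with each data row. The sketch below uses `zip()` with hypothetical in-memory data; the `csv` module's `csv.DictReader` class does the same thing automatically by treating the first row as field names.

```python
import csv
import io

# Hypothetical file contents
text = 'Name,Age,City\nAchinta,35,Kolkata\nSanti,40,Mumbai\n'

# Manual approach: read the header with next(), then zip it with each row
reader = csv.reader(io.StringIO(text))
headers = next(reader)
records = [dict(zip(headers, row)) for row in reader]
print(records[0])  # {'Name': 'Achinta', 'Age': '35', 'City': 'Kolkata'}

# csv.DictReader uses the first row as field names for you
dict_records = list(csv.DictReader(io.StringIO(text)))
print(dict_records == records)  # True
```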
When writing data to a CSV file, we can include the header row by calling the `writerow()` method with the header values before writing the data rows.
```python
import csv

header = ['Name', 'Age', 'City']
rows = [
    ['Achinta', '35', 'Delhi'],
    ['Diganta', '30', 'Mumbai'],
    ['Santi', '40', 'Kolkata'],
]

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)  # header row first
    writer.writerows(rows)   # then the data rows
```
In this example, we're creating a CSV file named `output.csv` and calling the `writerow()` method with the header values `['Name', 'Age', 'City']` before writing the data rows with the `writerows()` method.
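If the rows are dictionaries rather than lists, `csv.DictWriter` offers a `writeheader()` method that writes the header row from its `fieldnames`. A minimal sketch, writing hypothetical records to `io.StringIO` instead of `output.csv`:

```python
import csv
import io

fieldnames = ['Name', 'Age', 'City']
records = [
    {'Name': 'Achinta', 'Age': 35, 'City': 'Delhi'},
    {'Name': 'Santi', 'Age': 40, 'City': 'Kolkata'},
]

buffer = io.StringIO(newline='')  # in-memory stand-in for output.csv
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()       # writes Name,Age,City
writer.writerows(records)  # values are written in fieldnames order

print(repr(buffer.getvalue()))
# 'Name,Age,City\r\nAchinta,35,Delhi\r\nSanti,40,Kolkata\r\n'
```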
Handling Quotes and Special Characters
CSV values can also contain special characters, such as the delimiter, quote characters, or line breaks, that must be quoted or escaped when reading and writing data. The `csv` module provides several options for handling these cases.
When reading a CSV file, we can pass the `quotechar` and `escapechar` arguments to `csv.reader()` to control how quotes and special characters are interpreted.
```python
import csv

with open('example.csv', newline='') as csvfile:
    # quotechar='"' is already the default; escapechar defaults to None
    reader = csv.reader(csvfile, quotechar='"', escapechar='\\')
    for row in reader:
        print(row)
```
In this example, we're reading a CSV file named `example.csv` and specifying the double quote character as the `quotechar` and the backslash character as the `escapechar`.
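To see what these settings do, the sketch below parses a hypothetical in-memory sample: a quoted field may contain the delimiter, and a backslash makes the following character literal, both inside and outside quotes.

```python
import csv
import io

# Hypothetical sample: line 2 uses quotes and backslash-escaped quotes,
# line 3 escapes a comma outside quotes
text = 'Name,Quote\n"Santi, Jr.","She said \\"hi\\""\nDiganta,comma\\, escaped\n'

reader = csv.reader(io.StringIO(text), quotechar='"', escapechar='\\')
rows = list(reader)
for row in rows:
    print(row)
```

The second row parses as `['Santi, Jr.', 'She said "hi"']` and the third as `['Diganta', 'comma, escaped']`.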
When writing data to a CSV file, we can pass the `quotechar` and `escapechar` arguments to `csv.writer()` to control how quotes and special characters are escaped.
```python
import csv

data = [
    ['Name', 'Age', 'City'],
    ['Achinta', '35', 'Delhi'],
    ['Diganta', '30', 'Bengaluru'],
    ['Santi', '40', 'Kolkata'],
]

with open('output.csv', 'w', newline='') as csvfile:
    # QUOTE_MINIMAL (the default) quotes only fields that need it
    writer = csv.writer(csvfile, quotechar='"', escapechar='\\', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(data)
```
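In this example, we're creating a CSV file named `output.csv` and specifying the double quote character as the `quotechar`, the backslash character as the `escapechar`, and `quoting=csv.QUOTE_MINIMAL`, which quotes only those fields that contain special characters such as the delimiter, the `quotechar`, or a line break. Because `doublequote` defaults to `True`, a quote character inside a field is doubled rather than prefixed with the `escapechar`.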
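Writing to an in-memory buffer shows `QUOTE_MINIMAL` at work. With hypothetical values containing a comma and a double quote, only those fields are quoted and the embedded quote is doubled; reading the output back with the same settings recovers the original values.

```python
import csv
import io

# Hypothetical rows: the first value needs quoting because it contains the delimiter,
# the second because it contains the quote character
rows = [['Name', 'Note'], ['Santi, Jr.', 'She said "hi"']]

buffer = io.StringIO(newline='')
writer = csv.writer(buffer, quotechar='"', escapechar='\\', quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

output = buffer.getvalue()
print(repr(output))
# 'Name,Note\r\n"Santi, Jr.","She said ""hi"""\r\n'

# Reading the output back with the same settings recovers the original rows
round_trip = list(csv.reader(io.StringIO(output, newline=''), quotechar='"', escapechar='\\'))
print(round_trip == rows)  # True
```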
Conclusion
Python's `csv` module provides a simple and flexible way to handle CSV files. With the examples and techniques covered in this article, you should be able to read and write CSV files, handle different delimiters, work with header rows, and properly quote and escape special characters.