In this tutorial, we will see how to read data from a CSV (comma-separated values) file into a pandas DataFrame and how to save a DataFrame back to a CSV file.

Read CSV file in Pandas as Data Frame

The read_csv() function of pandas reads data from a comma-separated values (.csv) file into a DataFrame. It also accepts a number of optional arguments that let you control how the file is parsed.

Pandas read_csv

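The official documentation gives the syntax below. We will cover the most commonly used arguments in the following sections, each with an example. Note that this signature comes from an older pandas release: squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines were removed in pandas 2.0. The parse_dates argument used in section 3 is also accepted, although it does not appear in this listing.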

pandas.read_csv(filepath, sep=',', header='infer', names=None, index_col=None,
    usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None,
    converters=None, true_values=None, false_values=None, skipinitialspace=False,
    skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True,
    na_filter=True, verbose=False, skip_blank_lines=True,
    infer_datetime_format=False, keep_date_col=False, date_parser=None,
    iterator=False, chunksize=None, compression='infer', thousands=None,
    decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True,
    escapechar=None, comment=None, encoding=None, dialect=None,
    error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False,
    low_memory=True, memory_map=False, float_precision=None)

1. Reading a CSV file:

In this example, we will read the file Data.csv, shown below, using the file path together with the sep and header arguments.

Data.csv
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
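
The arguments used in this example are:

  • filepath – the path to the file, given as a string. A URL or an open file-like object is also accepted.
  • sep – the delimiter character that separates the fields in each row. It is a comma by default.
  • header – an integer, or a list of integers, giving the row number(s) to use as the column names. If multiple rows are passed, the resulting columns form a MultiIndex.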

The code below uses these arguments to read the file.

# Import pandas under its conventional alias
import pandas as pd

# Path to the data file (a local path here; a URL string also works)
url = "home/user/kunalgupta2616/datasets/master/Data.csv"

# Read the file: row 0 holds the column names and fields are comma-separated
data = pd.read_csv(url, header=[0], sep=',')
print(data)

Output:

   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes
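
read_csv() also accepts a file-like object, so these arguments can be tried without a file on disk. The following is a minimal sketch, assuming a small semicolon-delimited sample held in memory with io.StringIO (the sample text and the variable names are illustrative and not part of Data.csv). It shows sep with a non-comma delimiter, and header=None combined with names for data that has no header row.

```python
import io

import pandas as pd

# Illustrative in-memory sample: semicolon-delimited, with a header row.
with_header = io.StringIO(
    "Country;Age;Salary\n"
    "France;44;72000\n"
    "Spain;27;48000\n"
)
# sep tells read_csv which character separates the fields.
df_a = pd.read_csv(with_header, sep=";", header=0)
print(df_a)

# The same rows without a header line: header=None stops pandas from
# treating the first data row as column names; names supplies them instead.
no_header = io.StringIO("France;44;72000\nSpain;27;48000\n")
df_b = pd.read_csv(
    no_header, sep=";", header=None, names=["Country", "Age", "Salary"]
)
print(df_b)
```

Both calls should produce the same two-row DataFrame; only the source of the column names differs.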

2. Reading a custom number of rows and columns:

  • usecols – a list of the column names (or column positions) to read from the data.
  • index_col – the column(s) to use as the row labels. It can be a column name, a column position, or a list of either. None by default.
  • skiprows – either a list of 0-indexed line numbers to skip, or an integer number of lines to skip from the top of the file.
  • skipfooter – the number of lines to skip at the bottom of the file. This option is only supported by the Python parsing engine (engine='python').
  • skip_blank_lines – if True (the default), blank lines are skipped instead of being read as rows of NaN values.
  • nrows – the number of rows to read from the file.

Let’s look at an example that uses some of these parameters.

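import pandas as pd

url = "home/user/kunalgupta2616/datasets/master/Data2.csv"
# Keep three columns, skip file lines 1 and 2 (the first two data rows),
# read four rows, and use Country as the row labels
data1 = pd.read_csv(url, usecols=['Country', 'Age', 'Purchased'],
                    skiprows=[1, 2], nrows=4, index_col='Country')
print(data1)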

Output:

         Age Purchased
Country
Germany   30        No
Spain     38        No
Germany   40       Yes
France    35       Yes
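
The example above does not exercise skipfooter or skip_blank_lines. The sketch below illustrates them on a made-up in-memory sample (the data, including the trailing TOTAL line, is an assumption for illustration only). It passes engine="python" explicitly because skipfooter is not supported by the default C engine.

```python
import io

import pandas as pd

# Illustrative sample: a blank line in the middle and a trailing
# summary line that is not part of the data.
raw = (
    "Country,Age\n"
    "France,44\n"
    "\n"
    "Spain,27\n"
    "Germany,30\n"
    "TOTAL,101\n"
)

# skipfooter=1 drops the last line; it needs the Python parsing engine.
# skip_blank_lines=True (the default) ignores the empty line.
trimmed = pd.read_csv(io.StringIO(raw), skipfooter=1, engine="python")
print(trimmed)

# With skip_blank_lines=False the empty line becomes a row of NaN values.
with_blank = pd.read_csv(
    io.StringIO(raw), skipfooter=1, engine="python", skip_blank_lines=False
)
print(with_blank)
```

In the second result the Age column is expected to switch to a floating-point type, since an integer column cannot hold NaN by default.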

3. Parsing a column containing dates:

For this example, we will use the employee data of an organization, available at the URL given in the code below.

  • parse_dates – a list of the columns to parse as dates, given as 0-indexed column positions or as column names.

Let us read the first 5 rows of this data and parse the column containing dates using the parse_dates argument. To verify that the column has a datetime type, we will print the dtypes attribute.

import pandas as pd

url = "https://raw.githubusercontent.com/kunalgupta2616/datasets/master/employees.csv"
# Read the first 5 rows and parse column 2 (Start Date) as dates
data2 = pd.read_csv(url, nrows=5, parse_dates=[2])
print(data2.dtypes)

Output:

First Name                   object
Gender                       object
Start Date           datetime64[ns]
Last Login Time              object
Salary                        int64
Bonus %                     float64
Senior Management              bool
Team                         object
dtype: object
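
For an offline check of the same idea, here is a minimal sketch that selects the date column by name rather than by position. The sample rows are made up for illustration and are not taken from the employees dataset.

```python
import io

import pandas as pd

# Illustrative sample with ISO-formatted dates (not the employees dataset).
raw = (
    "Name,Start Date,Salary\n"
    "Asha,2021-03-15,52000\n"
    "Ben,2019-11-01,61000\n"
)

# Without parse_dates the date column is read as plain text.
plain = pd.read_csv(io.StringIO(raw))

# Columns can be selected by name as well as by position.
parsed = pd.read_csv(io.StringIO(raw), parse_dates=["Start Date"])

print(plain.dtypes)
print(parsed.dtypes)

# Datetime columns expose date components through the .dt accessor.
years = parsed["Start Date"].dt.year.tolist()
print(years)  # [2021, 2019]
```

Comparing the two dtypes listings shows the effect of parse_dates: only the parsed version reports a datetime type for Start Date.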

These are the arguments most commonly used when reading a CSV file in pandas. Next, let us see how to save a DataFrame as a CSV file.
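
As a minimal sketch of the saving step, assuming a small hand-built DataFrame and a hypothetical file name (out.csv) inside a temporary directory, the DataFrame.to_csv() method writes the data and read_csv() reads it back:

```python
import os
import tempfile

import pandas as pd

# Small illustrative DataFrame (values borrowed from the first two rows of Data.csv).
data = pd.DataFrame(
    {"Country": ["France", "Spain"], "Age": [44, 27], "Purchased": ["No", "Yes"]}
)

# Write to a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "out.csv")  # hypothetical file name
    # index=False omits the row labels, so no extra unnamed column
    # appears when the file is read back.
    data.to_csv(out_path, index=False)
    round_trip = pd.read_csv(out_path)

print(round_trip)
```

Passing index=False is a common choice when the row labels are just the default 0, 1, 2, ... numbering; if the index carries meaning, keep it and pass index_col when reading the file back.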

Happy Learning 🙂