In this tutorial, we will see how to read data from a CSV (comma-separated values) file into a pandas DataFrame and how to save a pandas DataFrame as a CSV file.
Read CSV file in Pandas as Data Frame
The read_csv() method of pandas reads data from a comma-separated values file (a file with the .csv extension) into a pandas DataFrame. It also provides a number of arguments that give flexibility according to the requirement.
Pandas read_csv
The official documentation provides the syntax below. We will learn the most commonly used of these arguments in the following sections, each with an example.
pandas.read_csv(filepath, sep=',', header='infer', names=None, index_col=None,
usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None,
converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None,
skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False,
skip_blank_lines=True, infer_datetime_format=False, keep_date_col=False,
date_parser=None,iterator=False, chunksize=None,
compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0,
doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None,
error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True,
memory_map=False, float_precision=None)
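The exact parameter list depends on the installed pandas version; some names in the listing above (for example squeeze and error_bad_lines) come from older releases and are no longer accepted by pandas 2.0 and later. As a small sketch, the standard inspect module can show which parameters your own installation accepts. The list of names checked at the end is only an illustration.

```python
import inspect

import pandas as pd

# Collect the parameter names that the installed read_csv accepts.
params = list(inspect.signature(pd.read_csv).parameters)

print(pd.__version__)
print(params)

# Check a few names from the listing above against this installation.
for name in ["sep", "header", "squeeze", "error_bad_lines"]:
    print(name, name in params)
```

If a name is reported as missing, the release notes of your pandas version usually name its replacement.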
1. Reading a CSV file:
In this example, we will read a CSV file with the following contents, using the arguments described below along with the file path.
Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
- filepath – The path to the file, given as a string.
- sep – The delimiter, i.e. the character used to split each line into fields. The default is a comma.
- header – The row number (or list of row numbers) to use as the column names. If multiple rows are passed, the resulting columns have a multi-level index.
See the code below, where we use these arguments to read the file.
# importing pandas and giving an alias name
import pandas as pd
# path to the data file
url = "home/user/kunalgupta2616/datasets/master/Data.csv"
# read the file, using row 0 as the header and a comma as the separator
data = pd.read_csv(url, header=[0], sep=',')
print(data)
Output:
Country Age Salary Purchased
0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes
5 France 35.0 58000.0 Yes
6 Spain NaN 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes
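The sep and header arguments matter most when a file does not follow the default layout. The sketch below assumes a hypothetical semicolon-separated sample without a header row, held in memory with io.StringIO so that it runs without any file on disk. The names argument supplies the column labels.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-separated data with no header row.
raw = "France;44;72000;No\nSpain;27;48000;Yes\nGermany;30;54000;No\n"

# header=None tells pandas that the first line is data, not column names;
# names supplies the column labels instead.
df = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,
    names=["Country", "Age", "Salary", "Purchased"],
)
print(df)
```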
2. Reading a custom number of rows and columns:
- usecols – List of the columns, by name or position, to read from the data.
- index_col – The column (or list of columns), given by name or position, to use as the row labels. None by default.
- skiprows – A list of row numbers to skip (0-indexed), or the number of rows to skip from the top of the file.
- skipfooter – The number of rows to skip from the bottom of the file.
- skip_blank_lines – If True, blank lines are skipped rather than being read as NaN values.
- nrows – The number of rows to read from the file.
Let’s look at an example that uses some of these parameters.
import pandas as pd
url = "home/user/kunalgupta2616/datasets/master/Data2.csv"
data1 = pd.read_csv(url, usecols=['Country', 'Age', 'Purchased'], skiprows=[1, 2], nrows=4, index_col='Country')
print(data1)
Output:
Age Purchased
Country
Germany 30 No
Spain 38 No
Germany 40 Yes
France 35 Yes
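The example above does not use skipfooter or skip_blank_lines, so here is a minimal sketch of both. It assumes a hypothetical in-memory sample containing one blank line, with a trailing summary line added for the skipfooter case. Note that skipfooter is not supported by the default C parser, so engine="python" is passed with it.

```python
import io

import pandas as pd

# Hypothetical sample with a blank line between the data rows.
raw_body = (
    "Country,Age,Salary,Purchased\n"
    "France,44,72000,No\n"
    "\n"
    "Spain,27,48000,Yes\n"
    "Germany,30,54000,No\n"
)
# The same sample with a trailing summary line that is not data.
raw_with_footer = raw_body + "Total rows: 3\n"

# skipfooter=1 drops the last line; the blank line is skipped by default.
df_footer = pd.read_csv(io.StringIO(raw_with_footer), skipfooter=1, engine="python")
print(df_footer)

# With skip_blank_lines=False the blank line is kept as a row of NaN values.
df_blank = pd.read_csv(io.StringIO(raw_body), skip_blank_lines=False)
print(df_blank)
```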
3. Parsing a column containing dates:
For this example, we will use employee data of an organization; the URL of the file is given in the code below.
- parse_dates – List of the columns, by 0-indexed position or by name, that contain dates and should be parsed as such.
Let us read the first 5 rows of this data and parse the column containing dates using the parse_dates argument. To verify that the column is of datetime type, we will print the dtypes attribute.
import pandas as pd
url = "https://raw.githubusercontent.com/kunalgupta2616/datasets/master/employees.csv"
data2 = pd.read_csv(url, nrows=5, parse_dates=[2])
print(data2.dtypes)
Output:
First Name object
Gender object
Start Date datetime64[ns]
Last Login Time object
Salary int64
Bonus % float64
Senior Management bool
Team object
dtype: object
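parse_dates also accepts column names, which can be easier to read than positions. The sketch below assumes a small hypothetical sample with dates in ISO format (YYYY-MM-DD) and checks the result with a pandas type helper instead of relying on a particular printed dtype.

```python
import io

import pandas as pd

# Hypothetical sample with a date column in ISO format (YYYY-MM-DD).
raw = (
    "First Name,Start Date,Salary\n"
    "Asha,2019-03-15,52000\n"
    "Ben,2021-11-01,61000\n"
)

# Refer to the date column by name instead of by position.
df = pd.read_csv(io.StringIO(raw), parse_dates=["Start Date"])
print(df.dtypes)

# Confirm that the column was parsed as a datetime type.
print(pd.api.types.is_datetime64_any_dtype(df["Start Date"]))

# Once parsed, the .dt accessor exposes date components such as the year.
years = df["Start Date"].dt.year.tolist()
print(years)
```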
These are the most commonly used arguments when reading a CSV file in pandas. Next, let us see how to save a DataFrame as a CSV file in pandas.
Happy Learning 🙂