In this tutorial, we will see how we can read Excel file in pandas
using examples.
Read Excel file in Pandas as Data Frame
read_excel()
method of pandas will read the data from excel files having xls
, xlsx
, xlsm
, xlsb
, odf
, ods
and odt
file extensions as a pandas data-frame and also provide some arguments to give some flexibility according to the requirement.
Pandas read_excel
The official documentation provides the syntax below, We will learn the most commonly used arguments of this method in the following sections with examples.
pandas.read_excel(*args, **kwargs)
1. Reading an Excel file:
In this example, we will try to read an Excel file with the file name and sheet name defined below and then customize the reading using different arguments along with the file path.
File name - Fruit.xlsx Sheet name - sweet or sour
Fruit | Sweetness | Soreness | Fruit Type |
Lemon | 1 | 9 | Sour |
Grapefruit | 2 | 8 | Sour |
Orange | 3 | 7 | Sour |
Raspberry | 2 | 8 | Sour |
Cherry | 6 | 4 | Sweet |
Banana | 9 | 1 | Sweet |
Grapes | 8 | 2 | Sweet |
Watermelon | 9 | 1 | Sweet |
Avacado | 1 | 1 | None |
Strawberry | 5 | 5 | Sour |
- file-path – This is the path to the file in string format.
- sheet_name – Name of the sheet to open, or its 0-indexed integer number.
- header – integer list of rows to be used as the columns. If multiple rows are passed then we will get a multi-column index data.
See the code below where we will use these arguments to read the file.
# importing pandas module
import pandas as pd
# defining File path
fpath = "F:/onlinetutorialspoint/Fruit.xlsx"
# method to be used to read the excel file
data2 = pd.read_excel(fpath,sheet_name='sweet or sour',header=[0])
print(data2)
Output:
Fruit Sweetness Soreness Fruit Type
0 Lemon 1 9 Sour
1 Grapefruit 2 8 Sour
2 Orange 3 7 Sour
3 Raspberry 2 8 Sour
4 Cherry 6 4 Sweet
5 Banana 9 1 Sweet
6 Grapes 8 2 Sweet
7 Watermelon 9 1 Sweet
8 Avacado 1 1 None
9 Strawberry 5 5 Sour
2. Reading custom no. of rows and columns:
2.1. Reading selective column and index
- usecols – List of column names from data to be read.
- index_col – This defines the names of row labels, it can be a column from the data or the list of integer or string, None by default.
Let’s see an example code to read the data by changing a column to index in data.
import pandas as pd
# File path
fpath = "F:/onlinetutorialspoint/Fruit.xlsx"
# method to be used to read the data
data2 = pd.read_excel(fpath,usecols=['Fruit','Sweetness','Soreness'],index_col='Fruit',nrows = 5)
print(data2)
Output:
Sweetness Soreness
Fruit
Lemon 1 9
Grapefruit 2 8
Orange 3 7
Raspberry 2 8
Cherry 6 4
Banana 9 1
Grapes 8 2
Watermelon 9 1
Avacado 1 1
Strawberry 5 5
2.2. Reading selective rows
- skiprows – list of rows number / No. or rows to be skipped from the top. It is 0-indexed.
- skipfooter – No. or rows to be skipped from the bottom.
- nrows – The number of rows to be read from the file.
Let’s see an example code to read only 4 rows from data without reading the top 2 rows.
import pandas as pd
fpath = "F:/onlinetutorialspoint/Fruit.xlsx"
data3 = pd.read_excel(fpath,header=None,nrows=4,skiprows=2)
print(data3)
Output:
0 1 2 3
0 Grapefruit 2 8 Sour
1 Orange 3 7 Sour
2 Raspberry 2 8 Sour
3 Cherry 6 4 Sweet
These are the most commonly used arguments that are used when reading an Excel file in pandas.
References
Happy Learning 🙂