3 Ways of to Merge Excel Files by openpyxl/Pandas

Merging files of Excel by using Pandas and/or openpyxl

In this tutorial, we will show you how to merge two or more Excel files/sheets by using padas (primarily) and then also using openpyxl(somewhat).

For information, if you don’t know already, Pandas is a Python library for analyzing data, created by Wes McKinney.

Pandas has methods that you can use to work with Excel – including merging files – so let us look at it below section.

An example of merging two files by Pandas

In this example, we will use the concat() method of Pandas. For that, we will specify two files that we want to merge and a third file is created with the data of both. First, have a look at two files that we will merge:

File 1: test_1.xlsx

File 2: test_2.xlsx

The code to merge two files:

import pandas as panda_merge
# Reading file 1

file_1=panda_merge.read_excel("test_1.xlsx")
# Reading file 2
file_2=panda_merge.read_excel("test_2.xlsx")

# Merging file 1 and file 2 into file 3

#Providing values in iloc is important

file_3=panda_merge.concat([file_1,file_2.iloc[0:,0:]],axis=1)
# creating a new excel file and save the data

file_3.to_excel("test_3.xlsx",index=False)

Output after code execution (File_3.xlsx):

Notice file_2.iloc[0:,0:]] in above code:

First 0 represents from which row to merge data from the second sheet.
The second0 represents column of the second file.

See another example below to understand its usage more clearly.

Specifying row and column for the merging file

We will merge only the second column and from the fifth row to the end of the test_2.xlsx file to the newly created test_3.xlsx merged file.

All data from test_1.xlsx will be copied:

The code:

import pandas as panda_merge

# Reading file 1
file_1=panda_merge.read_excel("test_1.xlsx")
# Reading file 2
file_2=panda_merge.read_excel("test_2.xlsx")

#Copying second column only with row from 5th to end
file_3=panda_merge.concat([file_1,file_2.iloc[4:,1:]],axis=1)

# Saving the partial data from two files
file_3.to_excel("test_3.xlsx",index=False)

Result:

You can see the difference between this merge and the above example.

Second way – Using Pandas/openpyxl for merging sheets

In this way, the Pandas ExcelWriter method is used to merge two or more sheets.

There, we specify openpyxl as the engine.

To understand, this is pandas.ExcelWriter syntax:

class pandas.ExcelWriter(path, engine=None, date_format=None, datetime_format=None, mode='w', storage_options=None, if_sheet_exists=None, engine_kwargs=None, **kwargs)

For our example, we have these two Excel Files before merging (Same as in the above example):

File 1: test_1.xlsx

File 2: test_2.xlsx

The example below shows using this method for merging two files of Excel:

The code:

import pandas as panda_merge

data_frame=panda_merge.read_excel("test_2.xlsx")

data_frame=data_frame.iloc[:,0:]

with panda_merge.ExcelWriter("test_1.xlsx",mode="a",engine="openpyxl",if_sheet_exists="overlay") as writer:

    data_frame.to_excel(writer, sheet_name="My_sheet1",columns=None, startcol=writer.sheets["My_sheet1"].max_column,index=False)

The output after we executed this code:

Note that this is test_1.xlsx file, unlike the above two examples where we created a new third file.

In the code above:

data_frame reading the test_2.xlsx file – file to be merged.
This line: data_frame=data_frame.iloc[:,0:] specifies which rows/columns to merge. We specified 0, so both columns will be merged. If we specify 1, only the second column will be merged.
By using ExcelWriter() method, the test_1 file is opened in the append mode.
Also notice, we provided the sheet name “My_sheet1”.

Learn more about ExcelWriter() method here.

Merging only one column/fewer rows by openpyxl and pandas example

Just to show you how to specify rows/cols to merge, see the code below. Only difference is single line i.e. iloc.

See the code and output below:

import pandas as panda_merge

data_frame=panda_merge.read_excel("test_2.xlsx")
#Only merge second column and its last rows after 4

data_frame=data_frame.iloc[4:,1:]

with panda_merge.ExcelWriter("test_1.xlsx",mode="a",engine="openpyxl",if_sheet_exists="overlay") as writer:

    data_frame.to_excel(writer, sheet_name="My_sheet1",columns=None, startcol=writer.sheets["My_sheet1"].max_column,index=False)

Result:

You can see in the above image, only the second column (Status), with rows after 4 are merged.

Notice this line in the above code:

data_frame=data_frame.iloc[4:,1:]

Using pandas.DataFrame.merge method to combine two files

If you have some knowledge of working with databases (MySQL, SQL Server etc.) then this merge type is like a join clause (used in databases).

In that merge method, the join is done at columns or indexes. For example, there should be one common column in both sheets that you want to merge.

Take the example of our sample sheets, if we try to join two files as shown in the above examples:

test_1.xlsx

test_2.xlsx

Then pandas.DataFrame.merge will produce an error as there is no common column in both files.

However, for the example, we are adding another column in the test_2.xlsx, i.e. “Product ID”, so it exists in both files to be merged.

After that, these are two files that we will merge by using pandas.DataFrame.merge method:

test_1.xlsx file:

test_2.xlsx file:

So, you can see that the test_2.xlsx is different than used in the above examples i.e. it contains an additional common column “Product ID”.

Now let us perform a merge of these two files in the third one:

The code:

import pandas as pd_merge
# Taking test_1.xlsx and test_2.xlsx data

d_frame1=pd_merge.read_excel("test_1.xlsx")

d_frame2=pd_merge.read_excel("test_2.xlsx")

# merging both files into data frame

d_frame3=d_frame1.merge(d_frame2) 

# Data is saved into a new file - test_3.xlsx

d_frame3.to_excel("test_3.xlsx",index=False)

Result: