Chapter 14 – Excel Spreadsheets

Source: Automate the Boring Stuff with Python, 3rd Edition — Al Sweigart (No Starch Press). This Markdown version was derived from the chapter PDF.

About this chapter

Although spreadsheets are not always thought of as programming tools, almost everyone uses them to organize information into two-dimensional data structures, perform calculations with formulas, and produce output as charts. This chapter and the next integrate Python with two popular spreadsheet applications: Microsoft Excel and Google Sheets.

The openpyxl module lets Python programs read and modify Excel .xlsx files—for example, copying data between workbooks, filtering or editing thousands of rows by criteria, or scanning many budget files for problems. Those are the kinds of repetitive spreadsheet tasks Python handles well.

Excel is proprietary; LibreOffice Calc (libreoffice.org) is a free alternative that uses the same .xlsx format, so openpyxl works with its files too. The book’s screenshots use cloud-based Office 365 Excel.

Important: openpyxl works on files, not on the Excel desktop app or the Excel web app while a file is only online. With Office 365, download a copy (File → Save As → Download a Copy), run your script, then re-upload. With desktop Excel, close the workbook before running the script, then reopen it to see changes.

Install openpyxl with pip (see Appendix A in the book). Documentation: openpyxl.readthedocs.io. Example spreadsheets are available from the book’s page at nostarch.com/automate-boring-stuff-python-3rd-edition.

Reading Excel Files

Workbooks, sheets, and cells

A spreadsheet file is a workbook (.xlsx).
A workbook can contain multiple sheets (worksheets).
The active sheet is the one the user is viewing (or last viewed before closing).
Columns are labeled A, B, C, …; rows are numbered 1, 2, 3, …
A cell is the box at one column/row; it can hold a number or text.

Examples use example3.xlsx in the current working directory (create it or download from the book’s resources). Sheet1 should match the following.

Table 14-1: The example3.xlsx spreadsheet (Sheet1)

	A	B	C
1	4/5/2035 1:34:02 PM	Apples	73
2	4/5/2035 3:41:23 AM	Cherries	85
3	4/6/2035 12:46:51 PM	Pears	14
4	4/8/2035 8:59:43 AM	Oranges	52
5	4/10/2035 2:07:00 AM	Apples	152
6	4/10/2035 6:10:37 PM	Bananas	23
7	4/10/2035 2:40:46 AM	Strawberries	98

Opening a workbook

import openpyxl

wb = openpyxl.load_workbook('example3.xlsx')

type(wb)

openpyxl.load_workbook() returns a Workbook object representing the file (similar in role to a file object for text files).

Getting sheets from the workbook

import openpyxl

wb = openpyxl.load_workbook('example3.xlsx')

wb.sheetnames  # The workbook's sheets' names

['Sheet1', 'Sheet2', 'Sheet3']

sheet = wb['Sheet3']  # Get a sheet from the workbook.

sheet

type(sheet)

sheet.title  # Get the sheet's title as a string.

'Sheet3'

another_sheet = wb.active  # Get the active sheet.

another_sheet

Access a sheet with bracket notation and the sheet name string. wb.active is the active sheet. Use sheet.title for the name string.

Getting cells from the sheets

import openpyxl

wb = openpyxl.load_workbook('example3.xlsx')

sheet = wb['Sheet1']  # Get a sheet from the workbook.

sheet['A1']  # Get a cell from the sheet.

sheet['A1'].value  # Get the value from the cell.

datetime.datetime(2035, 4, 5, 13, 34, 2)

c = sheet['B1']  # Get another cell from the sheet.

c.value

'Apples'

f'Row {c.row}, Column {c.column} is {c.value}'

'Row 1, Column 2 is Apples'

f'Cell {c.coordinate} is {c.value}'

'Cell B1 is Apples'

sheet['C1'].value

A Cell has value, row, column, and coordinate (e.g. 'B1'). Dates in column A are read as datetime values (see Chapter 19).

Use sheet.cell(row=…, column=…) with 1-based row and column indices when letters are awkward (after Z come AA, AB, …).

sheet.cell(row=1, column=2)

sheet.cell(row=1, column=2).value

'Apples'

for i in range(1, 8, 2):  # Go through every other row.

...     print(i, sheet.cell(row=i, column=2).value)
...
1 Apples
3 Pears
5 Apples
7 Strawberries

sheet.max_row and sheet.max_column are integers (the highest used row/column numbers).

Converting between column letters and numbers

import openpyxl

from openpyxl.utils import get_column_letter, column_index_from_string

get_column_letter(1)  # Translate column 1 to a letter.

'A'

get_column_letter(2)

'B'

get_column_letter(27)

'AA'

get_column_letter(900)

'AHP'

wb = openpyxl.load_workbook('example3.xlsx')

sheet = wb['Sheet1']

get_column_letter(sheet.max_column)

'C'

column_index_from_string('A')  # Get A's number.

column_index_from_string('AA')

You do not need a loaded workbook to use these helpers.

Getting rows and columns (slices and `rows` / `columns`)

Slicing a worksheet returns nested tuples of Cell objects (rows from top to bottom, cells left to right within each row):

import openpyxl

wb = openpyxl.load_workbook('example3.xlsx')

sheet = wb['Sheet1']

sheet['A1':'C3']  # Tuple of row tuples of Cell objects for A1 through C3.

for row_of_cell_objects in sheet['A1':'C3']:

...     for cell_obj in row_of_cell_objects:
...         print(cell_obj.coordinate, cell_obj.value)
...     print('--- END OF ROW ---')
...
A1 2035-04-05 13:34:02
B1 Apples
C1 73
--- END OF ROW ---
A2 2035-04-05 03:41:23
B2 Cherries
C2 85
--- END OF ROW ---
A3 2035-04-06 12:46:51
B3 Pears
C3 14
--- END OF ROW ---

sheet.rows and sheet.columns are generators; wrap with list() before indexing. list(sheet.columns)[1] is column B (0-based index). list(sheet.rows) gives one tuple per row, each tuple holding that row’s cells.

A review of workbooks, sheets, and cells

Import openpyxl.
Call openpyxl.load_workbook() to get a Workbook.
Use sheetnames or active to choose a sheet.
Get a Worksheet (e.g. wb['Sheet1']).
Read cells with sheet['A1'], sheet.cell(row=…, column=…), etc.
Use the Cell object’s value (and optionally row, column, coordinate).

Project 9: Gather census statistics

Goal: From censuspopdata.xlsx (2010 US Census), count census tracts and total population per county. Each row is one tract; columns are tract number (A), state (B), county (C), population (D). Sheet name: 'Population by Census Tract'.

Program outline:

Open the workbook with openpyxl.
Aggregate tract/population into a data structure.
Write results to a .py file using pprint.pformat() so the data can be **import**ed later.

Data structure: Nested dicts:

Keys: state abbreviations → county names → {'tracts': n, 'pop': n}.

Example:

python

{'AK': {'Anchorage': {'pop': 291826, 'tracts': 55}, ...}, ...}

Reading rows (row 1 is headers; data starts at row 2):

python

# readCensusExcel.py - Tabulates county population and census tracts

import openpyxl, pprint
print('Opening workbook...')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb['Population by Census Tract']
county_data = {}
print('Reading rows...')
for row in range(2, sheet.max_row + 1):
    state = sheet['B' + str(row)].value
    county = sheet['C' + str(row)].value
    pop = sheet['D' + str(row)].value
    county_data.setdefault(state, {})
    county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
    county_data[state][county]['tracts'] += 1
    county_data[state][county]['pop'] += int(pop)

print('Writing results...')
result_file = open('census2010.py', 'w')
result_file.write('allData = ' + pprint.pformat(county_data))
result_file.close()
print('Done.')

Using the generated module:

import census2010

census2010.allData['AK']['Anchorage']

{'pop': 291826, 'tracts': 55}

After census2010.py exists, you may not need to run readCensusExcel.py again.

Ideas for similar programs (reading)

Compare data across rows or between multiple workbooks.
Validate spreadsheets (blank rows, bad values) and alert the user.
Use spreadsheet data as input to other Python programs.

Writing Excel documents

Creating and saving workbooks

import openpyxl

wb = openpyxl.Workbook()  # Create a blank workbook.

wb.sheetnames  # The workbook starts with one sheet.

['Sheet']

sheet = wb.active

sheet.title

'Sheet'

sheet.title = 'Spam Bacon Eggs Sheet'  # Change the title.

wb.sheetnames

['Spam Bacon Eggs Sheet']

Persist changes with wb.save('filename.xlsx'). When editing a file you loaded, save under a new filename so the original stays safe. save() fails if the file is open in Excel—close it first.

import openpyxl

wb = openpyxl.load_workbook('example3.xlsx')

sheet = wb['Sheet1']

sheet.title = 'Spam Spam Spam'

wb.save('example3_copy.xlsx')  # Save the workbook.

Creating and removing sheets

import openpyxl

wb = openpyxl.Workbook()

wb.sheetnames

['Sheet']

wb.create_sheet()  # Add a new sheet (default name Sheet1, etc.).

wb.create_sheet(index=0, title='First Sheet')

wb.create_sheet(index=2, title='Middle Sheet')

del wb['Middle Sheet']

del wb['Sheet1']

Call save() after adding or removing sheets.

Writing values to cells

Assignment mirrors dict-style access:

import openpyxl

wb = openpyxl.Workbook()

sheet = wb['Sheet']

sheet['A1'] = 'Hello, world!'

sheet['A1'].value

'Hello, world!'

Project 10: Update a spreadsheet

Data: produceSales3.xlsx—columns: produce (A), cost per pound (B), pounds sold (C), total (D) with formulas like =ROUND(B2*C2, 2).

Task: Fix prices only for Celery, Garlic, and Lemon without a naive find-replace on numbers.

Prices: Celery 1.19, Garlic 3.07, Lemon 1.27.

python

# updateProduce.py - Corrects costs in produce sales spreadsheet

import openpyxl

wb = openpyxl.load_workbook('produceSales3.xlsx')
sheet = wb['Sheet']

PRICE_UPDATES = {'Garlic': 3.07,
                 'Celery': 1.19,
                 'Lemon': 1.27}

for row_num in range(2, sheet.max_row + 1):  # Skip the header row.
    produce_name = sheet.cell(row=row_num, column=1).value
    if produce_name in PRICE_UPDATES:
        sheet.cell(row=row_num, column=2).value = PRICE_UPDATES[produce_name]

wb.save('updatedProduceSales3.xlsx')

Save to a new file first; delete the old one only after verifying the output.

Ideas for similar programs (writing)

Copy data between spreadsheets or from the web, files, or clipboard into Excel.
“Clean” data (e.g. normalize phone numbers with regular expressions).

Setting the font style of cells

python

from openpyxl.styles import Font

import openpyxl

from openpyxl.styles import Font

wb = openpyxl.Workbook()

sheet = wb['Sheet']

italic_24_font = Font(size=24, italic=True)

sheet['A1'].font = italic_24_font

sheet['A1'] = 'Hello, world!'

wb.save('styles3.xlsx')

Assign a Font object to cell.font. Set cell.value after or before styling as needed; the example sets font then value.

Table 14-2: Keyword arguments for `Font` objects

Keyword argument	Data type	Description
`name`	String	Font name, e.g. `'Calibri'`, `'Times New Roman'`
`size`	Integer	Point size
`bold`	Boolean	Bold text
`italic`	Boolean	Italic text

bold_font = Font(name='Times New Roman', bold=True)

sheet['A1'].font = bold_font

sheet['A1'] = 'Bold Times New Roman'

italic_font = Font(size=24, italic=True)

sheet['B3'].font = italic_font

sheet['B3'] = '24 pt Italic'

Formulas

Assign a formula like any string value (starts with =):

sheet['B9'] = '=SUM(B1:B8)'

import openpyxl

wb = openpyxl.Workbook()

sheet = wb['Sheet']

sheet['A1'] = 200

sheet['A2'] = 300

sheet['A3'] = '=SUM(A1:A2)'

wb.save('writeFormula3.xlsx')

openpyxl does not evaluate formulas. Excel evaluates them when you open and save the file. To read computed values, open with data_only=True (after Excel has saved the cached results):

import openpyxl

wb = openpyxl.load_workbook('writeFormula3.xlsx')

wb.active['A3'].value

'=SUM(A1:A2)'

wb = openpyxl.load_workbook('writeFormula3.xlsx', data_only=True)

wb.active['A3'].value

If 500 does not appear, open the file in Excel and save so results are stored in the workbook.

Adjusting rows and columns

Setting row height and column width

import openpyxl

wb = openpyxl.Workbook()

sheet = wb['Sheet']

sheet['A1'] = 'Tall row'

sheet['B2'] = 'Wide column'

sheet.row_dimensions[1].height = 70

sheet.column_dimensions['B'].width = 20

wb.save('dimensions3.xlsx')

Use sheet.row_dimensions[row_number] and sheet.column_dimensions['LETTER']. Default sizes depend on Excel/openpyxl version.

Hiding rows and columns

The chapter notes you can hide rows and columns from view (in addition to freezing panes). In openpyxl, set the hidden attribute on the corresponding RowDimension or ColumnDimension object—for example, sheet.column_dimensions['C'].hidden = True to hide column C, or sheet.row_dimensions[5].hidden = True for a row.

Merging and unmerging cells

sheet.merge_cells('A1:D3')

sheet['A1'] = 'Twelve cells merged together.'

sheet.merge_cells('C5:D5')

sheet['C5'] = 'Two merged cells.'

Set the value on the top-left cell of the merged range. sheet.unmerge_cells('A1:D3') splits merged regions.

Freezing panes

sheet.freeze_panes can be a coordinate string or None. Freezing 'A2' keeps row 1 visible; 'B1' freezes column A; use 'A1' or None to clear. See Table 14-3.

Table 14-3: Frozen pane examples

`freeze_panes` setting	Rows and columns frozen
`sheet.freeze_panes = 'A2'`	Row 1 (no columns)
`sheet.freeze_panes = 'B1'`	Column A (no rows)
`sheet.freeze_panes = 'C1'`	Columns A and B
`sheet.freeze_panes = 'C2'`	Row 1 and columns A–B
`sheet.freeze_panes = 'A1'` or `None`	None

sheet.freeze_panes = 'A2'

wb.save('freezeExample3.xlsx')

Charts

Create a Reference to a rectangular cell range.
Create a Series from that reference.
Create a Chart (e.g. BarChart).
Append the series to the chart.
Add the chart to the worksheet (optional top-left cell).

openpyxl.chart.Reference(worksheet, min_col, min_row, max_col, max_row) selects a rectangle of cells (1-based column and row numbers). The book’s example Reference(sheet, 1, 1, 1, 10) uses column A from row 1 through 10.

import openpyxl

wb = openpyxl.Workbook()

sheet = wb.active

for i in range(1, 11):

...     sheet['A' + str(i)] = i * i
...

ref_obj = openpyxl.chart.Reference(sheet, 1, 1, 1, 10)

series_obj = openpyxl.chart.Series(ref_obj, title='First series')

chart_obj = openpyxl.chart.BarChart()

chart_obj.title = 'My Chart'

chart_obj.append(series_obj)

sheet.add_chart(chart_obj, 'C5')

wb.save('sampleChart3.xlsx')

Other chart types include LineChart, ScatterChart, and PieChart in openpyxl.chart.

Summary

Getting data into the right shape is often harder than processing it. After loading a workbook with openpyxl, you can extract and transform data far faster than by hand.

You can also generate spreadsheets from programs—avoiding manual copy-paste from text, PDFs, or databases. The next chapter covers Google Sheets.

Practice Questions

Assume a Workbook in wb, a Worksheet in sheet where noted.

What does openpyxl.load_workbook() return?
What does wb.sheetnames contain?
How do you get the Worksheet named 'Sheet1'?
How do you get the Worksheet for the workbook’s active sheet?
How do you read the value in cell C5?
How do you set cell C5 to "Hello"?
How do you get a cell’s row and column as integers?
What do sheet.max_column and sheet.max_row hold, and what is their type?
Which function gives the integer index for column 'M'?
If you needed to get the string name for row 14, what function would you need to call?
How do you get Cell objects for the range A1 through F1?
How do you save the workbook as example3.xlsx?
How do you put a formula in a cell?
To read a formula’s result instead of the formula string, what must you do first (and how do you open the workbook)?
How do you set row 5’s height to 100?
How do you hide column C?
What is a freeze pane?
What five calls create a bar chart (objects/methods)?

Practice programs

Multiplication table maker. Write multiplicationTable.py so it takes N from the command line and builds an N×N multiplication table in Excel. Row 1 and column A should hold labels and be bold (see Figure 14-10 in the book).

Blank row inserter. Write blankRowInserter.py with arguments N, M, and a filename. Starting at row N, insert M blank rows (e.g. python blankRowInserter.py 3 2 myProduce.xlsx). Hint: read the sheet, copy rows 1 … N−1 as-is, then write remaining rows shifted down by M (Figure 14-11).

Chapter 14 – Excel Spreadsheets

About this chapter

Reading Excel Files

Workbooks, sheets, and cells

Table 14-1: The example3.xlsx spreadsheet (Sheet1)

Opening a workbook

Getting sheets from the workbook

Getting cells from the sheets

Converting between column letters and numbers

Getting rows and columns (slices and `rows` / `columns`)

A review of workbooks, sheets, and cells

Project 9: Gather census statistics

Ideas for similar programs (reading)

Writing Excel documents

Creating and saving workbooks

Creating and removing sheets

Writing values to cells

Project 10: Update a spreadsheet

Ideas for similar programs (writing)

Setting the font style of cells

Table 14-2: Keyword arguments for `Font` objects

Formulas

Adjusting rows and columns

Setting row height and column width

Hiding rows and columns

Merging and unmerging cells

Freezing panes

Table 14-3: Frozen pane examples

Charts

Summary

Practice Questions

Practice programs

Check your understanding

Go deeper

Chapter 14 – Excel Spreadsheets

About this chapter

Reading Excel Files

Workbooks, sheets, and cells

Table 14-1: The example3.xlsx spreadsheet (Sheet1)

Opening a workbook

Getting sheets from the workbook

Getting cells from the sheets

Converting between column letters and numbers

Getting rows and columns (slices and rows / columns)

A review of workbooks, sheets, and cells

Project 9: Gather census statistics

Ideas for similar programs (reading)

Writing Excel documents

Creating and saving workbooks

Creating and removing sheets

Writing values to cells

Project 10: Update a spreadsheet

Ideas for similar programs (writing)

Setting the font style of cells

Table 14-2: Keyword arguments for Font objects

Formulas

Adjusting rows and columns

Setting row height and column width

Hiding rows and columns

Merging and unmerging cells

Freezing panes

Table 14-3: Frozen pane examples

Charts

Summary

Practice Questions

Practice programs

Check your understanding

Go deeper

Getting rows and columns (slices and `rows` / `columns`)

Table 14-2: Keyword arguments for `Font` objects