Chapter 14 – Excel Spreadsheets

Source: Automate the Boring Stuff with Python, 3rd Edition — Al Sweigart (No Starch Press). This Markdown version was derived from the chapter PDF.

About this chapter

Although spreadsheets are not always thought of as programming tools, almost everyone uses them to organize information into two-dimensional data structures, perform calculations with formulas, and produce output as charts. This chapter and the next integrate Python with two popular spreadsheet applications: Microsoft Excel and Google Sheets.

The openpyxl module lets Python programs read and modify Excel .xlsx files—for example, copying data between workbooks, filtering or editing thousands of rows by criteria, or scanning many budget files for problems. Those are the kinds of repetitive spreadsheet tasks Python handles well.

Excel is proprietary; LibreOffice Calc (libreoffice.org) is a free alternative that uses the same .xlsx format, so openpyxl works with its files too. The book’s screenshots use cloud-based Office 365 Excel.

Important: openpyxl works on files, not on the Excel desktop app or the Excel web app while a file is only online. With Office 365, download a copy (File → Save As → Download a Copy), run your script, then re-upload. With desktop Excel, close the workbook before running the script, then reopen it to see changes.

Install openpyxl with pip (see Appendix A in the book). Documentation: openpyxl.readthedocs.io. Example spreadsheets are available from the book’s page at nostarch.com/automate-boring-stuff-python-3rd-edition.

Reading Excel Files

Workbooks, sheets, and cells

  • A spreadsheet file is a workbook (.xlsx).
  • A workbook can contain multiple sheets (worksheets).
  • The active sheet is the one the user is viewing (or last viewed before closing).
  • Columns are labeled A, B, C, …; rows are numbered 1, 2, 3, …
  • A cell is the box at one column/row; it can hold a number or text.

Examples use example3.xlsx in the current working directory (create it or download from the book’s resources). Sheet1 should match the following.

Table 14-1: The example3.xlsx spreadsheet (Sheet1)

A B C
1 4/5/2035 1:34:02 PM Apples 73
2 4/5/2035 3:41:23 AM Cherries 85
3 4/6/2035 12:46:51 PM Pears 14
4 4/8/2035 8:59:43 AM Oranges 52
5 4/10/2035 2:07:00 AM Apples 152
6 4/10/2035 6:10:37 PM Bananas 23
7 4/10/2035 2:40:46 AM Strawberries 98

Opening a workbook

import openpyxl
wb = openpyxl.load_workbook('example3.xlsx')
type(wb)

openpyxl.load_workbook() returns a Workbook object representing the file (similar in role to a file object for text files).

Getting sheets from the workbook

import openpyxl
wb = openpyxl.load_workbook('example3.xlsx')
wb.sheetnames  # The workbook's sheets' names
['Sheet1', 'Sheet2', 'Sheet3']
sheet = wb['Sheet3']  # Get a sheet from the workbook.
sheet
type(sheet)
sheet.title  # Get the sheet's title as a string.
'Sheet3'
another_sheet = wb.active  # Get the active sheet.
another_sheet

Access a sheet with bracket notation and the sheet name string. wb.active is the active sheet. Use sheet.title for the name string.

Getting cells from the sheets

import openpyxl
wb = openpyxl.load_workbook('example3.xlsx')
sheet = wb['Sheet1']  # Get a sheet from the workbook.
sheet['A1']  # Get a cell from the sheet.
sheet['A1'].value  # Get the value from the cell.
datetime.datetime(2035, 4, 5, 13, 34, 2)
c = sheet['B1']  # Get another cell from the sheet.
c.value
'Apples'
f'Row {c.row}, Column {c.column} is {c.value}'
'Row 1, Column 2 is Apples'
f'Cell {c.coordinate} is {c.value}'
'Cell B1 is Apples'
sheet['C1'].value
73

A Cell has value, row, column, and coordinate (e.g. 'B1'). Dates in column A are read as datetime values (see Chapter 19).

Use sheet.cell(row=…, column=…) with 1-based row and column indices when letters are awkward (after Z come AA, AB, …).

sheet.cell(row=1, column=2)
sheet.cell(row=1, column=2).value
'Apples'
for i in range(1, 8, 2):  # Go through every other row.
...     print(i, sheet.cell(row=i, column=2).value)
...
1 Apples
3 Pears
5 Apples
7 Strawberries

sheet.max_row and sheet.max_column are integers (the highest used row/column numbers).

Converting between column letters and numbers

import openpyxl
from openpyxl.utils import get_column_letter, column_index_from_string
get_column_letter(1)  # Translate column 1 to a letter.
'A'
get_column_letter(2)
'B'
get_column_letter(27)
'AA'
get_column_letter(900)
'AHP'
wb = openpyxl.load_workbook('example3.xlsx')
sheet = wb['Sheet1']
get_column_letter(sheet.max_column)
'C'
column_index_from_string('A')  # Get A's number.
1
column_index_from_string('AA')
27

You do not need a loaded workbook to use these helpers.

Getting rows and columns (slices and rows / columns)

Slicing a worksheet returns nested tuples of Cell objects (rows from top to bottom, cells left to right within each row):

import openpyxl
wb = openpyxl.load_workbook('example3.xlsx')
sheet = wb['Sheet1']
sheet['A1':'C3']  # Tuple of row tuples of Cell objects for A1 through C3.
for row_of_cell_objects in sheet['A1':'C3']:
...     for cell_obj in row_of_cell_objects:
...         print(cell_obj.coordinate, cell_obj.value)
...     print('--- END OF ROW ---')
...
A1 2035-04-05 13:34:02
B1 Apples
C1 73
--- END OF ROW ---
A2 2035-04-05 03:41:23
B2 Cherries
C2 85
--- END OF ROW ---
A3 2035-04-06 12:46:51
B3 Pears
C3 14
--- END OF ROW ---

sheet.rows and sheet.columns are generators; wrap with list() before indexing. list(sheet.columns)[1] is column B (0-based index). list(sheet.rows) gives one tuple per row, each tuple holding that row’s cells.

A review of workbooks, sheets, and cells

  1. Import openpyxl.
  2. Call openpyxl.load_workbook() to get a Workbook.
  3. Use sheetnames or active to choose a sheet.
  4. Get a Worksheet (e.g. wb['Sheet1']).
  5. Read cells with sheet['A1'], sheet.cell(row=…, column=…), etc.
  6. Use the Cell object’s value (and optionally row, column, coordinate).

Project 9: Gather census statistics

Goal: From censuspopdata.xlsx (2010 US Census), count census tracts and total population per county. Each row is one tract; columns are tract number (A), state (B), county (C), population (D). Sheet name: 'Population by Census Tract'.

Program outline:

  • Open the workbook with openpyxl.
  • Aggregate tract/population into a data structure.
  • Write results to a .py file using pprint.pformat() so the data can be **import**ed later.

Data structure: Nested dicts:

  • Keys: state abbreviations → county names → {'tracts': n, 'pop': n}.

Example:

python
{'AK': {'Anchorage': {'pop': 291826, 'tracts': 55}, ...}, ...}

Reading rows (row 1 is headers; data starts at row 2):

python
# readCensusExcel.py - Tabulates county population and census tracts

import openpyxl, pprint
print('Opening workbook...')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb['Population by Census Tract']
county_data = {}
print('Reading rows...')
for row in range(2, sheet.max_row + 1):
    state = sheet['B' + str(row)].value
    county = sheet['C' + str(row)].value
    pop = sheet['D' + str(row)].value
    county_data.setdefault(state, {})
    county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
    county_data[state][county]['tracts'] += 1
    county_data[state][county]['pop'] += int(pop)

print('Writing results...')
result_file = open('census2010.py', 'w')
result_file.write('allData = ' + pprint.pformat(county_data))
result_file.close()
print('Done.')

Using the generated module:

import census2010
census2010.allData['AK']['Anchorage']
{'pop': 291826, 'tracts': 55}

After census2010.py exists, you may not need to run readCensusExcel.py again.

Ideas for similar programs (reading)

  • Compare data across rows or between multiple workbooks.
  • Validate spreadsheets (blank rows, bad values) and alert the user.
  • Use spreadsheet data as input to other Python programs.

Writing Excel documents

Creating and saving workbooks

import openpyxl
wb = openpyxl.Workbook()  # Create a blank workbook.
wb.sheetnames  # The workbook starts with one sheet.
['Sheet']
sheet = wb.active
sheet.title
'Sheet'
sheet.title = 'Spam Bacon Eggs Sheet'  # Change the title.
wb.sheetnames
['Spam Bacon Eggs Sheet']

Persist changes with wb.save('filename.xlsx'). When editing a file you loaded, save under a new filename so the original stays safe. save() fails if the file is open in Excel—close it first.

import openpyxl
wb = openpyxl.load_workbook('example3.xlsx')
sheet = wb['Sheet1']
sheet.title = 'Spam Spam Spam'
wb.save('example3_copy.xlsx')  # Save the workbook.

Creating and removing sheets

import openpyxl
wb = openpyxl.Workbook()
wb.sheetnames
['Sheet']
wb.create_sheet()  # Add a new sheet (default name Sheet1, etc.).
wb.create_sheet(index=0, title='First Sheet')
wb.create_sheet(index=2, title='Middle Sheet')
del wb['Middle Sheet']
del wb['Sheet1']

Call save() after adding or removing sheets.

Writing values to cells

Assignment mirrors dict-style access:

import openpyxl
wb = openpyxl.Workbook()
sheet = wb['Sheet']
sheet['A1'] = 'Hello, world!'
sheet['A1'].value
'Hello, world!'

Project 10: Update a spreadsheet

Data: produceSales3.xlsx—columns: produce (A), cost per pound (B), pounds sold (C), total (D) with formulas like =ROUND(B2*C2, 2).

Task: Fix prices only for Celery, Garlic, and Lemon without a naive find-replace on numbers.

Prices: Celery 1.19, Garlic 3.07, Lemon 1.27.

python
# updateProduce.py - Corrects costs in produce sales spreadsheet

import openpyxl

wb = openpyxl.load_workbook('produceSales3.xlsx')
sheet = wb['Sheet']

PRICE_UPDATES = {'Garlic': 3.07,
                 'Celery': 1.19,
                 'Lemon': 1.27}

for row_num in range(2, sheet.max_row + 1):  # Skip the header row.
    produce_name = sheet.cell(row=row_num, column=1).value
    if produce_name in PRICE_UPDATES:
        sheet.cell(row=row_num, column=2).value = PRICE_UPDATES[produce_name]

wb.save('updatedProduceSales3.xlsx')

Save to a new file first; delete the old one only after verifying the output.

Ideas for similar programs (writing)

  • Copy data between spreadsheets or from the web, files, or clipboard into Excel.
  • “Clean” data (e.g. normalize phone numbers with regular expressions).

Setting the font style of cells

python
from openpyxl.styles import Font
import openpyxl
from openpyxl.styles import Font
wb = openpyxl.Workbook()
sheet = wb['Sheet']
italic_24_font = Font(size=24, italic=True)
sheet['A1'].font = italic_24_font
sheet['A1'] = 'Hello, world!'
wb.save('styles3.xlsx')

Assign a Font object to cell.font. Set cell.value after or before styling as needed; the example sets font then value.

Table 14-2: Keyword arguments for Font objects

Keyword argument Data type Description
name String Font name, e.g. 'Calibri', 'Times New Roman'
size Integer Point size
bold Boolean Bold text
italic Boolean Italic text
bold_font = Font(name='Times New Roman', bold=True)
sheet['A1'].font = bold_font
sheet['A1'] = 'Bold Times New Roman'
italic_font = Font(size=24, italic=True)
sheet['B3'].font = italic_font
sheet['B3'] = '24 pt Italic'

Formulas

Assign a formula like any string value (starts with =):

sheet['B9'] = '=SUM(B1:B8)'
import openpyxl
wb = openpyxl.Workbook()
sheet = wb['Sheet']
sheet['A1'] = 200
sheet['A2'] = 300
sheet['A3'] = '=SUM(A1:A2)'
wb.save('writeFormula3.xlsx')

openpyxl does not evaluate formulas. Excel evaluates them when you open and save the file. To read computed values, open with data_only=True (after Excel has saved the cached results):

import openpyxl
wb = openpyxl.load_workbook('writeFormula3.xlsx')
wb.active['A3'].value
'=SUM(A1:A2)'
wb = openpyxl.load_workbook('writeFormula3.xlsx', data_only=True)
wb.active['A3'].value
500

If 500 does not appear, open the file in Excel and save so results are stored in the workbook.

Adjusting rows and columns

Setting row height and column width

import openpyxl
wb = openpyxl.Workbook()
sheet = wb['Sheet']
sheet['A1'] = 'Tall row'
sheet['B2'] = 'Wide column'
sheet.row_dimensions[1].height = 70
sheet.column_dimensions['B'].width = 20
wb.save('dimensions3.xlsx')

Use sheet.row_dimensions[row_number] and sheet.column_dimensions['LETTER']. Default sizes depend on Excel/openpyxl version.

Hiding rows and columns

The chapter notes you can hide rows and columns from view (in addition to freezing panes). In openpyxl, set the hidden attribute on the corresponding RowDimension or ColumnDimension object—for example, sheet.column_dimensions['C'].hidden = True to hide column C, or sheet.row_dimensions[5].hidden = True for a row.

Merging and unmerging cells

sheet.merge_cells('A1:D3')
sheet['A1'] = 'Twelve cells merged together.'
sheet.merge_cells('C5:D5')
sheet['C5'] = 'Two merged cells.'

Set the value on the top-left cell of the merged range. sheet.unmerge_cells('A1:D3') splits merged regions.

Freezing panes

sheet.freeze_panes can be a coordinate string or None. Freezing 'A2' keeps row 1 visible; 'B1' freezes column A; use 'A1' or None to clear. See Table 14-3.

Table 14-3: Frozen pane examples

freeze_panes setting Rows and columns frozen
sheet.freeze_panes = 'A2' Row 1 (no columns)
sheet.freeze_panes = 'B1' Column A (no rows)
sheet.freeze_panes = 'C1' Columns A and B
sheet.freeze_panes = 'C2' Row 1 and columns A–B
sheet.freeze_panes = 'A1' or None None
sheet.freeze_panes = 'A2'
wb.save('freezeExample3.xlsx')

Charts

  1. Create a Reference to a rectangular cell range.
  2. Create a Series from that reference.
  3. Create a Chart (e.g. BarChart).
  4. Append the series to the chart.
  5. Add the chart to the worksheet (optional top-left cell).

openpyxl.chart.Reference(worksheet, min_col, min_row, max_col, max_row) selects a rectangle of cells (1-based column and row numbers). The book’s example Reference(sheet, 1, 1, 1, 10) uses column A from row 1 through 10.

import openpyxl
wb = openpyxl.Workbook()
sheet = wb.active
for i in range(1, 11):
...     sheet['A' + str(i)] = i * i
...
ref_obj = openpyxl.chart.Reference(sheet, 1, 1, 1, 10)
series_obj = openpyxl.chart.Series(ref_obj, title='First series')
chart_obj = openpyxl.chart.BarChart()
chart_obj.title = 'My Chart'
chart_obj.append(series_obj)
sheet.add_chart(chart_obj, 'C5')
wb.save('sampleChart3.xlsx')

Other chart types include LineChart, ScatterChart, and PieChart in openpyxl.chart.

Summary

Getting data into the right shape is often harder than processing it. After loading a workbook with openpyxl, you can extract and transform data far faster than by hand.

You can also generate spreadsheets from programs—avoiding manual copy-paste from text, PDFs, or databases. The next chapter covers Google Sheets.

Practice Questions

Practice Questions

Assume a Workbook in wb, a Worksheet in sheet where noted.

  1. What does openpyxl.load_workbook() return?
  2. What does wb.sheetnames contain?
  3. How do you get the Worksheet named 'Sheet1'?
  4. How do you get the Worksheet for the workbook’s active sheet?
  5. How do you read the value in cell C5?
  6. How do you set cell C5 to "Hello"?
  7. How do you get a cell’s row and column as integers?
  8. What do sheet.max_column and sheet.max_row hold, and what is their type?
  9. Which function gives the integer index for column 'M'?
  10. If you needed to get the string name for row 14, what function would you need to call?
  11. How do you get Cell objects for the range A1 through F1?
  12. How do you save the workbook as example3.xlsx?
  13. How do you put a formula in a cell?
  14. To read a formula’s result instead of the formula string, what must you do first (and how do you open the workbook)?
  15. How do you set row 5’s height to 100?
  16. How do you hide column C?
  17. What is a freeze pane?
  18. What five calls create a bar chart (objects/methods)?

Practice programs

Multiplication table maker. Write multiplicationTable.py so it takes N from the command line and builds an N×N multiplication table in Excel. Row 1 and column A should hold labels and be bold (see Figure 14-10 in the book).

Blank row inserter. Write blankRowInserter.py with arguments N, M, and a filename. Starting at row N, insert M blank rows (e.g. python blankRowInserter.py 3 2 myProduce.xlsx). Hint: read the sheet, copy rows 1 … N−1 as-is, then write remaining rows shifted down by M (Figure 14-11).

Check your understanding

Tier 2 depth · Applied coding

0 / 4 correct
  1. Which module lets Python read and modify Excel .xlsx files?

  2. A workbook is open in desktop Excel and you run a script that edits the file. What does the chapter warn?

  3. How do you read the value of a single cell with openpyxl, given a worksheet `sheet`?

  4. What is a key advantage of automating spreadsheets with openpyxl?

Go deeper

More in Additional Resources →
← Web Scraping Google Sheets →