Chapter 14 – Excel Spreadsheets
Source: Automate the Boring Stuff with Python, 3rd Edition — Al Sweigart (No Starch Press). This Markdown version was derived from the chapter PDF.
About this chapter
Although spreadsheets are not always thought of as programming tools, almost everyone uses them to organize information into two-dimensional data structures, perform calculations with formulas, and produce output as charts. This chapter and the next integrate Python with two popular spreadsheet applications: Microsoft Excel and Google Sheets.
The openpyxl module lets Python programs read and modify Excel .xlsx files—for example, copying data between workbooks, filtering or editing thousands of rows by criteria, or scanning many budget files for problems. Those are the kinds of repetitive spreadsheet tasks Python handles well.
Excel is proprietary; LibreOffice Calc (libreoffice.org) is a free alternative that uses the same .xlsx format, so openpyxl works with its files too. The book’s screenshots use cloud-based Office 365 Excel.
Important: openpyxl works on files, not on the Excel desktop app or the Excel web app while a file is only online. With Office 365, download a copy (File → Save As → Download a Copy), run your script, then re-upload. With desktop Excel, close the workbook before running the script, then reopen it to see changes.
Install openpyxl with pip (see Appendix A in the book). Documentation: openpyxl.readthedocs.io. Example spreadsheets are available from the book’s page at nostarch.com/automate-boring-stuff-python-3rd-edition.
Reading Excel Files
Workbooks, sheets, and cells
- A spreadsheet file is a workbook (
.xlsx). - A workbook can contain multiple sheets (worksheets).
- The active sheet is the one the user is viewing (or last viewed before closing).
- Columns are labeled A, B, C, …; rows are numbered 1, 2, 3, …
- A cell is the box at one column/row; it can hold a number or text.
Examples use example3.xlsx in the current working directory (create it or download from the book’s resources). Sheet1 should match the following.
Table 14-1: The example3.xlsx spreadsheet (Sheet1)
| A | B | C | |
|---|---|---|---|
| 1 | 4/5/2035 1:34:02 PM | Apples | 73 |
| 2 | 4/5/2035 3:41:23 AM | Cherries | 85 |
| 3 | 4/6/2035 12:46:51 PM | Pears | 14 |
| 4 | 4/8/2035 8:59:43 AM | Oranges | 52 |
| 5 | 4/10/2035 2:07:00 AM | Apples | 152 |
| 6 | 4/10/2035 6:10:37 PM | Bananas | 23 |
| 7 | 4/10/2035 2:40:46 AM | Strawberries | 98 |
Opening a workbook
import openpyxlwb = openpyxl.load_workbook('example3.xlsx')type(wb)openpyxl.load_workbook() returns a Workbook object representing the file (similar in role to a file object for text files).
Getting sheets from the workbook
import openpyxlwb = openpyxl.load_workbook('example3.xlsx')wb.sheetnames # The workbook's sheets' names['Sheet1', 'Sheet2', 'Sheet3']sheet = wb['Sheet3'] # Get a sheet from the workbook.sheettype(sheet)sheet.title # Get the sheet's title as a string.'Sheet3'another_sheet = wb.active # Get the active sheet.another_sheetAccess a sheet with bracket notation and the sheet name string. wb.active is the active sheet. Use sheet.title for the name string.
Getting cells from the sheets
import openpyxlwb = openpyxl.load_workbook('example3.xlsx')sheet = wb['Sheet1'] # Get a sheet from the workbook.sheet['A1'] # Get a cell from the sheet. | sheet['A1'].value # Get the value from the cell.datetime.datetime(2035, 4, 5, 13, 34, 2)c = sheet['B1'] # Get another cell from the sheet.c.value'Apples'f'Row {c.row}, Column {c.column} is {c.value}''Row 1, Column 2 is Apples'f'Cell {c.coordinate} is {c.value}''Cell B1 is Apples'sheet['C1'].value73A Cell has value, row, column, and coordinate (e.g. 'B1'). Dates in column A are read as datetime values (see Chapter 19).
Use sheet.cell(row=…, column=…) with 1-based row and column indices when letters are awkward (after Z come AA, AB, …).
sheet.cell(row=1, column=2) | sheet.cell(row=1, column=2).value'Apples'for i in range(1, 8, 2): # Go through every other row.... print(i, sheet.cell(row=i, column=2).value)
...
1 Apples
3 Pears
5 Apples
7 Strawberriessheet.max_row and sheet.max_column are integers (the highest used row/column numbers).
Converting between column letters and numbers
import openpyxlfrom openpyxl.utils import get_column_letter, column_index_from_stringget_column_letter(1) # Translate column 1 to a letter.'A'get_column_letter(2)'B'get_column_letter(27)'AA'get_column_letter(900)'AHP'wb = openpyxl.load_workbook('example3.xlsx')sheet = wb['Sheet1']get_column_letter(sheet.max_column)'C'column_index_from_string('A') # Get A's number.1column_index_from_string('AA')27You do not need a loaded workbook to use these helpers.
Getting rows and columns (slices and rows / columns)
Slicing a worksheet returns nested tuples of Cell objects (rows from top to bottom, cells left to right within each row):
import openpyxlwb = openpyxl.load_workbook('example3.xlsx')sheet = wb['Sheet1']sheet['A1':'C3'] # Tuple of row tuples of Cell objects for A1 through C3.for row_of_cell_objects in sheet['A1':'C3']:... for cell_obj in row_of_cell_objects:
... print(cell_obj.coordinate, cell_obj.value)
... print('--- END OF ROW ---')
...
A1 2035-04-05 13:34:02
B1 Apples
C1 73
--- END OF ROW ---
A2 2035-04-05 03:41:23
B2 Cherries
C2 85
--- END OF ROW ---
A3 2035-04-06 12:46:51
B3 Pears
C3 14
--- END OF ROW ---sheet.rows and sheet.columns are generators; wrap with list() before indexing. list(sheet.columns)[1] is column B (0-based index). list(sheet.rows) gives one tuple per row, each tuple holding that row’s cells.
A review of workbooks, sheets, and cells
- Import openpyxl.
- Call
openpyxl.load_workbook()to get a Workbook. - Use
sheetnamesoractiveto choose a sheet. - Get a Worksheet (e.g.
wb['Sheet1']). - Read cells with
sheet['A1'],sheet.cell(row=…, column=…), etc. - Use the Cell object’s
value(and optionallyrow,column,coordinate).
Project 9: Gather census statistics
Goal: From censuspopdata.xlsx (2010 US Census), count census tracts and total population per county. Each row is one tract; columns are tract number (A), state (B), county (C), population (D). Sheet name: 'Population by Census Tract'.
Program outline:
- Open the workbook with openpyxl.
- Aggregate tract/population into a data structure.
- Write results to a
.pyfile usingpprint.pformat()so the data can be **import**ed later.
Data structure: Nested dicts:
- Keys: state abbreviations → county names →
{'tracts': n, 'pop': n}.
Example:
{'AK': {'Anchorage': {'pop': 291826, 'tracts': 55}, ...}, ...}Reading rows (row 1 is headers; data starts at row 2):
# readCensusExcel.py - Tabulates county population and census tracts
import openpyxl, pprint
print('Opening workbook...')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb['Population by Census Tract']
county_data = {}
print('Reading rows...')
for row in range(2, sheet.max_row + 1):
state = sheet['B' + str(row)].value
county = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value
county_data.setdefault(state, {})
county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
county_data[state][county]['tracts'] += 1
county_data[state][county]['pop'] += int(pop)
print('Writing results...')
result_file = open('census2010.py', 'w')
result_file.write('allData = ' + pprint.pformat(county_data))
result_file.close()
print('Done.')Using the generated module:
import census2010census2010.allData['AK']['Anchorage']{'pop': 291826, 'tracts': 55}After census2010.py exists, you may not need to run readCensusExcel.py again.
Ideas for similar programs (reading)
- Compare data across rows or between multiple workbooks.
- Validate spreadsheets (blank rows, bad values) and alert the user.
- Use spreadsheet data as input to other Python programs.
Writing Excel documents
Creating and saving workbooks
import openpyxlwb = openpyxl.Workbook() # Create a blank workbook.wb.sheetnames # The workbook starts with one sheet.['Sheet']sheet = wb.activesheet.title'Sheet'sheet.title = 'Spam Bacon Eggs Sheet' # Change the title.wb.sheetnames['Spam Bacon Eggs Sheet']Persist changes with wb.save('filename.xlsx'). When editing a file you loaded, save under a new filename so the original stays safe. save() fails if the file is open in Excel—close it first.
import openpyxlwb = openpyxl.load_workbook('example3.xlsx')sheet = wb['Sheet1']sheet.title = 'Spam Spam Spam'wb.save('example3_copy.xlsx') # Save the workbook.Creating and removing sheets
import openpyxlwb = openpyxl.Workbook()wb.sheetnames['Sheet']wb.create_sheet() # Add a new sheet (default name Sheet1, etc.).wb.create_sheet(index=0, title='First Sheet')wb.create_sheet(index=2, title='Middle Sheet')del wb['Middle Sheet']del wb['Sheet1']Call save() after adding or removing sheets.
Writing values to cells
Assignment mirrors dict-style access:
import openpyxlwb = openpyxl.Workbook()sheet = wb['Sheet']sheet['A1'] = 'Hello, world!'sheet['A1'].value'Hello, world!'Project 10: Update a spreadsheet
Data: produceSales3.xlsx—columns: produce (A), cost per pound (B), pounds sold (C), total (D) with formulas like =ROUND(B2*C2, 2).
Task: Fix prices only for Celery, Garlic, and Lemon without a naive find-replace on numbers.
Prices: Celery 1.19, Garlic 3.07, Lemon 1.27.
# updateProduce.py - Corrects costs in produce sales spreadsheet
import openpyxl
wb = openpyxl.load_workbook('produceSales3.xlsx')
sheet = wb['Sheet']
PRICE_UPDATES = {'Garlic': 3.07,
'Celery': 1.19,
'Lemon': 1.27}
for row_num in range(2, sheet.max_row + 1): # Skip the header row.
produce_name = sheet.cell(row=row_num, column=1).value
if produce_name in PRICE_UPDATES:
sheet.cell(row=row_num, column=2).value = PRICE_UPDATES[produce_name]
wb.save('updatedProduceSales3.xlsx')Save to a new file first; delete the old one only after verifying the output.
Ideas for similar programs (writing)
- Copy data between spreadsheets or from the web, files, or clipboard into Excel.
- “Clean” data (e.g. normalize phone numbers with regular expressions).
Setting the font style of cells
from openpyxl.styles import Fontimport openpyxlfrom openpyxl.styles import Fontwb = openpyxl.Workbook()sheet = wb['Sheet']italic_24_font = Font(size=24, italic=True)sheet['A1'].font = italic_24_fontsheet['A1'] = 'Hello, world!'wb.save('styles3.xlsx')Assign a Font object to cell.font. Set cell.value after or before styling as needed; the example sets font then value.
Table 14-2: Keyword arguments for Font objects
| Keyword argument | Data type | Description |
|---|---|---|
name |
String | Font name, e.g. 'Calibri', 'Times New Roman' |
size |
Integer | Point size |
bold |
Boolean | Bold text |
italic |
Boolean | Italic text |
bold_font = Font(name='Times New Roman', bold=True)sheet['A1'].font = bold_fontsheet['A1'] = 'Bold Times New Roman'italic_font = Font(size=24, italic=True)sheet['B3'].font = italic_fontsheet['B3'] = '24 pt Italic'Formulas
Assign a formula like any string value (starts with =):
sheet['B9'] = '=SUM(B1:B8)'import openpyxlwb = openpyxl.Workbook()sheet = wb['Sheet']sheet['A1'] = 200sheet['A2'] = 300sheet['A3'] = '=SUM(A1:A2)'wb.save('writeFormula3.xlsx')openpyxl does not evaluate formulas. Excel evaluates them when you open and save the file. To read computed values, open with data_only=True (after Excel has saved the cached results):
import openpyxlwb = openpyxl.load_workbook('writeFormula3.xlsx')wb.active['A3'].value'=SUM(A1:A2)'wb = openpyxl.load_workbook('writeFormula3.xlsx', data_only=True)wb.active['A3'].value500If 500 does not appear, open the file in Excel and save so results are stored in the workbook.
Adjusting rows and columns
Setting row height and column width
import openpyxlwb = openpyxl.Workbook()sheet = wb['Sheet']sheet['A1'] = 'Tall row'sheet['B2'] = 'Wide column'sheet.row_dimensions[1].height = 70sheet.column_dimensions['B'].width = 20wb.save('dimensions3.xlsx')Use sheet.row_dimensions[row_number] and sheet.column_dimensions['LETTER']. Default sizes depend on Excel/openpyxl version.
Hiding rows and columns
The chapter notes you can hide rows and columns from view (in addition to freezing panes). In openpyxl, set the hidden attribute on the corresponding RowDimension or ColumnDimension object—for example, sheet.column_dimensions['C'].hidden = True to hide column C, or sheet.row_dimensions[5].hidden = True for a row.
Merging and unmerging cells
sheet.merge_cells('A1:D3')sheet['A1'] = 'Twelve cells merged together.'sheet.merge_cells('C5:D5')sheet['C5'] = 'Two merged cells.'Set the value on the top-left cell of the merged range. sheet.unmerge_cells('A1:D3') splits merged regions.
Freezing panes
sheet.freeze_panes can be a coordinate string or None. Freezing 'A2' keeps row 1 visible; 'B1' freezes column A; use 'A1' or None to clear. See Table 14-3.
Table 14-3: Frozen pane examples
freeze_panes setting |
Rows and columns frozen |
|---|---|
sheet.freeze_panes = 'A2' |
Row 1 (no columns) |
sheet.freeze_panes = 'B1' |
Column A (no rows) |
sheet.freeze_panes = 'C1' |
Columns A and B |
sheet.freeze_panes = 'C2' |
Row 1 and columns A–B |
sheet.freeze_panes = 'A1' or None |
None |
sheet.freeze_panes = 'A2'wb.save('freezeExample3.xlsx')Charts
- Create a Reference to a rectangular cell range.
- Create a Series from that reference.
- Create a Chart (e.g. BarChart).
- Append the series to the chart.
- Add the chart to the worksheet (optional top-left cell).
openpyxl.chart.Reference(worksheet, min_col, min_row, max_col, max_row) selects a rectangle of cells (1-based column and row numbers). The book’s example Reference(sheet, 1, 1, 1, 10) uses column A from row 1 through 10.
import openpyxlwb = openpyxl.Workbook()sheet = wb.activefor i in range(1, 11):... sheet['A' + str(i)] = i * i
...ref_obj = openpyxl.chart.Reference(sheet, 1, 1, 1, 10)series_obj = openpyxl.chart.Series(ref_obj, title='First series')chart_obj = openpyxl.chart.BarChart()chart_obj.title = 'My Chart'chart_obj.append(series_obj)sheet.add_chart(chart_obj, 'C5')wb.save('sampleChart3.xlsx')Other chart types include LineChart, ScatterChart, and PieChart in openpyxl.chart.
Summary
Getting data into the right shape is often harder than processing it. After loading a workbook with openpyxl, you can extract and transform data far faster than by hand.
You can also generate spreadsheets from programs—avoiding manual copy-paste from text, PDFs, or databases. The next chapter covers Google Sheets.
Practice Questions
Practice Questions
Assume a Workbook in wb, a Worksheet in sheet where noted.
- What does
openpyxl.load_workbook()return? - What does
wb.sheetnamescontain? - How do you get the Worksheet named
'Sheet1'? - How do you get the Worksheet for the workbook’s active sheet?
- How do you read the value in cell C5?
- How do you set cell C5 to
"Hello"? - How do you get a cell’s row and column as integers?
- What do
sheet.max_columnandsheet.max_rowhold, and what is their type? - Which function gives the integer index for column
'M'? - If you needed to get the string name for row 14, what function would you need to call?
- How do you get Cell objects for the range A1 through F1?
- How do you save the workbook as example3.xlsx?
- How do you put a formula in a cell?
- To read a formula’s result instead of the formula string, what must you do first (and how do you open the workbook)?
- How do you set row 5’s height to 100?
- How do you hide column C?
- What is a freeze pane?
- What five calls create a bar chart (objects/methods)?
Practice programs
Multiplication table maker. Write multiplicationTable.py so it takes N from the command line and builds an N×N multiplication table in Excel. Row 1 and column A should hold labels and be bold (see Figure 14-10 in the book).
Blank row inserter. Write blankRowInserter.py with arguments N, M, and a filename. Starting at row N, insert M blank rows (e.g. python blankRowInserter.py 3 2 myProduce.xlsx). Hint: read the sheet, copy rows 1 … N−1 as-is, then write remaining rows shifted down by M (Figure 14-11).