Chapter - Git and Version Control

Supplementary chapter prepared for the BWXT Data Science Workforce Training Pilot. This material is original to the program.

About this chapter

Version control is how teams track changes to code over time, work on the same project without overwriting each other, and recover when something breaks. The maturity model lists Git as a core Tier 2 capability that deepens through Tiers 3 and 4 — every data scientist on the team is expected to be comfortable with it.

Git is the de facto standard version-control tool across the software industry. This chapter covers the mental model and the everyday commands.

Why version control

Imagine a folder full of files named analysis_final.py, analysis_final_v2.py, analysis_final_REALLY_final.py. That is version control done badly. Git replaces it with one folder and a complete, labeled history you can move through. It lets you:

See exactly what changed, when, and who changed it.
Return to any earlier working state.
Work on a new idea without disturbing the code that already works.
Combine your work with a teammate's safely.

The mental model: snapshots

A Git repository (repo) is a project folder plus a hidden history. You work, then take a labeled snapshot called a commit. Each commit points to the one before it, forming a timeline. Three places matter:

Working directory — your actual files as they are right now.
Staging area — the changes you have marked to include in the next commit (git add).
Repository — the committed history (git commit).

edit files   ->   git add   ->   git commit
(working dir)    (staging)       (history)
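
The three places can be watched in action with git status. The following is a minimal sketch, assuming Git is installed; the temporary directory, the file name analysis.py, and the placeholder identity are hypothetical choices made only for this illustration.

```shell
# Throwaway repo in a temporary directory (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init --quiet
git config user.name "Example User"         # placeholder identity, this repo only
git config user.email "user@example.com"

echo "print('hello')" > analysis.py   # working directory: the file exists but is untracked
git status --short                    # shows: ?? analysis.py

git add analysis.py                   # staging area: marked for the next commit
git status --short                    # shows: A  analysis.py

git commit --quiet -m "Add analysis script"   # repository: the snapshot is saved
git status --short                    # prints nothing, nothing is left to commit
```

Running git status between steps is a cheap habit: it shows which of the three places each change currently sits in.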

The everyday commands

git init                 # start tracking a new project
git clone <url>          # copy an existing repo (including its history)
git status               # what has changed and what is staged
git add <file>           # stage a file for the next commit
git add .                # stage all changes in the current directory and below
git commit -m "message"  # save a snapshot with a description
git log --oneline        # view the history
git push                 # send your commits to the shared remote
git pull                 # fetch teammates' commits and merge them into your branch

A good commit message says what changed and why, briefly: "Fix off-by-one in defect crop" beats "update".
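
To see why descriptive messages pay off, here is a small sketch that makes two commits and then reads the history back. It assumes Git is installed; the file crop.py, its contents, and the placeholder identity are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init --quiet
git config user.name "Example User"         # placeholder identity, this repo only
git config user.email "user@example.com"

echo "crop = image[0:63]" > crop.py         # illustrative file content
git add crop.py
git commit --quiet -m "Add defect crop helper"

echo "crop = image[0:64]" > crop.py         # the one-pixel fix
git add crop.py
git commit --quiet -m "Fix off-by-one in defect crop"

git log --oneline          # newest first: short hash plus message, one line per commit
git log -1 --format=%s     # just the subject line of the latest commit
```

With messages like these, the one-line history reads as a summary of the project; a column of "update" entries would not.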

Branches: work without breaking things

A branch is a movable label on a line of work. You create one to develop a feature or try an experiment, leaving the main branch — the known-good code — untouched. When the work is ready, you merge it back.

[Figure: branch diagram. Each dot is a commit. A feature branch splits off main, gets its own commits, and is merged back when ready, so main always holds working code while you experiment safely.]

git branch feature-x        # create a branch
git checkout feature-x      # switch to it (or: git switch feature-x)
git checkout main           # when the work is ready, switch back to main
git merge feature-x         # from main, fold feature-x back in
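
The branch commands can be tried end to end in a throwaway repo. This is a sketch under stated assumptions: Git is installed, and the file names, commit messages, and placeholder identity are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init --quiet
git config user.name "Example User"         # placeholder identity, this repo only
git config user.email "user@example.com"

echo "baseline" > model.txt
git add model.txt
git commit --quiet -m "Add baseline model notes"
git branch -M main                  # make sure the first branch is named main

git branch feature-x                # create the branch
git checkout --quiet feature-x      # switch to it
echo "experiment" > experiment.txt
git add experiment.txt
git commit --quiet -m "Try experimental feature"

git checkout --quiet main           # back on main: experiment.txt is not here yet
ls                                  # lists model.txt only
git merge --quiet feature-x         # fold feature-x into main
ls                                  # now lists experiment.txt and model.txt
```

Because main gained no commits of its own in the meantime, Git can complete this merge by simply moving main forward to the tip of feature-x (a fast-forward).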

Remotes, pull requests, and review

A remote is a shared copy of the repo (on GitHub, GitLab, or an internal server). You push your branch to it and open a pull request (PR) — a request to merge your branch, where teammates review the changes before they land. Code review is itself a maturity-model capability; the PR is where it happens.
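
Pushing a branch can be rehearsed locally. In the sketch below a bare repository in a temporary directory stands in for the shared server; that stand-in, the paths, the file name, and the placeholder identity are all assumptions for illustration. Opening the pull request itself happens on the hosting platform, not through these commands.

```shell
base=$(mktemp -d)
git init --quiet --bare "$base/shared.git"   # stand-in for GitHub, GitLab, or an internal server
git init --quiet "$base/work"                # your local copy
cd "$base/work" || exit 1
git config user.name "Example User"          # placeholder identity, this repo only
git config user.email "user@example.com"

echo "project notes" > README.txt
git add README.txt
git commit --quiet -m "Add README"
git branch -M main
git remote add origin "$base/shared.git"     # name the shared copy "origin"
git push --quiet -u origin main              # publish main

git checkout --quiet -b feature-x            # create and switch in one step
echo "more detail" >> README.txt
git commit --quiet -am "Expand README"
git push --quiet -u origin feature-x         # publish the branch you want reviewed

git ls-remote --heads origin                 # lists the branches the remote now has
```

Once the branch exists on the remote, teammates can fetch it and review it before anything is merged into main.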

.gitignore

Some files should never be committed: large datasets, model weights, secrets, virtual environments, __pycache__. List them in a .gitignore file so Git skips them. This keeps the repo small and keeps credentials out of history.
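
A sketch of what such a file might look like, and how to check that it works, assuming Git is installed. The patterns and file names are illustrative, not a required list; each project will need its own.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init --quiet

# Write an example .gitignore (patterns are illustrative; adjust to the project).
cat > .gitignore <<'EOF'
data/
*.pt
.env
.venv/
__pycache__/
EOF

mkdir data
echo "id,label" > data/big.csv      # stands in for a large dataset
echo "API_KEY=example" > .env       # stands in for a secrets file
echo "print(1)" > train.py          # ordinary source code

git status --short                  # lists only .gitignore and train.py as untracked
git check-ignore .env data/big.csv  # prints the paths that Git is ignoring
```

A .gitignore only affects files that are not yet tracked, so it is worth adding one before the first commit.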

Merge conflicts

When two people change the same lines, Git cannot decide automatically and reports a merge conflict. It marks the spot:

<<<<<<< HEAD
your version
=======
their version
>>>>>>> feature-x

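The lines between <<<<<<< HEAD and ======= come from your current branch; the lines between ======= and >>>>>>> feature-x come from the branch being merged. Edit the file so it holds the correct combined result, remove the markers, then run git add and git commit to finish the merge. Handling conflicts calmly is a Tier 3–4 skill.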
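
A conflict is easy to reproduce on purpose, which makes it less alarming when it happens for real. The sketch below assumes Git is installed; the file config.txt, its values, and the placeholder identity are hypothetical, and the resolution chosen here (keeping the feature-x value) is only one possible outcome.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init --quiet
git config user.name "Example User"          # placeholder identity, this repo only
git config user.email "user@example.com"

echo "crop = 10" > config.txt
git add config.txt
git commit --quiet -m "Add crop setting"
git branch -M main

git checkout --quiet -b feature-x
echo "crop = 12" > config.txt                # feature-x changes the line
git commit --quiet -am "Widen crop on feature-x"

git checkout --quiet main
echo "crop = 11" > config.txt                # main changes the same line differently
git commit --quiet -am "Widen crop on main"

git merge feature-x || echo "merge stopped: conflict in config.txt"
cat config.txt                               # shows both versions between the markers

echo "crop = 12" > config.txt                # resolve: write the agreed result, no markers
git add config.txt                           # tell Git the conflict is resolved
git commit --quiet -m "Merge feature-x, keep crop = 12"
```

The final commit has two parents, one from each branch, which is how the history records that the two lines of work were joined.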

Practice Questions

In your own words, what problem does version control solve?
What is the difference between the working directory, the staging area, and the repository?
What do git add and git commit each do?
Why work on a branch instead of committing directly to main?
What is a remote, and what does git push do?
What belongs in a .gitignore file, and why?
What causes a merge conflict, and how do you resolve one?
Write a clear commit message for fixing a bug that cropped weld images one pixel too small.
