Engineering Toolkit · Tier 2: Data Science Practitioner · Module 3: Programming · Week 3–4

What you'll be able to do

  • Explain what version control solves and the snapshot model
  • Use the everyday Git commands (add, commit, branch, merge, push, pull)
  • Work on branches and resolve a basic merge conflict
Competencies you'll build
  • Stage and commit changes with clear messages
  • Create and merge a branch
  • Keep secrets and large files out of a repo with .gitignore


Chapter - Git and Version Control

Supplementary chapter prepared for the BWXT Data Science Workforce Training Pilot. This material is original to the program.

About this chapter

Version control is how teams track changes to code over time, work on the same project without overwriting each other, and recover when something breaks. The maturity model lists Git as a core Tier 2 capability that deepens through Tiers 3 and 4 — every data scientist on the team is expected to be comfortable with it.

Git is the most widely used version-control tool in industry. This chapter covers the mental model and the everyday commands.

Why version control

Imagine a folder full of files named analysis_final.py, analysis_final_v2.py, analysis_final_REALLY_final.py. That is version control done badly. Git replaces it with one folder and a complete, labeled history you can move through. It lets you:

  • See exactly what changed, when, and who changed it.
  • Return to any earlier working state.
  • Work on a new idea without disturbing the code that already works.
  • Combine your work with a teammate's safely.

The mental model: snapshots

A Git repository (repo) is a project folder plus its history, which Git stores in a hidden .git subfolder. You work, then take a labeled snapshot called a commit. Each commit points to the one before it, forming a timeline. Three places matter:

  • Working directory — your actual files as they are right now.
  • Staging area — the changes you have marked to include in the next commit (git add).
  • Repository — the committed history (git commit).

```text
edit files   ->   git add   ->   git commit
(working dir)    (staging)       (history)
```
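To watch a file move through the three places, here is a minimal shell sketch that builds a throwaway repository in a temporary folder and checks `git status --short` after each step. It assumes Git 2.28 or newer (for `git init -b`); the file `analysis.py` and the placeholder name and email are hypothetical.

```shell
set -eu
# Throwaway repository in a temp folder, so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
# Placeholder identity, stored only in this repo's own config.
git config user.name "Demo User"
git config user.email "demo@example.com"

# 1. Working directory: a new file Git has not been told about yet.
echo 'print("hello")' > analysis.py
git status --short        # ?? analysis.py   (untracked)

# 2. Staging area: mark the file for the next commit.
git add analysis.py
git status --short        # A  analysis.py   (staged)

# 3. Repository: save the snapshot into history.
git commit -q -m "Add analysis script"
git status --short        # prints nothing: files match the last commit
git log --oneline         # one commit in the history
```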

The everyday commands

```shell
git init                 # start tracking a new project
git clone <url>          # copy an existing repo (including its full history)
git status               # what has changed and what is staged
git add <file>           # stage a file for the next commit
git add .                # stage every change in the current folder and below
git commit -m "message"  # save a snapshot with a description
git log --oneline        # view the history, one line per commit
git push                 # send your new commits to the shared remote
git pull                 # fetch teammates' commits and merge them into your branch
```
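A minimal sketch of the everyday loop on a single machine: commit, edit, inspect the change, commit again, then read the history and look at an older version of a file. `clone`, `push`, and `pull` need a remote, so they are left out here. It assumes Git 2.28 or newer; `clean.py` and its contents are hypothetical.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

# First snapshot.
printf 'threshold = 0.5\n' > clean.py
git add clean.py
git commit -q -m "Add cleaning script with default threshold"

# Edit, look at what changed, then take a second snapshot.
printf 'threshold = 0.7\n' > clean.py
git status --short            #  M clean.py   (modified, not yet staged)
git diff                      # line-by-line view of the unstaged change
git add .
git commit -q -m "Raise threshold to reduce false positives"

git log --oneline             # two commits, newest first
git show HEAD~1:clean.py      # the file as it was one commit ago
```

`HEAD~1` means "one commit before the current one"; the same idea lets you inspect any earlier state.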

A good commit message says briefly what changed and why: "Fix off-by-one in defect crop" beats "update".
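For changes that need more explanation, passing `-m` twice keeps a short subject line and puts the reason in a body paragraph: Git uses the first `-m` as the subject and adds each later one as a separate paragraph. A minimal sketch in a throwaway repository, assuming Git 2.28 or newer; the file `crop.py`, its contents, and the body text are hypothetical.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'crop = image[y:y + h, x:x + w]\n' > crop.py
git add crop.py
# First -m is the subject line; the second -m becomes the body paragraph.
git commit -q -m "Fix off-by-one in defect crop" \
  -m "The crop window stopped one pixel short of the defect box, so edge pixels were lost."

git log -1 --format=%s        # subject only
git log -1 --format=%b        # body only
```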

Branches: work without breaking things

A branch is a movable label on a line of work. You create one to develop a feature or try an experiment, leaving the main branch — the known-good code — untouched. When the work is ready, you merge it back.

[Figure: commits on main and on a feature branch, joined by a merge]
Each dot is a commit. A feature branch splits off main, gets its own commits, and is merged back when ready, so main always holds working code while you experiment safely.
```shell
git branch feature-x        # create a branch
git checkout feature-x      # switch to it (or: git switch feature-x)
git checkout main           # when the work is ready, return to main
git merge feature-x         # fold feature-x's commits into main
```
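Put together, a full branch-and-merge round trip looks like this sketch, run in a throwaway repository. It uses `git switch -c`, which creates a branch and switches to it in one step (Git 2.23 or newer), and assumes Git 2.28 or newer for `git init -b`; the file names are hypothetical.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'def load():\n    pass\n' > pipeline.py
git add pipeline.py
git commit -q -m "Add pipeline skeleton"

# Branch off main and commit the experiment there.
git switch -q -c feature-x
printf 'def augment():\n    pass\n' > augment.py
git add augment.py
git commit -q -m "Add augmentation step"

# main is untouched until the merge.
git switch -q main
ls                            # only pipeline.py
git merge -q feature-x        # fast-forward: main had no new commits
ls                            # augment.py and pipeline.py
git branch -d feature-x       # optional: delete the merged branch
```

Because main gets no new commits while the branch is open, Git can simply move main forward (a fast-forward). If main had moved on, Git would create a merge commit instead.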

Remotes, pull requests, and review

A remote is a shared copy of the repo (on GitHub, GitLab, or an internal server). You push your branch to it and open a pull request (PR; GitLab calls it a merge request): a request to merge your branch, where teammates review the changes before they land. Code review is itself a maturity-model capability; the PR is where it happens.
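The push-and-pull cycle can be tried without a hosting service by using a local bare repository as a stand-in remote; this is an assumption for illustration, and on GitHub or GitLab the remote URL would point at the server instead. In this sketch two folders play two teammates. The pull request itself is opened and approved in the hosting service's web interface, so here a plain local merge stands in for that step. It assumes Git 2.28 or newer; names, emails, and files are hypothetical.

```shell
set -eu
root=$(mktemp -d)

# A bare repository has history but no working files; it plays the shared remote.
git init -q --bare -b main "$root/shared.git"

# Teammate A: start the project and publish main.
git init -q -b main "$root/alice"
cd "$root/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
printf '# Demo project\n' > README.md
git add README.md
git commit -q -m "Add README"
git remote add origin "$root/shared.git"
git push -q -u origin main        # -u remembers origin/main for later pushes and pulls

# Teammate B: clone, work on a branch, and push it for review.
git clone -q "$root/shared.git" "$root/bob"
cd "$root/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
git switch -q -c feature-x
printf 'print("eda")\n' > eda.py
git add eda.py
git commit -q -m "Add exploratory analysis script"
git push -q -u origin feature-x   # the branch a pull request would be opened from

# Stand-in for an approved PR: merge into main and publish it.
git switch -q main
git merge -q feature-x
git push -q

# Teammate A: bring the merged work into their copy.
cd "$root/alice"
git pull -q
ls                                # README.md and eda.py
```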

.gitignore

Some files should never be committed: large datasets, model weights, secrets, virtual environments, __pycache__. List them in a .gitignore file so Git skips them. This keeps the repo small and keeps credentials out of history.
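To check that a .gitignore does what you expect, `git check-ignore` reports which rule, if any, ignores a path. A minimal sketch; the patterns cover the categories listed above, but the specific names (`data/`, `*.pt`, `.env`, `.venv/`) are assumptions to adapt to your project. It assumes Git 2.28 or newer for `git init -b`.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q -b main

# Example patterns; lines starting with # are comments.
cat > .gitignore <<'EOF'
# Large data and model weights
data/
*.pt
# Secrets and virtual environments
.env
.venv/
# Python bytecode caches
__pycache__/
EOF

mkdir -p data src/__pycache__
touch data/raw.csv model.pt .env src/__pycache__/util.pyc src/train.py

# -v prints the .gitignore line that matched each ignored path.
git check-ignore -v data/raw.csv model.pt .env src/__pycache__/util.pyc
git status --short            # only .gitignore and src/ are listed
```

Note that .gitignore only affects files Git is not already tracking. A file committed earlier stays tracked until you run `git rm --cached <file>`, and a secret that was ever committed remains in the history, so treat it as exposed and rotate it.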

Merge conflicts

When two people change the same lines, Git cannot decide automatically and reports a merge conflict. It marks the spot:

```text
<<<<<<< HEAD
your version
=======
their version
>>>>>>> feature-x
```

You edit the file to the correct combined result, remove the markers, then run git add and git commit to finish the merge. Handling conflicts calmly is a Tier 3–4 skill.
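This sketch reproduces a conflict on purpose in a throwaway repository, then resolves it with exactly those steps. It assumes Git 2.28 or newer; the file `config.py` and the threshold values are hypothetical.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'threshold = 0.5\n' > config.py
git add config.py
git commit -q -m "Add config"

# Change the same line differently on two branches.
git switch -q -c feature-x
printf 'threshold = 0.7\n' > config.py
git commit -q -am "Raise threshold"

git switch -q main
printf 'threshold = 0.6\n' > config.py
git commit -q -am "Tune threshold"

# The merge stops and reports a conflict instead of guessing.
git merge feature-x || echo "merge stopped: resolve config.py"
cat config.py                 # shows the <<<<<<< / ======= / >>>>>>> markers

# Resolve: write the agreed result without markers, then stage and commit.
printf 'threshold = 0.7\n' > config.py
git add config.py
git commit -q --no-edit       # accept the merge message Git prepared
git log --oneline --graph
```

`--no-edit` keeps the commit non-interactive; without it, Git opens an editor so you can adjust the merge message.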

Practice Questions

  1. In your own words, what problem does version control solve?
  2. What is the difference between the working directory, the staging area, and the repository?
  3. What do git add and git commit each do?
  4. Why work on a branch instead of committing directly to main?
  5. What is a remote, and what does git push do?
  6. What belongs in a .gitignore file, and why?
  7. What causes a merge conflict, and how do you resolve one?
  8. Write a clear commit message for fixing a bug that cropped weld images one pixel too small.

Check your understanding

Tier 2 depth · Applied coding

  1. In Git's model, what does `git add` do?

  2. You committed locally and want teammates to see your work on the shared remote. Which command?

  3. Why create a branch to develop a new feature?

  4. Which is the better commit message, and why?

  5. What's the difference between the working directory and the repository in Git?

Go deeper

  • Pro Git (book), open access: the complete, free Git reference, from basics to internals.
  • GitHub Skills, open access: guided, interactive courses on Git and GitHub workflows.
  • pandas documentation, open access: the user guide and API reference for the library behind most data work here.