Version control lets you track and manage changes to your code in a more sophisticated way than you’re used to.
Code changes a lot, so keeping history is critical, especially when multiple people are working on a project together.
Version control tracks changes to a codebase and regulates how people can make updates.
Git is the standard software for version control, and almost every developer uses it.
Github provides a managed hosting service for your code repositories.
Your company’s applications and internal tools all use version control, and without it, nobody would be able to get any work done. So read this.
The easiest way to understand version control is to look at how you version control your stuff. Let’s imagine you’re building a Very Important Presentation for your Managing Director. What’s your workflow?
Make a change to a slide (Move Very Important Box 1px To The Left)
Save changes
Make other changes
Save those changes
What happens if you want to go back to a previous version, because you made a bad change, or your MD told you to scrap all of the stuff he just told you to do? You have a couple of pretty weak options:
Undo (Command + Z) – this only works for relatively recent stuff, you have to step back one change at a time, you lose everything that you’ve built since then, etc.
Save files in versions (presentation_final_1, presentation_final_2) – time-consuming, hard to tell what changed between versions, and very inefficient
Version control exists to solve this exact problem. Instead of just saving and losing all of your previous progress, version control makes you commit your changes as new versions while still keeping the old ones: and you can go back to any commit at any point in time. It’s basically as if you saved every group of changes as a new file separately (like option #2 above), but in a much simpler, more efficient way.
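If it helps to see the idea in code, here’s a toy Python model of “save every group of changes as its own version.” The `History` and `Commit` names are made up for illustration, and this is not how Git actually stores data (real tools avoid keeping duplicate copies of unchanged files). It just shows that committing keeps every old version around, so you can jump back to any of them.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    snapshot: dict  # file name -> contents at the moment of the commit


@dataclass
class History:
    commits: list = field(default_factory=list)

    def commit(self, files: dict, message: str) -> int:
        """Store a copy of the current files and return its version number."""
        self.commits.append(Commit(message, dict(files)))
        return len(self.commits) - 1

    def checkout(self, version: int) -> dict:
        """Return a copy of the files exactly as they were at `version`."""
        return dict(self.commits[version].snapshot)


history = History()
deck = {"slide1": "Very Important Box at x=100"}
v0 = history.commit(deck, "First draft")

deck["slide1"] = "Very Important Box at x=99"
v1 = history.commit(deck, "Move Very Important Box 1px to the left")

# Your MD changed his mind: jump straight back to the first draft.
deck = history.checkout(v0)
print(deck["slide1"])  # Very Important Box at x=100
```

Nothing was overwritten: version `v1` is still sitting in `history` if the MD changes his mind again.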
Version control is really popular in software engineering, but not in business contexts: that’s because all of those problems we talked about are way worse when you’re developing an application. Modern applications can have tens of thousands of lines of code (often way more), all dependent on each other, and being worked on by multiple people. Imagine building a presentation with 10,000 slides and 30 other analysts. Actually, don’t. I care about you.
This is all a little abstract: let’s dive into Git, the most popular actual version control software, and see how it works in practice.
Git is the piece of software that developers actually use to version control their code. It was originally released in 2005 by Linus Torvalds (the same dude who built Linux), has basically become the default since then, and is available for every major operating system.
Git is software, but it’s not like Excel: it’s command line software, meaning there’s nothing for you to click on or drag. Git runs through commands in your Terminal. There are companies that provide a GUI (graphical user interface) for interacting with Git, but they’re not the standard.
The three major pillars of Git are repositories, branches, and commits.
A repository (repo) is like a project: it’s just a place to put all of the code relevant to an application. You can store folders and files inside of it. Each repo has its own separate history and Git setup, so changes in one don’t show up in another.
Sometimes, you’ll want to set up a remote (i.e. not on your laptop) repo so multiple people can work with it: we’ll cover this more when we talk about Github later.
A branch is a specific version of your repository. The master branch (these days often called main) is the main one that you’re working off of: if it’s just you working on a project, you might make all of your changes directly to master.
Sometimes, though, if you want to build an entirely new feature or something big, you can create a new branch off of master (essentially a copy of it) and save your changes onto that new branch. Then, when you’re done, you can merge that new branch back into master.
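Under the hood, a Git branch is just a lightweight pointer to a commit, so “copying” master doesn’t actually duplicate any files. Here’s a toy Python sketch of that idea, assuming a made-up `Repo` class: each commit gets a short id hashed from its contents (Git also names commits by hashes, though its real object format is different), a branch is just a name pointing at its latest commit, and merging is shown only for the simplest “fast-forward” case, where master hasn’t changed since the new branch was created.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # id of the previous commit (None for the first one)
    message: str
    files: tuple  # sorted (name, contents) pairs: a snapshot of the repo

    @property
    def id(self) -> str:
        # Toy content-based id: hash the parent, message, and files together.
        raw = f"{self.parent}|{self.message}|{self.files}".encode()
        return hashlib.sha1(raw).hexdigest()[:7]


class Repo:
    def __init__(self):
        self.commits = {}                 # commit id -> Commit
        self.branches = {"master": None}  # branch name -> id of its latest commit

    def commit(self, branch, files, message):
        c = Commit(self.branches[branch], message, tuple(sorted(files.items())))
        self.commits[c.id] = c
        self.branches[branch] = c.id  # the branch pointer moves forward
        return c.id

    def create_branch(self, name, from_branch="master"):
        # A new branch just points at the same commit: no files are copied.
        self.branches[name] = self.branches[from_branch]

    def ancestors(self, commit_id):
        # Walk parent links from a commit back to the very first one.
        while commit_id is not None:
            yield commit_id
            commit_id = self.commits[commit_id].parent

    def merge(self, source, target="master"):
        # Fast-forward only: if target's tip is an ancestor of source's tip,
        # target can simply jump ahead. Real Git can also merge diverged work.
        target_tip = self.branches[target]
        source_history = set(self.ancestors(self.branches[source]))
        if target_tip is not None and target_tip not in source_history:
            raise ValueError(f"{target} has diverged; this toy can't merge it")
        self.branches[target] = self.branches[source]


repo = Repo()
repo.commit("master", {"app.py": "v1"}, "Initial commit")
repo.create_branch("new-feature")
repo.commit("new-feature", {"app.py": "v1 + feature"}, "Add feature")
print(repo.branches["master"] == repo.branches["new-feature"])  # False

repo.merge("new-feature")
print(repo.branches["master"] == repo.branches["new-feature"])  # True
```

Because branches are just pointers, creating one is nearly free, which is why engineers make them for almost every piece of work.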
When you make and save changes to your code, you can batch any number of them into a commit and include a little message (these should be funny, ideally).
Once you make a commit, you can push that commit (or a group of commits) to the shared copy of your branch, which makes it the most recent version everyone else sees. You can also pull to get the branch’s most recent version (if someone else on your team made changes).
Probably the most powerful part of commits is that you can revert them: you can go back to any commit on any branch at any point in time. Try doing that in PowerPoint.
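One detail worth knowing: Git’s `git revert` command doesn’t erase the bad commit; it adds a new commit that undoes it, so the history stays intact. (Jumping back to look at an old version is a separate operation.) Here’s a toy Python sketch of that behavior, assuming a hypothetical `Branch` class where every commit records a full snapshot of the files. It ignores conflicts and everything else a real revert has to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    # Each entry is (message, snapshot), where snapshot maps file -> contents.
    history: list = field(default_factory=list)

    @property
    def head(self) -> dict:
        """A copy of the files as of the latest commit."""
        return dict(self.history[-1][1]) if self.history else {}

    def commit(self, changes: dict, message: str) -> int:
        snapshot = self.head
        snapshot.update(changes)
        self.history.append((message, snapshot))
        return len(self.history) - 1

    def revert(self, index: int) -> int:
        """Add a NEW commit that undoes whatever commit `index` changed."""
        before = self.history[index - 1][1] if index > 0 else {}
        after = self.history[index][1]
        snapshot = self.head
        for name, contents in after.items():
            if before.get(name) == contents:
                continue  # commit `index` didn't touch this file
            if name in before:
                snapshot[name] = before[name]  # restore the old contents
            else:
                snapshot.pop(name, None)  # that commit added the file, so drop it
        self.history.append((f"Revert commit {index}", snapshot))
        return len(self.history) - 1


branch = Branch()
branch.commit({"app.py": "works"}, "Initial commit")
bad = branch.commit({"config.txt": "timeout=0"}, "Tweak config")  # breaks things
branch.commit({"README": "hello"}, "Add README")

branch.revert(bad)
print(branch.head)          # {'app.py': 'works', 'README': 'hello'}
print(len(branch.history))  # 4: the bad commit is still in the history
```

Note that the later “Add README” commit survives the revert: only the changes from the bad commit get undone.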
Git is one of those things that’s really hard to explain without giving some technical detail, and that detail can get kind of boring. It’s a lot to remember, so don’t sweat it: just know that if you hear any of these terms, your engineers are probably talking about version control.
There’s a lot of other stuff you can do in Git, but these are the basics.
How does Github fit into this? Github provides a hosted repository service (among other things) that lets you put your code in the cloud, which makes collaboration with other team members much, much easier. This adds a bit of complexity to how Git works, because now there’s an extra concept to deal with: the remote branch.
Remote here means not on your laptop (i.e. in the cloud), not to be confused with a TV remote. When your repository is hosted in the cloud, your working paradigm changes: you need to download the repository to your computer, make changes there, and then upload your changes back to the cloud.
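Here’s a toy Python sketch of that download, change, upload loop, with two teammates sharing one remote. `Remote` and `LocalClone` are made-up names, commits are just messages, and `pull` is simplified: it replays your unpushed commits on top of the remote’s (similar in spirit to `git pull --rebase`), whereas real Git may merge instead. Like real Git, the toy refuses a push when the remote has commits you haven’t pulled yet, rather than overwriting a teammate’s work.

```python
class Remote:
    """Stands in for the hosted copy of a repo (e.g. on Github)."""

    def __init__(self):
        self.commits = []  # commit messages, oldest first


class LocalClone:
    def __init__(self, remote):
        self.remote = remote
        self.commits = list(remote.commits)  # "download" a full copy

    def commit(self, message):
        self.commits.append(message)  # only changes the copy on your laptop

    def push(self):
        # Refuse if the remote has commits we haven't pulled yet, instead of
        # silently overwriting a teammate's work.
        if self.commits[:len(self.remote.commits)] != self.remote.commits:
            raise RuntimeError("remote has new commits: pull first")
        self.remote.commits = list(self.commits)  # "upload" our commits

    def pull(self):
        # Find how much history we share with the remote...
        shared = 0
        limit = min(len(self.commits), len(self.remote.commits))
        while shared < limit and self.commits[shared] == self.remote.commits[shared]:
            shared += 1
        # ...then take the remote's commits and replay our unpushed ones on top.
        self.commits = list(self.remote.commits) + self.commits[shared:]


github = Remote()
alice, bob = LocalClone(github), LocalClone(github)

alice.commit("Add login page")
alice.push()

bob.commit("Fix typo")
try:
    bob.push()
except RuntimeError as err:
    print(err)  # remote has new commits: pull first

bob.pull()
bob.push()
print(github.commits)  # ['Add login page', 'Fix typo']
```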
If you wanted to do this yourself, you’d need to set up your own server, network it to your team’s computers, and deal with keeping it going. Github does all of that stuff for you, which is clutch. Github also does a lot of other useful shit: one example is that you can create profiles (here’s mine) and share your code with the world.
Now if you want to be really cool with the engineers, you’ll want to know about a pull request. A pull request (PR for short) is a Github feature that you’d use when you want to merge your branch: it lets you describe what you changed, ask teammates to review it, and discuss it before it gets merged.
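A pull request is a workflow rather than a Git command, so here’s a toy Python model of one: a hypothetical `PullRequest` class that tracks which branch you want to merge, who was asked to review it, and who approved. The “at least one approval before merging” rule is an assumption; Github lets each team decide whether to require approvals, and how many.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    source: str              # the branch with your changes
    target: str = "master"   # the branch you want to merge into
    description: str = ""
    reviewers: set = field(default_factory=set)
    approvals: set = field(default_factory=set)
    merged: bool = False

    def request_review(self, person):
        self.reviewers.add(person)

    def approve(self, person):
        if person not in self.reviewers:
            raise ValueError(f"{person} wasn't asked to review this PR")
        self.approvals.add(person)

    def merge(self, required_approvals=1):
        # Assumed team rule: block the merge until enough reviewers approve.
        if len(self.approvals) < required_approvals:
            raise RuntimeError("needs more approvals before merging")
        self.merged = True


pr = PullRequest("dark-mode", description="Adds a dark mode toggle")
pr.request_review("sam")
try:
    pr.merge()
except RuntimeError as err:
    print(err)  # needs more approvals before merging

pr.approve("sam")
pr.merge()
print(pr.merged)  # True
```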
Keep in mind that Github isn’t the only company that offers a service like this (remotes); it’s just the most popular. Gitlab and Bitbucket are two other fairly successful ones.
“Just commit your changes and make a pull request”
Save your code changes, make a commit and an associated message, and then open a pull request so the rest of the team can see and approve / reject those changes.
“We had to revert back to the last commit because of a breaking change”
The last change we made to our code ended up breaking something, so we went back to the previous version.
“Check out the repo on Github”
If you want to see the code, check out the repository it’s hosted in on Github. This statement will often be accompanied by a link.
“You can see some more projects on my Github”
I’ve put some of my code on Github and made it public so anyone can see it.
If you read this and thought “gee, Git seems complicated,” you’re right: it’s a huge meme in the developer world that Git is difficult to use and understand, especially for beginners.
Git is built to version code, but isn’t very good at versioning data: open source alternatives focused on data science, like DVC, have been appearing.
Git money, Git paid: Github has more than 35M users and 100M repositories, and was acquired by Microsoft in 2018 for $7.5B.