Analyzing a Codebase Over Time
At Logikcull, we’ve been in the midst of a React migration for over 4 years now (starting soon after I joined in 2019). I wanted to share a cool idea I had for visualizing our progress over time.
It’s great when companies store metrics in charting tools like Grafana or Datadog, where they can easily be referenced later. We didn’t do anything like this for the progress of the React vs. Backbone migration. But in this case, we could just leverage git to check out the codebase at different points in time!
I wanted a way to count the lines of legacy (Backbone.js) code versus React code. These are conveniently stored in 2 separate directories, which makes this an easy comparison. The open source tool cloc helped here! As the name implies, it specializes in counting lines of code 🙂.
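For reference, cloc’s `--csv` report is a header row (`files,language,blank,comment,code,...`) followed by one row per language and, typically, a `SUM` row. Reading only the last line, as the script below does, works as long as that layout holds. Here’s a slightly sturdier sketch, assuming that header layout: `parseClocCodeTotal` is a hypothetical helper (not part of cloc) that finds the `code` column by name and prefers the `SUM` row, falling back to adding up the per-language rows. The sample numbers are made up.

```javascript
// Hypothetical helper: extract the total "code" count from `cloc --csv` output.
// Assumes a header row starting with "files,language" that contains a "code"
// column. Uses the SUM row if present, otherwise sums the per-language rows.
function parseClocCodeTotal(csvOutput) {
  const lines = csvOutput
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
  // Skip any progress lines cloc may print before the CSV header
  const headerIndex = lines.findIndex((line) => line.startsWith("files,language"));
  if (headerIndex === -1) return 0; // no report, so nothing was counted
  // Only the first few header fields matter; the quoted cloc version field
  // at the end may contain commas, but it comes after "code".
  const codeColumn = lines[headerIndex].split(",").indexOf("code");
  const rows = lines.slice(headerIndex + 1).map((line) => line.split(","));
  const sumRow = rows.find((cols) => cols[1] === "SUM");
  if (sumRow) return Number(sumRow[codeColumn]);
  return rows.reduce((total, cols) => total + Number(cols[codeColumn]), 0);
}

// Made-up sample report
const sample = [
  'files,language,blank,comment,code,"github.com/AlDanial/cloc v 1.96"',
  "10,JavaScript,120,40,900",
  "4,TypeScript,30,10,250",
  "14,SUM,150,50,1150",
].join("\n");
console.log(parseClocCodeTotal(sample)); // 1150
```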
Then, the idea is to check out the git repository at different points in time and run cloc at each interval. For my comparison, I wanted to look at 4 years of history, with weekly check-ins (208 weeks total). You can write this in whatever language you feel most comfortable in (or whichever one you ask ChatGPT for). Here’s what the Node script I wrote looks like:
```javascript
const { execSync } = require("child_process");
const fs = require("fs");
const path = require("path");

const filePath = "./local/data.csv";
const mainBranchName = "master"; // or "main", depending on your repo

// Expects a very specific structure from cloc's --csv output: the last
// non-empty line holds the totals, and its last column is the code count
function getSumOfLinesOfCode(output) {
  return output.split("\n").filter(Boolean).at(-1).split(",").at(-1);
}

// Formats a JavaScript Date object as month/day/year (in UTC)
function formatDate(date) {
  const month = date.getUTCMonth() + 1;
  const day = date.getUTCDate();
  const year = date.getUTCFullYear();
  return [month, day, year].join("/");
}

function main() {
  const weekInMs = 1000 * 60 * 60 * 24 * 7;
  const numWeeks = 208; // number of weeks to go back in time and analyze (~4 years)
  const now = Date.now();

  // execSync is synchronous, so none of these calls need `await`.
  // Check out the main branch first so we start from a known state.
  execSync(`git checkout ${mainBranchName}`);
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, "");

  try {
    for (let i = numWeeks; i >= 0; i--) {
      const date = new Date(now - i * weekInMs);
      console.log(date.toDateString());
      // Check out the latest commit on the main branch from before that point in time
      execSync(
        `git checkout \`git rev-list -1 --before="${i} weeks ago" ${mainBranchName}\``
      );
      // Legacy Backbone code (including EJS templates)
      const output1 = execSync(
        "cloc app/javascript/legacy --include-lang=Typescript,Javascript,EJS --csv"
      );
      // React code, skipping auto-generated files
      const output2 = execSync(
        "cloc app/javascript/src --include-lang=Typescript,Javascript --exclude-dir=generated --csv"
      );
      const numLinesLegacy = getSumOfLinesOfCode(output1.toString());
      const numLinesSrc = getSumOfLinesOfCode(output2.toString());
      fs.appendFileSync(
        filePath,
        `${formatDate(date)},${numLinesLegacy},${numLinesSrc}\n`
      );
      console.log({ numLinesLegacy, numLinesSrc });
    }
  } finally {
    // Always return to the main branch, even if a step above throws
    execSync(`git checkout ${mainBranchName}`);
  }
}

main();
```
In summary, the script does the following:
- Checks out the repo at weekly intervals in the past, using the `--before` flag of `git rev-list`
- Runs `cloc` on the codebase and outputs the relevant results to a CSV (we needed to exclude certain directories of auto-generated code, which could skew the results)
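One wrinkle worth knowing about: the script labels each row with a date computed from `now`, while git resolves `"${i} weeks ago"` against its own clock at the moment each command runs, so the cutoff can drift by however long the run takes. A minimal sketch of one way to keep them in lockstep is to derive both the CSV label and the `--before` cutoff from the same instant. This assumes git accepts a `YYYY-MM-DD HH:MM:SS +0000` date for `--before` (one of the ISO 8601-style forms its date parser reads); `weeklySnapshots`, `toGitDate`, and `revListCommand` are hypothetical names, not part of the original script.

```javascript
// Hypothetical helpers (not part of the original script).
const WEEK_IN_MS = 7 * 24 * 60 * 60 * 1000;

// Formats a Date as "YYYY-MM-DD HH:MM:SS +0000" in UTC, for git's --before flag
function toGitDate(date) {
  return date.toISOString().slice(0, 19).replace("T", " ") + " +0000";
}

// One sampling point per week, oldest first. Each point carries the CSV label
// (month/day/year) and the matching git cutoff, both derived from one instant.
function weeklySnapshots(nowMs, numWeeks) {
  const snapshots = [];
  for (let i = numWeeks; i >= 0; i--) {
    const date = new Date(nowMs - i * WEEK_IN_MS);
    snapshots.push({
      label: `${date.getUTCMonth() + 1}/${date.getUTCDate()}/${date.getUTCFullYear()}`,
      before: toGitDate(date),
    });
  }
  return snapshots;
}

// Command that resolves (roughly) the newest commit on `branch` dated before `before`
function revListCommand(branch, before) {
  return `git rev-list -1 --before="${before}" ${branch}`;
}

const points = weeklySnapshots(Date.UTC(2023, 0, 29), 2);
console.log(points.map((p) => p.label).join(", ")); // 1/15/2023, 1/22/2023, 1/29/2023
console.log(revListCommand("master", points[0].before));
// git rev-list -1 --before="2023-01-15 00:00:00 +0000" master
```

In the main loop, you would check out the commit printed by `revListCommand(mainBranchName, point.before)` and write `point.label` to the CSV instead of recomputing the date.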
We can then visualize the progress by creating a chart from the CSV!
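A spreadsheet can plot the CSV as-is. If you’d rather chart the React share of the code than raw line counts, here’s a small sketch that parses rows in the `date,legacy,react` shape the script writes (no header row) into chart-ready points. `toChartPoints` is a hypothetical helper and the sample rows are made up.

```javascript
// Hypothetical helper: parse "date,legacyLines,reactLines" rows (no header)
// into objects, adding the fraction of counted code that is already React.
function toChartPoints(csvText) {
  return csvText
    .split("\n")
    .filter(Boolean)
    .map((line) => {
      const [date, legacy, react] = line.split(",");
      const legacyLines = Number(legacy);
      const reactLines = Number(react);
      const total = legacyLines + reactLines;
      return {
        date,
        legacyLines,
        reactLines,
        reactShare: total === 0 ? 0 : reactLines / total, // avoid dividing by zero
      };
    });
}

// Made-up rows
const points = toChartPoints("1/15/2023,30000,10000\n1/22/2023,28000,12000\n");
for (const p of points) {
  console.log(`${p.date}: ${(p.reactShare * 100).toFixed(1)}% React`);
}
// 1/15/2023: 25.0% React
// 1/22/2023: 30.0% React
```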
This helps us keep track of our projected rate of migrating off of Backbone (you can see it’s slowed down a bit over the past 6 months). The same approach could be applied to a bunch of different things to retroactively analyze a codebase! Another idea I had was to analyze our JavaScript bundle size over time, since we don’t store historical results for that.
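One way to turn the CSV into a projected rate is to fit a straight line to the most recent weeks of legacy line counts and extrapolate to zero. This is a minimal sketch that assumes the pace is roughly linear within the chosen window (a slowdown like the one mentioned above suggests it isn’t over the whole history). `projectWeeksUntilZero` is a hypothetical helper using ordinary least squares, and the sample counts are made up; comparing a 26-week window (about 6 months) against a longer one is a quick way to put a number on a slowdown.

```javascript
// Hypothetical helper: fit a line (ordinary least squares) to the last
// `windowSize` weekly legacy line counts and extrapolate to zero.
// Returns estimated weeks remaining, or Infinity if legacy code isn't shrinking.
function projectWeeksUntilZero(legacyCounts, windowSize) {
  const ys = legacyCounts.slice(-windowSize);
  const n = ys.length;
  if (n < 2) return Infinity; // not enough points to fit a slope
  const meanX = (n - 1) / 2; // x values are week offsets 0, 1, ..., n - 1
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  ys.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  const slope = num / den; // change in legacy lines per week
  if (slope >= 0) return Infinity;
  // Value of the fitted line at the most recent week
  const fittedLatest = meanY + slope * (n - 1 - meanX);
  return fittedLatest / -slope;
}

// Made-up counts shrinking by exactly 500 lines per week
console.log(projectWeeksUntilZero([12000, 11500, 11000, 10500, 10000], 5)); // 20
```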