Building a Custom JavaScript Bundler from Scratch
Tools like Webpack, Rollup, and Vite are often treated as black boxes. We feed them modern JavaScript, Sass, and TypeScript, and they magically spit out optimised, browser-compatible bundles. Treating our build tools as incomprehensible magic limits our ability to debug complex issues, optimise build performance, or configure specific architectural patterns. Understanding how a bundler works under the hood—parsing code into Abstract Syntax Trees (ASTs), resolving module dependencies, and generating a cohesive dependency graph—is a rite of passage for senior engineers.
Deep Dive
The Anatomy of a Bundler
At its core, a JavaScript bundler performs three primary tasks:
- Dependency Resolution: Starting from an entry point (like index.js), it identifies all the files your code imports.
- Parsing & Analysis: It reads these files and converts the raw text into a data structure the computer can understand (an AST) to extract dependencies reliably.
- Packing: It combines all these disparate files into a single bundle that can run in the browser, providing a runtime system to handle require or import calls.
We will tackle these phases sequentially, implementing a custom bundler we'll call "ToyPack". To do this, we will use the Babel ecosystem, specifically @babel/parser for reading code, @babel/traverse for inspecting it, and @babel/core for transpiling it.
Phase 1: Understanding the Abstract Syntax Tree (AST)
Before we write code, we must understand the data structure that powers almost all code transformation tools (including Prettier, ESLint, and TypeScript): the Abstract Syntax Tree.
When you read code, you see strings of text. When a computer reads code, it needs a tree structure that represents the syntactic relationship between elements.
Consider this simple import statement:
import { add } from './math.js';
If we tried to parse this using Regular Expressions (Regex), we would eventually fail. What if the import is commented out? What if it's inside a string? What if it's dynamic? An AST avoids these pitfalls by breaking the code down into Nodes.
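Here is a concrete failure. The scanner below is deliberately naive; the function naiveFindImports and its pattern are hypothetical and written only for illustration. It runs against a snippet with one real import, one commented-out import, and one import-like string, and it reports all three as dependencies:

```javascript
// A deliberately naive dependency scanner (illustration only, not part of ToyPack):
// it treats any text that looks like an import statement as a real import.
const NAIVE_IMPORT = /import\s+.*?\s+from\s+['"](.+?)['"]/g;

function naiveFindImports(source) {
  // matchAll yields one match per occurrence; group 1 is the quoted path
  return [...source.matchAll(NAIVE_IMPORT)].map(match => match[1]);
}

const source = `
import { add } from './math.js';
// import { legacy } from './old-math.js';
const help = "Write: import { x } from './fake.js'";
`;

console.log(naiveFindImports(source));
// [ './math.js', './old-math.js', './fake.js' ]
```

Only './math.js' is a real dependency. The regex has no notion of comments or string boundaries, and that is exactly what a parser provides.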
Using a tool like AST Explorer, we can see that import { add } from './math.js'; translates roughly to this JSON structure (position fields such as start, end, and loc are omitted):
{
  "type": "ImportDeclaration",
  "source": {
    "type": "StringLiteral",
    "value": "./math.js"
  },
  "specifiers": [
    {
      "type": "ImportSpecifier",
      "imported": { "type": "Identifier", "name": "add" },
      "local": { "type": "Identifier", "name": "add" }
    }
  ]
}
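Why this matters: Our bundler needs to find every file your project depends on. Instead of "Ctrl+F" searching for the word import, we will parse the code into an AST, walk through the tree, and look specifically for nodes of type ImportDeclaration. A commented-out import never becomes an ImportDeclaration node, and import-like text inside a string is just the value of a StringLiteral node, so neither can be mistaken for a real dependency. Dynamic import() calls are represented by a different node type, so the visitor we write below only follows static imports.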
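It helps to demystify what @babel/traverse does for us. Below is a minimal hand-rolled walker over a hand-built AST in the shape shown above. The names walk and visitors are illustrative assumptions, and the AST is simplified. The real @babel/traverse does much more: it tracks scope, supports node replacement, and passes each visitor a NodePath rather than the raw node, which is why createAsset destructures ({ node }).

```javascript
// A hand-built AST fragment in the same shape as the JSON above
// (a real Babel AST also carries location data and a File wrapper).
const ast = {
  type: 'Program',
  body: [
    {
      type: 'ImportDeclaration',
      source: { type: 'StringLiteral', value: './math.js' },
      specifiers: [
        {
          type: 'ImportSpecifier',
          imported: { type: 'Identifier', name: 'add' },
          local: { type: 'Identifier', name: 'add' },
        },
      ],
    },
    {
      // Import-like text inside a string is only a StringLiteral value
      type: 'ExpressionStatement',
      expression: { type: 'StringLiteral', value: "import { x } from './fake.js'" },
    },
  ],
};

// Minimal depth-first walker: call the visitor registered for a node's type,
// then recurse into every property (arrays of child nodes included).
function walk(node, visitors) {
  if (Array.isArray(node)) {
    node.forEach(child => walk(child, visitors));
    return;
  }
  if (node === null || typeof node !== 'object') return;
  if (typeof node.type === 'string' && visitors[node.type]) {
    visitors[node.type](node);
  }
  for (const value of Object.values(node)) {
    walk(value, visitors);
  }
}

const dependencies = [];
walk(ast, {
  ImportDeclaration: node => dependencies.push(node.source.value),
});
console.log(dependencies); // [ './math.js' ]
```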
Phase 2: The Asset Creator
The first step in our implementation is to create a function that takes the path to a file, reads it, and extracts its dependencies. We call this unit of information an Asset.
We will use:
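- fs: To read files from the file system.
- path: To resolve the paths of imported files (used when building the graph).
- @babel/parser: To generate the AST.
- @babel/traverse: To walk the AST and collect dependencies.
- @babel/core (with @babel/preset-env): To transpile the code from ES Modules (which browsers support but are tricky to bundle simply) to CommonJS (which is easier for our custom shim to handle).
The Babel packages can be installed with npm install @babel/parser @babel/traverse @babel/core @babel/preset-env.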
Implementation: createAsset
const fs = require('fs');
const path = require('path');
const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default;
// transformFromAstSync is Babel 7's explicitly synchronous API; the older
// transformFromAst only behaves synchronously (when no callback is passed)
// for backward compatibility.
const { transformFromAstSync } = require('@babel/core');
let ID = 0;
function createAsset(filename) {
  // 1. Read the content of the file as a string
  const content = fs.readFileSync(filename, 'utf-8');
  // 2. Parse the content into an AST
  // We specify 'sourceType: module' because we are using ES Modules (import/export)
  const ast = parser.parse(content, {
    sourceType: 'module',
  });
  // 3. Array to hold the relative paths of this file's dependencies
  const dependencies = [];
  // 4. Traverse the AST to find import declarations
  traverse(ast, {
    ImportDeclaration: ({ node }) => {
      // The value of the source is the path being imported (e.g., './math.js')
      dependencies.push(node.source.value);
    },
  });
  // 5. Transpile the code to CommonJS so it runs in standard environments
  // This converts 'import x from y' into 'require(y)'
  const { code } = transformFromAstSync(ast, null, {
    presets: ['@babel/preset-env'],
  });
  // 6. Return the Asset object
  return {
    id: ID++,
    filename,
    dependencies,
    code,
  };
}
Key Concept - The Unique ID: Notice let ID = 0. Every file in our bundle gets a unique integer ID. This helps us separate the concept of the file (ID 1) from its filepath (./utils/math.js), which simplifies the runtime logic later.
Phase 3: Building the Dependency Graph
Now that we can analyse a single file, we need to stitch them together. We start at the Entry Point (usually src/index.js) and recursively find all dependencies.
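However, naive recursion can overflow the call stack on very deep import chains, and it never terminates on circular dependencies unless it remembers which files it has already visited. A common alternative is to process files iteratively with a queue (breadth-first search). This approach still tracks visited files, so a file imported from several places, or a cycle such as A -> B -> A, is parsed only once.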
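The queue idea can be tried out without Babel. The sketch below runs it over a hypothetical in-memory project in which each file simply lists the files it imports. The files table and buildGraph are illustrative stand-ins for createAsset and createGraph, not part of ToyPack. The seen map is what lets the loop terminate on the a.js <-> b.js cycle; without it, the two files would be re-enqueued forever.

```javascript
// Hypothetical in-memory project: each file lists the paths it imports.
// a.js and b.js import each other, forming a cycle.
const files = {
  'index.js': ['a.js', 'b.js'],
  'a.js': ['b.js'],
  'b.js': ['a.js'],
};

function buildGraph(entry) {
  let nextId = 0;
  const seen = new Map(); // filename -> module record
  const makeModule = filename => {
    const mod = { id: nextId++, filename, mapping: {} };
    seen.set(filename, mod);
    return mod;
  };

  const queue = [makeModule(entry)];
  // for...of over an array also visits elements pushed during the loop,
  // so the array doubles as a FIFO queue.
  for (const mod of queue) {
    for (const dep of files[mod.filename]) {
      let child = seen.get(dep);
      if (!child) {
        child = makeModule(dep);
        queue.push(child); // only unseen files are enqueued, so cycles terminate
      }
      mod.mapping[dep] = child.id;
    }
  }
  return queue;
}

const graph = buildGraph('index.js');
graph.forEach(m => console.log(m.id, m.filename, JSON.stringify(m.mapping)));
```

For this table, the loop assigns IDs in breadth-first order: index.js gets 0, a.js gets 1, and b.js gets 2. Each module's mapping points at those shared IDs, so a.js and b.js reference each other without being parsed twice.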
Implementation: createGraph
function createGraph(entry) {
  // 1. Parse the entry file (it receives ID 0)
  const mainAsset = createAsset(entry);
  // 2. The queue initially contains just the entry asset
  const queue = [mainAsset];
  // Every file already turned into an asset, keyed by absolute path,
  // so shared and circular imports are parsed only once
  const seen = new Map([[path.resolve(entry), mainAsset]]);
  // 3. Iterate over the queue. Note: The queue grows as we push new dependencies into it!
  for (const asset of queue) {
    asset.mapping = {}; // This will map relative paths to unique IDs
    // Get the directory of the current module to resolve relative paths
    const dirname = path.dirname(asset.filename);
    asset.dependencies.forEach(relativePath => {
      // Resolve the absolute path of the dependency
      // e.g., if asset is '/project/src/index.js' and imports './message.js',
      // absolutePath becomes '/project/src/message.js'
      const absolutePath = path.resolve(dirname, relativePath);
      // 4. Create a child asset, unless this file has been seen before
      let child = seen.get(absolutePath);
      if (!child) {
        child = createAsset(absolutePath);
        seen.set(absolutePath, child);
        // 5. Add the new child to the queue so its dependencies are processed too
        queue.push(child);
      }
      // 6. Create the map: './message.js' -> ID 1
      asset.mapping[relativePath] = child.id;
    });
  }
  // The queue is now an array containing every module in the application
  return queue;
}
What just happened?
We linearised our application. Instead of a tree structure, we now have a flat array of objects (the queue), where each object knows which files it imports and what ID corresponds to that import. This mapping property is the glue that holds the graph together.
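Why does each module carry its own mapping instead of sharing one global path-to-ID table? Because an import specifier is relative to the file that contains it. The sketch below uses hypothetical file paths and resolves the same './utils.js' specifier from two different importers:

```javascript
const path = require('path');

// Two hypothetical modules that both contain: import ... from './utils.js'
const importers = ['/project/src/index.js', '/project/src/lib/parser.js'];

// path.posix keeps the result identical on Windows and POSIX systems
const resolved = importers.map(importer =>
  path.posix.resolve(path.posix.dirname(importer), './utils.js')
);

importers.forEach((importer, i) => {
  console.log(`${importer}: './utils.js' -> ${resolved[i]}`);
});
```

The two importers end up pointing at different files (/project/src/utils.js and /project/src/lib/utils.js). The key './utils.js' can therefore only be translated to an ID in the context of the module that wrote it.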
Phase 4: Code Generation (The Runtime Shim)
We have a graph of code, but browsers don't know how to handle require('./file.js') or exports. We need to inject a runtime shim—a tiny piece of code that teaches the browser how to load our modules.
This is the most intimidating part of looking at a Webpack bundle, but at its core it is just an immediately invoked function expression (IIFE).
The Architecture of the Shim
Our bundle will look like this:
(function(modules) {
  // The module cache (omitted in ToyPack for simplicity)
  // The require function
})(/* The Module Map */);
We need to generate a string that looks exactly like that.
Implementation: bundle
function bundle(graph) {
  let modules = '';
  // 1. Construct the Module Map string
  // Format: id: [ function(require, module, exports) { code }, { mapping } ]
  graph.forEach(mod => {
    modules += `${mod.id}: [
      function (require, module, exports) {
        ${mod.code}
      },
      ${JSON.stringify(mod.mapping)},
    ],`;
  });
  // 2. The IIFE Wrapper
  const result = `
    (function(modules) {
      // No module cache: every require() call re-runs the module's code.
      // A real bundler caches module.exports by ID, which avoids duplicate
      // execution and stops circular imports from recursing forever.
      function require(id) {
        const [fn, mapping] = modules[id];
        // The 'require' function that the module will call
        function localRequire(name) {
          // Look up the ID based on the relative path
          return require(mapping[name]);
        }
        const module = { exports: {} };
        // Execute the module code
        // We pass our custom 'localRequire', the 'module' object, and 'exports'
        fn(localRequire, module, module.exports);
        return module.exports;
      }
      // Start the application by requiring the entry point (ID 0)
      require(0);
    })({${modules}})
  `;
  return result;
}
Breaking Down the Magic
Let's dissect the bundle function's output:
- Scope Isolation: By wrapping each module's code in a function (require, module, exports) { ... }, we ensure variables defined in one file don't leak into the global scope or other files. This replicates the Node.js module scope behaviour.
- localRequire: Inside the transpiled index.js, we might say require('./message.js'). Our modules object uses numeric IDs, not paths. The mapping object we built in the createGraph phase bridges this gap: localRequire looks up ./message.js in the mapping, finds it is ID 1, and calls the bundle's top-level require(1).
- Recursive Execution: require(0) kicks off the chain. ID 0 runs and calls require(1), which runs and returns its exports, which ID 0 then uses.
Testing the Bundler
To verify this works, imagine a simple project structure:
src/message.js
export const message = "Hello from the custom bundler!";
src/index.js
import { message } from './message.js';
console.log(message);
build.js (Our Tool)
// Assumes createAsset, createGraph, and bundle from the previous sections live in this file
const graph = createGraph('./src/index.js');
const result = bundle(graph);
console.log(result);
Running node build.js will output a string. If you copy-paste that string into a browser console, you will see "Hello from the custom bundler!" logged. You have successfully built a JavaScript bundler.
Scalability and Performance Considerations
While our "ToyPack" works for simple cases, modern bundlers solve significantly harder problems:
- Circular Dependencies: Our runtime handles acyclic graphs, but circular references (A -> B -> A) need a cache inside the require function. The cache returns the module's current, possibly incomplete, exports object instead of re-running the module; without one, our shim recurses until the call stack overflows.
- Tree Shaking: We included the entire math.js file even if we only imported one function. Production bundlers analyse which exports are actually imported and drop the unused ones (a form of dead code elimination).
- Non-JS Assets: Webpack uses "Loaders" to transform CSS, images, or TypeScript into JavaScript modules. This involves extending the createAsset function to handle different file extensions.
- Caching and Speed: Parsing ASTs is slow. Modern tools use persistent (filesystem) caches and parsers or transformers written in faster languages, such as esbuild (Go) and SWC (Rust). Vite, for example, pre-bundles dependencies with esbuild and caches the result on disk. Combined with serving native ES modules during development, this keeps its HMR (Hot Module Replacement) updates fast.
Resources
- AST Explorer - Visualize Abstract Syntax Trees instantly.
- Babel Parser Documentation - Detailed API for the parser used in this tutorial.
- Minipack Repository - The inspiration for many educational bundlers.
- Webpack Internals - Learn how professional loaders work.