Building a Custom JavaScript Bundler from Scratch

Tools like Webpack, Rollup, and Vite are often treated as black boxes. We feed them modern JavaScript, Sass, and TypeScript, and they magically spit out optimised, browser-compatible bundles. Treating our build tools as incomprehensible magic limits our ability to debug complex issues, optimise build performance, or configure specific architectural patterns. Understanding how a bundler works under the hood—parsing code into Abstract Syntax Trees (ASTs), resolving module dependencies, and generating a cohesive dependency graph—is a rite of passage for senior engineers.

The Anatomy of a Bundler

At its core, a JavaScript bundler performs three primary tasks:

  1. Dependency Resolution: Starting from an entry point (like index.js), it identifies all the files your code imports.
  2. Parsing & Analysis: It reads these files and converts the raw text into a data structure the computer can understand (an AST) to extract dependencies reliably.
  3. Packing: It combines all these disparate files into a single bundle that can run in the browser, providing a runtime system to handle require or import calls.

We will tackle these phases sequentially, implementing a custom bundler we'll call "ToyPack". To do this, we will use the Babel ecosystem, specifically @babel/parser for reading code, @babel/traverse for inspecting it, and @babel/core for transpiling it.

Phase 1: Understanding the Abstract Syntax Tree (AST)

Before we write code, we must understand the data structure that powers almost all code transformation tools (including Prettier, ESLint, and TypeScript): the Abstract Syntax Tree.

When you read code, you see strings of text. When a computer reads code, it needs a tree structure that represents the syntactic relationship between elements.

Consider this simple import statement:

import { add } from './math.js';

If we tried to parse this using Regular Expressions (Regex), we would eventually fail. What if the import is commented out? What if it's inside a string? What if it's dynamic? An AST avoids these pitfalls by breaking the code down into Nodes.
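
To make the failure mode concrete, here is a minimal sketch of a regex-based scanner. The helper name findImportsWithRegex and the sample source are invented for illustration; the point is only that a pattern match cannot tell a real import from one inside a comment or a string.

```javascript
// Naive, regex-based import scanner (hypothetical helper, for illustration only).
function findImportsWithRegex(source) {
  const pattern = /import\s+[^'"]*from\s+['"]([^'"]+)['"]/g;
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

const source = [
  "import { add } from './math.js';",
  "// import { old } from './legacy.js';",
  "const hint = \"import x from './not-real.js'\";",
].join('\n');

// Only the first line is a real import, yet all three paths are reported.
console.log(findImportsWithRegex(source));
// [ './math.js', './legacy.js', './not-real.js' ]
```

The commented-out import and the string literal both produce false positives, which is exactly the kind of mistake an AST-based approach avoids.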

Using a tool like AST Explorer, we can see that the line above translates roughly to this JSON structure:

{
  "type": "ImportDeclaration",
  "source": {
    "type": "StringLiteral",
    "value": "./math.js"
  },
  "specifiers": [
    {
      "type": "ImportSpecifier",
      "imported": { "type": "Identifier", "name": "add" },
      "local": { "type": "Identifier", "name": "add" }
    }
  ]
}

Why this matters: Our bundler needs to find every file your project depends on. Instead of "Ctrl+F" searching for the word import, we will parse the code into an AST, walk through the tree, and look specifically for nodes of type ImportDeclaration. This is robust and accurate.
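
As a rough illustration of what "walking the tree" means, the sketch below visits a hand-written, simplified AST object and collects import sources. The walk function is a hypothetical stand-in for a real traverser such as @babel/traverse, and the AST is typed out by hand rather than produced by a parser.

```javascript
// Hand-written, simplified AST: one real import plus a string that merely
// looks like an import.
const ast = {
  type: 'Program',
  body: [
    {
      type: 'ImportDeclaration',
      source: { type: 'StringLiteral', value: './math.js' },
      specifiers: [
        {
          type: 'ImportSpecifier',
          imported: { type: 'Identifier', name: 'add' },
          local: { type: 'Identifier', name: 'add' },
        },
      ],
    },
    {
      type: 'ExpressionStatement',
      expression: { type: 'StringLiteral', value: "import x from './fake.js'" },
    },
  ],
};

// Hypothetical mini-visitor: call `visit` on every object that has a `type`.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') visit(node);
    Object.values(node).forEach((child) => walk(child, visit));
  }
}

const dependencies = [];
walk(ast, (node) => {
  if (node.type === 'ImportDeclaration') dependencies.push(node.source.value);
});

console.log(dependencies); // [ './math.js' ]
```

Because the check is on the node type rather than on the text, the string literal that happens to contain the word import is ignored.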

Phase 2: The Asset Creator

The first step in our implementation is to create a function that takes the path to a file, reads it, and extracts its dependencies. We call this unit of information an Asset.

We will use:

  • fs: To read files from the file system.
  • @babel/parser: To generate the AST.
  • @babel/traverse: To walk the AST and collect dependencies.
  • @babel/core: To transpile the code from ES Modules (which browsers support natively, but which are awkward to concatenate into a single file) to CommonJS (which is easier for our custom shim to handle). This step relies on the @babel/preset-env preset.

Implementation: createAsset

const fs = require('fs');
const path = require('path');
const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default;
const { transformFromAstSync } = require('@babel/core');

let ID = 0;

function createAsset(filename) {
  // 1. Read the content of the file as a string
  const content = fs.readFileSync(filename, 'utf-8');

  // 2. Parse the content into an AST
  // We specify 'sourceType: module' because we are using ES Modules (import/export)
  const ast = parser.parse(content, {
    sourceType: 'module',
  });

  // 3. Array to hold the relative paths of this file's dependencies
  const dependencies = [];

  // 4. Traverse the AST to find import declarations
  traverse(ast, {
    ImportDeclaration: ({ node }) => {
      // The value of the source is the path being imported (e.g., './math.js')
      dependencies.push(node.source.value);
    },
  });

  // 5. Transpile the code to CommonJS so it runs in standard environments
  // This converts 'import x from y' into 'require(y)'
  // (transformFromAstSync is the explicitly synchronous API in Babel 7;
  // calling transformFromAst without a callback is deprecated)
  const { code } = transformFromAstSync(ast, null, {
    presets: ['@babel/preset-env'],
  });

  // 6. Return the Asset object
  return {
    id: ID++,
    filename,
    dependencies,
    code,
  };
}

Key Concept - The Unique ID: Notice let ID = 0. Every file in our bundle gets a unique integer ID. This helps us separate the concept of the file (ID 1) from its filepath (./utils/math.js), which simplifies the runtime logic later.

Phase 3: Building the Dependency Graph

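Now that we can analyse a single file, we need to stitch them together. We start at the Entry Point (usually src/index.js) and follow its imports until every dependency has been found.

Naive recursion is one way to do this, but it risks stack overflows on deep trees. A simpler approach for bundlers is to walk the graph iteratively with a Queue (Breadth-First Search). Note that a queue by itself does not guard against circular imports or files imported from several places: each file should be turned into an asset only once.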

Implementation: createGraph

function createGraph(entry) {
  // 1. Parse the entry file (normalised so './src/index.js' and 'src/index.js' match)
  const mainAsset = createAsset(path.normalize(entry));

  // Track assets by filename so a file imported from several places
  // (or involved in a circular import) is parsed only once
  const assetsByFilename = new Map([[mainAsset.filename, mainAsset]]);

  // 2. The queue initially contains just the entry asset
  const queue = [mainAsset];

  // 3. Iterate over the queue. Note: The queue grows as we push new dependencies into it!
  for (const asset of queue) {
    asset.mapping = {}; // This will map relative paths to unique IDs

    // Get the directory of the current module to resolve relative paths
    const dirname = path.dirname(asset.filename);

    asset.dependencies.forEach(relativePath => {
      // Resolve the dependency's path relative to the importing file
      // e.g., if asset is 'src/index.js' and imports './message.js',
      // absolutePath becomes 'src/message.js'
      const absolutePath = path.join(dirname, relativePath);

      // 4. Reuse the asset if we have seen this file; otherwise create it
      // and add it to the queue so its dependencies are processed too
      let child = assetsByFilename.get(absolutePath);
      if (!child) {
        child = createAsset(absolutePath);
        assetsByFilename.set(absolutePath, child);
        queue.push(child);
      }

      // 5. Create the map: './message.js' -> ID 1
      asset.mapping[relativePath] = child.id;
    });
  }

  // The queue is now an array containing every module in the application
  return queue;
}

What just happened? We linearised our application. Instead of a tree structure, we now have a flat array of objects (the queue), where each object knows which files it imports and what ID corresponds to that import. This mapping property is the glue that holds the graph together.
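
The traversal itself can be tried without touching the file system or Babel. The sketch below runs the same queue-based walk over a made-up in-memory table (the files object and the collectModules name are assumptions for illustration) and records each file once, so a shared or circular import does not get processed twice.

```javascript
// Hypothetical in-memory "file system": filename -> list of imported filenames.
const files = {
  'index.js': ['message.js', 'utils.js'],
  'message.js': ['utils.js'],
  'utils.js': ['index.js'], // circular import back to the entry
};

function collectModules(entry) {
  let nextId = 0;
  const ids = new Map([[entry, nextId++]]); // filename -> numeric ID
  const queue = [entry];

  // for...of keeps iterating as the array grows, giving a breadth-first walk.
  for (const filename of queue) {
    for (const dependency of files[filename]) {
      if (!ids.has(dependency)) {
        ids.set(dependency, nextId++);
        queue.push(dependency);
      }
    }
  }

  // Flatten into the same shape as the graph: id, filename, and a mapping.
  return queue.map((filename) => ({
    id: ids.get(filename),
    filename,
    mapping: Object.fromEntries(files[filename].map((dep) => [dep, ids.get(dep)])),
  }));
}

console.log(collectModules('index.js'));
```

The result is a flat array of three modules with IDs 0, 1, and 2, and the mapping for utils.js points back to ID 0 instead of creating a fourth entry.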

Phase 4: Code Generation (The Runtime Shim)

We have a graph of code, but browsers don't know how to handle require('./file.js') or exports. We need to inject a runtime shim—a tiny piece of code that teaches the browser how to load our modules.

This is the most intimidating part of looking at a Webpack bundle, but it is actually just a self-invoking function (IIFE).

The Architecture of the Shim

Our bundle will look like this:

(function(modules) {
  // The module cache
  // The require function
})(/* The Module Map */);

We need to generate a string that looks exactly like that.

Implementation: bundle

function bundle(graph) {
  let modules = '';

  // 1. Construct the Module Map string
  // Format: id: [ function(require, module, exports) { code }, { mapping } ]
  graph.forEach(mod => {
    modules += `${mod.id}: [
      function (require, module, exports) {
        ${mod.code}
      },
      ${JSON.stringify(mod.mapping)},
    ],`;
  });

  // 2. The IIFE Wrapper
  const result = `
    (function(modules) {
      // A real bundler keeps a module cache here so that each module runs only
      // once, which also stops circular dependencies from recursing forever.
      // For simplicity, we skip the cache in this version.

      function require(id) {
        const [fn, mapping] = modules[id];

        // The 'require' function that the module will call
        function localRequire(name) {
          // Lookup the ID based on the relative path
          return require(mapping[name]);
        }

        const module = { exports: {} };

        // Execute the module code
        // We pass our custom 'localRequire', the 'module' object, and 'exports'
        fn(localRequire, module, module.exports);

        return module.exports;
      }

      // Start the application by requiring the entry point (ID 0)
      require(0);
    })({${modules}})
  `;

  return result;
}

Breaking Down the Magic

Let's dissect the bundle function's output:

  1. Scope Isolation: By wrapping each module's code in a function (require, module, exports) { ... }, we ensure variables defined in one file don't leak into the global scope or other files. This replicates the Node.js module scope behaviour.
  2. localRequire: Inside index.js, we might say require('./message.js'). Our global modules object uses numeric IDs, not paths. The mapping object we built in the createGraph phase bridges this gap. localRequire looks up ./message.js in the mapping, finds it is ID 1, and calls the global require(1).
  3. Recursive Execution: require(0) kicks off the chain. ID 0 runs, calls require(1), which runs and returns its exports, which ID 0 then uses.

Testing the Bundler

To verify this works, imagine a simple project structure:

src/message.js

export const message = "Hello from the custom bundler!";

src/index.js

import { message } from './message.js';
console.log(message);

build.js (Our Tool)

// Assumes createAsset, createGraph, and bundle (defined above) live in this file
const graph = createGraph('./src/index.js');
const result = bundle(graph);
console.log(result);

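With @babel/core, @babel/parser, @babel/traverse, and @babel/preset-env installed, running node build.js from the project root prints the bundle as a string. If you paste that string into a browser console, you will see "Hello from the custom bundler!" logged. You have successfully built a JavaScript bundler.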
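
If you would rather check the output from Node than from a browser console, one option is to evaluate the string in a sandbox with Node's built-in vm module. The sketch below uses a short stand-in string in place of the real bundle; substituting the value returned by bundle(graph) is left as an assumption.

```javascript
const vm = require('vm');

// Stand-in for the string returned by bundle(graph); substitute the real output.
const result = `
  (function () {
    console.log('Hello from the custom bundler!');
  })()
`;

// Give the bundle a fake console that records calls instead of printing them.
const logs = [];
const sandbox = { console: { log: (...args) => logs.push(args.join(' ')) } };
vm.runInNewContext(result, sandbox);

console.log(logs); // [ 'Hello from the custom bundler!' ]
```

Running the bundle in a fresh context also confirms that it does not depend on anything from the build script's own scope.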

Scalability and Performance Considerations

While our "ToyPack" works for simple cases, modern bundlers solve significantly harder problems:

  • Circular Dependencies: Our runtime re-executes a module every time it is required, so a circular reference (A -> B -> A) would recurse until the stack overflows. Handling it requires a cache within the require function that returns the module's exports object as populated so far, rather than re-running the module.
  • Tree Shaking: We include every module in full, even when only one of its exports is imported. Production bundlers analyse the AST to detect unused exports and remove them (a form of dead code elimination).
  • Non-JS Assets: Webpack uses "Loaders" to transform CSS, Images, or TypeScript into JavaScript modules. This involves extending the createAsset function to handle different file extensions.
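  • Caching and Speed: Parsing every file into an AST is slow. Modern tools cache work on disk between builds or rely on parsers and transformers written in faster languages (esbuild in Go, SWC in Rust). Vite, for instance, has used esbuild to pre-bundle dependencies, which helps keep HMR (Hot Module Replacement) updates fast.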
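
To give a feel for the loader idea, here is a small sketch of how non-JS files could be turned into CommonJS source before being wrapped like any other module. The loaders table and the transformNonJs name are hypothetical; this is not Webpack's loader API, only the general shape of the technique.

```javascript
const path = require('path');

// Hypothetical loader table: file extension -> function returning CommonJS source.
const loaders = {
  '.json': (content) => `module.exports = ${JSON.stringify(JSON.parse(content))};`,
  '.txt': (content) => `module.exports = ${JSON.stringify(content)};`,
};

function transformNonJs(filename, content) {
  const loader = loaders[path.extname(filename)];
  if (!loader) throw new Error(`No loader registered for ${filename}`);
  return loader(content);
}

console.log(transformNonJs('config.json', '{ "debug": true }'));
// module.exports = {"debug":true};
```

In ToyPack, a function like this could be called from createAsset for files that are not JavaScript, with the returned string used as the asset's code and an empty dependency list.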

Author

Efe Omoregie

Software engineer with a passion for computer science, programming and cloud computing