Initially, I wanted to host my Hugo site on a custom domain that I own (fahmifj.space) rather than on the GitHub Pages domain (fahmifj.github.io). But making that move would likely break the blog’s current SEO, because search engines have already indexed the GH Pages domain.

This time, I want the GitHub Pages domain to coexist with my custom domain without breaking SEO. The GH Pages domain will remain my primary site, while the custom domain is secondary. Both will serve the same content without domain redirection (you stay on the domain you typed).

In short, what I want is like a mirror site.

Although I’m not really concerned about SEO, I think this is a great opportunity to learn something from the case.

Cloudflare Workers as Reverse Proxy

In Cloudflare, there is a feature called Workers. It’s similar to Google Cloud Functions, in that we can write and run serverless code. Using this feature, I can write a proxy/middleware that fetches my GitHub Pages content and serves it back on my custom domain.

                ┌──────────────────────────┐
                │    fahmifj.github.io     │
                │    (upstream hosting)    │
                └────────────┬─────────────┘
                             │ GitHub Pages origin
                ┌────────────┴─────────────┐
                │    Cloudflare Worker     │
                │    (proxy + rewrite)     │
                └────────────┬─────────────┘
                             │
                ┌────────────┴─────────────┐
                │      fahmifj.space       │
                │     (custom domain)      │
                └──────────────────────────┘

Create Worker (v1)

I will create a worker that functions as a content fetcher.

export default {
  async fetch(request) {
    const url = new URL(request.url);

    // Force everything to fetch from GitHub Pages 
    url.hostname = "fahmifj.github.io";

    const response = await fetch(url.toString(), {
      headers: request.headers,
      redirect: "follow"
    });

    // Copy GitHub response and serve it
    return new Response(response.body, response);
  }
};

The results:

[screenshot: the blog rendered through the worker]
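I can also sanity-check the worker from the command line through its workers.dev URL. A minimal sketch, runnable with Node 18+ as an .mjs file; the workers.dev subdomain below is hypothetical:

// Smoke test for the v1 worker (the workers.dev URL is hypothetical)
const res = await fetch("https://mirror.fahmifj.workers.dev/");
console.log(res.status, res.headers.get("content-type")); // expect: 200 text/html
console.log((await res.text()).slice(0, 120)); // first bytes of the blog's HTML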

Custom Domain Worker

I will add a custom domain to my worker (Workers & Pages → My Worker → Settings → Domains & Routes).

[screenshot: custom domain settings in the Cloudflare dashboard]

And now I can see that fahmifj.space serves the same content as fahmifj.github.io.
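One quick way to verify this is to compare the homepage HTML from both domains. A small sketch (Node 18+, run as an .mjs file); at this v1 stage the responses should be byte-identical, since nothing is rewritten yet:

// Compare the mirror against the origin; identical output means the proxy works
const [mirror, origin] = await Promise.all([
  fetch("https://fahmifj.space/").then((r) => r.text()),
  fetch("https://fahmifj.github.io/").then((r) => r.text()),
]);
console.log(mirror === origin ? "identical" : "differs");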

Worker with URL Rewriter (v2)

Hugo builds my site with absolute URLs based on my GH Pages domain, fahmifj.github.io. Because of that, most links clicked on my custom domain land users back on the GH Pages domain.

Therefore, I have to adjust the worker to rewrite all the GitHub Pages links, replacing them with my custom domain. I also add a condition to block search engine indexing bots, and a regex that removes the canonical link before the content is served on the custom domain.

export default {
  async fetch(request) {
    const upstream = "fahmifj.github.io"; // GitHub Pages domain
    const host = "fahmifj.space";  // Custom domain
    
    const url = new URL(request.url);
   
    // Prevent indexing by common search engine bots
    // (String.includes takes a single search string, so check against a list)
    const bots = ["googlebot", "bingbot", "slurp", "duckduckbot", "baiduspider", "yandex"];
    const ua = (request.headers.get("user-agent") || "").toLowerCase();
    if (bots.some((bot) => ua.includes(bot))) {
      return new Response("Not for indexing", { status: 403 });
    }
    
    // Do not serve sitemap
    if (url.pathname === "/sitemap.xml") {
      return new Response("", { status: 404 });
    }
    
    url.hostname = upstream;
    
    // Fetch upstream content
    const res = await fetch(url.toString(), {
      headers: request.headers,
      redirect: "follow",
    });

    // Only HTML needs rewriting, leave CSS and JS as is
    const contentType = res.headers.get("content-type") || "";
    if (!contentType.includes("text/html")) {
      return res;
    }

    // Read HTML response
    let html = await res.text();

    //  Rewrite all URLs from upstream → custom domain
    html = html.replaceAll(`https://${upstream}`, `https://${host}`);
    html = html.replaceAll(`http://${upstream}`, `https://${host}`);
    html = html.replaceAll(upstream, host); // catch remaining bare-domain references

    // Remove canonical <link> tag (so GitHub Pages stays canonical)
    html = html.replace(
      /<link[^>]+rel=["']?canonical["']?[^>]*>/gi,
      ""
    );
    
    // Add noindex header (tell crawlers not to index my custom domain)
    const newHeaders = new Headers(res.headers);
    newHeaders.set("X-Robots-Tag", "noindex, nofollow");
    
    // Return modified HTML
    return new Response(html, {
      status: res.status,
      headers: newHeaders
    });
  }
};
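As a side note, instead of buffering the whole page with res.text() and regex-replacing it, the same rewriting could likely be done in a streaming fashion with the Workers runtime’s built-in HTMLRewriter. Below is a minimal sketch of that variant (the bot-blocking and sitemap checks from v2 would stay the same); it only rewrites common URL-bearing attributes, so it is not a drop-in replacement for the full-text approach:

export default {
  async fetch(request) {
    const upstream = "fahmifj.github.io"; // GitHub Pages domain
    const host = "fahmifj.space";         // Custom domain

    const url = new URL(request.url);
    url.hostname = upstream;

    const res = await fetch(url.toString(), { redirect: "follow" });

    // Pass non-HTML responses through untouched
    const contentType = res.headers.get("content-type") || "";
    if (!contentType.includes("text/html")) {
      return res;
    }

    // Keep the noindex behavior from v2
    const headers = new Headers(res.headers);
    headers.set("X-Robots-Tag", "noindex, nofollow");
    const response = new Response(res.body, { status: res.status, headers });

    // Handler that rewrites one URL-bearing attribute on matched elements
    const rewriteAttr = (attr) => ({
      element(el) {
        const val = el.getAttribute(attr);
        if (val) el.setAttribute(attr, val.replaceAll(upstream, host));
      },
    });

    return new HTMLRewriter()
      .on('link[rel="canonical"]', { element(el) { el.remove(); } })
      .on("a", rewriteAttr("href"))
      .on("link", rewriteAttr("href"))
      .on("img", rewriteAttr("src"))
      .on("script", rewriteAttr("src"))
      .transform(response);
  },
};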

Conclusion

I have achieved my goal of making the GH Pages domain coexist with my custom domain; essentially, I have just created a mirror site of my blog.

Some of you might think it’s weird that I keep using GitHub Pages even though I already have a custom domain. Well, the reason is simple: I want my blog to stay alive as long as possible by using GitHub’s free service.

In short, using GitHub Pages as my canonical source means I don’t have to worry about maintaining a domain for SEO stability. GitHub handles everything, and the canonical URL stays consistent.

But if I used my custom domain as the canonical source, I would be responsible for maintaining that domain (or even a server). The custom domain could expire in a year or two and I might forget to renew it, or I might decide to switch to another TLD; either way, the SEO would suffer.

This way, I will still have the freedom to change my custom domain in the future without having to worry about SEO.

Okay that’s it. See you!
