A large-scale B2B e-commerce platform with 10,000+ products was suffering from indexation bloat. Millions of low-value, duplicate URLs generated by faceted navigation filters were consuming the site’s crawl budget, preventing search engines from discovering and indexing new, high-margin product arrivals.
The Zero Waste Solution
Log File Analysis: Audited server logs to identify and eliminate high-waste parameter strings.
Robots.txt Surgery: Implemented strict Disallow rules for faceted navigation, backed by canonical tags and noindex directives.
XML Sitemap Purge: Removed low-value “junk” URLs from the XML sitemaps so that search bots were steered toward high-revenue pages, maximizing indexation of mission-critical content.
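The log file audit can be sketched roughly as follows. This is a minimal illustration, not the tooling used on the project: it assumes access logs in Combined Log Format, identifies the crawler by a plain user-agent substring (which can be spoofed, so a real audit would also verify the requesting IP), and uses made-up sample lines. The function name bot_hits_by_parameter is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Combined Log Format: only the request path and the user agent are needed here.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def bot_hits_by_parameter(lines, bot_token="Googlebot"):
    """Count crawler requests per query-parameter name."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or bot_token not in match.group("agent"):
            continue
        query = urlsplit(match.group("path")).query
        # Count each parameter name once per request, even if it repeats.
        names = {key for key, _ in parse_qsl(query, keep_blank_values=True)}
        for name in names:
            counts[name] += 1
    return counts

# Made-up sample lines for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Jan/2024:10:00:00 +0000] "GET /shoes?color=red&size=9 HTTP/1.1" 200 512 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2024:10:00:01 +0000] "GET /shoes?color=blue HTTP/1.1" 200 512 "-" "{GOOGLEBOT}"',
    '203.0.113.5 - - [10/Jan/2024:10:00:02 +0000] "GET /shoes?color=red HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'66.249.66.1 - - [10/Jan/2024:10:00:03 +0000] "GET /product/123 HTTP/1.1" 200 512 "-" "{GOOGLEBOT}"',
]

hits = bot_hits_by_parameter(sample_log)
for name, count in hits.most_common():
    print(name, count)
```

Parameters that attract the most crawler requests while adding no unique content are the natural candidates for blocking.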
Tools Used
Screaming Frog (Log File Analyzer)
Search Console (Crawl Stats Report)
Regular Expressions (RegEx)
The Business Impact
Crawl Efficiency: 40% increase in the crawl rate of “Priority” product pages.
Visibility: New products now indexed within 24 hours (previously 14+ days).
Traffic: 15% lift in sitewide organic sessions within 60 days.
The Deep Dive
Crawl Gap Analysis
Used Google Search Console and Screaming Frog/SEMrush to identify the delta between URLs discovered and URLs indexed. This revealed that faceted navigation (dynamic filters for size, color, and price) was generating over 1.4M unique URL combinations that lacked business value.
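As a rough sketch of the gap calculation, the delta is a set difference between the discovered and indexed URL lists, split by whether a URL carries a facet parameter. The parameter names (size, color, price), the example.com URLs, and the function names are assumptions for illustration; in practice the two lists would come from crawler and Search Console exports.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed facet parameter names; the real site's names may differ.
FACET_PARAMS = {"size", "color", "price"}

def is_faceted(url):
    """True if the URL's query string contains any facet parameter."""
    keys = {k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    return bool(keys & FACET_PARAMS)

def crawl_gap(discovered, indexed):
    """Summarize URLs that were discovered but never indexed."""
    gap = set(discovered) - set(indexed)
    faceted = {url for url in gap if is_faceted(url)}
    return {
        "gap": len(gap),
        "faceted_in_gap": len(faceted),
        "other_in_gap": len(gap - faceted),
    }

# Hypothetical sample data.
discovered = [
    "https://example.com/p/1",
    "https://example.com/p/2",
    "https://example.com/c/shoes?color=red",
    "https://example.com/c/shoes?color=red&size=9",
]
indexed = ["https://example.com/p/1"]

summary = crawl_gap(discovered, indexed)
print(summary)
```

A gap dominated by faceted URLs points to filter combinations, rather than product pages, as the main source of unindexed URLs.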
Directive Optimization
Implemented a global Disallow strategy in robots.txt to block search bots from crawling non-essential parameter strings. This immediately preserved the site’s crawl budget for high-priority product and category pages.
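One way to sketch such rules is to generate a pair of Disallow patterns per facet parameter and check sample paths against them before deploying. The parameter names and helper functions below are hypothetical, and the matcher is deliberately simplified: it handles only the `*` wildcard and a trailing `$`, and ignores Allow rules and longest-match precedence, which real crawlers do apply.

```python
import re

# Assumed facet parameter names; the real site's names may differ.
FACET_PARAMS = ["color", "size", "price"]

def build_robots_rules(params):
    """Build Disallow lines covering each parameter in first or later position."""
    lines = ["User-agent: *"]
    for name in params:
        lines.append(f"Disallow: /*?{name}=")  # parameter is first in the query
        lines.append(f"Disallow: /*&{name}=")  # parameter follows another one
    return lines

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into an anchored regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path_and_query, rules):
    """Simplified check: True if any Disallow pattern matches the path."""
    patterns = [
        line.split(":", 1)[1].strip()
        for line in rules
        if line.startswith("Disallow:")
    ]
    return any(rule_to_regex(p).match(path_and_query) for p in patterns)

rules = build_robots_rules(FACET_PARAMS)
print("\n".join(rules))

for path in ["/shoes?color=red", "/shoes?sort=new&size=9", "/product/123"]:
    print(path, is_blocked(path, rules))
```

Using separate `?name=` and `&name=` patterns avoids accidentally blocking unrelated parameters that merely end in the same letters, such as a hypothetical `bgcolor`.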
Canonical & Tag Consolidation
Developed a logic-based canonical tag framework to ensure that any filtered views that did get crawled would pass their link equity back to the root product page, preventing internal competition and duplicate content issues.
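The canonical logic might look something like the sketch below, which strips facet parameters and fragments from a URL and renders the resulting tag. The parameter list, the choice to keep non-facet parameters, and the function names are assumptions, not the framework actually deployed.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed facet parameter names; the real site's names may differ.
FACET_PARAMS = {"size", "color", "price"}

def canonical_url(url):
    """Drop facet parameters and any fragment, keeping other parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FACET_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url):
    """Render the <link rel="canonical"> element for a page URL."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

print(canonical_tag("https://example.com/p/widget?color=red&size=9"))
```

Every filtered variant of a page then declares the same unfiltered URL as its canonical, which is what consolidates the signals onto one page.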
Validation & Monitoring
Established a custom dashboard to monitor the indexation rate of new product launches. By reducing technical noise, we ensured that new inventory moved from “Discovered” to “Indexed” in an average of 3.2 days.
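The dashboard's core metric can be approximated with a short calculation like the one below. It assumes each new product URL is recorded with the date it was first discovered and the date it was first reported as indexed (or None if still pending); the record layout, sample dates, and function name are hypothetical.

```python
from datetime import date
from statistics import mean

def average_days_to_index(records):
    """Average whole days from discovery to indexation, ignoring pending URLs.

    records: iterable of (url, discovered_date, indexed_date_or_None).
    Returns None when no URL has been indexed yet.
    """
    lags = [
        (indexed - discovered).days
        for _, discovered, indexed in records
        if indexed is not None
    ]
    return round(mean(lags), 1) if lags else None

# Hypothetical sample records.
sample = [
    ("/p/1", date(2024, 1, 1), date(2024, 1, 3)),
    ("/p/2", date(2024, 1, 1), date(2024, 1, 5)),
    ("/p/3", date(2024, 1, 2), None),  # still waiting to be indexed
]

avg_days = average_days_to_index(sample)
print(avg_days)
```

Tracking the count of still-pending URLs alongside the average keeps the metric from looking better simply because slow pages have not been indexed yet.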