There comes a point in every large ecommerce business when product discovery stops scaling. The shift is quiet and structural. At first, nothing looks wrong: search returns results, filters behave normally, recommendations load, dashboards don’t flash red alerts. From the outside, the system appears healthy. Internally, the signals begin to change.
Search relevance starts to vary between categories. Updates to the index take longer than expected. Merchandising teams add more manual rules just to maintain visibility for priority products. Engineering teams spend less time improving discovery and more time ensuring it doesn’t break under load.
Growth continues: more SKUs, more categories, more markets. Yet discovery starts to feel heavier, slower, and harder to evolve. This usually begins once catalogs move past a few million SKUs. By the time they reach 10, 20, or even 80 million, the issue is no longer operational. It’s architectural.
And that’s the moment many retailers realize they are no longer tuning a system. They are stretching one.
“Built for scale” is easy to claim and hard to validate.
At high SKU volumes, the phrase stops being marketing language and becomes measurable. A system either maintains performance and relevance as complexity increases, or it forces teams to compensate.
A discovery platform that truly scales must do four things consistently: distribute indexing across the catalog so no single node becomes a bottleneck, keep responses fast as queries grow more complex, ingest catalog changes in real time, and reindex without disrupting live traffic.
When any of these breaks, the impact is immediate. Merchandisers rely on manual overrides. Engineers throttle features to protect performance. Product teams slow down experimentation because change feels risky.
Over time, discovery becomes rigid. Growth continues, but the experience does not keep up. This is precisely why many retailers begin reevaluating discovery after experiencing the early signs outlined in discussions around why product discovery breaks at scale.
Once catalogs reach tens of millions of SKUs, single-index architectures struggle to keep up.
Data volume increases. Query complexity grows. Update frequency accelerates. A centralized index becomes a bottleneck for both search performance and catalog freshness.
Distributed indexing addresses this by spreading data across multiple nodes. Queries execute in parallel. Index updates complete faster. The system remains stable during traffic spikes or component failures.
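To make the pattern concrete, here is a minimal sketch of query fan-out over a sharded catalog, assuming a hypothetical ShardClient per node; real platforms layer replication, routing, and failure handling on top of this basic shape.

```python
import concurrent.futures
from dataclasses import dataclass

@dataclass
class Hit:
    sku: str
    title: str
    score: float

class ShardClient:
    """Hypothetical client for one index node holding a slice of the catalog."""
    def __init__(self, name: str, documents: list[Hit]):
        self.name = name
        self.documents = documents

    def search(self, query: str, limit: int) -> list[Hit]:
        # Stand-in for a real per-shard query: rank only this shard's slice.
        matches = [h for h in self.documents if query.lower() in h.title.lower()]
        return sorted(matches, key=lambda h: h.score, reverse=True)[:limit]

def distributed_search(shards: list[ShardClient], query: str, limit: int = 10) -> list[Hit]:
    """Fan the query out to every shard in parallel, then merge the partial results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda shard: shard.search(query, limit), shards)
    merged = [hit for partial in partials for hit in partial]
    # Final ranking happens at merge time; no single node had to scan the whole catalog.
    return sorted(merged, key=lambda h: h.score, reverse=True)[:limit]
```

Because each node ranks only its own slice, adding nodes keeps per-shard work roughly constant as the catalog grows, which is what keeps both query latency and update speed stable.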
Without this approach, teams see familiar warning signs: index updates that take longer than expected, relevance that varies between categories, and performance that degrades during traffic spikes.
For buyers evaluating discovery platforms, the key question is not whether distributed indexing exists, but whether it is foundational to the system and proven at large catalog sizes.
Latency often sounds like a technical metric, but at scale it becomes behavioral. It directly shapes how shoppers interact with large catalogs.
Even small delays add friction. Across millions of searches, that friction compounds into lower engagement, weaker conversion, and declining trust in results.
Consistently fast response times depend on several factors working together.
Performance must hold not only for simple keyword searches, but also for complex queries involving filters, sorting, and large result sets. If response times degrade as catalog size grows, discovery stops enabling exploration and starts limiting it.
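One way to check whether performance actually holds for heavier queries is to measure tail latency per query shape rather than averages. The sketch below is a minimal harness under that assumption; `run_query` and the query definitions are placeholders for whatever executes searches against the live index.

```python
import statistics
import time

def p95(samples_ms: list[float]) -> float:
    """95th-percentile latency for a list of per-request timings (milliseconds)."""
    return statistics.quantiles(samples_ms, n=100)[94]

def measure(run_query, queries: list[dict], repetitions: int = 200) -> dict[str, float]:
    """Time each query shape repeatedly and report p95 latency per shape.

    Mixing plain keyword lookups with filtered and sorted variants makes sure
    the heavy cases are measured, not just the easy ones.
    """
    results = {}
    for q in queries:
        timings = []
        for _ in range(repetitions):
            start = time.perf_counter()
            run_query(q)
            timings.append((time.perf_counter() - start) * 1000)
        results[q["label"]] = p95(timings)
    return results

# Example query shapes: a plain keyword search versus a filtered, sorted one.
queries = [
    {"label": "keyword", "text": "running shoes"},
    {"label": "filtered+sorted", "text": "running shoes",
     "filters": {"brand": "acme", "in_stock": True}, "sort": "price_asc"},
]
# measure(run_query, queries) -> e.g. {"keyword": ..., "filtered+sorted": ...}
```

Running the same harness at successive catalog sizes shows whether the filtered and sorted cases degrade as the index grows, which is the failure mode averages tend to hide.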
Large catalogs are never static. Prices change throughout the day. Inventory fluctuates. New products arrive continuously. Attributes evolve as categories expand.
Discovery systems that rely on batch ingestion introduce delay by design. That delay shows up as stale results, inaccurate filters, and products that exist in the system but remain undiscoverable.
Real-time catalog ingestion removes this lag. New products surface immediately. Inventory and pricing updates reflect as they happen. Filters stay accurate throughout the day.
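As a rough illustration of event-driven ingestion, the sketch below applies price, stock, and new-product events to an index the moment they arrive instead of waiting for a batch run. The event shapes and the in-memory index are assumptions for illustration, not any particular platform's API.

```python
import json
from dataclasses import dataclass, field

@dataclass
class InMemoryIndex:
    """Toy stand-in for a search index that supports partial document updates."""
    documents: dict[str, dict] = field(default_factory=dict)

    def update_document(self, sku: str, fields: dict) -> None:
        # Merge only the changed fields so price/stock edits don't wait on a full reindex.
        self.documents.setdefault(sku, {"sku": sku}).update(fields)

def handle_catalog_event(index: InMemoryIndex, raw_event: str) -> None:
    """Apply one catalog change event (hypothetical shape) as soon as it arrives."""
    event = json.loads(raw_event)
    if event["type"] == "product_created":
        index.update_document(event["sku"], event["attributes"])
    elif event["type"] == "price_changed":
        index.update_document(event["sku"], {"price": event["price"]})
    elif event["type"] == "stock_changed":
        index.update_document(event["sku"], {"in_stock": event["quantity"] > 0})

# A price drop becomes searchable and filterable immediately, not after the next batch run.
index = InMemoryIndex()
handle_catalog_event(index, '{"type": "product_created", "sku": "A1", "attributes": {"title": "Trail shoe", "price": 120}}')
handle_catalog_event(index, '{"type": "price_changed", "sku": "A1", "price": 99}')
```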
At scale, freshness is not a feature. It is a baseline requirement.
Reindexing is inevitable as catalogs grow and change. What matters is whether it disrupts the business.
If reindexing requires downtime or degrades performance, teams avoid it. Ranking logic remains untouched. Attribute structures stay outdated. Improvements are postponed because the risk feels too high.
Zero-downtime reindexing allows teams to evolve discovery continuously. Relevance models can be updated. Data structures can change. Experiments can run without impacting live traffic.
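A common way to achieve this is the build-and-swap pattern: construct a new index alongside the live one, backfill it, then repoint an alias in a single atomic step. The sketch below models that pattern with a toy cluster; the class and method names are illustrative, not a specific vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class SearchCluster:
    """Toy cluster model: physical indexes plus an alias that live traffic queries."""
    indexes: dict[str, list[dict]] = field(default_factory=dict)
    aliases: dict[str, str] = field(default_factory=dict)

    def create_index(self, name: str) -> None:
        self.indexes[name] = []

    def bulk_load(self, name: str, documents: list[dict]) -> None:
        self.indexes[name].extend(documents)

    def point_alias(self, alias: str, index_name: str) -> None:
        # The swap is a single metadata change, so queries never see a half-built index.
        self.aliases[alias] = index_name

    def search(self, alias: str) -> list[dict]:
        return self.indexes[self.aliases[alias]]

def reindex_without_downtime(cluster: SearchCluster, alias: str,
                             documents: list[dict], version: str) -> None:
    """Build the new index in the background, then atomically repoint the alias."""
    new_index = f"catalog_{version}"
    cluster.create_index(new_index)
    cluster.bulk_load(new_index, documents)  # live traffic still hits the old index here
    cluster.point_alias(alias, new_index)    # cutover; the old index can be dropped later

# Minimal usage: v1 serves traffic while v2 is built, then the alias flips.
cluster = SearchCluster()
cluster.create_index("catalog_v1")
cluster.bulk_load("catalog_v1", [{"sku": "A1"}])
cluster.point_alias("catalog", "catalog_v1")
reindex_without_downtime(cluster, "catalog", [{"sku": "A1", "brand": "acme"}], version="v2")
```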
This flexibility is what allows discovery to keep pace with changing shopper behavior instead of falling behind it.
Most discovery platforms look similar on paper. Feature parity is high, and checklists rarely expose meaningful differences.
What separates platforms at scale is how they perform under real conditions: full catalog sizes, live traffic patterns, continuous updates, and peak demand.
Benchmarks make these differences visible. Buyers should insist on seeing performance demonstrated with realistic catalog sizes and traffic patterns, not reduced test environments.
As catalogs grow, evaluation mistakes become harder to reverse.
Teams test with only a fraction of their catalog. They prioritize interface controls over system durability. They assume scale can be enabled later. They rely on manual merchandising to mask relevance gaps.
These shortcuts work temporarily. Eventually, the underlying limitations surface, often when traffic is highest and expectations are hardest to meet. Leaders who succeed treat product discovery the same way they treat infrastructure. They stress test it early, before growth forces their hand.
Powering discovery for tens of millions of SKUs is not theoretical. It requires systems that have already operated at this level.
Some platforms are trusted by global retailers managing catalogs in the tens of millions, where speed, relevance, and reliability are non-negotiable. That trust is earned through sustained performance, not roadmap commitments.
For teams shortlisting product discovery solutions, the takeaway is simple: don’t just ask how the system performs today; ask how it behaves when the catalog doubles.
At this scale, product discovery challenges become architectural rather than operational. Single-index systems struggle to handle data volume, query complexity, and update frequency simultaneously. Distributed indexing, real-time ingestion, and sub-10ms response times become structural requirements.
Distributed indexing spreads catalog data across multiple nodes so queries can execute in parallel and index updates complete faster. For catalogs exceeding tens of millions of SKUs, it prevents any single component from becoming a bottleneck, keeping both search performance and catalog freshness stable under load.
Zero-downtime reindexing allows ranking logic, data structures, and relevance models to be updated without pausing or degrading live search. Without it, product discovery falls behind changing shopper behavior over time.
Large catalogs cannot be static. Prices, inventory, and product attributes change continuously. Batch ingestion introduces delay by design, causing stale filters, inaccurate pricing, and undiscoverable products. Real-time ingestion ensures what exists in the catalog is immediately findable and accurately represented.