You wake up to a thread of customer emails asking why the site was down at 3am. The order count is normal, the error page is intermittent, and your admin dashboard looks fine. The hosting provider sends a graph that shows MySQL connections pegged at the limit and PHP-FPM workers all busy. None of the requests were from real customers.

If you run a PrestaShop store with faceted (layered) search and you have not yet been hit by this pattern, you almost certainly will be. The shape of the trouble is dull, the cause is structural, and the right fix is not the one most store owners reach for first. This is a write-up of how to actually defend against it — specifically the ?q= flood: the moment ordinary crawler traffic turns your own faceted search into a self-inflicted denial-of-service surface. This post is the deep dive on that one failure mode; if you want the wider picture, our PrestaShop security hardening checklist is the hub that ties every layer together.

The symptom

Logs show thousands of requests to URLs like /category-slug?q=Brand-Brand-A/Brand-Brand-B/Color-Red, every one slightly different, coming from rotating IPs and a mix of declared user agents. Some claim to be Bingbot, some claim to be Chrome, some declare themselves as GPTBot or ClaudeBot or PerplexityBot. The category URL itself is legitimate — it is your own faceted search exposing brand and attribute filters as query-string parameters — but the volume and combinatorial diversity of the requests is not.

The PrestaShop forums have collected reports of this pattern from store owners running 1.7.x and 8.x throughout 2024 and 2025. The shape is consistent: one thread alone documents store owners watching tens of thousands of unique ?q= URLs being crawled per day, with the controller continuing to respond even after the ps_facetedsearch module is disabled.
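If you want to confirm the pattern on your own server before changing anything, a quick first check is to count the unique ?q= URLs and the user agents that requested them. Below is a minimal Python sketch. It assumes your web server writes the common Apache/Nginx "combined" log format. The regular expression and the function name scan_facet_requests are illustrative, and a customised log format needs a different pattern.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumes the "combined" format:
# ip ident user [time] "METHOD target PROTOCOL" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def scan_facet_requests(lines):
    """Return (set of unique GET targets carrying q, Counter of their user agents)."""
    unique_targets = set()
    user_agents = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match or match.group("method") != "GET":
            continue                                  # unparseable line or not a GET
        target = match.group("target")
        query = urlsplit(target).query
        # Match the q parameter exactly, so "faq=1" is not counted.
        if "q" not in parse_qs(query, keep_blank_values=True):
            continue
        unique_targets.add(target)
        user_agents[match.group("ua")] += 1
    return unique_targets, user_agents
```

Run scan_facet_requests() over a day of logs. The length of the first result tells you how many distinct filter URLs were requested, and most_common(10) on the second shows which declared user agents asked for them. Declared user agents can be forged, so treat that column as a hint rather than proof.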

The first thing most owners try is disabling the faceted search module. It does not help. ps_facetedsearch exposes the filter UI in the storefront, but a GET request to /category-slug?q=anything still boots PrestaShop, dispatches to the category controller, and runs through the request lifecycle. The query string is a valid request shape whether or not the module that generated those URLs is currently enabled. Disabling the module hides the filter sidebar from human visitors; it does not prevent automated clients from continuing to hit URLs they already discovered.

Why the volume is growing

The combinatorial nature of faceted-search URLs has been around since the module shipped. What is new is the scale of automated traffic that finds and crawls them.

Cloudflare's July 2025 analysis of crawler traffic across its network reported:

  • Crawler traffic up 18% year over year, with peaks of 32% in April 2025.
  • Googlebot: +96% growth, expanding from roughly 30% to 50% of all crawl traffic.
  • GPTBot: +305% growth, 2.2% to 7.7% share.
  • PerplexityBot: essentially zero to meaningful in a single year.
  • ClaudeBot: down 46% in requests after early-2025 changes.
  • Bytespider: down 85%.

The composition shifts month to month but the trend does not: more crawlers, hitting more URLs, more often. A category page that generates a few hundred filter combinations becomes a target for everything from Googlebot building structured-data indexes to AI training scrapers harvesting product attributes. Cloudflare itself began blocking AI training bots by default on new zones in mid-2025, which tells you how the platform reads the trajectory.

On top of that legitimate crawler traffic sits a much smaller but more aggressive layer of hostile automation: scrapers, competitive-intelligence bots, and stress-test traffic. These do not respect robots.txt, rotate IPs aggressively, and forge legitimate user-agent strings. They are the ones that take the site down. Stopping that broader class of unwanted automation, not just the ?q= flood, is its own discipline; we cover the general approach in blocking bad bots and unwanted traffic.

Why the obvious fixes do not work

Most of the advice on community forums falls into one of five buckets. None of them is sufficient on its own.

Disabling the faceted-search module. Removes the UI, leaves the URL surface. The category controller still answers ?q= requests.

Adding Disallow: /*?q= to robots.txt. Googlebot largely honors it. Bingbot honors it inconsistently — there are public reports of it crawling disallowed query patterns anyway. AI crawlers respect robots.txt at very different rates depending on the operator. Hostile scrapers ignore it entirely. Worth doing, never sufficient.

A blanket .htaccess block on any URL containing ?q=. This catches the bots, but it also breaks your own AJAX filter requests (the same URL pattern your storefront uses to fetch filter results without a page reload) and breaks every bookmarked filter URL real customers have saved or shared. The SEO damage compounds if Google had indexed any of those filter pages as canonical for long-tail queries. There is a place for hand-tuned server rules — just not a sledgehammer one; our PrestaShop .htaccess security and performance rules covers the ones worth keeping.

Blocking specific bot user-agents. Whack-a-mole. New bots appear weekly, hostile bots forge UA strings, and the legitimate AI referrers (ChatGPT, Perplexity, Copilot) that drive a growing share of inbound clicks would also be blocked. The crawl-to-click ratio is still poor for AI bots, but it is not zero — Cloudflare's own data shows referrals exist.

Full-page caching of ?q= URLs. The cache hit rate is near zero because each filter combination is a unique URL. You cache 50,000 pages and serve each one once. The cache becomes a write-amplification problem instead of a read shortcut.
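The arithmetic behind that near-zero hit rate is easy to sketch. Assume checkbox-style facets, where every value can be switched on or off independently. A facet with n values then contributes 2^n possible selections. The number of distinct filtered states is the product of those terms, minus one for the unfiltered page. That count comes before bots add any variations in parameter order or encoding. The facet sizes below are made-up examples.

```python
from math import prod

def count_filter_states(facet_sizes):
    """Distinct non-empty filter selections for checkbox-style facets.

    Each value of each facet is independently selected or not, so a facet
    with n values contributes 2**n subsets; subtract 1 for the unfiltered page.
    """
    return prod(2 ** n for n in facet_sizes.values()) - 1

# Hypothetical category: 5 brands, 8 colours, 6 sizes.
example = {"Brand": 5, "Color": 8, "Size": 6}
```

For that hypothetical category, count_filter_states(example) returns 524287. That is over half a million distinct filtered URLs from just 19 facet values.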

The remaining option — and the one the rest of this post is about — is to put the decision of who gets to hit those URLs in front of PrestaShop entirely, where it can be made cheaply and without booting PHP.

The right fix: a Cloudflare Managed Challenge on the faceted-search shape

Cloudflare's Managed Challenge is the right tool for this problem for three reasons. It is non-interactive most of the time (humans pass it transparently), it does not require you to name specific bots in a blocklist, and it lets you keep your storefront's own AJAX filter calls working by exempting the request shape those calls use.

The pattern is two custom rules (paid Bot Management) or one (free tier), evaluated against any GET that contains the q query parameter.

If you have Bot Management (paid)

Rule 1 — skip verified bots. Match on the expression cf.client.bot, with the action set to Skip remaining custom rules. This lets Googlebot, Bingbot, and any other Cloudflare-verified crawler through unchallenged. Verified-bot detection is based on reverse DNS and the operators' published IP ranges, not user-agent strings, so a hostile scraper forging "Mozilla/5.0 (compatible; Googlebot/2.1)" will not match.
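Cloudflare performs this verification itself, so you do not need to run it. The sketch below only illustrates why a forged user agent fails the check. It uses the forward-confirmed reverse DNS test that Google and Microsoft document for verifying their crawlers:

- resolve the IP to a hostname;
- check the hostname's domain;
- resolve the hostname back;
- require the original IP in the answer.

This is a Python illustration, not Cloudflare's implementation. The suffix list is a partial example, and socket.gethostbyname_ex only returns IPv4 addresses. The resolver arguments exist only so the logic can be exercised without network access.

```python
import socket

# Partial example list; check each crawler operator's documentation for the full set.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_crawler(ip, suffixes=CRAWLER_SUFFIXES,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> addresses must include IP."""
    try:
        hostname = reverse_lookup(ip)[0]           # PTR lookup
    except OSError:
        return False                               # no reverse record at all
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False                               # wrong domain, whatever the UA says
    try:
        _, _, addresses = forward_lookup(hostname)  # A lookup on the claimed name
    except OSError:
        return False
    return ip in addresses                         # a spoofed PTR fails here
```

The last step is what matters. Anyone can publish a PTR record that says googlebot.com for their own IP, but only Google controls what googlebot.com hostnames resolve to.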

Rule 2 — challenge likely automation on the faceted-search shape. The expression, written across its logical clauses, reads:

  • http.request.method eq "GET"
  • and any(http.request.uri.args.names[*] eq "q") (the request carries a q parameter)
  • and not cf.client.bot (it is not a verified crawler)
  • and not any(lower(http.request.headers["x-requested-with"][*])[*] eq "xmlhttprequest") (it is not the storefront's own AJAX filter call)
  • and not starts_with(http.request.uri.path, "/module/")
  • and not starts_with(http.request.uri.path, "/api/")
  • and cf.bot_management.score gt 1 and cf.bot_management.score lt 30 (it lands in the "probably automation, not yet definitely" band)

Action: Managed Challenge. In plain English: challenge any GET request that meets all of these conditions:

  • it carries a q query parameter;
  • it is not a verified bot;
  • it is not the storefront's own AJAX filter call (which sends X-Requested-With: XMLHttpRequest);
  • it is not a module or API endpoint;
  • it scores as likely automation.

Most humans pass the challenge without any interaction, at worst with a single click; automation gets stopped at the edge.

If you want to be stricter on definitively automated traffic, add a rule before Rule 2. It matches the same GET-with-q, non-verified-bot shape, but with cf.bot_management.score eq 1, Cloudflare's "this is automation, full stop" rating. Action: Block (defensible) or Managed Challenge, depending on appetite.

If you do not have Bot Management (free or Pro plan)

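The rule should match the faceted-search shape, not every category URL. Start it in log mode or as a Managed Challenge rather than a Block. Key the match on the q parameter itself rather than on the category path, for two reasons:

  • PrestaShop's friendly category URLs look like /3-clothes, so a clause such as http.request.uri.path contains "/category" rarely matches anything.
  • Regex clauses using the matches operator are only available on Business and Enterprise plans.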

There is no equivalent bot score on Free or Pro plans. The cf.threat_score field that older guides recommend is deprecated and now always returns 0, so do not waste time on rules that reference it.

Without a bot score, you fall back to matching on request shape alone. Start conservative with a single rule:

  • http.request.method eq "GET"
  • and any(http.request.uri.args.names[*] eq "q")
  • and not cf.client.bot
  • and not any(lower(http.request.headers["x-requested-with"][*])[*] eq "xmlhttprequest")
  • and not starts_with(http.request.uri.path, "/module/")
  • and not starts_with(http.request.uri.path, "/api/")

Action: Managed Challenge.

This challenges every non-AJAX, non-verified-bot GET that includes a q parameter. Real customers loading a bookmarked filter URL see a brief Managed Challenge and pass it, so bookmarked links keep working. Unverified crawler traffic gets cut off at the edge, while verified crawlers such as Googlebot pass through. The cost is one extra round-trip for human visitors who land on a faceted URL via a saved bookmark or shared link.
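Before switching either rule set on, it can help to replay requests from your own logs through the same conditions offline. The Python function below is a hypothetical model of the rules described above, not Cloudflare's rules engine:

  • verified_bot stands in for cf.client.bot.
  • bot_score stands in for cf.bot_management.score; None means a Free or Pro plan that has no score.
  • The "block" outcome corresponds to the optional stricter rule for a score of 1.

```python
from urllib.parse import urlsplit, parse_qs

def edge_action(method, target, headers, verified_bot=False, bot_score=None):
    """Return "allow", "challenge" or "block" for one logged request.

    Hypothetical offline model of the rules above, not Cloudflare's engine.
    """
    if verified_bot:
        return "allow"                       # Rule 1: verified crawlers skip the rest
    parts = urlsplit(target)
    if method != "GET" or "q" not in parse_qs(parts.query, keep_blank_values=True):
        return "allow"                       # not the faceted-search shape
    # Header names are case-insensitive; the rule also lowercases the value.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-requested-with", "").lower() == "xmlhttprequest":
        return "allow"                       # storefront's own AJAX filter call
    if parts.path.startswith(("/module/", "/api/")):
        return "allow"                       # module and API endpoints
    if bot_score is None:
        return "challenge"                   # Free/Pro rule: request shape alone
    if bot_score == 1:
        return "block"                       # optional stricter rule (or challenge)
    if 1 < bot_score < 30:
        return "challenge"                   # Rule 2: "probably automation" band
    return "allow"
```

Running a day of logged requests through a model like this gives a rough preview of how much traffic each rule would touch. In production, Cloudflare also knows things your logs do not, such as the real bot score.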

Important caveats before you turn any of this on

Test in log-only / Security Events first. Set the rule action to "Log" for at least 24–48 hours and watch Cloudflare's Security Events dashboard. The Log action is only available on Enterprise plans; on other plans, start with Managed Challenge, which humans can pass, and watch Security Events just as closely.

You are looking for two things:

  • that the rule is firing on the traffic you expect (a lot of it);
  • that it is not firing on legitimate paths you forgot about, such as sitemap fetchers, analytics tools, or a custom module that for some reason uses ?q= in its own URLs.

Only flip to Managed Challenge, or tighten anything toward Block, after you have confirmed the matcher is clean.

The X-Requested-With exception is a UX preservation, not a security boundary. Any HTTP client can send that header. An attacker who realizes the rule exists can defeat it by adding the header to their scraper. The reason it is in the rule is to let your own storefront's filter AJAX through without a challenge — not to keep determined automation out. The cf.client.bot and bot-score checks are what actually stop automation; the AJAX exception just keeps real users on the same page when they click filters.

Proxy every public web hostname (orange-cloud), but keep mail (MX) and other non-HTTP records DNS-only (grey-cloud). Cloudflare's standard proxy only carries HTTP and HTTPS, so it cannot protect those services the same way.

The real risk is any DNS-only record (mail, dev, an admin subdomain) that resolves to the same server as the shop, because that leaks the web origin IP. Anyone who finds the origin can bypass Cloudflare entirely and hit PrestaShop directly. Make sure no DNS-only record points to or reveals the web origin IP: move mail, dev, and admin onto separate IPs, or restrict them. Audit your DNS panel before you trust the WAF.

Restrict the origin to Cloudflare IPs. Even with everything proxied, hard-block the origin at the firewall to accept HTTP/HTTPS only from Cloudflare's published IP ranges. Otherwise an attacker who learns the origin IP from old DNS history can hit it directly. While you are restricting origin access, make sure that origin is HTTPS-only end to end — our walkthrough on setting up SSL and HTTPS on PrestaShop covers the certificate and redirect side so the edge and the origin agree.
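On a Linux origin that uses ufw, you can generate the allowlist from the lists Cloudflare publishes at https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6. The Python sketch below assumes ufw is your firewall and its default incoming policy is deny. The function names are illustrative. Keep your own SSH rule in place before applying anything, and re-check the lists periodically.

```python
import ipaddress

def ufw_allow_rules(cidr_lines, ports=(80, 443)):
    """Build ufw commands allowing HTTP/HTTPS only from the given CIDR ranges."""
    rules = []
    for line in cidr_lines:
        line = line.strip()
        if not line:
            continue                                  # skip blank lines
        network = ipaddress.ip_network(line)          # ValueError on malformed input
        for port in ports:
            rules.append(f"ufw allow proto tcp from {network} to any port {port}")
    return rules

def is_cloudflare_ip(ip, networks):
    """True if ip falls inside any of the given ip_network objects."""
    address = ipaddress.ip_address(ip)
    # IPv4 addresses are never "in" IPv6 networks (and vice versa), so mixed lists are safe.
    return any(address in network for network in networks)
```

To use it:

  • Save the two lists (for example with curl) and feed their lines to ufw_allow_rules().
  • Review the printed commands before running them.
  • Remove any existing rule that opens ports 80 and 443 to everyone.

is_cloudflare_ip() covers the reverse check. Any hit in your origin access log from outside those ranges means someone is reaching the server directly. This only holds if the log records the connecting address, not a visitor IP restored from Cloudflare's headers.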

If you depend on AI search referrals, watch the impact. Some AI crawlers are also part of the referral path that brings customers in via ChatGPT, Perplexity, and Copilot links. A blanket challenge will reduce both crawl traffic and (a smaller amount of) inbound referrals. The trade-off is usually worth it for an e-commerce site, but it is a trade-off — check your analytics after a week.

The fallback: a Dispatcher override

If you cannot deploy Cloudflare today — because you do not control DNS, because your host's CDN integration is fragile, because a payment module is incompatible with proxied origins, or because you simply need the bleeding to stop in the next ten minutes — the in-PHP fallback is a Dispatcher override that 301-redirects non-AJAX ?q= requests back to the same URL without the filter.

You place this at override/classes/Dispatcher.php. The class extends DispatcherCore and overrides dispatch(). The logic: if Tools::getValue('q') is present and the request is not an AJAX call (the HTTP_X_REQUESTED_WITH server var is empty or not xmlhttprequest), then parse $_SERVER['REQUEST_URI'], strip the q key out of the parsed query parameters, rebuild the URL with http_build_query() against Tools::getShopDomainSsl(true), send a 301 Location header to the stripped URL, and exit. Otherwise it returns parent::dispatch() untouched.

What this does: any direct-browser, non-AJAX request to a URL containing ?q= gets 301-redirected to the same path with the q parameter removed. AJAX requests (the storefront's own filter calls) pass through unchanged. The faceted filter UI keeps working for real customers; the URL surface that bots crawl collapses to a single canonical category page. (An override at override/classes/Dispatcher.php is a supported PrestaShop extension point — it survives module updates, but remember to clear the class-index cache after dropping the file in, or PrestaShop will keep loading the core class.)
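The override itself has to be written in PHP. The Python sketch below only models the same decision. You can use it to check edge cases against URLs from your own logs, such as which other parameters survive and what happens when q is the only one.

The function name and the shop_base value are illustrative placeholders. In the PHP override, the equivalent inputs are $_SERVER['REQUEST_URI'], $_SERVER['HTTP_X_REQUESTED_WITH'] and Tools::getShopDomainSsl(true). One difference: Tools::getValue('q') also reads POST data, while this model only looks at the query string.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def facet_redirect_target(request_uri, headers, shop_base="https://shop.example"):
    """Model of the override: return the 301 target, or None to dispatch normally."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-requested-with", "").lower() == "xmlhttprequest":
        return None                           # storefront AJAX: leave untouched
    parts = urlsplit(request_uri)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "q" for key, _ in params):
        return None                           # no filter parameter: nothing to do
    kept = [(key, value) for key, value in params if key != "q"]
    query = urlencode(kept)                   # like http_build_query() on the rest
    return shop_base + parts.path + ("?" + query if query else "")
```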

Three trade-offs to understand before you ship this:

  • SEO impact. If Google has indexed any of your filter URLs as canonical for long-tail queries ("blue running shoes size 9"), those rankings collapse: every indexed filter URL now redirects to the unfiltered category. For most stores this is acceptable, because those facet URLs generally ranked weakly anyway. But check Search Console before you deploy, and consider a one-time export of the high-value filter URLs you want to keep, with explicit 301 maps to dedicated landing pages.
  • Bookmarks and shares. A customer who bookmarked or shared a filter URL no longer lands on the filtered view. The Cloudflare Managed Challenge approach preserves this; the Dispatcher override does not.
  • It only helps with PHP-side load. The request still hits PHP — it just exits quickly with a 301 instead of running the full category page. If your bottleneck is connection count rather than CPU, this helps less than the edge-level fix.
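Little's law gives a rough feel for the connection-count point. The average number of busy PHP workers equals the request arrival rate multiplied by the average time each request holds a worker. The short Python illustration below uses a hypothetical request rate and timings, so substitute figures from your own monitoring.

```python
def busy_workers(requests_per_second, seconds_per_request):
    """Little's law: average concurrency = arrival rate x average time in system."""
    return requests_per_second * seconds_per_request

# Hypothetical flood of 50 bot requests per second.
full_render = busy_workers(50, 0.8)    # assumed ~0.8 s for a full category page
early_301 = busy_workers(50, 0.05)     # assumed ~0.05 s for bootstrap plus 301
```

With those assumed timings, the flood ties up about 40 workers on average with full renders and about 2.5 with the early 301. That is a large improvement, but not zero, and every one of those requests still reaches the origin. With the edge rule, challenged requests never arrive at all.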

Deploy it as a stopgap. Move to Cloudflare as soon as you can.

Edge versus origin: which fix for which bottleneck

The two defenses are not interchangeable. Pick by where your store is actually hurting.

| Concern | Cloudflare Managed Challenge | Dispatcher override (301) |
| --- | --- | --- |
| Where it runs | At the edge, before PHP | In PHP, early in the request |
| Stops PHP/MySQL load | Yes; the request never reaches the origin | Partly; PHP boots, then exits on the 301 |
| Helps with connection exhaustion | Yes | Limited; still occupies a PHP worker briefly |
| Preserves bookmarked filter URLs | Yes (one-click challenge) | No (strips the filter) |
| Needs DNS / CDN control | Yes | No |
| Cost | Free tier works; Bot Management is sharper | Free, one override file |
| Best as | The permanent fix | A ten-minute stopgap |

If you can do both, do both: the edge rule carries the load, and the override is your safety net for the window before DNS propagates or if you ever have to pull the proxy.

The architectural angle: replace the URL surface itself

Figure: Filter Revolution SEO page configuration screen. Indexable filter pages should be deliberate URLs, not accidental crawler combinations.

Both fixes above protect the existing ps_facetedsearch URL surface from automation. Neither changes the fact that the module exposes a combinatorially large public URL space in the first place.

A different class of filter module avoids the problem at the source: filter state lives client-side (or in a single AJAX POST), the public URL stays clean, and only deliberately-curated SEO landing pages — a hand-built page for "men's running shoes" rather than every brand-colour-size combination — become indexable. We built Filter Revolution in that style, partly in response to exactly the traffic patterns described here. So what does that mean for you? It shrinks the high-cardinality public URL surface that makes ps_facetedsearch attractive to crawlers, which means fewer combinatorial pages for bots to discover and fewer 3am pages from your host — configured from the back office, without forking your theme. It is not a magic shield against AI crawlers — nothing is — and it does not retroactively un-index the URLs Google and Bing have already found. But it removes the structural attack surface that the Cloudflare and Dispatcher fixes are working around.

If you are tired of patching the symptom and you are due for a faceted-search rework anyway, look at replacing the module rather than wrapping it. If the existing module fits the store, the Cloudflare rule above is a fix you can keep running long term.

One last thing: the empty carts

Every bot request that hits the category page also triggers PrestaShop's session and cart bootstrap. Each unique automated visitor gets a ps_cart row. After a month of crawler traffic, your back-office cart counter shows 80,000 "active" carts, your cart-related admin pages slow to a crawl, and any abandoned-cart workflow becomes unusable because almost none of those carts ever had a human attached.

If you have already deployed the Cloudflare rule, this stops growing — but the existing pollution does not clean itself up, and PrestaShop's built-in cart-cleanup tasks are not aggressive enough to handle months of accumulated bot carts. The pragmatic fix is to hide anonymous, bot-origin carts from the back-office counters and listings rather than delete them: the rows stay in the database for whatever forensic value they hold, but they stop polluting the UI. This is a distinct problem from the flood itself and we treat it as its own job — if your cart admin is already buried under bot carts, that is the cleanup to run alongside the edge rule, not instead of it.
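Before deciding how to hide them, it helps to size the problem. Anonymous carts that never received a product have id_customer = 0 and no rows in ps_cart_product, so a LEFT JOIN finds them. The sketch below runs that query against a tiny in-memory SQLite copy of the two tables, trimmed to the columns the query uses, so it stays self-contained. Against a live store you would run the same read-only SELECT in MySQL, adjusting the ps_ prefix to your own table prefix. The demo rows are invented.

```python
import sqlite3

# Read-only count of anonymous carts that never held a product.
EMPTY_ANONYMOUS_CARTS = """
SELECT COUNT(*)
FROM ps_cart c
LEFT JOIN ps_cart_product cp ON cp.id_cart = c.id_cart
WHERE c.id_customer = 0
  AND cp.id_cart IS NULL
"""

def count_empty_anonymous_carts(conn):
    return conn.execute(EMPTY_ANONYMOUS_CARTS).fetchone()[0]

def demo_connection():
    """Tiny in-memory stand-in for ps_cart / ps_cart_product (columns trimmed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ps_cart (id_cart INTEGER PRIMARY KEY, id_customer INTEGER, date_add TEXT);
        CREATE TABLE ps_cart_product (id_cart INTEGER, id_product INTEGER, quantity INTEGER);
        INSERT INTO ps_cart VALUES (1, 0, '2025-03-01'), (2, 0, '2025-03-01'),
                                   (3, 0, '2025-03-02'), (4, 42, '2025-03-02');
        INSERT INTO ps_cart_product VALUES (3, 7, 1);
    """)
    return conn
```

In the demo data, carts 1 and 2 are counted. Cart 3 holds a product, and cart 4 belongs to a logged-in customer.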

Where this sits in your wider defenses

The ?q= flood is one specific failure mode, but the control plane you build to stop it pays off across the rest of your security posture. A few adjacent jobs worth lining up once the edge rule is in place:

  • Per-visitor blocking and IP bans. When a single source keeps probing after the challenge, you want a clean way to cut it off and to see who is hitting you — see customer extra info and IP bans.
  • Form spam, separate from traffic floods. The same automation that crawls facets will also hammer your contact and registration forms; a challenge on submission is the answer there — see reCAPTCHA for PrestaShop.
  • If you can't upgrade the core yet. Stores stuck on an older PrestaShop have fewer native levers; the edge-and-override approach here is part of a broader virtual-patching strategy in advanced hardening for stores you can't upgrade yet.
  • New to all of this. If the terminology above is a lot, start with the plain-English guide for store owners and work up from there.

Quick checklist

  • Turn on Cloudflare for the domain. Proxy every public web hostname (orange-cloud); keep mail and other non-HTTP records DNS-only and make sure they do not point to or reveal the web origin IP.
  • Add Rule 1 (skip verified bots) and Rule 2 (Managed Challenge on the faceted shape) in log-only mode. Watch Security Events for 24–48 hours.
  • Confirm legitimate AJAX filter calls are not being challenged. Confirm Googlebot is reaching the site. Confirm scraper traffic is being matched.
  • Flip Rule 2 from Log to Managed Challenge.
  • Restrict the origin firewall to Cloudflare IP ranges. This is what closes the loop.
  • If Cloudflare is not an option, deploy the Dispatcher override as a stopgap and plan the migration.
  • If you are due for a filter-module rework, evaluate replacing ps_facetedsearch entirely.
  • Clean up the back-office cart pollution separately so the cart admin stays usable.

Closing

This is the new baseline. Faceted-search URL explosion meets AI-era crawler volume, and the result is a self-inflicted DoS surface that ships out of the box on every PrestaShop install. The defaults will not save you, the obvious workarounds break legitimate users, and "block the bots" gets harder every month as the bots get better.

The good news is that the right fix lives at the edge and costs nothing on a free Cloudflare plan. It takes about thirty minutes of hands-on work to deploy carefully, plus a day or two of watching Security Events before you trust it. The bad news is that it is one more thing you should have done last year. The compounding news is that the work carries over. Once you have a clean origin behind a tuned WAF, the next class of automated nuisance (credential stuffing, scraping, fake-checkout floods) gets noticeably easier to defend against from the same control plane.

For the full layered approach, return to the PrestaShop security hardening checklist. For what happens when a store gets compromised by something more sophisticated than crawler abuse, see the anatomy of a Magecart-style attack on a PrestaShop 1.7.x store. The defensive playbook is the same shape either way: cheap controls at the edge, narrow exceptions where you have to make them, and absolutely no security claims that rest on user-agent strings.

David Miller

Over a decade of hands-on PrestaShop expertise. David builds high-performance e-commerce modules focused on SEO, checkout optimization, and store management. Passionate about clean code and measurable results.
