PrestaShop Robots.txt: What to Block, What to Allow

What Is Robots.txt and Why It Matters for PrestaShop

The robots.txt file sits at the root of your PrestaShop installation and acts as the first point of communication between your store and search engine crawlers. It tells bots like Googlebot, Bingbot, and others which parts of your site they may crawl and which they should skip. While it is not a security mechanism (it does not prevent access, only advises crawlers), it is one of the most important tools for managing your crawl budget — the number of pages a search engine will crawl on your site within a given timeframe.

For PrestaShop stores, this matters enormously. A typical PrestaShop installation can generate thousands of URL variations through filters, sorting options, pagination, currency switching, and search queries. If left unchecked, search engine bots will waste their crawl budget on these low-value pages instead of discovering and indexing your actual product and category pages.

How PrestaShop Generates Its Robots.txt

PrestaShop includes a built-in robots.txt generator accessible from the Back Office. Navigate to Shop Parameters > Traffic & SEO and scroll to the bottom where you will find the "Robots file generation" section. Clicking the generate button creates a robots.txt file at your store's root directory.

The default generated file typically includes rules like these:

User-agent: *
Disallow: /classes/
Disallow: /config/
Disallow: /download/
Disallow: /mails/
Disallow: /modules/
Disallow: /translations/
Disallow: /tools/
Disallow: /*?orderby=
Disallow: /*?orderway=
Disallow: /*?tag=
Disallow: /*?id_currency=
Disallow: /*?search_query=
Disallow: /*?back=
Disallow: /*?n=
Disallow: /*&orderby=
Disallow: /*&orderway=
Disallow: /*&tag=
Disallow: /*&id_currency=
Disallow: /*&search_query=
Disallow: /*&back=
Disallow: /*&n=
Sitemap: https://yourstore.com/sitemap.xml

While this is a reasonable starting point, it is far from complete. Many critical URL patterns that waste crawl budget are not included.

What You Must Block in PrestaShop

1. Cart, Checkout, and Account Pages

These pages are user-specific and provide zero SEO value. They should always be blocked:

Disallow: /*?controller=cart
Disallow: /*?controller=order
Disallow: /*?controller=authentication
Disallow: /*?controller=my-account
Disallow: /*?controller=identity
Disallow: /*?controller=addresses
Disallow: /*?controller=address
Disallow: /*?controller=history
Disallow: /*?controller=order-detail
Disallow: /*?controller=password
Disallow: /*?controller=discount
Disallow: /*?controller=order-return
Disallow: /*?controller=order-follow
Disallow: /*?controller=guest-tracking
Disallow: /cart
Disallow: /order
Disallow: /login
Disallow: /my-account
Disallow: /password-recovery

2. Faceted Navigation and Layered Filters

Faceted navigation is the single biggest crawl budget killer for e-commerce stores. When a customer uses filters like color, size, or price range, PrestaShop generates unique URLs for every combination. A category with 5 colors, 4 sizes, and 3 price ranges can produce hundreds or even thousands of distinct filter URLs, none of which belongs in Google's index; the quick calculation below shows the scale.
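
As a rough illustration, here is a back-of-the-envelope count in Python. It assumes multi-select facets (every filter value can be toggled on or off independently), and the facet counts are hypothetical, so treat it as a sketch of the growth rather than an exact figure.

# Back-of-the-envelope: how many filter URLs one category can generate.
# Assumes multi-select facets: every filter value is either selected or not.
facets = {"color": 5, "size": 4, "price_range": 3}   # hypothetical facet sizes

combinations = 1
for value_count in facets.values():
    combinations *= 2 ** value_count   # each value toggles independently

print(combinations - 1)   # 4095 non-empty filter states, each one a crawlable URL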

# Block layered navigation filter parameters
Disallow: /*?q=
Disallow: /*&q=
Disallow: /*?selected_filters=
Disallow: /*&selected_filters=
Disallow: /module/ambjolisearch/jolisearch

# Block price filter combinations
Disallow: /*?price=
Disallow: /*&price=

# Block attribute and feature filters
Disallow: /*?id_attribute_group=
Disallow: /*&id_attribute_group=
Disallow: /*?id_feature=
Disallow: /*&id_feature=

3. Internal Search Results

Internal search result pages are thin content and should never be indexed. They frequently create near-duplicate pages and are a known source of quality issues:

Disallow: /*?controller=search
Disallow: /*?s=
Disallow: /*&s=
Disallow: /search
Disallow: /*?search_query=
Disallow: /*&search_query=

4. Pagination Parameters

While category pages themselves should be crawlable, the pagination parameters that multiply them into page-by-page variants need careful handling:

Disallow: /*?page=
Disallow: /*&page=
Disallow: /*?p=
Disallow: /*&p=

Important note: Be careful with pagination. If you block /*?page= entirely, you may prevent crawlers from reaching products that only appear on deeper pages. A better approach is often to leave pagination crawlable and manage duplication with canonical tags and solid internal linking; note that Google has stated it no longer uses rel="next" and rel="prev" as indexing signals, so do not rely on them.

5. Comparison Pages and Wishlists

Disallow: /*?controller=comparison
Disallow: /comparison
Disallow: /*?controller=wishlist
Disallow: /module/blockwishlist/

6. Admin and System Directories

Disallow: /admin*/
Disallow: /app/
Disallow: /bin/
Disallow: /cache/
Disallow: /classes/
Disallow: /config/
Disallow: /controllers/
Disallow: /docs/
Disallow: /download/
Disallow: /img/tmp/
Disallow: /localization/
Disallow: /mails/
Disallow: /override/
Disallow: /pdf/
Disallow: /src/
Disallow: /tools/
Disallow: /translations/
Disallow: /upload/
Disallow: /var/
Disallow: /vendor/
Disallow: /webservice/

7. URL Tracking Parameters

Marketing campaign parameters create duplicate content when bots crawl tagged URLs:

Disallow: /*?utm_source=
Disallow: /*?utm_medium=
Disallow: /*?utm_campaign=
Disallow: /*&utm_source=
Disallow: /*&utm_medium=
Disallow: /*&utm_campaign=
Disallow: /*?fbclid=
Disallow: /*?gclid=
Disallow: /*?ref=

What You Must Allow in PrestaShop

1. Product and Category Pages

These are the core of your store and must always remain crawlable. Do not block your main content directories.

2. CSS, JavaScript, and Image Files

Google needs to render your pages to evaluate content quality. Blocking CSS or JS files prevents rendering and can hurt rankings:

Allow: /themes/*/assets/
Allow: /themes/*/css/
Allow: /themes/*/js/
Allow: /js/
Allow: /img/
Allow: /modules/*/views/css/
Allow: /modules/*/views/js/

3. CMS Pages

Your legal pages, about pages, and content marketing pages should be fully crawlable. Ensure they are not accidentally caught by broad Disallow rules.

4. Manufacturer and Supplier Pages (If Used)

If you maintain rich manufacturer or supplier pages with unique content, keep them crawlable. If they are thin auto-generated pages, consider blocking them.

Handling AI Crawlers

The rise of AI services has introduced a new category of crawlers that scrape content for training purposes. If you want to prevent your product descriptions, images, and other content from being used by AI models, you can add specific rules:

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Bytespider
Disallow: /

Note that blocking Google-Extended prevents Google from using your content for AI training (Gemini) while still allowing regular Googlebot to crawl and index your pages normally.

Complete Recommended Robots.txt for PrestaShop

Here is a comprehensive robots.txt file that you can adapt for your PrestaShop store:

# Main search engine crawlers
User-agent: *

# Allow static assets
Allow: /themes/*/assets/
Allow: /themes/*/css/
Allow: /themes/*/js/
Allow: /js/
Allow: /img/
Allow: /modules/*/views/css/
Allow: /modules/*/views/js/

# Block system directories
Disallow: /app/
Disallow: /bin/
Disallow: /cache/
Disallow: /classes/
Disallow: /config/
Disallow: /controllers/
Disallow: /docs/
Disallow: /download/
Disallow: /img/tmp/
Disallow: /localization/
Disallow: /mails/
Disallow: /modules/
Disallow: /override/
Disallow: /pdf/
Disallow: /src/
Disallow: /tools/
Disallow: /translations/
Disallow: /upload/
Disallow: /var/
Disallow: /vendor/
Disallow: /webservice/

# Block cart, checkout, account
Disallow: /cart
Disallow: /order
Disallow: /login
Disallow: /my-account
Disallow: /password-recovery
Disallow: /guest-tracking
Disallow: /*?controller=cart
Disallow: /*?controller=order
Disallow: /*?controller=authentication
Disallow: /*?controller=my-account
Disallow: /*?controller=identity
Disallow: /*?controller=history
Disallow: /*?controller=password

# Block filters and sorting
Disallow: /*?orderby=
Disallow: /*?orderway=
Disallow: /*?n=
Disallow: /*?q=
Disallow: /*?selected_filters=
Disallow: /*?id_currency=
Disallow: /*?tag=
Disallow: /*?back=
Disallow: /*&orderby=
Disallow: /*&orderway=
Disallow: /*&n=
Disallow: /*&q=
Disallow: /*&selected_filters=
Disallow: /*&id_currency=
Disallow: /*&tag=
Disallow: /*&back=

# Block search
Disallow: /*?controller=search
Disallow: /*?search_query=
Disallow: /*&search_query=
Disallow: /*?s=
Disallow: /*&s=
Disallow: /search

# Block tracking parameters
Disallow: /*?utm_source=
Disallow: /*?utm_medium=
Disallow: /*?utm_campaign=
Disallow: /*&utm_source=
Disallow: /*?fbclid=
Disallow: /*?gclid=

# Block comparison and wishlist
Disallow: /*?controller=comparison
Disallow: /comparison

# Sitemap
Sitemap: https://yourstore.com/1_index_sitemap.xml

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Common Mistakes to Avoid

Blocking the Modules Directory Entirely

The default PrestaShop robots.txt blocks /modules/. While you do not want module PHP files crawled, many modules serve critical CSS and JavaScript from this directory. The blanket block can prevent Google from rendering your pages correctly. Instead, block /modules/ but explicitly allow CSS and JS subdirectories as shown above.
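
Crawlers such as Googlebot resolve an Allow/Disallow conflict by applying the most specific (longest) matching pattern, which is why the longer Allow rules win over the short Disallow: /modules/. If you want to sanity-check that behaviour, here is a minimal sketch using the third-party protego parser (pip install protego); it assumes protego's Protego.parse / can_fetch(url, user_agent) API, and the module path is made up for illustration.

from protego import Protego  # robots.txt parser with wildcard and longest-match support

rules = """
User-agent: *
Disallow: /modules/
Allow: /modules/*/views/css/
Allow: /modules/*/views/js/
"""

rp = Protego.parse(rules)

# The longer Allow pattern wins: a module stylesheet stays crawlable...
print(rp.can_fetch("https://yourstore.com/modules/somemodule/views/css/front.css", "Googlebot"))  # True

# ...while other files under /modules/ remain blocked.
print(rp.can_fetch("https://yourstore.com/modules/somemodule/config.xml", "Googlebot"))  # False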

Using Robots.txt Instead of Noindex

A critical misunderstanding: robots.txt tells bots not to crawl a URL, but it does not by itself prevent indexing. If another site links to a page you have blocked in robots.txt, Google may still index the URL (it can appear in results with a note that no description is available). For pages you want completely removed from search results, use a noindex robots meta tag (<meta name="robots" content="noindex">) or the X-Robots-Tag HTTP header instead, and leave those pages crawlable so the directive can actually be read.

Forgetting the Sitemap Reference

Always include your sitemap URL at the bottom of robots.txt. This helps crawlers find your sitemap immediately. If you use a module that generates multiple sitemaps, reference the sitemap index file.

Using Overly Broad Rules

A rule like Disallow: /*? would block every URL with any query parameter, which would be catastrophic. Be specific with your rules and test them before deploying, as described in the next section.

Testing Your Robots.txt Configuration

  1. Google Search Console - Check the robots.txt report (under Settings) to confirm Google has fetched and parsed your file without errors, and use the URL Inspection tool to test whether a specific URL is blocked by your rules
  2. Manual testing - Visit yourstore.com/robots.txt directly in your browser to verify the file is accessible and correctly formatted
  3. Page indexing report - After deploying changes, monitor the Page indexing (formerly Coverage) report in Google Search Console for any unexpected increase in pages reported as "Blocked by robots.txt"
  4. Log file analysis - Check your server logs to verify that bots are actually respecting your rules and not wasting crawl budget on blocked URLs
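
For bulk checks before you deploy, one possible approach is a short script that downloads the live robots.txt and tests a list of representative URLs against it. This sketch uses the third-party protego package (pip install protego) because the standard library's urllib.robotparser does not implement Google-style wildcard matching; the store domain and sample paths are placeholders you would replace with your own.

import urllib.request

from protego import Protego  # pip install protego

STORE = "https://yourstore.com"  # placeholder domain

# Fetch and parse the live robots.txt.
with urllib.request.urlopen(f"{STORE}/robots.txt") as response:
    rp = Protego.parse(response.read().decode("utf-8"))

# Representative URLs: some must stay crawlable, some should be blocked.
samples = [
    f"{STORE}/3-clothing",                           # category page, should be allowed
    f"{STORE}/3-clothing?orderby=price",             # sort variant, should be blocked
    f"{STORE}/index.php?controller=cart",            # cart, should be blocked
    f"{STORE}/themes/classic/assets/css/theme.css",  # theme asset, should be allowed
]

for url in samples:
    verdict = "allowed" if rp.can_fetch(url, "Googlebot") else "blocked"
    print(f"{verdict:7}  {url}")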

Multistore Considerations

If you run a PrestaShop multistore setup, each store (domain) needs its own robots.txt file at its root. The PrestaShop generator creates rules for all shops in a single file, but if your stores are on different domains, you need to split them accordingly. Each store's robots.txt should reference its own sitemap and have rules appropriate to its URL structure.

When to Regenerate Your Robots.txt

You should regenerate or update your robots.txt whenever you:

  • Add new modules that create public-facing URLs (search modules, filter modules)
  • Change your URL structure or enable/disable friendly URLs
  • Switch themes (different themes may serve assets from different paths)
  • Add or remove languages (which changes URL prefixes)
  • Enable or disable the multistore feature
  • Notice unusual crawl patterns in your server logs or Google Search Console

Remember: always keep a backup of your working robots.txt before regenerating. The PrestaShop generator overwrites the file completely, and any custom rules you have added manually will be lost unless you re-add them after generation.

For more details, read our guides: PrestaShop SEO: The Complete Guide to Ranking Higher and The Complete Guide to XML Sitemaps for PrestaShop SEO.
