Posted by Dom-Woodman
Listings sites have a very specific set of search problems that you don’t run into everywhere else. By day I’m one of Distilled’s analysts, but by night I run a job listings site, teflSearch. So, for my first Moz Blog post, I thought I’d cover the three search problems with listings sites that I spent far too long agonising about.
Quick clarification time: What is a listings site (i.e. will this post be useful for you)?
The classic listings site is Craigslist, but plenty of other sites act like listing sites:
- Job sites like Monster
- E-commerce sites like Amazon
- Matching sites like Spareroom
1. Generating quality landing pages
The landing pages on listings sites are incredibly important. These pages are usually the primary drivers of converting traffic, and they’re usually generated automatically (or are occasionally custom category pages).
For example, if I search for “Jobs in Manchester”, you can see nearly every result is an automatically generated landing page or category page.
There are three common ways to generate these pages (occasionally a combination of more than one is used):
- Faceted pages: These are generated by facets—groups of preset filters that let you filter the current search results. They usually sit on the left-hand side of the page.
- Category pages: These pages are listings which have already had a filter applied and can’t be changed. They’re usually custom pages.
- Free-text search pages: These pages are generated by a free-text search box.
Those definitions are still a bit general; let’s clear them up with some examples:
Amazon uses a combination of categories and facets. If you click on browse by department you can see all the category pages. Then on each category page you can see a faceted search. Amazon is so large that it needs both.
Indeed generates its landing pages through free-text search. For example, if we search for “IT jobs in Manchester”, it will generate: IT jobs in Manchester.
teflSearch generates landing pages using just facets. The jobs in China landing page is simply a facet of the main search page.
Each method has its own search problems when used for generating landing pages, so let’s tackle them one by one.
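Facets and free-text search will typically generate pages with URL parameters; for example, a search for “dogs” would put “dogs” into a query parameter on the search results URL. To make these URLs user-friendly, sites will often rewrite them so they look like folders instead.
These are still just ordinary free-text search and facets; the URLs are simply user-friendly. (They’re a lot easier to work with in robots.txt too!)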
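As a rough illustration of the two URL styles, here is a minimal Python sketch that turns a parameterised search URL into a folder-style one. The domain `example.com`, the `/results/` prefix and the `search` parameter name are hypothetical; a real site picks its own scheme and also needs routing that maps the folder URL back to the same search.

```python
from urllib.parse import urlsplit, parse_qs, quote


def to_folder_url(param_url: str, prefix: str = "/results/") -> str:
    """Rewrite e.g. https://example.com/?search=dogs as a folder-style URL.

    The 'search' parameter and '/results/' prefix are hypothetical choices.
    """
    parts = urlsplit(param_url)
    query = parse_qs(parts.query)  # decodes '+' and %-escapes
    term = query.get("search", [""])[0].strip().lower()
    # Join words with hyphens so the term reads cleanly as a folder name.
    slug = quote("-".join(term.split()))
    return f"{parts.scheme}://{parts.netloc}{prefix}{slug}/"


print(to_folder_url("https://example.com/?search=IT+jobs+in+Manchester"))
# https://example.com/results/it-jobs-in-manchester/
```

Because every search now lives under one folder, a single robots.txt path rule can cover all of them.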
Free-text search (& category) problems
If you’ve decided the base of your search will be a free-text search, then you’ll have two major goals:
- Goal 1: Helping search engines find your landing pages
- Goal 2: Giving them link equity.
Search engines won’t use search boxes, so the solution to both problems is to link to your valuable landing pages: those links let search engines find the pages and pass them link equity.
There are plenty of ways to do this, but two of the most common are:
- Category links alongside a search: Photobucket uses a free-text search to generate pages, but if we look at an example search for photos of dogs, we can see the categories that define the landing pages along the right-hand side. (This is also an example of URL-friendly searches!)
- Putting the main landing pages in a top-level menu: Indeed also uses free text to generate landing pages, and it has a browse jobs section containing a URL structure that lets search engines find all the valuable landing pages.
Breadcrumbs are also often used in addition to the two methods above; in both examples, you’ll find breadcrumbs that reinforce the hierarchy.
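The point of both techniques is to give crawlers ordinary `<a href>` links they can follow. Here is a minimal Python sketch, assuming you already have a hand-picked list of valuable landing pages; the labels and URLs are hypothetical.

```python
from html import escape


def browse_menu(landing_pages: list[tuple[str, str]]) -> str:
    """Render plain <a href> links so crawlers can reach each landing page."""
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(label)}</a></li>'
        for label, url in landing_pages
    )
    return f"<ul>{items}</ul>"


def breadcrumbs(trail: list[tuple[str, str]]) -> str:
    """Render a breadcrumb trail; every level except the current page is a link."""
    *parents, (current_label, _) = trail
    links = [f'<a href="{escape(url)}">{escape(label)}</a>' for label, url in parents]
    return " &gt; ".join(links + [escape(current_label)])


# Hypothetical pages on an imaginary job site.
print(browse_menu([("IT jobs", "/jobs/it/"), ("Jobs in China", "/jobs/china/")]))
print(breadcrumbs([("Jobs", "/jobs/"), ("IT", "/jobs/it/"), ("Manchester", "/jobs/it/manchester/")]))
```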
Category (& facet) problems
Categories, because they tend to be custom pages, don’t actually have many search disadvantages; whether they’re worth building depends more on their other attributes. You can create them for exactly the purposes you want, so you typically won’t have too many problems.
However, if you also use a faceted search in each category (like Amazon) to generate additional landing pages, then you’ll run into all the problems described in the next section.
At first, facets seem great: an easy way to generate multiple strong, relevant landing pages without doing much at all. The problems appear because people don’t put limits on facets.
Let’s take the job page on teflSearch. We can see it has 18 facets, each with many options. Some of these options will generate useful landing pages:
The China option in the countries facet will generate “Jobs in China”, which is a useful landing page.
On the other hand, the “Conditional Bonus” facet will generate “Jobs with a conditional bonus,” and that’s not so great.
We can also see that the options within a single facet aren’t always useful. As of writing, I have a single job available in Serbia. That’s not a useful search result, and the poor user engagement combined with the tiny amount of content will be a strong signal to Google that it’s thin content. Depending on the scale of your site it’s very easy to generate a mass of poor-quality landing pages.
Facets generate other problems too. The primary one is that they can create a huge amount of duplicate content and pages for search engines to get lost in. This has two causes: the sheer number of possibilities facets generate, and the fact that selecting facets in different orders creates identical pages with different URLs.
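To get a feel for how quickly this grows, the sketch below counts the pages produced by picking one option from each of up to `max_depth` facets, both as unique result sets and as distinct URLs when click order changes the URL. The facet names and option counts are hypothetical, not teflSearch’s real facets.

```python
from itertools import combinations
from math import factorial, prod


def count_facet_pages(option_counts: dict[str, int], max_depth: int) -> tuple[int, int]:
    """Count pages from choosing one option in each of up to max_depth facets.

    Returns (unique result sets, distinct URLs if selection order changes the URL).
    """
    unique = 0
    with_order = 0
    for depth in range(1, max_depth + 1):
        for chosen in combinations(option_counts.values(), depth):
            pages = prod(chosen)  # one option from each chosen facet
            unique += pages
            with_order += pages * factorial(depth)  # each click order gets its own URL
    return unique, with_order


# Three small hypothetical facets already give 111 URLs for 35 distinct pages.
print(count_facet_pages({"country": 3, "salary": 2, "contract": 2}, 3))  # (35, 111)
```

Real facets have far more options, so the gap between the two numbers (all of it duplicate content) grows much faster.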
We end up with four goals for our facet-generated landing pages:
- Goal 1: Make sure our searchable landing pages are actually worth landing on, and that we’re not handing a mass of low-value pages to the search engines.
- Goal 2: Make sure we don’t generate multiple copies of our automatically generated landing pages.
- Goal 3: Make sure search engines don’t get caught in the metaphorical plastic six-pack rings of our facets.
- Goal 4: Make sure our landing pages have strong internal linking.
The first goal needs to be set internally; you’re always going to be the best judge of the number of results that need to be present on a page for it to be useful to a user. I’d argue you can rarely go below three, but it depends both on your business and on how much content fluctuates on your site, as the useful landing pages might also change over time.
We can solve the next three problems as a group. There are several possible solutions depending on what skills and resources you have access to; here are two of them:
Category/facet solution 1: Blocking the majority of facets and providing external links
- Easiest method
- Good if your valuable category pages rarely change and you don’t have too many of them.
- Can be problematic if your valuable facet pages change a lot
Nofollow all your facet links, noindex the category pages that aren’t valuable, and use robots.txt to block pages that are deeper than x facet/folder levels into your search.
You set x by looking at where your useful facet pages exist that have search volume. So, for example, if you have three facets for televisions (manufacturer, size, and resolution), and even combinations of all three have multiple results and search volume, then you could index everything up to three levels.
On the other hand, if people are searching for three levels (e.g. “Samsung 42″ Full HD TV”) but you only have one or two results for three-level facets, then you’d be better off indexing two levels and letting the product pages themselves pick up long-tail traffic for the third level.
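Assuming folder-style facet URLs with trailing slashes (e.g. `/televisions/samsung/46-inch/`, a hypothetical path scheme), the depth cut-off can be expressed as a single wildcard rule. This sketch builds that rule and includes a rough matcher for checking it; it relies on the `*` wildcard, which major crawlers such as Googlebot support.

```python
import re


def robots_txt(search_prefix: str, max_indexable_depth: int) -> str:
    """Build a robots.txt rule blocking facet paths deeper than max_indexable_depth.

    Assumes folder-style facet URLs with trailing slashes, e.g. /televisions/samsung/.
    """
    pattern = search_prefix.rstrip("/") + "/" + "*/" * (max_indexable_depth + 1)
    return f"User-agent: *\nDisallow: {pattern}\n"


def is_blocked(path: str, pattern: str) -> bool:
    """Rough robots.txt matching: '*' matches anything and rules match as prefixes."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path) is not None


# Index two levels of TV facets (x = 2) and block the third.
print(robots_txt("/televisions/", 2))
# User-agent: *
# Disallow: /televisions/*/*/*/
```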
If you have valuable facet pages that exist deeper than one facet or folder into your search, this creates some duplicate content problems, which are dealt with in the aside “Indexing more than one level of facets” below.
The immediate problem with this set-up, however, is that in one stroke we’ve removed most of the internal links to our category pages: because all the facet links are nofollowed, search engines won’t be able to find your valuable category pages.
In order to re-create the linking, you can add a top-level drop-down menu to your site containing the most valuable category pages, add category links elsewhere on the page, or create a separate part of the site with links to the valuable category pages.
You can see the top-level drop-down menu on teflSearch (it’s the search jobs menu); the other two approaches are demonstrated by Photobucket and Indeed, respectively, in the previous section.
The big advantage of this method is how quick it is to implement: it doesn’t require any fiddly internal logic, and adding an extra menu option is usually minimal effort.
Category/facet solution 2: Creating internal logic to work with the facets
- Requires new internal logic
- Works for large numbers of category pages with value that can change rapidly
There are four parts to the second solution:
- Select valuable facet categories and allow those links to be followed. No-follow the rest.
- No-index all pages that return a number of items below the threshold for a useful landing page
- No-follow all facets on pages with a search depth greater than 1.
- Block all facet pages deeper than x level in robots.txt
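The four rules above can be sketched as a few small decision functions. This is a sketch under stated assumptions, not a finished implementation: “depth” is taken to mean the number of facets applied to a page, and the set of valuable facets, the threshold of three results and x = 1 are all hypothetical configuration values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetRules:
    valuable_facets: frozenset  # hypothetical: facets judged worth following
    min_results: int = 3        # threshold for a useful landing page
    max_depth: int = 1          # "x": deeper pages are blocked in robots.txt


def follow_facet_link(page_depth: int, facet: str, rules: FacetRules) -> bool:
    """Rules 1 and 3: follow only valuable facets, and nofollow everything past depth 1."""
    return page_depth <= 1 and facet in rules.valuable_facets


def robots_meta(result_count: int, rules: FacetRules) -> str:
    """Rule 2: noindex pages with fewer results than the threshold."""
    return "noindex" if result_count < rules.min_results else "index"


def blocked_in_robots_txt(page_depth: int, rules: FacetRules) -> bool:
    """Rule 4: pages deeper than x get a robots.txt Disallow rule."""
    return page_depth > rules.max_depth


rules = FacetRules(valuable_facets=frozenset({"country"}))
print(follow_facet_link(0, "country", rules))            # True
print(follow_facet_link(0, "conditional-bonus", rules))  # False
print(robots_meta(1, rules))                             # noindex (e.g. one job in Serbia)
```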
As with the last solution, x is set by looking at where your useful facet pages exist that have search volume (full explanation in the first solution), and if you’re indexing more than one level you’ll need to check out the aside below to see how to deal with the duplicate content it generates.
This will generate landing pages for the facets you’ve decided are valuable and noindex the landing pages which are low-quality. It will only create pages for a single level of facets, which prevents duplicate content.
Aside: Indexing more than one level of facets
If you want a second level of facets to be indexable, e.g. Televisions – Facet 1 (46″), Facet 2 (Samsung), then the easiest option is to remove the fourth rule from above and either add links to them using one of the methods in Solution 1, or add the pages to your sitemap.
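If you take the sitemap route, a minimal sitemap for your chosen second-level pages can be generated with the standard library. The URL below is hypothetical; the namespace is the standard sitemaps.org one.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return sitemap XML listing the given landing-page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # ElementTree escapes '&' etc.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap(["https://example.com/televisions/46-inch/samsung/"]))
```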
The alternative is to set robots.txt to allow category pages up to 2 levels to be indexed and all facets to be followed up to two levels.
This will, however, create duplicate content, because now search engines will be able to create:
- Televisions – 46″ – Samsung
- Televisions – Samsung – 46″
You’ll have to either rel=canonical your duplicate pages with another rule or set up your facets so they create a single unique URL.
You’ll also need to be aware that unless you set up more complicated logic, all of your followable facets will multiply. Depending on your setup, you might need to block more paths in robots.txt or set up more logic.
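One way to get a single unique URL is to always write the selected facets in a fixed order, whatever order the user clicked them in. The same function can supply the href for a rel=canonical tag. The facet names and order below are hypothetical.

```python
def canonical_facet_path(base: str, selected: dict[str, str], facet_order: list[str]) -> str:
    """Build one URL per facet combination, regardless of click order."""
    unknown = set(selected) - set(facet_order)
    if unknown:
        raise ValueError(f"facets missing from facet_order: {sorted(unknown)}")
    values = [selected[name] for name in facet_order if name in selected]
    return base.rstrip("/") + "/" + "".join(f"{value}/" for value in values)


ORDER = ["size", "brand"]  # hypothetical fixed facet order
# "Samsung then 46-inch" and "46-inch then Samsung" map to the same path.
print(canonical_facet_path("/televisions/", {"brand": "samsung", "size": "46-inch"}, ORDER))
print(canonical_facet_path("/televisions/", {"size": "46-inch", "brand": "samsung"}, ORDER))
# /televisions/46-inch/samsung/ (both times)
```

Generating facet links with this function avoids creating the duplicates in the first place; using it only for the canonical tag is the fallback when the URLs themselves can’t change.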
Letting search engines index more than one level of facets adds a lot of possible problems; make sure you’re keeping track of them.
2. User-generated content cannibalization
This is a common problem for listings sites (assuming they allow user-generated content). If you’re reading this as an e-commerce site that only lists its own products, you can skip this one.
As we covered in the first area, category pages on listings sites are usually the landing pages aiming for the valuable search terms, but as your users start generating pages they can often create titles and content that cannibalise your landing pages.
Suppose you’re a job site with a category page for PHP Jobs in Greater Manchester. If a recruiter then creates a job advert for PHP Jobs in Greater Manchester for the 4 positions they currently have, you’ve got a duplicate content problem.
This is less of a problem when your site is large and your categories are mature, because it will be obvious to any search engine which pages are your high-value category pages. At the start, though, when you’re lacking authority and individual listings might contain more relevant content than your own search pages, it can be a real problem.
Solution 1: Create structured titles
Set the <title> differently from the on-page title. Depending on the variables available to you, you can set the title tag programmatically, without changing the page title, using other information given by the user.
For example, on our imaginary job site, suppose the recruiter also provided the following information in other fields:
- The no. of positions: 4
- The primary area: PHP Developer
- The name of the recruiting company: ABC Recruitment
- Location: Manchester
We could set the <title> pattern to be: *No. of positions* *The primary area* with *recruiter name* in *Location*, which would give us:
4 PHP Developers with ABC Recruitment in Manchester
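The title pattern above is simple string templating over the recruiter’s structured fields. Here is a minimal sketch; the field names are hypothetical, and the pluralisation is deliberately naive (it just appends “s”).

```python
from html import escape


def listing_title(positions: int, area: str, recruiter: str, location: str) -> str:
    """*No. of positions* *The primary area* with *recruiter name* in *Location*."""
    role = area if positions == 1 else f"{area}s"  # naive pluralisation
    return f"{positions} {role} with {recruiter} in {location}"


def title_tag(title: str) -> str:
    """Wrap a user-derived title in an escaped <title> element."""
    return f"<title>{escape(title)}</title>"


print(title_tag(listing_title(4, "PHP Developer", "ABC Recruitment", "Manchester")))
# <title>4 PHP Developers with ABC Recruitment in Manchester</title>
```

Since the location comes straight from the user’s field, a value like “Castlefield, Manchester” flows into the title with no extra work.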
Setting a <title> tag allows you to target long-tail traffic by constructing detailed descriptive titles. In our above example, imagine the recruiter had specified “Castlefield, Manchester” as the location.
All of a sudden, you’ve got a perfect opportunity to pick up long-tail traffic for people searching in Castlefield in Manchester.
On the downside, you lose the ability to pick up long-tail traffic where your users have chosen keywords you wouldn’t have used.
For example, suppose Manchester has a jobs program called “Green Highway.” A job advert title containing “Green Highway” might pick up valuable long-tail traffic. Being able to discover this, however, and find a way to fit it into a dynamic title is very hard.
Solution 2: Use regex to noindex the offending pages
Perform a regex (or string contains) search on your listings’ titles and no-index the ones that cannibalise your main category pages.
If it’s not possible to construct titles with variables, or your users provide a lot of additional long-tail traffic with their own titles, then this is a great option. On the downside, you miss out on possible structured long-tail traffic that you might’ve been able to aim for.
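A minimal sketch of the regex check, using the PHP/Greater Manchester example from earlier. The pattern list is hypothetical; in practice you would maintain one pattern per category page you want to protect.

```python
import re

# Hypothetical patterns describing the category pages you want to protect.
CATEGORY_PATTERNS = [
    re.compile(r"\bphp\b.*\bjobs?\b.*\bgreater manchester\b", re.IGNORECASE),
]


def should_noindex(listing_title: str, patterns=CATEGORY_PATTERNS) -> bool:
    """True if a user-written listing title mimics one of our category pages."""
    return any(pattern.search(listing_title) for pattern in patterns)


print(should_noindex("PHP Jobs in Greater Manchester"))     # True
print(should_noindex("Senior PHP Developer, Castlefield"))  # False
```

Matching listings would get a `noindex` robots meta tag, while everything else keeps its user-written title and any long-tail traffic that comes with it.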
Solution 3: De-index all your listings
It may seem rash, but if you’re a large site with a huge number of very similar or low-content listings, you might want to consider this. There is no common standard, though: some sites, like Indeed, choose to no-index all their job adverts, whereas others, like Craigslist, index all their individual listings because they drive long-tail traffic.
Don’t de-index them all lightly!
3. Constantly expiring content
Our third and final problem is that user-generated content doesn’t last forever. Particularly on listings sites, it’s constantly expiring and changing.
For most use cases I’d recommend 301’ing expired content to a relevant category page, with a message triggered by the redirect notifying the user of why they’ve been redirected. It typically comes out as the best combination of search and UX.
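Here is one way to wire up the redirect and its message, sketched as framework-agnostic functions. The short-lived flag cookie is an assumption of this sketch (the cookie name is hypothetical), chosen so the message doesn’t create a parameterised duplicate of the category page the way a query-string flag would.

```python
from http.cookies import SimpleCookie
from typing import Optional

NOTICE_COOKIE = "expired_listing"  # hypothetical cookie name


def redirect_expired_listing(category_url: str) -> tuple[int, dict[str, str]]:
    """301 an expired listing to its category page and set a short-lived flag cookie."""
    return 301, {
        "Location": category_url,
        "Set-Cookie": f"{NOTICE_COOKIE}=1; Max-Age=60; Path=/",
    }


def expired_notice(cookie_header: str) -> Optional[str]:
    """On the category page, return a message if the redirect set the flag cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if NOTICE_COOKIE in cookies and cookies[NOTICE_COOKIE].value == "1":
        return "That listing has expired, so we've brought you to similar ones."
    return None


status, headers = redirect_expired_listing("/jobs/china/")
print(status, headers["Location"])  # 301 /jobs/china/
```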
For more information or advice on how to deal with the edge cases, there’s a previous Moz blog post on how to deal with expired content which I think does an excellent job of covering this area.
In summary, if you’re working with listings sites, all three of the following need to be kept in mind:
- How are the landing pages generated? If they’re generated using free text or facets, have the potential problems been solved?
- Is user-generated content cannibalising the main landing pages?
- How has constantly expiring content been dealt with?
Good luck listing! If you’ve come across any other tricky problems or solutions while working on listings sites, let’s chat about them in the comments below.