Occasionally, I'm working on a client site that requires some sort of "featured article" feature. The idea is that editors can flag a handful of content pieces as more important than the rest and bump them to the front of the queue in an otherwise chronological listing.

It's a great idea, but sometimes there's a minor detail in implementation that we miss: deduplication.


WordPress ships with a "sticky posts" feature that helps deal with featured content and handles deduplication for you. Editors can flag a post as sticky, and WordPress will pull it out of the chronological feed and place it at the top of the queue.

The catch: your first page of queried posts will now have n + m elements: the standard [cci]posts_per_page[/cci] amount (n) plus all of your sticky content (m). The advantage, though, is that your sticky posts won't show up twice as readers page through content.
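To make the n + m arithmetic concrete, here is a minimal sketch of the behavior as described above. It is a simplified model in Python, not WordPress's actual query code; the function name [cci]paginate_with_sticky[/cci] and the integer post IDs are hypothetical, chosen only for illustration.

```python
def paginate_with_sticky(posts, sticky_ids, posts_per_page, page):
    """Return the post IDs shown on a 1-indexed page.

    Simplified model (an assumption, not WordPress internals): sticky posts
    are pulled out of the chronological feed and prepended to page 1 only.
    """
    sticky = [p for p in posts if p in sticky_ids]
    regular = [p for p in posts if p not in sticky_ids]
    start = (page - 1) * posts_per_page
    chunk = regular[start:start + posts_per_page]
    return sticky + chunk if page == 1 else chunk


# Ten hypothetical posts, newest first; posts 4 and 9 are flagged sticky.
posts = list(range(1, 11))
sticky_ids = {4, 9}

page_one = paginate_with_sticky(posts, sticky_ids, posts_per_page=3, page=1)
page_two = paginate_with_sticky(posts, sticky_ids, posts_per_page=3, page=2)

print(page_one)  # [4, 9, 1, 2, 3] -> n + m = 3 + 2 items
print(page_two)  # [5, 6, 7]       -> back to n = 3 items
```

In this model no post appears twice, but page 1 is longer than every other page, which is the pagination problem in a nutshell.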

Sticky posts are great for featured content if you can solve the pagination problem.


If you roll your own featured post functionality, you can more easily solve the pagination problem ([cci]posts_per_page[/cci] is respected automatically), but you're left to work out how to de-duplicate your post content yourself. There isn't much of a problem with a post showing up on page 1 (featured) and then later on page 50 - readers don't often get that deep into pagination unless they're looking for something, and it's rarely the content they already saw on page 1.
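One way to picture the trade-off is the rough sketch below. It is written in Python purely to stay self-contained; [cci]build_page[/cci], its [cci]dedupe[/cci] flag, and the integer post IDs are all hypothetical. It assumes featured posts take up slots on page 1, so every page holds at most [cci]posts_per_page[/cci] items.

```python
def build_page(posts, featured_ids, posts_per_page, page, dedupe=True):
    """Return the post IDs for a 1-indexed page, featured posts first.

    Assumption: featured posts use up part of page 1's budget, so the page
    size never exceeds posts_per_page. With dedupe=True, the featured posts
    shown on page 1 are removed from the chronological feed.
    """
    featured = [p for p in posts if p in featured_ids][:posts_per_page]
    shown = set(featured)
    feed = [p for p in posts if not (dedupe and p in shown)]
    first_page_room = posts_per_page - len(featured)
    if page == 1:
        return featured + feed[:first_page_room]
    # Later pages continue where page 1's chronological slice stopped.
    start = first_page_room + (page - 2) * posts_per_page
    return feed[start:start + posts_per_page]


# Ten hypothetical posts, newest first; post 2 (new) and post 7 (older) are featured.
posts = list(range(1, 11))
featured_ids = {2, 7}

print(build_page(posts, featured_ids, 4, 1, dedupe=False))  # [2, 7, 1, 2]
print(build_page(posts, featured_ids, 4, 3, dedupe=False))  # [7, 8, 9, 10]
print(build_page(posts, featured_ids, 4, 1))                # [2, 7, 1, 3]
print(build_page(posts, featured_ids, 4, 2))                # [4, 5, 6, 8]
```

Without the exclusion, the recently published featured post (2) lands on page 1 twice, while the older one (7) only resurfaces on page 3. In WordPress itself, the [cci]post__not_in[/cci] query argument is one place where such an exclusion could be applied, though the right approach depends on how your featured flag is stored.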

But when a new post is also featured, the chances of it showing up twice within the first few pages of your chronological feed are high. You never want readers to see this, for example:

[Image: CNN Duplicate Article]

CNN wrote a great article about HP - but there's absolutely no reason to list the same coverage of the same story twice. The two links appear to point to different articles, given their differing images, update times, and Facebook share counts. In fact, they go through to nearly identical articles published one day apart. The first article was a solo post; the second is a collaboration that re-uses most of the first article's content verbatim.

This is an attempt by CNN to make readers feel that there's more coverage of an event than really exists.

The Consequence

A reader coming to your site is looking for unique content - your personal take on an event or idea, breaking coverage of some interesting news, or just prose that's different from what they've read elsewhere. Seeing the same content on pages 1 and 50 of your feed is disconcerting, if anyone notices at all.

Seeing duplicated content on page 1 of your site is frustrating and demonstrates a failure on your part to understand the goals and interests of your customers.

There is no faster way to drive readers away from your site than self-plagiarism. Pointing two different snippets (excerpts, links, etc.) to the same content is a fantastic way to make it appear like you have more content than you do. It's also a great way to tell your readers you care more about traffic than about providing value to them, a message they'll understand loud and clear when they leave you for another source of information.