7 reasons why I don't like content 'aggregators' who scrape blog sites
Today a post on Twitter drew my attention to Bioinfo-Bloggers, a site that aggregates content (i.e. the full blog post is reproduced) from 28 different bloggers who write about bioinformatics and genomics.
Outwardly, this might seem like a good idea. The bloggers get more exposure for their material, and readers can visit just one site instead of following 28 separate RSS feeds. However, there are several reasons why I have issues with this type of aggregation. Many of my concerns apply even when individual bloggers have expressly licensed their material for reuse (e.g. by use of a CC0 Creative Commons license).
- The site lists the 28 blogs as 'contributors' and lists the blog writers as 'authors'. This strongly suggests that the people in question have consented to their material being used, even when this is not the case.
- Links to the original blog posts are included, but only at the end of each reproduced entry. The included text says that 'This is a syndicated post', further suggesting that the original authors agreed to have their content syndicated.
- The Bioinfo-Bloggers website asserts copyright over all material (see footer section of website).
- The original bloggers lose web traffic. This can matter for minor reasons, such as when you want to include details of how popular your blog is in the outreach sections of research grants. But it also potentially (depending on how much traffic Bioinfo-Bloggers gets) deprives you of knowing who is looking at your content, which articles are more popular, etc.
- People don't get a chance to comment on your blog (unless they follow the links). You may lose some direct engagement with your readers.
- If people start using this site rather than viewing your blog, what happens if Bioinfo-Bloggers stops including your blog site, or shuts down altogether? In the former case, people might just assume you are not posting any more.
- What happens if Bioinfo-Bloggers starts including content from other blogs that you don't approve of? Your blog post may appear alongside another which espouses views you find offensive.
The first three points could easily be addressed by removing the claim of copyright over all material, by making it explicit that this site is just scraping other sites and that the original bloggers may not be aware of this, and by placing links to the original blog content at the top (not bottom) of each article.
There are some ongoing discussions about this on Twitter, e.g.
.@kbradnam *shrug* coming from open source, you sort of get used to stuff being used however. Relevance > details for me. YMMV!
— Titus Brown (@ctitusbrown) November 5, 2013
@BioMickWatson @kbradnam @ctitusbrown @BioinfoBloggers - lower left hand side "Copyright © 2013 Bioinfo-Bloggers. All Rights Reserved"
— Casey Bergman (@caseybergman) November 5, 2013