Deep Linking from RSS
One of FeedJournal's more unusual, and perhaps controversial, features
is that it can filter out the meat of an article published on the web.
How does it accomplish this? FeedJournal has four ways of retrieving the
actual content for the next issue.
In the trivial case, a site (like this blog, for
example) decides to include the full article text within its RSS feed.
FeedJournal simply publishes the content; no surprises here. By the way,
this is how all standard RSS aggregators work. The problem arises when a
site decides to publish only summaries or teasers of the full article
text. FeedJournal needs to deal with this because it is an offline RSS
reader: users cannot click on their printed newspaper to read the full article.
The <link> tag inside the RSS feed specifies
the URL for the full article. If the feed only includes summaries of
the full articles, FeedJournal retrieves the text from this URL.
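As a rough sketch of this step (FeedJournal's own implementation is not shown here, and the feed content is made up), here is how an aggregator might pull the article URL out of an RSS item:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 snippet with a summary-only item.
rss = """<rss version="2.0"><channel>
<item>
  <title>Sample article</title>
  <link>http://example.com/articles/42.html</link>
  <description>Just a teaser...</description>
</item>
</channel></rss>"""

root = ET.fromstring(rss)
for item in root.iter("item"):
    # The <link> element carries the URL of the full article;
    # a real aggregator would fetch this URL next.
    url = item.findtext("link")
    print(url)
```

A real reader would then download the page at that URL, which is exactly where the problems described next begin.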
In most cases, just following this link is not
a good solution. The web page typically includes lots of irrelevant
content, like a navigation menu, a blogroll, or other articles.
FeedJournal lets the user write a regular expression for each feed,
automatically rewriting the article’s URL to the URL of the
printer-friendly version. As an example, the URL to a full article in
the International Herald Tribune is
while the link to the printer-friendly version is
By inserting bin/print_ipub.php?file=/ in the middle of
the URL, we reach the printer-friendly article. This version is much
more suitable for publishing in FeedJournal, because it more or less
contains only the meat of the article.
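The rewrite described above can be sketched with a regular expression substitution. The pattern, replacement, and URLs below are invented for illustration; FeedJournal's actual per-feed configuration format is not shown here:

```python
import re

# Hypothetical rule: insert "bin/print_ipub.php?file=/" between the host
# and the article path, mirroring the printer-friendly pattern above.
pattern = r"^(http://www\.example\.com/)(articles/.+)$"
replacement = r"\1bin/print_ipub.php?file=/\2"

article_url = "http://www.example.com/articles/2006/11/story.html"
print_url = re.sub(pattern, replacement, article_url)
print(print_url)
# http://www.example.com/bin/print_ipub.php?file=/articles/2006/11/story.html
```

Because the rule is just a pattern and a replacement, one expression written once per feed rewrites every future article URL from that feed automatically.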
“More or less”, I said in the last sentence.
There are usually some unwanted elements left in the printer-friendly
version, like a header and a footer. These can be filtered out by
letting FeedJournal begin the article after a specified substring in the
HTML document source. Likewise, another substring can be selected to
mark the end of the relevant content.
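The trimming step amounts to keeping only the text between two per-feed marker strings. A minimal sketch (the markers and page content are hypothetical, and real HTML would need more robust handling):

```python
def scoop(html, start_marker, end_marker):
    """Return only the text between the two markers (exclusive)."""
    start = html.index(start_marker) + len(start_marker)
    end = html.index(end_marker, start)
    return html[start:end]

# Invented printer-friendly page with a header and footer around the article.
page = ("<html><div id='hdr'>site menu</div>"
        "<!--body-->The meat of the article.<!--/body-->"
        "<div id='ftr'>copyright footer</div></html>")

print(scoop(page, "<!--body-->", "<!--/body-->"))
# The meat of the article.
```

Like the URL rewrite, the two marker strings only have to be chosen once per feed, since a site's page template rarely changes.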
By applying these functions it is possible to scoop, or extract, the
meat of almost any web-published article. Of course, this only needs
to be done once for every feed. To my knowledge, FeedJournal is the only
aggregator that offers the functionality described in the last three paragraphs.
Is this legal, you ask? Wouldn’t a site owner want each user to
actually visit the web site to read the content and click on all those
fancy ads sprinkled all over? Well, my stance is that if the content is
freely available on the web, I am free to do whatever I want with it for
my own purposes. Keep in mind that we are not actually republishing the
site’s content, we are only filtering it for our own use. Essentially, I
think of this as a pop-up or ad blocker running in your browser.
What is interesting to note is that some web sites have tried to include
in their copyright notice a paragraph limiting the usage of their
content. Digg.com, for example, initially had a clause in their
copyright notice effectively prohibiting RSS aggregators from using their RSS
feeds! Today, that clause has been removed.
As long as FeedJournal is used for personal use, and the issues are not
sold or made available publicly, I do not see any legal problems with
the deep linking.