Linking Policies

Molly Holzschlag and Cory Doctorow discussed the idea of linking policies recently. I actually wrote a short paper on this in school a few months ago, and I’m glad that web sites that try to restrict links to their site are being publicly mocked and brought to task for making such egregiously stupid and misleading statements. Cory stated:

“The Web exists because no one has the right to grant or withhold permission for links. Fast Company exists because of the Web. Accordingly, we neither grant nor deny permission to link to our site, and urge you to do the same.”

Linking policies are bunk. The Web is built on the idea that everything can link to everything else freely. Tim Berners-Lee wrote about this on the W3C site seven years ago, explaining how linking prohibitions violate the fundamental idea of the World Wide Web.

The following is my paper on Deep Linking and link policies written in May of 2004.

Is deep linking (creating hyperlinks that bypass a web site’s home page in favor of directly accessing desired content) a violation of copyright, a misleading or deceptive practice, or otherwise illegal or wrong? In recent years a number of high-profile cases have brought deep linking to popular attention around the world. Major news sites in Denmark, Germany, and the United States have brought suit to stop other web sites from linking to pages other than their homepage, or in some cases from linking to their site at all.

There are a number of ethical issues related to deep linking. However, many of the supposedly ethical objections to it are in fact simply technical or business matters, with no real ethical principle at stake. This is important to keep in mind when analyzing the issue.

What are valid arguments against deep linking? By linking directly to content on another site and using misleading language in and around the link, a web site may be misrepresenting the relationship between the two sites. A competitor may be misrepresented as a partner or affiliate. The text or images used in conjunction with the link could falsely lead visitors to believe that a disreputable linker is associated with or endorsed by the site that is linked to. Such a misrepresentation can seriously damage a reputation.

Next, there is the issue of spiders and bots that scan a server and create links to the content they find. These are not human visitors who may be potential customers; they are scripts that scan and extract content from another web site. Some web site owners hold that this illegally trespasses on their servers and steals processor time and bandwidth (a similar argument is made against spam). Since stealing is generally regarded as wrong, this is an interesting, potentially persuasive argument against all automated accessing of web content.

The World Wide Web is based on the concept of hyperlinking. Tim Berners-Lee, inventor of the WWW, discussed the idea of linking on the World Wide Web Consortium (W3C) site:

The intention in the design of the web was that normal links should simply be references, with no implied meaning. A normal hypertext link does NOT necessarily imply that
- One document endorses the other; or that
- One document is created by the same person as the other, or that
- One document is to be considered part of another. 1

Additionally, he points out that the windowed interface system, wherein new pages either replace the current one or open in a new window, reinforces the perception that following links leads to a new, distinct place. As a result, users of modern web browsers are unlikely to follow a link to a new web site without understanding that it is distinct from the previous site. Further, since most users run graphical browsers, the visual design of each site is likely to be distinct, further reducing the chance of mistaking two separate sites for one.

There is still, however, the issue of the text and images surrounding a link causing confusion and damaging the site that is linked to. There is a big difference between “Our services are endorsed by Company X” and “You may also be interested in information found at Company X”. The context in which a link is placed can be misleading and may violate libel or other laws that would be equally applicable offline.

Links are very easy to create: no permission is required for one to work, and they can be placed almost anywhere on a page. There is no effective way to stop someone from creating links to your site or to any page on it, which leaves little recourse for those who want to stop others from linking to their site. However, there are a number of simple and effective methods for redirecting visitors who follow such links back to the home page.

Another issue worth mentioning is that of search engines and other sites using bots and spiders to find content on other servers, which is then displayed with links to the pages. Accessing and searching the servers uses processing power and bandwidth, and some claim this is an unauthorized trespass amounting to stealing. This is a grey area and little has been resolved. However, by placing a robots.txt file that denies access to these search scripts, a site can spend less than a kilobyte of bandwidth denying access and keep any reputable crawler off its servers. Password protection and user authentication can further block access to pages within a site.
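As a sketch of how little this costs, a robots.txt file that turns away all compliant crawlers is only two lines; it lives at the web root (the paths here are illustrative):

```
User-agent: *
Disallow: /
```

A crawler that honors the file requests robots.txt once, sees that the whole site is off limits, and fetches nothing else, which is why the bandwidth spent is well under a kilobyte.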

Almost everyone who uses a computer, and some who don’t, are affected by this issue. If linking without permission were banned, it would destroy the web as we know it. Isolated sites that are impossible to find would be the norm. The difficulty of negotiating linking agreements with billions of sites would destroy search engines such as Google and render directories largely ineffective as well. The World Wide Web would become untangled and unconnected, which could deprive billions worldwide of access to important information. Losing the dynamic interconnectivity of the Web would further the cyber-balkanization that David Siegel documented 2.

Now let’s look more carefully at the major groups of stakeholders in this issue. First are the owners and operators of the web sites being linked to. They have an interest in protecting the content they worked hard to publish to the web, their reputations, and the server cycles and bandwidth they must pay for. This is the group that objects to deep linking, yet it is also the group that wants deep linking preserved so that its sites stay connected to other useful, related web sites. There is an odd internal conflict in this group. For instance, Ticketmaster objected to others linking directly to pages within its site, yet it uses deep links to provide more information on performers and venues, as shown in the screenshot 3.

Ticketmaster Deep Linking to another site

The next group that should be considered is the visitors to web sites. Billions of people now use the Internet regularly, so this group is of some small importance when considering the way the Web operates. Visitors are generally not interested in a site on its own; visiting it is part of a broader information gathering or disseminating activity. These activities tend to involve a little work at a lot of locations. Bates’ berrypicking model of information gathering shows people moving from site to site, gathering and processing small bits of information to meet a larger information need. Providing useful links to other web sites is a service that aids visitors in this task.

We have already seen that the legal issues surrounding deep linking are poorly defined in most discussions of the subject. There is the idea that spiders and bots trespass when they use server resources, which sits oddly with a web site open to the public. There is also the more nebulous yet historically accepted idea of protecting trademarks and reputations, which can be damaged when a disreputable site misrepresents its connection to a reputable one through links. However, in court cases in the United States and Europe over the last few years, judges have consistently ruled that deep linking in general is not illegal and should not be restricted or prohibited. Only in special cases, where deceptive labeling and context cause real damage, might it violate some law.

The first response to unwanted deep linking should be to contact the owner or operator of the web site that features the links. Ask that the links either be removed or changed to point to the homepage, and that any misleading text be changed to accurately reflect the relationship between the two sites.

If a polite request does not work, there are many simple technological solutions that can keep the links from working. These include a script that checks the referrer when a web page is requested. Using either a whitelist or a blacklist to filter referrers, visitors arriving from a denied referrer can be redirected invisibly to the homepage or to a page explaining the redirection. This can be done in a number of ways, using a variety of scripting languages or other techniques, and can be implemented at minimal cost (certainly a fraction of any court costs). These techniques can restrict access to any or all of the public pages on a web site. For private pages, the referrer check can be combined with user authentication (a username and password) and SSL or other secure connections to further restrict access.
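A minimal sketch of such a referrer check, here in Python; the blocked domain name is a hypothetical placeholder, and a real deployment would hook this into the web server or application framework:

```python
from urllib.parse import urlparse

# Hypothetical blacklist of referring hosts whose deep links we refuse.
BLOCKED_REFERRERS = {"deceptive-linker.example"}

def resolve_target(referrer_header, requested_path):
    """Return the path to serve: the requested page, or the homepage
    if the visitor arrived via a blacklisted referring site."""
    host = urlparse(referrer_header or "").hostname or ""
    if host in BLOCKED_REFERRERS:
        return "/"          # invisibly send deep-link visitors to the homepage
    return requested_path   # serve the requested page normally

print(resolve_target("http://deceptive-linker.example/page", "/articles/a.html"))  # /
print(resolve_target("http://a-friendly-site.example/", "/articles/a.html"))       # /articles/a.html
```

A whitelist works the same way with the condition inverted: redirect unless the referring host is on the approved list. Note that the Referer header is sent voluntarily by browsers and can be absent or forged, which is why the paper pairs this check with authentication for genuinely private pages.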

Chris Sherman, Associate Editor at Search Engine Watch, wrote a scathing review of a Danish court decision critical of deep linking, and of the court’s arguments for its ruling. He followed the court’s argument out to its logical conclusion:

But why stop with the web? How about those sneaky academics, citing the work of fellow scholars with footnotes to specific articles using exact page numbers in the journals that published them? And just think of the worst offenders of all — librarians, who not only help patrons find books, magazines and other materials but often even show them where to find specific information within the works? 4

His point is a good one, even if it sounds a bit far-fetched. URLs and descriptions of what is found on a particular web page are facts. They are therefore no more protected by copyright than highway signs are. Just as the information in a card catalog or a bibliography does not violate the spirit of copyright law, most deep links simply point to and describe publicly available information. So even when a court mistakenly rules against deep linking, such a ruling is clearly not in line with the spirit of laws meant to protect the investment of content creators.

As for those who would create deep links on their own sites, how should they respond to a request that a deep link be removed? They must first ask whether they have been using the link to mislead visitors. If so, they should either remove the link or change the context in which it appears. Next, they should weigh whether the link is worth the trouble of a potentially spurious lawsuit from the site they are linking to: legal fees should be balanced against any losses from removing the link and any principles compromised by removing it. Finally, they can choose to comply with the request, work out a deal, or decline. Whatever response is chosen, it should be the result of conscious deliberation and an understanding of the potential consequences of compliance or refusal.

The deep links we are discussing point to information voluntarily placed on a public web server, with no restrictions on who can browse to the URLs. In this regard, objecting to deep linking is rather like placing a sign on your front lawn and complaining when people driving by look at it. Because the information is put out voluntarily and is open to all comers, the Fair Information Use principles do not apply all that clearly. It is hard to argue that only certain pre-approved people should be allowed to point to public web pages. It can be argued that web pages are like personal information, which even if known should not be revealed without express permission; but this is certainly not in line with the fundamental principles the WWW was founded on, nor is it in the interest of a dynamic, growing online environment. Carrying out such a restrictive policy would effectively balkanize the Web by removing the ability to freely search, index, and point to other web sites.

Deep linking in general is not particularly tied to the cognitive or affective development of the individuals or groups involved. In the vast majority of cases where deep linking occurs it is accepted as part of the inherent functionality of the Web, as it was meant to be. This indicates that most individuals are operating above the level where they feel they must protect even their public domain from unlicensed linking.

There are some exceptional cases where these developmental sequences may aid in understanding the actions of the parties involved and the proper methods of responding to them. In the case of unscrupulous individuals who use deep links to misrepresent their relationship with the sites they link to, there are real ethical and developmental issues at play. These could range from individuals who are unable to restrain themselves from creating inappropriate links to those who have carefully planned their actions and are deep linking as part of a concerted campaign of lies and misrepresentation. Each case must be handled individually and involves individuals and organizations operating at varied developmental levels.

There is also the case of those who object to deep links to their sites. Some object on reasoned principles and legal arguments, and they are operating at a high cognitive and affective level; even if they are wrong, or you disagree with their conclusions, they are still operating at a high level. Others, however, operate at a much lower level, and their objections are based on perceived intrusions into their “territory” or other egocentric principles. These are the ones more likely to sue first and ask questions later. Sometimes they are individuals who want to strictly control who links to their site, and at other times an organization with the same desire, but either way their objections are knee-jerk reactions rather than reasoned arguments.

There are some very basic principles, like prohibitions on stealing and lying, at play in deep linking. The question of whether using spiders and bots to scan and index the contents of web servers is stealing or trespassing is extremely important and hard to settle. In a traditional context it might be considered trespassing, but an offline analogy makes it look different. Consider a door-to-door salesperson: when they walk up to your door they are technically trespassing, but if they see the “No Solicitors” sign on your front door and immediately move on, no court in the land would charge them with trespassing. Similarly, if a well-behaved spider sees a robots.txt file and immediately stops scanning the server, it seems reasonable to say it has behaved appropriately and broken no laws.
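That well-behaved-spider check can be seen in miniature with Python’s standard urllib.robotparser; the rules and URLs below are illustrative, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# A reputable crawler parses the site's robots.txt before fetching pages.
# parse() accepts the file body as a list of lines, so we can demonstrate
# the check without making a network request.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# The crawler asks permission per URL, like the salesperson reading the sign.
print(rp.can_fetch("ExampleBot", "http://example.com/articles/deep-page.html"))  # True
print(rp.can_fetch("ExampleBot", "http://example.com/private/data.html"))        # False
```

A spider that honors the False answer and moves on has, by the analogy above, respected the “No Solicitors” sign.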

Next is the issue of placing links in a deceptive or misleading context. Using deep links to misrepresent the relationship (or lack thereof) between two web sites violates basic tenets of honesty and disclosure. This covers both claiming relationships that don’t exist and hiding relationships that do. The underlying principle is simple and applies just as well to deep links as to newspaper columns or book reviews: any relationship should be clearly defined and openly revealed when referring others to a web site, book, movie, or anything else. The Dutch Internet service provider Xs4all, which recently won a lawsuit brought by the Church of Scientology, pointed out in a corporate statement that “after all, a hyperlink is merely a road marker on the Internet, and it can therefore never be unlawful” 5.

In the end, there is very little legal or ethical ground to stand on when arguing against deep linking. Unless there is a clear effort to mislead visitors through deceptive images or language used in conjunction with a link, there is no strong argument against providing links to any public page on a web site. There may be sound objections to spiders and bots that ignore robots.txt files or meta tags telling them not to index pages or directories on a server, but this is a small part of the deep linking issue, and using it as an excuse to ban deep linking is akin to using spam as a reason to ban e-mail.

Deep linking is a generally accepted and approved part of the World Wide Web, and it is vital to the continued health and interconnectivity of the Web. There may always be organizations like Ticketmaster and the Church of Scientology, operating at a lower level of social understanding and affective development, that cannot cope with the free flow of publicly available information. Losing any bit of control over their “domain” is more than they can bear, and it leads them to ill-advised lawsuits (which they have consistently lost), empty threats, and hypocritical actions like their own deep linking.

With neither tradition, law, nor technology on the side of deep linking’s opponents, attempts to stop deep linking will continue to fail and to draw scorn from the general online community. Non-linking policies and linking agreements reflect a fundamental lack of understanding of the nature of the Web. Fortunately, most non-technical users of the Web grasp this intuitively, and judges have understood the logic and reasoning behind allowing deep linking. So, for the time being, it appears that deep linking is safe and the balkanization of the Web through the removal of links has been averted.

1. “Commentary on Web Architecture,” Tim Berners-Lee, World Wide Web Consortium, April 1997. Referenced March 10, 2004.

2. “The Balkanization of the Web,” David Siegel, 1996. Referenced March 26, 2004.

3. Screenshot from Ticketmaster. Referenced April 11, 2004.

4. “Deep Linking Lunacy,” Chris Sherman, July 9, 2002. Referenced April 2, 2004.

5. “Scientology loss keeps hyperlinks legal,” Matt Hines, CNET, September 8, 2003. Referenced March 22, 2004.

Kevin Hall