Abstract
We analyze the persistence of information on the web, looking
at the percentage of invalid URLs contained in academic articles
within the CiteSeer (ResearchIndex) database. The average number of
URLs per paper increased from 0.06 in 1993 to 1.6 in 1999. We
found that a significant percentage of URLs are now invalid,
ranging from 23% for 1999 articles to 53% for 1994 articles.
We also found that for almost all of the invalid
URLs, it was possible to locate the information (or highly related
information) in an alternate location, primarily with the use of
search engines. However, the ability to relocate missing information
varied according to search experience and effort expended.
Citation practices suggest that more information may be lost in the
future unless these practices are improved. We discuss persistent
URL standards and their usage, and give recommendations for
citing URLs in research articles as well as for finding the new
location of invalid URLs.
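
The per-year invalid-URL percentages reported above could be tallied along the following lines. This is only a minimal sketch, not the authors' method: the function name `invalid_rate_by_year`, the `(year, url, is_valid)` record layout, and the sample data are all hypothetical, and how a URL is judged valid is left outside the sketch.

```python
from collections import defaultdict


def invalid_rate_by_year(checks):
    """Return {publication_year: percent of URLs that are invalid}.

    `checks` is an iterable of (publication_year, url, is_valid) tuples.
    This record layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    invalid = defaultdict(int)
    for year, _url, is_valid in checks:
        totals[year] += 1
        if not is_valid:
            invalid[year] += 1
    # Percentage of invalid URLs among all URLs cited in that year's articles.
    return {year: 100.0 * invalid[year] / totals[year] for year in sorted(totals)}


# Made-up sample records, not data from the study.
sample = [
    (1994, "http://example.org/a", False),
    (1994, "http://example.org/b", True),
    (1999, "http://example.org/c", True),
    (1999, "http://example.org/d", True),
    (1999, "http://example.org/e", False),
    (1999, "http://example.org/f", True),
]

rates = invalid_rate_by_year(sample)
for year, pct in rates.items():
    print(f"{year}: {pct:.0f}% invalid")
```

Grouping by the publication year of the citing article, rather than by the date of the URL check, matches how the figures above are reported.
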
Original language | English |
---|---|
Title of host publication | CIKM 2000 |
Publication date | 2000 |
Publication status | Published - 2000 |
Event | Ninth International Conference on Information and Knowledge Management, McLean, United States (6 Nov 2000 → 11 Nov 2000) |
Conference
Conference | Ninth International Conference on Information and Knowledge Management |
---|---|
Country/Territory | United States |
City | McLean |
Period | 6 Nov 2000 → 11 Nov 2000 |