Effectiveness of caching in a distributed digital library system

J. Hollmann, Anders Ardø, P. Stenstrom

    Research output: Contribution to journal, Journal article, Research, peer-review

    Abstract

    Today, independent publishers offer digital libraries with full-text archives. To provide a single user interface to a large, geographically distributed set of such archives, the studied Article-Database-Service offers a consolidated interface. While this approach offers a tremendous functional advantage to the user, full-text download delays caused by the network and by queuing in servers make the user-perceived interactive performance poor. This paper studies how effectively articles can be cached at the client level as well as at intermediate points, namely the gateways that implement the interfaces to the many full-text archives. A central research question in this approach is: What is the nature of locality in the user access stream to such a digital library? Using simulations driven by access logs, it is shown that client-side caching can result in a 20% hit rate. Temporal locality is observable even at the gateway level, but published replacement algorithms are unable to exploit it. Additionally, spatial locality can be exploited by loading into the cache all articles in an issue, volume, or journal whenever a single article is accessed. However, our experiments showed that this improvement introduced considerable overhead. Finally, it is shown that the reason for this cache behavior is the long time distance between re-accesses, which makes caching largely infeasible.
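The trace-driven evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator: the LRU policy, the function names `simulate_lru` and `reaccess_distances`, and the toy trace are all assumptions made for illustration. The sketch replays a list of article identifiers through a fixed-size cache to obtain a hit rate, and measures how many requests separate repeated accesses to the same article.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity):
    """Replay an access trace through an LRU cache and return the hit rate.

    LRU is an assumed stand-in policy, not necessarily one the paper evaluated.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for article_id in trace:
        if article_id in cache:
            hits += 1
            cache.move_to_end(article_id)  # mark as most recently used
        else:
            cache[article_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(trace) if trace else 0.0


def reaccess_distances(trace):
    """Return, for each repeat access, the number of requests since the
    previous access to the same article."""
    last_seen = {}
    distances = []
    for position, article_id in enumerate(trace):
        if article_id in last_seen:
            distances.append(position - last_seen[article_id])
        last_seen[article_id] = position
    return distances


# Hypothetical toy trace; real input would be article IDs from access logs.
trace = ["a", "b", "a", "c", "d", "e", "b", "a"]
small_cache_hit_rate = simulate_lru(trace, capacity=2)
large_cache_hit_rate = simulate_lru(trace, capacity=8)
distances = reaccess_distances(trace)
```

In this toy trace, the two re-accesses that are five requests apart miss in the two-entry cache and hit only once the cache can hold every article, which mirrors the abstract's point that long distances between re-accesses limit what caching can achieve.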
    Original language: English
    Journal: Journal of Systems Architecture
    Volume: 53
    Issue number: 7
    Pages (from-to): 403-416
    ISSN: 1383-7621
    Publication status: Published - 2007

    Keywords

    • performance
    • H.3 information storage and retrieval
    • document caching
    • H.3.7 digital libraries

