Operating system/File Systems
File System History
Basics and Terminology
"File Header to Block" Abstraction
"File Name to File Header" Abstraction
Caching, as a concept, refers to using a faster (though usually smaller) storage medium to hold copies of data from a slower medium so that the data can be accessed more quickly. The trade-off is reduced reliability of data: a cached copy can become out of date, or a modified copy can be lost before it is written back to the slower medium. A well-designed caching algorithm can largely mitigate this disadvantage.
Although the hardware cache sits within or alongside the processor in a computer system, in theory any storage medium can serve as a cache for a slower storage medium. The hardware cache itself is slower than the general-purpose registers but faster than main memory, so it serves as a cache for main memory. When larger amounts of data from secondary storage need to be cached, main memory can be used in turn.
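To make the idea concrete, here is a minimal sketch of main memory acting as a cache for a slower medium. The names `SlowStore` and `BlockCache` are hypothetical, and the least-recently-used eviction policy is an assumption chosen for illustration; the text above does not prescribe any particular policy.

```python
from collections import OrderedDict


class SlowStore:
    """Stands in for a slower medium (e.g. a disk); counts how often it is read."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0

    def read(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


class BlockCache:
    """Small in-memory cache in front of a SlowStore (hypothetical, LRU eviction)."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.entries = OrderedDict()  # block_no -> data, least recently used first

    def read(self, block_no):
        if block_no in self.entries:
            # Hit: serve from the fast medium and mark as most recently used.
            self.entries.move_to_end(block_no)
            return self.entries[block_no]
        # Miss: fetch from the slow medium and keep a copy.
        data = self.store.read(block_no)
        self.entries[block_no] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return data


disk = SlowStore({n: f"block-{n}" for n in range(8)})
block_cache = BlockCache(disk, capacity=2)
for n in [0, 1, 0, 0, 2, 1]:
    block_cache.read(n)
print(disk.reads)  # 4 slow reads for 6 requests
```

With a capacity of two blocks, the repeated requests for block 0 are served from memory, while the final request for block 1 misses because block 1 was evicted when block 2 arrived. The benefit of a cache therefore depends heavily on how well its size and policy match the access pattern.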
A new type of caching came into play with the advent of the World Wide Web. A server machine (e.g. a web server or FTP server) could now act as a "virtual" storage medium. In this case, a secondary storage medium, which presumably provides faster access than a TCP/IP network connection, can serve as a cache for data acquired over the network (e.g. HTML and images in the case of a web server).
Consistency and coherency are important considerations in the design of a cache system. The principle is to keep data consistent across the storage hierarchy (from registers to network servers) in order to avoid erratic computations. The problem is commonly seen in web browsers, where it rarely has serious consequences: you will often need to refresh the page or clear the cache to see the most recent images on screen.
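The stale-copy problem can be reproduced in a few lines. The sketch below is illustrative only: `Backing` and `WriteThroughCache` are hypothetical names, and write-through plus explicit invalidation is just one of several ways to keep copies consistent.

```python
class Backing:
    """The slower, authoritative copy of the data (e.g. a remote server)."""

    def __init__(self):
        self.data = {}


class WriteThroughCache:
    """Hypothetical cache that forwards every write to the backing store."""

    def __init__(self, backing):
        self.backing = backing
        self.copies = {}

    def read(self, key):
        if key not in self.copies:
            self.copies[key] = self.backing.data[key]  # fill on miss
        return self.copies[key]

    def write(self, key, value):
        self.copies[key] = value
        self.backing.data[key] = value  # write-through: both copies stay equal

    def invalidate(self, key):
        self.copies.pop(key, None)  # forget the copy; the next read refetches


server = Backing()
server.data["logo"] = "v1"
browser = WriteThroughCache(server)

first = browser.read("logo")   # copy is cached
server.data["logo"] = "v2"     # changed behind the cache's back
stale = browser.read("logo")   # still the old copy
browser.invalidate("logo")     # comparable to clearing the browser cache
fresh = browser.read("logo")   # refetched from the server
print(first, stale, fresh)     # v1 v1 v2
```

Writes that pass through the cache never create an inconsistency here; the stale read appears only when the backing store is modified by someone else, which is the situation a web browser faces.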
Transactions and Journaling Filesystems
File Structure, Disk Scheduling
File: a named collection of bytes stored on disk. From the OS's standpoint, a file consists of a set of blocks stored on the device. The programmer may see a different interface (e.g., records), but this does not matter to the file system, which simply packs bytes into disk blocks and unpacks them again on reading.
Common addressing patterns:
• Sequential: information is processed in order, one piece after the other. This is a common mode: e.g. an editor writes out a new file, a compiler compiles it, etc.
• Random access: any block in the file can be addressed directly without passing through its predecessors. E.g. the data set for demand paging, databases.
• Content based: search for blocks with particular values, e.g. hash table, associative database, dictionary. Not provided by all operating systems.
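The three addressing patterns above can be sketched against an ordinary file. The block size of 16 bytes and the helper names are assumptions made for this example. Content-based lookup is emulated here by a linear scan, whereas a system that really supports it would normally keep an index.

```python
import os
import tempfile

BLOCK_SIZE = 16  # hypothetical block size, kept small for illustration


def make_file(path, n_blocks):
    """Write n_blocks fixed-size blocks, each holding a short padded record."""
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(f"record {i:02d}".encode().ljust(BLOCK_SIZE, b"."))


def read_sequential(path):
    """Sequential access: read every block in order, first to last."""
    blocks = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            blocks.append(block)
    return blocks


def read_random(path, block_no):
    """Random access: seek straight to one block, skipping its predecessors."""
    with open(path, "rb") as f:
        f.seek(block_no * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


def find_by_content(path, needle):
    """Content-based access, emulated by scanning for blocks containing needle."""
    return [i for i, b in enumerate(read_sequential(path)) if needle in b]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dat")
    make_file(path, 5)
    all_blocks = read_sequential(path)
    third = read_random(path, 3)
    hits = find_by_content(path, b"record 04")

print(len(all_blocks), third, hits)
```

Note that `read_random` computes a byte offset from the block number, which is only possible because every block has the same size.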
Distributed File Systems
Distributed Systems Basics
RPC: Remote Procedure Call