Distributed file system
In computing, a distributed file system or network file system is any file system that allows access to files from multiple hosts via a computer network.[1] This makes it possible for multiple users on multiple machines to share files and storage resources.
The client nodes do not have direct access to the underlying block storage but interact over the network using a protocol. Depending on how the protocol is designed, access to the file system can be restricted by access lists or capabilities on the servers, the clients, or both.
In contrast, in a shared disk file system all nodes have equal access to the block storage where the file system is located. On these systems the access control must reside on the client.
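Because clients of a distributed file system never touch the block storage directly, access checks can live on the server, where clients cannot bypass them. A minimal sketch of such a server-side check, using an invented access list (the user names, paths, and `handle_request` function are hypothetical, not part of any real protocol):

```python
# Hypothetical access list: which operations each user may perform on each path.
ACL = {
    "alice": {"/shared/report.txt": {"read"}},
    "bob": {"/shared/report.txt": {"read", "write"}},
}

def handle_request(user, path, op):
    # Runs on the server side of the protocol; clients never see the
    # underlying block storage, so this check cannot be bypassed.
    allowed = ACL.get(user, {}).get(path, set())
    return "OK" if op in allowed else "EACCES"

print(handle_request("bob", "/shared/report.txt", "write"))    # OK
print(handle_request("alice", "/shared/report.txt", "write"))  # EACCES
```

In a shared-disk design, by contrast, every node can read the blocks directly, so an equivalent check would have to be enforced by each client.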
Distributed file systems may include facilities for transparent replication and fault tolerance. That is, when a limited number of nodes in a file system go offline, the system continues to work without any data loss.
The difference between a distributed file system and a distributed data store can be vague, but DFSes are generally geared towards use on local area networks.
History and examples
The first file servers were developed in the 1970s. In 1976 Digital Equipment Corporation created the File Access Listener (FAL), an implementation of the Data Access Protocol as part of DECnet Phase II which became the first widely used network file system. In 1985 Sun Microsystems created the file system called "Network File System" (NFS) which became the first widely used Internet Protocol based network file system. Other notable network file systems are Andrew File System (AFS), Apple Filing Protocol (AFP), NetWare Core Protocol (NCP), and Server Message Block (SMB) which is also known as Common Internet File System (CIFS).
Transparency
Transparency is usually built into distributed file systems so that files accessed over the network can be treated the same as files on local disk by programs and users. The multiplicity and dispersion of servers and storage devices are thus made invisible. It is up to the network file system to locate the files and to arrange for the transport of the data.
- Access transparency: Clients are unaware that files are distributed and can access them in the same way as local files.
- Location transparency: A consistent name space encompasses both local and remote files. The name of a file does not reveal its location.
- Concurrency transparency: All clients have the same view of the state of the file system. If one process is modifying a file, any other processes on the same system or on remote systems that access the file see the modifications in a coherent manner.
- Failure transparency: Clients and client programs should operate correctly after a server failure.
- Heterogeneity: File service should be provided across different hardware and operating system platforms.
- Scalability: The file system should work well in small environments (one machine, a dozen machines) and also scale gracefully to huge ones (hundreds through tens of thousands of systems).
- Replication transparency: To support scalability, files may be replicated across multiple servers. Clients should be unaware of this.
- Migration transparency: Files should be able to move between servers without the client's knowledge.
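Access and location transparency together mean that a program uses the same system calls for remote files as for local ones; only the mount point in the path differs. A small illustration (the `/mnt/nfs` mount point is hypothetical; the demonstration uses a local temporary file):

```python
import os
import tempfile

def read_file(path):
    # The call is identical for local and remote files: with a network
    # file system mounted at, say, /mnt/nfs (hypothetical), a program
    # would open "/mnt/nfs/shared/notes.txt" exactly like a local path.
    with open(path) as f:
        return f.read()

# Demonstrated here with a local temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
    path = tmp.name

print(read_file(path))  # hello
os.unlink(path)
```

The file system, not the application, is responsible for locating the file and transporting its data.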
Performance
A common performance measurement of a network file system is the amount of time needed to satisfy service requests. In conventional systems, this time consists of a disk-access time and a small amount of CPU-processing time. But in a network file system, a remote access has additional overhead due to the distributed structure. This includes the time to deliver the request to a server, the time to deliver the response to the client, and for each direction, a CPU overhead of running the communication protocol software. The performance of a network file system can be viewed as one dimension of its transparency; to be fully equivalent, it would need to be comparable to that of a local disk.
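The service time described above can be measured directly by timing a read end to end. A minimal sketch (the `timed_read` helper is illustrative, not a standard benchmark):

```python
import os
import tempfile
import time

def timed_read(path, size=4096):
    # Time one read request end to end. On a local disk this is mostly
    # disk access plus a little CPU; on a network file system it also
    # includes request/response transfer and protocol processing.
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read(size)
    return data, time.perf_counter() - start

# Demonstrated with a local temporary file; pointing `path` at a file
# on a network mount would capture the extra network overhead.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 8192)
    path = tmp.name

data, elapsed = timed_read(path)
print(len(data), f"{elapsed:.6f}s")
os.unlink(path)
```

Comparing such timings for a local path and a network-mounted path gives a rough view of the remote-access overhead.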
Concurrent file updates
Concurrency control becomes an issue when more than one person or client accesses the same file and wants to update it. Updates to the file from one client should not interfere with access and updates from other clients. Concurrency control or locking may either be built into the file system or provided by an add-on protocol.
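On POSIX systems, one form such locking can take is an advisory file lock that each writer acquires before updating. A minimal sketch (the `append_line` helper is illustrative; real network file systems may provide locking through an add-on protocol rather than local `flock`):

```python
import fcntl
import os
import tempfile

def append_line(path, line):
    # Take an exclusive advisory lock before updating, so concurrent
    # writers' updates do not interleave. This only works if all
    # clients cooperate by taking the same lock.
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

fd, path = tempfile.mkstemp()
os.close(fd)
append_line(path, "client A update")
append_line(path, "client B update")
print(open(path).read())
os.unlink(path)
```

Whether the lock is honored across machines depends on the file system and protocol in use; purely local advisory locks are not always propagated over the network.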
See also
- Clustered file system
- Disk sharing
- Distributed data store
- Global file system
- Gopher (protocol)
- List of distributed file systems
- CacheFS
References
- ^ Silberschatz, Abraham; Galvin, Peter (1994). Operating System Concepts, Chapter 17: Distributed File Systems. Addison-Wesley. ISBN 0-201-59292-4.
External links
- A Distributed File System for Distributed Conferencing System ("A DFS for the DCS") by Philip S. Yeager. Thesis, University of Florida, 2003. (PDF)