
Talk:Record-oriented filesystem

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Guy Harris (talk | contribs) at 05:29, 26 September 2023 ("Advantages and costs" section needs work: I'm not certain the notion of a "record-oriented filesystem", as opposed to a record-oriented file access API, makes sense.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the project's importance scale.

I'm confused about this article, is it a FILESYSTEM (like FAT, ext2, ReiserFS) or a FILE-FORMAT (magic numbers, file extension) system? Improfane

A: It's not about a specific file system, but rather the whole class of filesystems that support record-oriented operation. The key point is that the system calls used to access files are designed to access records, rather than chunks of data read or written in application-specific formats. Most mainframe operating systems support a rich variety of record formats. Most commonly, records are fixed in length within any given file, or a file may have variable-length records. Unlike the stream-oriented approach found on systems like Unix, PC-DOS, Windows, and Mac, the data in the file is accessed strictly in terms of records. Variable-length records are preceded by a (usually) binary byte-count, and may contain any coded bytes at all, both binary and characters. There is no concept of an "end of line" delimiter, such as a carriage-return character.
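To make the byte-count idea concrete, here is a minimal sketch in Python. It does not reproduce any particular mainframe record format: the 2-byte big-endian count, the convention that the count covers only the payload, and the names `write_record` and `read_record` are all assumptions made for illustration.

```python
import io
import struct

# Assumption: a 2-byte big-endian count that covers only the payload.
# Real record formats differ in field width and in what the count includes.
LENGTH_PREFIX = struct.Struct(">H")


def write_record(stream, payload: bytes) -> None:
    """Write one record: a binary byte count followed by the raw payload."""
    stream.write(LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def read_record(stream):
    """Return the next record's payload, or None at end of file."""
    header = stream.read(LENGTH_PREFIX.size)
    if not header:
        return None
    if len(header) < LENGTH_PREFIX.size:
        raise ValueError("truncated length prefix")
    (length,) = LENGTH_PREFIX.unpack(header)
    payload = stream.read(length)
    if len(payload) < length:
        raise ValueError("truncated record")
    return payload


buf = io.BytesIO()
write_record(buf, b"first\nrecord")         # an embedded newline is just data
write_record(buf, bytes([0, 13, 10, 255]))  # so are NUL, CR, LF and 0xFF
buf.seek(0)
records = []
while (rec := read_record(buf)) is not None:
    records.append(rec)
```

Because the reader relies on the count alone, no byte value is reserved as a delimiter, which is the property described above.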

Some people, particularly Unix advocates, dismiss record-oriented file systems as being based on punched-card technology, and therefore presumably "old-fashioned." The Unix-like stream-oriented approach is modelled after another 19th century technology, that of the paper-tapes used by the printing telegraph, used to mechanize the transmission of telegrams. These started being used for computers in the form of Teletype machines used as inexpensive input devices by the mini-computers of the '60s and '70s.

For its part, the Hollerith punched card was at least originally conceived for computational purposes.

This article, it seems to me, was written by a Unix advocate who wished to diminish the advantages of record-oriented file access methods. It is clearly not NPOV. I plan to fix it, when I find time to address the matter properly.

--RussHolsclaw 04:23, 12 February 2006 (UTC)

I agree! Moreover, terminology: does an IBM mainframe OS even use a filesystem? You have VTOCs, catalogs, data sets, but file system? Never heard of that. Source needed. --Kubanczyk 09:05, 7 October 2007 (UTC)

IBM calls file systems Access Methods.


Paper tape as used for text message transmission actually contains individual records which are delimited by various control characters. Each line of text (aka record) is terminated by a carriage-return character (which sends the print head to the left) and a line-feed character, which rolls the paper platen up a line in position for the next line.

A better example of a datastream used in punched tape is in numerically controlled (NC) machine tools. These use a stream of commands to define which cutting tool to use, the starting position, subsequent points along the cutting path and other control information.

A record oriented file has several advantages. After a program writes a collection of data as a record the program that reads that record has the understanding of that data as a collection. Although it is permitted to read only the beginning of a record, the next sequential read returns the next collection of data (record) that the writer intended to be grouped together. Another advantage is that the record has a length and there is no restriction on the bit patterns composing the data record, i.e. there is no delimiter character.

There is a cost associated with record orientation. The length definition takes up space. On a magnetic tape that definition takes the form of an inter-record gap. On a disk a metadata area must be allocated. This is minimal in a file where all the records are the same length. On a file composed of varying length records a maximum record length is defined to determine the size of the length metadata associated with each record.

DGerman (talk) 01:09, 7 February 2008 (UTC)

After adding all this information in the discussion page, I decided it best to basically rewrite the article. I have saved the original article if anyone wants it. It is also available in the wiki history. Tired now. In the future I may locate and include some references. DGerman (talk) 02:15, 7 February 2008 (UTC)

Too specific

While it is true that current IBM mainframe operating systems have record-oriented file systems that do not use delimiter characters, that is not universally true. Even IBM used record delimiters on the 14xx/7010, and RCA used them on several different product lines. Shmuel (Seymour J.) Metz (talk) 19:04, 1 June 2010 (UTC)

"Advantages and costs" section needs work

The first problem is that the section doesn't clearly indicate with what a record-oriented file system is being compared.

Is it being compared to a byte-stream file system such as those offered by UN*Xes and Windows, where the lowest-level file system operations are "read N bytes from the current location and advance the current location pointer by N bytes", "write N bytes to the current location and advance the current location pointer by N bytes", "move the current location pointer to byte N", "adjust the current location pointer by N bytes, N being positive or negative", and "move the current location pointer to the current end of the file and then adjust it by N bytes, N being positive or negative" (possibly with an additional operation to set the file size to a specified number of bytes)?
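For reference, the byte-stream operations listed above map fairly directly onto Python's file-object methods. This is only an illustration on an in-memory `io.BytesIO` object standing in for an open file, not a claim about any particular OS's system calls.

```python
import io

f = io.BytesIO()          # stands in for an open file

f.write(b"ABCDEFGHIJ")    # write N bytes, advance the pointer by N
f.seek(2, io.SEEK_SET)    # move the pointer to byte 2
first = f.read(3)         # read 3 bytes (b"CDE"); the pointer is now 5
f.seek(-2, io.SEEK_CUR)   # adjust the pointer by -2; it is now 3
second = f.read(2)        # b"DE"
f.seek(-1, io.SEEK_END)   # go to the end of the file, then adjust by -1
last = f.read(1)          # b"J"
f.truncate(4)             # the optional operation: set the file size to 4 bytes
```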

Is it being compared with a block-array file system, where the lowest-level file system operations are "read from N blocks starting from block M" and "write to N blocks starting from block M", "block" here referring to some fixed physical block size, such as a "block" being a single disk sector? As I remember, the usual file system APIs of RSX-11M and VMS were record-oriented, but the layer offered by the file system code was more like a block array, with QIO calls to read from or write to a file. At least on VMS, user-mode code was required to go through RMS, while RMS, running in a more privileged mode, did file I/O in response to requests by making those QIO calls.
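A block-array interface of that kind might look roughly like the following sketch. The class `BlockArrayFile`, its method names, and the 512-byte block size are hypothetical; this is not the QIO interface of RSX-11M or VMS, just the two operations described above expressed in Python.

```python
import io

BLOCK_SIZE = 512  # assumption: one block is one 512-byte sector


class BlockArrayFile:
    """Hypothetical block-array view: all transfers are whole blocks."""

    def __init__(self, backing):
        self._backing = backing

    def read_blocks(self, start: int, count: int) -> bytes:
        """Read `count` blocks starting from block `start`."""
        self._backing.seek(start * BLOCK_SIZE)
        data = self._backing.read(count * BLOCK_SIZE)
        # Blocks past the end of the data read back as zeros in this sketch.
        return data.ljust(count * BLOCK_SIZE, b"\x00")

    def write_blocks(self, start: int, data: bytes) -> None:
        """Write whole blocks starting from block `start`."""
        if len(data) % BLOCK_SIZE != 0:
            raise ValueError("data must be a whole number of blocks")
        self._backing.seek(start * BLOCK_SIZE)
        self._backing.write(data)


disk_file = BlockArrayFile(io.BytesIO())
disk_file.write_blocks(0, b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
block_one = disk_file.read_blocks(1, 1)
```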

Or is it being compared to other file system types, or to more than one file system type?

If it's being compared to byte-stream file systems (as used on most desktop/notebook computers, most smartphones and tablets, and a lot of servers), then note that a byte-stream file can be structured as a sequence of records, and there are frequently libraries for OSes with byte-stream files that do so (sometimes called, for example, "ISAM packages"), so it's not clear how some of the points apply.
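As a small illustration of structuring a byte-stream file as a sequence of records, here is a sketch of fixed-length records accessed by record number. It is far simpler than a real ISAM package (there is no index), and the record size, the space padding, and the names `put_record` and `get_record` are assumptions.

```python
import io

RECORD_SIZE = 16  # assumption: every record is exactly 16 bytes


def put_record(stream, index: int, payload: bytes) -> None:
    """Store `payload` as record number `index`, padded with spaces."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload longer than the record size")
    stream.seek(index * RECORD_SIZE)
    stream.write(payload.ljust(RECORD_SIZE, b" "))


def get_record(stream, index: int) -> bytes:
    """Fetch record number `index` with one seek and one read."""
    stream.seek(index * RECORD_SIZE)
    record = stream.read(RECORD_SIZE)
    if len(record) != RECORD_SIZE:
        raise IndexError("no such record")
    return record


stream = io.BytesIO()
for i, name in enumerate([b"alpha", b"beta", b"gamma"]):
    put_record(stream, i, name)
middle = get_record(stream, 1)
```

The application calling `get_record` deals only in records; the seek arithmetic stays inside the library.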

A record oriented file has several advantages. After a program writes a collection of data as a record the program that reads that record has the understanding of those data as a collection.

What does it mean to "[have] the understanding of those data as a collection"? And, if the program that reads that record is doing so through a library that implements a record structure on a byte-stream file system, would that program also "[have] the understanding of those data as a collection"?

Often a file will contain several related records in sequence; after the program reads the beginning of the sequence, the next sequential read returns the next collection of data (record) that the writer intended to be grouped together.

That's the definition of a sequential read. Again, how is this different from a program using a record-oriented library for a byte-stream file?

Another advantage is that the record has a length and there is usually no restriction on the bit patterns composing the data record, i.e. there is no delimiter character.

Not all files in a byte-stream file system have delimiter characters. Text files typically do, but object files, executable image files, library files, database files, and many other file types do not. Many of them have structures in the file that are, in effect, records with a record length field in the record.

There is usually a cost associated with record oriented files. For fixed length records, some records may have unused space, while for variable length records the delimiter or length field takes up space. Variable length blocks may have overhead due to delimiters or length fields.

That would also apply to record structures atop a byte-oriented file system.

In addition, there is overhead imposed by the device. On a magnetic tape overhead typically takes the form of an inter-record gap.

That's a characteristic of a magnetic tape, not of a record-oriented file system; the only way to reduce that would be to accumulate many logical records in a physical record/block on the tape.
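The effect of accumulating logical records into larger physical blocks can be estimated with a little arithmetic. In this sketch the function name is hypothetical, and the figures used (80-byte records, 1600 bytes per inch, a 0.6-inch gap) are illustrative assumptions rather than the specification of any particular drive.

```python
def tape_utilization(record_bytes, records_per_block, bytes_per_inch, gap_inches):
    """Fraction of tape length holding data rather than inter-record gaps."""
    data_inches = record_bytes * records_per_block / bytes_per_inch
    return data_inches / (data_inches + gap_inches)


# Illustrative figures only: 80-byte records, 1600 bytes/inch, 0.6-inch gap.
unblocked = tape_utilization(80, 1, 1600, 0.6)   # one logical record per block
blocked = tape_utilization(80, 40, 1600, 0.6)    # forty logical records per block
```

With these assumed figures, unblocked records leave most of the tape as gap, while a blocking factor of 40 puts most of the tape to use.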

On a direct access device with fixed length sectors, there may be unused space in the last sector of a block.

That's true only if records aren't allowed to begin in the middle of a sector. Record-oriented file systems may choose to do so, so that they don't need to do the sort of buffering that byte-stream file systems do, but a library implementing records atop a byte-stream file system might also do so in order to avoid some of the buffering overhead.
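The difference between the two layouts is easy to quantify. A rough sketch, assuming 512-byte sectors and hypothetical helper names:

```python
SECTOR_SIZE = 512  # assumption: fixed 512-byte sectors


def sectors_and_slack(block_bytes):
    """Sectors needed for one block, and the unused bytes in its last sector."""
    sectors = -(-block_bytes // SECTOR_SIZE)  # ceiling division
    return sectors, sectors * SECTOR_SIZE - block_bytes


def space_used(block_bytes, count, sector_aligned):
    """Bytes of disk consumed by `count` blocks under either layout."""
    if sector_aligned:
        # Every block starts on a sector boundary, so each one carries slack.
        return count * sectors_and_slack(block_bytes)[0] * SECTOR_SIZE
    # Blocks are packed end to end; only the final sector can have slack.
    return -(-(block_bytes * count) // SECTOR_SIZE) * SECTOR_SIZE


per_block = sectors_and_slack(1300)                 # (3, 236)
aligned = space_used(1300, 10, sector_aligned=True)
packed = space_used(1300, 10, sector_aligned=False)
```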

On a direct access device with variable length physical records, that overhead typically takes the form of metadata and inter-record gaps.

True, although multiple logical records might be packed into a single physical record/block, just as on tape. This may be somewhat specific to S/360 and successors (and compatibles); minicomputers tended to use direct access devices with fixed block/sector sizes, as do personal computing devices and UN*X/Windows-based servers.

A major advantage of record-oriented file systems is that they abstract files kept on paper in earlier times. A record might contain data associated with a particular, e.g., building, contact, employee, part, venue.

Again, that's just a question of which software abstracts files; again, it's quite possible to implement records atop a byte-oriented - or block-array - file system.

A second motivator for the idea of record orientation is that it is in some sense the more natural orientation for persistent storage on a non-volatile but slow physical storage device. Most physical storage devices can communicate only in units of a block. Significant portions of modern operating system kernels and associated device drivers are devoted to hiding the naturally structured and delimited (and in some sense a block is just a physical record) nature of physical storage devices.

Operating system kernels, yes; the buffer caches of UN*Xes and Windows, and the per-open-file OS data structures that maintain the aforementioned current location pointer, do hide the block structure. However, given that records don't necessarily directly correspond to blocks, some code will have to hide the blocks, to some degree, from applications reading or writing records.

Associated device drivers, not really; they generally get "read from N sectors, starting at sector M of the disk" and "write to N sectors, starting at sector M of the disk" commands, with the - byte-stream, block-array, or record-oriented - file system code, some or all of which may be running in some privileged-mode section of the OS, translating block offsets within the file to physical sector numbers on the disk. (Or logical block numbers, if the disk itself maps logical block numbers to physical sector numbers; there may be further mapping with virtualized storage, etc.) Guy Harris (talk) 10:17, 23 September 2023 (UTC)
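The translation step mentioned above can be sketched with a simple extent list. This is a generic illustration, not the on-disk layout of any real file system; the `Extent` type, the function name, and the assumption that one file block equals one disk sector are all hypothetical.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    file_block: int   # first block of the file covered by this extent
    disk_sector: int  # sector on the disk where that block lives
    length: int       # number of contiguous blocks in the extent


def file_block_to_sector(extents, block):
    """Translate a block offset within a file to a sector number on the disk."""
    for extent in extents:
        if extent.file_block <= block < extent.file_block + extent.length:
            return extent.disk_sector + (block - extent.file_block)
    raise ValueError("block is not allocated")


# Assumption: one file block is one sector. The file occupies two separate
# runs of sectors on the disk.
extents = [Extent(0, 1000, 4), Extent(4, 5000, 2)]
sector = file_block_to_sector(extents, 5)
```

The driver would then see only a request for the resulting sector number, whichever kind of API the application used.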

I've always read it as comparing record-oriented file systems with byte oriented file systems; both are abstractions from the underlying hardware.
The reference to overhead is generic and the reference to tape is a specific example; on other devices the overhead takes other forms.
I believe that QIO in VMS is a low level interface used by higher levels of RMS, not an interface for normal applications.
It might be helpful to post a separate section for each issue. --Preceding unsigned comment added by Chatul (talk | contribs) 03:23, 26 September 2023 (UTC)
S/3x0 and z/Architecture, and the (non-UN*X) OSes running on them, are special cases, given CKD drives, as I think the record structures were designed around those drives. (Although I have the impression that there are no physical CKD drives any more, just a CKD drive abstraction implemented by firmware/software atop fixed-block drives.)
For systems with fixed-block drives, including DEC systems and the hardware atop which UN*X systems and Windows run, there's a block-array file system abstraction atop which the byte-stream or record abstraction is built. Record-oriented file systems support certain forms of structuring of the data in those blocks, while byte-stream file systems can write arbitrarily-structured binary data to those blocks - that includes structuring as fixed-length records, variable-length records, variable-length records with fixed control fields, and indexed versions of those structures.
So I see the main difference being that OSes with byte-stream file APIs provide a lower-level abstraction atop which record-oriented files can be implemented, whereas OSes with record-oriented APIs don't provide that lower-level abstraction. It's not as if the latter systems can provide facilities that the former can't also provide. I.e., it's not a question of the on-disk file system layers being different, except to the extent that the on-disk file system may provide a way to associate a record format and, for fixed-length records, a record size with a file, even though the on-disk file system provides a block-array abstraction. Instead, it's a question of what file access APIs are available; the notion of "record-oriented" vs. "byte-stream" is really above the file system when the file system is viewed as providing an abstraction of a file as an array of bytes with metadata.
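To illustrate the point that the record structure lives above the array of bytes, here is a sketch of a record reader that is driven entirely by per-file metadata. The metadata keys, the 2-byte length prefix, and the function name `iter_records` are assumptions for illustration; the dictionary merely stands in for whatever record-format attributes a file system might store with a file.

```python
import io
import struct


def iter_records(stream, metadata):
    """Yield records from a byte stream, as directed by per-file metadata."""
    if metadata["format"] == "fixed":
        size = metadata["record_size"]
        while chunk := stream.read(size):
            if len(chunk) < size:
                raise ValueError("truncated record")
            yield chunk
    elif metadata["format"] == "variable":
        # Assumption: each record carries a 2-byte big-endian length prefix.
        while header := stream.read(2):
            if len(header) < 2:
                raise ValueError("truncated length prefix")
            (length,) = struct.unpack(">H", header)
            payload = stream.read(length)
            if len(payload) < length:
                raise ValueError("truncated record")
            yield payload
    else:
        raise ValueError("unknown record format")


data = b"\x00\x02hi\x00\x03abc"  # the same nine bytes in both cases
as_variable = list(iter_records(io.BytesIO(data), {"format": "variable"}))
as_fixed = list(iter_records(io.BytesIO(data), {"format": "fixed", "record_size": 3}))
```

The stored bytes are identical in both calls; only the metadata handed to the library changes how they are divided into records.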
An OS with byte-stream oriented APIs may, or may not, provide a library that provides a record-oriented abstraction. A system with record-oriented APIs has the advantage that programmers who want to use record-structured files don't have to get a third-party library or write their own library, but "a system with record-oriented APIs" could be a system in which the lowest-level APIs available to programmers are byte-stream APIs and that includes a record-oriented library.
Yes, on VMS direct QIO access to files from user-mode code is, as far as I know, not allowed. On RSX-11M, a program can probably issue those QIOs (RSX-11M has to run on machines that have only kernel and user mode, and, as far as I know, didn't stuff RMS into supervisor mode if running on a machine with supervisor mode; RSX-11M Plus might have supported only machines with supervisor mode and may have put RMS in supervisor-mode code). However, I have the impression they're not recommended and not documented for use on files.
RMS does, according to VMS Software's RMS reference manual, support "block I/O", which appears to provide access to the block-array layer. However, if you do block writes to a file, RMS will mark the hints it maintains for the number of records in the file and the number of user data bytes in the file as being invalid. So VMS, at least, doesn't seem to offer a "pure" record-oriented abstraction.
So I'd say that "record-oriented" vs. "byte-stream" are, if you don't have hardware/firmware that enforces a certain organization of data more complicated than "fixed-size blocks", largely differences between APIs rather than between file systems, except to the extent that a file system provides metadata that implementations of record-oriented APIs might use.
For example, an ODS-5 VFS could probably be written for some UN*X, in which case byte streams would be read from or written to files. If the UN*X in question has an "extended attributes" API, it could allow reading and writing per-file metadata, and an RMS implementation could be written for that UN*X.
Similarly, it might be possible to write a VMS XQP or ACP (pluggable file system) that supported some UN*X file system that supported extended attributes, and RMS might be able to use that file system (UFS, HFS+, APFS, ZFS, ${pick_your_linux_file_system}, etc.) mostly or completely transparently. Guy Harris (talk) 05:28, 26 September 2023 (UTC)