OPERATING SYSTEMS AND FILE SYSTEMS

Your operating system is responsible for managing the stored data on your computer's nonvolatile storage media so that your programs can treat the data in a consistent way. In other words, the operating system hides the hardware details that are vastly different between a floppy disk, hard disk, and CD drive, and presents a consistent interface called a file system.

A file system supports locating data stored on the hardware by a directory system and file name. It also keeps track of handy bookkeeping information such as the file size, the date the file was created, the last time it was modified, and which users have permission to read and/or write data to the file.
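As a rough illustration of that bookkeeping information, the following Python sketch asks the operating system for a file's metadata. The helper name describe_file is made up for this example, and the fields reported vary by platform (creation time in particular is not available everywhere, so it is left out here).

```python
import os
import stat
import tempfile
from datetime import datetime


def describe_file(path):
    """Return a small dict of bookkeeping data kept by the file system.

    describe_file is a hypothetical helper written for this illustration.
    """
    info = os.stat(path)  # ask the operating system for the file's metadata
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),
        "permissions": stat.filemode(info.st_mode),  # a string like '-rw-r--r--'
        "readable": os.access(path, os.R_OK),  # may the current user read it?
        "writable": os.access(path, os.W_OK),  # may the current user write it?
    }


if __name__ == "__main__":
    # Create a throwaway file so the example can run anywhere.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("hello\n")
    try:
        for key, value in describe_file(tmp.name).items():
            print(key, "=", value)
    finally:
        os.remove(tmp.name)
```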

In turn, your programming language provides methods that bridge the gap between the operating system and the variables in your program.

Sequential File Reading

There are two fundamentally different ways of reading and writing files -- sequential access and random access. Sequential access means the opened file acts like a stream of bytes that has to be read in sequence.

Suppose you need to read the 10th line of text in a file named myfile.txt in the c:\mydocuments folder. The following pseudocode is representative of how most programming languages approach the task:

OPEN "c:\mydocuments\myfile.txt" AS #input
SET linenumber TO 0
DO
  READ #input INTO astring 
  IF EOF PRINT "hit end of file" BREAK
  INCREMENT linenumber
UNTIL linenumber EQUALS 10
IF NOT EOF use the astring variable
CLOSE #input

This pseudocode illustrates several important features that most programming languages share:

  • To read from a file, you have to open it first. The operating system checks that you're allowed to open that file and reports an error if you're not; unless your program handles that error, it typically stops.
  • When a file is opened, the programmer uses a variable, #input in the example, to refer to the file. Using a special character to start the name of a variable that stands for a file is a common convention. The variable that refers to a file is sometimes called a file handle.
  • The command that reads the file specifies where to put the data. This example assumes the READ command reads a single line of text into a string variable. For this to work, the file must follow a convention in which a special character, or sequence of characters, signals the end of each line.
  • With sequential access you have to read and discard everything in the file up to the point where the data you want lives. In this example you read 10 lines into the astring variable with each successive line replacing the previous one.
  • When reading a file you must always consider the possibility that the file is not as long as you expect. You must provide for detecting the end of the file, commonly denoted as the EOF condition. In the example, hitting EOF before the 10th line causes the program to break out of the loop.
  • When you're done with the file, you have to close it, because holding a file open uses operating system resources and may prevent another program from using it.

In the pseudocode example, it's assumed that the READ command reads an entire line of text into the string variable. That's a common convenience in programming languages because processing text files is such a frequent task. Opening a file and reading it in this manner is called working in text mode. It assumes the data is text with each line terminated by one or more control characters.
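The pseudocode translates fairly directly into a real language. Here is a minimal Python sketch of the same idea; the function name read_line_number and the UTF-8 encoding are assumptions made for this example, not part of the original pseudocode.

```python
import os
import tempfile


def read_line_number(path, wanted=10):
    """Return line `wanted` (counting from 1), or None if the file is too short.

    read_line_number is a hypothetical helper written for this illustration.
    """
    # Opening in text mode; the with-statement closes the file automatically,
    # even if an error occurs while reading.
    with open(path, "r", encoding="utf-8") as infile:
        linenumber = 0
        for astring in infile:  # each pass reads the next line in sequence
            linenumber += 1
            if linenumber == wanted:
                return astring.rstrip("\n")  # drop the end-of-line marker
    return None  # the loop ran out of lines: end of file came first


if __name__ == "__main__":
    # Build a 12-line throwaway file so the example can run anywhere.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        for number in range(1, 13):
            tmp.write(f"line {number}\n")
    try:
        print(read_line_number(tmp.name, 10))  # prints: line 10
        print(read_line_number(tmp.name, 20))  # prints: None
    finally:
        os.remove(tmp.name)
```

If the file cannot be opened at all (it doesn't exist, or permission is denied), open() raises an OSError, which the calling code can catch instead of letting the program stop.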

Control Characters

By convention, certain characters known as control characters are used to indicate the end of the line. Early computers used a number of special control characters to control printers, ring bells on teletypes, and so on. The characters in common use for end-of-line indication are called CR (carriage return) and LF (line feed). Back in the days of teletypes, CR returned the print head to the start of the line, and LF advanced the paper one line.

Unfortunately, not every operating system uses the same combination. Windows systems use two characters, CR followed by LF; the classic Macintosh operating system (before Mac OS X) used just CR; and Unix and Linux systems, along with modern macOS, use just LF. This difference means files often need translating when they are transferred between systems. You may also see the term newline for the character or character sequence that terminates a line.
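To make the difference concrete, the Python sketch below works on raw bytes, where CR is written "\r" and LF is written "\n". The function names detect_newline and to_lf are invented for this example; they show one simple way to identify and translate line endings, not a standard tool.

```python
def detect_newline(data: bytes) -> str:
    """Name the convention used by the first line break found in `data`."""
    cr = data.find(b"\r")  # position of the first CR, or -1 if there is none
    lf = data.find(b"\n")  # position of the first LF, or -1 if there is none
    if cr == -1 and lf == -1:
        return "none"
    if cr != -1 and lf == cr + 1:
        return "CRLF"  # CR immediately followed by LF
    if cr != -1 and (lf == -1 or cr < lf):
        return "CR"  # a lone CR comes first
    return "LF"


def to_lf(data: bytes) -> bytes:
    """Translate CRLF pairs and lone CRs to a single LF."""
    # Replace the two-character pairs first so they don't become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    samples = [b"one\r\ntwo\r\n", b"one\rtwo\r", b"one\ntwo\n"]
    for sample in samples:
        print(detect_newline(sample), to_lf(sample))
    # prints:
    # CRLF b'one\ntwo\n'
    # CR b'one\ntwo\n'
    # LF b'one\ntwo\n'
```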

Reading in Binary Mode

Programming languages also provide a mode of file reading that doesn't try to find lines of text terminated by a newline marker. Instead, the exact contents of the file are delivered to your program without interpretation. In this binary mode, your read command has to designate an array of bytes to receive the data and explicitly say how many bytes to read.
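A binary-mode read might look like the following Python sketch. The helper name read_bytes is made up for this example; the point is that the program supplies a byte array and a byte count, and the end-of-line characters arrive untouched.

```python
import os
import tempfile


def read_bytes(path, count):
    """Read up to `count` bytes from the start of a file, uninterpreted.

    read_bytes is a hypothetical helper written for this illustration.
    """
    buffer = bytearray(count)  # the array of bytes that receives the data
    with open(path, "rb") as infile:  # "rb" means read in binary mode
        received = infile.readinto(buffer)  # how many bytes actually arrived
    return bytes(buffer[:received])  # may be shorter than count at end of file


if __name__ == "__main__":
    # Write a throwaway file containing a CR LF pair in the middle.
    with tempfile.NamedTemporaryFile("wb", suffix=".bin", delete=False) as tmp:
        tmp.write(b"AB\r\nCD")
    try:
        print(read_bytes(tmp.name, 4))    # prints: b'AB\r\n'
        print(read_bytes(tmp.name, 100))  # prints: b'AB\r\nCD'
    finally:
        os.remove(tmp.name)
```

Asking for more bytes than the file holds is not an error here; the read simply returns fewer bytes, which is how the end of the file shows up in binary mode.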