Saturday 8 February 2014

FAT Chaining 2

In the previous post, we looked at the data structures that the operating system uses to manage the files stored on a drive. Since we are dealing with system programming, it helps to know about interrupts and assembly language.

Even if you don't know much about interrupts or assembly language, the steps for developing the code are simple to follow.

Let’s start with the practical implementation details.

We want to read a file stored on a floppy disk (FD). The user provides the name of the file through the command-line interface (CLI).

Steps Involved:

1) Read the no. of sectors for:

i) FAT

ii) Root Directory (RD)

2) Find the RD entry for the given input file.

3) If the RD entry is found: get the first cluster no. (stored at offset 1AH in the RD entry) and read that cluster. Also read the FAT.

Else: the file does not exist.

4) Now, read the remaining file by making use of FAT.

Now, let's elaborate on these 4 steps one by one.

Step 1:

A standard FD has predefined sizes for these data structures.

For instance,

Boot sector – 1 Sector

FAT – 9 Sectors

Copy of FAT – 9 Sectors

Root Directory – 14 Sectors

And then file area starts.

As these sizes are known in advance, we are not going to retrieve them from the Boot Sector. We could, though: the size of the FAT is given by the number of sectors per FAT at offset 16H of the Boot Sector, and the size of the RD follows from the number of root-directory entries at offset 11H.

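Note that: Size of RD (in bytes) = No. of RD Entries * 32

as the size of each entry is 32 bytes (see Fig. 3 of the previous post). For the standard FD, 224 entries * 32 = 7168 bytes, i.e. 14 sectors of 512 bytes each.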
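The layout arithmetic above can be checked with a few lines of Python, used here only as a calculator. This is a minimal sketch: the constants are the standard 1.44 MB floppy values quoted in this post, and the sector size of 512 bytes is an assumption for that disk format.

```python
# Layout of a standard 1.44 MB floppy, using the sizes quoted above.
BYTES_PER_SECTOR = 512      # assumed standard sector size
BOOT_SECTORS = 1
FAT_COPIES = 2
SECTORS_PER_FAT = 9         # word at offset 16H of the boot sector
ROOT_ENTRIES = 224          # word at offset 11H of the boot sector
ENTRY_SIZE = 32             # bytes per root directory entry

root_dir_bytes = ROOT_ENTRIES * ENTRY_SIZE
root_dir_sectors = root_dir_bytes // BYTES_PER_SECTOR
root_dir_start = BOOT_SECTORS + FAT_COPIES * SECTORS_PER_FAT
file_area_start = root_dir_start + root_dir_sectors

print(root_dir_bytes, root_dir_sectors, root_dir_start, file_area_start)
```

The four values printed are the RD size in bytes, the RD size in sectors, the first sector of the RD, and the first sector of the file area.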

Step 2:

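The user provides the file name, which we have to search for in the RD. For that purpose, we read the whole RD and then check the first 11 bytes (8 bytes of file name + 3 bytes of extension) of each 32-byte entry in it. Names shorter than 8 characters and extensions shorter than 3 characters are padded with spaces, and the '.' itself is not stored.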
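The 11-byte form of a name can be sketched as follows. The helper name `to_dir_name` is made up for illustration; it also converts the name to upper case, which is how DOS stores names in the directory (the assembly listing in this post does not do that conversion, so there the name has to be typed in capitals).

```python
def to_dir_name(filename: str) -> bytes:
    """Convert 'NAME.EXT' into the 11-byte form stored in a directory entry."""
    base, _, ext = filename.upper().partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        raise ValueError("not a valid 8.3 file name: " + filename)
    # Pad the name to 8 and the extension to 3 characters; the dot is dropped.
    return (base.ljust(8) + ext.ljust(3)).encode("ascii")

print(to_dir_name("readme.txt"))
```

For `readme.txt` this gives `README`, two spaces, then `TXT`, which is exactly what the comparison against the first 11 bytes of an entry expects.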

To read the RD, we use INT 25H.

INT 25H (Absolute disk read)

AL = Drive Number (0 = A, 1 = B, etc.)

CX = Number of sectors to read

DX = Starting sector number

DS:BX = Segment:offset of buffer

Returns:

If function successful

Carry flag = clear

If function unsuccessful

Carry flag = set

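AX = error code

Note that INT 25H leaves the original flags word on the stack when it returns, so the caller has to remove it (for example with ADD SP, 2) after checking the carry flag.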

MOV AL, 00

MOV BX, OFFSET BUFF

MOV CX, 14

MOV DX, 19

INT 25H

Why is the starting sector no. equal to 19?

The RD comes after the Boot Sector (1 sector), the FAT (9 sectors) and the copy of the FAT (9 sectors). So, 1+9+9 = 19, and the RD starts at logical sector 19: the first sector is sector 0, so sectors 0 to 18 hold the Boot Sector and the two FATs.

After INT 25H executes, the whole RD is in the buffer BUFF, and we start checking the first 11 bytes of each RD entry.

A live 32-byte entry does not start with E5H or 00H: E5H marks an erased entry, and 00H marks the end of the used part of the directory. So, during the file name check, we have to skip erased entries and can stop at the first 00H.
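The search itself can be sketched like this. It is only an illustration of the logic, not the assembly routine: `find_entry` is a made-up name, and the small directory at the bottom is invented test data rather than a dump of a real disk.

```python
ENTRY_SIZE = 32

def find_entry(root_dir: bytes, name11: bytes):
    """Return the 32-byte entry whose first 11 bytes equal name11, or None."""
    for off in range(0, len(root_dir), ENTRY_SIZE):
        entry = root_dir[off:off + ENTRY_SIZE]
        if entry[0] == 0x00:      # end of directory: nothing beyond is in use
            return None
        if entry[0] == 0xE5:      # erased entry: skip it
            continue
        if entry[:11] == name11:
            return entry
    return None

# Invented directory: one erased entry, one live entry, then unused space.
erased = b"\xE5" + b"LD".ljust(7) + b"TXT" + bytes(21)
live = b"HELLO   TXT" + bytes(15) + (2).to_bytes(2, "little") + bytes(4)
root = erased + live + bytes(64)

entry = find_entry(root, b"HELLO   TXT")
first_cluster = int.from_bytes(entry[0x1A:0x1C], "little")
print(first_cluster)
```

The last two lines anticipate Step 3: once the entry is found, the first cluster no. is the 16-bit word at offset 1AH.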

Step 3:

As soon as we find the RD entry for which we were looking, read the first cluster no. at an offset 1AH in that entry. Read and display the first cluster. Now FAT comes into the picture for displaying the remaining clusters of the file.

To read the 9 sectors of the FAT, we use INT 25H as described in step 2. The only changes are the values of CX and DX and the use of another buffer:

MOV CX, 09

MOV DX, 01

MOV BX, OFFSET BUFF1

One thing to note here is that we are going to use INT 21H and INT 25H only.

Let's have a look at functions 09H and 4CH of INT 21H.

Function: 09H

It writes a $-terminated string to standard output.

DS must point to the string's segment, and DX must contain the string's offset:

For example:

.DATA

STRING BYTE "THIS IS A STRING$"

.CODE

MOV AH, 09H

MOV DX, OFFSET STRING

INT 21H

Function: 4CH

Ends the current process and returns an optional 8-bit code to the calling process.

A return code of 0 usually indicates successful completion. This is the same as EXIT 0.

MOV AX, 4C00H

INT 21H

;Here AH = 4CH (function number) and AL = 00 (return code). The return code is handed to the parent process, which can test it, e.g. through ERRORLEVEL in a batch file.

Step 4:

This step is more interesting than the others. We have the first cluster of the required file, and now we have to find the next cluster no. in order to read the rest of the file. If you looked at the file system options while formatting a pen drive, you may have seen FAT12 or FAT16 (and, nowadays, NTFS, the New Technology File System, which is not a FAT variant).

Now, what does FATX represent? What is the significance of X here?

X is the width of each cluster entry in the FAT, in bits. Now let's convert X into bytes.

For FAT12, an entry is 1.5 bytes wide; for FAT16, it is 2 bytes wide, and so on. Note that a floppy disk uses FAT12 (i.e. 1.5 bytes per entry).

Let F be the first cluster no. obtained from the RD entry of the given input file. Since every entry is 1.5 bytes wide, the entry for cluster F starts at byte offset 1.5 * F (rounded down) in the FAT. We read the 16-bit word stored at that offset. Here comes the tricky point:

Let the four nibbles of that word be wxyz. Two 12-bit entries share three bytes, so one nibble of the word belongs to a neighbouring entry. If F is even, take the last three nibbles (xyz); otherwise take the first three nibbles (wxy).

Suppose F is even: take xyz and put 0 in front, so the next cluster no. is 0xyz (for odd F it is 0wxy). This is the cluster that holds the next part of the file; read it, then look up its own entry in the FAT in the same way, and so on for the rest of the file. As soon as the next cluster no. is equal to or greater than 0FF8H (usually it is 0FFFH), stop: the file has been read completely. Each FAT entry thus gives the location of the next cluster of the file, and this linking of entries is referred to as “Chaining”.

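Note: 1 Cluster = 2^i Sectors, i = 0, 1, 2, ... On the standard FD, 1 cluster = 1 sector. The file area starts at sector 33 (19 + 14) and its first cluster is numbered 2, so cluster N is stored in logical sector N + 31.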
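The chaining rule can be tried out on a tiny hand-made FAT. The sketch below is an illustration under stated assumptions: the function names are invented, and the 8-byte table is made-up data in which cluster 2 points to 3, 3 points to 4, and 4 carries the end mark.

```python
def next_cluster(fat: bytes, cluster: int) -> int:
    """Read the 12-bit FAT entry of the given cluster."""
    offset = cluster * 3 // 2                     # 1.5 * cluster, rounded down
    word = fat[offset] | (fat[offset + 1] << 8)   # 16-bit little-endian word
    if cluster % 2 == 0:
        return word & 0x0FFF                      # even cluster: low 12 bits
    return word >> 4                              # odd cluster: high 12 bits

def cluster_chain(fat: bytes, first: int) -> list:
    """Follow the chain from the first cluster up to the end mark."""
    chain = []
    cluster = first
    while cluster < 0xFF8:                        # 0FF8H..0FFFH end the chain
        if len(chain) > len(fat) * 2 // 3:        # guard against a looping FAT
            raise ValueError("FAT chain does not terminate")
        chain.append(cluster)
        cluster = next_cluster(fat, cluster)
    return chain

# Made-up FAT: entries 0 and 1 are reserved, then 2 -> 3 -> 4 -> end (0FFFH).
fat = bytes([0xF0, 0xFF, 0xFF, 0x03, 0x40, 0x00, 0xFF, 0x0F])

chain = cluster_chain(fat, 2)
sectors = [c + 31 for c in chain]                 # cluster N is sector N + 31
print(chain, sectors)
```

For cluster 2 the word read at offset 3 is 4003H; the cluster is even, so the last three nibbles give 003H. For cluster 3 the word at offset 4 is 0040H; the cluster is odd, so the first three nibbles give 004H.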

Now, try to understand the ASM code below, which is written on the basis of the above discussion. I strongly suggest reading “Advanced MS-DOS Programming: The Microsoft Guide for Assembly Language and C Programmers” by Ray Duncan for a better understanding of system programming.

print macro msg
    mov ah,09h
    lea dx,msg
    int 21h
    endm

.model tiny
.code
    org 100h
begin:
    jmp start
    msg1 db 10d,13d,"Error in reading the root directory:$"
    msg2 db 10d,13d,"Error in reading the sector:$"
    msg3 db 10d,13d,"Found:$"
    msg4 db 10d,13d,"Not Found:$"
    msg5 db 10d,13d,"Error in reading FAT:$"
    buff db 7168 dup('$')
    buff1 db 4608 dup('$')
    buff2 db 513 dup('$')
    filename db 12 dup('$')
    str_cls dw ?
    temp dw ?
    cnt db 224
    count db ?
    num dw ?

start:
    mov count,08
    mov si,80h
    mov cl,[si]
    dec cl
    mov si,82h
    lea di,filename
l1:
    mov al,[si]
    cmp al,'.'
    je l2
    mov [di],al
    inc si
    inc di
    dec count
    dec cl
    jmp l1

l2:
    inc si            ;skip the '.'
    dec cl
    cmp count,00      ;name already 8 characters long?
    jz l4

l3:
    mov byte ptr [di],20h    ;pad the name with spaces
    inc di
    dec count
    jnz l3

l4:
    mov count,03h
   
l5:
    mov al,[si]
    mov [di],al
    inc si
    inc di
    dec count
    dec cl
    jnz l5

    cmp count,00
    jnz l6
    jmp l7            ;filename completed

l6:
    mov byte ptr [di],20h    ;pad the extension with spaces
    inc di
    dec count
    jnz l6   
   
l7:
    mov al,00
    mov bx,offset buff
    mov cx,14
    mov dx,19
    int 25h
    jc error1
    add sp,02h        ;discard the flags word INT 25h leaves on the stack

    lea si,buff
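l8:
    mov al,[si]
    cmp al,00
    je n_found        ;00h: end of directory, stop searching
    cmp al,0E5h
    jne loop2         ;live entry: compare its name

loop1:
    add si,0032       ;erased entry: skip its 32 bytes
    dec cnt
    jnz l8
    jmp n_found       ;all 224 entries checked

loop2:
    lea di,filename
    mov cx,11

l9:
    mov al,[si]
    cmp al,[di]
    je loop3
    add si,cx         ;mismatch: move to the start of
    add si,21         ;the next entry (cx + 21 bytes ahead)
    dec cnt
    jnz l8
    jmp n_found
loop3:
    inc si
    inc di
    dec cl
    jnz l9
    jmp found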

n_found:   
    print msg4
    jmp end1

found:
    print msg3
   
    sub si,11
    add si,1Ah
    mov bl,[si]
    inc si
    mov bh,[si]
    mov str_cls,bx        ;first cluster   

    mov al,00
    mov bx,offset buff1        ;buff1 contains FAT
    mov cx,09
    mov dx,01
    int 25h
    pop dx            ;drop the flags word left by INT 25h (POP keeps CF; ADD SP would clear it)
    jc error2
    mov ax,str_cls
    jmp en2

;FAT cluster reading

error1: jmp error11

en2:
    add ax,31
    mov dx,ax
    mov al,00
    mov cx,01
    mov bx,offset buff2
    int 25h
    jc error3
    add sp,02
   

    mov ah,09h
    lea dx,buff2        ;displaying cluster contents
    int 21h


    mov ax,str_cls
    mov bx,03
    mul bx
    mov bx,02    ;1.5 multiplication
    div bx
   
    lea si,buff1
    add si,ax
    mov bl,[si]
    inc si
    mov bh,[si]
    mov num,bx

    mov ax,str_cls
    and ax,01
    cmp ax,00
    je even1
    jmp odd1

even1:
    mov ax,num
    and ax,0fffh      ;even cluster: low 12 bits
    cmp ax,0ff8h      ;0FF8h-0FFFh mark the last cluster
    jae end1
    mov str_cls,ax
    jmp en2


odd1:
    mov ax,num
    mov cl,04
    shr ax,cl         ;odd cluster: high 12 bits
    cmp ax,0ff8h      ;0FF8h-0FFFh mark the last cluster
    jae end1
    mov str_cls,ax
    jmp en2


error11:
    print msg1
    jmp end1
error2:
    print msg5
    jmp end1
error3:
    print msg2
end1:
    mov ax,4c00h      ;terminate with return code 0
    int 21h
    end begin   

I hope you people like this post.

Please do write comments !!!!

Wednesday 5 February 2014

FAT Chaining

 

Many readers may not be familiar with the term “FAT Chaining”. There are two things in it:

1) FAT

2) Chain

Yes, FAT is the File Allocation Table, and you will understand chaining by the end of this post. The basic aim of this post is to understand how a file gets retrieved when we click its icon or enter its name at the command prompt.

Suppose we want to access a file on a floppy disk. Now the question arises: why retrieve the file from a floppy disk and not from a hard disk?

The reason is that we know the details of the standard data structures of the floppy disk (FD). Here, the data structures are the following:

1) Boot Sector

2) File Allocation Table (FAT)

3) Root Directory

4) File Area

We will not stop at theoretical knowledge; instead, we will develop assembly language code for this. So, we divide this topic into two posts. This post covers the data structures mentioned above, and in the next post we will play with the interrupts in order to develop the assembly-level code.

Now, let's see how these structures are used during file retrieval:

The whole area of the FD is divided among the above 4 Data Structures as shown below in Fig.1. Each MS-DOS logical volume is divided into several fixed-size control areas and a files area. The size of each control area depends on several factors like the size of the volume and the version of FORMAT used to initialize the volume, for example, but all of the information needed to interpret the structure of a particular logical volume can be found on the volume itself in the boot sector.  

[Fig. 1: Layout of the FD: boot sector, FAT, root directory, files area]

 

1) Boot Sector:

The boot sector, known as logical sector 0, contains all of the critical information regarding the disk medium's characteristics (Fig. 2).

[Fig. 2: Fields of the boot sector]

We are not going into much detail about these fields, as they are self-explanatory. From the boot sector we get all the physical information about the drive. Let's say we have to find the size of the logical space of the drive. This can be calculated from:

1) Total sectors in logical volume and

2) Bytes per sector
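As a rough sketch, the calculation looks like this. The field offsets 0BH (bytes per sector) and 13H (total sectors) are the standard MS-DOS boot sector positions; they are not quoted in this post, so treat them as an assumption here. The function name and the hand-filled sector are made up for illustration.

```python
import struct

def volume_size(boot_sector: bytes) -> int:
    """Size of the logical volume in bytes, from two boot sector fields."""
    bytes_per_sector, = struct.unpack_from("<H", boot_sector, 0x0B)
    total_sectors, = struct.unpack_from("<H", boot_sector, 0x13)
    return total_sectors * bytes_per_sector

# Hand-filled boot sector holding only the two fields we need
# (values of a 1.44 MB floppy: 2880 sectors of 512 bytes).
boot = bytearray(512)
struct.pack_into("<H", boot, 0x0B, 512)
struct.pack_into("<H", boot, 0x13, 2880)

print(volume_size(bytes(boot)))
```

With 2880 sectors of 512 bytes the result is 1,474,560 bytes, the familiar "1.44 MB".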

2) File Allocation Table:

Each file's entry in a directory contains the number of the first cluster assigned to that file, which is used as an entry point into the FAT. From the entry point on, each FAT slot contains the cluster number of the next cluster in the file, until a last-cluster mark is encountered.

At the computer manufacturer's option, MS-DOS can maintain two or more identical copies of the FAT on each volume. MS-DOS updates all copies simultaneously whenever files are extended or the directory is modified.

If access to a sector in a FAT fails due to a read error, MS-DOS tries the other copies until a successful disk read is obtained or all copies are exhausted. Thus, if one copy of the FAT becomes unreadable due to wear or a software accident, the other copies may still make it possible to salvage the files on the disk. As part of its procedure for checking the integrity of a disk, the CHKDSK program compares the multiple copies (usually two) of the FAT to make sure they are all readable and consistent.

3) Root Directory:

Following the file allocation tables is an area known as the root directory. The root directory contains 32-byte entries that describe files, other directories, and the optional volume label (Fig. 3).

An entry beginning with the byte value E5H is available for reuse; it represents a file or directory that has been erased. An entry beginning with a null (zero) byte is the logical end-of-directory; that entry and all subsequent entries have never been used.
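To make the 32-byte entry more concrete, here is a small decoding sketch. It only looks at the name, the first cluster (offset 1AH) and the file size (offset 1CH, a standard field position assumed here); attributes, time and date are ignored. `parse_entry` and the sample entry are invented for illustration.

```python
import struct

def parse_entry(entry: bytes) -> dict:
    """Decode the main fields of one 32-byte root directory entry."""
    if len(entry) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    if entry[0] == 0x00:
        return {"status": "end-of-directory"}     # never used
    if entry[0] == 0xE5:
        return {"status": "erased"}               # available for reuse
    name = entry[0:8].decode("ascii").rstrip()
    ext = entry[8:11].decode("ascii").rstrip()
    # Word at 1AH = first cluster, double word at 1CH = file size in bytes.
    first_cluster, size = struct.unpack_from("<HI", entry, 0x1A)
    return {"status": "in-use",
            "name": name + ("." + ext if ext else ""),
            "first_cluster": first_cluster,
            "size": size}

# Invented entry: LETTER.TXT, starting at cluster 2, 1300 bytes long.
sample = b"LETTER  TXT" + bytes(15) + struct.pack("<HI", 2, 1300)
print(parse_entry(sample))
```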

[Fig. 3: Format of a 32-byte root directory entry]

 

4) Files Area:

The remainder of the volume after the root directory is known as the files area. MS-DOS views the sectors in this area as a pool of clusters, each containing one or more logical sectors, depending on the disk format. Each cluster has a corresponding entry in the FAT that describes its current use: available, reserved, assigned to a file, or unusable (because of defects in the medium). Because the first two fields of the FAT are reserved, the first cluster in the files area is assigned the number 2.
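Because cluster numbering starts at 2, converting a cluster number into a logical sector number needs a small correction. The sketch below assumes a standard 1.44 MB floppy, where the files area starts at logical sector 33 and one cluster is one sector; both defaults and the function name are assumptions for illustration.

```python
def cluster_to_sector(cluster: int,
                      file_area_start: int = 33,
                      sectors_per_cluster: int = 1) -> int:
    """First logical sector of a cluster in the files area."""
    if cluster < 2:
        raise ValueError("clusters 0 and 1 are reserved")
    # Cluster 2 is the first cluster of the files area.
    return file_area_start + (cluster - 2) * sectors_per_cluster

print(cluster_to_sector(2), cluster_to_sector(3))
```

With these defaults the rule collapses to "sector = cluster + 31".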

Try to follow the next post, which will be more interesting.

Thanks a lot !!!