Tuesday, 18 October 2016

How Do the Linux File Systems Ext2, Ext3 and Ext4 Work?



The ext (Extended File System) was inspired by the traditional Unix File System (UFS) and designed by Rémy Card. It raised the size limit to 2 GB, up from the 64 MB limit of the Minix file system it replaced.

What is an Inode?
The inode is a data structure used to represent a file system object, which can be one of various types, including a file or a directory. Each inode stores the attributes and the disk block location(s) of the object's data. The attributes may include timestamps (last access, last modification, last inode change) as well as owner and permission data.
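
A quick way to look at some of these attributes is to ask the kernel for a file's inode metadata. The sketch below uses Python's os.stat; the helper name describe_inode is made up for illustration, and the temporary file simply stands in for any path on your system.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Return a few inode attributes for `path` (hypothetical helper)."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "inode": st.st_ino,                        # inode number within the file system
        "type": kind,
        "permissions": stat.filemode(st.st_mode),  # symbolic form, like `ls -l` prints
        "owner_uid": st.st_uid,
        "links": st.st_nlink,                      # number of hard links to this inode
        "size_bytes": st.st_size,
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello")
        tmp.flush()
        for key, value in describe_inode(tmp.name).items():
            print(f"{key}: {value}")
```

Note that the file name is not part of the result: names live in directory entries that point at the inode, not in the inode itself.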

Ext2 (Second Extended File System)
Ext2 has no journaling, so it avoids the write overhead of maintaining a journal. For that reason it is often recommended for flash drives and USB sticks. You can also use the noatime mount option for the same reason: it stops the file system from updating the access time on every read.
The maximum individual file size ranges from 16 GB to 2 TB, and the overall ext2 file system size from 2 TB to 32 TB, depending on the block size.
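
Where the file size range comes from can be sketched with a little arithmetic. The model below assumes the classic inode layout of 12 direct block pointers plus one single, one double and one triple indirect pointer, 32-bit block numbers, and a 32-bit counter of 512-byte sectors per file. The function name is made up, and real limits also depend on the kernel and tooling, so treat it as a back-of-the-envelope estimate.

```python
SECTOR_SIZE = 512        # the per-file block counter is kept in 512-byte units
POINTER_SIZE = 4         # block numbers are 32-bit
DIRECT_POINTERS = 12     # direct block pointers stored in the inode


def max_file_size(block_size):
    """Estimate the largest ext2 file, in bytes, for a given block size."""
    pointers_per_block = block_size // POINTER_SIZE
    addressable_blocks = (
        DIRECT_POINTERS
        + pointers_per_block          # single indirect
        + pointers_per_block ** 2     # double indirect
        + pointers_per_block ** 3     # triple indirect
    )
    limit_by_pointers = addressable_blocks * block_size
    limit_by_sector_counter = (2 ** 32 - 1) * SECTOR_SIZE
    return min(limit_by_pointers, limit_by_sector_counter)


if __name__ == "__main__":
    for block_size in (1024, 2048, 4096, 8192):
        gib = max_file_size(block_size) / 2 ** 30
        print(f"{block_size} byte blocks: about {gib:.1f} GiB")
```

With 1 KiB blocks the pointer tree is the limit, at roughly 16 GiB. From 4 KiB blocks upward the sector counter takes over and caps the result at just under 2 TiB.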
The space in ext2 is split up into blocks. These blocks are grouped into block groups, analogous to cylinder groups in the Unix File System. A large file system typically has thousands of block groups. Data for any given file is kept within a single block group where possible, which minimizes the number of disk seeks when reading large amounts of contiguous data.
Each block group contains a copy of the superblock and the block group descriptor table (with the sparse_super feature, only some groups keep these backup copies), and every block group contains a block bitmap, an inode bitmap, an inode table and finally the actual data blocks.
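
To get a feel for the numbers, here is a rough calculation of how a file system is divided into block groups. It assumes the block bitmap occupies exactly one block, so one group covers 8 × block_size blocks. The function name and the 1 TiB example size are only illustrative.

```python
def block_group_layout(fs_size_bytes, block_size=4096):
    """Estimate how an ext2 file system of the given size splits into groups."""
    # One bitmap block holds block_size * 8 bits, one bit per block in the group.
    blocks_per_group = 8 * block_size
    total_blocks = fs_size_bytes // block_size
    groups = -(-total_blocks // blocks_per_group)   # ceiling division
    return {
        "total_blocks": total_blocks,
        "blocks_per_group": blocks_per_group,
        "group_size_bytes": blocks_per_group * block_size,
        "groups": groups,
    }


if __name__ == "__main__":
    layout = block_group_layout(2 ** 40)   # a 1 TiB file system
    for key, value in layout.items():
        print(f"{key}: {value}")
```

With 4 KiB blocks each group spans 128 MiB, so a 1 TiB file system ends up with 8192 groups.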
[Figure: example of ext2 inode structure]
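
The on-disk layout of an ext2 inode can also be sketched in code. The snippet below describes the classic 128-byte inode with Python's struct module. The field names follow the Linux kernel's struct ext2_inode as far as I know them, the parse_inode helper is made up, and the bytes being parsed are assumed to be a raw inode that you have already read from an inode table.

```python
import struct

# Little-endian, no padding. H = 16-bit, I = 32-bit unsigned.
INODE_FORMAT = "<HHIIIIIHHIII15IIIII12s"
INODE_SIZE = struct.calcsize(INODE_FORMAT)   # 128 bytes


def parse_inode(raw):
    """Decode the main fields of a raw 128-byte ext2 inode (sketch)."""
    f = struct.unpack(INODE_FORMAT, raw[:INODE_SIZE])
    return {
        "i_mode": f[0],          # file type and permission bits
        "i_uid": f[1],           # owner (low 16 bits)
        "i_size": f[2],          # size in bytes (low 32 bits)
        "i_atime": f[3],         # last access
        "i_ctime": f[4],         # last inode change
        "i_mtime": f[5],         # last modification
        "i_dtime": f[6],         # deletion time
        "i_gid": f[7],           # group (low 16 bits)
        "i_links_count": f[8],   # hard link count
        "i_blocks": f[9],        # allocated space in 512-byte units
        "i_flags": f[10],
        # f[11] is an OS-dependent field.
        # 12 direct pointers, then single, double and triple indirect.
        "i_block": list(f[12:27]),
        # f[27:31] hold the generation, ACL and fragment fields,
        # f[31] is a second OS-dependent area.
    }
```

The 15 entries of i_block are what tie the inode to its data: small files need only the direct pointers, while larger ones go through the indirect blocks.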


Ext3 (Third Extended File System)
ext3 adds journaling. The journal is a dedicated area in the file system where changes are recorded before they are applied. After a crash the journal can be replayed, so the file system is much less likely to be left corrupted.
The maximum individual file size ranges from 16 GB to 2 TB, and the overall ext3 file system size from 2 TB to 32 TB, depending on the block size (the same as ext2).
Types of journaling available in ext3:
  • Journal – Metadata and content are saved in the journal.
  • Ordered – Only metadata is saved in the journal. Metadata is journaled only after the content has been written to disk. This is the default.
  • Writeback – Only metadata is saved in the journal. Metadata might be journaled either before or after the content is written to the disk.
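
The three modes above can be summarized as a small lookup table. This is a toy model and not how the kernel represents them: the dictionary, its keys and both helper functions are made up for illustration. The only real-world detail is the data= mount option, which is how a mode is selected when mounting.

```python
# Toy description of the ext3 journaling modes.
JOURNAL_MODES = {
    "journal": {"journaled": ("metadata", "data"), "data_safe_at_commit": True},
    "ordered": {"journaled": ("metadata",), "data_safe_at_commit": True},
    "writeback": {"journaled": ("metadata",), "data_safe_at_commit": False},
}


def mount_option(mode):
    """Return the mount option string that selects a journaling mode."""
    if mode not in JOURNAL_MODES:
        raise ValueError(f"unknown journaling mode: {mode}")
    return f"data={mode}"


def may_expose_stale_data(mode):
    """True if committed metadata can point at content not yet on disk."""
    return not JOURNAL_MODES[mode]["data_safe_at_commit"]


if __name__ == "__main__":
    for mode in JOURNAL_MODES:
        print(mount_option(mode), "stale data possible:", may_expose_stale_data(mode))
```

The trade-off is speed against safety: writeback does the least ordering work, which is why a crash can leave a file with old or garbage content even though its metadata is consistent.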
You can convert an ext2 file system to ext3 directly, without backup and restore (for example with tune2fs -j, which adds a journal).
Drawbacks of ext3:
  • A directory can have at most 31998 subdirectories, because an inode can have at most 32000 links and two are already taken by the directory's own "." entry and its entry in the parent directory.
  • You cannot run fsck while the file system is mounted for writing.
  • ext3 does not support the recovery of deleted files.
  • ext3 does not have native support for snapshots (capturing the state of the file system at a point in time).
  • ext3 does not checksum the data it writes to the journal.

Ext4 (Fourth Extended File System)
The ext4 file system supports volumes of up to 1 exbibyte (EiB) and files of up to 16 tebibytes (TiB) with the standard 4 KiB block size.
ext4 introduces extents. An extent is a range of contiguous physical blocks described by a single entry, which improves large file performance and reduces fragmentation. A single extent in ext4 can map up to 128 MiB of contiguous space with a 4 KiB block size.
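
The 128 MiB figure follows from the maximum extent length of 32768 blocks. The sketch below works that out and compares the bookkeeping for a large file: the minimum number of extents (assuming the file is perfectly contiguous) versus one pointer per block in the older scheme. Function names are made up for illustration.

```python
MAX_EXTENT_BLOCKS = 2 ** 15   # 32768 blocks per extent


def max_extent_bytes(block_size=4096):
    """Largest contiguous range one extent can describe, in bytes."""
    return MAX_EXTENT_BLOCKS * block_size


def min_extents_needed(file_size, block_size=4096):
    """Fewest extents for a fully contiguous file of `file_size` bytes."""
    blocks = -(-file_size // block_size)        # ceiling division
    return -(-blocks // MAX_EXTENT_BLOCKS)


def block_pointers_needed(file_size, block_size=4096):
    """Data block pointers the ext2/ext3 scheme needs for the same file."""
    return -(-file_size // block_size)


if __name__ == "__main__":
    one_gib = 2 ** 30
    print("max extent size (MiB):", max_extent_bytes() // 2 ** 20)
    print("extents for 1 GiB:", min_extents_needed(one_gib))
    print("block pointers for 1 GiB:", block_pointers_needed(one_gib))
```

For a contiguous 1 GiB file that is 8 extents instead of 262144 individual block pointers, before even counting the indirect blocks the old scheme needs.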
You can mount ext2 or ext3 file systems as ext4 (backward compatibility).
Ext4 can pre-allocate on-disk space for a file (applications like media streaming and databases make great use of this).
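
From an application's point of view, pre-allocation is usually requested through the POSIX call posix_fallocate, which Python exposes as os.posix_fallocate on Unix-like systems. The helper below is a minimal sketch with a made-up name. It works on any file system that accepts the call and reports failure instead of raising when the call is unavailable or rejected.

```python
import os


def preallocate(path, size):
    """Try to reserve `size` bytes for `path`; return True on success."""
    if not hasattr(os, "posix_fallocate"):
        return False                      # not available on this platform
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask for the range [0, size) to be backed by allocated space.
        os.posix_fallocate(fd, 0, size)
        return True
    except OSError:
        return False                      # the request was rejected
    finally:
        os.close(fd)


if __name__ == "__main__":
    if preallocate("prealloc.bin", 16 * 2 ** 20):
        print("reserved", os.path.getsize("prealloc.bin"), "bytes")
        os.remove("prealloc.bin")
```

A database or recorder would do this before it starts writing, so that later writes neither run out of space midway nor scatter the file across the disk.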
Journal checksumming is now available and improves journal reliability.
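
The idea behind journal checksumming can be shown with a toy record format: store a checksum next to each journal block, and refuse to replay a block whose checksum no longer matches. The sketch uses zlib.crc32 as a stand-in. The actual on-disk journal format and checksum algorithm in ext4 are different, and the function names here are made up.

```python
import zlib

CHECKSUM_BYTES = 4


def seal(block):
    """Append a CRC32 checksum to a journal block (toy format)."""
    checksum = zlib.crc32(block).to_bytes(CHECKSUM_BYTES, "little")
    return block + checksum


def is_intact(record):
    """Check that a sealed record still matches its stored checksum."""
    payload = record[:-CHECKSUM_BYTES]
    stored = int.from_bytes(record[-CHECKSUM_BYTES:], "little")
    return zlib.crc32(payload) == stored


if __name__ == "__main__":
    record = seal(b"metadata update")
    damaged = bytes([record[0] ^ 0x01]) + record[1:]   # flip one bit
    print("original intact:", is_intact(record))
    print("damaged intact:", is_intact(damaged))
```

Without such a check, a half-written or corrupted journal block would be replayed as if it were valid, which can make things worse than not replaying at all.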
File system checking is faster, because fsck can skip unallocated block groups and unused sections of the inode table.
The journal can be disabled.
