F2FS

Last updated
F2FS
Developer(s) Samsung Electronics, Motorola Mobility, Huawei and Google
Full nameFlash-Friendly File System
Introducedv3.8, 2012-12-20 [1] with Linux
Structures
Directory contentsmulti-level hash table
File allocationbitmap (free space), table
BootableYes, starting from GRUB 2.04 (2019-07-05)
Limits
Max volume size16 TB
Max file size3.94 TB
Max no. of filesDepends on volume size
Max filename length255 bytes [2]
Features
Dates recordedmodification (mtime), attribute modification (ctime), access (atime)
Date resolution1 ns
AttributesPOSIX, extended attributes
File system
permissions
POSIX, ACL
Transparent
compression
LZO, LZ4 (since Linux 5.6), [3] zstd (since Linux 5.7) [4]
Transparent
encryption
Yes
Other
Supported
operating systems
Linux and Android
Website f2fs.wiki.kernel.org

F2FS (Flash-Friendly File System) is a flash file system initially developed by Samsung Electronics for the Linux kernel. [5]

Contents

The motive for F2FS was to build a file system that, from the start, takes into account the characteristics of NAND flash memory-based storage devices (such as solid-state disks, eMMC, and SD cards), which are widely used in computer systems ranging from mobile devices to servers.

F2FS was designed on a basis of a log-structured file system approach, which is adapted to newer forms of storage. Jaegeuk Kim, the principal F2FS author, has stated that it remedies some known issues [5] of the older log-structured file systems, such as the snowball effect of wandering trees and high cleaning overhead. In addition, since a NAND-based storage device shows different characteristics according to its internal geometry or flash memory management scheme (such as the Flash Translation Layer or FTL), it supports various parameters not only for configuring on-disk layout, but also for selecting allocation and cleaning algorithms.

Note, that by default F2FS uses "posix" fsync scheme, which carries higher risks of leaving the file system in dirty state during unclean shutdown (as it does not guarantee atomicity of write operations) at the benefit of better performance. There is a more stringent method that respects hardware limitations for greater security at the expense of performance; see the "fsync_mode" option in the manual for details. [6]

Features

Design

On-disk layout

F2FS divides the whole volume into a number of segments, each of which is fixed at 2 MB. A section is composed of consecutive segments, and a zone consists of a set of sections. By default, section and zone sizes are set to the same size, but users can easily modify the size with mkfs.

F2FS splits the entire volume into six areas, and all except the superblock area consist of multiple segments as described below.

Superblock (SB)
The SB is located at the beginning of the partition. There are two copies to avoid file-system corruption. It contains basic partition information and some default F2FS parameters.
Checkpoint (CP)
The CP contains file system information, bitmaps for valid NAT/SIT sets, orphan inode lists, and summary entries of current active segments.
Segment Information Table (SIT)
The SIT contains the valid block count and validity bitmap of all the Main Area blocks.
Node Address Table (NAT)
The NAT is an address table for the Main Area node blocks.
Segment Summary Area (SSA)
The SSA contains entries which contain the owner information of the Main Area data and node blocks.
Main Area
The main area contains file and directory data and their indices.

In order to avoid misalignment between file system and flash storage, F2FS aligns the start block address of the CP with the segment size. It also aligns the Main Area start block address with the zone size by reserving some segments in the SSA area.

Metadata structure

F2FS uses the checkpoint scheme to maintain file system integrity. At mount time, F2FS first tries to find the last valid checkpoint data by scanning the CP area. In order to reduce the scanning time, F2FS uses only two copies of the CP. One of them always indicates the last valid data, which is called a shadow copy mechanism. In addition to the CP, the NAT and SIT also use the shadow copy mechanism. For file system consistency, each CP points to which NAT and SIT copies are valid.

Index structure

The key data structure is the "node". Similar to traditional file structures, F2FS has three types of nodes: inode, direct node, indirect node. F2FS assigns 4 KB to an inode block which contains 923 data block indices, two direct node pointers, two indirect node pointers, and one double indirect node pointer as described below. A direct node block contains 1018 data block indices, and an indirect node block contains 1018 node block indices. Thus, one inode block (i.e., a file) covers:

4 KiB × (923 + 2×1018 + 2×10182 + 10183) = 4,228,213,756 KiB = 4,129,114.996 MiB = 4,032.338863 GiB = 3.937830921 TiB

Note that all the node blocks are mapped by the NAT, which means that the location of each node is translated by the NAT. To mitigate the wandering tree problem, F2FS is able to cut off the propagation of node updates caused by leaf data writes.

Directory structure

A directory entry (dentry) occupies 11 bytes, which consists of the following attributes.

A directory entry structure
hashHash value of the file name
ino Inode number
lenThe length of file name
typeFile type such as directory, symlink, etc.

A dentry block consists of 214 dentry slots and file names. A bitmap is used to represent whether each dentry is valid or not. A dentry block occupies 4 KB and has the following composition:

Dentry Block (4 K) = bitmap (27 bytes) + reserved (3 bytes) +                       dentries (11 * 214 bytes) + file name (8 * 214 bytes)

F2FS implements multi-level hash tables for the directory structure. Each level has a hash table with a dedicated number of hash buckets as shown below. Note that "A(2B)" means a bucket includes 2 data blocks.

Term
A indicates bucket
B indicates block
N indicates MAX_DIR_HASH_DEPTH
level #0    A(2B) level #1    A(2B) - A(2B) level #2    A(2B) - A(2B) - A(2B) - A(2B)     ... level #N/2  A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)     ... level #N    A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B) 

When F2FS finds a file name in a directory, first a hash value of the file name is calculated. Then, F2FS scans the hash table in level #0 to find the dentry consisting of the file name and its inode number. If not found, F2FS scans the next hash table in level #1. In this way, F2FS scans hash tables in each level incrementally from 1 to N. In each level F2FS needs to scan only one bucket determined by the following equation, which shows O(log(# of files)) complexity.

 bucket number to scan in level #n = (hash value) % (# of buckets in level #n)

In the case of file creation, F2FS finds empty consecutive slots that cover the file name. F2FS searches the empty slots in the hash tables of whole levels from 1 to N in the same way as the lookup operation.

Default block allocation

At runtime, F2FS manages six active logs inside the "Main Area:" Hot/Warm/Cold node and Hot/Warm/Cold data.

Block allocation policy
Hot nodeContains direct node blocks of directories.
Warm nodeContains direct node blocks except hot node blocks.
Cold nodeContains indirect node blocks.
Hot dataContains dentry blocks.
Warm dataContains data blocks except hot and cold data blocks.
Cold dataContains multimedia data or migrated data blocks.

LFS has two schemes for free space management: threaded log and copy-and-compaction. The copy-and-compaction scheme which is known as cleaning, is well-suited for devices showing very good sequential write performance, since free segments are served all the time for writing new data. However, it suffers from cleaning overhead during high utilization. Conversely, the threaded log scheme suffers from random writes, but no cleaning process is needed. F2FS adopts a hybrid scheme where the copy-and-compaction scheme is adopted by default, but the policy is dynamically changed to the threaded log scheme according to the file system status.

In order to align F2FS with underlying flash-based storage, F2FS allocates a segment in a unit of a section. F2FS expects the section size to be the same as the garbage collection unit size in FTL. With respect to the mapping granularity in FTL, F2FS allocates each section of the active logs to as many different zones as possible. FTL can write the active log data into one allocation unit according to its mapping granularity.

Cleaning process

F2FS does cleaning both on demand, and in the background. On-demand cleaning is triggered when there are not enough free segments to serve VFS calls. The background cleaner is executed by a kernel thread, and triggers the cleaning job when the system is idle.

F2FS supports two victim selection policies: greedy, and cost-benefit algorithms. In the greedy algorithm, F2FS selects a victim segment having the smallest number of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment according to the segment age and the number of valid blocks in order to address the log block thrashing problem present in the greedy algorithm. F2FS uses the greedy algorithm for on-demand cleaning, the background cleaner uses the cost-benefit algorithm.

In order to identify whether the data in the victim segment are valid or not, F2FS manages a bitmap. Each bit represents the validity of a block, and the bitmap is composed of a bit stream covering whole blocks in the Main Area.

Adoption

Phone manufacturers

Google first used F2FS in their Nexus 9 in 2014. [18] However Google's other products didn't adopt F2FS until the Pixel 3 when F2FS was updated with inline crypto hardware support. [19]

Huawei has used F2FS since the Huawei P9 in 2016. [20] [21] OnePlus has used F2FS in the OnePlus 3T. [22]

Motorola Mobility has used F2FS in their Moto G/E/X and Droid phones since 2012.

ZTE has used F2FS since the ZTE Axon 10 Pro in 2019. [23]

Linux distributions

F2FS has been merged into Linux kernel in late 2012. [24] Many distributions support it. [25] [26] [27]

See also

Related Research Articles

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.

ReiserFS is a general-purpose, journaling file system initially designed and implemented by a team at Namesys led by Hans Reiser and licensed under GPLv2. Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel. ReiserFS was the default file system in Novell's SUSE Linux Enterprise until Novell decided to move to ext3 for future releases on October 12, 2006.

ext2, or second extended file system, is a file system for the Linux kernel. It was initially designed by French software developer Rémy Card as a replacement for the extended file system (ext). Having been designed according to the same principles as the Berkeley Fast File System from BSD, it was the first commercial-grade filesystem for Linux.

ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in Journaling the Linux ext2fs Filesystem in a 1998 paper, and later in a February 1999 kernel mailing list posting. The filesystem was merged with the mainline Linux kernel in November 2001 from 2.4.15 onward. Its main advantage over ext2 is journaling, which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4.

The Unix file system (UFS) is a family of file systems supported by many Unix and Unix-like operating systems. It is a distant descendant of the original filesystem used by Version 7 Unix.

The inode is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data. File-system object attributes may include metadata, as well as owner and permission data.

In computing, the Global File System 2 or GFS2 is a shared-disk file system for Linux computer clusters. GFS2 allows all members of a cluster to have direct concurrent access to the same shared block storage, in contrast to distributed file systems which distribute data throughout the cluster. GFS2 can also be used as a local file system on a single computer.

The Minix file system is the native file system of the Minix operating system. It was written from scratch by Andrew S. Tanenbaum in the 1980s and aimed to replicate the structure of the Unix File System while omitting complex features, and was intended to be a teaching aid. It largely fell out of favour among Linux users by 1994 due to the popularity of other filesystems - most notably ext2 - and its lack of features, including limited partition sizes and filename length limits.

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. The name Lustre is a portmanteau word derived from Linux and cluster. Lustre file system software is available under the GNU General Public License and provides high performance file systems for computer clusters ranging in size from small workgroup clusters to large-scale, multi-site systems. Since June 2005, Lustre has consistently been used by at least half of the top ten, and more than 60 of the top 100 fastest supercomputers in the world, including the world's No. 1 ranked TOP500 supercomputer in November 2022, Frontier, as well as previous top supercomputers such as Fugaku, Titan and Sequoia.

Journalling Flash File System version 2 or JFFS2 is a log-structured file system for use with flash memory devices. It is the successor to JFFS. JFFS2 has been included into the Linux kernel since September 23, 2001, when it was merged into the Linux kernel mainline as part of the kernel version 2.4.10 release. JFFS2 is also available for a few bootloaders, like Das U-Boot, Open Firmware, the eCos RTOS, the RTEMS RTOS, and the RedBoot. Most prominent usage of the JFFS2 comes from OpenWrt.

The Journaling Flash File System is a log-structured file system for use on NOR flash memory devices on the Linux operating system. It has been superseded by JFFS2.

NILFS or NILFS2 is a log-structured file system implementation for the Linux kernel. It was developed by Nippon Telegraph and Telephone Corporation (NTT) CyberSpace Laboratories and a community from all over the world. NILFS was released under the terms of the GNU General Public License (GPL).

The following tables compare general and technical information for a number of file systems.

ext4 is a journaling file system for Linux, developed as the successor to ext3.

exFAT is a file system introduced by Microsoft in 2006 and optimized for flash memory such as USB flash drives and SD cards. exFAT was proprietary until 28 August 2019, when Microsoft published its specification. Microsoft owns patents on several elements of its design.

LogFS is a Linux log-structured and scalable flash file system, intended for use on large devices of flash memory. It is written by Jörn Engel and in part sponsored by the CE Linux Forum.

Btrfs is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager, developed together. It was founded by Chris Mason in 2007 for use in Linux, and since November 2013, the file system's on-disk format has been declared stable in the Linux kernel.

An HTree is a specialized tree data structure for directory indexing, similar to a B-tree. They are constant depth of either one or two levels, have a high fanout factor, use a hash of the filename, and do not require balancing. The HTree algorithm is distinguished from standard B-tree methods by its treatment of hash collisions, which may overflow across multiple leaf and index blocks. HTree indexes are used in the ext3 and ext4 Linux filesystems, and were incorporated into the Linux kernel around 2.5.40. HTree indexing improved the scalability of Linux ext2 based filesystems from a practical limit of a few thousand files, into the range of tens of millions of files per directory.

A concurrent hash-trie or Ctrie is a concurrent thread-safe lock-free implementation of a hash array mapped trie. It is used to implement the concurrent map abstraction. It has particularly scalable concurrent insert and remove operations and is memory-efficient. It is the first known concurrent data-structure that supports O(1), atomic, lock-free snapshots.

References

  1. Michael Larabel (2012-12-22). "F2FS File-System Merged Into Linux 3.8 Kernel". Phoronix . Retrieved 2016-05-25.
  2. Jaegeuk Kim (2013-03-18). "f2fs: align f2fs maximum name length to linux based filesystem". GitHub . Retrieved 2023-05-16.
  3. 1 2 Michael Larabel (2019-12-23). "F2FS Data Compression Using LZO/LZ4 + Selective File Extension Handling To Land In 2020". Phoronix. Retrieved 2020-04-07.
  4. 1 2 Michael Larabel (2020-04-07). "F2FS Introduces Zstd Compression Support With The Linux 5.7 Kernel". Phoronix. Retrieved 2020-04-07.
  5. 1 2 Jaegeuk Kim (2012-10-05). "f2fs: introduce flash-friendly file system" . Retrieved 2016-05-25.
  6. f2fs: fix to force keeping write barrier for strict fsync mode.
  7. Jaegeuk Kim (2014-09-22). "f2fs: introduce FITRIM in f2fs_ioctl".
  8. Chao Yu (2015-10-26). "f2fs: support file defragment".
  9. Jaegeuk Kim (2013-08-26). "f2fs: add flags for inline xattrs".
  10. Huajun Li (2013-11-10). "f2fs: Enable f2fs support inline data".
  11. Chao Yu (2014-09-24). "f2fs: support inline dir".
  12. Jaegeuk Kim (2014-09-20). "f2fs-tools: release 1.4.0".
  13. Jaegeuk Kim (2014-09-25). "f2fs: support atomic_write feature for database".
  14. Jaegeuk Kim (2015-06-24). "f2fs updates for v4.2".
  15. Jaegeuk Kim (2016-04-25). "resize.f2fs: support to expand partition size".
  16. Chao Yu (2015-12-17). "f2fs: support data flush in background".
  17. Chao Yu (2015-01-25). "f2fs: enable rb-tree extent cache".
  18. Smith, Joshua Ho, Ryan. "The Google Nexus 9 Review". www.anandtech.com. Retrieved 2019-05-10.{{cite web}}: CS1 maint: multiple names: authors list (link)
  19. Frumusanu, Andrei (2018-11-02). "The Google Pixel 3 Review". www.anandtech.com. Retrieved 2019-05-11.
  20. Larabel, Michael (2018-12-28). "F2FS Gets More Fixes In Linux 4.21 With The File-System Now Supported By Google's Pixel". www.phoronix.com. Retrieved 2019-05-10.
  21. Humrick, Matt (2017-05-12). "Huawei P10 and P10 Plus". www.anandtech.com. Retrieved 2019-05-11.
  22. Chester, Brandon. "The OnePlus 3T Review". www.anandtech.com. Retrieved 2019-05-10.
  23. "ZTE Axon 10 Pro Officially Uncovered: The First To Use F2FS". Gizchina.com. 2019-05-06. Retrieved 2019-05-10.
  24. "Pull new F2FS filesystem from Jaegeuk Kim commit". git.kernel.org.
  25. "Arch Linux Wiki". wiki.archlinux.org. Retrieved 27 June 2021.
  26. "Debian Wiki". wiki.debian.org. Retrieved 27 June 2021.
  27. "Gentoo Wiki". wiki.gentoo.org. Retrieved 27 June 2021.