ReFS file system structure and data recovery algorithm

"ReFS" (Resilient File System) is a new file system from Microsoft that was created as a replacement for "NTFS". It has several solid advantages, namely, the developers have fixed all NTFS bugs. It is much more protected from information corruption, it can better withstand the increased load, and it also scales much more easily.






Main functions of the Resilient File System
Integrity streams.

"Allocate on write".

"Data striping", as in RAID.

"Disk scrubbing".

"Storage Spaces".

Features kept from "NTFS": "BitLocker", the "USN" journal, "ACL", "mount points"… Data on "ReFS" is accessed through the same "API".


Features of "ReFS"







Checksums are now used for metadata by default, and they can also be applied to the data of individual files. Verification is therefore carried out "on the fly" during reads and writes. When the file system detects corruption, it immediately removes the damaged entries on the live volume, without restarting the computer. In other words, "ReFS" corrects itself when errors appear.
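The idea of verifying data "on the fly" can be illustrated with a minimal sketch. It is not the actual ReFS implementation: the function names are hypothetical, and CRC32 from Python's zlib module is used only as a stand-in for whatever checksum the file system really applies.

```python
import zlib


def write_block(data):
    """Return the data together with its checksum, as it would be stored."""
    # Assumption: CRC32 stands in for the real checksum algorithm.
    return data, zlib.crc32(data)


def read_block(data, stored_checksum):
    """Verify the block while reading it; refuse to return corrupted data."""
    if zlib.crc32(data) != stored_checksum:
        raise IOError("checksum mismatch: block is corrupted")
    return data
```

A reader that gets the IOError knows the block cannot be trusted and can fall back to another copy, if one exists.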



"ReFS" provides a higher reliability of storing information, compared to the old FS. B + trees are used to store files and metadata. Sizes, number of partitions and files are now limited to the maximum 64-bit value. White space is stored in three different tables, broken down by chunk size (small, medium, large). File names and paths are written in "Unicode", they should not exceed 32 kilobytes, that is, the file name can be specified in 30 thousand characters.



Power outage protection. Suppose you are writing a new file name (or other metadata) and the power goes out before the change is saved. In "NTFS" the file may be damaged, because the metadata is changed in place. "ReFS" instead writes a copy of the metadata and leaves the original untouched until the write has completed. This is the essence of the "Copy-on-write" approach.
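The behaviour described above can be shown with a toy model. This is only a sketch of the general copy-on-write idea, not of ReFS internals: the CowStore class and its crash_before_commit flag are hypothetical names invented for the illustration.

```python
class CowStore:
    """Toy copy-on-write store: updates never overwrite committed data."""

    def __init__(self):
        self.blocks = {}     # block number -> payload
        self.next_free = 0   # next unused block number
        self.root = None     # block holding the committed metadata

    def _allocate(self, payload):
        number = self.next_free
        self.next_free += 1
        self.blocks[number] = payload
        return number

    def update(self, payload, crash_before_commit=False):
        # Step 1: write the new version to a fresh block.
        new_block = self._allocate(payload)
        if crash_before_commit:
            # Simulated power loss: the root still points at the old block.
            return
        # Step 2: switching the root pointer commits the change.
        self.root = new_block

    def read(self):
        return None if self.root is None else self.blocks[self.root]
```

If the simulated crash happens, read() still returns the previous, intact version, which is the property the paragraph above describes.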



"Storage Spaces" is a storage virtualization feature. It lets you combine several physical disks, on one PC or on several machines across a local network, into a single space. It can also be configured for "mirroring", as in RAID arrays.



Differences from NTFS



"ReFS" was designed from the start to support very large volumes, files and directories, as well as long names. The new FS can hold up to two hundred and sixty-two thousand exabytes of information, while "NTFS" is limited to sixteen exabytes.
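The two figures can be checked with simple arithmetic. The sketch below assumes that the ReFS limit comes from 2^64 clusters of 16 KB each and that the NTFS limit is 2^64 bytes; the article does not state this derivation, so treat it as an assumption.

```python
EXABYTE = 2 ** 60          # binary exabyte (EiB)
CLUSTER_SIZE = 16 * 1024   # assumed 16 KB cluster


def refs_max_exabytes():
    # Assumption: 2^64 clusters of 16 KB each, i.e. 2^78 bytes.
    return (2 ** 64 * CLUSTER_SIZE) // EXABYTE


def ntfs_max_exabytes():
    # Assumption: 2^64 bytes.
    return (2 ** 64) // EXABYTE


print(refs_max_exabytes(), ntfs_max_exabytes())
```

Under these assumptions the results are 262144 and 16 exabytes, which matches the numbers quoted above.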



It also lacks file-level encryption, compression, deduplication, disk quotas, hard links, and extended attributes. Some of these are covered by other mechanisms; for example, "ReFS" volumes fully support "BitLocker" encryption.



For now, only a disk pool (storage space) can be formatted as "ReFS", and that is also where the new FS shows its strengths best. Windows 10 will not let you format regular media as "ReFS". The developers stress that "ReFS" is intended primarily for servers; it is available on server OS editions or in the "LTSC" version.



Windows Server 2016 allows regular volumes to be formatted as "ReFS", but not the boot disk, because the boot sector must be on an "NTFS" partition.



Filesystem architecture



The structures of "ReFS" differ significantly from those of all other Windows file systems. The main building blocks are B+ trees. They can be single-level (leaves only) or multi-level (full trees). This lets every element of the FS structure scale well. Together with 64-bit addressing of every element, this design rules out problems as volumes continue to grow.






Like the root record of a B+ tree, all other records have a fixed size: 16 KB for a metadata block. Intermediate (address) nodes use a small record size of 60 bytes. As a result, only a few tree levels are needed to describe even very large storage structures, which improves FS performance compared with other file systems.
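A rough calculation shows why so few levels are needed. The sketch below takes the 16 KB block and the 60-byte address record from the text and ignores page headers and any other overhead, which is a simplifying assumption; the function names are hypothetical.

```python
PAGE_SIZE = 16 * 1024        # metadata block size from the text
ADDRESS_RECORD_SIZE = 60     # intermediate (address) record size from the text


def fan_out():
    """Address records per intermediate node, ignoring header overhead."""
    return PAGE_SIZE // ADDRESS_RECORD_SIZE


def levels_needed(leaf_pages):
    """Number of intermediate levels required to reach `leaf_pages` leaves."""
    levels = 0
    reachable = 1
    while reachable < leaf_pages:
        reachable *= fan_out()
        levels += 1
    return levels


print(fan_out(), levels_needed(10 ** 9))
```

With about 273 pointers per node, a billion leaf pages can be addressed through only four intermediate levels under these assumptions.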



ReFS file system structure



"ReFS" can be identified by a specific signature located at the beginning of the section:



image
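A signature check of this kind might look like the sketch below. The exact position of the signature is not given in the text: the code assumes the ASCII string "ReFS" sits at byte offset 3 of the first sector, and both constants and the function name are hypothetical.

```python
REFS_SIGNATURE = b"ReFS"
SIGNATURE_OFFSET = 3   # assumption: signature follows a 3-byte jump field


def looks_like_refs(boot_sector):
    """Return True if the first sector carries the assumed ReFS signature."""
    end = SIGNATURE_OFFSET + len(REFS_SIGNATURE)
    return boot_sector[SIGNATURE_OFFSET:end] == REFS_SIGNATURE
```

In practice the bytes would come from the first sector of the partition, for example by opening the raw device or a disk image in binary mode and reading 512 bytes.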



All ReFS pages are 0x4000 bytes (16 KB) long.



The first page number is 0x1e, so this page starts 0x78000 bytes (0x1e × 0x4000) from the start of the volume, after the boot sector. This is a standard Microsoft layout: the first metadata should be looked for at this fixed offset.
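The offset arithmetic can be written down directly. The sketch below uses the page size and first page number from the text; the function names are hypothetical, and `volume` is assumed to be any seekable binary file object, such as an opened disk image.

```python
PAGE_SIZE = 0x4000            # length of every ReFS page
FIRST_METADATA_PAGE = 0x1E    # first page number


def page_offset(page_number):
    """Byte offset of a page from the start of the volume."""
    return page_number * PAGE_SIZE


def read_page(volume, page_number):
    """Read one whole page from a seekable binary file object."""
    volume.seek(page_offset(page_number))
    data = volume.read(PAGE_SIZE)
    if len(data) != PAGE_SIZE:
        raise ValueError("page %d is truncated" % page_number)
    return data
```

For the first metadata page, page_offset(FIRST_METADATA_PAGE) gives 0x78000.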



Deleted data search algorithm









Data recovery utilities perform a full scan of the "ReFS"-formatted disk space using a signature-based analysis algorithm. Checking the disk block by block, they find known data sequences, identify them, and display the results. Since "ReFS" and "NTFS" volumes are accessed through the same disk API, the recovery processes are very similar.
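A block-by-block signature scan can be sketched as follows. This is an illustration of the general technique, not the algorithm of any particular utility: the 4096-byte block size, the function name, and the small signature table are assumptions, although the three magic byte sequences themselves are the standard PNG, PDF and JPEG file headers.

```python
# Well-known file headers ("magic bytes") mapped to a type label.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
}
BLOCK_SIZE = 4096   # assumption: files start on block boundaries


def scan_for_signatures(image, block_size=BLOCK_SIZE):
    """Return (offset, type) for every block that starts with a known header."""
    hits = []
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        for magic, kind in SIGNATURES.items():
            if block.startswith(magic):
                hits.append((offset, kind))
    return hits
```

A real utility would also have to determine where each file ends, which this sketch does not attempt.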



First, the "Volume Header" is located; it contains the sector size and the number of sectors per cluster. The primary copy lies in sector zero, and a backup copy is in the last sector. Next, the "Superblock" is read; it is located in the 30th block, with 2 more copies in the second and third blocks from the end. From it, references to the "checkpoint" and its copy are extracted; the most recent version is determined by the "Virtual Allocated Clock".
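These first steps can be sketched in code. The block positions follow the text, with "second and third from the end" read as total_blocks - 2 and total_blocks - 3, which is an interpretation. The Checkpoint class, its fields and both function names are hypothetical, and the on-disk parsing itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    block: int           # where this checkpoint copy was found
    virtual_clock: int   # value of its "Virtual Allocated Clock"


def superblock_candidates(total_blocks):
    """Blocks to try for the Superblock: the primary and two backups."""
    # Assumption: the last block is total_blocks - 1, so the second and
    # third blocks from the end are total_blocks - 2 and total_blocks - 3.
    return [30, total_blocks - 3, total_blocks - 2]


def latest_checkpoint(checkpoints):
    """Pick the checkpoint copy with the highest clock value."""
    return max(checkpoints, key=lambda c: c.virtual_clock)
```

A recovery tool would try the candidates in order, keep the first Superblock that parses correctly, and then compare the clock values of the two checkpoints it references.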



The checkpoint contains information about the main tables. Next, the "Page Header" structures and the blocks of pointers to the complete list of tables are read. The "Container Table" is then located in order to translate virtual addresses into physical ones, and the "Object ID Table" is searched, after which all tables have been found.



The utilities then descend to level zero, that is, to the B-tree leaves, and read the file data. Since the search proceeds page by page, any pages that fail to read are simply excluded from the analysis, and scanning continues. In this way, data recovery utilities find all the information that can still be "retrieved" from the disk.
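The "skip and continue" behaviour can be sketched as a simple loop. The function name scan_pages is hypothetical, and read_page is assumed to be any callable that returns the bytes of a page or raises OSError when the page cannot be read.

```python
def scan_pages(read_page, page_count):
    """Read every page; record failures instead of stopping the scan."""
    good = {}   # page number -> page bytes
    bad = []    # page numbers that could not be read
    for number in range(page_count):
        try:
            good[number] = read_page(number)
        except OSError:
            # A damaged page is excluded; scanning simply moves on.
            bad.append(number)
    return good, bad
```

The caller can then analyse the pages in `good` and report the ones in `bad` as unrecoverable regions.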





