From Wikiversity

In information technology, a backup, or data backup, is the process of copying computer data that is already in secondary storage into an archive file, so that the copy may be used to restore the original after a data loss event. The verb form is "back up" (a phrasal verb), whereas the noun and adjective form is "backup".[1]

Backups serve primarily to recover data after its loss through deletion or corruption, and secondarily to recover data from an earlier time according to a user-defined data retention policy.[3] Though backups represent a simple form of disaster recovery and should be part of any disaster recovery plan, backups by themselves should not be considered a complete disaster recovery plan (DRP). One reason for this is that not all backup systems are able to reconstitute a computer system or other complex configuration, such as a computer cluster, Active Directory server, or database server, by simply restoring data from a backup.[2]

Types of data storage[edit | edit source]

Magnetic and optical[edit | edit source]

Various types of data storage offer different benefits and disadvantages. Among them, hard drives, optical discs and linear tape are the most reliable for archival storage. The latter two are less vulnerable to data loss from mechanical failure because of their modularity: the controller is external rather than tied to the component that holds the data.

In addition, optical media can, if supported by the drive model, be scanned for impending integrity errors before any data loss occurs, using software such as QpxTool or Nero DiscSpeed. A rising rate of still-correctable errors indicates approaching data corruption and/or media of lower quality.

Flash storage[edit | edit source]

Flash memory (solid-state drives, USB sticks, memory cards), while physically the most robust and usually fast, tends to be expensive per unit of capacity, and may not retain full data integrity over long periods (i.e. years), because the transistors that hold the data lose charge over time. The retention duration tends to be shorter for higher-density storage. Flash storage is nevertheless suitable for secondary backups and short-term storage.

While the device is powered on and idle, the flash controller's firmware routinely refreshes the information stored in its sectors.[3]

Loss of data integrity is indicated by dips (downspikes) in transfer rates, caused by the flash memory controller attempting to correct errors.
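As an illustration, such degraded regions can be spotted by comparing per-chunk read rates against the overall baseline. This is a minimal sketch; the helper name and the 50% threshold are illustrative, not taken from any particular diagnostic tool:

```python
from statistics import median

def find_rate_dips(rates_mb_s, threshold=0.5):
    """Flag chunk indices whose read rate falls below a fraction
    of the median rate -- a possible sign that the controller is
    busy correcting errors in those sectors."""
    baseline = median(rates_mb_s)
    return [i for i, r in enumerate(rates_mb_s)
            if r < threshold * baseline]

# One suspicious slow region among otherwise steady reads:
print(find_rate_dips([95, 96, 94, 12, 95, 97]))
```

How the per-chunk rates are measured (e.g. timing sequential reads of fixed-size blocks) is device- and OS-specific and left out here.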

Memory cards[edit | edit source]

Mobile phone and tablet users may back up files short-term onto a removable memory card as insurance against a technical defect that denies access to the device's non-removable internal storage.

This can be useful for photography and filming during a trip, where cloud storage would be impractical: transfer rates may be limited, data plans may be unable to handle high-resolution imagery, and the protrusion of a flash drive connected through USB On-The-Go would compromise ergonomics.

Cloud storage[edit | edit source]

Cloud storage is not technically controllable by the end user. Services may have varying retention spans, technical difficulties are not predictable by end users, and access requires an internet connection. As with any online service, the slight possibility of erroneous account termination by the service provider exists as well. However, cloud storage can act as a secondary, temporary off-site backup, for example during a vacation.

Practices[edit | edit source]

Preparation[edit | edit source]

In a risky environment with an increased likelihood of data loss, such as a vacation or trip where equipment might be lost, a higher backup frequency (such as daily) is recommended. This can be done at the base (hotel, holiday apartment, etc.) onto a portable hard disk drive or solid-state drive.
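Such a daily backup of new and changed files onto a portable drive can be scripted in a few lines. The following Python sketch is one possible approach (the helper name is hypothetical); it copies only files whose backup copy is missing or older:

```python
import os
import shutil

def backup_new_files(src, dst):
    """Copy files from src to dst, preserving the directory tree
    and skipping files whose backup copy is already up to date
    (same or newer modification time)."""
    copied = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)  # copy2 preserves timestamps
                copied.append(name)
    return copied
```

Because `shutil.copy2` preserves modification times, running the function a second time copies nothing unless files have changed in between.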

For dedicated cameras and camcorders, memory cards can be cycled through.

File system structure image[edit | edit source]

For users who temporarily lack storage space for full backups, an image of merely the file system structure, which contains information about file names, paths, fragments and time attributes, can significantly facilitate later data recovery in case of damage. Without this information, any damage to the file system header could leave files orphaned and detectable only by forensic software through file headers and footers, and fragmented files would need to be puzzled back together.

The file system structure (or header) is usually stored within the first 50 to 200 megabytes of the volume, which disk imaging software can capture within seconds.

While such a backup does not contain file contents (except possibly those located at the earliest logical block addresses (LBAs), shortly after the file system header itself), it is a fallback solution that is better than nothing.
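Capturing such a partial image amounts to reading only the first part of the volume. A minimal Python sketch follows; the helper is illustrative, and on Linux the source would typically be a block device such as /dev/sdb1, which requires elevated privileges (any readable file works for trying it out):

```python
def image_first_megabytes(device_path, image_path, megabytes=200):
    """Capture only the first part of a volume -- typically where
    the file system structures live -- into an image file."""
    chunk = 1024 * 1024  # read in 1 MB chunks
    with open(device_path, 'rb') as src, open(image_path, 'wb') as dst:
        for _ in range(megabytes):
            data = src.read(chunk)
            if not data:  # volume smaller than requested size
                break
            dst.write(data)
```

This mirrors what `dd if=/dev/sdb1 of=header.img bs=1M count=200` would do with a dedicated imaging tool.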

Compressed archives[edit | edit source]

Compressed archives may be used where they are efficient, but with the knowledge that even the slightest damage to the archive file can be magnified enormously, possibly rendering the remainder of a contained file, or the entire archive (depending on the compression method), useless.

Magnification of damage[edit | edit source]

Damage on a PNG image from flipping a single bit

To get a visual idea of how fragile compressed formats can be, open a PNG or TIFF image in a byte editor (hex editor), change only a few bytes somewhere near the middle, and then try the same on an uncompressed bitmap (BMP) for comparison. The PNG and TIFF images will be completely demolished and glitched from the damaged point onwards, while the effects on the uncompressed bitmap are only pinhole-sized.

This experiment might not be as effective on a JPEG image as on PNG or TIFF, because JPEG's compression algorithm is more robust against damage. The damage may cause some glitching and hue alterations on an ordinary (baseline) JPEG, and digital stains on a progressive JPEG, but nothing that demolishes the entire image beyond repair. For reference, see the Commons: category: Bit-blending experiment.
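The bit-flipping itself can also be scripted instead of done by hand in a hex editor. This hypothetical helper copies a file and flips a single bit, by default near the middle:

```python
def flip_bit(path_in, path_out, byte_offset=None, bit=0):
    """Copy a file, flipping one bit (by default near the middle)
    to simulate a single-bit storage error, and return the offset
    of the modified byte."""
    with open(path_in, 'rb') as f:
        data = bytearray(f.read())
    if byte_offset is None:
        byte_offset = len(data) // 2
    data[byte_offset] ^= 1 << bit  # toggle exactly one bit
    with open(path_out, 'wb') as f:
        f.write(bytes(data))
    return byte_offset
```

Running it on copies of a PNG and a BMP of the same picture, then opening both results in an image viewer, reproduces the comparison described above.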

Efficiency[edit | edit source]

Human-readable text and code are highly compressible, and archive formats' internal file systems typically handle a large number of small files more efficiently than the file systems of data storage.
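The compressibility of text is easy to verify with Python's standard zlib module; this small demonstration compresses a repetitive source code snippet:

```python
import zlib

# A repetitive, human-readable snippet, as commonly found in code
text = ("def backup(src, dst):\n    copy(src, dst)\n" * 200).encode()
packed = zlib.compress(text, level=9)
print(len(text), "->", len(packed), "bytes")
```

The compressed output is a small fraction of the original size, because repeated patterns dominate the input.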

On FAT32/16/12 and exFAT, for example, any non-empty file reserves at least one entire cluster (space allocation unit), which may be preformatted to around 16 to 256 KB depending on total storage size. Too many small files waste space through cluster overhead, whereas archive formats handle many small files efficiently, even with compression deactivated.
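The slack space lost to cluster rounding can be estimated with a short calculation; the helper name and the 32 KB cluster size used below are illustrative:

```python
def wasted_bytes(file_sizes, cluster=32 * 1024):
    """Slack space lost to cluster rounding: every non-empty file
    occupies a whole number of clusters on the volume."""
    waste = 0
    for size in file_sizes:
        if size > 0:
            clusters = -(-size // cluster)  # ceiling division
            waste += clusters * cluster - size
    return waste

# 1000 small 1 KB files on a volume with 32 KB clusters:
print(wasted_bytes([1024] * 1000))  # 31 KB wasted per file
```

Packing the same 1000 files into a single archive avoids nearly all of that overhead, since the archive itself occupies the volume as one large file.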

However, digital photographs and video are internally compressed to a degree where additional compression in an archive format such as Zip, RAR or 7z would not make much of a difference, while significantly slowing down extraction (rather than allowing direct playback) and making the data more vulnerable to damage.
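When archiving already-compressed media, the compression step can therefore simply be disabled. Python's standard zipfile module supports this via the ZIP_STORED method (the helper name here is illustrative):

```python
import os
import zipfile

def archive_without_compression(paths, archive_path):
    """Pack already-compressed media (JPEG, MP4, ...) into a zip
    using ZIP_STORED, so no time is wasted recompressing data
    that will not shrink, and extraction stays fast."""
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_STORED) as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))
```

The files are merely concatenated with zip headers, which still solves the small-file cluster overhead while leaving the media data byte-for-byte intact.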

Note that a program-stream video (most mobile video) with an end-of-file moov atom depends on the file being complete, or else it is unplayable. This is not the case for transport-stream video, frequently used by dedicated camcorders such as those of Panasonic and Sony (AVCHD).

Other areas[edit | edit source]

Web browsing[edit | edit source]

Develop the habit of regularly exporting your browsing session into a text file. Automatic session restoration might fail, and the session database might be in a proprietary format readable or decodable only through difficult-to-use tools rather than a simple text editor.
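A minimal export could look like the following sketch. The helper is hypothetical, and obtaining the list of open tab URLs is browser-specific and assumed to have happened already:

```python
import time

def export_session(urls, path):
    """Append the current tab URLs to a plain text file with a
    timestamp header -- readable anywhere, with no proprietary
    session database format involved."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(time.strftime('# session %Y-%m-%d %H:%M\n'))
        f.writelines(url + '\n' for url in urls)
```

Appending rather than overwriting keeps a running history of sessions in one file.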

Commercial backup products[edit | edit source]

Various commercial products provide backup and recovery capabilities, such as Commvault, Veritas NetBackup, Veritas Backup Exec, Veeam Backup & Replication, and Arcserve.

Activities[edit | edit source]

  1. Read about w:NDMP (Network Data Management Protocol)
  2. Learn about RPO (recovery point objective) and RTO (recovery time objective)

References[edit | edit source]

  1. Wikipedia: Backup
  2. Wikipedia: Backup
  3. "Understanding Life Expectancy of Flash Storage". 2020-07-23. Retrieved 2020-12-19.