Backup

From Wikiversity

In information technology, a backup, or data backup, or the process of backing up, refers to the copying into an archive file of computer data that is already in secondary storage—so that it may be used to restore the original after a data loss event. The verb form is "back up" (a phrasal verb), whereas the noun and adjective form is "backup".[1]

Backups serve primarily to recover data after its loss through deletion or corruption, and secondarily to recover data from an earlier time, based on a user-defined data retention policy.[3] Though backups represent a simple form of disaster recovery and should be part of any disaster recovery plan (DRP), backups by themselves should not be considered a complete DRP. One reason for this is that not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server by simply restoring data from a backup.[2]

Murphy's Law states that "if something can go wrong, it probably will". As such, any data one does not wish to lose should be stored on more than one device.

Types of data storage

Magnetic and optical

Figure: error scan of an optical disc. If supported by the optical drive model, software is able to measure the rate of minor errors on optical discs. The errors shown in the graph are normal, well within the correctable range, and have not corrupted any data.

Each type of data storage has benefits and disadvantages. Hard drives, optical discs and linear tape are the most reliable for archival storage. The latter two are not vulnerable to data loss from mechanical failure, because they are modular: the drive and its controller are external rather than tied to the component that holds the data. If the motor or loading mechanism of an optical drive fails, inserting a dull needle into the pinhole forces the tray open so that the media can be retrieved. Slot-load drives may need to be opened up should the loading mechanism fail.

Optical media is ideal for the sporadic low-maintenance long-term archival of moderately sized chunks of data such as video recordings from an important event, at capacities available as of 2021.

In addition, optical media can, if supported by the drive model, be scanned for impending integrity errors using software such as QpxTool and Nero DiscSpeed, before any data loss occurs. A higher rate of still correctable errors suggests earlier data corruption, media of lower quality, or both.

Hard drive failure is mechanical and sudden, whereas optical media deterioration ("disc rot") happens slowly over time. Optical media is best stored in a cold and dry environment, whereas hard disk drives are less sensitive to heat and humidity while not in operation.[3][4] However, humidity should be avoided during hard drives' operation, as it increases the likelihood of failure.[5] Hard drive motors' lubrication slowly loses effectiveness as well.[6] Hard drives' magnetic fields fade slowly over time, which could lead to logical data errors before the hard drive technically fails. To prevent this, data would have to be rewritten once every few years. Susceptibility to this fading increases with a drive's information density. The magnetic signal on magnetic tapes also fades over time, though manufacturers claim a "shelf life" of several decades for them.[7]

On hard disk drives, a slight possibility of manufacturing error leading to rapid failure exists.[8]

Humidity in an enclosed storage location can be reduced using silica gel baglets.

Optical media is the most likely to survive environmental disasters such as a flood, ionizing radiation from nuclear disaster, or the marginal possibility of an electromagnetic/solar radiation storm,[9][10][11] as it is water-resistant and has no internal electrical components vulnerable to single-event effects.

There exists optical media that is dedicated to archival, using silver, gold, or rock-like layers rather than organic dye, making it less sensitive to environmental conditions.[12]

Flash storage

Flash memory (solid state drives, USB sticks, memory cards), while physically the most durable and usually fast, tends to be expensive per unit of capacity, and might not be able to retain full data integrity for a long time (i.e. years), as the transistors which hold data lose charge over time. It is nevertheless suitable for supplementary backups and short-term storage. The main use of flash storage is for portable electronics and for holding operating systems.

The retention duration tends to be shorter with increasing data density, and shortens further as the memory wears down from repeated program/erase cycles. Life expectancy (the number of program/erase cycles) also decreases with data density. However, flash memory does not rely on moving mechanical parts that can fail like those of hard disk drives, hence the name "solid state drive". While USB sticks and memory cards are by definition also "solid state drives", the term is established to refer to larger-sized units for clarity. The lack of mechanical parts means that flash memory does not wear down while idle, unlike a spinning hard drive motor.

While powered on and idle, the flash storage's control firmware usually refreshes the information stored inside the sectors routinely.[13]

Loss of data integrity is indicated by sudden drops in transfer rate, caused by the flash memory controller attempting to correct errors.

Sudden failure of flash storage can be caused by power surges and voltage spikes. Hard disk drive controllers can fail in the same way, but the data remains stored on the disk platters, meaning a costly recovery may still be possible, and optical discs can simply be retrieved as described earlier.

As flash storage is typically not intended for archival, and a possibility of component failure exists, anything stored on such that one does not wish to lose should be mirrored to at least one other location. For example, a large hard disk drive can be purposed to store corresponding disk images of flash media that are refreshed once in a while. This is especially recommended when storing a collection of frequently needed files or even an operating system on a flash drive, where scraping the files together or installing an operating system took time and effort, as a disk image allows sequentially reinstating the data onto the same or a different flash drive in case of error.
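The imaging step can be sketched in Python as follows. This is a minimal illustration, not a replacement for dedicated disk imaging software: it assumes the flash drive can be opened as a single readable file (for example a device node on Linux, which usually requires elevated privileges), and the function names `create_image` and `file_checksum` as well as the chunk size are illustrative choices. The returned SHA-256 checksum can be stored next to the image and recomputed later to detect corruption of the image itself.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary illustrative value


def create_image(source_path, image_path):
    """Copy source_path byte for byte into image_path.

    Returns the SHA-256 hex digest of the copied data.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:  # end of the source reached
                break
            image.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def file_checksum(path):
    """Recompute the SHA-256 hex digest of an existing image file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Writing the image back to a flash drive is the same copy in the opposite direction, with the image as the source and the drive as the destination.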

Memory cards

Mobile phone and tablet PC users may back up files short-term onto the removable memory card as an insurance against technical defect which denies access to its non-removable internal storage.

This can be useful for photography and filming during a trip where cloud storage would be impractical due to possibly limited transfer rates and data plans unable to handle high-resolution imagery, and where the protrusion of a flash drive connected through USB On-The-Go would compromise the necessary ergonomics.

Cloud storage

Cloud storage is not under the technical control of the end user. Services might have varying retention spans and inactivity policies,[14] and technical difficulties are not predictable to end users. Access requires an internet connection, and transfer rates are limited by it. As with any online service, there is also a slight possibility of erroneous account termination by the service provider. However, cloud storage can act as a supplementary and short-term off-site backup, such as during a vacation.

Practices

Preparation

In a risky environment where there is an increased likelihood of data loss, such as a vacation or trip with the possibility of losing equipment, a higher backup frequency such as daily is recommended. The backup can be done at the base (hotel, holiday apartment, etc.) onto a portable hard disk drive or solid state drive.

For dedicated cameras and camcorders, memory cards can be cycled through.
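A daily backup from a memory card to a portable drive can be sketched as a copy of only those files that are not yet present at the destination. This is a minimal sketch under simplifying assumptions: both locations are mounted as ordinary directories, a file counts as already backed up if a file of the same relative path and size exists, and the function name `copy_new_files` is illustrative.

```python
import shutil
from pathlib import Path


def copy_new_files(source_dir, backup_dir):
    """Copy files from source_dir that are missing or differ in size in backup_dir.

    Returns the list of destination paths that were written.
    """
    source_dir = Path(source_dir)
    backup_dir = Path(backup_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = backup_dir / src.relative_to(source_dir)
        # Assumption: same path and same size means the file is already backed up.
        if dst.exists() and dst.stat().st_size == src.stat().st_size:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        copied.append(dst)
    return copied
```

Comparing sizes is fast but does not detect files whose content changed without a size change; comparing checksums would be stricter at the cost of reading every file.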

File system structure image

For users who momentarily lack storage space for backups, an image of merely the file system structure, which contains information about file names, paths, fragments and time attributes, can significantly facilitate later data recovery in case of damage. Without this information, any damage affecting the file system header could lead to files being orphaned and only detectable by forensic software through file headers and footers. Fragmented files would need to be pieced together.

The file system structure (or header) is usually stored in the first 50 to 200 Megabytes, which can be captured using disk imaging software within seconds.

While such a backup does not contain file contents (except possibly those located at the earliest logical block addresses (LBAs) shortly after the file system header itself), it is a fallback solution which is better than nothing.
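Capturing only the beginning of a storage device can be sketched as follows. This is a minimal illustration that assumes the device can be opened as a readable file (for example a device node on Linux, usually requiring elevated privileges); the function name `capture_head` is illustrative, and the default of 200 MiB simply follows the upper end of the range given above.

```python
MIB = 1024 * 1024


def capture_head(device_path, image_path, size_mib=200):
    """Copy the first size_mib mebibytes of device_path into image_path.

    Returns the number of bytes actually written, which is smaller than
    requested if the source is shorter.
    """
    remaining = size_mib * MIB
    written = 0
    with open(device_path, "rb") as device, open(image_path, "wb") as image:
        while remaining > 0:
            chunk = device.read(min(MIB, remaining))
            if not chunk:  # source ended before the requested size
                break
            image.write(chunk)
            written += len(chunk)
            remaining -= len(chunk)
    return written
```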

Other tips

  • Lost backups: If you are unable to physically locate a data storage device which contains a backup, act as if it were lost and consider just creating another backup from the source. The time and effort saved by not searching for the lost device might be worth it. If it is found again, you have one more redundant backup. If the redundant backup is not necessary, one of the storage devices can be re-purposed.
    • The same applies to individual files when you are not sure where you have already backed them up. Back them up again, and to prevent losing track of files, create logical structures of directories.
  • Distinct backup media: Backup media distinct from its source media prevents failure at a similar time from a common cause such as a manufacturing error.
  • Tracking manual backups: Manual backups of individual directories can be tracked using a text file such as lastbackup.txt or backupinfo.txt, containing the time of the last backup and optionally additional information such as the name of the backup device.
  • Reputable brands: Data storage that is pre-owned or not from a reputable and trustworthy brand should not be relied upon in the long term, only for temporary storage, testing purposes, and similar.
  • Proprietary formats: Software that stores backups in proprietary formats such as that of Acronis (.tib) should be used with caution due to the possibility of vendor lock-in.
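The tracking file mentioned in the list above can be written automatically at the end of a manual backup. The following is a minimal sketch; the function name `record_backup`, the line format inside the file, and the example device name are illustrative assumptions, while the file name lastbackup.txt follows the suggestion above.

```python
from datetime import datetime, timezone
from pathlib import Path


def record_backup(directory, device_name=None, filename="lastbackup.txt"):
    """Write the time of the backup (UTC) and optionally the backup device name.

    The note is placed inside the backed-up directory and overwrites any
    earlier note of the same name. Returns the path of the note.
    """
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [f"last backup: {timestamp}"]
    if device_name:
        lines.append(f"backup device: {device_name}")
    note = Path(directory) / filename
    note.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return note
```

For example, `record_backup("Documents/projects", "portable-hdd-1")` would leave a two-line note in that directory; both arguments here are hypothetical.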

Compressed archives

Compression ratio

Compressed archives may be used where efficient, such as for text documents and code, where strong compression formats such as 7-Zip (LZMA) and XZ could, for highly repetitive content, reduce size by a factor of 100 or more. Uncompressed bitmap images can achieve compression ratios of around 10; more or less depending on content.

Binary data such as multimedia (picture, video, audio) that has already been internally compressed cannot be effectively shrunk by applying additional compression to it, as most redundancy has already been removed by the internal compression.
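The difference can be observed with a short experiment. The sketch below uses LZMA from the Python standard library and compares highly repetitive text with random bytes, which stand in for already-compressed multimedia because they likewise contain almost no redundancy. The sample log line and the function name `compression_ratio` are illustrative.

```python
import lzma
import os


def compression_ratio(data):
    """Return original size divided by LZMA-compressed size."""
    compressed = lzma.compress(data, preset=9)
    return len(data) / len(compressed)


if __name__ == "__main__":
    # Highly repetitive text, similar to a log file with recurring lines.
    text = b"2021-03-01 12:00:00 INFO backup finished without errors\n" * 2000
    # Random bytes as a stand-in for data that is already compressed.
    noise = os.urandom(len(text))
    print(f"repetitive text: ratio {compression_ratio(text):.1f}")
    print(f"random bytes:    ratio {compression_ratio(noise):.3f}")
```

Ordinary prose and source code fall between these two extremes, since they are redundant but far less repetitive than the sample line.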

Magnification of damage

Figure: Damage on a PNG image from flipping a single bit

It should however be considered that the slightest damage to a compressed archive file could magnify enormously, possibly rendering the rest of an inside file or the entire archive useless. The scope of the damage depends on compression method, where solid compression causes the latter, as it deduplicates information across contained files to maximize compression effectiveness. As such, it is recommended to store compressed archives on no less than two devices.

If a separate device is unavailable, storing a duplicate of the archive file on the same device for redundancy could still allow repairing errors in uncommon locations by merging the intact parts using a byte editor (or "hex editor"), though errors are difficult to locate if the storage device controller's firmware returns damaged data to the computer without reporting it as such. Some flash storage devices may return sectors (512 bytes each) with damaged data as null bytes. In that case, a multiple of 512 sequential null bytes between binary data suggests a logical error in the file. On optical discs, a sector typically has 2048 bytes.
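Locating such zero-filled sectors can be automated. The sketch below scans a file's content for runs of sectors that consist entirely of null bytes; it assumes that offsets within the file line up with sector boundaries on the device, and the function name `find_null_sector_runs` is illustrative. A hit is only a hint, since a file may legitimately contain long stretches of null bytes.

```python
def find_null_sector_runs(data, sector_size=512):
    """Return a list of (byte_offset, sector_count) for runs of all-zero sectors.

    Only complete, sector-aligned blocks are examined; use sector_size=2048
    for images of optical discs.
    """
    blank = bytes(sector_size)
    runs = []
    run_start = None
    for offset in range(0, len(data) - sector_size + 1, sector_size):
        if data[offset:offset + sector_size] == blank:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            runs.append((run_start, (offset - run_start) // sector_size))
            run_start = None
    if run_start is not None:
        # The run extends to the last complete sector of the data.
        end = (len(data) // sector_size) * sector_size
        runs.append((run_start, (end - run_start) // sector_size))
    return runs
```

If a second copy of the file exists, the reported offsets indicate which ranges to take from the other copy when merging the intact parts.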

To get a visual idea of how fragile compressed archives can be to damage, try opening a PNG or TIFF image inside a byte editor (or hex editor) and edit only a few bytes somewhere near the middle, then try the same on an uncompressed bitmap (BMP) for comparison. The PNG and TIFF images will be completely garbled from the damaged point onward, while the effects on the uncompressed bitmap are only pinhole-sized.

This experiment might not be as effective on a JPEG image as on PNG and TIFF, as its compression algorithm is more robust against damage. It may cause some glitching and hue alterations on ordinary JPEG, and digital stains on progressive JPEG, but nothing that demolishes the entire image beyond repair. For reference, see the Commons: category: Bit-blending experiment.
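A comparable experiment can be run without images. The sketch below flips a single bit in raw data and in a zlib-compressed copy of the same data, then counts how many bytes of the original are wrong or missing afterwards. It uses zlib (the Deflate method also used inside PNG) as a stand-in for archive compression in general; the sample text and all function names are illustrative, and the extent of the damage in the compressed case varies with the position of the flipped bit.

```python
import random
import zlib


def make_sample(size, seed=0):
    """Build reproducible pseudo-text of the given size from a small vocabulary."""
    rng = random.Random(seed)
    words = [b"backup", b"archive", b"sector", b"restore",
             b"image", b"drive", b"file", b"data"]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words) + b" "
    return bytes(out[:size])


def flip_bit(data, byte_index, bit=0):
    """Return a copy of data with one bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def count_damaged_bytes(original, recovered):
    """Count bytes that differ, plus bytes that are missing or surplus."""
    differing = sum(a != b for a, b in zip(original, recovered))
    return differing + abs(len(original) - len(recovered))


def decompress_as_far_as_possible(blob, chunk_size=64):
    """Decompress in small chunks and keep whatever was decoded before an error."""
    decoder = zlib.decompressobj()
    recovered = bytearray()
    try:
        for start in range(0, len(blob), chunk_size):
            recovered += decoder.decompress(blob[start:start + chunk_size])
            if decoder.eof:
                break
    except zlib.error:
        pass  # the stream became undecodable; keep the output so far
    return bytes(recovered)


def damage_uncompressed(data, byte_index):
    """Damaged bytes after flipping one bit in the raw data."""
    return count_damaged_bytes(data, flip_bit(data, byte_index))


def damage_compressed(data, byte_index):
    """Damaged bytes after flipping one bit in the compressed stream."""
    blob = zlib.compress(data, 9)
    recovered = decompress_as_far_as_possible(flip_bit(blob, byte_index))
    return count_damaged_bytes(data, recovered)
```

In the uncompressed case exactly one byte is affected, whereas a flipped bit in the compressed stream typically corrupts or cuts off a large part of the data that follows it.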

Compression formats that have a weaker ratio but in return require far less computing effort might use block-based storage, where information is stored in compressed blocks of a fixed size. Data blocks after a damaged one might be recoverable. A popular block-based compression format is Gzip; see Gzip § Damage recovery.

Efficiency

Human-readable text and code are highly compressible, and archive formats' internal file systems typically handle a high number of small files more efficiently than data storage file systems.

On FAT32/16/12 and exFAT for example, any non-empty file reserves at least one entire cluster (space allocation unit), which may be preformatted to around 16 to 256 KB, depending on total storage size. Too many small files cause space to be wasted through cluster overhead, whereas archive formats handle many small files efficiently, even with compression deactivated.
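The cluster overhead can be estimated with simple arithmetic: each non-empty file occupies its size rounded up to the next multiple of the cluster size. The sketch below assumes exactly this rounding rule and ignores directory entries and allocation tables; the function names are illustrative.

```python
def allocated_size(file_size, cluster_size):
    """Space a file occupies when rounded up to whole clusters."""
    if file_size == 0:
        return 0  # an empty file reserves no cluster
    clusters = (file_size + cluster_size - 1) // cluster_size  # ceiling division
    return clusters * cluster_size


def cluster_overhead(file_sizes, cluster_size):
    """Total bytes reserved but not used by file contents."""
    used = sum(file_sizes)
    allocated = sum(allocated_size(size, cluster_size) for size in file_sizes)
    return allocated - used


if __name__ == "__main__":
    # 10,000 files of 1 KiB each on a volume with 32 KiB clusters.
    wasted = cluster_overhead([1024] * 10000, 32 * 1024)
    print(f"{wasted} bytes wasted")
```

In this example each 1 KiB file leaves 31 KiB of its cluster unused, so about 317 MB is reserved for roughly 10 MB of actual content.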

However, digital photographs and video are internally compressed to a degree where additional compression through an archive format such as Zip, RAR, and 7z would not shrink them effectively, while significantly slowing down extraction, preventing direct playback, and making the files more vulnerable to damage.

Note that a program stream video (most mobile video) with an end-of-file moov atom must be complete, or else it is unplayable. This is not the case for transport stream video, frequently used by dedicated camcorders such as those of Panasonic and Sony (AVCHD).
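Whether a video file carries its moov atom at the end, and whether the atom is present at all, can be checked by walking the top-level structure of the file. The sketch below assumes the MP4/QuickTime box layout, in which each top-level box begins with a 4-byte big-endian size followed by a 4-byte type, a size of 1 signals a 64-bit size in the next 8 bytes, and a size of 0 means the box extends to the end of the file. The function names are illustrative, and the whole file content is passed in as bytes for simplicity.

```python
import struct


def top_level_boxes(data):
    """Return a list of (type, offset, size) for each complete box header found."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_size = 8
        if size == 1:  # 64-bit size stored after the type field
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header_size = 16
        elif size == 0:  # box extends to the end of the file
            size = len(data) - offset
        if size < header_size:
            break  # malformed box; stop rather than loop forever
        boxes.append((box_type.decode("latin-1"), offset, size))
        offset += size
    return boxes


def moov_position(data):
    """Return "start", "end", or "missing" depending on where moov sits relative to mdat."""
    types = [box_type for box_type, _, _ in top_level_boxes(data)]
    if "moov" not in types:
        return "missing"
    if "mdat" in types and types.index("moov") > types.index("mdat"):
        return "end"
    return "start"
```

A result of "missing" for a recording that was cut off mid-write matches the situation described above, where the index needed for playback was never written.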

Other areas

Web browsing

Sessions

Develop the habit of regularly exporting your browsing session into a text file, which can be done through a browser extension. Automatic session restoration might fail, and the session database might be in a proprietary format that can only be read or decoded with difficult-to-use tools rather than a simple text editor.

Some browser extensions allow exporting both page title and URL, facilitating later searching. Some extensions have an option to limit the export to only tabs of the current window, which could be of use if tabs in other windows have not been changed.

Some browser extensions have an internal session manager, while others export the session into a file in the download folder. Some might have both. Some extensions for this purpose were discontinued when Mozilla Firefox, one of the most popular web browsers, dropped support for legacy-format extensions with the transition to version 57 "Quantum", though functionally similar replacements have presumably been released.

Web forms

Web form data may get lost as a result of browser crashes caused by RAM exhaustion, operating system crashes, power outages without an uninterruptible power supply (external unit or laptop battery), or a failed form submission.[15]

Browser extensions such as Textarea Cache prevent loss of form data by backing it up automatically.[16]

Sites for which this is not wanted can be added to an exclusion list in the extension's settings.

Consider drafting any text that is presumed to take minutes or longer to write into an offline text file, from which it can be copied and pasted into a web form.

The same also applies to other areas such as to programs' internal text areas like the WinRAR archive comment field[17], since it may be discarded in case of an error or cancellation. Regarding WinRAR, using proprietary formats is not recommended anyway due to potential incompatibility throughout systems.

Browsing history

Web browsers might automatically delete history under certain conditions, such as exhaustion of storage space on the partition that stores the user data folder (or "profile folder"), or when entries are older than a time threshold such as three months. An update might unexpectedly change the retention duration, discarding any history that falls outside a shortened span.[18]

If you wish to retain history beyond browsers' retention span, consider routinely creating copies of the history database. Firefox's live database file is named places.sqlite and located in the profile folder. For Google Chrome/Chromium, extensions exist that allow exporting it into the download folder.[19]
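Copying the history database can be sketched with the SQLite backup interface in the Python standard library, which produces a consistent copy of a database file. This is a minimal sketch under stated assumptions: the location of the profile folder differs between systems and is passed in by the caller, the function name `backup_history` and the dated file name are illustrative, and the browser may keep the database locked while running, so the copy is best made while the browser is closed.

```python
import sqlite3
from datetime import date
from pathlib import Path


def backup_history(database_path, backup_dir):
    """Copy an SQLite database such as places.sqlite into backup_dir.

    The copy is named after the source file plus the current date.
    Returns the path of the copy.
    """
    database_path = Path(database_path).resolve()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target_path = backup_dir / f"{database_path.stem}-{date.today().isoformat()}.sqlite"

    # Open the source read-only so the original cannot be modified or created by accident.
    source = sqlite3.connect(database_path.as_uri() + "?mode=ro", uri=True)
    target = sqlite3.connect(str(target_path))
    try:
        source.backup(target)  # page-by-page copy of the whole database
    finally:
        target.close()
        source.close()
    return target_path
```

Running this once a month, for example, keeps a series of dated snapshots that outlast the browser's own retention span.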

Note that on mobile phones, restrictions by operating system could deny file-level access to browsers' user data. Then, the only way to export browsing data is using a mobile browser which supports third-party extensions, such as the Chromium fork "Kiwi browser", which is compatible with many desktop Google Chrome extensions.[20]

Paper

Reliable long-term storage of relatively small amounts of data on paper is possible using software such as PaperBack (open-source), which can encode digital data into a printable machine-readable matrix bar code.

Paper books cannot be duplicated as simply as digital computer files, but have supreme longevity and reliability due to immunity from technical failure. Abstractly, paper is a form of optical media.

Backing up paper books themselves may be desirable to protect against short-term losses such as forgetting to take them after use at a location such as school, university, or work place. Pages can be individually digitized in high quality using optical flatbed scanners, though the process is slow and demands patience. It is faster and more convenient to use a digital camera, preferably mounted on a tripod pointing down so that the pages keep a fixed position within the images, and in good lighting. Pairs of pages can be photographed individually, though filming while quickly skimming through the pages is the fastest method. The necessary resolution for clear readability depends on the size and clarity of the font, but at least 1080p is recommended, where each still frame of the video acts as a 2.1-megapixel photograph, only with slight quality losses from video compression.

Commercial backup products

Commercial products that provide backup and recovery capabilities include Commvault, Veritas NetBackup, Veritas Backup Exec, Veeam Backup & Replication, and Arcserve.

Activities

  1. Read about NDMP (Network Data Management Protocol)
  2. Learn about RPO (recovery point objective) and RTO (recovery time objective)

See Also

References

  1. Wikipedia: Backup
  2. Wikipedia: Backup
  3. https://smallbusiness.chron.com/temperature-should-laptop-hard-drive-run-81401.html
  4. https://www.clir.org/pubs/reports/pub121/sec5/
  5. https://www.zdnet.com/article/heat-doesnt-kill-hard-drives-heres-what-does/
  6. http://www1.coe.neu.edu/~smuftu/docs/2011/ME5656_Term_Project%20Air%20lubrication%20in%20HDDs.PDF
  7. How to estimate the lifespan of LTO tapes – "30 years" (source)
  8. HDD Expert: Reliability of HDDs depends on manufacturing process – Anton Shilov, September 30, 2014, KitGuru
  9. Military warns EMP attack could wipe out America
  10. Could Solar Storms Destroy Civilization? Solar Flares & Coronal Mass Ejections – Kurzgesagt - in a nutshell (Video, 09m43s)
  11. Could an EMP attack be a part of the end times?
  12. "Verbatim Gold" (silver and gold) and "M-Disc" (rock-like)
  13. "Understanding Life Expectancy of Flash Storage". www.ni.com. 2020-07-23. Retrieved 2020-12-19.
  14. For example, Dropbox retains unused accounts for up to one year.
  15. Data from failed form submissions may be recoverable through a "core dump" generated from the browser's process, for which you may refer to this guide by Joey Adams.
  16. Textarea Cache for Firefox
  17. https://documentation.help/WinRAR/HELPArcComment.htm
  18. Benson, Ryan. "Archived History files removed from Chrome v37". Obsidian Forensics. Archived from the original on 2014-10-10.
  19. Export History/Bookmarks to JSON/CSV*/XLS* extension from the Google Chrome web store
  20. Features - Kiwi Browser