File management

This lesson teaches the management of digital files and directories.

While there exists no ideal method of file management, this resource documents possibly helpful practices and inspiring ideas.

This lesson was created under the impression that the average computer and mobile phone user still struggles to keep track of files in the long term, despite all the tools at their disposal.

Directories

Make use of directories. Categorize files into logically structured directories to facilitate finding them later.

Some inexperienced computer users might have a habit of placing every file in the root directory of a device such as a flash drive. As files accumulate, they become increasingly difficult to find. This applies especially to multimedia content, which cannot simply be searched for text strings using tools like `grep`, but only through metadata.

Create a mind map by asking yourself where you would look for these files whenever necessary.

Below are two example directory structures for video projects, where the project files of the first project are directly in the project folder, while those of the second are in a subfolder; a matter of preference.

  • Video projects
    • Example project 1
      • Assets
    • Example project 2
      • Project files
      • Assets

For more examples of directory structures, see the /directory structures sub page. Feel free to add examples to it.

For directories where a sufficiently descriptive directory name would be considered too long, consider creating a descriptive text file inside named description.txt, info.txt, or similar. Comments about specific files may be noted in a text file with .meta, .meta.txt, or a similar suffix appended to its name.

If files within the same category have wildly varying sizes, such as command line outputs, you might want to move files beyond a size threshold such as 1 MB or 5 MB into a separate folder whose name can be the same with an added "-large" suffix. This allows minimizing the size of compressed snapshots of the directory.
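The size-based separation described above could be sketched as follows. This is only a minimal sketch: the function name `move_large` is hypothetical, the default threshold of 1 MiB is an assumption taken from the example figure, and it relies on the `-maxdepth` option and `mv -n`, which GNU and BSD tools provide but which are not part of POSIX.

```shell
# Hypothetical helper: move regular files above a size threshold
# from DIR into a sibling folder named DIR-large.
move_large() {  # move_large DIR [THRESHOLD_IN_KIB]
    dir=${1%/}              # source directory without trailing slash
    limit_kib=${2:-1024}    # assumed default: 1024 KiB = 1 MiB
    mkdir -p "${dir}-large"
    # -maxdepth 1 leaves subfolders untouched;
    # -size +Nk matches files larger than N KiB;
    # mv -n never overwrites an existing file in the target folder.
    find "$dir" -maxdepth 1 -type f -size +"${limit_kib}"k \
        -exec mv -n {} "${dir}-large/" \;
}
```

For example, `move_large logs 5120` would move files larger than 5 MiB from `logs` into `logs-large`.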

You don't need to spend much effort coming up with good names for files and folders, as you can change them later at any time if you happen to come up with a better idea.

Revision history

When spending much time on a project, write-up, email draft, etc., consider saving a revision history. This can be done by selecting "Save as" (also accessible through the Ctrl+Shift+S keyboard shortcut in some software) and changing the file name through numbering or timestamping, or alternatively by creating renamed copies externally using a file manager or command terminal. For possible file naming schemes and variations of numbering and time-stamping, see File naming.

This enables you to revert to an earlier revision in case of error, prevents total loss in case of failed writes caused by a power outage or software crash, and later helps in comprehending how the work progressed. Revisions consume only marginal space compared to common data storage capacities, and they compress efficiently due to their redundancy.

A new revision does not necessarily have to be created on every save, but whenever the changes since the last revision are major enough at your discretion. Optionally, a short comment summarizing the changes can be added to the file name.

For programming code, suffixes like -stable and -unstable may be added after the time stamp, for example: UserScriptName-revision-20210918010349-stable.js. Separating the number or time stamp allows convenient double-click selection in the file saving dialogue.

Optionally, revisions can be moved into a separate subfolder as a routine.
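Creating renamed copies from a command terminal could look like the sketch below. The function name `save_revision` and the exact naming pattern are assumptions made for illustration; the pattern follows the `-revision-` plus time stamp example above.

```shell
# Hypothetical helper: copy FILE to NAME-revision-<timestamp>.EXT
# in the same directory and print the path of the new copy.
save_revision() {  # save_revision FILE
    case $1 in
        */*) dir=${1%/*}; base=${1##*/} ;;
        *)   dir=.;       base=$1 ;;
    esac
    # Split off the last extension, if there is one.
    case $base in
        ?*.*) name=${base%.*}; ext=.${base##*.} ;;
        *)    name=$base;      ext= ;;
    esac
    target=${dir}/${name}-revision-$(date +%Y%m%d%H%M%S)${ext}
    # Refuse to overwrite a revision saved within the same second.
    if [ -e "$target" ]; then
        echo "revision already exists: $target" >&2
        return 1
    fi
    # -p keeps the time stamp of the original file.
    cp -p "$1" "$target" && printf '%s\n' "$target"
}
```

Running `save_revision draft.txt` would create a copy such as `./draft-revision-20210918010349.txt`, with the time stamp reflecting the moment of the call.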

Dumping ground

For files and folders you are unsure where to put, consider creating a directory on your device named dump, sandbox, or similar.

You may wish to sort its contents into subfolders by type, such as text, compressed archives, and drawings, by task, or by date, with folder names such as 2021-09-18.

When managing files and directories in a command line terminal, an item can temporarily be given a simple and short name such as 1 to make typing commands easier, and be renamed to its intended name shortly after. Examples are:

Output from multiple commands into one log file
$ ls -alR [path] >>1 # Write names and attributes of that folder's content into the file "1".
$ find [path] >>1 # Write a list of bare file paths into the file "1".
$ mv -n 1 [desired file name] # Rename "1" to desired file name. "-n" prevents accidental overwriting.

The above commands could also be combined into a single line: (ls -alR [path]; find [path]) >>[desired file name]. The step-by-step variant might be preferred if the target directory contains a high number of files, as each command starts running as soon as ⏎ Return (also known as "Enter") is pressed, or when one wishes to enter commands before thinking of an output file name. The single-line variant requires typing in the whole command, including the output file name, first.

Moving files from multiple sources into a directory
$ mkdir 1 # Create a folder named "1".
$ mv -n *.mp4 *.mkv 1 # Move mp4 and mkv files into it.
$ mv -n 1 Videos-$(date +%Y-%m-%d) # Renaming folder to intended name.

Should it be necessary to print the absolute (full) path of a file, for example for quick copying to the clipboard, use the command readlink -f [target file]. For information about the current mount point, use the command findmnt -T . (with a trailing dot for the current directory).

Finished tasks

Files whose task has finished, such as a posted message that was initially drafted offline in a text file, can be moved into a subfolder of the current directory named done, or alternatively into one big shared folder for this purpose.

Storage types

Auxiliary data which is frequently accessed can reside on the operating system drive, either on the operating system's partition or on a separate one. This especially applies to portable computers and operating systems installed on external media such as a USB flash drive.

Partitioning

User data may be stored on the same partition as the operating system or on a separate one.

The benefit of a separate partition for user data is that possible file system corruption on the operating system partition would not spill over to user data. That said, modern file systems such as NTFS and ext4 protect themselves from damage by journaling, which allows the file system driver to recover quickly after write access was terminated unexpectedly by an operating system crash or sudden removal. Infrequently accessed files that may be needed in the near future can be stored on the user data partition as well, or moved to a secondary drive if their size is significant.

Another benefit of a separate user data partition is the smaller backup size of the operating system partition and facilitated recovery in case of a malfunctioning operating system where other means of repair have failed or would be too difficult. Because operating systems are subject to corruption and can at worst become unbootable, it is good practice to back up their partition regularly into a disk image. A smaller operating system partition can be imaged more quickly and the routine induces less wear on the backup media. Should the operating system malfunction, it can be imaged and then restored more quickly from the functional previous disk image, with less work to merge desirable changes since the last backup.

Secondary and external storage

On desktop computers, large files can reside on a large secondary hard disk drive or solid state drive, the former of which costs less per unit of capacity. For laptops, stationary hard drives located at home or portable external hard drives or solid state drives can be used. While working with a laptop as a passenger in a moving vehicle such as a bus or tram, solid state memory is preferable due to its sturdiness, as hard disk drives suffer mechanical wear from constant physical movement. If portable storage is necessary, files can be stored on a permanently inserted memory card, which does not compromise ergonomics, as such cards protrude little or not at all. In addition, the card can occasionally be removed when the data is needed on a different device. SD cards with a capacity of 1 TB have existed since at least 2017, though they are expensive.[1]

A home server adds the convenient benefit of access from all devices at home, and even through the internet if set up by the user ("private cloud"), but typically has longer latency times (access delays) than physically attached storage, and lacks mass storage access that may be necessary for some programs to work properly. In such a case, the files would have to be downloaded first and worked with locally, and uploaded after finishing.

Incubate work on flash storage

If, for example, your computer has a solid state drive for the operating system and a hard disk as expansion storage, a project may be worked on while it resides on the flash storage, and the changes can be applied to the hard drive at the end of the day.

If the hard drive is set to spin with no or a long timeout such as an hour or more, this may not be as necessary, but for short timeouts, frequent spin-ups would cause mechanical wear and tear in addition to annoying delays.

Using flash memory can be particularly useful on battery-powered laptops for power efficiency. Mechanical hard drives are increasingly being replaced by solid state memory in laptops, mainly due to physical robustness: hard drives long served laptops as a cost-saving measure, whereas solid state memory has become increasingly affordable since the mid-2010s. External USB hard drives may still be used on the go.

Hard drives' purposes

After purchasing a hard drive, choose whether to dedicate it to either auxiliary storage or archival.

Use as auxiliary storage such as for a workstation or server demands the drive be in constant operation, which wears it down over time, making it unsuitable for long-term archival.

Use as archival storage demands only sporadic (rare) operation to add or retrieve data, which induces far less mechanical wear.

Data retrieved from an archive drive should be copied over to auxiliary storage. This avoids needless wear on the archive drive when that data is recalled again in the near future, as can be expected, and further duplicates files that have proven to be useful, since archival is commonly done without knowing which files will be useful in the future. A functional archive drive may later be repurposed for auxiliary storage after moving the archived data over to a new device.

Packing and archival

Perpetual streams of new files such as web downloads, photos and videos from digital cameras and cell phones, and screen captures can be packed by renaming their parent folder into a uniquely identifying name, such as with date stamp: Camera-2021-09-18, after which they can be moved to an archive drive at the next backup appointment. The folder's name may also contain a location, device type, and/or short description. If packed more than once on the same day, time stamps or part numbers can be added to the names, or the files can be merged into the same directory.

As an alternative to renaming on the source device, the directory with a uniquely identifying name can be created on the archive drive first, and files can be moved there out of the source directory.

When to pack files is the end user's decision, though it is recommended to do so before exhausting free space on the source device. Renaming is not as necessary if new folders are created automatically, as some digital camera firmware does every 999 or 1000 pictures. All filled folders can be considered eligible for archival.

Individual files needed for a specific purpose such as an impending project can be copied or moved into a separate directory.

Write protection

Write protection may be desirable to guard against accidental modification of data.

A simple way of achieving write protection in Linux-based operating systems is to mount or re-mount a device or partition as read-only with the following command, which requires superuser privileges: mount -o remount,ro [device or mountpoint].

If write-protection is not supported by the operating system, an SD card with write protection switch feature can be used. The switch relies on the SD card reader to obey it and deny writing access to the operating system. Some memory card readers, both built-in ones and USB adapters, might not obey the write protection switch.

Another way to achieve write protection is finalized write-once optical media or a read-only optical drive with insufficient laser beam power to write data, as described in § Sensitive environment.

File listing

Searches within file lists inside a text file are significantly faster than searches through a file system.

See this guide on how to create file lists.
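As a minimal sketch of the idea, a file list could be created once with `find` and then searched repeatedly with `grep`. The function names and the list file name used in the usage example are hypothetical.

```shell
# Hypothetical helper: write the paths of all regular files below DIR
# into LISTFILE, one per line, sorted for stable output.
make_file_list() {  # make_file_list DIR LISTFILE
    find "$1" -type f | sort > "$2"
}

# Hypothetical helper: search the list instead of the file system.
# -i ignores case; -F treats the query as plain text, not as a pattern.
search_file_list() {  # search_file_list LISTFILE QUERY
    grep -i -F -- "$2" "$1"
}
```

For example, `make_file_list /media/archive archive-list.txt` followed by `search_file_list archive-list.txt holiday` prints every listed path containing "holiday". The list only reflects the state at the time it was created, so it needs to be regenerated after changes.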

Time stamp preservation

Some methods of file transfer might discard the date and time stamp attributes of files, resetting them to the current time. Examples are copying within or onto mobile phone storage, using the cp command without the -p ("preserve") option, and copying into a directory on Unix/Linux that is not owned by the current user.
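The effect of the -p option can be checked directly after copying. The following is a sketch with a hypothetical function name; it uses the `-nt` ("newer than") file test, which common shells such as bash and dash support although POSIX does not require it.

```shell
# Hypothetical helper: copy SOURCE to the file path TARGET while
# preserving attributes, then confirm the time stamp survived.
copy_keep_times() {  # copy_keep_times SOURCE TARGET
    cp -p -- "$1" "$2" || return 1
    # If the time stamp was kept, neither file is newer than the other.
    # File systems with coarse time stamps may report a difference here.
    if [ "$1" -nt "$2" ] || [ "$2" -nt "$1" ]; then
        echo "time stamp of $2 differs from $1" >&2
        return 1
    fi
}
```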

To preserve last-modified time stamps over FTP, downloading is preferred. Preserving them while uploading requires both client and server to support a command for setting the modification time, such as MFMT (Modify Fact: Modification Time) or a non-standard use of MDTM, and such support is not widespread.

High numbers of small files

With an increasing number of files, file searches slow down. High numbers of small files also restrict portability, as they demand more file operations for file transfers, slowing the process down.[2] Additionally, larger cluster sizes waste more space on small files, as each file occupies at least one whole cluster and the unused remainder of its last cluster is lost as overhead (unused reserved space).

If you happen to have a high number of currently unneeded small files, such as tens of thousands, consider packing them into one big archive file for improved portability.
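Packing a directory into a single archive file could be sketched with `tar` as follows. The function name is hypothetical, and the sketch deliberately leaves the original directory in place so that it can be deleted manually once the archive has been checked.

```shell
# Hypothetical helper: pack DIR into the single file ARCHIVE
# without compression, then read the archive listing back.
pack_dir() {  # pack_dir DIR ARCHIVE
    parent=$(dirname "$1")
    name=$(basename "$1")
    # -C stores paths relative to the parent directory, so the archive
    # contains "name/..." rather than the full original path.
    tar -cf "$2" -C "$parent" "$name" &&
        tar -tf "$2" > /dev/null   # basic sanity check of the result
}
```

For example, `pack_dir ~/old-logs ~/old-logs.tar` creates one file that transfers as a single file operation.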

Compression may be considered where efficient, such as in human-readable text files and code, and/or where more necessary, such as online file sharing. Compression ratios of 100 may be achievable by strong compression algorithms on text documents and code. However, it should be taken into consideration that damage magnifies enormously over compressed archives, as demonstrated in Backup § Compressed archives. Therefore, it is recommended to store compressed archives on at least two devices.

Text inside compressed archives can be searched through directly without extracting using tools such as zgrep, xzgrep, bzgrep, and for 7-Zip, 7z e -so -bd [path] |grep [query].

Avoid exhausting the operating system's partition

Exhausted storage space should be avoided, especially on an operating system partition, as it could lead to erroneous behaviour by software not designed to handle such a condition, or to other unwanted behaviour. For example, a failed write while saving could blank the target file, causing the loss of work or a reset of configuration. A web browser might automatically delete early browsing history entries to make space for new ones.

On an operating system partition, keeping a safety margin of free storage such as 5% at any time is recommended, and at least 1% on secondary expansion storage. On archival media, a controlled exhaustion of space is less critical, though the readability of the final written files should be verified.
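The safety margin can be checked from a terminal. The sketch below, with hypothetical function names, derives the free percentage from the "Used" and "Available" columns of `df -P`; the thresholds passed to it are the user's choice.

```shell
# Hypothetical helper: print the free space of the file system
# containing PATH as a whole-number percentage.
free_percent() {  # free_percent PATH
    # df -P prints one header line, then: name, total, used, available, ...
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 * 100 / ($3 + $4) }'
}

# Hypothetical helper: warn when free space falls below MIN_PERCENT.
warn_if_low() {  # warn_if_low PATH MIN_PERCENT
    pct=$(free_percent "$1") || return 2
    if [ "$pct" -lt "$2" ]; then
        echo "Only ${pct}% free on $1 (below ${2}%)" >&2
        return 1
    fi
}
```

For example, `warn_if_low / 5` checks the 5% margin suggested above for the operating system partition.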

Disk usage analysis

Disk usage analyzers facilitate finding directories with the largest content size. Some illustrate the directory structure graphically.

Disk usage analyzers calculate the size of directories on any selected path, allowing the user to easily discover directories which occupy the most space. Large folders not currently needed can be moved over to an archive drive, which clears the most space on the source device.

Popular tools for desktop operating systems include Baobab for Linux (pre-installed on some popular distributions) and Xinorbis for Windows, both with sophisticated graphical user interfaces. Linux is also equipped with the command-line tool du, which allows outputting results directly into a text file.
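As a sketch of the text file output mentioned above, the output of `du` can be sorted and redirected. The function name is hypothetical.

```shell
# Hypothetical helper: list PATH and every directory below it,
# largest first, with sizes in KiB, and save the result as text.
largest_dirs() {  # largest_dirs PATH OUTPUT_FILE
    # du -k prints "size<TAB>directory"; sort -rn orders by size, descending.
    du -k "$1" | sort -rn > "$2"
}
```

The top entries can then be viewed with `head usage.txt`, assuming `usage.txt` was chosen as the output file.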

For mobile (Android OS), ES File Explorer is equipped with such functionality, though that application has been subject to controversy and has developed into adware.

Mobile

Memory cards

Some smartphones and tablet computers allow the expansion of storage capacity using memory cards, typically MicroSD, which significantly facilitates file management and is user-friendly.

Memory cards can be re-used immediately between devices without the need for file transfers, and data stored on the memory card is not at risk from a technical defect of the mobile device, as the card can be ejected and the data retrieved externally. Mass storage access from an external computer may also allow recovering some files immediately after a deletion accident caused by faulty software[3] and/or human error.

For huge file transfers, ejecting the memory card and transferring directly to the PC through mass storage may save time compared to MTP (Media Transfer Protocol) through the phone or tablet, as the latter does not handle high counts of files within a directory well. USB-OTG (On The Go) storage may be used as well, connected through an adapter directly to the mobile device, though it might not preserve the date and time attribute. Tablet computers with a desktop operating system are widely equipped with at least one full-sized USB-A port.

Additionally, using a memory card takes stress off the device's non-replaceable internal memory, preserving its limited rewrite cycles, which is especially beneficial for repeated heavy tasks such as high-resolution filming and mobile FTP server hosting.

Between computer and mobile

Media Transfer Protocol

File management on mobile phones and tablet computers with mobile operating systems is more restricted than on desktop/laptop computers and mass storage devices such as USB sticks and memory cards, as the media transfer protocol (MTP) used to access from a computer lists files slowly, which is problematic for loading directories with high counts of files.

As such, it is recommended to manage such directories on the device itself. If transfer to or from a desktop/laptop computer is desired, handle those files in a sparsely populated or empty directory.

MTP file listing can be sped up by not loading preview thumbnails. Depending on the file manager used, this may be done by deactivating preview thumbnails in the settings or by choosing the "detail" view mode, where files appear in a list instead of a grid, which forgoes the loading of preview thumbnails.

Files can also be dropped into a directory without opening it, by pasting them through the right-click context menu.

A benefit of MTP is that it is not prone to file system corruption from unexpected removal, meaning removal without being "safely unmounted" through the client operating system (i.e. desktop / laptop computer). This is because it operates through an abstraction layer, and the file system is controlled by a driver on the battery-powered host device (i.e. smartphone / tablet).

Only a selection of files, not directories, should be moved away from the device, because users have reported files on MTP not being listed properly.[4] If a directory is moved away from the device, the computer might delete it from the mobile device without all content having been transferred. Instead, the directory should be copied, and its byte size be compared on both the computer and the smartphone itself, where a match indicates a successful transfer, meaning the directory can now be deleted from the mobile device. The only exception where moving folders out is safe is when the number of files within is small enough to check at a glance, i.e. fewer than ten, and all files are clearly listed in the computer's file manager.

Windows Explorer additionally displays files while listing is in progress, which can be of use when moving files out. The loading of the file list can be interrupted to allow moving out the displayed files chunk-wise, reducing the number of remaining ones each time.

Transferring files onto the device through MTP may discard their date and time attribute.

File Transfer Protocol

An alternative to MTP is FTP (File Transfer Protocol) over the local network.

On the desktop computer, a dedicated and sophisticated FTP client such as FileZilla (open-source) may be used to handle high numbers of files, though FTP is widely supported by file managers and web browsers.

FileZilla does not support moving files out of an FTP server, meaning downloading and then deleting automatically, whereas moving within a server is supported through the standard rename command. If the intention is to move files out of an FTP server, the selection of files on the server needs to be deleted manually after verifying that all files have been transferred successfully, meaning no new entries in the "Failed transfers" list. For peace of mind that the selection was transferred successfully, try downloading it again while skipping existing local files. If no new files are downloaded, all files have already been transferred. This might apply to other software as well.

FTP server applications for mobile devices may handle file listing differently. Some do not report the year of a file, only day and month. This causes the FTP client to insert the current year, except for files whose day and month lie later in the year than the current date, for which the previous year is inserted instead. Another distinction between FTP server apps is whether they list file and directory names starting with a dot, which are considered hidden in the Unix world (i.e. on Unix and Linux-based operating systems, which includes Android OS, the most popular mobile operating system).

FTP server applications typically allow the user to select a specific directory to share, rather than the entire storage. This feature has been proposed for MTP, but has not been implemented there so far.

Two open-source FTP server apps for Android OS are the integrated FTP server of "Amaze File Manager", and the more sophisticated "primitive FTPd", only the latter of which reports files and folders with names starting with a dot.

Alternatively, files may be uploaded in the opposite direction, from the mobile device to an FTP server on the local network hosted by a home computer, though as of 2021, no mobile file manager's FTP client supports preserving files' date and time stamps upon uploading.

On-device management

File access on the most popular mobile operating system has been restricted significantly over time, and to varying degrees per storage type (internal, memory card, and USB-OTG).

Such restrictions affect third-party applications installed by the user, including file managers. Pre-installed file managers are usually unaffected, though these tend to be functionally restricted, such as lacking range selection, where only two entries need to be tapped for all entries in between to be marked.

Options to deactivate these restrictions at user discretion were not officially provided, leaving so-called rooting as the only possibility of bypassing them. This is a process in which the user unlocks administrative access over the operating system.

The operating system vendor claims that the aforementioned file access restrictions serve user security. However, the vendor also being a cloud storage vendor suggests a commercial interest that conflicts with end users' desire for freedom. The restrictions may also encourage users to unlock root access, which goes against vendors' recommendations and where inexperienced tampering can lead to malfunction.

Other ideas

Archival queue

New files from portable devices which are currently unneeded can be moved into a buffer directory of files ready for archival, which means moving them to a large and stationary hard drive at the next connection to the computer.

External flash storage such as USB sticks and solid state drives can also be used to store data for the duration of a trip or vacation, for example, after which the data can be moved to an archive hard drive upon arriving at home.

Temporary redundant retention after archival

Files that have already been moved to a larger stationary archive drive may be redundantly kept on the smaller portable data storage such as a mobile phone or USB stick, but in a directory in which any file is eligible for deletion, such as when space runs out.

This would serve as a short-term backup from which files could be retrieved in case anything goes wrong with the archive drive before it has been backed up itself.

This increases file fragmentation on the portable device, though that does not noticeably affect performance on flash storage.

Partition for small files

If your computer setup has no secondary drive and/or partition, you may create a small partition (e.g. 4 GB) with a low cluster size for more efficient storage of small files.

Additionally, if storage space happens to be exhausted on the main partition, with software arbitrarily attempting to write to it, files can still be added on the secondary partition without interference.

Sensitive environment

Inside a sensitive environment, data may be exchanged through rewritable optical media such as DVD±RW and BD-RE, as these use external storage controllers, making the media itself unable to contain malicious hardware such as so-called rubber duckies used to simulate keystrokes from a USB keyboard.

Additionally, finalized write-once media and/or read-only (ROM) optical drives can ensure write protection where necessary, for example in a malware-infested environment.

Observations and tips

Spare directories

When space on a device or partition is exhausted, no new directories that could be helpful for organizing files can be created, such as for moving files from a highly populated directory (i.e. with many files, such as a download folder) on a mobile phone in order to skip having to open the populated folder directly through MTP (Media Transfer Protocol), which notoriously handles long file listings poorly.

Prepare for such a situation by creating a reserve of spare empty directories inside one dedicated directory. The spare directories can be moved out of the reserve, and be renamed as necessary, even without space left, which allows organizing files on the go to be able to move them elsewhere immediately when arriving home.
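Where a terminal is available on the device, such a reserve could be created with a short loop. This is a sketch; the function name and the folder names `spare-1`, `spare-2`, and so on are arbitrary choices for illustration.

```shell
# Hypothetical helper: create COUNT empty spare directories
# named spare-1 ... spare-COUNT inside RESERVE_DIR.
make_spares() {  # make_spares RESERVE_DIR COUNT
    i=1
    while [ "$i" -le "$2" ]; do
        mkdir -p "$1/spare-$i"
        i=$((i + 1))
    done
}
```

After `make_spares reserve 20`, a spare can later be put to use with a command such as `mv -n reserve/spare-1 Downloads-sorted`, which moves and renames it in one step.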

File move behaviour

When the aim is to bring files to an archive, moving files rather than copying and deleting afterwards has the convenience benefit of acting like a check list of files, instead of creating duplicates that would later have to be sorted out without accidentally deleting non-copied files. It also immediately frees space on the source device.

Moving instead of copying and then deleting files also defeats the psychological barrier that may come from the deletion step, as it feels like a destructive action even though it leads to the same result as moving between storage devices. Another barrier is the uncertainty of having inadvertently selected any file not copied.

When moving files, some file managers delete each file from the source individually once it has been transferred, while others delete the selected files from the source only after the last file has finished transferring.

Windows Explorer uses the former method for mass storage devices, but the latter with Media Transfer Protocol. The Linux file manager Nemo always uses the former method.

With the latter method, any interruption would cancel the file transfer without having freed up any space on the source device.

Escape auto-closure

Some file managers such as Windows Explorer close themselves when detecting the removal (unmounting and/or physical unplugging) of a storage device, while others jump to the starting directory. Some file managers might do nothing.

The first case may be perceived as annoying, as it forces users to re-open the file manager and navigate all the way back to the previous directory.[5]

This can be prevented by opening a different device in the file manager before unplugging. After plugging the device back in, the previously opened directory can be navigated back to immediately through the navigation history, using the on-screen button or Alt+← on the keyboard.

Using the command line

Command-line file operations can be logged for later reference by using the "verbose" switch --verbose or its shorthand -v on the cp and mv commands in Linux and redirecting the output into a text file by appending >>path/to/textfile.txt. On Windows, file operations are printed by default, meaning no "verbose" switch is necessary. On Linux, using | tee -a path/to/textfile.txt allows printing the command line output visibly in real time (as usual) while simultaneously logging it into a text file.
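Combining the verbose switch with `tee` could be wrapped as in the following sketch. The function name is hypothetical; note that only regular output is logged this way, while error messages still appear on screen only.

```shell
# Hypothetical helper: copy files verbosely, showing each operation
# on screen while appending it to LOGFILE.
logged_cp() {  # logged_cp LOGFILE SOURCE... TARGET
    log=$1
    shift
    # "$@" passes the remaining arguments on unchanged, spaces included.
    cp -v -- "$@" | tee -a "$log"
}
```

For example, `logged_cp transfers.log photo1.jpg photo2.jpg /media/archive/` copies both files and records one line per file in `transfers.log`.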

Spaces in file names

Keep in mind that space characters in file and path names can become a nuisance when entering paths, creating variables, and navigating and auto-completing in a command prompt/terminal using the ↹Tab key.
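In shell commands, paths containing spaces need to be quoted so that they are treated as a single argument. One possible way to avoid the nuisance altogether is to replace spaces with underscores, as in the sketch below. The function name is hypothetical, and whether underscores are desirable is a matter of preference.

```shell
# Hypothetical helper: rename FILE so that spaces in its name
# (not in its parent directories) become underscores.
despace() {  # despace FILE
    case $1 in
        */*) dir=${1%/*}; base=${1##*/} ;;
        *)   dir=.;       base=$1 ;;
    esac
    new=$(printf '%s' "$base" | tr ' ' '_')
    if [ "$new" != "$base" ]; then
        # The double quotes keep each path together despite the spaces;
        # mv -n refuses to overwrite an existing file.
        mv -n -- "$dir/$base" "$dir/$new"
    fi
}
```

For example, `despace "my holiday video.mp4"` renames the file to `my_holiday_video.mp4`.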

References

  1. SanDisk outs the 'world's first' 1TB SD card (2016-09-20)
  2. Related: Answer on Quora to "Why is copying 1,000 1MB files so much slower than copying 1 1GB file, given that the same amount of data is being copied?" by Franklin Veaux, on January 14th, 2020
  3. Users report Android devices' entire internal user storage being deleted instantly, caused by poor software design – "I just deleted a random folder in my internal storage and it wiped my internal storage. What the heck just happened?" – Reddit.com/r/Android (2013-01-11)
  4. Example: Not all files are visible over MTP – Android StackExchange
  5. How to stop Windows Explorer closing for removable Disks! – SevenForums
