Web application file-reference methods
Faculty of Engineering and Technology → School of Computer Science → Database Department
Type classification: this is an article resource. |
Subject classification: this is a technology resource. |
This research project examines the different methods employed by web applications to refer, from within the database, to non-application files in the filesystem. We aim to determine what attributes (of database schema, filesystem, file, user, etc.) are used to determine the location of the files in the filesystem.
Introduction
[edit | edit source]Most database-backed web applications store non-application files, such as those uploaded by users, in the filesystem rather than as objects in the database.[1] These files relate to tables and rows in the databases in various ways, and there are many different ways of recording these relationships.
The 'non-application' files are those that are considered part of the application's data, rather than 'resource' files such as icons and other files used as part of, and distributed with, the application. Generally, these are files that are uploaded by users, and there will often be ancillary versions of them created by the application (such as, in the case of images for example, lower-resolution versions for quicker web display).
Here we investigate the methods used by a number of web applications to store file reference information in their databases. We
The web applications examined here all: use a database; and store attached files in the filesystem.
For each application, we aim to find out what attributes are stored about files, and how they are used when accessing files. Ultimately, we are trying to determine what information is required, given an arbitrary database schema, to determine whether files exist that relate to any given row, and where and how to access such files.
Applications
[edit | edit source]WordPress
[edit | edit source]WordPress is a MySQL-backed blogging platform in which uploaded files are saved to a user-configurable uploads directory.
Files in this directory can be organised into a number of different subdirectory structures (by default, named by year and month) under wp-content/uploads/
in the installation directory.
Files are accessed directly by the web server (and not read/written via the application).
Each uploaded file has a single row in the posts
table
(which may or may not be linked to an actual post row via the post_parent
key).
The post_type
column of this row has a value of 'attachment'.
Rows in the postmeta
table are linked to this row.
Where the meta_key
column of postmeta
is '_wp_attached_file', the meta_value
column is a filename, relative to the WP upload directory.
Table: | postmeta
|
---|---|
Column: | meta_value (only for those rows where meta_key equals '_wp_attached_file')
|
Directory: | Under wp-content/uploads
|
Format: | Above directory concatenated with the value of meta_value
|
Cardinality: | One file per row |
Drupal
[edit | edit source]Drupal is a general-purpose Content Management System, again using MySQL for its database.
For the purposes of this reseach, Drupal 7.4 was installed, using the 'minimal-7.4' installation profile.
File information is stored in the file_managed
table, with a custom-URI in the uri
column.
Files can be stored in either the public or private hierarchies, depending on the prefix of the uri
value:
public://
or private://
.
Table: | file_managed
|
---|---|
Column: | uri
|
Directory: | Under sites/{site_name}/{public or private directory} (these values must be retrieved from elsewhere in the DB)
|
Format: | Above directory concatenated with the value of the uri column after the prefixed custom quasi-protocol has been stripped; i.e. the captured section in the regular expression (public|private)://(.*)
|
Cardinality: | One file per row |
Bespoke Application 'A'
[edit | edit source]This is a bespoke database application built to manage personnel training records for a medium-sized organisation. The source code is not freely available, but the method used is explained here and intended to stand alone, as it could be taken as a hyphothetical example of how this type of file reference could be done and so there is no need to be able to examine the original source.
Table: | completed_qualifications
|
---|---|
Column: | id
|
Directory: | Under data/dbname/completed_qualifications/
|
Format: | Above directory concatenated with the value of the id column
|
Cardinality: | Multiple files per row; the above format results in a directory, all files in which are related to the row in question |
MediaWiki
[edit | edit source]MediaWiki keeps its files in directories whose names are constructed from the first two characters of the MD5 hash of the file's name.
Table: | image
|
---|---|
Column: | img_name
|
Directory: | Under images/
|
Format: | Above directory with two levels of subdirectories named from substrings of the MD5 hash of the img_name column: the first character, and then the first two characters, of the hash; within this lower directory resides the file, with the name given in image.img_name
|
Cardinality: | One file per row |
Conclusion
[edit | edit source]Given an arbitrary database schema and the absolute path to the highest directory in which files are stored, it is possible to construct an SQL statement that will provide the latter part of the path to the file or files.
References
[edit | edit source]This is a research project at Wikiversity. |
This resource includes primary and/or secondary research. Learn more about original research at Wikiversity. |