User:John Bessa/Systems


This describes my systems career from 1989 to 2002, when it was abruptly halted by economic meddling.

Operating System (OS)

SunOS

The SunOS operating system was developed by engineers who had graduated from the University of California at Berkeley and Stanford University; together they formed the Sun Microsystems Corporation. To build SunOS, they obtained the operating system code from AT&T, which had been mandated by the government to share it with educational institutions. From it came the first truly innovative Internet communication software, known as sockets. The bulk of that innovation was done at Berkeley; yet the SunOS operating system, and the corporation that created it, Sun Microsystems, were named for the Stanford University Network (SUN), whose wealthy students had access to investment capital.

SunOS was an extension of the Berkeley release of AT&T Unix; today that lineage thrives, though in Linux's shadow, as the NetBSD system.

When I started working in financial technology, SunOS, running on Sun workstations, provided the platform where most Internet development occurred. Companies such as Goldman Sachs invested billions of dollars a year in contractor fees building trading systems based on the Internet protocols such as TCP/IP. They, and others, helped propel the Open Systems methods of computer communication by creating sophisticated interconnected operations from the original Berkeley-developed operating system and its sockets networking software.


Solaris

In the late 1980's, many Unix vendors had embraced the original AT&T Unix system in such a way that it surpassed the operational capabilities of Sun Microsystems' SunOS. The SunOS computers were better suited to the workstation class of computer, but the Unix operating system was now starting to take large pieces of the large data processing operations, sometimes known as "big iron." Big iron operations are the huge mainframe systems whose operations occupied whole floors. While technically not a mainframe, a network of Unix computers can function with the same capabilities.


HPUX

Sun Microsystems made the change to the AT&T architecture, producing Solaris, only after it realized that it needed to enter the big iron market. Because Sun Microsystems had converted to the AT&T version late, and would therefore go through a period of flux and instability with its new operating system, I decided to migrate my career to a system that had matured with the AT&T version of Unix all along. The system that I chose was Hewlett Packard's Unix, HPUX. Moving to HPUX was also a lucrative move for me, as it was commonly chosen for well-funded big iron operations, where administrators could expect better salaries.

It also presented gratifying challenges because it was a corporate-oriented system, almost devoid of imaginative innovation. My ability to bring cutting-edge applications from the forward-thinking public domain to HPUX made me a valued administrator. Most important was my adaptation of the free and openly available GNU C compiler to HPUX for the Merck pharmaceutical company.


Linux

Linux is central to the concepts of freedom in computing; its lead developer has copyrighted it in such a way, through the GNU General Public License, that Linux code is effectively free for anyone to use, modify, and share. In 1992, its development scheme was presented to the world in such a way that many talented software developers joined the Linux effort and dedicated themselves to its success. Today it is technically robust, and at least half of the services on the World Wide Web use it. Linux's success is important in understanding changes in the Information Society since the operating system was first introduced in the early 1990's. A culture and community of mutualist programmers and designers now produce the highest quality computer products available. They use a paradigm that goes beyond traditional economic principles; theirs is a mutualist effort, where the majority of the work is done by volunteers and a lesser portion is gifted by corporations.

Today, Linux contributes to science as the operating system of choice for very large supercomputers, which are actually vast and closely-knit networks of the familiar personal computer. Best known of these huge systems are the Google search engine and the government weather agency's weather prediction system. A parallel contribution may emerge where mutualist groups form to take on some of the most critical challenges to society, such as making medicine, in the same way that mutualist, or e-mutualist, groups created the Linux system and other significant software.

I was able to accelerate my career by using Linux at home to develop system administration applications. Because Linux is naturally applicable as a study project for computer clubs, I used it as the basis for my mentoring organization, the Linux Society. The Linux Society gave talented high school students the opportunity to work in a computer environment similar to the ones I had experienced throughout my career. Today the Linux Society exists online, and members are active in developing social purposes for the Internet.


Software Systems

Object Oriented

In the mid-90's, at the New York Stock Exchange, I had been given the task of developing a system to take a snapshot of all the specific settings of the trading sessions of specialists on the exchange floor. In the event of a computer systems outage, all these settings could be applied to the recovery process so that all the specialists would be brought back to the exact same screens they had before the outage.

The amount of detail required to achieve this level of recovery accumulated to the point where simple strings of stored data could no longer contain the information. I discovered that the data was grouped into categories, from which I could create subcategories. Using the Perl language, I created a system that used backslashes ("\") to mark the nesting of categories and subcategories within a single string of characters. The level of nesting (or depth) of a piece of data was determined by the number of backslashes in front of it.
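
Below is a minimal sketch of the backslash-depth idea, written for illustration rather than taken from the original exchange-floor code; the session settings and field names are invented.

 #!/usr/bin/perl
 # Sketch of the backslash-depth idea: the number of leading backslashes
 # marks how deeply nested a piece of data is within its categories.
 use strict;
 use warnings;
 
 # Flatten a category, its subcategories, and their values into lines.
 sub flatten {
     my ($depth, $name, $value) = @_;
     my $prefix = "\\" x $depth;
     if (ref $value eq 'HASH') {
         my @lines = "$prefix$name";
         push @lines, flatten($depth + 1, $_, $value->{$_}) for sort keys %$value;
         return @lines;
     }
     return "$prefix$name=$value";
 }
 
 my %session = (
     display => { rows => 24, font => 'fixed' },
     symbols => { watch => 'IBM,T' },
 );
 
 print "$_\n" for map { flatten(1, $_, $session{$_}) } sort keys %session;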

Shortly thereafter, the Object Oriented (OO) version of the Perl language came along, changing the landscape for me and other systems administrators. It offered a far better way of containing information requiring depth of categories; such nested data types are called complex structures. With this I was able to extend my earlier work with nested information.

Complex structures allow data and associated variable names to be stored in highly ordered locations. The Perl OO language provided a simple syntax and facilities to retrieve, update, or delete data. The tree-like data stored within a Perl OO program can be printed to the screen and viewed, making debugging far easier. At important points along the computational process, this data can be written to disk either in readable text format or in a more efficient binary format.
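
The sketch below shows the idea using two standard Perl modules, Data::Dumper for the readable text dump and Storable for the binary persistence; the metric names are invented for illustration.

 #!/usr/bin/perl
 use strict;
 use warnings;
 use Data::Dumper;                   # readable text dump of a structure
 use Storable qw(nstore retrieve);   # efficient binary persistence
 
 # A complex structure: a hash of hashes holding per-host metrics.
 my $metrics = {
     hostA => { cpu_pct => 72, disk_free_mb => 5120 },
     hostB => { cpu_pct => 14, disk_free_mb => 20480 },
 };
 
 # Retrieve, update, and delete pieces of the structure.
 print "hostA cpu: $metrics->{hostA}{cpu_pct}\n";
 $metrics->{hostA}{cpu_pct} = 65;
 delete $metrics->{hostB};
 
 # Print the whole tree to the screen for debugging.
 print Dumper($metrics);
 
 # Write it to disk: readable text, or compact binary.
 open my $fh, '>', 'metrics.txt' or die $!;
 print {$fh} Dumper($metrics);
 close $fh;
 nstore($metrics, 'metrics.bin');
 
 # Later, the binary file can be read back into an identical structure.
 my $again = retrieve('metrics.bin');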

Beyond the complex structures, which are data objects, there are the other components of OO programming, which use the complex structures to create smallish programs called modules that can be plugged virtually anywhere into other pieces of code. These modules may contain procedural (old-fashioned, non-object) code or modular OO code. Modules can be installed within modules, and again within modules, to a nearly endless degree.

In the initial, procedural, pre-object-oriented incarnation of programming languages, the module is called a library. In OO programming a module is much more alive; it is not just a repository of code, and the modules themselves may well take a front-seat role during the execution of a program. For this to happen, objects of code are allowed to communicate with each other by allowing one code object to set variables within another. As the code, often itself an object, loads modules of different kinds, its abilities and characteristics can change depending on the nature of the modules it uses.
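
As a concrete illustration, here is a minimal Perl module of the pluggable kind described above; the package name Metric::Sample and its methods are hypothetical.

 # Metric/Sample.pm -- a hypothetical module that wraps one metric reading.
 package Metric::Sample;
 use strict;
 use warnings;
 
 sub new {                          # constructor builds the object
     my ($class, %args) = @_;
     my $self = { name => $args{name}, value => $args{value} };
     return bless $self, $class;
 }
 
 sub describe {                     # a method other code can call
     my ($self) = @_;
     return "$self->{name}=$self->{value}";
 }
 
 1;
 
 # Elsewhere, any program can plug the module in:
 #   use Metric::Sample;
 #   my $m = Metric::Sample->new(name => 'cpu_pct', value => 72);
 #   print $m->describe, "\n";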

These are the components of OO Perl that were most useful in administering large systems installations. Because systems management is more of a "nuts and bolts" discipline than applied computer science, there is a natural limit to the complexity of software written to increase systems integrity. Success often depends on a cognizance of the abilities of co-workers. A simple approach works best, gleaning from object-oriented technology only its most useful features; Perl has been developed with this approach.


Decision Support in Information Data Centers

During the mid-to-late 1990’s, a major buzzword was the term "metric," a well-defined and descriptive characteristic which is given a value. I saw metrics used in economic analysis as well because I was usually working for financial concerns. In computing, a simple example of a metric can be the percentage of use of the basic brain of a computer, the CPU (central processing unit). A high percentage of CPU use would imply delays in the return of information; a low percentage implies the probability of higher performance. Many metrics are available for most computer systems including memory usage, disk usage, available space, and operating system configuration characteristics.

Hewlett Packard made available for its HPUX system a metric-sensing tool that gives an amazingly detailed picture of the performance of a computer system, including its network communication components. This package included delay metrics: metrics that indicate the time taken for a process to occur while other processes wait for its completion. These delays, or waiting periods, often have a one-to-one relationship with the experiences of the people using the systems for business operations. By creating generalized metrics from combinations of these specific measurements, that is, by creating meta-metrics, data can be produced that accurately predicts when users are feeling frustration from system delays. Technology staff using these metrics can know exactly when those users will make "help desk" calls. This way, the staff can initiate a timely repair or enhancement before system performance becomes unacceptable; users never have to suffer frustration, and the operation remains supportive of the business effort.
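
A rough sketch of the meta-metric idea follows; the delay metrics, weights, and threshold are invented for illustration and are not HP's actual formulas.

 #!/usr/bin/perl
 # Sketch: combine raw delay metrics into one meta-metric.
 # The weights and the threshold are invented for illustration only.
 use strict;
 use warnings;
 
 my %raw = (
     cpu_run_queue_wait => 0.35,   # seconds a process waits for the CPU
     disk_io_wait       => 0.90,   # seconds spent waiting on disk
     net_response_wait  => 0.20,   # seconds waiting on the network
 );
 
 my %weight = (
     cpu_run_queue_wait => 1.0,
     disk_io_wait       => 2.0,    # disk delays annoy users the most
     net_response_wait  => 1.5,
 );
 
 my $frustration = 0;
 $frustration += $raw{$_} * $weight{$_} for keys %raw;
 
 printf "frustration index: %.2f\n", $frustration;
 print "expect help-desk calls soon\n" if $frustration > 2.0;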

The most useful tool for helping decision making in systems technology is the predictive graph. Graphs can have linear regression lines, fitted curves, and numerical indicators. A second set of curves and numeric values, called correctness indicators, determines the usefulness of the curves in predicting systems performance behavior. Metrical information is collected from remote computers and devices into a database to be made available for analysis. There are two ways to process these collected metrics. One is to supply the data to an administrator who is equipped with a statistical analysis tool; desktop tools of this type can refine the data into accurate predictive curves. Another is to analyze the data at the server where it has been collected, and supply information to the administrators in the form of pre-made graphs and numerical charts.
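
The following sketch fits a least-squares regression line to a few invented disk-usage samples and computes r-squared as a simple correctness indicator; a production tool would read its samples from the metric collector.

 #!/usr/bin/perl
 # Sketch: fit a least-squares line to disk-usage samples and project forward.
 # The sample data is invented; a real tool would read it from the collector.
 use strict;
 use warnings;
 
 my @day  = (1, 2, 3, 4, 5);          # x: day number
 my @used = (40, 44, 47, 52, 55);     # y: percent of disk used
 
 my $n = scalar @day;
 my ($sx, $sy, $sxy, $sxx) = (0, 0, 0, 0);
 for my $i (0 .. $n - 1) {
     $sx  += $day[$i];
     $sy  += $used[$i];
     $sxy += $day[$i] * $used[$i];
     $sxx += $day[$i] ** 2;
 }
 
 my $slope     = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx ** 2);
 my $intercept = ($sy - $slope * $sx) / $n;
 
 # r-squared, a simple "correctness indicator" for the fitted line.
 my $mean = $sy / $n;
 my ($ss_tot, $ss_res) = (0, 0);
 for my $i (0 .. $n - 1) {
     my $fit = $slope * $day[$i] + $intercept;
     $ss_tot += ($used[$i] - $mean) ** 2;
     $ss_res += ($used[$i] - $fit) ** 2;
 }
 my $r2 = 1 - $ss_res / $ss_tot;
 
 printf "trend: %.1f%% per day (r^2 = %.2f)\n", $slope, $r2;
 printf "projected usage on day 10: %.1f%%\n", $slope * 10 + $intercept;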

The formulas that create the pre-made graphs often need to be modified as experience improves the techniques for using the correctness indicators. Since the Perl language is so widely used in research science, sophisticated statistical formulas are well supported by the Perl e-mutualist community.


When managers seek information to help budget the purchase of new equipment, it is exceedingly unlikely that they will load a statistics package; they are more likely to seek a pre-made picture in their web browser. In cases where higher-ups cannot ignore looming disaster, simplified and highly up-to-date graphic representations are desirable to help them make decisions.

In the technology support scheme, there are always large numbers of queued tasks requiring attention and remediation. Knowing exactly when a performance problem will affect productivity helps administrative groups schedule their efforts to keep systems at maximal efficiency. Careful planning with respect to prioritization can avert crisis, especially in an administration group suffering from staff shortages.

In many cases, metrical analysis tools really serve as alert systems that flag sporadic problems. Quickly finding the root cause of an alert, and a solution, is aided by the graphical and numerical information. When problems are well known because they happen often, reactions can be automated through a process common to databases; this is called triggering.

Over the long term, meta-studies of metrics can be used to find areas of general weakness in the overall operation.


Database

By far the most common database is the SQL relational model; examples are Oracle and the freely available MySQL. These databases are built of collections of tables very similar to common spreadsheets. The SQL language queries these tables in many ways to produce or insert specific information values or sets of values. The system protects the tables much as an operating system protects files. Internally, it controls users' access to tables; externally, it controls access from the network with authentication security systems.

The history of the relational database is steeped in extreme math: tuple calculus, domain calculus, first order logic, and relational algebra. While high-level mathematics inspired the first relational databases, the SQL data manipulation language is deliberately designed to be very simple.

It has three basic modes: data manipulation language (DML), data definition language (DDL), and data control language (DCL).

In the DML, there are four "manipulation" commands: Select (find a row), Insert (add a row), Update (modify a row), and Delete (remove a row).

In the DDL, you have Create, to make a database, table, index, or stored code snippet; and Drop, to destroy one of those entities.

DCL controls user activity inside the database; Grant gives the ability to access data or create structures; and Revoke takes these privileges away.

As an example, the common command Select is modified with: From (naming the table); Where (a condition about the data, to narrow it down); and Order By (a sorting modifier). There are, of course, deeper functionalities to these technical verbs and modifiers.
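
A small sketch of these verbs as they would be issued from Perl, through the standard DBI module, follows; the database, table, and column names are invented for illustration.

 #!/usr/bin/perl
 # Sketch: the basic DML verbs issued through Perl's DBI module.
 # The database, table, and column names are invented for illustration.
 use strict;
 use warnings;
 use DBI;
 
 my $dbh = DBI->connect('dbi:mysql:database=ledger', 'user', 'password',
                        { RaiseError => 1 });
 
 # Insert a row.
 $dbh->do('INSERT INTO invoices (customer, amount) VALUES (?, ?)',
          undef, 'Acme', 150.00);
 
 # Select rows, narrowed by Where and sorted by Order By.
 my $rows = $dbh->selectall_arrayref(
     'SELECT customer, amount FROM invoices WHERE amount > ? ORDER BY amount',
     undef, 100);
 print "$_->[0]: $_->[1]\n" for @$rows;
 
 # Update and delete.
 $dbh->do('UPDATE invoices SET amount = ? WHERE customer = ?',
          undef, 175.00, 'Acme');
 $dbh->do('DELETE FROM invoices WHERE customer = ?', undef, 'Acme');
 
 $dbh->disconnect;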

Noting the spreadsheet analogy to a relational database, one can instantly assume that relational databases are extremely useful in business. Tables are named for well-known accounting components; the types of business information that can be inserted into the tables are specified and well-defined. The retrieval of sequential information from relational databases is highly efficient, especially in the production of reports. Somewhat more complicated is the process of inserting, or updating, information in very large relational tables. Database systems create mapping tools called indexes to speed the process; often the indexes are larger than the entire rest of the dataset.

Missing entirely from the relational model is the concept that types of information may be undefined and that desirable data may be distributed across different systems, where the types and locations of the data are unpredictable. My initial work with complex structures grew from a need to discover and store this kind of elusive data. Having only myself to work with (and no budget beyond my salary), I built my techniques from the simplest components: tools available in the Perl language itself.

Had a well-funded corporate team been assigned my data collection tasks, existing products would have been modified and overlays would have been created for relational databases to give them an object oriented appearance. In effect, this is how OO databases work; they layer object oriented technology over the relational database with extensive mapping. Collecting information from distributed sources probably would have also seen the modification of existing network databases or directory systems.

In developing systems data collection and analysis systems, I used all the benefits that Perl had made available with its object-oriented version. My Perl programs would collect data, work with it, and, as part of the process, create complex structures. The process of saving data in the OO paradigm is called persistence. The data structures held in memory by the program are serialized into coded data strings and then written to disk in binary or text format. At some point, that data will be collected for analysis. Alternatively, the same data, serialized into a string just as in creating persistent objects, can be sent across the network to a server process running on a centralized management machine. The server process, technically a data server, would take this small complex data structure in serialized format and place it at a determined place within the much greater body of collected data.
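
Here is a brief sketch of that persistence step using the standard Storable module; the snapshot fields are invented, and a real collector would write the serialized string down a socket to the dataserver rather than only to a local file.

 #!/usr/bin/perl
 # Sketch: persistence with the standard Storable module -- the structure
 # is serialized to a string that can be written to disk or sent to a
 # central dataserver over a socket.
 use strict;
 use warnings;
 use Storable qw(nfreeze thaw);
 
 my $snapshot = {
     host => 'hostA',
     time => time(),
     load => { one_min => 0.42, five_min => 0.37 },
 };
 
 # Serialize the complex structure into a single string.
 my $string = nfreeze($snapshot);
 
 # Written to disk it becomes a persistent object ...
 open my $fh, '>', 'hostA.snap' or die $!;
 binmode $fh;
 print {$fh} $string;
 close $fh;
 
 # ... or the same string could be printed down a network socket to the
 # dataserver, which would thaw() it back into an identical structure.
 my $copy = thaw($string);
 print "restored load: $copy->{load}{one_min}\n";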

Entirely in Perl, data was collected and added to the central database representing the entire operation. From that vast complex structure, a central machine was able to create charts and graphs giving historical trends or instantaneous snapshots. Up-to-date information was supplied to the administrator's screen and, in some cases, triggers within the dataserver provided instantaneous solutions to problems. The complex structure on this dataserver was a collection of small persistent object files of about a megabyte in size. The OS's file system provided both storage and data-location mapping in its tree-like structure, enhancing the efficiency of this paradigm. Effective operating systems speed access to data files by keeping commonly used data chunks in memory, rather than relegating them only to disk when a file is updated. Memory access is, of course, many times faster than disk access.


Data Mining and Discovery

Data mining often refers to the searching of the huge repositories of sales data that have accumulated in corporate database archives over the past few decades. Sales experts saw an opportunity to enhance marketing by sifting through these data heaps hoping to find patterns in buying trends, often to the granularity of a specific individual. From this sifting they could target people more effectively with advertising.

Similar to data mining is data discovery, which, to me, is much more interesting; discovery seeks to find useful data trends from networks, possibly as large as the whole Internet. In the mid-1990's, data discovery presented an exciting prospect. Of specific interest were large amounts of system data, especially error messages, which were lingering in files distributed across the computers of the network. They indicated problems that were not being reported, and even error information that was being correctly captured by logging systems was not being processed and understood. By passing these messages through character pattern recognition algorithms, aided by the Perl language, I was suddenly able to reveal troves of previously unknown vital information.
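
A minimal sketch of that kind of pattern sifting follows; the log path and the error patterns are examples only, not the actual expressions I used.

 #!/usr/bin/perl
 # Sketch: sift a system log for error patterns and count them by type.
 # The log path and the patterns are examples only.
 use strict;
 use warnings;
 
 my %count;
 open my $log, '<', '/var/log/messages' or die "cannot read log: $!";
 while (my $line = <$log>) {
     # Pull out anything that looks like an error, a failure, or a timeout.
     if ($line =~ /\b(error|fail(?:ed|ure)?|timeout)\b/i) {
         $count{ lc $1 }++;
     }
 }
 close $log;
 
 # Report the most frequent problem types first.
 for my $type (sort { $count{$b} <=> $count{$a} } keys %count) {
     printf "%-10s %d\n", $type, $count{$type};
 }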

Since Perl does not require types of data to be specified in advance (called data typing), and since Perl complex structures require no specific data size, information of any kind can be stored in Perl persistent objects. The database system that I had created for compiling system-wide performance trends now had a name: DepthDB. I created a tool set of modules that culminated in a full-featured dataserver along with web-based server code, giving users a free-form repository of searchable information. From this I built a web-based editor that could be used to create and manipulate metrical data structures, as well as record more general information.

In a common business use for data mining, companies have scanned records for embedded medical information about employees to determine if employees are a health risk and to fire them for that reason. To many people, this describes a form of data mining that is an incredibly negative use of the Information Society.

Better uses of data mining benefit drug invention; patterns of patient reactions to medication, called toxicity, can be recovered that would otherwise be lost. One of my accomplishments at Merck was in supporting a mathematical group doing just that form of discovery.


Internet Services

For the World Wide Web (WWW) to work, you need a pair of components: a web browser, and a server system that provides it with the information it requests in the HTML format. Apache, an e-mutualist and free project, is the most widely used web service system. An e-mutualist community of developers, through selfless cooperative work checked by a relentless peer review process, has made Apache popular by providing a system that is usable, safe, and elegant. Named for the numerous update patches, or fixes, applied by programmers in its early days, Apache has now grown into a community producing a collection of Java-related products that compete successfully at the corporate conglomeration level, and that provide a reference model for the design of protocols and standards.

To provide information for a user from within a database, a web service program has to go through several steps: it has to go to a database, collect information from it, and then convert the information into a browser-friendly format such as HTML, or its newer relative XML (extensible markup language). The user at the web browser requests information by sending a query to the web server. That query has the familiar http://www.something.com prefix and concludes with a form of code that is complex enough to specify the needed information, yet still trivial to read and interpret. The web server has a facility to allow programmers to embed programs within the Apache server. These programs provide the logic necessary for contacting the database on behalf of the user and producing the code from which the browser forms a readable web page. A common facility of this type is used by the DepthDB system: the CGI (common gateway interface). CGI is the original web information service; Perl developers pioneered and advanced this technology, creating web services as we know them today. The CGI system is still widely used; it is so efficient that, in the case of DepthDB, performance limits have never been reached. The use of Perl in DepthDB's dataserver, as well as the dataserver's reliance on efficiencies provided by the operating system and its file systems, gives DepthDB great system scalability.
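
For illustration, here is a minimal CGI script of the kind Apache would run for such a request; the parameter name and the hard-coded value stand in for a real lookup against the dataserver.

 #!/usr/bin/perl
 # Sketch: a minimal CGI script -- read the query string, look something up,
 # and return HTML. The parameter name "host" and the value are invented.
 use strict;
 use warnings;
 use CGI;
 
 my $q    = CGI->new;
 my $host = $q->param('host') || 'unknown';
 
 # In a real system this value would come from the dataserver.
 my $cpu = 72;
 
 print $q->header('text/html');
 print "<html><body>\n";
 print "<h1>Status for $host</h1>\n";
 print "<p>CPU use: $cpu%</p>\n";
 print "</body></html>\n";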

In very large web services designed to handle vast numbers of users, the software systems used to supply response information utilize a lot of code; they are too complex to develop and maintain from within a webserver. To accommodate these relatively new and quickly growing systems, a new type of server had to be created: the application server. An application server keeps its code base away from the webserver in an area known as a container, which is usually on another computer. The web server is only used as an interface to communicate with the Internet, and all the programming logic is provided from within the application server’s container of code.

Java is the language of these systems; it utilizes all available object technology to enable the new, huge web services. The container system allows for increasingly sophisticated systems that are programmed by large teams of developers through the engineering of object-oriented modular technology. An important feature of Java is the high level of control applied to the programmer, something not found in other languages; this type of control is almost absent from the Perl language. Code written in Java is highly structured and the data is specifically typed. This means that it lacks the flexibility in data handling found in the Perl language, as well as Perl's generous freedom in allowing programmers a personal coding style. Java's coding style is controlled by the compiler for the benefit of management, whereas Perl's lack of restrictions in style has allowed for the creation of software that borders on anarchy.


Languages

Shell

An amazing thing happened in the early 1970's with the development of the Unix system by Bell Labs, on behalf of the phone monopoly. A language had been developed that allowed a systems operator to actually communicate with the core of the system, and to get human-readable answers back from it. Technically a batch language, it differed diametrically from the systems control languages of the same type that had been developed by IBM for operating its mainframes. This added a unique warmth to the previously inaccessible central system of a computer; it was given the descriptive name of Shell and was further developed into an entire programming environment. A host of tools were added, making Shell a highly sophisticated programming environment that was easy, and fun, to use.

In Open Systems environments, typically Unix and Linux, Shell is the language through which people speak to the very interior of the system: the kernel. Shell launches programs of every kind; even the system initialization process, called "init", runs its startup scripts through Shell, and those in turn create more Shell instances. It provides the glue that binds all of the system together, and gives every user access to the system. Small applications can be created out of its text programs, which are called scripts. Shell is always alive somewhere in the system, in at least one instance but usually in dozens.

In its conversational nature, Shell reads and interprets every character as soon as the user hits the new-line key, or as soon as an end-of-line character is met in a script. Since each character is individually examined, it is much less efficient for the computer than a compiled program. By comparison, compiled programs are fed to the computer in the language of the central processing unit (CPU), machine code, and are executed at the computer's top speed.


Perl

The Perl language came about as a response to Shell's slowness and to systems administrators' need to quickly create efficient code for managing applications in the large, and growing, Open Systems environments of the 1990's. Perl is interpreted in the sense that Shell is: Perl programs are called scripts and are kept in text format, and the Perl interpreter reads the characters just as Shell does. Perl is much faster because the program code is read in one pass and then effectively pre-digested for the computer into something called bytecode. The bytecode is then run by another Perl facility, the Perl virtual machine, at a speed far closer to that of compiled code. The existence of bytecode allows a form of unreadable, almost-machine code to be created that is portable across all computers, where only the execution phase is specific to the native computer. The Java language has a similar arrangement but goes a step further in that Java programs are stored in this pre-digested bytecode and not as text code scripts.

Perl has been described as a language with the simplicity of Shell and the speed of C, combined with the powers of Shell's supporting programs, sed and awk. The joke was, "It is too good; get rid of it."

Perl is a wildly successful example of e-mutualism; it was purely democratically created by and for systems administrators. The OO evolution of Perl used only the features of object oriented methodology that would be beneficial to its users, who were often its developers. It gave such freedom of use that I, a systems administrator and not a programmer, created OO code whose elegance pioneered several concepts.

Perl is fascinating to the Information Society because of its popularity and success. Until the emergence of Java as a corporate language, virtually every bit of code to be executed by web servers was written in Perl. Java, on close examination, is very similar to Perl and it is probable that Java's architecture was inspired by Perl.

There is a single, structured repository of Perl modules called the Comprehensive Perl Archive Network (CPAN); it is the most advanced system of its kind, containing virtually all the publicly available Perl code. Every officially recognized Perl module can be installed on almost any computer from this network archive. The Perl modules that support the CPAN have the ability to check your Perl installation for completeness before adding functionality. When a module is requested, the CPAN code recursively loads every prerequisite module to assure that the desired module has all its supporting sub-modules. Since much Perl code is really C code glued within the Perl modules, there is code compiling in the process, and a series of regression tests is run against every installed module.
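
As a usage example, a single command asks the CPAN machinery to fetch, build, test, and install a module together with its prerequisites; the module named here is only an example.

 # Installing a module (and, recursively, everything it depends on)
 # from the CPAN; the module name here is just an example.
 perl -MCPAN -e 'install Time::HiRes'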

CPAN is brilliant as a guide and prototype for future public domain software support systems. Using code distribution systems modeled after CPAN, virtually any computer (no matter how small) can be fully supported wherever it goes. I discovered this potential capability when the CPAN was first developed in the late 1990's; yet there is no hint of this probable future for a CPAN-like architecture in any description or definition anywhere on the web. Presently being developed is a Perl interpreter called the Parrot virtual machine (Parrot VM), which promises to be many times faster than the commonly known Java virtual machine (JVM). Because it is written in a native Parrot language that is very similar to low-level computer assembly language, it can potentially have a closer affinity with the computer than any other virtual machine. Also being developed is a model called ubiquity, which describes a worldwide network of a vast number of tiny computers, all with network capabilities, spread out across humanity. These tiny computers are sometimes described as wearables.

These three highly efficient technologies: the CPAN repository of code, the Parrot VM, and ubiquity; could be combined into a single networked paradigm that could completely change the focus of the Information Society. Today, civilization is completely dependent on centralized computer operations that are accessed by bulky computers running wasteful code that is usually corporate owned and proprietary. By combining these three technologies, humanity could be served on a completely personal level; together they could ultimately pull the control of computing out of the hands of proprietary owners and distribute it to the whole of humanity. Computing, and therefore the Information Society, would become user-centric; but since a computer cannot focus directly on a person, more accurately, this form of computer communication would be described as user-data-centric.


Perl's technological progress over the past five years has been focused on a new and much different version of the language called Perl6, along with the underlying virtual machine, Parrot VM. The new virtual machine was named after a deliberate rumor, an April Fools' joke. The design of Parrot is pure genius, but unfortunately, the arrival date of a workable version will be some forty years in the future.

I am not sure why Perl6 is needed; Perl is so amazingly good that replacement seems unnecessary, and the work being done on it by the e-mutualist community draws volunteer resources away from the Parrot VM project. Competition with Java seems to drive Perl6 development, but there already is a fully developed language directly evolved from Perl, called Ruby. On the other hand, the Parrot VM's technology is so inspired that its value won't be realized until the Information Society experiences it; I hope a way can be found to accelerate its development.

My curiosity about Ruby prompted me to ask expert Java developers what they thought of Ruby in comparison to Java. After looking it over they all said unequivocally, "Ruby is the perfect language."


HTML, XML, and CSS

The "ML" in HTML (hyper text markup language) stands for markup language and represents a type of language used for developing computer based documents. Markup languages have existed in computing since the 1970’s; the original example is known as SGML, which introduced the use of angle brackets (“<”, “>”) for controlling text. A desire by the scientific community for an efficient way of sharing information evolved the Internet into the World Wide Web with the introduction of HTML. The most useful feature of HTML is its namesake concept of hyper text, where web page readers can access relevant information about a word, or phrase, by clicking on it, if it is highlighted. The clicking action brings a reader to another hypertext document. A computer that served web pages, a web server, was introduced in 1990; and web services were made free for use over the Internet in 1993. Only two years later, in 1995, large portions of the world were experiencing instant access to hypertext enabled information.

HTML encourages page coders to use relative concepts. Letters can be described as being larger or smaller than previous ones in varying degrees, and the widths of columns are designed to be proportioned as percentages of the width of a page. When loaded into a browser occupying a largish desktop window, the columns will appear wide and short. If the same page is loaded into a much smaller browser window, the columns will be narrow and deep. This allows a variety of computers to access the same pages, displaying them in a way appropriate to the specific capabilities of each computer's hardware. Font types and sizes can be determined by the user by changing settings on the browser. With the use of cascading style sheets, even more control is granted to the user. HTML specifications do allow for the types of controls typically used in word processors and typesetting; unfortunately, they are overused, taking away much of the flexibility originally intended for web pages.

Cascading style sheets (CSS) are remarkable in that they take the attributes controlling the graphic layout out of the HTML code, and place them in the head of the document in the form of CSS code. When a browser renders a page with CSS support, it finds ID tags within the markup elements in the HTML code and references the CSS code to see what instructions are needed to lay out the page. The CSS code can be kept in a separate file to be accessed by the browser for any number of related pages, giving them all similar appearance. The CSS code can then be easily modified to give the same set of pages a completely different appearance. Larger fonts or brighter colors can be provided for people who need them for sight reasons. Users can specify their own CSS code and keep a personal CSS file available to enhance pages to their personal requirements.

XML (extensible markup language) is very common now. It was once considered a revolution in information engineering, but it is now seen as a successor to HTML; XML is useful mostly for creating a variety of document types. The excitement around XML started near the millennium; it was promoted as the interface between all the various data processing systems that cannot communicate with each other because of proprietary restraints in their data coding formats. Unlike HTML, which has specific markup tags, XML's tags are entirely user-defined and non-proprietary. Tags can therefore resemble, for instance, the same accounting conventions that are used in the relational database. The references that define those tags are usually kept in a separate file called a DTD (document type definition).

XML provides some of the benefits of object oriented complex structures but without significant support. It was proposed that XML become the storage medium itself, where XML pages would replace the data in databases. This created unrealistic expectations that XML would become a dominant computer language, rather than just a text markup language.

The sophistication that XML has brought to HTML has propelled the publishing industry into the contemporary Information Society. By combining CSS with XML, browsers can render XML data into web pages. The ID tags within the XML structures, which link the structures to their definition reference tables, can be associated through CSS code with the familiar HTML layout instructions used for creating pages. Unfortunately, the browsers themselves have not been able to fully comply with up-to-date CSS standards.

XML is commonly used for formatting Internet tools other than browsers, such as chat tools and RSS news readers. Systems already related to the web, such as Apache, use XML for configuration files.

Presently, the popular open source office suite OpenOffice saves all its data in a native XML format. Microsoft is promising that by 2006, Microsoft Office documents, including Word's, will be saved in XML. This is significant because it will make Microsoft document formats much more available to alternative office suites, giving those suites a better chance in the software market.


Servers

Services

Computers typically operate in the client/server mode. When users require information on their personal computers, their computer, called a client, makes a connection to a server. Technically speaking, the software client running on the user's computer binds to a socket running on the server. The server software process, often called a daemon, provides the server socket. The socket is always available and waiting for a client to bind to it; data can then be returned to the client software being run on the user's computer. This is the common network connection model used by the Information Society, and it was designed at Berkeley for the BSD version of Unix.

When accessing a web page, a user's computer makes a single connection and the server returns a single response. The web server will probably handle many requests before that particular user makes another connection. This single request and response scenario describes stateless connections.
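
The sketch below shows one such stateless exchange from the client side, using Perl's standard IO::Socket::INET module; the host name is a placeholder.

 #!/usr/bin/perl
 # Sketch: a client binding to a server socket, in the BSD sockets style,
 # using Perl's standard IO::Socket::INET module. Host and port are examples.
 use strict;
 use warnings;
 use IO::Socket::INET;
 
 my $sock = IO::Socket::INET->new(
     PeerAddr => 'www.example.com',
     PeerPort => 80,
     Proto    => 'tcp',
 ) or die "cannot connect: $!";
 
 # One request, one response: a stateless exchange.
 print {$sock} "HEAD / HTTP/1.0\r\nHost: www.example.com\r\n\r\n";
 while (my $line = <$sock>) {
     print $line;
 }
 close $sock;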


In comparison, when administrators communicate with a distant computer, they use a remote Shell to create a stateful connection to that computer using the SSH encrypted communication protocol. The SSH server on the remote computer waits for a connection to come to it from an attaching client on the administrator's computer. It carries out an authentication procedure; it then reads the keystrokes as they are typed on the user's keyboard. The information from the keyboard is sent to the computer running the server, where it is given to the kernel. The response from the kernel is sent by the server through the socket back to the user's computer, where the information is written to the monitor. The SSH server then waits for more characters to come from the user's keyboard. This continuous cycle of input by the user and response from the server defines the stateful connection.


File Servers

A file server often holds the personal work spaces for users in a central location, rather than having these directories spread across a network of personal computers. The file server effectively supplies remote disks for users, allowing them to keep their information and code elsewhere on the network. A file server can contain any kind of data, including entire applications. A benefit of keeping data on a file server, rather than on a user's computer, is the ease of maintaining it; backups of the data, as well as system updates, can be performed easily when the data is centrally located. Safety enhancements, such as disk mirroring, can ensure the integrity of data sets by protecting them against disk failure. Network disk sharing was especially important in the earlier days of computing, when disk space was very expensive.

The typical file service system in Open Systems (Unix and Linux) environments is the Network File System (NFS), originally released by Sun Microsystems in 1984. The Microsoft counterpart to this service is called Server Message Block (SMB). The freely available Samba system provides the same SMB service on Open Systems, so that Microsoft workstations may rely on Open Systems servers. NFS is also available for Microsoft machines, both for desktop clients and as a file server.


Data Servers

Technically speaking, three terms define data services. Databases are simply collections of data, whereas relational database management systems (RDBMS) are data storage and retrieval products. A typical RDBMS, such as Oracle or MySQL, includes query languages, storage and indexing systems, and a wide variety of specific functionalities. Data servers are generally machines that contain databases and run the querying and updating systems. Described more accurately, a dataserver is the instance of data storage and access software running on the machine that comprises the RDBMS. IT groups have the responsibility of designing the schemas that contain the data, managing the data present on the systems, and controlling access to it. A database would presumably run within a single RDBMS, but may span many machines. A data mining operation, for instance, would very likely span several dataservers, with a separate application server responsible for dredging the data, probably running on yet another machine.


Web Servers

A single powerful computer can operate a very large web service. The Apache web server is very sophisticated, yet the actual process of serving a page of HTML code is almost trivial. Web servers have been implemented in microchips no larger than a matchstick head. I found in my DepthDB application that the vast majority of the work done during the whole process, where data is requested and presented to the user, was actually done by the browser in formatting the page on the screen. Features that I have implemented in Apache are the encrypted tunnel (SSL), password protection, CGI Perl scripts, Mod_Perl (which enhances Perl server code), and virtual hosts (which allow multiple websites to use a single server).

If one views the multiple requests that a user makes to a Web server as a session, the state of the session (information specific to a user's progress through the system) is usually kept either in the database that is linked to the server, or in a temporary holding unit on a user's machine called a cookie. Alternatively, data specific to the user's session can be encapsulated in the data strings to be sent back and forth as part of the web communication process between the web server and the user's browser. The storage instrument for the back and forth communication is called a hidden form.

Keeping the state (session) information within the "back and forth" communication process, in hidden forms, rather than in a database or a cookie, is the route that I chose for DepthDB. I created a specific tool using complex structures to encapsulate the user's session information. The HTML standards include a hidden form field, which is where I stored the complex structure for the trip from the browser to the server. For the return trip, the complex structure is provided in the form of an environment variable.
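
The sketch below shows the general hidden-field technique with standard Perl modules (CGI, Storable, MIME::Base64); it is a simplified stand-in, not the actual DepthDB encapsulation tool.

 #!/usr/bin/perl
 # Sketch: carrying session state in a hidden form field rather than in a
 # database or a cookie. The serialization here is simplified; DepthDB used
 # its own complex-structure encoding.
 use strict;
 use warnings;
 use CGI;
 use MIME::Base64 qw(encode_base64 decode_base64);
 use Storable qw(nfreeze thaw);
 
 my $q = CGI->new;
 
 # Recover the state the browser sent back, if any.
 my $state = $q->param('state')
           ? thaw(decode_base64($q->param('state')))
           : { page => 0 };
 
 $state->{page}++;    # record the user's progress through the session
 
 # Serialize the state and tuck it into a hidden field for the next trip.
 my $packed = encode_base64(nfreeze($state), '');
 
 print $q->header('text/html');
 print "<html><body>\n";
 print "<p>You are on page $state->{page} of this session.</p>\n";
 print qq{<form method="post">\n};
 print qq{<input type="hidden" name="state" value="$packed">\n};
 print qq{<input type="submit" value="Next">\n};
 print "</form></body></html>\n";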

The Apache server runs a CGI script every time it is accessed; it then removes the program code from memory, only to have to reload it again the next time there is a user request for its services. Having to reload the same code many times is wasteful; when the server is accessed tens of thousands of times, the constant reloading becomes a source of delays. Perl and Apache solve this problem with the Mod_Perl facility, which allows the Perl program to live within the Apache program. As the Apache program is always alive, the commonly used Perl program within it only has to be loaded once. Similar facilities exist in Apache for most languages.


Application Servers

The chosen language for very big corporate web programs is Java. The programs that provide web services in Java are called servlets and JavaServer Pages (JSP). The business logic behind the web services is built into a modular architecture called Enterprise JavaBeans (EJB). The Java 2 Enterprise Edition (J2EE) provides standards for containing the web components.

Typical of the containers, which really define the application server, are Tomcat (from the Apache community) and JOnAS (from ObjectWeb). Both application servers boast e-mutually provided code which is freely and openly available.

Servlets are Java programs that descended from the web applets that are often embedded in HTML pages. Applets were the original use for Java, but Java's design principles made it ideal for wide scale modular architectures.

JSP is a way to create HTML pages by embedding the server logic within stored web page code as references. The appearance of the HTML page itself is created by a web designer; the changing information in the page, the content, is represented by references placed where that information should appear. When the markup code is created for the user's browser by the server, the values are injected into the page code at these locations; the artistic work of the web designer and the data developed by the business logic system are effectively blended. With this arrangement, HTML coders can just develop web pages, while their Java programmer counterparts can work entirely on business logic. This is an improvement over the previous technique, where pages had to be built by the server code itself, requiring Java programmers to be expert HTML coders as well. The server software had to be written to print HTML code while it was printing the output from the business software. The DepthDB system, being based on CGI technology written in Perl, uses the latter, more awkward process, which usually results in less attractive, more utilitarian web pages. In the case of DepthDB, the utilitarian appearance actually contributed to its attractiveness.

JavaBeans are the independent class component units of the Java2 architecture from Sun Microsystems.

Application servers are referred to as middleware. They provide systems transparency for programmers so they don't have to be concerned with the operating system, the specifics of network computing, or the huge array of interfaces usually required of a modern web-based application. The application server communicates with the web in the form of HTML and XML; it links to various kinds of databases; and often it links to systems and devices which can range from huge and irreplaceable legacy applications to home appliances.

Portals are a very common application server system by which organizations can manage information for their users. A portal provides a single point of entry for all users where they can access information services transparently from any device anywhere within the sphere of the organization. They can work flexibly inside or outside of the organization's offices, and they can attach themselves to any part of the organization.


Systems Servers

Managing all these systems is daunting; there are dozens of services run in an IT department. The most important is the domain name service (DNS), which maps system addresses to the systems' names; other support services include automated directories of users and systems, monitoring and decision-making systems, and security services. Maintenance and updating responsibilities are assigned to well-protected machines; they make modifications to all the remote machines in a highly secure supervisory mode, often as an automated process.

All these small but crucial roles take place from within the systems administration group, usually from inside the machine rooms of the data center.


Systems Development

Software Systems Analysis and Design

"I write a unit test before writing or revising any method that is at all complicated, The writing of the tests tends to simplify the design, since it's easier to test a simple design than a complex one. And the presence of the tests tends to reduce over-engineering, since you only implement what you need for tests"


This is a quote from an object oriented programmer. Many people would see this approach to systems development as counterintuitive, as putting the cart before the horse.


The "waterfall" approach is traditionally the way engineers engineering approach a project; it is straight forward and resembles the way any team might approach any complex project. First the requirements are stated; they are then analyzed, a design solution is created, and then a framework is envisioned for that solution. The team would then develop code for the system and test the code as it is being created. Assuming everything works, they would deploy the system to the customer, and be done with it.


While the waterfall approach is probably the most common method of system design, it has recognized shortcomings. It lacks flexibility for easily correcting design errors as they are found during the development process, and it cannot adapt to changing requirements. The users the system is being designed and built for often discover they have more needs than originally requested as the system is being developed and deployed.


The quote at the beginning shows the opposite of the waterfall approach; it is typical of the philosophies of the newer schools of software engineering called radical approaches. These new software development approaches came into being during the hectic growth of the 1990’s; conservative engineers are leery of them.


The above quote refers to the test-driven approach; it is from an individual programmer, not a manager or an engineer. It shows a design technique where a programmer first develops a finite list of the output that a future code module is expected to produce; as the code is being written, the programmer periodically tests the module. The module is written to exactly fit that list of requirements; this assures that it will work as intended, and that no unnecessary code will be written into the module.
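
A small sketch of the technique in Perl's standard Test::More module follows; the add_metric() routine and its expected behavior are hypothetical.

 #!/usr/bin/perl
 # Sketch: a unit test written before the code it exercises, using Perl's
 # standard Test::More module. The add_metric() routine is hypothetical.
 use strict;
 use warnings;
 use Test::More tests => 3;
 
 # The list of expected outputs is written first; the module is then
 # written (and rewritten) until every test passes.
 is( add_metric({}, 'cpu_pct', 72)->{cpu_pct}, 72, 'stores a new metric' );
 is( add_metric({ cpu_pct => 10 }, 'cpu_pct', 72)->{cpu_pct},
     72, 'overwrites an old value' );
 ok( !defined add_metric({}, '', 72)->{''}, 'rejects an empty name' );
 
 # A minimal implementation, written only after the tests above existed.
 sub add_metric {
     my ($store, $name, $value) = @_;
     return $store unless length $name;
     $store->{$name} = $value;
     return $store;
 }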


It also enhances one of the greatest values of the modular approach to programming: it assures that all the code written for a project is potentially reusable elsewhere in the project, or in other projects. Effectively, the module the programmer is developing will serve two masters: the testing software that he wrote, and the higher-up modules within the system that will call this module. Future users of the code will feel comfortable that the module is usable for their purposes because the test software stands as a reference. The potential reusability of a module is as important as its integrity. When the test is being used to develop the software, the programmer is not looking for proof that it works, but for the parts of the module that do not yet work. The test software seeks to fail the module in ways that tell the programmer what next needs to be created, or modified.


The opening statement describes a unit test being used to help develop a small module for a large system. In comparison, a test of the whole system is called a functional test. Developers sometimes request a set of requirements extensive enough to allow them to build a functional test even before a project is fully designed. This request may seem absurd from the perspective of the customers who are ordering the system, as they cannot know all the requirements before the system has been completely designed. Yet requesting all the requirements upfront helps developers bring home a point by challenging the customers to fully understand what their system needs really are. Most important to understanding the test driven approach is that the development process is only as good as the tests are. A programmer practicing this approach has to be a good test developer as well as a systems engineer.


An effective way to assure the success of complex systems is to use the iterative life cycle. Best known of these are the Unified Process and the Rational Unified Process; these describe a process framework consisting of a series of four milestones that a software systems project must pass before it can be called successful. For the first milestone, the business case of a system has to be developed; expectations of its benefits have to justify the resources being invested in it. To pass the second milestone, the components of the system have to be well understood with respect to their activities and the system's participants; it is here that the architectural structure is envisioned. The actual construction of the system and its code is done to reach the third milestone. Finally, to pass the fourth milestone, the system is delivered to the user community for an acceptance test; the system is operated in parallel with existing systems to assure its integrity. If, in this test, the product cannot be used, it goes back to the drawing board, or the project is possibly halted and forgotten.

In the development of a sophisticated application, the above scenario is actually nearly impossible to achieve in a single pass through the four-step life cycle. In practice, the development process is broken down into smaller iterations of this process. The project goes through many repetitions of the four-step life cycle; each iteration produces a working prototype of the final product, and the body of the project gains size and sophistication. The life cycle process is also applied to each of the subset modules, allowing an organic growth from within the core of the project. Developers can halt the project anywhere along the iterative cycle, knowing it is much cheaper to rework design concepts than to rewrite code.

As the entire application iterates through the life cycles, the design team continually releases increasingly usable code to the user community. All the while, they gather feedback from the users during the user acceptance testing that follows each of the life cycles. As an added benefit, the interim releases bring value to the enterprise in that they provide at least some of the needed functionality. Now that the user community has gained experience with prototype versions of the actual final product, they will almost certainly discover new requirements. Their suggestions for new or improved features are implemented into the next life cycle, giving the project value beyond initial expectations.

Not all projects can be delivered in repeating life cycles. Aircraft software systems, for instance, have to be delivered whole and in perfect condition before the first flight of the aircraft. This type of software development requires true engineers, not just talented developers. The testing is done exclusively by simulation systems; the testing process is highly mathematical. Simulation testing is the only feedback mechanism available to designers and programmers when building this type of system.


Even the aircraft systems example describes precisely what the developer in the initial quote seeks from the outset: testing is a driving force of development. In the lone developer's case, the coder is his own software engineer, and therefore he is solely responsible for his work.

Some software system development approaches concentrate on the human side, proposing close personal bonding in the coding process. With these approaches, everybody on the team takes ownership of all the code in the project; each team member improves it for integrity and tightens it for efficiency as soon as problems are noticed. Bonding in the group guarantees the equal distribution of knowledge about the system so that there is a universal understanding of how the entire system works; all team members understand how to write effective test programs to further drive development. Correct knowledge sharing is important.

My experience using modules from the Perl CPAN was sometimes frustrating in that the regression tests obviously didn't test the real substance of the module because things weren't working. Only by field testing the modules was I able to prove their value.

As can be expected, the modular nature of software systems engineering parallels the nature of object-oriented software paradigms. Surprising, however, are the similarities between abstracted design concepts and the actual computers and their software. The model view controller (MVC) is an abstracted concept that seeks to separate each web-based transaction into three phases: the input, or controller, phase (input from a user's keyboard); the logic, or model, phase (where the systems do their internal work); and the visible results, or view, phase (answers returned to the user's screen). This abstraction layer fits directly over the schematic of an application server.

These two tables illustrate a progression from the abstracted concept through the logic phase to the hardware layers where the arrows represent the communication components.

Typical Web Example:

Controller -> Model -> View

Input -> Processing -> Output

Keyboard Typing -> Containerized J2EE Code Modules -> Web Page Creation and Delivery


Motorist lockout Example:

Input->Processing->Output

User Computer->Application Server->Remote Device

Hand Held Device->Containerized Code->Car Computer

User Tells of Lockout->Code Logic Confirms->Car Unlocks Door
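The same separation can be sketched directly in code. The following is a minimal illustration in Python rather than J2EE, and every class, method, and identifier in it is invented for the example; it shows the controller accepting input, the model doing the internal work, and the view formatting the answer for the user's screen:

 # A minimal model-view-controller sketch; all names are illustrative only.
 class Model:
     def unlock_car(self, car_id):
         # The "internal work" phase: confirm the lockout and issue the unlock command.
         return {"car": car_id, "door": "unlocked"}

 class View:
     def render(self, result):
         # The "visible results" phase: format the answer for the user's screen.
         return "Car %s: door %s" % (result["car"], result["door"])

 class Controller:
     def __init__(self):
         self.model = Model()
         self.view = View()

     def handle_request(self, user_input):
         # The "input" phase: accept the request and pass it to the model.
         result = self.model.unlock_car(user_input)
         return self.view.render(result)

 print(Controller().handle_request("NJ-1234"))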


Developments in the design and creation of large software applications have not been limited to books, paradigms and practice. Actual languages have been created to facilitate development. The Unified Modeling Language (UML) extends the tool set concept into a standardized modeling notation; its best-known tool suite, Rational, is now owned by IBM. The giant successfully sells management packages built around it to well-funded critical projects for huge amounts of money.

Another important approach to design, Aspect engineering, has resulted in languages that specifically add integrity to systems. Aspect programming assures integrity in systems by finding areas of concern that are common to every part of a system, yet don't necessarily relate to any of the specific purposes of the system. These are called cross-cutting concerns. Typical examples of these are logging and debugging. There is a very strong resistance in most programming projects to providing verbose error information about the progress of a system. Likewise, debugging is considered an afterthought because it does not directly add to the benefits expected of the system.

Aspect orientation makes these important, yet often ignored, functions part of the internal workings of the modules. Since this functionality is buried in the system, it is transparent during the normal programming process. It does not impede the coding efficiency of the programmers, and only minimal effort is needed to enable the capabilities. Programmers identify important locations within the code, called join points, where the added integrity features are woven in.
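A rough analogue of this weaving can be sketched with a Python decorator; a true aspect language such as AspectJ attaches its logging advice at join points without touching the business code at all, so the sketch below only approximates the idea, and the transfer function in it is hypothetical:

 # A cross-cutting logging concern approximated with a decorator; the business
 # function "transfer" is hypothetical.
 import functools, logging

 logging.basicConfig(level=logging.INFO)

 def logged(func):
     @functools.wraps(func)
     def wrapper(*args, **kwargs):
         logging.info("entering %s args=%r", func.__name__, args)
         try:
             return func(*args, **kwargs)
         finally:
             logging.info("leaving %s", func.__name__)
     return wrapper

 @logged
 def transfer(account, amount):
     # The business logic itself stays free of any logging concerns.
     return {"account": account, "amount": amount}

 transfer("12-345", 100)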

As part of the design focus on the users, attention is given to assure good communication all around. Cards are used to abstract the design of the entire system to pull the focus away from technical details. This helps work groups subjectively model the basic premises of the system's existence, strengthening the project's goals. The cards represent the components of the design model; they can be manipulated in various ways to represent the states and activities of the system, and the progress of the project.

When used in one way, in the Object Model, the placement of the cards represents the structure and substructures of the system's attributes, operations, and associations. Used another way, in the Dynamic Model, the cards show the behavior of the system in Sequence (collaboration) diagrams, Activity (work flow) diagrams, and Statechart diagrams (a slice of the system's activity at any moment).

The cards are useful at meetings: simply stacking them into piles representing levels of progress can assure management that the project is succeeding; flipping through the cards and passing them around can focus discussion on specific areas of concern or modules; the scope of discussion can be widened to the entire project by combining the cards into groups.

There is an irony here: software systems engineering, as I have described it, is completely absent from the creation of some of the most significant software there is. I am unaware of any software I am using right now that has benefited from simulators, state diagrams, or cards. The modeling of these systems is done mostly through email communication at the group level, and a form of meditation is done during the programming process. Maybe this sparse planning scheme is detrimental, especially during the hours of solitude experienced by most e-mutualist programmers.


The Perl community seems to me to have gone astray with its Perl6 and Parrot projects; the arrival dates of usable versions of these projects are so far away as to be meant for another generation. It seems as if the volunteer work force for Perl is spread too thinly for its many lofty goals; yet I am not sure a greater workforce would accomplish the hoped-for results more quickly. I can show examples of violent personality clashes between Perl programmers on Usenet. It is possible that the techniques of the radical developers, including the seeming silliness of using cards, could create for the Perl development team a sense of abstraction to reduce energy dissipation.

Perl has, nonetheless, built a model for the future management and distribution of the world's code in its CPAN repository. The Linux operating system is waiting for some legal governance to push back the crushing dominance of the Microsoft monopoly; if and when that happens, the free Linux operating system will be given a chance to eclipse the XP system of Microsoft. Very likely, other public domain software projects will then also replace their proprietary counterparts.


Hindsight is a powerful learning tool for software design, and a design approach has been created to utilize it. Design patterns are the encapsulated experiences of programming efforts. The experiences are analyzed, documented and then placed in an online repository. Programmers can use these design patterns as templates, just as seamstresses lay patterns over cloth. Patterns can be fitted to any programming challenge or, alternatively, rejected without any cost. The patterns, when viewed as a single repository, represent the accumulated experience of software development and are an excellent learning resource. Using these tested and proven paradigms helps illuminate subtleties in designs that can result in major problems later on in projects. With the use and reuse of patterns, experience accumulates, improving the patterns. Programmers and engineers who understand design patterns, and participate in their improvement, benefit by creating code that is improved in structure, integrity, and readability.


Like all other design approaches, design patterns use abstracted object oriented concepts. They provide generalized solutions that are documented in a format independent of any particular programming language, design paradigm, or problem type.
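As one small illustration, the widely documented Observer pattern can be sketched in a few lines of Python; the class names follow the pattern literature, while the status-board scenario is invented for the example:

 # A sketch of the Observer design pattern; the "status board" scenario is invented.
 class Subject:
     def __init__(self):
         self._observers = []

     def attach(self, observer):
         self._observers.append(observer)

     def notify(self, event):
         # Every attached observer is told about the event.
         for observer in self._observers:
             observer.update(event)

 class StatusBoard:
     def update(self, event):
         print("status board shows:", event)

 builds = Subject()
 builds.attach(StatusBoard())
 builds.notify("nightly build passed")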

Administration[edit]

Systems[edit]

When working with common computer systems, observations fall into four distinct categories: CPU (central processing unit), Memory, Disk and Network. These four categories form CMDN, a model I created for system analysis for the DepthDB system. Within these categories, a wide variety of metrics are available to quantify the usage, status and configuration details of all types of systems components and sub-components.
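A minimal sketch of this kind of observation, assuming a Linux machine with the usual /proc interface, reads one representative metric from each CMDN category; it is illustrative only and is not the DepthDB code:

 # One metric per CMDN category, read from /proc and the standard library (Linux).
 import os

 def cpu_load():
     # CPU: the one-minute load average.
     return float(open("/proc/loadavg").read().split()[0])

 def memory_free_kb():
     # Memory: the MemFree field from /proc/meminfo.
     for line in open("/proc/meminfo"):
         if line.startswith("MemFree:"):
             return int(line.split()[1])

 def disk_free_bytes(path="/"):
     # Disk: free space on the root file system.
     stats = os.statvfs(path)
     return stats.f_bavail * stats.f_frsize

 def network_rx_bytes():
     # Network: total bytes received across all interfaces.
     total = 0
     for line in list(open("/proc/net/dev"))[2:]:
         total += int(line.split(":")[1].split()[0])
     return total

 print(cpu_load(), memory_free_kb(), disk_free_bytes(), network_rx_bytes())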

The software that runs the computer is the operating system (OS). The most important system code in the OS sits in the layer just above the CPU; it is called the kernel. Memory is shared by all software, and configuration changes to it are made within the kernel, or in code very close to the kernel's code. Kernel code is so important to the system's operation that it is always kept in memory and is modified by changing active memory. The management of the CPU and the OS is generally known as performance tuning. Updates to the OS usually involve recompiling the kernel and its supporting code, along with their installation into the OS.

The boot process connects the initial code of the OS kernel to the start-up code within the computer hardware; the term is derived from the phrase "pulling yourself up by your own bootstraps." In a more generalized sense, the boot process also attaches the kernel to the file systems of the OS. There the kernel program finds the Shell executable code as well as modules that are loaded directly into the kernel's memory space. When accessed, the kernel makes the modules an extension of itself, usually to allow user programs access to hardware devices.

Disk systems are technically called storage. Storage differs from memory in that storage persists between re-boots, whereas memory is wiped clean. The object oriented term persistence derives from the permanence of disk stored data. Disks are typically divided into a series of large partitions preceded by a tiny slice at the very beginning of the disk called the master boot record (MBR). This initial portion is given to the boot strap process and contains a wholly independent program. The e-mutualist community has developed the Grand Unified Bootloader (GRUB) to live in the MBR; it functions as an exceedingly small operating system, giving increasing capabilities to the boot process.

Disks on a PC, for instance, are usually allowed only four primary partitions, with the last partition being split up to form extended partitions. For the Open Systems OS's (Unix and Linux), the limitation of four partitions has never been adequate because these operating systems keep different kinds of data on a larger number of different file systems in different partitions. There is a protocol to extend the number of partitions by dividing up the last partition into sub-partitions, but this arrangement adds unnecessary complication to the process of disk management.

Logical volume management (LVM) solves the problem of the shortage of partitions by allowing the OS to create a software layer above the disks. LVM software breaks up all of the data on the file systems into tiny units and assigns each of them to arbitrary locations on a disk, or across an array of disks. In this way the data is abstracted to allow the user and the system to have complete flexibility in arranging the data across disks. LVM software uses this flexibility to keep concurrent copies of the data, a technique called mirroring, or to provide the ability to resize and move file systems at will.

Really good file system software has existed for decades in the Open Systems area, but high integrity file system software has only recently reached the common desktop computer. Today, computers equipped entirely with disk integrity suites from the public domain can withstand repeated power outages without damage to the data. A technique called journaling is used to provide this capability; the best available example is the public domain ReiserFS file system. When combined with freely available LVM software and an array of cheap disks, an average person can build a storage system that is as reliable as any found in a well-funded corporate datacenter.

Computer networks, technically speaking, operate outside of the computers that they link together; yet much of the network hardware necessary to operate networks is within the computers. Network configuration is likewise handled locally on the computer, as is performance monitoring. Ethernet is the most common wired communication protocol and has enjoyed two decades of stable and efficient performance.

Sometimes the CMDN categories overlap. Running out of memory is like running out of air. To prevent this form of complete failure, the operating system has algorithms to move pieces of infrequently accessed memory to the disk. Small portions of data in memory continuously trade places with data on the disks as memory becomes scarce, or as swapped-out data is needed again to run programs. From the perspective of the system's operation, the data moved to the disk is effectively still memory, but it is referred to as virtual memory. Therefore key memory metrics include references to disk-related performance and capacity data.
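The overlap is visible in the metrics themselves. On a Linux system, for example, swap capacity appears among the memory figures in /proc/meminfo, while the pswpin and pswpout counters in /proc/vmstat count the pages traded between memory and disk; the short sketch below, written in Python purely for illustration, reads both:

 # Memory/disk overlap on Linux: swap usage and page-swap counters.
 def swap_usage_kb():
     fields = {}
     for line in open("/proc/meminfo"):
         key = line.split(":")[0]
         if key in ("SwapTotal", "SwapFree"):
             fields[key] = int(line.split()[1])
     return fields["SwapTotal"] - fields["SwapFree"]

 def pages_swapped():
     counts = {}
     for line in open("/proc/vmstat"):
         key, value = line.split()
         if key in ("pswpin", "pswpout"):
             counts[key] = int(value)
     return counts

 print("swap used (kB):", swap_usage_kb())
 print("pages swapped: ", pages_swapped())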

In another example of CMDN overlap, file systems seen by the user may not be on the computer; they may be on a file server. The file structure programs are dependent on network performance since all the data used has to travel across the network. Network performance metrics are more relevant in this case than disk performance metrics as networks typically slow data more frequently than disk systems.

User support is handled by administrators. Originally all Open Systems users did all their work within the Shell environment, and many still do as it is very efficient and boasts high integrity. The Shell environment offers an ideal programming and Web administration environment; artifacts of it are visible today as components of the web. Shell users are listed in the password file and they are given their own directories in the file system, which is where they arrive when they log in. These password files are merged into small network databases so that many local computers can recognize users as they traverse the network. Users can also find their familiar home directory on every machine thanks to the data sharing abilities of file servers. The systems capabilities of users, with their login and data storage privileges, are managed by a systems administration group.
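The password file entries that administrators manage can also be read programmatically; as a small illustration, Python's standard pwd module (available on Unix and Linux systems) returns the current user's home directory and login Shell straight from those entries:

 # Reading the current user's password-file entry with the standard pwd module
 # (Unix/Linux only).
 import os, pwd

 entry = pwd.getpwuid(os.getuid())
 print("login name:    ", entry.pw_name)
 print("home directory:", entry.pw_dir)
 print("login shell:   ", entry.pw_shell)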

Web applications, such as portals, have their own login schemes, which are handled from databases. This is necessary to insulate the application from the operating system so that the applications can be moved to different types of computers and operating systems. The users' privileges are managed in tables in databases that are attached to the Web or application servers. The management of large scale applications tends to fall to systems administration groups because of the complexity in joining the operating systems with the applications. Issues continually arise in accommodating applications, especially during their inception phases.

Closely related to the systems administrator is the database administrator. The work locations of these two types of administrator are rarely far apart. The fields of systems and database operations are so integrated that the large systems management paradigm depends on their blended knowledge and responsibilities. Database administrators manage all the data within the databases; maintain the database functionality; and control access to the stored data. They create and drop databases and tables; add or remove users; grant or revoke access privileges; and manage the custom query code stored within the system. They also manage the transactions made by dataservers as well as the relationships between linked dataservers, either as clusters or as widely distributed data storage systems.

Application servers are sophisticated enough to require specialized support; their newness and complexity require frequent proactive training for the support staff. New business code from development groups arrives regularly; and as the science behind their operation continually evolves, new application server software often needs to be installed.

The continual installation of new code modules creates a closeness between the application server administrators and the software developers. Issues surrounding application server support from the technical perspective bond the applications administrators with the systems administration groups. Application server dependence on relational database systems assures that database administrators are constantly consulted for support by the applications administrators. The management of users who are serviced by the applications servers can fall almost anywhere; frequently, the business community is responsible for user administration if the services users receive are purely financially related. Increased specialization will occur as application servers begin to reach out to the rapidly increasing networks of intelligent devices connected by wireless communications. These small devices are expected to operate most of the expensive equipment and durable goods in the next few decades.

A class of smaller services handled by systems administrators includes mail servers, backup servers and technical support ticketing systems. Directory services such as LDAP (which usually handles people-oriented information), or distributed authentication systems (which can create a unified login system), function as small databases, but their support falls to systems administrators. Other important services include DNS (which relates Internet names to physical system addresses), FTP (for moving files around the Internet), HTTP (the technical name for web services), SSH (the secure log-in and file transfer system used by administrators), SMTP (email), and SNMP (a systems monitoring protocol).
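As a one-line illustration of the first of these Internet services, DNS resolution is available to any program through the operating system's resolver; the host name below is the reserved documentation domain example.org, used here only as a stand-in:

 # A DNS lookup: the resolver turns an Internet name into an IP address.
 import socket

 print(socket.gethostbyname("www.example.org"))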


Network[edit]

Internet Networks are the backbone of today's Information Society, yet the networking staff tends to be somewhat isolated from the rest of the computing environment. Their importance, however, is not trivial. Their activities range from the installation and repair of electrical cables and connecting devices, to intensely difficult tasks; they design and manage the vast spanning topologies of wire and radio pathways connected by sophisticated multiplexing gateways.

Networking is abstracted into the layers of Application, Transport, Network, Datalink, and Physical.

The Application layer is what we encounter personally. It is the layer where the web, email, database, and file sharing activities operate.

Beneath this is the Transport layer. This is the domain of the electrical packets containing application data that travel back and forth between computers and devices. TCP (Transmission Control Protocol) is the most common protocol for managing packets of data. SMB (Server Message Block) is the file system sharing protocol used by Microsoft.
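A small sketch, again using the reserved documentation domain example.org as a stand-in, shows application data riding on the Transport layer: a TCP connection is opened and a plain HTTP request travels across it as a stream of packets:

 # Application data carried over a TCP (Transport layer) connection; the host
 # and port are illustrative.
 import socket

 with socket.create_connection(("www.example.org", 80), timeout=10) as conn:
     conn.sendall(b"HEAD / HTTP/1.1\r\nHost: www.example.org\r\nConnection: close\r\n\r\n")
     print(conn.recv(200).decode("latin-1"))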

The Network layer lies below the transport layer but, in a more accurate sense, the network layer supports the transport layer. On this level, routing devices utilize the address information in the data packets to guide them around the Internet. To decide how the data should get to its destination and back again, the routers consult mapping tables that describe the effective paths around the Internet; the maintenance of these tables is the responsibility of senior network administrators.

The familiar groupings of four numbers, each ranging from 0 to 255 and joined by dots, are used by the Internet protocol (IP) to number all the computers and devices on the Internet. Its range of 4 billion addresses was mistakenly thought to be adequate to give unique numbers to all of the world's network nodes. The original Internet addressing protocol, IPv4, is being supplanted by the IPv6 protocol. IPv6 will offer a large enough array of available addresses to expand the Internet literally into other galaxies. By putting encryption information right into the packets, the IPv6 designers have assured that data is protected at the lowest level.

IPv6, unlike IPv4, has an address scheme that is not easily readable by humans. IPv6 computers and devices are, in practice, referred to by their associated system names; the management of their addresses requires software tools. IPv4 will always be desirable for local network administration, such as home networks, because of the simplicity of its numbering scheme. Routing technology supports IPv4 contained within IPv6 networks with the use of algorithms that translate addresses automatically, allowing the two protocols to co-exist comfortably. On the original IPv6 interest mailing list, I remember making the initial suggestions for the internal information encryption scheme. This possibly makes me a significant contributor to the present Internet.
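The contrast between the two address forms can be seen with Python's standard ipaddress module; both addresses below come from the ranges reserved for documentation, so they refer to no real machine:

 # Contrasting IPv4 and IPv6 address forms with the standard ipaddress module.
 import ipaddress

 v4 = ipaddress.ip_address("192.0.2.1")
 v6 = ipaddress.ip_address("2001:db8::1")
 print(v4.version, v4)                 # 4  192.0.2.1
 print(v6.version, v6.exploded)        # 6  2001:0db8:0000:0000:0000:0000:0000:0001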

The Datalink layer brings together the abstracted concept of the Internet protocol with the physical reality of managing the groups of electrons that make up digitized packets. For providing this service, Ethernet is the preferred protocol. It was invented at the famous Xerox Palo Alto Research Center, but the Xerox Corporation never thought the protocol was worth the effort of keeping. They allowed its primary researcher, Robert Metcalfe, to leave the lab with the Ethernet technology. He formed the 3Com Corporation to perfect the use of Ethernet; 3Com built much of the early network interface equipment.

Beneath it all is the physical layer that simply describes the cables, connectors and the silicon microchips that channel the electrons of the packetized data.

Added to the physical layer today is wireless technology. It is very popular, and its contribution is significant because it gives users complete freedom in how they access the Internet backbone. Because wireless exists as waves in open air, its functionality is physically limited. Socially, the potential exists for wireless to provide completely free and widely available networks. At some point, however, all data packets have to become earthbound to utilize the efficiency of traditional cable networks.


Security[edit]

In the early 1990's, security responsibilities fell to the systems administration groups I worked in. As a profession, we were outstanding at the task. While lecturing the teenage members of the Linux Society, I sensed that they enthusiastically absorbed the synergistic concepts for integrity that I had experienced developing a decade earlier. They seemed to instantaneously benefit from the administrators' culture of responsibility and teamwork, enabling them to be responsible members of the Information Society.

The simplest component of security is the password. As a requirement, it has to be meaningless and cannot appear in any dictionary, or on lists of previously used passwords. Passwords are made up of plain characters and, unless encrypted, can often be seen as they pass through networking devices.

The encryption of data traveling across networks provides the next level of security protection, by preventing the viewing of passwords and other important data. SSL (Secure Sockets Layer) is the most common implementation of network data encryption; you know you are using SSL when the URL in your web browser's address bar says HTTPS rather than HTTP. In the SSL encryption scheme, every systems user has two cryptographic keys: one public and one private. The public one is provided to everyone expecting to encrypt data to send to a specific user. The private one is held dearly by that user because it is used to decrypt the data. It is practically impossible to break code protected by recent versions of encryption technology; the technology makes users feel secure in their transactions, yet it gives anxiety to governments. OpenSSL is the public domain version; it is very popular, and it gives the SSH shell its encryption abilities.
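A brief sketch, using Python's standard ssl module (which is built on OpenSSL) and the documentation domain example.org as a stand-in server, shows an ordinary TCP connection being wrapped in encryption and the server presenting its public certificate:

 # Wrapping a TCP connection in SSL/TLS with the standard ssl module (OpenSSL).
 import socket, ssl

 context = ssl.create_default_context()      # loads trusted public certificates
 with socket.create_connection(("www.example.org", 443), timeout=10) as raw:
     with context.wrap_socket(raw, server_hostname="www.example.org") as secure:
         print("negotiated protocol:", secure.version())
         print("server certificate subject:", secure.getpeercert()["subject"])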

Security within the operating system of the computer is like the database control language used by relational database systems. It determines where you can go within the file systems, and what you can look at, change, or move around. In Open Systems machines, files have what are called permission bits set in their basic digital control section, the inode. This is visible with the "ls -l" command. There are three categories: user, group, and world; and three modes: read, write, and execute. Needless to say, you don't want your personal information to be world writable, and probably not world readable either. System programs are typically world executable, which means that anyone with permission to log on to the system can run these programs. The average user probably does not have access to most of the system files; therefore the usefulness of available programs that modify files on a system is limited by the user's access to files. Typically, users are only allowed to modify or create files in their home directories.
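The same user, group, and world bits that "ls -l" displays can be examined programmatically; the short Python sketch below decodes them for the password file discussed in the next example:

 # Decoding the user/group/world permission bits that "ls -l" displays.
 import os, stat

 mode = os.stat("/etc/passwd").st_mode
 print(stat.filemode(mode))                     # e.g. -rw-r--r--
 print("world readable:", bool(mode & stat.S_IROTH))
 print("world writable:", bool(mode & stat.S_IWOTH))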

Capabilities are an alternative form of control over the access to files on a computer. In a capabilities control system, access to files on the system is not granted to a user as a privilege; rather, access to the files is granted to programs. The operating system registers access to specific files with the names of specific programs. An unregistered program brought by a user to the computer would initially have no files associated with it; therefore, it could do no harm.

A simple example: in a present-day Open Systems computer, user access to a file is determined by the access bits set within the file's inode and the ownership of the file. The very sensitive password file is open to being read by everyone having access to the computer. Existence on an Open Systems computer is defined by a user's operating within a Shell process; Shell, in turn, needs access to the password file. Shell needs the reference to the location of the user's home directory in the password file for the initial process of logging in; for the login, it also needs access to the user's encrypted password. Shell might also need access to the password file to allow the user to switch to a different user name, such as the supervisory user, root. The passwords in that file are kept in a crypt format to prevent their being read. Crypts are unreadable, of course, but access to crypts gives malicious users the opportunity of trying a variety of techniques to find weak passwords; any system with exposed password crypts is potentially vulnerable. In recent years, this Open Systems vulnerability was resolved by moving password crypts outside of the password file using a complicated and awkward work-around arrangement that is difficult to maintain. The added complications create further potential for vulnerability, known as the "confused deputy" scenario, where a trusted program or a junior administrator may be tricked into using its authority to give unauthorized access.

On a system using the capabilities scheme of access control, this important and interesting example of systems vulnerability does not exist. Only the relevant programs would have access to the password file, rather than every user. Only programs that have a legitimate need to see the crypts are given that capability; users will never see the crypts, and malicious users will never be able to use crypt breaking software to attack the computer.