Just What Is The Internet? An Introduction for Student Researchers

Just as students are confounded by online library resources, so are they confused by "The Internet" (also commonly referred to as "The Web"). These terms are often used interchangeably, even though they refer to different things: the World Wide Web is just one portion of the Internet. Though students tend to lump everything that's available online into a single category, each individual resource is different, with its own strengths and weaknesses.

Without delving into the history of the Internet, let's briefly define the Internet and discuss its components. Understanding how the Internet and its parts function will be quite useful when you're actually searching its billions of files for a few articles on quantum particles for your physics paper.

"The Internet" consists of millions of individual computers that are "online" (i.e., connected to one another through a worldwide network of networks) at any given moment. Access protocols govern the connections between these machines. Internet access protocols are essentially rules that allow individual computers to communicate with one another across the Internet. Programs, such as web browsers and search engines, use these protocols to search for and retrieve desired information. Some of the more common protocols and services include HTTP ("The Web"), TELNET, FTP, Usenet, and email. However, no one piece of software has access to every file that's housed on the Internet; thus, it's necessary to build up an arsenal of web sites, subject directories, search engines, and Usenet and email groups for your research needs.

Every file located on the Internet has a unique Uniform Resource Locator, or URL. A URL is similar to your home address: it tells the world where your home is to be found, and no two addresses are exactly the same.

URLs follow a standard pattern, though individual URLs can be quite complex. The general format is: protocol://host/path/filename

Let's look at each piece of the URL in turn:

1. The protocol, as we learned, is a rule (or set of rules) that helps your computer communicate over the Internet. The most common protocol is HTTP, but you'll also likely encounter TELNET, FTP, Usenet, and email. Because HTTP is the norm, most URLs begin with the characters "http".

2. The host name (also known as the site name) is the unique name used to refer to a computer on a network.

3. The host name is itself made up of a second-level domain and a top-level domain (the path that follows it may add a directory name, as described in item 4). The second-level domain is usually referred to simply as the "domain." Like URLs, each second-level domain is unique within a given top-level domain: you'll never find two identical ones. The second-level domain identifies the owner and administrator associated with an Internet Protocol (IP) address. Sometimes the second-level domain is further divided into sub-domains. In one such URL, the second-level domain is "nypl", while "catnyp" is a sub-domain.

The top-level domain, on the other hand, is ".org." Top-level domains were originally used to categorize the web sites that they're attached to. Unfortunately, as the popularity of the Internet has exploded and every Tom, Dick, and Harry has built a web site, much of the meaning of top-level domains has eroded. For instance, the top-level domain ".org" is supposed to designate web sites belonging to non-profit organizations, but no verification is required to obtain a ".org" domain. Thus, individuals or even for-profit organizations can register their sites under ".org."

The most widely used top-level domains include the following:

.com    Commercial web site
.net    Network provider
.biz    Business
.org    Non-profit organization
.edu    US educational institution
.gov    US governmental institution
.mil    US military institution
.int    International treaty organization
.pro    Licensed professional
.info   Informational
.name   Individual
.aero   Air-transport industry
.arpa   Internet infrastructure (originally ARPANET)
.coop   Cooperative

4. Sometimes the path name will also include a directory name. If the site contains a number of files, they might be organized and stored in different directories. Directories are similar to individual folders in a filing cabinet.

5. Finally, the filename is the name that the site's creator assigns to the individual file that he or she uploads to the site.
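The pieces described above can be pulled apart with a few lines of code. The following is a minimal sketch in Python, using the standard urllib.parse module. The function name describe_url and the address www.example.org are assumptions made up for illustration; they do not come from any real site discussed here. Note that Python reports the domain names as part of the host, and that simply splitting the host on dots is a simplification that will not suit every address.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into protocol, host, domains, directory, and filename.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")  # e.g. ["www", "example", "org"]

    # Everything up to the last "/" is the directory; the rest is the filename.
    directory, _, filename = parts.path.rpartition("/")

    return {
        "protocol": parts.scheme,
        "host": host,
        "top_level_domain": labels[-1] if len(labels) > 1 else "",
        "second_level_domain": labels[-2] if len(labels) > 1 else "",
        "sub_domains": labels[:-2],
        "directory": directory + "/",
        "filename": filename,
    }


# Made-up example address (example.org is reserved for documentation).
info = describe_url("http://www.example.org/docs/guide.html")
for name, value in info.items():
    print(name, "=", value)
```

Running the sketch on a URL of your own is a quick way to check which part of an address is the host, which is the directory, and which is the filename.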

A basic understanding of the Internet, especially how files are stored and retrieved on the Internet, can come in quite handy when you’re attempting to locate information online. For instance, if you attempt to follow a “dead link” (that is, one that’s broken or no longer takes you anywhere), you can identify the second-level domain, find the home page, and attempt to search for the missing information there.
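The dead-link trick can be sketched in code as well. The Python example below trims a URL back one step at a time until only the home page is left, giving you a list of addresses to try by hand. The function names home_page and fallback_urls, and the example.org address, are hypothetical and chosen only for illustration; the sketch does not check whether any of the pages actually exist.

```python
from urllib.parse import urlsplit, urlunsplit


def home_page(url):
    """Return the site's home page: the protocol and host with an empty path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def fallback_urls(url):
    """List shorter and shorter versions of a URL, ending at the home page."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    # Drop one trailing piece of the path at a time (filename first,
    # then each directory) until only the host remains.
    while segments:
        segments.pop()
        path = "/" + "/".join(segments) + ("/" if segments else "")
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


# Made-up dead link used purely as an example.
dead_link = "http://www.example.org/docs/2005/guide.html"
print(home_page(dead_link))
for candidate in fallback_urls(dead_link):
    print(candidate)
```

Trying the parent directories before jumping straight to the home page is a design choice: the missing file has often just been renamed, and a nearby directory listing may still point to it.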

Students in any field, not just computer science, are well served by knowledge of the ’Net and its intricacies.

Copyright Kelly Garbato, 2005

