Basic Internet Knowledge - Internet 101

This article was published 7 years ago. Some information may be outdated or no longer applicable.

Before jumping into web development, it makes sense to understand what makes the internet tick. What are the foundations that drive the whole thing?

Many people start coding and writing web applications straight away. They skip the basics of what happens beyond their home internet setup and how the internet actually works.

The internet is the foundation of web pages. Having some basic knowledge about how it operates (and how web pages work in general) makes everything else easier to grasp. Without that understanding, building anything on the web gets unnecessarily complicated.

Think of it like being a restaurant manager who can’t cook or judge whether a dish tastes right. Technically possible, but a mess, because you can’t guide the kitchen staff if you don’t understand the kitchen.

Right, enough food analogies. Let’s talk tech. This post isn’t about the history of the internet. It’s about how the thing actually functions.

Webservers

Webservers form a fundamental piece of the internet. Through HTTP and HTTP methods, you can reach web pages served by these servers. They don’t need to be physical (bare-metal) machines; they can be virtualised too. But virtualisation and the inner workings of webservers are outside the scope here.

A webserver is a piece of software running on a server. Its basic function: accept HTTP requests from clients and fire back a response. (This typically happens when the browser sends an HTTP GET request.)
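To make the request/response cycle concrete, here's a minimal Python sketch of what a web server does with a raw HTTP GET request. It works on plain strings rather than real network connections. The PAGES dictionary and the handle_request function are hypothetical names for illustration, not part of any real server.

```python
# Hypothetical in-memory "site": URL path -> HTML body
PAGES = {
    "/": "<h1>Home</h1>",
    "/about": "<h1>About us</h1>",
}


def handle_request(raw_request: str) -> str:
    """Turn a raw HTTP/1.1 request into a raw HTTP/1.1 response (toy version)."""
    # The first line of a request looks like: "GET /about HTTP/1.1"
    request_line = raw_request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")

    if method != "GET":
        status, body = "405 Method Not Allowed", "Only GET is supported"
    elif path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        status, body = "404 Not Found", "Page not found"

    # A response is a status line, headers, a blank line, then the body
    return (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "\r\n"
        f"{body}"
    )


response = handle_request("GET /about HTTP/1.1\r\nHost: example.com\r\n\r\n")
print(response.split("\r\n")[0])  # HTTP/1.1 200 OK
```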

There are several types of web server out there. Some well-known ones: Apache (often used with PHP as part of the LAMP stack), IIS from Microsoft, and Nginx, which is frequently placed in front of modern Node.js applications as a reverse proxy. Speaking of Node.js, it ships with its own built-in HTTP server module, and frameworks such as Express build on top of it.

The physical distance between the webserver and the requesting client adds to the time a user waits for a website to load. A CDN (Content Delivery Network, sometimes called a Content Distribution Network) cuts this round-trip time with edge servers. These are machines spread across many locations that cache the requested data close to the client. Because the data has less distance and fewer networks to cross, pages load faster.

A few years ago a common practice emerged: static assets such as CSS and JavaScript libraries were served from public CDNs for faster access and caching. Many such CDNs still exist today, for example cdnjs.com.

Back to the food analogies (sorry): think of the edge server like ordering pizza from a big chain. Your order goes to the main restaurant first, gets routed to a branch near you, and they deliver as fast as possible.

The server that holds the original content (for example, an Apache server) is called the “origin server.” A CDN server that delivers cached copies of that content is an “edge server.”

Fun fact: a newer development approach called the “JAMstack” (JavaScript, APIs and Markup) is often described as “serverless.” The pre-built website is served entirely from a CDN rather than from an origin web server you manage yourself, and any dynamic features come from APIs. It’s a rather exciting movement that’s been sweeping through the developer community.

An edge server stores a web page’s static assets (CSS, JavaScript) and, as long as it holds a cached copy, can send them to the client without contacting the origin server at all.
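The edge-server behaviour can be sketched as a small cache that answers from memory and only goes back to the origin on a miss or after an entry expires. This is a toy model under stated assumptions. EdgeCache, ORIGIN_FILES and the fixed ttl_seconds value are hypothetical. Real CDNs decide what to cache, and for how long, from HTTP caching headers and their own configuration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge server: serve cached assets, fall back to the origin on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float = 60.0):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin lookup
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, bytes]] = {}  # path -> (expiry time, content)
        self.origin_hits = 0  # how many times we had to contact the origin

    def get(self, path: str) -> bytes:
        now = time.monotonic()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]  # cache hit: the origin is not contacted
        # Cache miss or expired entry: go back to the origin server
        self.origin_hits += 1
        content = self.fetch_from_origin(path)
        self._store[path] = (now + self.ttl_seconds, content)
        return content


# Hypothetical origin server holding the original static assets
ORIGIN_FILES = {"/styles.css": b"body { margin: 0 }", "/app.js": b"console.log('hi')"}
edge = EdgeCache(lambda path: ORIGIN_FILES[path], ttl_seconds=300)

edge.get("/styles.css")  # first request: fetched from the origin
edge.get("/styles.css")  # second request: served from the edge cache
print(edge.origin_hits)  # 1
```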

IP addresses

An IP (Internet Protocol) address identifies a device connected to the internet. At a glance, IP addresses look a bit like phone numbers: “18.130.18.214.”

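IPv4 addresses are four numbers (each between 0 and 255) separated by dots. They’re hierarchical: reading from left to right, you go from the broadest level to a more specific one.

Loosely speaking, the leftmost part identifies the network (for example, a block of addresses assigned to a service provider) and the rightmost part identifies an individual device on that network. Exactly where that split falls depends on the network’s prefix length. Devices use these addresses to identify each other, so they can communicate and send/receive data.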
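Python’s standard ipaddress module can show the network/device split. The /16 prefix used here is an assumption for illustration. We don’t know how this particular address block is actually allocated.

```python
import ipaddress

address = ipaddress.ip_address("18.130.18.214")
print(list(address.packed))  # [18, 130, 18, 214]: four numbers, each 0 to 255

# Hypothetical: assume the first 16 bits (18.130) identify the network
network = ipaddress.ip_network("18.130.0.0/16")

print(address in network)     # True: the address belongs to this network
print(network.num_addresses)  # 65536 addresses share this prefix

# The remaining 16 bits identify the device: 18 * 256 + 214 = 4822
host_part = int(address) & int(network.hostmask)
print(host_part)  # 4822
```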

There are two types: static and dynamic. Dynamic IP addresses are temporary. They’re typically assigned automatically (usually via DHCP) when a device connects to the network, and they can change over time. Static IP addresses, well, stay put.

The IP addresses we’ve been talking about are all IPv4. Almost all IPv4 addresses have already been allocated, meaning we’re running out. (Back in the ’70s, people thought 2^32 addresses would be enough.)

IPv4 uses 32-bit numbers. With 32 bits, the maximum is 2^32 = 4,294,967,296 addresses, which isn’t nearly enough. IPv6 solves this by using 128 bits (e.g. fe80:0000:0000:0000:0202:b3ff:fe1e:8329). That gives us 2^128 addresses, roughly 340 undecillion (3.4 × 10^38). With such a long range, hexadecimal is the obvious choice for writing addresses instead of decimal. Beyond the extended address space, IPv6 also supports automatic address configuration and was designed with IPsec, which can authenticate and encrypt data traffic.
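The numbers above are easy to check in Python. The ipaddress module also shows how the example IPv6 address is normally abbreviated: leading zeros can be dropped, and one run of all-zero groups can be written as “::”.

```python
import ipaddress

ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(f"{ipv4_total:,}")    # 4,294,967,296
print(f"{ipv6_total:.2e}")  # 3.40e+38, i.e. roughly 340 undecillion

# Eight groups of four hex digits; the zero groups collapse to "::"
addr = ipaddress.ip_address("fe80:0000:0000:0000:0202:b3ff:fe1e:8329")
print(addr.compressed)  # fe80::202:b3ff:fe1e:8329
print(addr.exploded)    # fe80:0000:0000:0000:0202:b3ff:fe1e:8329
```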

localhost, 127.0.0.1, ::1

The IP address 127.0.0.1 (or “::1” for IPv6) points back to your own computer, and the hostname “localhost” normally resolves to it. This works the same way on Linux, Windows, and Mac. In fact, the whole 127.x.x.x range is reserved for loopback. That means you can run a network service on your computer and talk to it without needing a physical network interface.
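A quick way to confirm which addresses count as loopback is the is_loopback property from Python’s ipaddress module. The sample addresses below are just examples.

```python
import ipaddress

candidates = ["127.0.0.1", "127.42.0.7", "::1", "18.130.18.214"]
loopback = {text: ipaddress.ip_address(text).is_loopback for text in candidates}
print(loopback)
# {'127.0.0.1': True, '127.42.0.7': True, '::1': True, '18.130.18.214': False}

# The entire 127.0.0.0/8 block is reserved for loopback
print(ipaddress.ip_network("127.0.0.0/8").num_addresses)  # 16777216
```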

Opening this address in your browser won’t take you to a website on the internet. It takes you to your own machine, and shows something only if you’re running a webserver locally.

Why would you want this “loopback thing”? Several reasons. The most important one for web developers: testing code. With the loopback address, your computer becomes a webserver where you can run your code and see it in action, effectively serving your site locally.
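Here’s a minimal sketch of serving a page on the loopback address with Python’s built-in http.server. It then requests the page the way a browser would. HelloHandler is a hypothetical name. Port 0 asks the operating system for any free port, which keeps the example self-contained.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a small HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from localhost</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


# Bind to the loopback address only, so nothing outside this machine can connect
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Act as the client, just like a browser visiting http://127.0.0.1:<port>/
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request("GET", "/")
response = conn.getresponse()
status, body = response.status, response.read()
conn.close()

server.shutdown()
server.server_close()
print(status, body.decode())  # 200 <h1>Hello from localhost</h1>
```

For everyday local testing, running python -m http.server 8000 --bind 127.0.0.1 in a folder serves its files at http://localhost:8000.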

HTTP

If you’ve visited a website, you’ve likely typed “http://” at some point. HTTP (Hypertext Transfer Protocol) is the fundamental communication protocol of the World Wide Web. It follows the basic client/server model.

TCP (Transmission Control Protocol) matters here too. It defines how information is split into packets, delivered, and put back together in the right order. HTTP doesn’t control the packaging or sending. Put simply: HTTP defines what is requested and returned, and TCP takes care of how the bytes get there.
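To see the split between the two protocols, this sketch opens a plain TCP connection with Python’s socket module and sends an HTTP request as ordinary text. The tiny server is a hypothetical stand-in that answers exactly one request on the loopback address. Reading the request with a single recv call is a simplification.

```python
import socket
import threading

# A tiny TCP server on the loopback address that answers one HTTP request
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]


def serve_once():
    conn, _addr = listener.accept()
    with conn:
        conn.recv(4096)  # read the request bytes (kept simple: one recv call)
        body = b"hello over TCP"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n" + body
        )


threading.Thread(target=serve_once, daemon=True).start()

# The client: TCP provides the connection, HTTP is just the text sent over it
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
    chunks = []
    while True:
        chunk = client.recv(4096)
        if not chunk:  # server closed the connection: response complete
            break
        chunks.append(chunk)

listener.close()
raw_response = b"".join(chunks)
print(raw_response.split(b"\r\n")[0])  # b'HTTP/1.1 200 OK'
```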

HTTP has status codes. Some are familiar to everyday users, like “404 Not Found.” They fall into five groups:

  • 1xx are informational messages, such as “101 Switching Protocols.”
  • 2xx are success messages, such as “200 OK.”
  • 3xx are redirections, such as “301 Moved Permanently” (or “302 Found” for a temporary redirect).
  • 4xx are client-side errors, such as “403 Forbidden.”
  • 5xx are server-side errors, such as “503 Service Unavailable.”
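The grouping is simply the first digit of the code. Python’s http.HTTPStatus enum holds the standard reason phrases. The CLASSES mapping and the describe helper below are hypothetical names added for illustration.

```python
from http import HTTPStatus

# Hypothetical labels for the five groups, keyed by the first digit
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' for a status code."""
    status = HTTPStatus(code)  # raises ValueError for codes it doesn't know
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (101, 200, 301, 403, 404, 503):
    print(describe(code))
```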

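Headers are another key feature of HTTP. They carry small pieces of extra information along with requests and responses. For example, a client can use the Accept request header to say which format it would like the data in (e.g. text/html or application/json). The server then uses the Content-Type response header to say which format it actually sent.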
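Picking a format from the Accept header is called content negotiation. This is a simplified sketch of how a server might do it. The negotiate function is hypothetical. It handles quality values (q=) and */* but not partial wildcards such as text/*. Real servers and frameworks implement the full rules.

```python
from typing import List, Optional


def negotiate(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick the media type to send, based on the client's Accept header."""
    preferences = []
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, quality = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        preferences.append((quality, media_type))

    # Highest quality first; the sort is stable, so ties keep the header's order
    preferences.sort(key=lambda pref: pref[0], reverse=True)
    for quality, media_type in preferences:
        if quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return available[0] if available else None
        if media_type in available:
            return media_type
    return None  # a real server might answer 406 Not Acceptable here


server_formats = ["text/html", "application/json"]
print(negotiate("application/json", server_formats))             # application/json
print(negotiate("text/plain, text/html;q=0.8", server_formats))  # text/html
print(negotiate("image/png", server_formats))                    # None
```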

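Because headers are involved in both requests and responses, they can also negotiate a switch to another protocol. The client sends an “Upgrade” header (for example, to open a WebSocket connection), and if the server agrees, it replies with “101 Switching Protocols.”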
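WebSocket is one common use of Upgrade. The server proves it understood the request by hashing the client’s Sec-WebSocket-Key with a fixed GUID from the WebSocket specification (RFC 6455). This sketch builds that 101 reply. The function name is hypothetical, and treating header names as case-sensitive dictionary keys is a simplification, since real HTTP header names are case-insensitive.

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket specification (RFC 6455)
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def upgrade_response(request_headers: dict) -> str:
    """Build a 101 Switching Protocols reply to a WebSocket upgrade request."""
    # Simplification: exact-case header lookup; real header names are case-insensitive
    if request_headers.get("Upgrade", "").lower() != "websocket":
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    key = request_headers["Sec-WebSocket-Key"]
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    accept = base64.b64encode(digest).decode("ascii")
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n"
        "\r\n"
    )


# Sample key taken from the example in RFC 6455
reply = upgrade_response({
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
})
print(reply)
```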

HTTP versions

A quick note on versions: at the time of writing there were four versions of HTTP (0.9, 1.0, 1.1 and 2), and HTTP/3 has since been standardised as well. HTTP/1.1 was predominant, though many services already ran HTTP/2. The differences between 1.1 and 2 are outside the scope here, but HTTP/2 brings real benefits to end users, particularly around speed and performance.

Anatomy of a URL

You need a browser to view websites. When you type an address into it, you’re typing the URL (Uniform Resource Locator).

A URL is built from different pieces, each with a dedicated purpose. Let’s break down a simple one: https://courses.fullstacktraining.com/courses/introduction-to-typescript
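Python’s urllib.parse module splits a URL into these pieces. Here it is applied to the example URL above.

```python
from urllib.parse import urlsplit

url = "https://courses.fullstacktraining.com/courses/introduction-to-typescript"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # courses.fullstacktraining.com
print(parts.path)      # /courses/introduction-to-typescript
print(repr(parts.query), repr(parts.fragment))  # '' '' (this URL has neither)
```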

The first part, https://, is the scheme. The “http” part names the protocol used between client and server. The “s” stands for “secure”: HTTPS is HTTP sent over an encrypted TLS connection, so the data can’t easily be read or tampered with in transit.

Other schemes exist too: ftp:// for transferring files between server and client, or mailto: (written without slashes), which opens the user’s default email programme.

Next comes the subdomain (“courses” in the example above), referring to a specific section of the website.

Subdomains can carve a website into logical components. They’re enabled by DNS entries set via the domain provider. An example entry might look like: courses IN A 192.168.2.10.

The most famous subdomain is “www.” Back in the day, the internet was used for various things (telnet, SMTP), each with a dedicated subdomain. In the ’90s, when organisations started putting websites online, they denoted them with the www subdomain.

Next comes the second-level domain (“fullstacktraining” in the example). Despite the name “second-level,” it’s the heart of the URL: the actual name of the website. Because each registered domain name is unique, it leads you to exactly the site you’re after.

The top-level domain (TLD) is the last part of the domain name (“.com” in the example). It broadly indicates the class of the website: .com for commercial sites, .gov for government, .edu for universities. Countries have their own domains too: .es for Spain, .uk for the United Kingdom (where commercial sites typically use .co.uk).
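Splitting a hostname into subdomain, second-level domain and TLD looks easy, but suffixes like .co.uk mean you can’t just take the last label. This sketch uses a small, hypothetical, hand-picked suffix list for illustration only. Real code would rely on the full Public Suffix List.

```python
from typing import Tuple

# Hypothetical, hand-picked suffixes for this example only
KNOWN_SUFFIXES = {"com", "gov", "edu", "es", "uk", "co.uk"}


def split_hostname(hostname: str) -> Tuple[str, str, str]:
    """Split a hostname into (subdomain, second-level domain, suffix)."""
    labels = hostname.lower().split(".")
    # Try the longest possible suffix first, so "co.uk" wins over "uk"
    for size in range(len(labels) - 1, 0, -1):
        suffix = ".".join(labels[-size:])
        if suffix in KNOWN_SUFFIXES:
            second_level = labels[-size - 1]
            subdomain = ".".join(labels[:-size - 1])
            return subdomain, second_level, suffix
    raise ValueError(f"unknown suffix in {hostname!r}")


print(split_hostname("courses.fullstacktraining.com"))  # ('courses', 'fullstacktraining', 'com')
print(split_hostname("www.example.co.uk"))              # ('www', 'example', 'co.uk')
```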

The next section is the path (“/courses/introduction-to-typescript”). This could correspond to a physical directory or be a virtual mapping. Web servers like Apache or Nginx can translate such paths with rewrite rules. For example, “/courses/hello” could point to “/courses/hello.php.” These clean paths also help with Search Engine Optimisation (SEO).

Note that when a request points at a directory, a web server by default looks for a so-called “index” file and serves that. Effectively, https://fullstacktraining.com typically serves https://fullstacktraining.com/index.html (the exact file name depends on the server’s configuration). There’s no need for us to type it, since web servers do this for us automatically.
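Both behaviours, rewriting clean paths and falling back to an index file, can be sketched as a lookup function. The REWRITES table and resolve function are hypothetical. Real servers express the same ideas in configuration, for example Apache’s mod_rewrite and DirectoryIndex, or Nginx’s rewrite and index directives.

```python
from urllib.parse import urlsplit

# Hypothetical rewrite rules: clean URL path -> file on disk
REWRITES = {"/courses/hello": "/courses/hello.php"}
INDEX_FILE = "index.html"  # a common default; servers let you configure it


def resolve(url: str) -> str:
    """Map a requested URL to the file a server might serve."""
    path = urlsplit(url).path or "/"  # "https://fullstacktraining.com" has an empty path
    if path in REWRITES:
        return REWRITES[path]  # virtual mapping
    if path.endswith("/"):
        return path + INDEX_FILE  # directory request: serve its index file
    return path  # physical file, served as-is


print(resolve("https://fullstacktraining.com"))                # /index.html
print(resolve("https://fullstacktraining.com/courses/hello"))  # /courses/hello.php
```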

There are other notable URL sections, like query parameters: key/value pairs that pass extra information to the server, for example about what the user clicked on.

Parameters follow a “?” in the URL, with each key and value joined by “=” (multiple parameters are separated by “&”). For example, to send a user directly to a given course: https://fullstacktraining.com?productid=1234 (assuming the course has ID 1234). With this, the host can track clicks, manage ads, identify where visitors come from, and load the right data from a database. But “productid=1234” isn’t SEO-friendly, which is why “/product/name-of-product-1234” is a much better URL strategy.
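Reading and building query strings is a job for urllib.parse. The utm_source parameter and the q search value below are hypothetical additions to the article’s example. The last lines assume, for illustration, that the product ID is always the final dash-separated part of the SEO-friendly path.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://fullstacktraining.com?productid=1234&utm_source=newsletter"
query = parse_qs(urlsplit(url).query)
print(query)  # {'productid': ['1234'], 'utm_source': ['newsletter']}

# Building a query string: spaces and special characters get encoded
print(urlencode({"productid": 1234, "q": "type script"}))  # productid=1234&q=type+script

# SEO-friendly alternative: the ID lives in the path instead of a parameter
slug_path = "/product/name-of-product-1234"
product_id = slug_path.rsplit("-", 1)[-1]  # assumes the ID is the last dash-separated part
print(product_id)  # 1234
```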

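URLs sometimes contain a # symbol too. The string after the # (the “fragment”) refers to a specific part of the page. The site loads in your browser, and the fragment jumps you to the matching anchor. Think of it like a bookmark within a page. The fragment is handled entirely by the browser and isn’t sent to the server.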

Note that some modern frontend JavaScript frameworks (such as Angular) can also use the # for routing in SPAs (Single Page Applications). Because changing the fragment doesn’t trigger a new page request, the app can switch views without a full reload.
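The fragment is easy to separate from the rest of the URL. Here, “#pricing” and the SPA-style “#/courses/42” are hypothetical examples. urldefrag shows the part of the URL a browser would actually request from the server.

```python
from urllib.parse import urldefrag, urlsplit

url = "https://fullstacktraining.com/courses#pricing"
print(urlsplit(url).fragment)  # pricing

# The fragment stays in the browser: strip it to see what is actually requested
requested, fragment = urldefrag(url)
print(requested)  # https://fullstacktraining.com/courses

# Hash-based SPA routing keeps the route inside the fragment
spa_url = "https://example.com/app/#/courses/42"
print(urlsplit(spa_url).fragment)  # /courses/42
```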

Cookies

Cookies are small pieces of data that a website asks your browser to store, such as your location or preferred language. The browser sends them back with later requests to that site, which allows web pages to show personalised information to the user.
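The round trip looks like this with Python’s http.cookies module. The server sends a Set-Cookie header, and the browser later returns stored cookies in a Cookie header. The lang and theme cookies are hypothetical examples.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to store a language preference
outgoing = SimpleCookie()
outgoing["lang"] = "en"
outgoing["lang"]["path"] = "/"  # send it back for every path on the site
print(outgoing.output())  # Set-Cookie: lang=en; Path=/

# A later request: the browser sends the stored cookies in a Cookie header
incoming = SimpleCookie("lang=en; theme=dark")
print(incoming["lang"].value, incoming["theme"].value)  # en dark
```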

Cookies can store names, email addresses, phone numbers, and other browsing information. This raises obvious privacy and security concerns. That said, a cookie only contains what the website itself put into it, and cookies don’t have access to other information on your computer.

There are different types of cookies:

  • Session cookies: These have no expiry date and are deleted when the browser session ends. They typically hold a session identifier, which makes them essential for applying authentication and authorisation to a site.
  • Persistent cookies: These carry an expiry date, so they survive browser restarts. They handle things like keeping you logged in or remembering your preferences between visits. Despite the name, you can still remove them, but their lifetime is much longer than that of session cookies.
  • Third-party cookies: These are set by a domain other than the one you’re visiting, typically an advertising or analytics service embedded in the page. Classic example: you search for hotels, and suddenly adverts for hotels in that region start appearing everywhere. There’s more going on behind the scenes, but it all boils down to cookies. Third-party cookies help with advertising and analytics. With the introduction of GDPR in the European Union, end users have gained much better control over, and transparency about, cookie consent.
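What separates a session cookie from a persistent one is whether the server sets an expiry (Expires or Max-Age) on it. This sketch shows both using http.cookies. The session_id and remember_me names and the 30-day lifetime are hypothetical. HttpOnly and Secure are real cookie attributes, commonly added to session cookies.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# Session cookie: no Expires/Max-Age, so the browser drops it when the session ends
cookies["session_id"] = "abc123"
cookies["session_id"]["httponly"] = True  # hidden from the page's JavaScript
cookies["session_id"]["secure"] = True    # only sent over HTTPS

# Persistent cookie: Max-Age (in seconds) keeps it across browser restarts
cookies["remember_me"] = "yes"
cookies["remember_me"]["max-age"] = 60 * 60 * 24 * 30  # 30 days

session_header = cookies["session_id"].output()
persistent_header = cookies["remember_me"].output()
print(session_header)     # Set-Cookie: session_id=abc123; HttpOnly; Secure
print(persistent_header)  # Set-Cookie: remember_me=yes; Max-Age=2592000
```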

Conclusion

When you enter https://fullstacktraining.com in your browser, you send a request to a web server to get data about the webpage. First, the browser turns the domain name into an IP address (using DNS) to find the server where the data is stored.
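The name-to-address lookup can be tried with socket.getaddrinfo, which asks the operating system’s resolver. “localhost” is used so the example works offline. A public name such as fullstacktraining.com would be looked up through DNS instead, and its result would depend on your network.

```python
import socket

# Ask the operating system's resolver which addresses sit behind a hostname
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)

# Each result's sockaddr starts with the IP address; de-duplicate and sort them
addresses = sorted({info[4][0] for info in infos})
print(addresses)  # the exact list depends on your system's configuration
```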

Your request travels from your computer through your router to your ISP (not all computers connect directly to the internet). The ISP forwards it across the internet towards the IP address of the server holding the web page data (or, in some cases, an edge server).

The server receives and processes the HTTP GET request. It then breaks the response into many small packets, which may travel along different routes back to your computer. There, TCP puts them back in the right order and the browser displays the page on screen.
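The packet idea can be modelled in a few lines. Data is cut into numbered chunks that may arrive in any order, and the receiver sorts them back by sequence number. This is a toy model with hypothetical function names. Real TCP numbers bytes rather than packets, and it also handles acknowledgements and retransmission, none of which is modelled here.

```python
import random
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload)


def split_into_packets(data: bytes, size: int) -> List[Packet]:
    """Cut data into numbered chunks, loosely like TCP segments."""
    return [(seq, data[offset:offset + size])
            for seq, offset in enumerate(range(0, len(data), size))]


def reassemble(packets: List[Packet]) -> bytes:
    """Put packets back in order by sequence number, whatever order they arrived in."""
    return b"".join(payload for _seq, payload in sorted(packets))


page = b"<html><body><h1>Hello, internet!</h1></body></html>"
packets = split_into_packets(page, size=8)
random.shuffle(packets)  # simulate packets taking different routes and arriving out of order
print(reassemble(packets) == page)  # True
```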

Sending an email follows a longer, more convoluted path. Your computer connects via your ISP to your email provider’s server (e.g. Gmail). Gmail then looks up the recipient’s email provider’s server (e.g. Microsoft) using DNS MX records and delivers the message there. The recipient connects to their provider’s server to fetch the emails.

These packets travel through the internet with the help of routers, which make sure they don’t end up at the wrong computers. The sender’s network stack wraps each packet in layers of headers, including one with the destination IP address. Every router along the way reads that address to decide where to send the packet next.
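A router’s “where next?” decision follows the longest-prefix-match rule. Of all the networks in its routing table that contain the destination address, it picks the most specific one. The routing table below is hypothetical, and real routers build theirs with routing protocols rather than a hard-coded dictionary.

```python
import ipaddress
from typing import Dict

# Hypothetical routing table for one router: destination network -> next hop
ROUTES: Dict[str, str] = {
    "0.0.0.0/0": "upstream ISP",  # default route: anything not matched below
    "18.130.0.0/16": "router A",
    "18.130.18.0/24": "router B",
    "192.168.1.0/24": "local network",
}
TABLE = [(ipaddress.ip_network(net), hop) for net, hop in ROUTES.items()]


def next_hop(destination: str) -> str:
    """Pick the most specific (longest-prefix) route that contains the destination."""
    address = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in TABLE if address in net]
    _net, hop = max(candidates, key=lambda pair: pair[0].prefixlen)
    return hop


print(next_hop("18.130.18.214"))  # router B (the /24 beats the /16 and the default)
print(next_hop("8.8.8.8"))        # upstream ISP (only the default route matches)
```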