Data Flow over the Web


Every time you click a link, like a photo, comment on a post, or even binge-watch your favorite series, the data for these actions has to reach your computer. Thanks to the rapid advancement of technology, all of these activities happen in the blink of an eye—so fast that we rarely pause to think about it.

In this blog, we’ll take a closer look at what happens during that blink of an eye, the journey of this invisible information, and how various rules (known as "protocols") govern the flow of data across the web, among other fascinating details.

Internet: The Backbone

Before anything else, it’s crucial to understand what the internet really is.

The internet is simply a network of computers connected to each other through networking (cables, wires, or wireless connections). You can think of it like a country made up of various states { computers }, with the states connected by roadways { networking }.

There are several components that make the internet work and allow you to access it through your computer. Let’s understand them before moving forward.

A good question to raise at this point might be: Are the components connecting computers visible, like the roadways that we can see?

Yes and no at the same time.

The Physical Infrastructure of the Internet

At the core, at the hardware level, there are devices such as routers that make internet access possible. These routers are connected via cables that eventually lead back to an office.

These offices are the ISPs' (Internet Service Providers') centers. Now, what is an ISP?

An ISP, as the name suggests, is the company that provides you with internet access. Since the internet is just a connection between computers, you might wonder: how do ISPs provide internet access? Can't we just connect to other computers ourselves?

Well, as we’ve established, the internet isn’t just a connection between two computers. There are billions of computers interconnected to form the internet, and it would be impossible for us to connect to all of them ourselves. Hence, ISPs came into existence.

ISPs also use physical cables to establish and maintain these connections. Interestingly, hundreds of undersea cables form the backbone of the global internet infrastructure. ISPs use these cables because they offer high-speed, reliable communication, much like having a straight highway with no traffic in your lane.

Underwater cables

So, the flow works like this: We connect to a router (via Ethernet cable or wirelessly), the router connects to the ISP, and the ISP connects to the global internet.

Protocols

Since computers are now connected to each other across the globe via the internet, we should be able to transfer information the way we transport goods across states, right?

Absolutely. But can we transport anything across states? Not really, right? There are checkpoints at the borders that ensure only certain things are transported, based on predefined rules established by the government.

For the same reason, protocols exist on the internet. Protocols in computer science are simply the rules that must be followed when transferring information. But who defines these protocols?

These rules are defined by organizations that oversee the web. Some of them include the Internet Engineering Task Force (IETF), the World Wide Web Consortium (W3C), and the Internet Corporation for Assigned Names and Numbers (ICANN). These organizations work together to establish standards, protocols, and policies that ensure the smooth operation and security of the internet.

IP and MAC Addresses

Similar to how states have unique zip and postal codes, each machine on the internet has a unique numeric identity called an IP address. While an IP address might change depending on the network or location, the MAC address is tied to the device's network hardware and stays the same. Every device on a network has both an IP address and a MAC address. These addresses are used at different layers of the data transfer process, as defined by the OSI model.
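To see both kinds of addresses on your own machine, here is a minimal sketch, assuming Node.js; it relies only on the built-in os module and simply prints each network interface's IP and MAC address.

```typescript
import os from "node:os";

// List every network interface on this machine with its IP and MAC address.
for (const [name, addresses] of Object.entries(os.networkInterfaces())) {
  for (const addr of addresses ?? []) {
    console.log(`${name}: IP ${addr.address} (${addr.family}), MAC ${addr.mac}`);
  }
}
```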


With all this in mind, let’s dive deeper into the world of web protocols.

HTTP

The protocol defined for transferring information over the web is HTTP — Hypertext Transfer Protocol.

In the early days, the web contained only text. As the amount of text grew, people started pointing from one piece of text to another, creating hyperlinks and essentially creating a split in the "roadway."

This concept led to the creation of Hypertext Transfer Protocol (HTTP).

The first widely used version of HTTP was HTTP/1.0, and HTTP/1.1 came later with several improvements and fixes. HTTP/1.1 standardized information exchange over the web and refined the request-response architecture, which we’ll explore in a moment.

However, as the web evolved to include images, videos, and other types of content, it became crucial for HTTP to evolve as well. This is where HTTP/2 came in, handling this much heavier web with significant features like multiplexing.

Today, much of the web operates on HTTP/2, while HTTP/1.1 is still used as a fallback protocol. If HTTP/2 is not supported or doesn’t work, HTTP/1.1 will be used instead.

Multiplexing: the ability to send multiple requests and responses over a single connection at the same time.

Recently, HTTP/3 was also introduced, built on a new transport protocol called QUIC (Quick UDP Internet Connections), which aims to improve speed and reliability, especially in situations with high latency.
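If you're curious which version a connection actually uses, here is a small sketch, assuming Node.js: its built-in http module speaks HTTP/1.1, and the response object reports the version the server answered with (example.com is only a placeholder host).

```typescript
import http from "node:http";

// Node's built-in http module speaks HTTP/1.1; the response object
// tells us which protocol version the server actually answered with.
http.get("http://example.com/", (res) => {
  console.log(`Server responded over HTTP/${res.httpVersion}`); // e.g. "1.1"
  res.resume(); // discard the body; we only care about the version here
});
```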

Request-Response

As the name suggests, the request-response cycle refers to the process in which one entity requests something, and another responds with a specific answer.

So, what could a computer possibly request? Information, right? So, one computer on the internet requests information, and another computer responds with it.

Now that we understand HTTP, why it exists, and the request-response cycle, let’s explore some key terms related to these cycles.

Client

A client is a computer on the internet that requests information from another computer and processes the received information. In the context of the web, web browsers are typically the clients.
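In code, the role of a client can be played by any program, not just a browser. Below is a minimal sketch of a client, assuming the global fetch API available in modern browsers and in Node 18+ (run as an ES module); the URL is just a placeholder.

```typescript
// A client: request information from another computer and process the response.
const response = await fetch("https://example.com/");
const html = await response.text();

console.log(`Got ${html.length} characters back from the server`);
```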

Server

Now that we know what a client is, there must be another computer that responds to the client’s request, completing the request-response cycle. This is where the server comes in. A server is the computer that responds to the client’s request for information.
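On the other side of the cycle, here is a minimal sketch of a server, assuming Node.js and its built-in http module; port 3000 and the reply text are arbitrary choices for the example.

```typescript
import http from "node:http";

// A server: wait for requests and respond with information.
const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello! Here is the information you asked for.");
});

server.listen(3000, () => {
  console.log("Server is listening on port 3000");
});
```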

Headers

Headers are the basic metadata attached to that information. Think of them like the address and phone number details on a package being delivered: the content of the package can differ, but it can still be delivered to the same address. You can view the headers in the Network tab of your browser's developer tools while visiting any website.
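Headers can also be read programmatically. Here is a minimal sketch using fetch (Node 18+ or a browser; the URL is just a placeholder):

```typescript
// Headers are metadata travelling with the response, like labels on a package.
const res = await fetch("https://example.com/");

res.headers.forEach((value, name) => {
  console.log(`${name}: ${value}`); // e.g. "content-type: text/html; charset=UTF-8"
});
```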

Have you noticed the Request Method and the Status Code with the green circle? What are they? Let’s explore a bit.

Methods

A method is similar to a mode of transport. In the context of the web, methods define the purpose for which information is being requested. These methods are used on the client side to determine what kind of action the client is requesting from the server. Some common methods and their purposes are outlined below:

| Method | Purpose |
| --- | --- |
| GET | Requests data from a specified resource, commonly used to retrieve information without modifying it. |
| POST | Sends data to the server to create or update a resource. Often used when submitting forms or uploading files. |
| PUT | Replaces a current resource with a new one. |
| DELETE | Removes a specified resource from the server. |
| PATCH | Partially updates a resource. |
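To see a method in action, here is a minimal sketch of a POST request using fetch; the URL and the JSON body are placeholders, not a real API.

```typescript
// POST: send data to the server to create a resource (here, a hypothetical comment).
const res = await fetch("https://example.com/api/comments", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: "Great article!" }),
});

console.log(res.status); // e.g. 201 if the server created the comment
```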

Response Code

When a server responds with information, there are certain codes that indicate the status of that information. These are known as response codes. These codes help the client understand whether the request was successful, if there was an error, or if further action is needed. Some common response codes include:

| Response Code | Description |
| --- | --- |
| 200 OK | The request was successful, and the server has returned the requested data. |
| 201 Created | The request was successful, and a new resource has been created (commonly used with POST requests). |
| 400 Bad Request | The server could not understand the request due to invalid syntax. |
| 401 Unauthorized | The request requires user authentication. |
| 404 Not Found | The requested resource could not be found on the server. |
| 500 Internal Server Error | The server encountered an error while processing the request. |
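On the client side, these codes are what you typically branch on. A minimal sketch with fetch (the URL is a placeholder):

```typescript
// Branch on the response code to decide what to do next.
const res = await fetch("https://example.com/api/posts/42");

if (res.ok) {
  // Any 2xx code: the request succeeded.
  console.log(await res.json());
} else if (res.status === 404) {
  console.log("That post does not exist on the server.");
} else {
  console.log(`Request failed with status ${res.status}`);
}
```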

Ports

Going back to the analogy, each state can have multiple cities, and each city serves a specific purpose. Similarly, a machine has different ports that other machines can connect to. There are 65,535 usable ports on a machine, and just like specific cities are known for specific things, specific ports are conventionally used for specific tasks (for example, port 80 for HTTP and port 443 for HTTPS).
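As a small sketch of the idea that one machine can offer different services on different ports, the example below starts two tiny servers with Node's built-in http module; ports 3000 and 4000 are arbitrary choices for illustration.

```typescript
import http from "node:http";

// One machine, two ports, two different jobs, like two cities in the same state.
http
  .createServer((req, res) => res.end("I serve the website"))
  .listen(3000);

http
  .createServer((req, res) => res.end("I serve the API"))
  .listen(4000);
```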

HTTPS { HTTP + TLS }

Many of you might have noticed a lock symbol when you visit websites. It indicates that the connection to the site is encrypted using TLS, backed by a TLS certificate. HTTPS is the same HTTP protocol with an added layer of security. This ensures that any data exchanged between your browser and the website is encrypted and safe from middlemen. It's like putting a lock on the goods while transporting them, where the key is held only by you and the person you are delivering to. HTTPS is essential for protecting sensitive information such as login credentials, payment details, and personal data.
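As a hedged sketch of what the lock represents, the example below makes an HTTPS request with Node's built-in https module and peeks at the TLS certificate securing the connection; example.com is only a stand-in host.

```typescript
import https from "node:https";
import type { TLSSocket } from "node:tls";

// Make an HTTPS request and inspect the TLS certificate that secures it.
https.get("https://example.com/", (res) => {
  const socket = res.socket as TLSSocket;
  const cert = socket.getPeerCertificate();

  console.log("Connection encrypted with TLS");
  console.log("Certificate issued to:", cert.subject?.CN);
  console.log("Valid until:", cert.valid_to);

  res.resume(); // discard the body; we only wanted to look at the certificate
});
```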

Conclusion

To sum it all up: the connection of computers, known as the internet, is responsible for the flow of data. This flow of data is governed by protocols. The web uses the HTTP protocol for information transfer. HTTP uses the request-response cycle, where a client requests information and a server responds based on the availability of that information.

This is the architecture that powers the entire web.


If you found the article helpful, consider giving it a like and sharing it!