A quick note: In the time since I first wrote this piece, some of this material has been covered by other authors. Repetition is key to absorbing information, so here it is again.

A Brief Primer on IP Networking

IP (Internet Protocol) networking is, safe to say, the most prevalent form of data networking on Earth. It’s what makes the Internet work¹. It was certainly not the first form of data networking, and it’s not the only form today, but it has spread into pretty much every corner. My goal today is to provide an overview of how IP networking works, because I believe it will help with understanding some of the topics that will come up when I actually get around to talking about email.

Stacked

Any data networking scheme, Internet or otherwise, is made up of a number of elements that work together to transfer data from hither to yon. The combination of elements is usually described as a “stack.” A given element in the stack accepts data from the element above it, performs some operation that furthers the goal of transmitting the data, and passes it off to the element below it². Elements that are closer to the user are typically described as being at the top of the stack. Elements that are closer to the physical wire are at the bottom.

A layer in the stack at one end “talks to” its counterpart layer at the other end. No layer really knows, or cares, what the other layers are doing. Layer 4 (for example) doesn’t care how layers 1-3 work, as long as they get its data to Layer 4 at the other end.
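To make the layering idea concrete, here is a toy sketch in Python. It is not how any real stack is implemented; the layer names and the bracketed “headers” are made up for illustration. It shows the usual pattern (often called encapsulation): on the way down, each layer wraps whatever it got from above in its own header, and on the way up, each layer peels off only the header its counterpart added and passes the rest along without looking at it.

```python
# Toy model of a network stack. Layer names and header format are invented
# for illustration; real headers are binary structures, not strings.
LAYERS = ["application", "transport", "network", "link"]  # top to bottom


def send_down(payload: str) -> str:
    """Pass data from the top of the stack to the bottom."""
    data = payload
    for layer in LAYERS:
        # Each layer treats everything from above as an opaque blob
        # and simply wraps it in its own header.
        data = f"[{layer}]{data}"
    return data  # what actually goes out "on the wire"


def receive_up(frame: str) -> str:
    """Pass data from the bottom of the stack back up to the top."""
    data = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        # A layer only checks the header its counterpart at the other end added.
        if not data.startswith(header):
            raise ValueError(f"{layer} layer got data it doesn't understand")
        data = data[len(header):]
    return data


if __name__ == "__main__":
    wire = send_down("Hello, yon!")
    print(wire)
    print(receive_up(wire))
```

Running it prints the “on the wire” string, `[link][network][transport][application]Hello, yon!`, followed by the original message.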

Those funny Europeans have a designed-by-committee model, the OSI model, that divides networking into seven distinct layers³. Here in ’Murica, it’s more like four-ish layers.

Protocols

The set of rules for polite and correct communication at any layer is called a protocol. These function very much like protocols in interpersonal communication: some things are appropriate to say, some are inappropriate but tolerated, and some are forbidden. IT is full to the gills with protocols. Pretty much any acronym you see that ends in “P” refers to a protocol⁴. I’ve already mentioned one: Internet Protocol. The Internet’s protocols are described in excruciating detail⁵ in documents misleadingly called “Requests For Comments” (RFCs). I won’t discuss those here.

Physical / Data Link Layers

I’m not going to talk about these bottom-most layers, except to say that the data link protocols most people have heard of are Ethernet, which typically runs over twisted pair cabling⁶, and its wireless cousin Wi-Fi, which uses radio as the “physical” medium.

Network Layer (IP)

IP sits on top of the data link layer, and its primary job is getting data from one computer to another. Computers are identified by an “IP address,” examples of which most of you surely have seen. It’s four numbers, each between 0 and 255, separated by periods, like so: 192.168.0.131⁷. This nominally allows for 2³² (2 to the 32nd power, ~4 billion) unique addresses, but because large blocks are set aside for private networks, multicast, and other special uses, the number of usable public addresses is much lower. A significant feature of IP addresses is that they are “routable.” That is, kind of like a telephone number, you can look at the first part of the address and say generally where a device with that address must be located⁸. Network devices, strangely called “routers,” move data across the Internet. Central routers have only a general idea where a device with a specific IP address might be (“over there, somewhere”), but as data moves closer to the device, the routers have more detailed information about exactly where that device is.
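Here is a small Python sketch using the standard `ipaddress` module. The first part shows that a dotted address is really one 32-bit number. The second part models a single router’s decision with a made-up routing table: it picks the most specific prefix that contains the destination (longest-prefix match), which is the “general idea vs. detailed information” behavior described above. The table entries and next-hop names are hypothetical.

```python
import ipaddress

# An IPv4 address is a single 32-bit number; the dotted form just writes
# out its four bytes (octets) in decimal.
addr = ipaddress.IPv4Address("192.168.0.131")
print(int(addr))          # 3232235651
print(list(addr.packed))  # [192, 168, 0, 131]
print(2 ** 32)            # 4294967296 possible addresses

# Hypothetical routing table for one router: each prefix maps to a next hop.
# Short prefixes are vague ("over there, somewhere"); long ones are specific.
ROUTES = {
    ipaddress.IPv4Network("0.0.0.0/0"): "upstream router",
    ipaddress.IPv4Network("192.168.0.0/16"): "campus router",
    ipaddress.IPv4Network("192.168.0.0/24"): "office network",
}


def next_hop(destination: str) -> str:
    """Longest-prefix match: use the most specific route containing the address."""
    dest = ipaddress.IPv4Address(destination)
    matches = [net for net in ROUTES if dest in net]  # 0.0.0.0/0 always matches
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(next_hop("192.168.0.131"))  # office network
print(next_hop("192.168.7.20"))   # campus router
print(next_hop("8.8.8.8"))        # upstream router
```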

Transport Layer (TCP and UDP)

The Transport Layer sits above IP, and its job is to provide the infrastructure for specific applications to communicate. In IP networks, there are primarily two protocols that operate at this layer: Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).

TCP is a “reliable, connection-oriented” protocol. It’s like a phone call: when you attempt to establish a connection, you can tell when the connection is made and when it ends, and TCP detects lost or garbled data and resends it. UDP, in contrast, is an “unreliable, datagram-oriented” protocol. It’s like US Postal Mail: you put an address on a letter and drop it in the box, and usually it gets to the destination in a predictable amount of time, but sometimes it takes longer, and occasionally it is lost altogether.
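The phone-call-versus-letter difference shows up directly in the socket API. This Python sketch, assuming a typical Linux or macOS machine talking to itself over the loopback address 127.0.0.1, sends a UDP datagram and attempts a TCP connection to a port where nothing is listening. The helper names are made up, and grabbing a free port this way is a little racy, so treat it as a demonstration rather than a test harness.

```python
import socket

HOST = "127.0.0.1"  # loopback: the machine talks to itself


def unused_tcp_port() -> int:
    """Ask the OS for a free TCP port, then close it so nobody listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, 0))  # port 0 means "pick any free port"
        return s.getsockname()[1]


def udp_send(port: int) -> int:
    """UDP is like mailing a letter: sendto() only reports that it went out."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        return udp.sendto(b"anyone there?", (HOST, port))


def tcp_connect(port: int) -> bool:
    """TCP is like a phone call: connect() tells you whether someone answered."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
        try:
            tcp.connect((HOST, port))
            return True
        except ConnectionRefusedError:
            return False


if __name__ == "__main__":
    port = unused_tcp_port()
    print("UDP sent", udp_send(port), "bytes; no way to know if they arrived")
    print("TCP connected?", tcp_connect(port))
```

The UDP send reports success, since the letter went in the box, while the TCP connect fails right away because nobody picked up.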

On any computer, there will be multiple applications that need to communicate over the network. They all have to share the same IP address, so there needs to be a way to identify which network data goes to which application. TCP and UDP both use numbered “ports” for this purpose. The combination of IP address and port number (and protocol, TCP or UDP) uniquely identifies any application running on an Internet-connected computer. Port numbers run from 1 to 65535, and each application on a computer uses a distinct port. Which port an application uses depends on the role of the application.
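A minimal Python sketch of ports doing their job: two stand-in “applications” (plain UDP sockets with hypothetical names, not real mail or web servers) share the loopback address 127.0.0.1 but each binds its own port, and a datagram sent to each (IP, port) pair lands only at the intended socket. Binding to port 0 asks the operating system to choose a free port, which keeps the example from colliding with anything already running.

```python
import socket

HOST = "127.0.0.1"  # every "application" below shares this one IP address


def make_app_socket() -> socket.socket:
    """Stand-in for an application: a UDP socket bound to a port of its own."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((HOST, 0))  # port 0: let the OS pick any free port
    s.settimeout(2.0)  # don't hang forever if a datagram goes missing
    return s


def demo():
    mail_app = make_app_socket()  # hypothetical names, not real servers
    web_app = make_app_socket()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        print("mail app is at", mail_app.getsockname())
        print("web app is at", web_app.getsockname())
        # Same destination IP, different ports: the port decides who gets what.
        sender.sendto(b"for mail", mail_app.getsockname())
        sender.sendto(b"for web", web_app.getsockname())
        got_mail, _ = mail_app.recvfrom(1024)
        got_web, _ = web_app.recvfrom(1024)
        return got_mail.decode(), got_web.decode()
    finally:
        for s in (mail_app, web_app, sender):
            s.close()


if __name__ == "__main__":
    print(demo())  # ('for mail', 'for web')
```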

An application that provides a service, something that other applications or users will come looking for, is called a “server,”⁹ like a “web server,” “mail server,” etc. Applications that consume those services, like web browsers or email readers, are called “clients.” When a client wants to establish a new connection to a server, the client needs both the IP address and the port number of the server. The IP address can be looked up based on the name of the computer, and servers almost always use “well-known” port numbers according to the application: web servers use port 80 (443 for encrypted HTTPS), mail servers use port 25¹⁰, etc. In this way, the client knows the server’s port number based on what kind of client it is. The client’s own IP address and port number travel with the connection request, so the server learns them automatically and the client’s port number is usually unimportant. Typically the client’s operating system assigns it a random-ish, temporary (“ephemeral”) port from a high range, such as 49152–65535 (the range IANA sets aside for this) or 32768–60999 (the Linux default).
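This Python sketch shows the client side of the port story, again over loopback. The server would normally sit on a well-known port; since binding to ports below 1024 usually needs admin rights on Unix-like systems, the sketch lets the OS pick one instead. The client never chooses its own port: the OS assigns an ephemeral one, and the server learns the client’s (IP, port) pair from the connection itself. The `WELL_KNOWN_PORTS` dictionary and the function name are illustrative, not a standard API.

```python
import socket

# A few well-known server ports (a tiny excerpt of the IANA registry).
WELL_KNOWN_PORTS = {"http": 80, "https": 443, "smtp": 25}


def connect_and_compare():
    """Connect a client to a local server; return the client's address as the
    client sees it and as the server sees it."""
    # A real web server would listen on WELL_KNOWN_PORTS["http"], but low ports
    # usually need admin rights, so this sketch lets the OS pick a port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()

    # The client supplies only the server's IP and port; the OS picks the
    # client's own (ephemeral) port behind the scenes.
    client = socket.create_connection(server.getsockname())
    conn, client_as_seen_by_server = server.accept()
    try:
        return client.getsockname(), client_as_seen_by_server
    finally:
        for s in (conn, client, server):
            s.close()


if __name__ == "__main__":
    print("A web browser would connect to port", WELL_KNOWN_PORTS["http"])
    mine, seen = connect_and_compare()
    print("client thinks it is", mine)
    print("server sees client as", seen)
```

Both printed addresses should match, and the client’s port number will come from the operating system’s ephemeral range.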

Application Layer (So many protocols)

This top-most layer is where applications that users actually care about operate: web, email, print, file and directory services, etc. But, for the purposes of this article, the protocols used by the specific applications aren’t important. There are a number of application layer protocols that are specific to email, but I’ll talk about those in a later article.

The Key Learning

After all that, what I really want readers to take away¹¹ is the concept of TCP and UDP port numbers, and that specific application-layer protocols use specific ports, and only those ports. For any protocol, there may be lots of different programs that implement that protocol, but they always use the same ports.

In the next installment, I’ll talk about some of the groundwork that needs to be done before you can host an Internet-connected email service.


1  You can tell because it’s right there in the name.

2  Or the other way ‘round, at the other end.

3  The ISO seven-layer salad. One of the few things I actually remember from Data Networks class, thanks to a juvenile but effective mnemonic devised by my roommate.

4  Just like any acronym that starts with “S” usually means “simple”. Spoiler alert: simple is relative.

5  And yet some people still fuck up protocol implementation. Sometimes accidentally, sometimes not. See also, Microsoft.

6  These days, “Category 5” or “Category 6” cabling is optimized for high-speed Ethernet, but Ethernet over twisted pair was created to run over crappy telephone wiring already present in commercial buildings.

7  This is an IP version 4 (IPv4) address. There is a version 6 of IP (IPv6), which uses longer addresses, to allow for more unique devices in the world.

8  Well, not as much like a phone number since number portability fucked everything up.

9  Not to be confused with a rack-mount, data-center-type computer. This is why I avoided that word in Part 2.

10  And other port numbers, too, depending on circumstances, but these are the basic ports. Peruse a complete list here: https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers.

11 Like a hunk of batter-fried cod wrapped in greasy newspaper.