Confused about ports and sockets
Ok, so when I tried to investigate ip addresses, ports and sockets, this is what I got from it:
IP addresses are used to map different devices across a network.
Port numbers are used to access a specific application on hosts.
Sockets are a combination of the two.
What I don't understand is this: if ports connect you to a specific application, shouldn't there be only one port number per application? For example, port 80 is used for HTTP, so if an application is using that port, it is listening for HTTP requests, correct? So what happens if more than one person tries to access it? Sockets and ports have confused me a lot.
A socket is an abstraction used in software that makes it easier for programmers to send and receive data over networks. It is the interface that you use in application-level code to access the underlying network protocol implementations provided by your operating system and language runtime.
TCP, IP, and other popular networking protocols have no inherent concept of "sockets". "Sockets" are a concept that the implementers of TCP/IP came up with.
So what exactly is a "socket"? Basically, an object that you can write data to and read data from. "Opening" a socket means creating one of these objects in your program's memory. You can also "close" the socket, which means freeing up any system resources that the object is using behind the scenes.
Certain kinds of sockets can be "bound" to local and remote addresses, which you can think of as setting data fields or properties on a socket object. The meaning of these fields affects what happens when you read or write to the socket.
There are various types of sockets on Unix. If you "open" a TCP socket, bind it to local and remote addresses (and ports), and write some data to it, your libraries / OS will pack that data into a TCP segment and send it out through whichever network interface corresponds to the local address to which you "bound" the socket. If you "open" an IP socket and write some data to it, that data will be packed into an IP packet (without any added TCP headers) and sent. If you open a raw, link-level socket and write to it, the data will be sent as the payload of a link-level frame, without any IP or TCP headers. There are also "Unix domain sockets". If you open one of them and write to it, the data is passed directly through system memory to another process on the same computer.
So, although they are often used from non-OO languages such as C, sockets are a great example of what OO languages call "polymorphism". If you ever have trouble explaining what "polymorphism" is to someone, just teach them about network sockets.
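To make the polymorphism point concrete, here is a minimal Python sketch. It assumes a Unix-like system (where socket.socketpair() returns Unix domain sockets) and uses helper names of my own invention (round_trip, tcp_pair). The same read/write helper is handed two quite different kinds of socket and neither knows nor cares which one it got.

```python
import socket

def round_trip(sender, receiver, payload):
    """Write to one socket and read from the other, whatever their type."""
    sender.sendall(payload)
    data = b""
    while len(data) < len(payload):
        chunk = receiver.recv(len(payload) - len(data))
        if not chunk:          # peer closed early
            break
        data += chunk
    return data

def tcp_pair():
    """Return two connected TCP sockets on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side

if __name__ == "__main__":
    # Hypothetical demo: one helper, two socket families.
    for name, (a, b) in [("unix", socket.socketpair()), ("tcp", tcp_pair())]:
        print(name, round_trip(a, b, b"hello"))
        a.close()
        b.close()
```

In the first case the bytes never leave kernel memory; in the second they are wrapped in TCP segments and IP packets on the loopback interface. The calling code is identical.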
Ports are a completely different concept. The idea of "ports" is built into TCP and other transport protocols.
Others may give you loftier, and possibly more technically accurate, definitions of "port". Here is one that is completely down to earth:
A "port" is a number that appears in the TCP header of a TCP segment. (Or in the UDP header of a UDP datagram.)
Just a number. Nothing more, nothing less.
If you are using a "socket" based interface to do network programming, the significance of this number is that each of your TCP or UDP sockets has a "local port" property and a "remote port" property. As I said, setting these properties is called "binding".
If your local port property is "bound" to 80, then all TCP segments you send will have "80" in the "source port" header. Then, when others reply to your messages, they put "80" in their "destination port" headers.
Moreover, if your socket is "bound" to local port 80, then when data comes from elsewhere, addressed to your port 80, the OS will pass it to your application process, not some other. Then, when you try to read from the socket, that data will be returned.
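If you want to see the "source port" and "destination port" numbers at work, here is a small sketch in Python using two UDP sockets on the loopback interface. Binding to port 80 normally needs elevated privileges, so this sketch lets the OS pick free ports (port 0); the function name udp_ports_demo is just for illustration.

```python
import socket

def udp_ports_demo():
    """Show that the sender's local port arrives as the datagram's source port."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for s in (receiver, sender):
        s.bind(("127.0.0.1", 0))   # port 0: the OS chooses a free local port
        s.settimeout(2.0)          # do not hang forever if something goes wrong
    sender_port = sender.getsockname()[1]
    receiver_port = receiver.getsockname()[1]

    # The datagram carries sender_port as source and receiver_port as destination.
    sender.sendto(b"ping", ("127.0.0.1", receiver_port))
    data, (src_ip, src_port) = receiver.recvfrom(1024)

    # The reply swaps the two: what was the source port is now the destination.
    receiver.sendto(b"pong", (src_ip, src_port))
    reply, _ = sender.recvfrom(1024)

    sender.close()
    receiver.close()
    return sender_port, src_port, data, reply

if __name__ == "__main__":
    local, seen, data, reply = udp_ports_demo()
    print("sender bound to", local, "- receiver saw source port", seen)
    print(data, reply)
```

The receiver learns the sender's port only because that number is in the header of the datagram it received, and the OS delivers the reply to the sender only because the sender's socket is bound to that number.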
Obviously, the OS needs to know which port each of your sockets is bound to. Therefore, when "binding", system calls must be made. If your program does not run with sufficient privileges, the OS may refuse to bind to a specific port. Then, depending on the language you are using, your networking library will throw an exception or return an error code.
Sometimes the OS may refuse to bind to a specific port, not because you don't have permission, but because another process is already bound to it. However, and this is where some of the other answers are wrong, if certain flags are set when you open a socket, your OS may allow more than one socket to bind to the same local address and port.
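Both behaviours can be tried out with a short Python sketch. The answer does not name the flags it has in mind; I am assuming SO_REUSEPORT here, which exists on Linux and the BSDs (including macOS) but not everywhere, so the second function returns None when the option is missing. Function names are my own.

```python
import errno
import socket

def second_bind_fails():
    """Without special flags, a second bind to a port in use is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))        # OS picks a free port
        first.listen(1)
        port = first.getsockname()[1]
        second.bind(("127.0.0.1", port))    # same address and port
        return False                        # bind unexpectedly succeeded
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    finally:
        first.close()
        second.close()

def shared_bind_with_reuseport():
    """With SO_REUSEPORT set on both sockets, both binds can succeed."""
    if not hasattr(socket, "SO_REUSEPORT"):
        return None                         # option not available on this platform
    socks = []
    try:
        port = 0
        for _ in range(2):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            # The flag must be set before bind(), and on every socket involved.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind(("127.0.0.1", port))
            s.listen(1)
            port = s.getsockname()[1]       # the second socket reuses this port
        return True
    except OSError:
        return False
    finally:
        for s in socks:
            s.close()

if __name__ == "__main__":
    print("plain second bind refused:", second_bind_fails())
    print("shared bind with SO_REUSEPORT:", shared_bind_with_reuseport())
```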
You still don't know what "listening" and "connected" sockets are. But once you understand the above, that is only a small step further.
The above explains the difference between what we call a "socket" today and what we call a "port". What may still be unclear is why we need to make this distinction at all.
You really made me think here (thanks)! Could we call the software abstraction known as a "socket" a "port" instead, so that instead of calling socket_recv you would call port_recv?
If you're only interested in TCP and UDP, this might work. But remember, the socket abstraction isn't just for TCP and UDP. It is also used for other network protocols, as well as for communication between processes on the same computer.
Then again, a TCP socket does not represent just a port. A connected TCP socket ties together a local IP address, a local port, a remote address, and a remote port. It also has other associated data, including various flags, send and receive buffers, sequence numbers for the incoming and outgoing data streams, and various other variables used for congestion control (rate limiting), etc. That data does not belong to the local port alone.
Thousands of TCP connections can be open simultaneously through the same "port". Each of these connections has its own associated data, and the software object that encapsulates the data for each connection is a "TCP socket".
Even if you only use TCP / UDP, and even if you only have one process using any local port at a time, and even if you only have one connection going through each local port at a time, I think the "socket" abstraction still makes sense. If we simply called sockets "ports", that one word would carry even more meanings. Reusing the same word for too many meanings makes communication difficult.
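Here is a rough Python sketch of that idea, scaled down to three connections on the loopback interface (the function name many_connections and the "client-N" messages are made up for the example). Every accepted socket reports the same local port, yet each one has its own remote port and its own receive buffer holding only its own peer's data.

```python
import socket

def many_connections(n=3):
    """Open n connections to one port; return (port, per-connection details)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))     # OS picks a free port
    listener.listen(n)
    addr = listener.getsockname()

    clients, accepted = [], []
    for i in range(n):
        c = socket.create_connection(addr)
        c.sendall(f"client-{i}".encode())
        clients.append(c)
        conn, _ = listener.accept()     # one new socket object per connection
        conn.settimeout(2.0)
        accepted.append(conn)

    # (local port, remote port, data waiting in this connection's buffer)
    details = [
        (conn.getsockname()[1], conn.getpeername()[1], conn.recv(64))
        for conn in accepted
    ]

    for s in clients + accepted + [listener]:
        s.close()
    return addr[1], details

if __name__ == "__main__":
    port, details = many_connections()
    print("server port:", port)
    for local_port, remote_port, data in details:
        print(local_port, remote_port, data)
```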
Ports are transport-level identifiers for an application process. "Sockets" are objects used in software to send / receive messages addressed to and from these identifiers.
The distinction between "my address" and "the thing that sends letters marked as coming from me" is a useful one. "My address" is just a label. A label is not something active that does things like sending data. It makes sense to give "the thing that is used to send data" its own name, distinct from the name for "the sender address with which the data is marked".
When an application (a web server such as Apache or Nginx, for example) listens on port 80, it creates a so-called listening socket.
When a client connects, the listening socket becomes readable (which can be observed through the select or poll API), and our application accepts the connection, which creates a communication socket. This socket is uniquely identified by the tuple (src_addr, src_port, dst_addr, dst_port); it is entirely possible for many clients to have the same (dst_addr, dst_port) combination.
Our web server can then talk over that communication socket to deliver the web page and eventually close that socket. When many clients arrive in parallel, the web server can either create a thread / process for each client (Apache model), or serve all sockets from a single loop, handling each one as it becomes ready (Nginx model).
Note that in this situation there is normally only one listening socket per port: multiple applications cannot bind to the same port, such as 80, at the same time. But it is quite normal to have many communication sockets (some people report successfully serving over a million concurrent connections).
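A toy version of the single-loop model can be sketched with Python's select module. This is only an illustration, not how Nginx is actually implemented: the "server" just upper-cases whatever it receives, the clients live in the same process so the example runs on its own, and the names serve_until_idle and select_demo are invented.

```python
import select
import socket

def serve_until_idle(listener, expected_clients):
    """Serve clients from one loop; return how many connections were accepted."""
    conns = set()       # communication sockets, one per connected client
    served = 0
    while served < expected_clients or conns:
        readable, _, _ = select.select([listener, *conns], [], [], 2.0)
        if not readable:
            break       # nothing happened for 2 seconds: give up
        for sock in readable:
            if sock is listener:
                # Listening socket is readable: a new client is waiting.
                conn, _ = listener.accept()
                conns.add(conn)
                served += 1
            else:
                data = sock.recv(1024)
                if data:
                    sock.sendall(data.upper())   # the "response"
                else:
                    conns.discard(sock)          # client finished sending
                    sock.close()
    return served

def select_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # OS picks a free port instead of 80
    listener.listen(5)
    addr = listener.getsockname()

    clients = []
    for word in (b"alpha", b"beta"):
        c = socket.create_connection(addr)
        c.settimeout(2.0)
        c.sendall(word)
        c.shutdown(socket.SHUT_WR)    # signal "request complete" to the server
        clients.append(c)

    served = serve_until_idle(listener, len(clients))
    replies = [c.recv(1024) for c in clients]
    for s in clients + [listener]:
        s.close()
    return served, replies

if __name__ == "__main__":
    print(select_demo())
```

There is exactly one listening socket here, but one communication socket per client, and the loop only ever touches sockets that select reports as ready.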
Every time you accept a connection on a socket in the listening state (for example, on port 80), you get a new socket in the established state that represents that connection.
On the client side, every time a new connection is made to that address and port (a new socket that connects), the operating system assigns a random port on your side.
For example, if you connect two times:
your-host:22482 <---> remote-host:80
your-host:23366 <---> remote-host:80
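You can reproduce this yourself with a few lines of Python. In this sketch the "remote host" is just a listening socket on the loopback interface with an OS-chosen port rather than a real server on port 80, and connect_twice is a made-up name.

```python
import socket

def connect_twice():
    """Connect twice to the same server port; return (server_port, port pairs)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # stand-in for remote-host:80
    listener.listen(2)
    addr = listener.getsockname()

    first = socket.create_connection(addr)
    second = socket.create_connection(addr)

    # (local port chosen by the OS, remote port we connected to)
    pairs = [(s.getsockname()[1], s.getpeername()[1]) for s in (first, second)]

    for s in (first, second, listener):
        s.close()
    return addr[1], pairs

if __name__ == "__main__":
    server_port, pairs = connect_twice()
    for local_port, remote_port in pairs:
        print(f"your-host:{local_port} <---> remote-host:{remote_port}")
```

The remote port is the same for both connections, while the local ports differ, which is what keeps the two connections apart.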