Transferring data in HTTP

I am trying to understand the basics of backend HTTP servers and clients in relation to how they transfer data. I have read many articles on how HTTP works, but I have not found an answer to some of my questions. I would like to go through the process of loading a web page as I understand it, and I would appreciate it if you could point out where I went wrong.

  • When I visit the site, my browser requests an HTML file from the server. To do so, it creates a socket, binds it to my IP address and connects it to the listening socket of the server I am visiting. To connect, the browser needs the port number and the hostname: the port number is 80 because it is HTTP, and the hostname is obtained through DNS resolution. Now that there is a connection between the sockets, my browser sends a GET request. This request is an ASCII file whose content is the HTTP request. My browser writes the raw ASCII bytes to its socket, and they arrive at the server socket.

  • The server writes the HTML file I requested back to the socket. The HTML sent by the server is just an ASCII file that the server writes byte by byte to the socket.

  • My browser receives an ASCII file and parses it. Suppose it finds an image tag. The browser makes an HTTP request for this image file. Here's something I don't understand: how does the server respond? As far as I can tell, the server should send back an ASCII file formed by a set of headers followed by a CRLF and then a message body. In this case, assuming my browser requested a .jpeg, does the server write the headers as plain ASCII text to the socket and then write the raw image bytes to the socket?

  • If there are multiple images in the HTML file, do we open a socket per image (for each request)?

  • Let's assume my browser now finds a script tag. When the server responds to my request for this script, does the server write the ASCII bytes of the script source to the socket? What happens with JS libraries? Is the server required to send all the source code for each one?

  • When writing data to sockets: is write(2) the correct way to do all this writing between sockets?

  • When transferring large files: if I click a button on the site that lets me download a large PDF, how is this achieved by the server? I am assuming the server streams it in chunks. As far as I can tell, there is a chunked encoding option. Is that so? If so, is the file split into chunks that are appended to the ASCII response and written byte by byte to the socket?

Finally, how is the video transmitted? I know encoding and transferring videos would require whole books to explain in detail, but if you could say something about the general features of video transfer (e.g. on YouTube), I would appreciate it.

Anything you could say about HTTP at the socket level would be appreciated. Thanks.

+3




3 answers


All of my answers below are for HTTP/1.1, not HTTP/2:

3.- My browser receives the ASCII file and parses it. Suppose it finds an image tag. The browser makes an HTTP request for this image file. Here's something I don't understand: how does the server respond? As far as I can tell, the server should send back an ASCII file formed by a set of headers followed by a CRLF and then a message body. In this case, assuming my browser requested a .jpeg, does the server write the headers as plain ASCII text to the socket and then write the raw image bytes to the socket?

Yes, usually. The body may be sent with a content encoding (gzip, brotli), or it may be chunked if Content-Length is not set.
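As an illustration of "ASCII headers, then raw bytes", here is a minimal Python sketch that assembles such a response in memory. The helper name `build_response` and the fake JPEG payload are assumptions made for the example; a real server would normally add more headers.

```python
def build_response(body: bytes, content_type: str) -> bytes:
    """Return a complete HTTP/1.1 response: ASCII headers, blank line, raw body."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # the empty line that separates headers from body
    ).encode("ascii")
    return head + body


# Hypothetical payload: a JPEG starts with bytes that are not printable ASCII.
fake_jpeg = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"...image data..."
response = build_response(fake_jpeg, "image/jpeg")

# The receiver splits at the first blank line: text before it, bytes after it.
head, _, body = response.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(len(body), "body bytes")
```

The header block is decoded as text, while the body is left untouched as bytes, which is why Content-Length counts bytes rather than characters.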

4.- If there are multiple images in the HTML file, do we open a socket per image (for each request)?

In HTTP/1.1, modern browsers will open up to 6 sockets per host, but no more. If more than 6 requests are pending for the same host, the browser waits until earlier responses have been received before sending the rest.
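The effect of such a per-host cap can be simulated without any network at all. The sketch below is only an analogy: a thread pool of 6 workers stands in for the 6 connections, and `fake_download` is a made-up placeholder for one request/response exchange.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS_PER_HOST = 6  # the per-host limit described above

lock = threading.Lock()
active = 0  # downloads in flight right now
peak = 0    # highest number of simultaneous downloads observed


def fake_download(name: str) -> str:
    """Stand-in for one request/response exchange on one connection."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # pretend the response takes a while to arrive
    with lock:
        active -= 1
    return name


images = [f"img{i}.jpg" for i in range(24)]
with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS_PER_HOST) as pool:
    done = list(pool.map(fake_download, images))

print(len(done), "downloads, at most", peak, "at a time")
```

All 24 downloads finish, but the pool never runs more than 6 at once; the remaining requests simply queue until a worker is free.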

5.- Suppose my browser now finds a script tag. When the server responds to my request for this script, does the server write the ASCII bytes of the script source to the socket? What happens with JS libraries? Is the server required to send all the source code for each one?

Usually yes, you need 1 HTTP request per javascript file. There are some server side tools out there that bundle javascript sources together with their dependencies in a single javascript file. Note that javascript sources are usually UTF-8, not ASCII.



6.- When writing data to sockets: is write(2) the correct way to do all this writing between sockets?

Dunno! Not a C guy.

7.- When transferring large files: if I click a button on the site that lets me download a large PDF, how is this achieved by the server? I am assuming the server streams it in chunks. As far as I can tell, there is a chunked encoding option. Is that so? If so, is the file split into chunks that are appended to the ASCII response and written byte by byte to the socket?

No, chunked encoding is used for HTTP responses whose content length is not known in advance. The "splitting" you're talking about is done at the IP/TCP layer, not at the HTTP protocol layer. From the point of view of HTTP, this is just one continuous stream.
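For reference, the chunked framing itself is simple: each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, and a zero-size chunk ends the body. The following Python sketch encodes and decodes that framing; the function names are made up for the example, and it ignores optional trailers.

```python
def encode_chunked(pieces) -> bytes:
    """Frame an iterable of byte strings using chunked transfer encoding."""
    out = b""
    for piece in pieces:
        if piece:  # a zero-length chunk would terminate the body early
            out += f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # terminating chunk, no trailers


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked byte stream (trailers not handled)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ";extension" parameters; ignore them.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"Hello, ", b"world", b"!"])
print(wire)
print(decode_chunked(wire))
```

The sender never needs to know the total length up front, which is the situation chunked encoding exists for.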

Finally, how is the video transmitted? I know encoding and transferring videos would require whole books to explain in detail, but if you could say something about the general features of transferring videos (e.g. on YouTube), I would appreciate it.

Too broad for me to answer.

+1




HTTP, sockets, streaming and packet transfer are different topics.

HTTP is a communication protocol for requesting or sending data. Sockets are not used regularly by web developers because they are not very network friendly due to the need for a persistent connection. How your browser handles HTTP requests usually shouldn't bother you.



For large chunks of data such as video, streaming is perhaps the best technique because you don't need synchronization between the client and server or an always-on connection, such as with sockets. The streaming method is entirely up to you and the language you have on the server to share your content.

If you want to learn more about HTTP, I recommend you read up a bit on RFCs like RFC 7230 or RFC 7231. To understand how data is transferred, you should really know the basics of abstraction layers. For video streaming, you can learn how to build a streaming video server with Node.js (or another language if you prefer), or just search for and install an NPM package that already does the job for you.

0




It is highly recommended that you read High performance web browser.

About HTTP

HTTP is a message structuring protocol. It can be built on top of TCP/IP or UDP or any other communication protocol.

IP solves the problem of determining which computer on the network should receive a message, and TCP solves the problem of ensuring that the message is received despite interference. UDP does the same job as TCP but without the delivery guarantees, which makes it the better choice in some situations, such as streaming video.

HTTP only solves the problem of what messages should look like so everyone can understand what you mean. An HTTP message consists of a header and a body. The body is the message you want to send; the header contains meta information about the status of the message itself. HTTP allows you to structure your applications in a meaningful, context-oriented way through a standard set of terms.

For example, using HTTP you can state your body's character encoding, how long your content is, whether you are willing to receive a compressed response, and so on. So no, HTTP is not limited to ASCII text: you can send UTF-8 encoded characters with a BOM, or no text encoding at all. All HTTP does is let you ask for things the way you want them and tell the recipient how you packaged the message.

The actual responsibility for delivering messages, as opposed to structuring them, lies with TCP/IP and UDP. HTTP has nothing to do with this. Both TCP/IP and UDP add overhead, but it's worth it to keep the communication going smoothly.
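A small Python sketch of this idea: the headers are ASCII and describe a body that is not. The path `/notes` and the host `example.com` are placeholders invented for the example.

```python
text = "café ☕"
body = text.encode("utf-8")  # the body is bytes; here UTF-8, not ASCII

headers = {
    "Content-Type": "text/plain; charset=utf-8",  # how the body is encoded
    "Content-Length": str(len(body)),             # length in bytes
    "Accept-Encoding": "gzip",                    # compressed replies are fine
}

request = (
    "POST /notes HTTP/1.1\r\n"
    "Host: example.com\r\n"
    + "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    + "\r\n"
).encode("ascii") + body

print(len(text), "characters,", len(body), "bytes")
print(request)
```

Note that Content-Length is the byte count of the encoded body, which differs from the character count as soon as non-ASCII characters appear.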

About sockets

Computers listen on "sockets", which is just a fancy name for a communication channel. It doesn't matter what kind of channel: socket is just a generic name for a communication channel, be it a wire or a wireless radio link. All that matters is what the socket can do. Computers can send bytes into a socket (pushing buffered bytes out is called a "flush") and read bytes sent over the socket. Sockets always have a certain amount of memory reserved for incoming messages (like a mailbox), called a buffer, and can even bundle many messages and send them in one shot to save time.

At the hardware level, sockets usually lead to a network card, which makes it possible to talk over a wireless network or an Ethernet cable. Note that there can be many more sockets on a computer than cables. This is because a socket is a generic name for one link, and one NIC / Ethernet card can handle multiple links. The ability to handle multiple channels simultaneously is called multiplexing.

TCP/IP and UDP are just blueprints. In practice, the operating system is responsible for doing what they describe, and most operating systems include software that implements these standards. At the software level, how information is read and written becomes a little more complicated than just transferring bytes, since the computer must also interrupt its running programs when a hardware event occurs, including when data arrives on a socket - here is a link on how the Linux kernel implements TCP/IP.

All operating systems provide a set of calls to start listening on (bind) a socket, read from a socket, and write to a socket. However, you can wait for data on a socket in several ways. These range from the basic select() and poll(), available on most Linux distributions, which make the program wait until one of the watched sockets has data to read, to epoll() on Linux, which lets the program ask to be notified when data has arrived before reading it.
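In Python, the standard `selectors` module wraps whichever of these mechanisms the platform offers. The sketch below uses a local socket pair instead of a real network connection, which is an assumption made to keep the example self-contained.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # picks epoll, kqueue, poll or select
a, b = socket.socketpair()         # two connected sockets in one process
b.setblocking(False)
sel.register(b, selectors.EVENT_READ)

ready_before = sel.select(timeout=0)  # nothing has been written yet
a.sendall(b"ping")
ready_after = sel.select(timeout=1)   # now b is reported as readable
received = [key.fileobj.recv(1024) for key, _ in ready_after]

print(ready_before, received)

sel.unregister(b)
sel.close()
a.close()
b.close()
```

The program only calls `recv` after being told the socket is readable, so it never blocks waiting on a socket that has nothing to deliver.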

Windows exports a completely different set of system calls, so it might be a good idea to consult the reference guide if you plan on building Windows applications.

About TCP/IP

TCP/IP is a combination of two protocols that have become the norm in most cases to ensure reliable communications.

IP is responsible for the term IP address. Each computer has a unique address associated with it, specified as either a 32-bit number (IPv4) or a 128-bit number (IPv6, or IP version 6). Note that these addresses do not exist outside the network: a network is just a collection of computers, and a computer's address only makes sense within this collection. The network the computer belongs to is part of the computer's IP address; the network itself is assigned a unique address; and a network can be composed of multiple networks. On top of IP, TCP and UDP add the concept of a port, which identifies one endpoint on a machine and is closely tied to the concept of a socket.

I'm tossing around the term "network" willy-nilly as an abstract concept, but physically it boils down to a router. A router is a special computer responsible for figuring out who a message is addressed to from the IP address attached to it, for assigning IP addresses to the computers it knows about (a network is literally the collection of computers that the router knows about), and for forwarding messages to other computers or routers. The internetwork (or simply the Internet) is a bunch of routers, each with its own network, able to communicate with each other to form one giant network of connected networks. In effect, the router implements the IP standard.

TCP and UDP are designed to solve a different problem: how to get your messages through. Sending any message over a shared communication channel, for example wireless or even wired channels organized in a bus topology, is inherently messy: different messages can overlap, messages can get lost unexpectedly, messages can get corrupted, and so on. TCP seeks to solve these problems by ensuring that all messages get through. UDP, on the other hand, provides no such guarantees and thus saves time by skipping many of TCP's steps.

TCP and UDP put the message in packets of a certain size so that the message can be sent as quickly as possible. TCP also adds some structure to the exchange, starting with a three-way handshake:

  • It sends a TCP-specific message, called a SYN packet, to the computer it wants to send the message to and waits for a response.
  • If the target computer accepts, it responds with a SYN-ACK. Upon receipt of this, the source computer responds with an ACK packet. This lets both computers know that the other is listening, and they can start sending packets.
  • On the other hand, if either the source or the target computer does not hear anything after a while, it sends again and waits some more. Each time it has to wait, it waits twice as long as the last time, until the maximum timeout period is reached and it terminates the connection. This is called exponential backoff and is key to TCP.
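The doubling rule in the last bullet can be written down in a few lines of Python. The starting delay, the cap and the number of attempts below are illustrative values chosen for the example, not the timers a real TCP stack uses.

```python
def backoff_schedule(initial: float, maximum: float, attempts: int) -> list:
    """Return the wait before each retry: doubled every time, capped at maximum."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, maximum))
        delay *= 2
    return delays


schedule = backoff_schedule(initial=1.0, maximum=60.0, attempts=8)
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Doubling the wait means a struggling network sees fewer and fewer retries instead of a constant flood of them.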

A three-way handshake ensures everyone is ready and willing to listen. However, the fun doesn't stop here:

  • As part of the handshake, the source computer indicates that it will send an initial specified number of packets, each of a specified size.
  • After the handshake, the source computer sends the specified packets and waits for an ACK for each packet sent. If it doesn't receive an ACK for a packet, it goes into exponential backoff before resending that packet.
  • Meanwhile, the target computer has been told to expect a certain number of packets, so it waits until all of them have arrived. Packets can arrive out of order, depending on how the intermediate network routers decide to optimize the path for each packet, so each packet carries a marker indicating its position, and the target computer sorts them back into one neat message.
  • Once the source receives the ACKs, it uses the total time taken to decide how much it can send next. The better the response time, the more packets TCP is ready to send.
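The reordering step can be sketched in Python as follows. This is a toy model, not TCP itself: the segment size of 8 bytes, the helper names and the shuffled delivery are assumptions made for the example, and the byte offset plays the role of the sequence number.

```python
import random


def split_into_segments(message: bytes, size: int) -> list:
    """Cut a message into (offset, data) pairs; the offset marks the position."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]


def reassemble(segments: list) -> bytes:
    """Sort segments by offset and join them back into the original message."""
    return b"".join(data for _, data in sorted(segments))


message = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
segments = split_into_segments(message, 8)
random.Random(0).shuffle(segments)  # simulate out-of-order arrival

print([offset for offset, _ in segments])
print(reassemble(segments) == message)
```

Because every segment carries its own position, the order of arrival does not matter to the receiver.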

UDP skips the three-way handshake. It just sends the pieces. It is not guaranteed that the entire message will reach its destination, and it is not guaranteed that the pieces will be received in the order they were sent. This is ideal for cases where the network is reliable enough that most of your messages are likely to arrive, but where it doesn't matter if not all of them do (for example, it's okay if some frames of a video don't arrive).
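For comparison, here is a minimal Python sketch of a UDP exchange on the loopback interface. The payload and the use of `127.0.0.1` with an OS-chosen port are assumptions made for the example; on a real network the datagram could simply be lost.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2)           # do not wait forever if nothing arrives
address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame 1", address)  # no connect(), no handshake, no ACK

data, _ = receiver.recvfrom(1024)
print(data)

sender.close()
receiver.close()
```

The sender never learns whether the datagram arrived; any acknowledgement or retry logic would have to be built by the application.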

About video

Video is fundamentally no different from any other content format. It is perfectly possible to use HTTP for video. Whether TCP is appropriate is another matter, but it is not necessarily a bad choice: Skype uses both UDP and TCP.

A whole video is just a sequence of bytes. How these bytes are to be interpreted is the job of the encoding. Videos can have many formats: avi and mp4 come to mind. With HTTP, you can declare the format of the content as part of the message headers.

HTTP allows compression of content, including video. HTTP also allows you to request that the connection be kept alive, i.e. the three-way handshake does not need to be repeated after a full message has been sent. An HTTP extension called WebSockets has been developed that leverages these two features to provide support for real-time video transmission. This only optimizes the arrival of the video so that it doesn't look laggy; it doesn't change the way the video is received.

Of course, sometimes you need more guarantees about the video, and there are many, many tricks that can be used to maintain high-quality video at low internet speeds, or to let multiple people subscribe to a live stream, etc. This is when you need to get creative. But otherwise, video content does not fundamentally differ from any other type of content.
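The keep-alive behaviour can be observed with Python's standard library alone. In this sketch, a throwaway local server (the `Handler` class and the two request paths are made up for the example) answers two requests, and the client's local port number is recorded after each one; seeing the same port twice indicates the same TCP connection was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        body = f"you asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
bodies, ports = [], []
for path in ("/index.html", "/logo.jpg"):
    conn.request("GET", path)
    response = conn.getresponse()
    bodies.append(response.read())            # read fully before reusing
    ports.append(conn.sock.getsockname()[1])  # client-side port of the socket

print(bodies, ports)

conn.close()
server.shutdown()
server.server_close()
```

Content-Length matters here: it tells the client where the first response ends, so the connection can stay open for the second request.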

Answers to your questions

When I visit the site, my browser requests an HTML file from the server. To do so, it creates a socket, binds it to my IP address and connects it to the listening socket of the server I am visiting. To connect, the browser needs the port number and the hostname: the port number is 80 because it is HTTP, and the hostname is obtained through DNS resolution. Now that there is a connection between the sockets, my browser sends a GET request. This request is an ASCII file whose content is the HTTP request. My browser writes the raw ASCII bytes to its socket, and they arrive at the server socket.

HTTP does not require port 80. By convention, port 80 is the default port for servers speaking HTTP and 443 for HTTPS, but any other port can be used as long as it is not already in use.

You will not get the hostname from DNS. In fact, the opposite is true: you supply the hostname and get the IP address from DNS. This is the IP address that is used to identify a location on another network.
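The direction of the lookup is easy to check from Python: you pass a hostname in and get addresses out. The sketch resolves `localhost` so that it works without network access; the helper name `resolve` is made up for the example.

```python
import socket


def resolve(hostname: str, port: int) -> list:
    """Return the distinct IP addresses the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr starts with the IP address as a string.
    return sorted({info[4][0] for info in infos})


addresses = resolve("localhost", 80)
print(addresses)
```

A browser does the same thing with the hostname taken from the URL, then connects a socket to one of the returned addresses.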

ASCII is not required for the response. Headers, yes, should be interpreted as ASCII, as they are part of an international standard that was developed before UTF-8 was widely known, but no such restriction applies to the body. In fact, the content encoding has traditionally been passed as a header, which a browser or client can use to automatically decode the body's content.

The server writes the HTML file I requested back to the socket. The HTML sent by the server is just an ASCII file that the server writes byte by byte to the socket.

Yes, except that it does not need to be ASCII.

My browser receives an ASCII file and parses it. Suppose it finds an image tag. The browser sends an HTTP request for this image file. Here's something I don't understand: how does the server respond? As far as I can tell, the server should send back an ASCII file formed by a set of headers followed by a CRLF and then a message body. In this case, assuming my browser asked for a .jpeg, does the server write the headers as plain ASCII text to the socket and then write the raw image bytes to the socket?

Yes.

If there are multiple images in the HTML file, do we open a socket per image (for each request)?

See this answer. The HTML is always loaded first, before image requests are fired, and images are always requested in the order they appear in the DOM. If you have 24 images in Chrome, 6 of them will download in parallel at a time, which means four batches of parallel downloads.

You can also check this yourself by opening the Network tab in Chrome's developer tools and watching the image requests being fired off in parallel.

Let's assume my browser now finds a script tag. When the server responds to my request for this script, does the server write the ASCII bytes of the script source to the socket? What happens with JS libraries? Does the server have to send all the source code for each one?

In the HTML spec, you can choose the order in which you want to load the JavaScript files.

Yes, the server writes bytes. The bytes do not have to be ASCII encoded. The headers will be in ASCII. Yes, the server has to send the source code for each library. This is why an important part of web optimization is minimizing JavaScript file sizes and combining all libraries into one file to reduce the number and size of requests.

When writing data to sockets: is write(2) the correct way to do all this writing between sockets?

This is by far the easiest way to write to an open file descriptor on Linux kernels. Everything in Linux is treated as a file, including sockets, so yes, sockets have file descriptors and can be written that way.

There are more sophisticated ways to accomplish this, all of which are listed in the man page for write. However, most languages support writing to sockets through glue code that calls write() for you behind a friendlier interface. Perhaps the only time you'll need to call write() explicitly in C is if you're writing programs at the kernel level or on embedded hardware.
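Python exposes the same system call as `os.write`, so the "a socket is just a file descriptor" point can be tried directly. The sketch assumes a Unix-like system, uses a local socket pair rather than a real connection, and sends a made-up request to `example.com`.

```python
import os
import socket

left, right = socket.socketpair()  # two connected sockets in one process
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

# write(2) may accept fewer bytes than offered, so loop until all are written.
view = memoryview(request)
while view:
    written = os.write(left.fileno(), view)  # plain write() on the socket's fd
    view = view[written:]

received = right.recv(4096)
print(received)

left.close()
right.close()
```

Higher-level calls such as `socket.sendall` hide exactly this loop, which is the kind of glue code mentioned above.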

When transferring large files: if I click a button on the site that lets me download a large PDF, how is this achieved by the server? I am assuming the server streams it in chunks. As far as I can tell, there is a chunked encoding option. Is this the way to go? If so, is the file split into chunks that are appended to the ASCII response and written byte by byte to the socket?

See the TCP/IP section I wrote above. The HTTP standard lets you break the message into higher-level chunks before TCP splits it into segments, so the receiver can start working with the small pieces as they arrive.

Finally, how is the video transmitted?

See the video section I wrote above.

0

