How to split a piece into blocks when they overlap

Question


I'm looking for some input on creating a simple, minimal BitTorrent client. I have been reading the protocol specification for 2-3 days.

Here is my understanding so far. Assume the torrent has a piece length of 26000 bytes and, according to the unofficial spec, a block size of 16384 bytes. Something like that.

Now, when requesting a block, the request will look like this:

piece 0
block offset 0
block length 16384

So far so good.

Now, for the next block, which overlaps piece 0 and piece 1, the request should look like this:

piece 0  ## since the first byte is in piece 0, use piece 0 instead of piece 1
block offset 16384
block length 16384

Now, on the receiving end, I need to reassemble a piece of 26000 bytes so that I can compare its hash with the piece hashes and verify that the piece is correct.

Do I understand correctly?

Also, suppose the hash check of the piece fails, perhaps because the first block (block 0) is faulty or damaged. Then I have to request block 0 and block 1 (which was valid, and which is also part of piece 1) for retransfer.

All of a sudden the piece and block bookkeeping gets trickier than I expected, and I hope there is an easier solution.

Any thoughts?


bittorrent

Ratatouille 02 june 17 at 8:48


3 answers

The maximum block size generally accepted by clients is 16 KiB. Clients may make smaller requests.

Piece sizes are usually a multiple of 16 KiB, but the current spec does not require it (this changes with BEP 52), and some people use prime numbers or similar sizes for fun, which is why such pieces exist in the wild.

Blocks only exist in the sense that you need multiple requests to get a full piece larger than 16 KiB. In other words, a block is whatever you choose to request. You can ask for 500 bytes, then 1017 bytes, then 13016 bytes, ... until you have the full piece. Blocks are arbitrary subdivisions within a piece; there is no overlap, and you just have to keep track of where each piece starts and ends.

They play no part in hashing, and they do not appear in HAVE or BITFIELD messages. Only REQUEST, PIECE, CANCEL and REJECT messages deal with blocks. Instead of blocks, you could just as well call them offset-length ranges or something similar.
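A minimal sketch of that bookkeeping in Python, assuming the client simply records each received (offset, length) range of a piece and treats the piece as complete once the ranges cover it without gaps. ReceivedRanges is a hypothetical name, not part of any library, and the 14533-byte piece is made up to fit the 500/1017/13016 example above.

```python
class ReceivedRanges:
    """Records which byte ranges of one piece have arrived (hypothetical helper)."""

    def __init__(self, piece_size: int) -> None:
        self.piece_size = piece_size
        self.ranges: list[tuple[int, int]] = []  # (start, end) pairs, end exclusive

    def add(self, offset: int, length: int) -> None:
        """Record one received block; it must lie entirely inside the piece."""
        end = offset + length
        if offset < 0 or length < 1 or end > self.piece_size:
            raise ValueError("block does not fit inside the piece")
        self.ranges.append((offset, end))

    def is_complete(self) -> bool:
        """True once the recorded ranges cover [0, piece_size) with no gaps."""
        covered = 0
        for start, end in sorted(self.ranges):
            if start > covered:
                return False  # gap before this range
            covered = max(covered, end)
        return covered >= self.piece_size


# Arbitrary block sizes, received out of order: 500 + 1017 + 13016 = 14533 bytes.
ranges = ReceivedRanges(14533)
ranges.add(0, 500)
ranges.add(1517, 13016)
print(ranges.is_complete())  # False: bytes 500..1516 are still missing
ranges.add(500, 1017)
print(ranges.is_complete())  # True
```

Nothing here depends on the block sizes being equal, which is the point: the only invariant is that every block stays inside its piece.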


the8472 02 june 17 at 14:05


I will use the less ambiguous term "chunk" instead of "block".

The torrent is divided into pieces.
A piece is divided into chunks.
A chunk is cut from one piece.

The torrent is divided into pieces when the torrent is created. A piece is in turn divided into chunks by the downloading BitTorrent client, through its Request messages.
How the client cuts chunks out of a piece doesn't matter, as long as no chunk is larger than 16 KiB (16384 bytes).
The easiest and most sensible way is to divide a piece into as few chunks as possible: split it into 16 KiB chunks and let the last chunk of the piece be smaller if needed.
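A minimal Python sketch of that splitting strategy, applied to the 26000-byte piece from the question; split_piece and MAX_CHUNK are hypothetical names.

```python
MAX_CHUNK = 16384  # largest chunk clients are expected to accept (16 KiB)


def split_piece(piece_size: int, chunk_size: int = MAX_CHUNK) -> list[tuple[int, int]]:
    """Cut one piece into (offset, length) chunks; only the last one may be shorter."""
    return [
        (offset, min(chunk_size, piece_size - offset))
        for offset in range(0, piece_size, chunk_size)
    ]


# The 26000-byte piece from the question needs two requests, both for piece 0.
print(split_piece(26000))  # [(0, 16384), (16384, 9616)]
```

Because offsets restart at 0 for every piece, a chunk can never straddle two pieces, so the "overlapping block" from the question never arises.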

Request message format: <len=0013><id=6><Piece_index><Chunk_offset><Chunk_length>

<Piece_index>

an integer specifying the zero-based piece index

<Chunk_offset>

an integer specifying the zero-based byte offset within the piece

<Chunk_length>

an integer specifying the number of bytes requested
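In the peer wire protocol every integer is 4 bytes, big-endian, so this layout maps directly onto Python's struct module. A sketch; encode_request and decode_request are hypothetical names.

```python
import struct


def encode_request(piece_index: int, chunk_offset: int, chunk_length: int) -> bytes:
    """Build <len=0013><id=6><Piece_index><Chunk_offset><Chunk_length>."""
    # ">IBIII": big-endian, no padding: 4-byte length, 1-byte id, three 4-byte integers
    return struct.pack(">IBIII", 13, 6, piece_index, chunk_offset, chunk_length)


def decode_request(message: bytes) -> tuple[int, int, int]:
    """Inverse of encode_request: return (Piece_index, Chunk_offset, Chunk_length)."""
    length, msg_id, index, offset, size = struct.unpack(">IBIII", message)
    if length != 13 or msg_id != 6:
        raise ValueError("not a Request message")
    return index, offset, size


msg = encode_request(0, 16384, 9616)  # second chunk of the 26000-byte piece
print(len(msg))  # 17
print(decode_request(msg))  # (0, 16384, 9616)
```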

When requesting a chunk:

- the whole chunk must lie inside the piece indicated by Piece_index, i.e. Chunk_offset + Chunk_length must be less than or equal to the size of that particular piece*;
- Chunk_length cannot be more than 16 KiB (16384 bytes) and must be at least 1 byte;
- the peer that receives the request must have the piece specified by Piece_index.

If any of these conditions is not met, the peer receiving the request will close the connection.

* For all pieces except the last one, this size is the 'piece length' defined in the info dictionary.
The size of the last piece can be calculated as:
size_last_piece = size_of_torrent - (number_of_pieces - 1) * 'piece length'
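A sketch of the footnote's formula combined with the request conditions, as a receiving peer might apply them before answering. The function names and the `have` set are hypothetical; number_of_pieces is derived here by ceiling division (it also equals the length of the 'pieces' string divided by 20).

```python
MAX_CHUNK = 16384  # 16 KiB


def piece_size(index: int, size_of_torrent: int, piece_length: int) -> int:
    """'piece length' for every piece except the last, which may be shorter."""
    number_of_pieces = (size_of_torrent + piece_length - 1) // piece_length  # ceiling
    if not 0 <= index < number_of_pieces:
        raise IndexError("piece index out of range")
    if index < number_of_pieces - 1:
        return piece_length
    return size_of_torrent - (number_of_pieces - 1) * piece_length


def request_is_valid(piece_index: int, chunk_offset: int, chunk_length: int,
                     size_of_torrent: int, piece_length: int, have: set[int]) -> bool:
    """Apply the three Request conditions; `have` holds indices of pieces we own."""
    try:
        size = piece_size(piece_index, size_of_torrent, piece_length)
    except IndexError:
        return False
    if not 1 <= chunk_length <= MAX_CHUNK:  # 1 byte .. 16 KiB
        return False
    if chunk_offset < 0 or chunk_offset + chunk_length > size:  # must stay inside the piece
        return False
    return piece_index in have  # receiver must have the piece


# Hypothetical 60000-byte torrent with 26000-byte pieces: sizes 26000, 26000, 8000.
print(piece_size(2, 60000, 26000))  # 8000
print(request_is_valid(0, 16384, 9616, 60000, 26000, {0, 2}))  # True
print(request_is_valid(0, 16384, 16384, 60000, 26000, {0, 2}))  # False: crosses into piece 1
```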


Encombe 05 june 17 at 21:33


Andrei Tomashpolskiy · Accepted Answer · 2017-06-02T09:49:50+0000

The last block in a piece may be smaller than the transfer block size. That is, 26000 - 16384 = 9616 bytes must be requested in the second REQUEST message. Once all 26000 bytes have been received, the SHA-1 hash must be calculated and compared with the corresponding checksum from the 'pieces' field of the metainfo's info dictionary. If the checksum does not match, you have no way of knowing which block contains invalid data and must re-download all blocks of that piece.
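A sketch of that check using Python's hashlib. The 'pieces' value is the concatenation of one 20-byte SHA-1 digest per piece, so piece i's expected hash is the slice [20*i, 20*(i+1)). piece_is_valid is a hypothetical name, and the two-piece torrent is built in memory purely for illustration.

```python
import hashlib


def piece_is_valid(index: int, piece_data: bytes, pieces: bytes) -> bool:
    """Compare the SHA-1 of a reassembled piece with its entry in 'pieces'."""
    # 'pieces' is the concatenation of one 20-byte SHA-1 digest per piece.
    expected = pieces[index * 20:(index + 1) * 20]
    return len(expected) == 20 and hashlib.sha1(piece_data).digest() == expected


# Hypothetical two-piece torrent built in memory for illustration.
piece0, piece1 = b"a" * 26000, b"b" * 8000
pieces = hashlib.sha1(piece0).digest() + hashlib.sha1(piece1).digest()
print(piece_is_valid(0, piece0, pieces))  # True
corrupted = b"x" + piece0[1:]  # one bad byte somewhere in block 0
print(piece_is_valid(0, corrupted, pieces))  # False: re-request every block of piece 0
```

The hash only says whether the whole piece is good, which is why a failure means discarding all of its blocks rather than a single one.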

My advice would be not to depend on any particular way of splitting a piece into blocks, because: 1) peers may use a different transfer block size when requesting data; 2) SHA-1 is a block-based algorithm, and it is better to feed the digest larger blocks (otherwise the calculation will take longer).
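Both points can be seen with Python's hashlib: update() can be called any number of times, and the resulting digest depends only on the bytes fed in, not on how they were sliced. A sketch; sha1_in_reads is a hypothetical name and the read sizes are arbitrary choices.

```python
import hashlib


def sha1_in_reads(data: bytes, read_size: int) -> bytes:
    """Feed `data` to SHA-1 in slices of `read_size` bytes."""
    digest = hashlib.sha1()
    view = memoryview(data)  # slicing a memoryview avoids copying the data
    for start in range(0, len(view), read_size):
        digest.update(view[start:start + read_size])
    return digest.digest()


piece = bytes(range(256)) * 100  # 25600 bytes of sample data
# Same digest for 16 KiB slices, 1 MiB slices, or one call over the whole piece,
# so the verifier can use large reads (fewer update() calls) whatever the block size.
print(sha1_in_reads(piece, 16384) == sha1_in_reads(piece, 1 << 20) == hashlib.sha1(piece).digest())
```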

A natural abstraction for a piece would be a generic data range with the following methods:

read(from:int, length:int):byte[]
write(offset:int, block:byte[]):()

Then you will be able to read / write arbitrary subranges of data.
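A minimal in-memory Python sketch of that abstraction, assuming the whole piece fits in memory. DataRange is a hypothetical name, and the `from` parameter becomes `start` because `from` is a reserved word in Python.

```python
class DataRange:
    """Fixed-size byte range supporting reads/writes of arbitrary subranges."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)  # zero-filled until written

    def read(self, start: int, length: int) -> bytes:
        # `start` stands in for `from`, which is a reserved word in Python.
        if start < 0 or length < 0 or start + length > len(self._buf):
            raise IndexError("read outside the data range")
        return bytes(self._buf[start:start + length])

    def write(self, offset: int, block: bytes) -> None:
        if offset < 0 or offset + len(block) > len(self._buf):
            raise IndexError("write outside the data range")
        self._buf[offset:offset + len(block)] = block


piece = DataRange(26000)
piece.write(16384, b"\x01" * 9616)  # blocks may arrive in any order and size
piece.write(0, b"\x00" * 16384)
print(piece.read(16380, 8))  # b'\x00\x00\x00\x00\x01\x01\x01\x01', spans both writes
```

Because reads and writes take arbitrary offsets, the hashing code can read the piece in whatever slice size suits it, independently of the block sizes used on the wire.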
