preg_split without removing the search pattern

I have several thousand text files to parse. Each one is a product catalog that follows a fixed pattern.

Every product starts with two serial numbers. I split the text on the first one to get an array in which each element is a product.

The problem is that the serial number I used in preg_split is being removed from the product and I need it.

Here's the raw product:

1532.000028-01532.213.00010875-8
TRES ANÉIS, DOIS PENDENTES, DOIS BRINCOS, SENDO UM 
COM 
TARRACHA DE METAL NÃO NOBRE, DE: OURO, OURO BRANCO BAIXO; 
CONTÉM: diamantes, pérola cultivada, pedra, massa; CONSTAM: amassada(s), 
incompleta(s), PESO LOTE: 13,50G (TREZE GRAMAS E CI NQUENTAR$ 901,00
Valor Grama: 66,74

      

The first line holds the two serial numbers. They are run together because of flaws in the PDF parser.

Here's the regex I'm using to split the text into products:

$texto = preg_split("/([0-9]{4}[.][0-9]{6}[-][0-9]{1})+/",$texto);

      

Output:

1532.213.00010875-8
TRES ANÉIS, DOIS PENDENTES, DOIS BRINCOS, SENDO UM 
COM 
TARRACHA DE METAL NÃO NOBRE, DE: OURO, OURO BRANCO BAIXO; 
CONTÉM: diamantes, pérola cultivada, pedra, massa; CONSTAM: amassada(s), 
incompleta(s), PESO LOTE: 13,50G (TREZE GRAMAS E CI NQUENTAR$ 901,00
Valor Grama: 66,74

      

As you can see, the first serial number is removed from the output, and I need it. How can I split the text into products while keeping both serial numbers?



1 answer


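Change the capturing group to a lookahead. A lookahead only asserts that the serial number follows and does not consume it, so the split happens just before the serial number and nothing is removed. For example: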



$texto = preg_split("/(?=[0-9]{4}[.][0-9]{6}[-][0-9]{1})/",$texto);
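To see the effect on more than one product, here is a minimal sketch. The second product in the sample string is made up for illustration, and the variable names `$padrao` and `$produtos` are my own. It also passes `PREG_SPLIT_NO_EMPTY`, which is not in the line above: when the text itself begins with a serial number, the lookahead matches at the very start, and this flag discards the empty piece that would otherwise come first.

```php
<?php
// Two products back to back. The second one is made-up sample data.
$texto = "1532.000028-01532.213.00010875-8\n"
       . "TRES ANÉIS, DOIS PENDENTES\n"
       . "Valor Grama: 66,74\n"
       . "1532.000029-81532.213.00010876-6\n"
       . "UM ANEL DE OURO\n"
       . "Valor Grama: 70,10\n";

// (?=...) asserts that a serial number follows but consumes nothing,
// so the split point sits just before each serial number.
$padrao = '/(?=[0-9]{4}\.[0-9]{6}-[0-9])/';

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops empty pieces,
// such as the one before a serial number at the start of the text.
$produtos = preg_split($padrao, $texto, -1, PREG_SPLIT_NO_EMPTY);

foreach ($produtos as $i => $produto) {
    // Show only the first line of each product.
    echo $i, ': ', strtok($produto, "\n"), PHP_EOL;
}
```

Each element of `$produtos` should now begin with its own serial number, with the rest of the product text following it.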

      
