Blocks of similar text for test data

For testing purposes, I need to create sets of text files with similar but not identical text. Each set should differ from the others, but all sets should still have something in common.

For example, I might need to create 10 sets of 20 documents each, for 200 documents in total. Each document needs about 250 words.

If one set of documents is about dogs, then the other sets should be about other animals, for example, so that there is a weak link between the sets (in this case, animals) and a strong link between the documents within each set (for example, dogs in one set and cats in another).

The words in a document do not have to be in any particular order, and they do not have to form sentences or make sense.

Does anyone know how I can generate or obtain data of this type for my unit tests?


2 answers


How about grabbing some text from Project Gutenberg?
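As a rough sketch of how downloaded text could be turned into the sets described in the question: pick one plain-text source per topic (say, one book about dogs and one about cats) and cut each into fixed-size documents. The function names `split_into_documents` and `build_sets` are hypothetical, and the sketch assumes the texts have already been saved and read into strings by hand; it does not download anything.

```python
def split_into_documents(text, words_per_doc=250, docs_per_set=20):
    """Cut one source text into at most docs_per_set documents
    of exactly words_per_doc words each (hypothetical helper)."""
    words = text.split()  # whitespace-separated tokens; order is kept
    docs = []
    for start in range(0, len(words), words_per_doc):
        chunk = words[start:start + words_per_doc]
        # Stop at a short trailing chunk or once the set is full.
        if len(chunk) < words_per_doc or len(docs) == docs_per_set:
            break
        docs.append(" ".join(chunk))
    return docs


def build_sets(texts_by_topic, words_per_doc=250, docs_per_set=20):
    """Map each topic name to the documents cut from its source text."""
    return {
        topic: split_into_documents(text, words_per_doc, docs_per_set)
        for topic, text in texts_by_topic.items()
    }


if __name__ == "__main__":
    # Stand-in strings; real use would read the saved text files instead.
    sources = {
        "dogs": "the dog is an animal that barks " * 1000,
        "cats": "the cat is an animal that purrs " * 1000,
    }
    for topic, docs in build_sets(sources).items():
        print(topic, len(docs), "documents")
```

Documents cut from the same source share its vocabulary (the strong link), while sources chosen from a common subject area overlap only loosely (the weak link). A source needs at least 5,000 words to fill a set of 20 documents of 250 words.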





I needed a text indexing test case to compare Solr indexing speed, so I downloaded source code from GitHub as a zip file. For example, this huge one: https://github.com/spring-projects/spring-framework (use "download as zip").
