Java - splitting text into array without obvious delimiter
I need to split each line of text into an array using a loop. The problem is that there is no obvious delimiter used when formatting the text file (which I cannot change):
Adam Rippon      New York, NY    77.58144.6163.6780.94
Brandon Mroz     Broadmoor, CO   70.57138.1266.8471.28
Stephen Carriere Boston, MA      64.42138.8368.2770.56
Grant Hochstein  New York, NY    64.62133.8867.4468.44
Keegan Messing   Alaska, AK      61.15136.3071.0266.28
Timothy Dolensky Atlanta, AL     61.76123.0861.3063.78
Max Aaron        Broadmoor, CO   86.95173.4979.4893.51
Jeremy Abbott    Detroit, MI     99.86174.4193.4280.99
Jason Brown      Skokie Value,IL 87.47182.6193.3489.27
Joshua Farris    Broadmoor, CO   78.37169.6987.1783.52
Richard Dornbush All Year, CA    92.04144.3465.8278.52
Douglas Razzano  Coyotes, AZ     75.18157.2580.6976.56
Ross Miner       Boston, MA      71.94152.8772.5380.34
Sean Rabbit      Glacier, CA     60.58122.7656.9066.86
Lukas Kaugars    Broadmoor, CO   64.57114.7550.4766.28
Philip Warren    All Year, CA    55.80113.2457.0258.22
Daniel Raad      Southwest FL    52.98108.0358.6151.42
Scott Dyer       Brooklyn, OH    55.78100.9744.3357.64
Robert PrzepioskiRochester, NY   47.00100.3449.2651.08
Ideally, I would like each name to be in [0] (or the first name in [0] and the last name in [1]), each location in [2] (or split across two indexes for city and state), and then each score in its own index. There are four separate scores for each person. For example, Adam Rippon's scores are 77.58, 144.61, 63.67 and 80.94.
I cannot split on spaces because some cities have a space in their name (for example, New York would be split into New and York in two different array elements, while Broadmoor would stay in a single element). I cannot split the locations on commas either, because Southwest FL does not have a comma. I also cannot split the numbers on the decimal point, because those numbers would come out wrong. So is there an easy way to do this? Or is there perhaps a way to split the numbers by the number of decimal places?
It looks like every column has a fixed width. In your case, the first column is 17 characters long, the second 16 characters and the last 21 characters.
Now you can just iterate over the lines and use the substring() method. Something like...
String firstColumn = line.substring(0, 17).trim();   // name
String secondColumn = line.substring(17, 33).trim(); // location
String thirdColumn = line.substring(33).trim();      // scores, up to the end of the line
To extract the numbers, we can use a regular expression that matches every number with exactly two decimal places.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// one or more digits, a dot, then exactly two digits
Pattern pattern = Pattern.compile("(\\d+\\.[0-9]{2})");
Matcher matcher = pattern.matcher(thirdColumn);
while (matcher.find()) {
    System.out.println(matcher.group());
}
So in this case, 47.00100.3449.2651.08 will output:
47.00
100.34
49.26
51.08
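Putting the two steps together, a minimal runnable sketch might look like this. It assumes the column widths estimated above (17 characters for the name, 16 for the location) and that every score has exactly two decimal places; the class name ScoreColumns and the method parseScores are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScoreColumns {
    // Assumption: a score is one or more digits, a dot, then exactly two digits.
    private static final Pattern SCORE = Pattern.compile("\\d+\\.\\d{2}");

    // Splits the run-together score column into individual numeric values.
    public static List<Double> parseScores(String scoreColumn) {
        List<Double> scores = new ArrayList<>();
        Matcher matcher = SCORE.matcher(scoreColumn);
        while (matcher.find()) {
            scores.add(Double.parseDouble(matcher.group()));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Assumed layout: name in columns 0-16, location in 17-32, scores from 33 on.
        String line = "Robert PrzepioskiRochester, NY   47.00100.3449.2651.08";
        String name = line.substring(0, 17).trim();
        String location = line.substring(17, 33).trim();
        List<Double> scores = parseScores(line.substring(33));
        System.out.println(name + " | " + location + " | " + scores);
    }
}
```

Converting with Double.parseDouble drops trailing zeros when printed (47.00 becomes 47.0); keep the matched strings instead if the original text matters.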
It looks like each column has a fixed width (number of characters). As you said, you cannot split by tabs or spaces, because in the last line there is no tab or space between the name and the city.
I suggest reading one line at a time and then breaking the String up with line.substring(startIndex, endIndex). For example, line.substring(0, 17) for the name: the name column appears to be 17 characters wide, which Robert Przepioski fills exactly. Then you can split that name into first and last name using a space as the separator.
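A minimal sketch of the name part, assuming the name occupies the first 17 characters and that the first space separates the first name from the last name (a two-word first name would end up in the last name). NameSplit and splitName are hypothetical names.

```java
public class NameSplit {
    // Returns {firstName, lastName}; assumes the first space separates them.
    public static String[] splitName(String nameColumn) {
        String name = nameColumn.trim(); // drop the padding up to column 17
        int space = name.indexOf(' ');
        if (space < 0) {
            return new String[] { name, "" }; // no space: treat everything as the first name
        }
        return new String[] { name.substring(0, space), name.substring(space + 1).trim() };
    }

    public static void main(String[] args) {
        String line = "Robert PrzepioskiRochester, NY   47.00100.3449.2651.08";
        String[] parts = splitName(line.substring(0, 17));
        System.out.println(parts[0] + " / " + parts[1]); // Robert / Przepioski
    }
}
```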
Assuming the fields are fixed width, as they appear to be, you can use substring to get each field and then parse it accordingly. Something like:
String name = line.substring(0, x);
String city_state = line.substring(x, y);
String num1 = line.substring(y, z);
and so on, where x, y and z are the column breaks.
This looks like the old fixed-position (fixed-width) format, which was very popular in the days of punch cards.
So basically, you read the file line by line and then:
String name = line.substring(0,17).trim();
String location = line.substring(17,33).trim();
String[] scores = new String[4];
scores[0] = line.substring(33,38);
scores[1] = line.substring(38,44);
scores[2] = line.substring(44,49);
scores[3] = line.substring(49,54);
Then you can go ahead and split the name by space and the location by ',', convert the scores to numbers, etc.
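Splitting the location on ',' alone fails for Southwest FL, which has no comma. A sketch that instead cuts at whichever comes later, the last space or the last comma, assuming the state is always the final token of the location; LocationSplit and splitLocation are hypothetical names.

```java
public class LocationSplit {
    // Returns {city, state}. Assumes the state is the last token of the location,
    // separated from the city by a space, a comma, or both.
    public static String[] splitLocation(String locationColumn) {
        String location = locationColumn.trim();
        int cut = Math.max(location.lastIndexOf(' '), location.lastIndexOf(','));
        if (cut < 0) {
            return new String[] { location, "" }; // no separator found
        }
        String city = location.substring(0, cut).trim();
        if (city.endsWith(",")) {
            city = city.substring(0, city.length() - 1); // comma that preceded the space
        }
        return new String[] { city, location.substring(cut + 1).trim() };
    }

    public static void main(String[] args) {
        for (String loc : new String[] { "New York, NY", "Skokie Value,IL", "Southwest FL" }) {
            String[] parts = splitLocation(loc);
            System.out.println(parts[0] + " | " + parts[1]);
        }
    }
}
```

This prints New York | NY, Skokie Value | IL and Southwest | FL, one per line.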
If you want to make all of the above more general, you can prepare an array of field boundaries and fill a values array based on those boundaries:
int[] fieldIndexes = { 0, 17, 33, 38, 44, 49, 54 };
String[] values = new String[fieldIndexes.length - 1];
And then, in your read loop (again, I'm assuming you read each line into the variable line):
for (int i = 1; i < fieldIndexes.length; i++) {
    // each field runs from the previous boundary up to (not including) the next one
    values[i - 1] = line.substring(fieldIndexes[i - 1], fieldIndexes[i]).trim();
}
And then go to work with the values array.
Of course, make sure that every line you read has the expected number of characters, to avoid a StringIndexOutOfBoundsException.
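One way to avoid those out-of-bounds problems is to pad short lines with spaces before cutting, which also covers files where an editor has stripped trailing blanks (an assumption about why a line might be short). FixedWidth and split are hypothetical names. Note that the score boundaries 33, 38, 44, 49, 54 only hold while the second score has three digits before the decimal point and the others two, as in the sample data.

```java
import java.util.Arrays;

public class FixedWidth {
    // Boundaries from the answer above: name, location, then four scores.
    public static final int[] FIELD_INDEXES = { 0, 17, 33, 38, 44, 49, 54 };

    // Cuts a line at the given boundaries. Short lines are right-padded with spaces
    // so substring() never runs past the end; characters past the last boundary are ignored.
    public static String[] split(String line, int[] bounds) {
        int width = bounds[bounds.length - 1];
        StringBuilder padded = new StringBuilder(line);
        while (padded.length() < width) {
            padded.append(' ');
        }
        String[] values = new String[bounds.length - 1];
        for (int i = 1; i < bounds.length; i++) {
            values[i - 1] = padded.substring(bounds[i - 1], bounds[i]).trim();
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "Adam Rippon      New York, NY    77.58144.6163.6780.94";
        System.out.println(Arrays.toString(split(line, FIELD_INDEXES)));
    }
}
```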
Why don't you split by index? The scores are the tricky part, but if every score always has two digits after the decimal point, then this example might help.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Split {
    public static void main(String[] args) throws IOException {
        List<Person> lst = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\test\\file.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                Person p = new Person();
                // Name column: characters 0-16, "First Last"
                String[] name = line.substring(0, 17).trim().split(" ");
                p.setName(name[0].trim());
                p.setLastname(name[1].trim());
                // Location column: characters 17-32. Cut at the last space or comma,
                // so "New York, NY", "Skokie Value,IL" and "Southwest FL" all split correctly
                String location = line.substring(17, 33).trim();
                int cut = Math.max(location.lastIndexOf(' '), location.lastIndexOf(','));
                p.setCity(location.substring(0, cut).replace(",", "").trim());
                p.setState(location.substring(cut + 1).trim());
                // The four scores ("coordinates" here) all have two decimal places,
                // so each one ends two characters after its decimal point
                String[] coordinates = new String[4];
                String coor = line.substring(33);
                String first = coor.substring(0, coor.indexOf(".") + 3);
                coor = coor.substring(first.length());
                String second = coor.substring(0, coor.indexOf(".") + 3);
                coor = coor.substring(second.length());
                String third = coor.substring(0, coor.indexOf(".") + 3);
                coor = coor.substring(third.length());
                String fourth = coor.substring(0, coor.indexOf(".") + 3);
                coordinates[0] = first;
                coordinates[1] = second;
                coordinates[2] = third;
                coordinates[3] = fourth;
                p.setCoordinates(coordinates);
                lst.add(p);
            }
        }
        for (Person p : lst) {
            System.out.println(p.getName());
            System.out.println(p.getLastname());
            System.out.println(p.getCity());
            System.out.println(p.getState());
            for (String s : p.getCoordinates()) {
                System.out.println(s);
            }
            System.out.println();
        }
    }

    public static class Person {
        private String name;
        private String lastname;
        private String city;
        private String state;
        private String[] coordinates;

        public Person() {}

        public String getName() {
            return name;
        }
        public void setName(String name) {
            this.name = name;
        }
        public String getLastname() {
            return lastname;
        }
        public void setLastname(String lastname) {
            this.lastname = lastname;
        }
        public String getCity() {
            return city;
        }
        public void setCity(String city) {
            this.city = city;
        }
        public String getState() {
            return state;
        }
        public void setState(String state) {
            this.state = state;
        }
        public String[] getCoordinates() {
            return coordinates;
        }
        public void setCoordinates(String[] coordinates) {
            this.coordinates = coordinates;
        }
    }
}
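The four repeated indexOf(".") + 3 steps above can be folded into a loop. Like the code above, this sketch assumes every score has exactly two digits after the decimal point; it throws if the text ends in the middle of a number. DecimalSplit and splitScores are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class DecimalSplit {
    // Repeatedly cuts "digits . two digits" off the front of the string.
    public static List<String> splitScores(String scores) {
        List<String> result = new ArrayList<>();
        String rest = scores.trim();
        while (!rest.isEmpty()) {
            int dot = rest.indexOf('.');
            if (dot < 0 || dot + 3 > rest.length()) {
                throw new IllegalArgumentException("Malformed score text: " + rest);
            }
            result.add(rest.substring(0, dot + 3)); // up to two characters after the dot
            rest = rest.substring(dot + 3);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitScores("77.58144.6163.6780.94")); // [77.58, 144.61, 63.67, 80.94]
    }
}
```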
Read the file line by line, then cut each line at the appropriate column boundaries, e.g.:
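private static String[] split(String line) {
    // substring's end index is exclusive, so each field ends where the next one begins
    return new String[] {
        line.substring(0, 17).trim(),  // name
        line.substring(17, 33).trim(), // location
        line.substring(33, 38).trim(), // score 1
        line.substring(38, 44).trim(), // score 2
        line.substring(44, 49).trim(), // score 3
        line.substring(49, 54).trim(), // score 4
    };
}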
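String.substring(beginIndex, endIndex) excludes the character at endIndex, so a field whose last character sits at index 37 needs substring(33, 38). A quick demonstration on the last sample line, assuming the 17/16-character column layout (SubstringEnd is just a demo class name):

```java
public class SubstringEnd {
    public static void main(String[] args) {
        String line = "Robert PrzepioskiRochester, NY   47.00100.3449.2651.08";
        // endIndex is exclusive: (33, 37) stops one character short of the score
        System.out.println(line.substring(33, 37)); // 47.0
        System.out.println(line.substring(33, 38)); // 47.00
        // The same applies to the name: (0, 16) drops its last letter
        System.out.println(line.substring(0, 16));  // Robert Przepiosk
        System.out.println(line.substring(0, 17));  // Robert Przepioski
    }
}
```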