Word count with indexOf
I have to get the first 10 words of a blog post that is being read in, but my code won't do that. I can't use .split, String.isEmpty, or arrays, which leaves me with indexOf and substring. My code right now only gets the first 3 words. Any help?
This is what I should be using ....
String getSummary() method:
1. Returns up to the first ten words of the entry as a summary of the entry. If the entry is 10 words or fewer, the method returns the entire entry.
2. Possible logic: the indexOf method of the String class can find the position of a space. Use this in conjunction with a loop to find the first 10 words.
public class BlogEntry
{
private String username;
private Date dateOfBlog;
private String blog;
public BlogEntry()
{
username = "";
dateOfBlog = new Date();
blog = "";
}
public BlogEntry(String sName, Date dBlogDate, String sBlog)
{
username = sName;
dateOfBlog = dBlogDate;
blog = sBlog;
}
public String getUsername()
{
return username;
}
public Date getDateOfBlog()
{
return dateOfBlog;
}
public String getBlog()
{
return blog;
}
public void setUsername(String sName)
{
username = sName;
}
public void setDateOfBlog(Date dBlogDate)
{
dateOfBlog.setDate(dBlogDate.getMonth(), dBlogDate.getDay(), dBlogDate.getYear());
}
public void setBlog(String sBlog)
{
blog = sBlog;
}
public String getSummary()
{
String summary = "";
int position;
int wordCount = 0;
int start = 0;
int last;
position = blog.indexOf(" ");
while (position != -1 && wordCount < 10)
{
summary += blog.substring(start, position) + " ";
start = position + 1;
position = blog.indexOf(" ", position + 1);
wordCount++;
}
return summary;
}
public String toString()
{
return "Author: " + this.getUsername() + "\n\n" + "Date posted: " + this.getDateOfBlog() + "\n\n" + "Text body: " + this.getBlog();
}
}
Add this to your code:
public static void main(String[] args)
{
BlogEntry be = new BlogEntry("" , new Date(), "this program is pissing me off!");
System.out.println( be.getSummary() );
}
Produces this output:
this program is pissing me
That's not 3 words, it's 5. You should have 6. And that makes your mistake a lot easier to understand. You are experiencing a typical off-by-one error. You only add and count the words that appear before a space. This leaves out the last word, because it doesn't appear before a space, only after the last one.
Here's some code, close to what you started with, that can see all 6 words:
public String getSummary()
{
if (blog == null)
{
return "<was null>";
}
String summary = "";
int position;
int wordCount = 0;
int start = 0;
int last;
position = blog.indexOf(" ");
while (position != -1 && wordCount < 10)
{
summary += blog.substring(start, position) + " ";
start = position + 1;
position = blog.indexOf(" ", position + 1);
wordCount++;
}
if (wordCount < 10)
{
summary += blog.substring(start, blog.length());
}
return summary;
}
which, when tested with this:
public static void main(String[] args)
{
String[] testStrings = {
null //0
, ""
, " "
, " "
, " hi"
, "hi "//5
, " hi "
, "this program is pissing me off!"
, "1 2 3 4 5 6 7 8 9"
, "1 2 3 4 5 6 7 8 9 "
, "1 2 3 4 5 6 7 8 9 10"//10
, "1 2 3 4 5 6 7 8 9 10 "
, "1 2 3 4 5 6 7 8 9 10 11"
, "1 2 3 4 5 6 7 8 9 10 11 "
, "1 2 3 4 5 6 7 8 9 10 11 12"
, "1 2 3 4 5 6 7 8 9 10 11 12 "//15
};
ArrayList<BlogEntry> albe = new ArrayList<>();
for (String test : testStrings) {
albe.add(new BlogEntry("" , new Date(), test));
}
testStrings[0] = "<was null>";
for (int i = 0; i < albe.size(); i++ ) {
assert(albe.get(i).getSummary().equals(testStrings[Math.min(i,11)]));
}
for (BlogEntry be : albe)
{
System.out.println( be.getSummary() );
}
}
will produce this:
<was null>
hi
hi
hi
this program is pissing me off!
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
Also, I don't know where you are importing Date from, but neither import java.util.Date; nor import java.sql.Date; will make your code compile without errors. I had to comment out your setDate code.
If your instructor allows it, you can of course try the ideas in these other answers, but I thought you wanted to know what's going on.
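To make the off-by-one point concrete, here is a minimal sketch that looks for the place where the summary ends instead of building it word by word. It assumes words are separated by single spaces, as the code above does; the class name SummarySketch and the method firstWords are made up for illustration and are not part of the assignment.

```java
class SummarySketch {
    // Returns up to the first `limit` words of `text`.
    // Assumption: words are separated by single spaces.
    static String firstWords(String text, int limit) {
        if (limit <= 0) {
            return "";
        }
        int end = -1; // index of the most recent space found
        for (int words = 0; words < limit; words++) {
            end = text.indexOf(" ", end + 1);
            if (end == -1) {
                // Fewer than `limit` spaces: the text has `limit` words or
                // fewer, so the whole text is the summary.
                return text;
            }
        }
        // `end` is the space right after word number `limit`; cut there.
        return text.substring(0, end);
    }
}
```

Only indexOf and substring are used, and the last word is handled by the early return rather than by a separate check after the loop. Unlike the version above, this sketch does not leave a trailing space when the text is cut.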
I'm not sure how efficient that would be, but can you just truncate the string every time you grab the index? For example:
tempBlog contents:
This is a test
is a test
a test
test
summary contents:
This
This is
This is a
This is a test
public String getSummary()
{
String summary = "";
int wordCount = 0;
//Create a copy so you don't overwrite original blog
String tempBlog = blog;
while (wordCount < 10)
{
int space = tempBlog.indexOf(" ");
//No space left: the rest of the text is the last word, so add it and stop.
if (space == -1)
{
summary += tempBlog;
break;
}
summary += tempBlog.substring(0, space) + " ";
//Drop the word just copied (and its space) from the working copy.
tempBlog = tempBlog.substring(space + 1);
wordCount++;
}
return summary;
}
String.indexOf also provides an overload that lets you search starting from a specific index (API link). With this method, it's pretty easy:
public int countWort(String in , String word){
int count = 0;
int index = in.indexOf(word);
while(index != -1){
++count;
index = in.indexOf(word , index + 1);
}
return count;
}
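As a usage sketch, counting spaces with this loop gives the number of words, under the assumption that words are separated by single spaces with no leading or trailing space. The class name CountSketch and the helper wordCount are hypothetical; countWort is copied from above so the example is self-contained.

```java
class CountSketch {
    // Same loop as above: counts occurrences of `word` in `in`.
    // Because the search resumes at index + 1, overlapping matches are counted too.
    static int countWort(String in, String word) {
        int count = 0;
        int index = in.indexOf(word);
        while (index != -1) {
            ++count;
            index = in.indexOf(word, index + 1);
        }
        return count;
    }

    // Hypothetical helper: number of words = number of spaces + 1.
    // Assumption: single spaces between words, none at either end.
    static int wordCount(String text) {
        return text.length() == 0 ? 0 : countWort(text, " ") + 1;
    }
}
```

For instance, wordCount("a b c") gives 3, and countWort("this is it", "is") gives 2 because "this" also contains "is".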
Try this logic ...
public static void main(String[] args) throws Exception {
String data = "This one sentence has exactly 10 words in it ok";
int wordIndex = 0;
int spaceIndex = 0;
int wordCount = 0;
while (wordCount < 10 && spaceIndex != -1) {
spaceIndex = data.indexOf(" ", wordIndex);
System.out.println(spaceIndex > -1
? data.substring(wordIndex, spaceIndex)
: data.substring(wordIndex));
// The next word "should" be right after the space
wordIndex = spaceIndex + 1;
wordCount++;
}
}
Results:
This
one
sentence
has
exactly
10
words
in
it
ok
UPDATE
Isn't regex an option? With the help of regex you can try the following:
public static void main(String[] args) throws Exception {
String data = "The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog";
Matcher matcher = Pattern.compile("\\w+").matcher(data);
int wordCount = 0;
while (matcher.find() && wordCount < 10) {
System.out.println(matcher.group());
wordCount++;
}
}
Results:
The
quick
brown
fox
jumps
over
the
lazy
dog
The
The regular expression \w+ only matches words made of the characters [a-zA-Z_0-9].
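One consequence of that character set is that punctuation splits or trims words. The sketch below compares \w+ with \S+ (runs of non-whitespace characters), which may be closer to what a summary needs. The class name RegexWordsSketch and the helper firstMatches are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordsSketch {
    // Collects up to `limit` matches of `regex` found in `data`.
    static List<String> firstMatches(String regex, String data, int limit) {
        List<String> words = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(data);
        while (words.size() < limit && matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }
}
```

With the input "don't stop!", firstMatches("\\w+", ...) yields [don, t, stop], while firstMatches("\\S+", ...) yields [don't, stop!].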
I think we can find the index of the first 10 words by checking if the character is a space character. Here's an example:
public class FirstTenWords
{
public static void main( String[] args )
{
String sentence = "There are ten words in this sentence, I want them to be extracted";
String summary = firstOf( sentence, 10 );
System.out.println( summary );
}
public static String firstOf( String line, int limit )
{
boolean isWordMode = false;
int count = 0;
int i;
for( i = 0; i < line.length(); i++ )
{
char character = line.charAt( i );
if( Character.isSpaceChar( character ) )
{
if( isWordMode )
{
isWordMode = false;
}
}
else
{
if( !isWordMode )
{
isWordMode = true;
count++;
}
}
// Stop at the start of word number limit + 1, so that word number limit is kept whole.
if( count > limit )
{
break;
}
}
return line.substring( 0, i );
}
}
Output on my laptop:
There are ten words in this sentence, I want them
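One thing to watch with this kind of scan: Character.isSpaceChar does not treat tabs or newlines as spaces, while Character.isWhitespace does. If the blog text can contain line breaks, a variant along these lines might be safer. This is only a sketch; the class name WhitespaceSketch and the method firstWords are hypothetical.

```java
class WhitespaceSketch {
    // Returns the prefix of `line` that holds its first `limit` words,
    // treating any whitespace (space, tab, newline) as a separator.
    static String firstWords(String line, int limit) {
        if (limit <= 0) {
            return "";
        }
        int count = 0;          // words started so far
        boolean inWord = false; // true while scanning inside a word
        for (int i = 0; i < line.length(); i++) {
            if (Character.isWhitespace(line.charAt(i))) {
                if (inWord && count == limit) {
                    // Word number `limit` has just ended: cut here.
                    return line.substring(0, i);
                }
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                count++;
            }
        }
        return line; // `limit` words or fewer: keep everything
    }
}
```

Because the cut happens at the whitespace that ends the last wanted word, the result carries no trailing separator, and repeated spaces between words do not inflate the count.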