How do I write a Java program that filters out all commented lines and prints only the Java code lines?

I tried using a regex to filter out single-line and multi-line comments from my text file. I can filter out comments like these:

//it works
/*
* welcome
*/
/* hello*/

      

but I can't remove the following comment:

/*
sample
*/

      

This is my code:

import java.io.*;
import java.lang.*;


class TestProg
{
public static void main(String[] args) throws IOException {
    removeComment();
}
static void removeComment() throws IOException
{
    try {
        BufferedReader br = new BufferedReader(new FileReader("d:\\data.txt"));
        String line;
        while((line = br.readLine()) != null){
            if(line.contains("/*") && line.contains("*/") || line.contains("//")) {

                System.out.println(line.replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)","")); 
            }
            else if(line.contains("/*") || line.contains("*") || line.contains("*/")) {

                continue;
            }
            else
                System.out.println(line); 
        }
        br.close();
    }

    catch(IOException e) {
        System.out.println("OOPS! File could not read!");
    }
}
}

      

Please help me to solve this problem ...

Thanks in advance.

+3



4 answers


Using javaparser you can solve it as shown in this PoC.

RemoveAllComments

import japa.parser.JavaParser;
import japa.parser.ParseException;
import japa.parser.ast.CompilationUnit;
import japa.parser.ast.Node;
import java.io.File;
import java.io.IOException;

public class RemoveAllComments {

    static void removeComments(Node node) {
        for (Node child : node.getChildrenNodes()) {
            child.setComment(null);
            removeComments(child);
        }
    }

    public static void main(String[] args) throws ParseException, IOException {
        File sourceFile = new File("TestClass.java");
        CompilationUnit cu = JavaParser.parse(sourceFile);
        removeComments(cu);
        System.out.println(cu.toString());
    }
}

      

TestClass.java is used as the example input source:



/**
 * javadoc comment
 */
class TestClass {

    /*
     * block comment
     */
    static class Cafebabe {
    }

    // line comment
    static interface Commentable {
    }

    public static void main(String[] args) {
    }
}

      

Output to stdout (saving it to a file is up to you):

class TestClass {

    static class Cafebabe {
    }

    static interface Commentable {
    }

    public static void main(String[] args) {
    }
}

      

+1



Try this code:

import java.io.*;

class Test {

    public static void main(String[] args) {
        removeComment();
    }

    static void removeComment() {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("d:\\fmt.txt"))) {
            String line;
            boolean comment = false; // true while inside a /* ... */ block
            while ((line = br.readLine()) != null) {
                if (comment) {
                    // skip lines until the block comment closes
                    if (line.contains("*/")) {
                        comment = false;
                    }
                    continue;
                }
                if (line.contains("/*")) {
                    // stay in comment mode only if the block does not close on this same line
                    comment = line.indexOf("*/", line.indexOf("/*") + 2) < 0;
                    continue;
                }
                if (line.contains("//")) {
                    continue; // drops the whole line, including any code before the //
                }
                System.out.println(line);
            }
        } catch (IOException e) {
            System.out.println("OOPS! File could not be read!");
        }
    }
}

      

I used the following code as input:



package test;
public class ClassA extends SuperClass {
 /**
 * 
 */
    public void setter(){
    super.set(10);
    }
  /*  public void printer(){
    super.print();
    }
*/    
    public static void main(String[] args) {
//  System.out.println("hi");
    }    
}

      

The output:

package test;
public class ClassA extends SuperClass {
    public void setter(){
    super.set(10);
    }
    public static void main(String[] args) {
    }    
}

      

0



Since you are reading each line separately, a single regex cannot match a comment that spans several lines. Instead, you have to look on each line for single-line comments (//.*) and for the beginning (/\*.*) and end (.*\*/) of multi-line comments. When you find the start of a multi-line comment, remember that and treat everything as a comment until you encounter the matching end.

Example:

// assumes br and line are declared as in the question
boolean inComment = false;
while ((line = br.readLine()) != null) {
  if (inComment) {
    if (line.contains("*/")) {
      // end of multiline comment: remove everything up to the first */
      // note the reluctant quantifier *?, which matches as little as possible
      // (a greedy .* would run on to the last */ in the line)
      System.out.println(line.replaceFirst(".*?\\*/", ""));
      inComment = false;
    }
    // otherwise we are inside a multiline comment: ignore the entire line
    continue;
  }
  if (line.contains("//")) {
    // single-line comment: remove everything after the first //
    System.out.println(line.replaceAll("//.*", ""));
  } else if (line.contains("/*")) {
    // start of multiline comment: remove everything after the first /*
    System.out.println(line.replaceAll("/\\*.*", ""));
    // stay in comment mode unless the comment also ends on this line
    inComment = !line.substring(line.indexOf("/*") + 2).contains("*/");
  } else {
    // ordinary line: print it unchanged
    System.out.println(line);
  }
}

      

Edit: important addition

In your question you talk about text files, which usually have a regular structure, so my answer applies to them.

But as the title says, your files contain Java code, which is an irregular problem domain. In that case you cannot apply regexes safely, and it is better to use a Java parser.

For more information, see: RegEx match open tags except XHTML self-contained tags. That question is about applying regular expressions to HTML, but the same holds for Java source code, since both are irregular problem domains.
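To make the point concrete: a string literal such as "http://example.com" contains //, so any line-based regex will cut it in half. Short of a full parser, one middle ground is a small character-level scanner that tracks whether it is in code, in a comment, or in a literal. The following is only a minimal sketch under that assumption; the class name CommentStripper and the method strip are made up for illustration, and Java text blocks (""") and unicode escapes are not handled.

```java
public class CommentStripper {

    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR }

    /** Removes // and block comments, leaving string and char literals untouched. */
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            // '\0' is used as an "end of input" sentinel for the lookahead
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        i++; // skip the second '/'
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        i++; // skip the '*'
                    } else {
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                        out.append(c);
                    }
                    break;
                case LINE_COMMENT:
                    // a line comment ends at the line break, which is kept
                    if (c == '\n') {
                        state = State.CODE;
                        out.append(c);
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i++; // skip the closing '/'
                    }
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    if (c == '\\' && next != '\0') {
                        out.append(next); // copy the escaped character verbatim
                        i++;
                    } else if (c == (state == State.STRING ? '"' : '\'')) {
                        state = State.CODE; // closing quote
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "String url = \"http://example.com\"; // trailing\n/*\nsample\n*/\nint x = 1;\n";
        System.out.print(strip(src));
    }
}
```

The sketch keeps the line break that ends a // comment and any whitespace around removed comments, so it may leave blank or space-only lines behind; for anything beyond a quick cleanup, a real parser is still the safer choice.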

-1



Try using the following code:

// Read the entire file into a string 
BufferedReader br = new BufferedReader(new FileReader("filename"));
StringBuilder builder = new StringBuilder();
int c;
while((c = br.read()) != -1){
    builder.append((char) c);
}
String fileData = builder.toString();


// Remove comments
String fileWithoutComments = fileData.replaceAll("([\\t ]*\\/\\*(?:.|\\R)*?\\*\\/[\\t ]*\\R?)|(\\/\\/.*)", "");
System.out.println(fileWithoutComments);

      

It first reads the entire file into a single string and then removes all comments from it. An explanation of the regex can be found here: https://regex101.com/r/vK6lC4/3
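To see the whole-string idea work on the comment from the question without touching a file, here is a small self-contained sketch. It uses a simplified pattern rather than the exact regex above: Pattern.DOTALL lets the reluctant .*? cross line breaks inside a block comment. The class name RegexCommentDemo and the method removeComments are hypothetical, and the pattern is still fooled by // or /* inside string literals.

```java
import java.util.regex.Pattern;

public class RegexCommentDemo {

    // Either a block comment (reluctant, may span lines) or a line comment up to the line break.
    private static final Pattern COMMENTS =
            Pattern.compile("/\\*.*?\\*/|//[^\\r\\n]*", Pattern.DOTALL);

    static String removeComments(String source) {
        return COMMENTS.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        // the multi-line comment from the question, plus a trailing line comment
        String source = "int a;\n/*\nsample\n*/\nint b; // trailing\n";
        System.out.print(removeComments(source));
    }
}
```

Because the comment text is replaced with an empty string, the line breaks around it remain, so the result can contain empty lines where comments used to be.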

-1

