How to fix java.lang.OutOfMemoryError: GC overhead limit exceeded?

This is my Java code where I am requesting data from Hive using Apache Spark SQL.

JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("LoadData").setMaster("MasterUrl"));
HiveContext sqlContext = new HiveContext(ctx.sc());
List<Row> result = sqlContext.sql("Select * from Tablename").collectAsList();


When I run this code it throws java.lang.OutOfMemoryError: GC overhead limit exceeded. How can I fix this problem, or how do I increase the memory in the Spark configuration?





2 answers


If you run it with spark-shell, you can use the --driver-memory option to raise the driver's memory limit:

spark-shell --driver-memory Xg [other options]

If it is the executors that are running out of memory, you can adjust their limit with --executor-memory XG.



You can find more details on how to set these exactly in the Spark documentation: the tuning guide for executor memory and the configuration guide for driver memory.

@Edit: since you are running it from NetBeans, you should pass them as JVM arguments: -Dspark.driver.memory=XG and -Dspark.executor.memory=XG. I think that is set in Project Properties under Run.
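If you would rather set the executor memory from code, here is a minimal sketch (the 4g value is just a placeholder). Note that spark.driver.memory usually has no effect when set this way, because the driver JVM is already running by the time the SparkConf is read, so the driver side still needs the JVM argument or launcher option above:

SparkConf conf = new SparkConf()
        .setAppName("LoadData")
        .setMaster("MasterUrl")
        .set("spark.executor.memory", "4g"); // placeholder size; takes effect because executors are launched later
JavaSparkContext ctx = new JavaSparkContext(conf);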





Have you found a solution to your problem yet? Please share it if you have :D

Here is my idea: RDD, as well as JavaRDD, has a method toLocalIterator(), and the Spark docs say that

The iterator will consume as much memory as the largest partition in this RDD.



This means the iterator will consume less memory than the List. If the RDD is split into many partitions, you can try this:

// Iterate over the rows one partition at a time instead of collecting them all into a List on the driver
Iterator<Row> iter = sqlContext.sql("Select * from Tablename").javaRDD().toLocalIterator();
while (iter.hasNext()){
    Row row = iter.next();
    //your code here
}


PS: this is just an idea and I haven't tested it yet.
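Building on that, since the iterator only needs as much memory as the largest partition, you could also repartition the RDD first so that each partition is smaller. A minimal sketch (the partition count of 100 is an arbitrary placeholder; choose it based on the table size):

Iterator<Row> iter = sqlContext.sql("Select * from Tablename")
        .javaRDD()
        .repartition(100)   // placeholder: more, smaller partitions lower the per-partition memory footprint
        .toLocalIterator();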









