How to fix java.lang.OutOfMemoryError: GC overhead limit exceeded?
This is my Java code, where I request data from Hive using Apache Spark SQL:
JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("LoadData").setMaster("MasterUrl"));
HiveContext sqlContext = new HiveContext(ctx.sc());
List<Row> result = sqlContext.sql("Select * from Tablename").collectAsList();
When I run this code, it throws java.lang.OutOfMemoryError: GC overhead limit exceeded. How do I fix this problem, or how do I increase the memory in the Spark configuration?
If you run it with spark-shell, you can use the --driver-memory flag to raise the driver's memory limit:
spark-shell --driver-memory Xg [other options]
If the executors are having problems, you can adjust their memory limit with --executor-memory XG.
You can find more details on how to set these exactly in the guides: the submitting-applications guide for executor memory, and the configuration guide for driver memory.
@Edit: since you are running it from NetBeans, you should pass them as JVM arguments: -Dspark.driver.memory=XG and -Dspark.executor.memory=XG. I think this is under Project Properties, in the Run category.
Have you found a solution to your problem yet? Please share it if you have :D
Here is my idea: RDD (and JavaRDD as well) has a method toLocalIterator()
, and the Spark docs say that
The iterator will consume as much memory as the largest partition in this RDD.
This means the iterator will consume less memory than the List if the RDD is split into many partitions, so you can try this:
Iterator<Row> iter = sqlContext.sql("Select * from Tablename").javaRDD().toLocalIterator();
while (iter.hasNext()) {
    Row row = iter.next();
    // your code here
}
PS: this is just an idea, and I haven't tested it yet.
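To illustrate why iterating partition by partition uses less memory than collecting everything into one List, here is a minimal plain-Java sketch of the same idea, with no Spark dependency. LocalIteratorSketch, loadPartition, and the row strings are all hypothetical names invented for this example; only the principle (at most one partition's rows held in memory at a time) mirrors toLocalIterator().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LocalIteratorSketch {

    // Hypothetical stand-in for one RDD partition: a block of rows
    // that is only materialized when asked for.
    static List<String> loadPartition(int index) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            rows.add("partition" + index + "-row" + i);
        }
        return rows;
    }

    // Walk all partitions one at a time: only the current partition's
    // rows are in memory, never the whole dataset at once.
    static Iterator<String> localIterator(int numPartitions) {
        return new Iterator<String>() {
            int partition = 0;
            Iterator<String> current = Collections.emptyIterator();

            public boolean hasNext() {
                while (!current.hasNext() && partition < numPartitions) {
                    current = loadPartition(partition++).iterator();
                }
                return current.hasNext();
            }

            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        int count = 0;
        Iterator<String> iter = localIterator(4);
        while (iter.hasNext()) {
            iter.next();
            count++;
        }
        System.out.println(count); // prints 12 (4 partitions x 3 rows)
    }
}
```

The trade-off is the same as with toLocalIterator(): you give up random access and pay one fetch per partition, but peak memory is bounded by the largest partition instead of the whole result set.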