Parquet: reading individual columns into memory
If you can use Hive, creating a Hive table over the file and issuing a simple SELECT query would be the easiest option:
create external table tbl1 (<columns>) stored as parquet location '<file_path>';
select col1, col2 from tbl1;
-- this works in Hive 0.14
You can also run these queries from a Java program through the Hive JDBC driver.
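A minimal sketch of the JDBC route, assuming a HiveServer2 instance is reachable; the URL, user, and table name are placeholders for your cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSelect {
    public static void main(String[] args) throws Exception {
        // Standard Hive JDBC driver class (hive-jdbc on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder connection URL; adjust host, port, database, credentials.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "user", "");
             Statement stmt = conn.createStatement();
             // Only the requested columns are read from the Parquet file.
             ResultSet rs = stmt.executeQuery("select col1, col2 from tbl1")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
        }
    }
}
```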
Otherwise, if you want to stay entirely in Java, you need to modify the Avro schema to exclude all fields except the ones you want to extract. Then, when you read the file, supply the modified schema as the reader schema, and only the included columns will be read. Note that you will still get Avro records (with the excluded fields missing), not a plain 2D array of values.
To build the modified schema, look at org.apache.avro.Schema and org.apache.avro.SchemaBuilder. Make sure the modified schema is compatible with the original one.
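A sketch of this projection approach using parquet-avro, assuming a record named "MyRecord" with string column col1 and long column col2; the record name must match the one in the file's original schema, and all names here are illustrative:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroReadSupport;
import org.apache.parquet.hadoop.ParquetReader;

public class ParquetColumnProjection {
    public static void main(String[] args) throws Exception {
        // Projection schema: same record name as the full schema, but
        // containing only the fields we want to read.
        Schema projection = SchemaBuilder.record("MyRecord")
                .fields()
                .optionalString("col1")
                .optionalLong("col2")
                .endRecord();

        Configuration conf = new Configuration();
        // Ask the Avro read support to read only the projected columns.
        AvroReadSupport.setRequestedProjection(conf, projection);

        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(new Path(args[0]))
                         .withConf(conf)
                         .build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                // Each record exposes only the projected fields.
                System.out.println(record.get("col1") + "\t" + record.get("col2"));
            }
        }
    }
}
```

Because Parquet is columnar, the excluded columns are skipped on disk, not read and discarded.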