How to generate a UUID in MapReduce?

I want to write a MapReduce Java program where I need to generate a UUID for each record of a dataset in a CSV/TXT file. The data is customer data with a set of rows and columns. The CSV input is in an HDFS directory.

I just need to generate the UUIDs using MapReduce. I have an input file that has columns a, b and c and 5 lines. I want a column d with a UUID in each of the 5 rows, i.e. 5 different UUIDs.

How can I do this?

Here is the code for the Mapper class:

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Key/value types assume KeyValueTextInputFormat (Text key, Text value).
public class MapRed_Mapper extends Mapper<Text, Text, Text, Text> {

    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit a freshly generated UUID for every input record.
        Text uuid = new Text(UUID.randomUUID().toString());
        context.write(key, uuid);
    }
}

2 Answers


  • Approach using MapReduce (Java)

1) Read the lines from the text file in the map method of your Mapper class.

2) Add the UUID as an extra column in the reduce method, as shown below (use a single reducer so your CSV gets the UUID appended as an extra column).

3) Emit it through context.write (a sketch of all three steps follows).
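
A minimal sketch of these three steps, assuming plain TextInputFormat (byte offset as key); the class names and the pass-through key scheme are illustrative choices, not part of the original answer:

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Step 1: read each line; keep the byte offset as key so duplicate lines stay distinct.
public class PassThroughMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    public void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        context.write(offset, line);
    }
}

// Steps 2 and 3: append a fresh UUID as column d and emit it via context.write.
// TextOutputFormat joins key and value with the configured separator.
public class UuidReducer extends Reducer<LongWritable, Text, Text, Text> {
    @Override
    public void reduce(LongWritable offset, Iterable<Text> lines, Context context)
            throws IOException, InterruptedException {
        for (Text line : lines) {
            context.write(new Text(line), new Text(UUID.randomUUID().toString()));
        }
    }
}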

java.util.UUID has been available since JDK 5. It can generate a random UUID (universally unique identifier). To get the string value of the generated UUID, call uuid.toString():

    UUID uuid = UUID.randomUUID();
    String randomUUIDString = uuid.toString();

    System.out.println("Random UUID String = " + randomUUIDString);
    // System.out.println("UUID version       = " + uuid.version());
    // System.out.println("UUID variant       = " + uuid.variant());

For CSV generation, use TextOutputFormat. The default key/value separator is a tab character. Change the delimiter by setting the property mapreduce.output.textoutputformat.separator (mapred.textoutputformat.separator in the old API) in your driver:

conf.set("mapreduce.output.textoutputformat.separator", ",");
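
Putting it together, a driver could look like the sketch below (it assumes the PassThroughMapper and UuidReducer classes sketched above; input and output paths come from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UuidDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Emit comma-separated output instead of the default tab.
        conf.set("mapreduce.output.textoutputformat.separator", ",");

        Job job = Job.getInstance(conf, "append-uuid-column");
        job.setJarByClass(UuidDriver.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setReducerClass(UuidReducer.class);
        job.setNumReduceTasks(1); // a single reducer, as suggested above
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}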


  • Alternative approach (since you added the spark tag, I thought of pointing out the pointer below):

There is an existing answer on SO already; please see:

add-a-new-column-to-a-dataframe-new-column-i-want-it-to-be-a-uuid-generator

Then you can do the following to convert to CSV format:

df.write.format("com.databricks.spark.csv").save(filepath)
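
For reference, here is a rough sketch of the linked approach in Spark's Java API (this assumes Spark 2.3+, where UDF0 and the built-in CSV writer are available; the UDF name and the file paths are made up for illustration):

import java.util.UUID;

import static org.apache.spark.sql.functions.callUDF;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF0;
import org.apache.spark.sql.types.DataTypes;

public class UuidColumn {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("uuid-column").getOrCreate();

        // Register a zero-argument UDF that returns a fresh UUID string for each row.
        spark.udf().register("makeUuid",
                (UDF0<String>) () -> UUID.randomUUID().toString(),
                DataTypes.StringType);

        Dataset<Row> df = spark.read().option("header", "true").csv("/input/customers.csv");
        Dataset<Row> withUuid = df.withColumn("d", callUDF("makeUuid"));

        // Spark 2+ writes CSV natively, so com.databricks.spark.csv is not needed.
        withUuid.write().option("header", "true").csv("/output/customers_with_uuid");
    }
}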




Maybe I'm misreading the question, but you can just create a UUID for each call to map by doing:



@Override
public void map(Text key, Text value, Context context) throws IOException, InterruptedException
{
    context.write(key, new Text(UUID.randomUUID().toString()));
}
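
Since map is called once per input record, five input lines will produce five different UUIDs, which is exactly what the question asks for.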

