Scala MapReduce Framework giving type mismatch
I have a MapReduce framework in Scala that is based on several org.apache.hadoop libraries. It works fine with a simple word-count program. However, when I try to apply it to something more useful I hit a roadblock: I want to take a CSV file (really, any delimited file), use everything in the first column as a key, and then count the frequency of the keys.
The mapper looks like this:
class WordCountMapper extends Mapper[LongWritable, Text, Text, LongWritable] with HImplicits {
  protected override def map(lnNumber: LongWritable, line: Text, context: Mapper[LongWritable, Text, Text, LongWritable]#Context): Unit = {
    line.split(",", -1)(0) foreach (context.write(_, 1)) // splits the line and writes the first field
  }
}
The problem occurs in the 'line.split' line. When I try to compile it, I get this error:
found: Char
required: org.apache.hadoop.io.Text
line.split(",", -1)(0) should return a String that is passed as _ in the (_, 1) pair, but for some reason the compiler thinks it is a Char. I even added .toString to explicitly make it a String, but that didn't work either.
Any ideas are appreciated. Let me know what additional details I can provide.
Update:
Here is the list of imports:
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{Reducer, Job, Mapper}
import org.apache.hadoop.conf.{Configured}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConversions._
import org.apache.hadoop.util.{ToolRunner, Tool}
Here is the build.sbt code:
import AssemblyKeys._ // put this at the top of the file
assemblySettings
organization := "scala"
name := "WordCount"
version := "1.0"
scalaVersion:= "2.11.2"
scalacOptions ++= Seq("-no-specialization", "-deprecation")
libraryDependencies ++= Seq("org.apache.hadoop" % "hadoop-client" % "1.2.1",
"org.apache.hadoop" % "hadoop-core" % "latest.integration" exclude ("hadoop-core", "org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.class") ,
"org.apache.hadoop" % "hadoop-common" % "2.5.1",
"org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.5.1",
"commons-configuration" % "commons-configuration" % "1.9",
"org.apache.hadoop" % "hadoop-hdfs" % "latest.integration")
jarName in assembly := "WordCount.jar"
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{case s if s.endsWith(".class") => MergeStrategy.last
case s if s.endsWith(".xsd") => MergeStrategy.last
case s if s.endsWith(".dtd") => MergeStrategy.last
case s if s.endsWith(".xml") => MergeStrategy.last
case s if s.endsWith(".properties") => MergeStrategy.last
case x => old(x)
}
}
I actually solved this by not using the underscore notation and instead specifying the value directly in context.write. So instead of:
line.split(",", -1)(0) foreach (context.write(_,1))
I used:
context.write(line.split(",", -1)(0), 1)
I found a post online saying that Scala sometimes gets confused about datatypes when using _, and recommending that the value be specified explicitly instead. I'm not sure whether that is the real explanation, but in this case it solved the problem.
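The difference can be reproduced in plain Scala without any Hadoop dependencies; `write` below is a hypothetical stand-in for `context.write`:

```scala
object UnderscoreDemo extends App {
  // Hypothetical stand-in for context.write(key, value).
  def write(key: String, value: Int): String = s"$key -> $value"

  val line = "fruit,apple,banana"

  // foreach iterates over the Chars of the first field, so the
  // placeholder in write(_, 1) would be a Char, and this fails to
  // compile against a (String, Int) signature:
  // line.split(",", -1)(0) foreach (write(_, 1))

  // Passing the expression directly hands write the String it expects:
  println(write(line.split(",", -1)(0), 1)) // fruit -> 1
}
```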
I'm guessing line is implicitly converted to a String here (thanks, HImplicits?). Then we have
line.split(",", -1)(0) foreach somethingOrOther
which splits the string into multiple fields (.split(...)), takes the zeroth of those fields ((0)), and then applies somethingOrOther to each of the characters of that field (foreach).
That's where your Char comes from.
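You can watch the Chars appear in a plain Scala script (hypothetical data):

```scala
object CharDemo extends App {
  val line = "apple,1,foo"
  val first = line.split(",", -1)(0) // "apple": a String

  // A String is a sequence of Chars, so foreach yields one Char at a time:
  first foreach { c => println(c) } // prints a, p, p, l, e on separate lines
}
```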