Errors Resolving org.apache.hadoop dependencies with SBT offline

I am trying to freeze the dependencies for a Spark project so that it can be built offline (i.e. where sbt can no longer download dependencies). This is the process I followed:

  • Create an sbt project and compile it with an internet connection.
  • Stop the internet connection.
  • Make sure the project keeps compiling.
  • Duplicate the sbt project and delete the target folder.
  • Point the resolvers in build.sbt at the dependency patterns in the ~/.ivy2/cache folder.

This is the build.sbt file:

name := "Test"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

resolvers += Resolver.file("Frozen IVY2 Cache Dependences", file("/home/luis/.ivy2/cache")) (Resolver.ivyStylePatterns) ivys "/home/luis/.ivy2/cache/[organisation]/[module]/ivy-[revision].xml"  artifacts  "/home/luis/.ivy2/cache/[organisation]/[module]/[type]s/[module]-[revision].[type]"
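The single-line resolver above is hard to read. If it helps, the same definition can be written out more explicitly with `Patterns`; this is a sketch under the assumption that the sbt 0.13.x `Patterns(ivyPatterns, artifactPatterns, isMavenCompatible)` overload is available, with the same paths as above:

```scala
// Same frozen-cache resolver, spelled out: one pattern for the ivy-[revision].xml
// metadata files and one for the jar artifacts inside ~/.ivy2/cache.
resolvers += Resolver.file(
  "Frozen IVY2 Cache Dependencies",
  file("/home/luis/.ivy2/cache")
)(Patterns(
  Seq("/home/luis/.ivy2/cache/[organisation]/[module]/ivy-[revision].xml"),
  Seq("/home/luis/.ivy2/cache/[organisation]/[module]/[type]s/[module]-[revision].[type]"),
  isMavenCompatible = false
))
```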

      

In fact, the process that led to this build.sbt was exactly the same as the one described in this (then unanswered) question:

Problems with sbt compilation offline using links org.apache.hadoop / *

I have included the appropriate Ivy-style patterns to point to the right ivy-[revision].xml file.

When I compile, sbt finds the correct path into the frozen .ivy2/cache repository for each dependency; however, I get warnings and errors parsing the ivy-[revision].xml.original file for these four dependencies:

[warn]  Note: Unresolved dependencies path:
[warn]          org.apache.hadoop:hadoop-mapreduce-client-app:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0
[warn]          org.apache.hadoop:hadoop-yarn-api:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0
[warn]          org.apache.hadoop:hadoop-mapreduce-client-core:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0
[warn]          org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.2.0
[warn]            +- org.apache.hadoop:hadoop-client:2.2.0
[warn]            +- org.apache.spark:spark-core_2.10:1.3.0 (/home/luis/Test/build.sbt#L7-8)
[warn]            +- Test:Test_2.10:1.0

      

Let's focus on one of these dependencies, because the warnings and errors are the same for all of them, say org.apache.hadoop:hadoop-mapreduce-client-app:2.2.0.

Example of warnings while parsing the ivy-[revision].xml.original file:

[warn] xml parsing: ivy-2.2.0.xml.original:18:69: schema_reference.4: Failed to read schema document 'http://maven.apache.org/xsd/maven-4.0.0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
[warn] xml parsing: ivy-2.2.0.xml.original:19:11: schema_reference.4: Failed to read schema document 'http://maven.apache.org/xsd/maven-4.0.0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
[warn] xml parsing: ivy-2.2.0.xml.original:20:17: schema_reference.4: Failed to read schema document 'http://maven.apache.org/xsd/maven-4.0.0.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
.......
.......

[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.hadoop#hadoop-mapreduce-client-app;2.2.0: java.text.ParseException: [xml parsing: ivy-2.2.0.xml.original:18:69: cvc-elt.1: Cannot find the declaration of element 'project'. in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag project in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag parent in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag modelVersion in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag name in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag properties in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag applink.base in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[warn] , unknown tag mr.basedir in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original

      

The resulting errors:

[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.hadoop#hadoop-mapreduce-client-app;2.2.0: java.text.ParseException: [xml parsing: ivy-2.2.0.xml.original:18:69: cvc-elt.1: Cannot find the declaration of element 'project'. in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag project in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag parent in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag modelVersion in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag groupId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag artifactId in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag version in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag name in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag properties in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag applink.base in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] , unknown tag mr.basedir in file:/home/luis/.ivy2/cache/org.apache.hadoop/hadoop-mapreduce-client-app/ivy-2.2.0.xml.original
[error] ]

      

To clarify, the content of the ivy-2.2.0.xml.original file looks like this:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                      http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <parent>
    <artifactId>hadoop-yarn</artifactId>
    <groupId>org.apache.hadoop</groupId>
    <version>2.2.0</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-api</artifactId>
  <version>2.2.0</version>
  <name>hadoop-yarn-api</name>

  <properties>
    <!-- Needed for generating FindBugs warnings using parent pom -->
    <yarn.basedir>${project.parent.basedir}</yarn.basedir>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-maven-plugins</artifactId>
        <executions>
          <execution>
            <id>compile-protoc</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>protoc</goal>
            </goals>
            <configuration>
              <protocVersion>${protobuf.version}</protocVersion>
              <protocCommand>${protoc.path}</protocCommand>
              <imports>
                <param>${basedir}/../../../hadoop-common-project/hadoop-common/src/main/proto</param>
                <param>${basedir}/src/main/proto</param>
                <param>${basedir}/src/main/proto/server</param>
              </imports>
              <source>
                <directory>${basedir}/src/main/proto</directory>
                <includes>
                  <include>yarn_protos.proto</include>
                  <include>yarn_service_protos.proto</include>
                  <include>applicationmaster_protocol.proto</include>
                  <include>applicationclient_protocol.proto</include>
                  <include>containermanagement_protocol.proto</include>
                  <include>server/yarn_server_resourcemanager_service_protos.proto</include>
                  <include>server/resourcemanager_administration_protocol.proto</include>
                </includes>
              </source>
              <output>${project.build.directory}/generated-sources/java</output>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

      

And after all this introduction, these are my questions:

  • What is the purpose of the ivy-[revision].xml.original file, and what is trying to parse it?
  • Why are the XML tags not recognized?

Any help would be appreciated!

SBT version: 0.13.8

Thanks.


2 answers


I asked the unanswered question you linked to in your post, and I'm happy to report that it was answered a few days ago; the suggested solution worked for me.

Try upgrading to sbt 0.13.9-RC3 (follow the instructions at http://www.scala-sbt.org/release/tutorial/Manual-Installation.html and get the jar at https://dl.bintray.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.9-RC3/).
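For reference, the manual installation boils down to placing the downloaded launcher jar somewhere on your PATH and wrapping it in a small script. A minimal sketch (the ~/bin location and the memory options are assumptions, adjust to taste; drop the downloaded sbt-launch.jar into the same directory):

```shell
# Create a wrapper script that runs the sbt launcher jar sitting next to it.
mkdir -p ~/bin
cat > ~/bin/sbt <<'EOF'
#!/bin/sh
# Memory options are a common baseline, not mandatory values.
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled"
exec java $SBT_OPTS -jar "$(dirname "$0")/sbt-launch.jar" "$@"
EOF
chmod u+x ~/bin/sbt
```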



Respectfully,

/ Martin



I finally managed to compile/package/build with sbt OFFLINE using a subset of frozen libraries. To summarize the process, I have rewritten the description above a bit.

These are the steps to create the problem:

  • On a computer called ORIGIN, create an sbt project with Scala sources and compile it with an internet connection.
  • Stop the internet connection. Make sure the project continues to compile.
  • Duplicate the sbt project or copy it to another computer (DESTINATION) without an internet connection.
  • Try compiling. This won't work, because sbt will try to download dependencies online, and DESTINATION is the OFFLINE computer.

These are the steps to fix the problem:

  • Assuming we are copying the sbt project to the new computer (DESTINATION) without an internet connection, we need to make sure the sbt and Scala versions on DESTINATION are the same as on ORIGIN. If either version differs, then when sbt runs on DESTINATION it will try to download the correct versions, resulting in connection failures.
  • If the sbt and Scala versions are the same, copy the following folders from ORIGIN to DESTINATION:
    • ORIGIN: /home/userA/.ivy2
    • ORIGIN: /home/userA/.sbt/boot
  • Make sure the environment variables pointing to sbt and Scala are set correctly.
  • Use a build.sbt file like this:


build.sbt:

name := "ProjectNAME"
version := "1.1"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "joda-time" % "joda-time" % "2.3" 
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.3.0" % "provided"

// Optional if you are using the assembly plugin
jarName in assembly := "ProjectoEclipseScala.jar"
// Optional: prevents assembly from including the Scala library
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
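The copy steps from the list above can be sketched as follows. This is a local simulation: ORIGIN_HOME and DEST_HOME stand in for the two machines' home directories, since in practice you would move the folders to the offline computer with scp or rsync.

```shell
# Simulate the two home directories; on real machines these are e.g. /home/userA.
ORIGIN_HOME=$(mktemp -d)
DEST_HOME=$(mktemp -d)

# Pretend ORIGIN already resolved everything online: it has an Ivy cache
# and the sbt boot directory (Scala compiler + sbt jars).
mkdir -p "$ORIGIN_HOME/.ivy2/cache/org.apache.spark" "$ORIGIN_HOME/.sbt/boot"
touch "$ORIGIN_HOME/.ivy2/cache/org.apache.spark/spark-core_2.10-1.3.0.jar"

# The two folders that must travel from ORIGIN to DESTINATION:
cp -r "$ORIGIN_HOME/.ivy2" "$DEST_HOME/.ivy2"
cp -r "$ORIGIN_HOME/.sbt"  "$DEST_HOME/.sbt"
```

With a real remote machine, the two `cp -r` lines become something like `rsync -a ~/.ivy2 ~/.sbt user@destination:~/`.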

      

  1. I'm not sure whether "provided" is required, because the joda-time dependency resolves correctly without it.
  2. MAKE SURE that every dependency you want to use on DESTINATION was downloaded earlier on ORIGIN and copied to DESTINATION.
  3. I tried to compile a simple project that only uses the Spark context (not spark-sql), so it should be able to compile with a single dependency:

     libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"

     However, we have verified that it DOES NOT COMPILE! sbt complains about the Jackson package. Maybe the "jackson" package is pulled in by the spark-sql dependency... Regardless, after including spark-sql, the project compiles/packages/builds.

CLOSING COMMENT: If even after this procedure you fail to compile, there is a "manual" alternative that also works offline, without the sbt compiler: using the Scala IDE for Eclipse. In Eclipse you can select dependencies manually in the GUI, including all the Spark, Hadoop, MapReduce, ... jars. Once Eclipse recognizes these dependencies, it will compile your classes into the "workspace/eclipse_project_name/bin" folder. You can then pick them up and pack them manually into a jar (a MANIFEST may be needed, but I think it is not necessary). This jar can be shipped to the cluster if all its dependencies are already available on the cluster.







