Fail to run SMV Example App #1392

Closed
yw-yang opened this issue Oct 9, 2018 · 8 comments

yw-yang (Collaborator) commented Oct 9, 2018

After installing the latest SMV, I tried to run the example app:

smv-init -s MyApp
smv-run --run-app

The error below occurs. I can run other projects successfully; is the example app outdated? Since it is featured on the SMV homepage, could we get a fix for it?

Modules to run/publish
----------------------
stage1.employment.EmploymentByState
----------------------
Traceback (most recent call last):
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/tools/../src/main/python/scripts/runapp.py", line 17, in <module>
    SmvDriver().run()
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/smvdriver.py", line 54, in run
    self.main(app, driver_args)
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/smvdriver.py", line 35, in main
    app.run()
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/smvapp.py", line 629, in run
    or self._generate_output_modules(mods, collector)
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/smvapp.py", line 604, in _generate_output_modules
    self._module_rdd(m, collector)
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/smvapp.py", line 564, in _module_rdd
    False # quick run
  File "/Users/Documents/Programming/smv/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/Users/Documents/Programming/smv/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/Users/Documents/Programming/smv/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o151.rdd.
: org.tresamigos.smv.SmvRuntimeException: There was an error executing Python code
 at org.tresamigos.smv.python.InterfacesWithPy4J$class.getPy4JResult(Py4JInterface.scala:28)
 at org.tresamigos.smv.SmvExtModulePython.getPy4JResult(SmvDataSet.scala:739)
 at org.tresamigos.smv.SmvExtModulePython.doRun(SmvDataSet.scala:761)
 at org.tresamigos.smv.SmvDataSet$$anonfun$computeDataFrame$1.apply(SmvDataSet.scala:577)
 at org.tresamigos.smv.SmvDataSet$$anonfun$computeDataFrame$1.apply(SmvDataSet.scala:571)
 at org.tresamigos.smv.SmvLock$.withLock(SmvLock.scala:70)
 at org.tresamigos.smv.SmvDataSet.computeDataFrame(SmvDataSet.scala:568)
 at org.tresamigos.smv.SmvDataSet.rdd(SmvDataSet.scala:224)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
 at py4j.Gateway.invoke(Gateway.java:282)
 at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
 at py4j.commands.CallCommand.execute(CallCommand.java:79)
 at py4j.GatewayConnection.run(GatewayConnection.java:238)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.tresamigos.smv.python.SmvPythonException: The following error occurred whiling calling back to Python code
Traceback (most recent call last):
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/py4j_interface.py", line 35, in interface_method
    result = impl_method(*args)
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/smvdataset.py", line 492, in doRun
    i = self._constructRunParams(known)
  File "/Users/Documents/Programming/smv/SMV_package/SMV_master/src/main/python/smv/smvdataset.py", line 485, in _constructRunParams
    jdf = urn2df[dep.urn()]
AttributeError: 'NoneType' object has no attribute 'urn'
yw-yang added the bug label Oct 9, 2018
ninjapapa (Contributor)

@yw-yang I can't reproduce.
Here is what I did on my Mac laptop:

cd SMV
git checkout v2r2
sbt assembly
cd ..
smv-init -s MyApp
cd MyApp/
smv-run --run-app

No error.

Could you provide more context?

  • On a Mac laptop, or client-L's server?
  • Which version of SMV? v2r2 or master head?
  • Built from source or using a pre-built package?

If the above still can't pinpoint the problem, I will also need:

  • Which Python version?
  • Which Java version?
  • Which Spark version?

guangningyu (Contributor)

I cannot reproduce it either, following Bo's steps.

My environment:

OS: macOS mojave
SMV: v2r2 (built from source)
Python: 2.7.13
Java: 1.8.0_121
Spark: 2.1.1

guangningyu (Contributor)

Running with the latest master of SMV threw this error:

$ smv-run --run-app
Using spark-submit to submit jobs
Using pyspark to start shells
APP_JAR = /Users/guangningyu/Softwares/SMV/tools/../target/scala-2.11/smv-2-SNAPSHOT-jar-with-dependencies.jar
START RUN ==============================
Wed Oct 10 10:05:32 CST 2018
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/bin/java -cp /Users/guangningyu/Softwares/SMV/tools/../target/scala-2.11/smv-2-SNAPSHOT-jar-with-dependencies.jar:smv-2-SNAPSHOT-jar-with-dependencies.jar:/Users/guangningyu/Softwares/spark-2.1.1-bin-hadoop2.7/conf/:/Users/guangningyu/Softwares/spark-2.1.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --conf spark.driver.extraClassPath=/Users/guangningyu/Softwares/SMV/tools/../target/scala-2.11/smv-2-SNAPSHOT-jar-with-dependencies.jar:smv-2-SNAPSHOT-jar-with-dependencies.jar: --jars /Users/guangningyu/Softwares/SMV/tools/../target/scala-2.11/smv-2-SNAPSHOT-jar-with-dependencies.jar, /Users/guangningyu/Softwares/SMV/tools/../src/main/python/scripts/runapp.py --run-app
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/10/10 10:05:33 INFO SparkContext: Running Spark version 2.1.1
18/10/10 10:05:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/10 10:05:33 INFO SecurityManager: Changing view acls to: guangningyu
18/10/10 10:05:33 INFO SecurityManager: Changing modify acls to: guangningyu
18/10/10 10:05:33 INFO SecurityManager: Changing view acls groups to:
18/10/10 10:05:33 INFO SecurityManager: Changing modify acls groups to:
18/10/10 10:05:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(guangningyu); groups with view permissions: Set(); users  with modify permissions: Set(guangningyu); groups with modify permissions: Set()
18/10/10 10:05:34 INFO Utils: Successfully started service 'sparkDriver' on port 60646.
18/10/10 10:05:34 INFO SparkEnv: Registering MapOutputTracker
18/10/10 10:05:34 INFO SparkEnv: Registering BlockManagerMaster
18/10/10 10:05:34 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/10/10 10:05:34 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/10/10 10:05:34 INFO DiskBlockManager: Created local directory at /private/var/folders/_k/1pmphpw113l7v1sy9n1x8nj00000gn/T/blockmgr-db19e230-5eba-46ac-974f-1c0f95169366
18/10/10 10:05:34 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/10/10 10:05:34 INFO SparkEnv: Registering OutputCommitCoordinator
18/10/10 10:05:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/10/10 10:05:34 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://169.254.66.202:4040
18/10/10 10:05:34 INFO SparkContext: Added JAR file:/Users/guangningyu/Softwares/SMV/tools/../target/scala-2.11/smv-2-SNAPSHOT-jar-with-dependencies.jar at spark://169.254.66.202:60646/jars/smv-2-SNAPSHOT-jar-with-dependencies.jar with timestamp 1539137134375
18/10/10 10:05:34 INFO SparkContext: Added file file:/Users/guangningyu/Softwares/SMV/tools/../src/main/python/scripts/runapp.py at file:/Users/guangningyu/Softwares/SMV/tools/../src/main/python/scripts/runapp.py with timestamp 1539137134559
18/10/10 10:05:34 INFO Utils: Copying /Users/guangningyu/Softwares/SMV/src/main/python/scripts/runapp.py to /private/var/folders/_k/1pmphpw113l7v1sy9n1x8nj00000gn/T/spark-a5692228-d3f8-4f81-8bb2-65891d9b6897/userFiles-852308b8-e7e8-4764-b1e3-7cc1648a766f/runapp.py
18/10/10 10:05:34 INFO Executor: Starting executor ID driver on host localhost
18/10/10 10:05:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60651.
18/10/10 10:05:34 INFO NettyBlockTransferService: Server created on 169.254.66.202:60651
18/10/10 10:05:34 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/10/10 10:05:34 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 169.254.66.202, 60651, None)
18/10/10 10:05:34 INFO BlockManagerMasterEndpoint: Registering block manager 169.254.66.202:60651 with 366.3 MB RAM, BlockManagerId(driver, 169.254.66.202, 60651, None)
18/10/10 10:05:34 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 169.254.66.202, 60651, None)
18/10/10 10:05:34 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 169.254.66.202, 60651, None)
18/10/10 10:05:34 INFO SharedState: Warehouse path is 'file:/Users/guangningyu/Softwares/MyApp/spark-warehouse'.
18/10/10 10:05:35 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/10/10 10:05:35 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/10/10 10:05:35 INFO ObjectStore: ObjectStore, initialize called
18/10/10 10:05:35 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/10/10 10:05:35 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/10/10 10:05:36 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/10/10 10:05:37 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/10/10 10:05:37 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/10/10 10:05:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/10/10 10:05:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/10/10 10:05:38 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/10/10 10:05:38 INFO ObjectStore: Initialized ObjectStore
18/10/10 10:05:38 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/10/10 10:05:38 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/10/10 10:05:38 INFO HiveMetaStore: Added admin role in metastore
18/10/10 10:05:38 INFO HiveMetaStore: Added public role in metastore
18/10/10 10:05:38 INFO HiveMetaStore: No user is added in admin role, since config is empty
18/10/10 10:05:38 INFO HiveMetaStore: 0: get_all_databases
18/10/10 10:05:38 INFO audit: ugi=guangningyu	ip=unknown-ip-addr	cmd=get_all_databases
18/10/10 10:05:38 INFO HiveMetaStore: 0: get_functions: db=default pat=*
18/10/10 10:05:38 INFO audit: ugi=guangningyu	ip=unknown-ip-addr	cmd=get_functions: db=default pat=*
18/10/10 10:05:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/10/10 10:05:39 INFO SessionState: Created local directory: /var/folders/_k/1pmphpw113l7v1sy9n1x8nj00000gn/T/6b75d426-97bc-4127-9e22-7541ef0ebc57_resources
18/10/10 10:05:39 INFO SessionState: Created HDFS directory: /tmp/hive/guangningyu/6b75d426-97bc-4127-9e22-7541ef0ebc57
18/10/10 10:05:39 INFO SessionState: Created local directory: /var/folders/_k/1pmphpw113l7v1sy9n1x8nj00000gn/T/guangningyu/6b75d426-97bc-4127-9e22-7541ef0ebc57
18/10/10 10:05:39 INFO SessionState: Created HDFS directory: /tmp/hive/guangningyu/6b75d426-97bc-4127-9e22-7541ef0ebc57/_tmp_space.db
18/10/10 10:05:39 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/Users/guangningyu/Softwares/MyApp/spark-warehouse
18/10/10 10:05:39 INFO HiveMetaStore: 0: get_database: default
18/10/10 10:05:39 INFO audit: ugi=guangningyu	ip=unknown-ip-addr	cmd=get_database: default
18/10/10 10:05:39 INFO HiveMetaStore: 0: get_database: global_temp
18/10/10 10:05:39 INFO audit: ugi=guangningyu	ip=unknown-ip-addr	cmd=get_database: global_temp
18/10/10 10:05:39 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Starting Py4j callback server on port 25334
Modules to run/publish
----------------------
stage1.employment.EmploymentByState
----------------------
Traceback (most recent call last):
  File "/Users/guangningyu/Softwares/SMV/tools/../src/main/python/scripts/runapp.py", line 17, in <module>
    SmvDriver().run()
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/smvdriver.py", line 54, in run
    self.main(app, driver_args)
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/smvdriver.py", line 35, in main
    app.run()
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/smvapp.py", line 664, in run
    or self._generate_output_modules(mods, collector)
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/smvapp.py", line 639, in _generate_output_modules
    self._module_rdd(m, collector)
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/smvapp.py", line 599, in _module_rdd
    False # quick run
  File "/Users/guangningyu/Softwares/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/Users/guangningyu/Softwares/spark-2.1.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/Users/guangningyu/Softwares/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o133.rdd.
: org.tresamigos.smv.SmvRuntimeException: There was an error executing Python code
	at org.tresamigos.smv.python.InterfacesWithPy4J$class.getPy4JResult(Py4JInterface.scala:28)
	at org.tresamigos.smv.SmvExtModulePython.getPy4JResult(SmvDataSet.scala:739)
	at org.tresamigos.smv.SmvExtModulePython.doRun(SmvDataSet.scala:761)
	at org.tresamigos.smv.SmvDataSet$$anonfun$computeDataFrame$1.apply(SmvDataSet.scala:577)
	at org.tresamigos.smv.SmvDataSet$$anonfun$computeDataFrame$1.apply(SmvDataSet.scala:571)
	at org.tresamigos.smv.SmvLock$.withLock(SmvLock.scala:70)
	at org.tresamigos.smv.SmvDataSet.computeDataFrame(SmvDataSet.scala:568)
	at org.tresamigos.smv.SmvDataSet.rdd(SmvDataSet.scala:224)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.tresamigos.smv.python.SmvPythonException: The following error occurred whiling calling back to Python code
Traceback (most recent call last):
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/py4j_interface.py", line 35, in interface_method
    result = impl_method(*args)
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/smvdataset.py", line 492, in doRun
    i = self._constructRunParams(known)
  File "/Users/guangningyu/Softwares/SMV/src/main/python/smv/smvdataset.py", line 485, in _constructRunParams
    jdf = urn2df[dep.urn()]
AttributeError: 'NoneType' object has no attribute 'urn'

yw-yang (Collaborator, Author) commented Oct 10, 2018

Yes, the latest master threw this error. v2r2 looks good to me.

ninjapapa (Contributor)

I see. Will fix master. For the project, please use v2r2.

ninjapapa added a commit that referenced this issue Oct 10, 2018
ninjapapa (Contributor)

This is one of the most interesting bugs I have ever encountered.

The fix is very simple; just make the following change:
BEFORE

    def _generate_dot_graph(self):
        ...
        dot_graph_str = SmvAppInfo(self).create_graph_dot()
        if(self.cmd_line.graph):
            ....
            return True
        else:
            return False

AFTER

    def _generate_dot_graph(self):
        ...
        if(self.cmd_line.graph):
            dot_graph_str = SmvAppInfo(self).create_graph_dot()
            ...
            return True
        else:
            return False
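
Presumably the unconditional create_graph_dot() call resolves the whole module graph as a side effect even on a normal run, and that side effect is what later breaks dependency resolution; with this change the graph is only built when --graph is actually requested.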

However, two questions remain:

  • Why didn't the integration test in CI catch this?
  • Why did the bug break --run-app in the first place?

Our CI config specifies python: 3.6.5, which answers the first question: the bug doesn't actually break run-app on Python 3.6.5, although it does fail on 2.7.*.

The error message indicates that requiresDS of EmploymentByState returns [None] instead of [Employment]. My hypothesis is that two versions of Employment get loaded, and Python (especially 2.7) can't figure out which one the code points to.
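
To illustrate the hypothesis, here is a minimal standalone sketch (not SMV code; the module and class names are made up) showing how the same class loaded under two different module paths becomes two distinct class objects in Python, so a registry keyed on one copy returns None for the other:

    import sys
    import types

    # Simulate the same source file being imported under two module paths.
    mod_a = types.ModuleType("stage1.employment")
    exec("class Employment(object): pass", mod_a.__dict__)
    sys.modules["stage1.employment"] = mod_a

    mod_b = types.ModuleType("employment")
    exec("class Employment(object): pass", mod_b.__dict__)
    sys.modules["employment"] = mod_b

    # A registry keyed on one copy of the class...
    registry = {mod_a.Employment: "stage1.employment.Employment"}

    # ...misses when queried with the other copy and returns None,
    # which would later blow up on .urn() exactly like the traceback above.
    print(registry.get(mod_b.Employment))  # -> None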

ninjapapa (Contributor)

Will do the fix and create a new issue to investigate further.

ninjapapa (Contributor)

Closing this, since #1398 was opened to continue.
