I'm trying to run the Mahout canopy clustering algorithm from a
Python-embedded Pig script. The embedded Pig part of the script works (using
compileFromFile, bind, and runSingle), but I can't figure out how to run Mahout
from the same script. I first tried invoking Mahout via subprocess.call,
but importing subprocess fails with:
ImportError: No module named subprocess
Similar errors occur when I try to import the sys or os modules.
Next I tried using Mahout's CanopyDriver class directly, but got a
similar error from the following import statement:
from org.apache.mahout.clustering.canopy import CanopyDriver
#=> ImportError: No module named mahout
The ImportErrors don't occur when I run Python interactively. Is this a
Jython problem? Am I not setting some path properly?
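In case it helps narrow this down, here is a minimal sketch of a check that could be dropped into the script to see which imports resolve under Pig's interpreter. It relies only on the builtin `__import__`, since `sys` and `os` are themselves unavailable; the helper name `check_imports` is hypothetical and not part of Pig or Mahout.

```python
# Hypothetical helper: report which module names can be imported in the
# current interpreter. Uses only builtins, so it does not depend on
# sys, os, or subprocess being importable.
def check_imports(names):
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# The module names that fail for me when the script is run through Pig.
candidates = [
    "sys",
    "os",
    "subprocess",
    "org.apache.mahout.clustering.canopy",
]

result = check_imports(candidates)
for name in candidates:
    print("%s importable: %s" % (name, result[name]))
```

Running the same check in an interactive session and under `pig myscript.py` should show whether the difference lies in the module search path rather than in the script itself.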
Other possibly useful info:
- I'm including the Mahout jars in the pig.additional.jars property.
- I'm running the script using Pig, i.e., `pig myscript.py`