mirror of
https://github.com/optim-enterprises-bv/kubernetes.git
synced 2025-11-26 19:35:10 +00:00
This adds a very basic Zeppelin image that works with the existing Spark example. As can be seen from the documentation, it has a couple of warts:

* It requires kubectl port-forward (which is unstable across long periods of time, at least for me, on this app; bug incoming).
* I needed to roll my own container (none of the existing containers exactly matched our needs, or even built anymore against modern Zeppelin master, and the rest of the example is Spark 1.5).

The image itself is *huge*. One of the further refinements we need to look at is how to strip the Maven build for this container down to just the interpreters we care about, because the deps here are frankly ridiculous. This might be a case where, if possible, we open an upstream request to build things dynamically, then use something like … to cut the image down considerably. (This might already be possible; I need to poke at whether you can late-bind interpreters later.)
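Since the commit message notes that kubectl port-forward is unstable over long periods, one possible workaround is to restart it whenever it exits. The following is only a sketch under stated assumptions: the helper name `forward_with_retry`, the `RETRY_DELAY` variable, the pod name, and port 8080 are all hypothetical and not taken from this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): re-run
# `kubectl port-forward` each time it exits, up to a fixed number of tries.
# Arguments: pod name, port (used for both local and remote), max tries.
forward_with_retry() {
  pod="$1"
  port="$2"
  max_tries="${3:-5}"
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    tries=$((tries + 1))
    echo "port-forward attempt ${tries} for ${pod} on ${port}"
    # Ignore the exit status so the loop continues after a dropped tunnel.
    kubectl port-forward "$pod" "${port}:${port}" || true
    # RETRY_DELAY is an assumed knob; it defaults to one second.
    sleep "${RETRY_DELAY:-1}"
  done
}

# Usage (assumed pod name and port; substitute your own):
#   forward_with_retry zeppelin-controller-abc12 8080 5
```

The loop is bounded so that a pod that no longer exists does not cause endless retries; raise the third argument for long sessions.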
34 lines
1.3 KiB
Docker
FROM java:openjdk-8-jdk

ENV hadoop_ver 2.6.1
ENV spark_ver 1.5.1

# Get Hadoop from US Apache mirror and extract just the native
# libs. (Until we care about running HDFS with these containers, this
# is all we need.)
RUN mkdir -p /opt && \
    cd /opt && \
    wget http://www.us.apache.org/dist/hadoop/common/hadoop-${hadoop_ver}/hadoop-${hadoop_ver}.tar.gz && \
    tar -zvxf hadoop-${hadoop_ver}.tar.gz hadoop-${hadoop_ver}/lib/native && \
    rm hadoop-${hadoop_ver}.tar.gz && \
    ln -s hadoop-${hadoop_ver} hadoop && \
    echo Hadoop ${hadoop_ver} native libraries installed in /opt/hadoop/lib/native

# Get Spark from US Apache mirror.
RUN mkdir -p /opt && \
    cd /opt && \
    wget http://www.us.apache.org/dist/spark/spark-${spark_ver}/spark-${spark_ver}-bin-hadoop2.6.tgz && \
    tar -zvxf spark-${spark_ver}-bin-hadoop2.6.tgz && \
    rm spark-${spark_ver}-bin-hadoop2.6.tgz && \
    ln -s spark-${spark_ver}-bin-hadoop2.6 spark && \
    echo Spark ${spark_ver} installed in /opt

# Add the GCS connector.
RUN wget -O /opt/spark/lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar

ADD log4j.properties /opt/spark/conf/log4j.properties
ADD start-common.sh /
ADD core-site.xml /opt/spark/conf/core-site.xml
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ENV PATH $PATH:/opt/spark/bin