Apache Nifi on Google Kubernetes Engine (GKE)

Apache Nifi on GKE can be a good fit if you want a low-code solution for processing streaming data. Because GKE is a managed version of Kubernetes, you get a managed, scalable environment and do not need to handle the underlying servers yourself.

Setup of the Apache Nifi cluster

Apache Nifi setup on Google Cloud Kubernetes Engine

The Apache Nifi setup on GKE can be managed with Terraform to automate the deployment. This makes changes easy to manage and keeps everything tracked in one place.

Above you see an example architecture for an Apache Nifi on GKE setup. This setup uses Terraform to deploy the cluster and stores the data processed by Nifi in BigQuery and Cloud Storage. Nifi provides processors for both of these as data sinks.
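The Terraform-driven rollout described above could be wrapped in a small shell helper. This is only a sketch: the function name deploy_gke_cluster, the RUN dry-run hook and the example project, zone and cluster names are assumptions for illustration, and it presumes your Terraform configuration needs no extra variables.

```shell
#!/bin/sh
# Sketch of a Terraform-driven rollout (hypothetical helper, not part of the original setup).
# RUN is an assumed hook: set RUN=echo to print the commands instead of executing them.
deploy_gke_cluster() {
    project="$1"
    zone="$2"
    cluster="$3"
    run="${RUN:-}"

    # Initialise, plan and apply the Terraform configuration in the current directory,
    # then fetch kubeconfig credentials so that helm and kubectl talk to the new cluster.
    $run terraform init &&
    $run terraform plan -out=tfplan &&
    $run terraform apply tfplan &&
    $run gcloud container clusters get-credentials "$cluster" --zone "$zone" --project "$project"
}
```

For a dry run, call it as: RUN=echo deploy_gke_cluster my-project europe-west1-b nifi-cluster. Saving the plan to a file and applying that file ensures that what gets applied is exactly what was reviewed.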

A Helm chart is available for the Nifi cluster, which makes it easy to manage all the needed components.

The chart is provided by Cetic and can be installed on your cluster using the following commands.

helm repo add cetic https://cetic.github.io/helm-charts
helm repo update
helm install esb cetic/nifi -f custom_values.yaml

Customizing the Helm Chart

To customize your Apache Nifi on GKE deployment, you can adapt the values.yaml file provided on GitHub. It controls, for example, how many Nifi nodes to deploy and how authentication for the Nifi UI is set up.

One of the important things to set here is to enable Nifi Registry. If it is not enabled and set up, a crash of the cluster might result in you losing your flows.
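As a rough illustration of such a customization, the script below writes a minimal custom_values.yaml. The key names replicaCount and registry.enabled are assumptions based on the chart's values.yaml and may differ between chart versions, so check them against the version you install. The node count of 3 is just an example value.

```shell
#!/bin/sh
# Write a minimal override file for the cetic/nifi chart.
# Key names (replicaCount, registry.enabled) are assumptions; verify them in the chart's values.yaml.
VALUES_FILE="${VALUES_FILE:-custom_values.yaml}"

cat > "$VALUES_FILE" <<'EOF'
# Number of Nifi nodes to deploy (example value)
replicaCount: 3

# Deploy Nifi Registry alongside Nifi so that flows can be version controlled
registry:
  enabled: true
EOF

echo "Wrote $VALUES_FILE"
```

The file can then be passed to the install command shown earlier with -f custom_values.yaml. Keeping only the overrides in this file, rather than a full copy of values.yaml, makes it easier to see what differs from the chart defaults.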

Apache Nifi Registry Setup on GKE

Nifi Registry is an additional Nifi service that provides version control for your Nifi flows. It offers two options for storing the versions: Git and file system storage. The XML configuration for providers.xml is shown below.

<!-- Option 1: store flow versions on the local file system -->
<flowPersistenceProvider>
    <class>org.apache.nifi.registry.provider.flow.FileSystemFlowPersistenceProvider</class>
    <property name="Flow Storage Directory">/opt/nifi-registry/nifi-registry-current/flow_storage/</property>
</flowPersistenceProvider>
<!-- Option 2: store flow versions in a Git repository -->
<flowPersistenceProvider>
    <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    <property name="Flow Storage Directory">/opt/nifi-registry/nifi-registry-current/git/</property>
    <property name="Remote To Push">origin</property>
    <property name="Remote Access User">USERNAME</property>
    <property name="Remote Access Password">PASSWORD</property>
</flowPersistenceProvider>

In Google Cloud it is also possible to use Cloud Storage as a persistence backend. To set this up, you need to customize the container by adding GCSFuse to it. After that, adapt the start.sh file so that the bucket is actually mounted on container startup.

echo "Mounting GCS Fuse."
gcsfuse --file-mode=777 --dir-mode=777 nifi-repository /opt/nifi-registry/nifi-registry-current/storage/
echo "Mounting completed."
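The snippet above reports "Mounting completed." even if the mount fails. A slightly more defensive variant for start.sh might look like the sketch below. The function name mount_bucket and the GCSFUSE_BIN override are assumptions introduced here for illustration. The bucket name and flags are taken from the snippet above.

```shell
#!/bin/sh
# Hypothetical helper for start.sh: mount a bucket and fail loudly if that does not work.
# GCSFUSE_BIN is an assumed override hook so the function can be dry-run without gcsfuse installed.
mount_bucket() {
    bucket="$1"
    mount_point="$2"
    gcsfuse_bin="${GCSFUSE_BIN:-gcsfuse}"

    # The mount point must exist before the bucket can be attached to it.
    mkdir -p "$mount_point" || return 1

    echo "Mounting GCS Fuse."
    if "$gcsfuse_bin" --file-mode=777 --dir-mode=777 "$bucket" "$mount_point"; then
        echo "Mounting completed."
    else
        echo "Mounting failed for bucket $bucket" >&2
        return 1
    fi
}
```

In start.sh this could be called as: mount_bucket nifi-repository /opt/nifi-registry/nifi-registry-current/storage/ || exit 1. That way the container stops instead of starting Nifi Registry on an empty local directory.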

Connecting Nifi to Registry

To connect Nifi to a registry you need to add a registry controller to Nifi under “Options” -> “Controller Settings” -> “Registry Clients”. Use the Kubernetes cluster internal IP for Nifi Registry here.
Then add a bucket to Nifi Registry. This bucket can then be selected in Nifi when setting up version control and will appear as a directory in the Git repository.
To version control a flow, it needs to be nested inside a “Process Group”. Once this is done, right-click on the “Process Group” and under “Version” click “Start version control”.
More documentation can be found here.
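To find the cluster-internal IP mentioned above, something like the following sketch can help. The helper names registry_cluster_ip and registry_url are hypothetical, and the service name depends on your Helm release. Port 18080 is the default HTTP port of Nifi Registry; adjust it if your deployment uses a different one.

```shell
#!/bin/sh
# Look up the cluster-internal IP of the Nifi Registry service.
# The service name is deployment specific; list candidates with "kubectl get services".
registry_cluster_ip() {
    kubectl get service "$1" -o jsonpath='{.spec.clusterIP}'
}

# Build the URL to enter in the "Registry Clients" dialog.
# 18080 is the default HTTP port of Nifi Registry; pass a second argument if yours differs.
registry_url() {
    printf 'http://%s:%s\n' "$1" "${2:-18080}"
}
```

For example, with a hypothetical service called esb-registry: registry_url "$(registry_cluster_ip esb-registry)". The resulting URL is what goes into the registry client's URL field in Nifi.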

Code examples

You can find examples for the Nifi Registry customization in this snippet.
