Chapter 14. Fabric Maven Proxies

Abstract

Container hosts often have limited or no access to the Internet, which can make it difficult for Fabric containers to download and install Maven artifacts. A Maven proxy mitigates this problem by serving as a central cache of Maven artifacts for the Fabric containers. Managed containers try to download from the Maven proxy before trying to download from the Internet. This chapter explains how the Maven proxy works and how to customize its configuration to suit your network environment.

14.1. Cluster of Fabric Maven Proxies

Overview

Fabric Maven proxies are deployed only on Fabric servers (ensemble members), not on regular managed containers. So, if there is just a single Fabric server in your fabric, there will be just one Maven proxy. But if your Fabric ensemble consists of multiple servers (for example, three or five), a Maven proxy is deployed on each server, and this cluster of Maven proxies is configured automatically as a master-slave cluster.
Figure 14.1, “Maven Proxy Cluster” shows the outline of a Maven proxy cluster consisting of three Fabric servers (which constitute the Fabric ensemble).

Figure 14.1. Maven Proxy Cluster


Master-slave cluster

Each Maven proxy is deployed inside a Fabric server (a container that belongs to the Fabric ensemble) and the Maven proxies together are organized as a master-slave cluster. This means that one of the Maven proxies in the cluster is elected to be the master, while all of the other Maven proxies remain as slaves. Only the master proxy is available to serve up Maven artifacts, while the slave proxies remain in a suspended state.
The master-slave architecture is implemented with the help of Apache Zookeeper distributed locking. At start-up time, each of the Maven proxies attempts to acquire a Zookeeper lock: the proxy that succeeds becomes the master, while the remaining proxies remain as slaves.
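
The election can be pictured with a small in-process simulation. This is only a sketch under stated assumptions: the SharedLock class is a hypothetical stand-in for the Zookeeper lock (it does not use the Zookeeper or Fabric APIs), and the proxy names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class ElectionSketch {

    /** Hypothetical stand-in for the Zookeeper lock: at most one owner at a time. */
    static final class SharedLock {
        private final AtomicReference<String> owner = new AtomicReference<>();

        /** Succeeds only for the first caller while the lock is free. */
        boolean tryAcquire(String candidate) {
            return owner.compareAndSet(null, candidate);
        }

        String owner() {
            return owner.get();
        }
    }

    /**
     * Every proxy tries the lock once at start-up. The proxy that wins becomes
     * the master (the lock owner); the returned list holds the slaves.
     */
    static List<String> electSlaves(SharedLock lock, List<String> proxies) {
        List<String> slaves = new ArrayList<>();
        for (String proxy : proxies) {
            if (!lock.tryAcquire(proxy)) {
                slaves.add(proxy); // lock already held: remain suspended as a slave
            }
        }
        return slaves;
    }

    public static void main(String[] args) {
        SharedLock lock = new SharedLock();
        List<String> slaves = electSlaves(lock, Arrays.asList("proxy1", "proxy2", "proxy3"));
        System.out.println("master = " + lock.owner());
        System.out.println("slaves = " + slaves);
    }
}
```

In this simulation the first proxy in the list always wins, because the attempts happen sequentially. In a real cluster, the winner is whichever proxy reaches the lock first.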

Maven proxy

A Maven proxy is an HTTP Web server that behaves very much like a standard Maven repository, such as Maven Central.
The purpose of the Maven proxy is to serve Maven artifacts on the local network. It has its own local cache of Maven artifacts, which it can serve up quickly. But if necessary, the Maven proxy can also download artifacts from remote repositories (in a proxy role). This architecture offers a number of advantages:
  • The Maven proxy builds up a large cache over time, which can be served up quickly to other containers in the Fabric.
  • It is not necessary for every container to download Maven artifacts from remote repositories, because the Maven proxy performs this service for the other containers.
  • In a network with limited Internet access, you can arrange to deploy the Maven proxy on a host with Internet access, while the other containers in the fabric are deployed on hosts without Internet access.

Managed container

A managed container is a regular Fabric container (not part of the Fabric ensemble), whose contents are managed by a Fabric8 agent. The Fabric8 agent is responsible for ensuring that the bundles deployed in the container are consistent with what is specified in this container's Fabric profiles. Whenever necessary, the Fabric8 agent will contact the Maven proxy to download new Maven artifacts for deploying inside the container.

Resolving a Maven artifact

The Fabric8 agent attempts to locate a Maven artifact roughly as follows:
  1. The Fabric8 agent searches its local Maven repository for the artifact.
  2. If that fails, the Fabric8 agent contacts the Maven proxy to request the artifact.
  3. If that fails, the Fabric8 agent attempts to contact remote Maven repositories directly to request the artifact.
For a more detailed outline of this process, see Section 14.2, “How a Managed Container Resolves Artifacts”.
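
The lookup order above can be sketched as an ordered chain of sources, where the first source that holds the artifact wins. This is a simplified illustration, not the Fabric8 agent's code: the source names, the in-memory sets, and the org.example coordinates are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class ResolutionSketch {

    /**
     * Tries each source in insertion order and returns the name of the first
     * source that contains the artifact, or empty if none of them does.
     */
    static Optional<String> resolve(String artifact, Map<String, Set<String>> sourcesInOrder) {
        for (Map.Entry<String, Set<String>> source : sourcesInOrder.entrySet()) {
            if (source.getValue().contains(artifact)) {
                return Optional.of(source.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves the order: local repository, proxy, remote.
        Map<String, Set<String>> sources = new LinkedHashMap<>();
        sources.put("local repository",
                new HashSet<>(Arrays.asList("org.example:a:1.0")));
        sources.put("Maven proxy",
                new HashSet<>(Arrays.asList("org.example:a:1.0", "org.example:b:1.0")));
        sources.put("remote repositories",
                new HashSet<>(Arrays.asList("org.example:c:1.0")));

        for (String artifact : Arrays.asList(
                "org.example:a:1.0", "org.example:b:1.0",
                "org.example:c:1.0", "org.example:d:1.0")) {
            System.out.println(artifact + " -> "
                    + resolve(artifact, sources).orElse("not found"));
        }
    }
}
```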

Endpoint discovery

Before the Fabric8 agent can connect to the Maven proxy, it needs to discover the HTTP address of the current master instance (only the master instance is usable, because the slave instances are dormant). The discovery mechanism is based on the Apache Zookeeper registry: by querying Zookeeper, the Fabric8 agent can discover the URL of the current master instance.

Which Fabric server is the current master?

You can query Zookeeper manually (using console commands) to discover the URLs of the current Maven proxy master instance. To do so, invoke the fabric:cluster-list console command, as follows:
JBossFuse:karaf@root> cluster-list servlets/io.fabric8.fabric-maven-proxy
[cluster]                           [masters]  [slaves]  [services]                            []
1.2.0.redhat-621084/maven/download                                                               
   root                             root       -         http://127.0.0.1:8181/maven/download    
1.2.0.redhat-621084/maven/upload                                                                 
   root                             root       -         http://127.0.0.1:8181/maven/upload
The preceding example is trivial, because there is only one Fabric server (the root container) in this Fabric ensemble. This command returns two URLs: one for downloading artifacts (http://127.0.0.1:8181/maven/download), and another for uploading artifacts (http://127.0.0.1:8181/maven/upload). For more details about uploading artifacts, see Section 14.6, “Automated Deployment”.
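
Because the Maven proxy behaves much like a standard Maven repository, the address of an individual artifact can presumably be derived by appending the standard Maven repository layout path to the download URL. The following sketch shows that derivation. It assumes the proxy accepts the standard layout unchanged, it ignores classifiers and snapshot versions, and the org.example:demo:1.0.0 coordinates are hypothetical.

```java
final class ArtifactUrlSketch {

    /**
     * Builds an artifact URL using the standard Maven repository layout:
     * groupId (dots become slashes) / artifactId / version / artifactId-version.extension
     */
    static String artifactUrl(String baseUrl, String groupId, String artifactId,
                              String version, String extension) {
        // Tolerate a trailing slash on the base URL.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return base + "/" + groupId.replace('.', '/')
                + "/" + artifactId
                + "/" + version
                + "/" + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Base URL taken from the cluster-list output above; coordinates are made up.
        System.out.println(artifactUrl("http://127.0.0.1:8181/maven/download",
                "org.example", "demo", "1.0.0", "jar"));
    }
}
```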

What happens during failover?

Normally, the master instance remains the master for as long as the Maven proxy is deployed and running in its container. However, if the container hosting the master Maven proxy is shut down (for whatever reason), the master instance releases the Zookeeper lock, and one of the slave instances has the opportunity to be promoted to master. Each slave instance retries the Zookeeper lock at regular intervals, and the first slave to retry after the lock is released acquires it and becomes the new master.
When the cluster fails over and a former slave becomes the new master, this has important consequences:
  • The URLs for the master Maven proxy change, so clients must now use a different URL to reach the Maven proxy. For Fabric8 agents, this failover is transparent, because the Fabric8 agent automatically rediscovers the new URLs.
  • If you have been automatically uploading artifacts to the Maven proxy as part of your build process (see Section 14.6, “Automated Deployment”), you will need to reconfigure the upload URL. In this case, failover is not transparent.
  • It is likely that the new master has a much smaller cache of Maven artifacts than the old master. This could result in noticeable delays, because many previously cached artifacts have to be downloaded again.
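
The difference between the first two points can be illustrated with a small simulation. This is a sketch only: the Registry class is a hypothetical stand-in for the Zookeeper registry, the two client classes are invented for illustration, and the host names are made up.

```java
final class FailoverSketch {

    /** Hypothetical stand-in for the registry entry that holds the master's URL. */
    static final class Registry {
        private volatile String masterUrl;

        Registry(String masterUrl) {
            this.masterUrl = masterUrl;
        }

        String masterUrl() {
            return masterUrl;
        }

        /** Simulates a former slave being promoted to master. */
        void failOverTo(String newMasterUrl) {
            this.masterUrl = newMasterUrl;
        }
    }

    /** Looks up the master URL on every request, so failover is transparent. */
    static final class DiscoveringClient {
        private final Registry registry;

        DiscoveringClient(Registry registry) {
            this.registry = registry;
        }

        String targetUrl() {
            return registry.masterUrl();
        }
    }

    /** Uses a URL fixed at configuration time, like a build's upload URL. */
    static final class StaticClient {
        private final String configuredUrl;

        StaticClient(String configuredUrl) {
            this.configuredUrl = configuredUrl;
        }

        String targetUrl() {
            return configuredUrl;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry("http://fabric1.example.com:8181/maven/upload");
        DiscoveringClient agent = new DiscoveringClient(registry);
        StaticClient build = new StaticClient(registry.masterUrl());

        registry.failOverTo("http://fabric2.example.com:8181/maven/upload");

        System.out.println("discovering client -> " + agent.targetUrl());
        System.out.println("static client      -> " + build.targetUrl());
    }
}
```

After the simulated failover, the discovering client follows the new master, while the statically configured client still points at the old URL and must be reconfigured.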

No replication

Within the Maven proxy cluster, there is no automatic replication of artifacts between the different Maven proxies. You will probably notice the effects of this when the cluster fails over to a new Maven proxy.

Managing the Maven artifact data

Although Fabric does not support replication of the local Maven caches, there are some strategies you can adopt to compensate for this. The Maven proxy caches its artifacts in the local Maven repository (normally in UserHome/.m2/repository). You could simply copy the contents of the local Maven repository manually from one Maven proxy host to another. Alternatively, for a more sophisticated approach, you could store the local Maven repository on a networked file system.