Chapter 8. Configuring APIcast for better performance

This document provides general guidelines for debugging performance issues in APIcast. It also introduces the available caching modes, explains how they can help increase performance, and gives details about profiling modes.

8.1. General guidelines

In a typical APIcast deployment, there are three components to consider:

  • APIcast
  • The 3scale back-end server that authorizes requests and keeps track of the usage
  • The upstream API

When experiencing performance issues in APIcast:

  • Identify the component that is responsible for the issues.
  • Measure the latency of the upstream API, to determine the latency that APIcast plus the 3scale back-end server introduce.
  • Using the same benchmarking tool, repeat the measurement, this time pointing to APIcast instead of directly to the upstream API.

Comparing these results will give you an idea of the latency introduced by APIcast and the 3scale back-end server.
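
The comparison above comes down to subtracting one measurement from the other. The following is a minimal sketch, assuming you have already collected per-request latencies in milliseconds with your own benchmark tool. The function name and the sample values are hypothetical and are not real measurements.

```python
from statistics import median


def added_latency_ms(upstream_samples, gateway_samples):
    """Estimate the latency added by APIcast plus the 3scale back-end server.

    Both arguments are lists of per-request latencies in milliseconds,
    collected with the same tool and the same load profile.
    """
    return median(gateway_samples) - median(upstream_samples)


# Hypothetical samples for illustration only.
direct = [98, 101, 100, 103, 99]          # requests sent straight to the upstream API
via_apicast = [121, 118, 125, 119, 122]   # the same requests sent through APIcast

print(added_latency_ms(direct, via_apicast))
```

The median is used here because a few slow outliers would distort a mean; a percentile such as p95 may suit your case better.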

In a Hosted (SaaS) installation with self-managed APIcast, if the latency introduced by APIcast and the 3scale back-end server is high:

  1. Make a request to the 3scale back-end server from the same machine where APIcast is deployed.
  2. Measure the latency.

The 3scale back-end server exposes an endpoint that returns the version: https://su1.3scale.net/status. In comparison, an authorization call requires more resources because it verifies keys, checks limits, and queues background jobs. Although the 3scale back-end server performs these tasks in a few milliseconds, they require more work than the version check that the /status endpoint performs. As an example, if a request to /status takes around 300 ms from your APIcast environment, an authorization will take longer than that for every request that is not cached.
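
One way to take this measurement is to time a plain GET request to the /status endpoint from the APIcast host. The following is a rough sketch using only the Python standard library. The helper name time_request_ms and its fetch and clock parameters are hypothetical, added so that the timing logic can be exercised without network access.

```python
import time
import urllib.request

STATUS_URL = "https://su1.3scale.net/status"


def time_request_ms(url, fetch=None, clock=time.perf_counter):
    """Return the wall-clock duration of one GET request in milliseconds."""
    if fetch is None:
        def fetch(target):
            # Read the whole body so the transfer time is included.
            with urllib.request.urlopen(target, timeout=10) as response:
                response.read()
    start = clock()
    fetch(url)
    return (clock() - start) * 1000.0


# Example usage, run from the machine where APIcast is deployed:
#   samples = [time_request_ms(STATUS_URL) for _ in range(5)]
#   print(min(samples), max(samples))
```

Each call opens a new connection, so the result includes DNS resolution and the TLS handshake. Take several samples and compare them rather than relying on a single request.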

8.2. Default caching

For requests that are not cached, these are the events:

  1. APIcast extracts the usage metrics from matching mapping rules.
  2. APIcast sends the metrics plus the application credentials to the 3scale back-end server.
  3. The 3scale back-end server performs the following:

    1. Checks the application keys, and that the reported usage of metrics is within the defined limits.
    2. Queues a background job to increase the usage of the metrics reported.
    3. Responds to APIcast whether the request should be authorized or not.
  4. If the request is authorized, it goes to the upstream.

In this case, the request does not reach the upstream until the 3scale back-end server responds.

On the other hand, with the caching mechanism that comes enabled by default:

  • APIcast stores in a cache the result of the authorization call to the 3scale back-end server if it was authorized.
  • The next request with the same credentials and metrics will use that cached authorization instead of going to the 3scale back-end server.
  • If the request was not authorized, or if it is the first time that APIcast receives the credentials, APIcast will call the 3scale back-end server synchronously as explained above.

When the authorization is cached, APIcast first calls the upstream and then, in a phase called post action, it calls the 3scale back-end server and stores the authorization in the cache to have it ready for the next request. Notice that the call to the 3scale back-end server does not introduce any latency because it does not happen in request time. However, requests sent in the same connection will need to wait until the post action phase finishes.
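
The default caching flow described above can be modeled with a small toy class. This is an illustrative sketch only and not APIcast's actual implementation. The class name, the method names, and the backend_authorize callable that stands in for the 3scale back-end server are all hypothetical.

```python
class DefaultCachingSketch:
    """Toy model of the default authorization cache (not APIcast's real code)."""

    def __init__(self, backend_authorize):
        # backend_authorize(credentials, metrics) -> bool is a hypothetical
        # stand-in for the call to the 3scale back-end server.
        self.backend_authorize = backend_authorize
        self.cache = {}

    @staticmethod
    def _key(credentials, metrics):
        # The cache is keyed by the credentials plus the reported metrics.
        return (credentials, tuple(sorted(metrics.items())))

    def handle_request(self, credentials, metrics):
        """Return (forwarded_to_upstream, waited_for_backend)."""
        key = self._key(credentials, metrics)
        if self.cache.get(key):
            # Cached authorization: go to the upstream right away.
            # The back-end call is deferred to post_action().
            return True, False
        # First request or previously denied: call the back end synchronously.
        authorized = self.backend_authorize(credentials, metrics)
        if authorized:
            self.cache[key] = True
        return authorized, True

    def post_action(self, credentials, metrics):
        """Runs after the response is sent: report and refresh the cache."""
        key = self._key(credentials, metrics)
        self.cache[key] = self.backend_authorize(credentials, metrics)


gateway = DefaultCachingSketch(lambda creds, metrics: creds == "good-key")
print(gateway.handle_request("good-key", {"hits": 1}))  # cache miss, synchronous
print(gateway.handle_request("good-key", {"hits": 1}))  # cache hit
```

In this model, only the first request with a given set of credentials and metrics waits for the back end. Later requests are forwarded immediately, and the caller is expected to invoke post_action once the response has been sent.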

Imagine a scenario where a client uses keep-alive and sends a request every second. If the upstream response time is 100 ms and the latency to the 3scale back-end server is 500 ms, the client gets each response in 100 ms. The upstream response plus the reporting take 600 ms in total, which leaves an extra 400 ms before the next request arrives.
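
The arithmetic in this scenario can be written out as a short calculation. The function below is a hypothetical helper for illustration. It also shows what happens when requests arrive faster than the connection becomes free, in which case the next request has to wait for the post action phase to finish.

```python
def keepalive_timing(interval_ms, upstream_ms, backend_ms):
    """Timing of one keep-alive connection when the authorization is cached."""
    busy_ms = upstream_ms + backend_ms  # upstream call followed by post action
    return {
        # The back-end call happens outside request time.
        "client_latency_ms": upstream_ms,
        "connection_busy_ms": busy_ms,
        "idle_before_next_ms": max(0, interval_ms - busy_ms),
        "next_request_wait_ms": max(0, busy_ms - interval_ms),
    }


# Values from the scenario: one request per second, 100 ms upstream, 500 ms back end.
print(keepalive_timing(1000, 100, 500))
```

With a request every 500 ms instead of every second, the same function reports a wait of 100 ms for the next request on that connection.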

The diagram below illustrates the default caching behavior explained above. The behavior of the caching mechanism can be changed using the caching policy.

Default caching behavior

8.3. Asynchronous reporting threads

APIcast has a feature to enable a pool of threads that authorize against the 3scale back-end server. With this feature enabled, APIcast first synchronously calls the 3scale back-end server to verify the application and metrics matched by mapping rules. This is similar to when it uses the caching mechanism enabled by default. The difference is that subsequent calls to the 3scale back-end server are reported fully asynchronously as long as there are free reporting threads in the pool.

Reporting threads are global for the whole gateway and shared between all the services. When a second TCP connection is made, it will also be fully asynchronous as long as the authorization is already cached. When there are no free reporting threads, the synchronous mode falls back to the standard asynchronous mode and does the reporting in the post action phase.

You can enable this feature using the APICAST_REPORTING_THREADS environment variable.
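
The fallback between the two reporting modes can be pictured with a bounded pool of slots. The following is a simplified sketch and not APIcast's implementation. The class and method names are hypothetical, and it assumes that the value of APICAST_REPORTING_THREADS corresponds to the pool size.

```python
import threading


class ReportingPoolSketch:
    """Toy model of a shared pool of reporting threads."""

    def __init__(self, size):
        # One slot per reporting thread, shared by all services.
        self._slots = threading.BoundedSemaphore(size)

    def choose_mode(self):
        """Pick how the next report to the back end is sent."""
        if self._slots.acquire(blocking=False):
            return "reporting thread"  # fully asynchronous
        return "post action"           # no free thread: fall back

    def finish_report(self):
        """Release a slot once a reporting thread has finished."""
        self._slots.release()


pool = ReportingPoolSketch(size=2)  # assumed pool size of 2
print([pool.choose_mode() for _ in range(3)])
```

Because the pool is global, a burst of traffic on one service can use up the free slots, and reports for other services then go through the post action phase until a slot is released.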

The diagram below illustrates how the asynchronous reporting thread pool works.

Asynchronous reporting thread pool behavior

8.4. 3scale Batcher policy

By default, APIcast performs one call to the 3scale back-end server for each request that it receives. The goal of the 3scale Batcher policy is to reduce latency and increase throughput by significantly reducing the number of requests made to the 3scale back-end server. To achieve that, the policy caches authorization statuses and batches reports.
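
The batching idea can be sketched as an accumulator that adds up usage locally and sends it in a single call. This is an illustrative model only and does not reflect the policy's real code or configuration. The class name and the send_batch callable are hypothetical.

```python
from collections import Counter


class BatcherSketch:
    """Toy model of report batching (not the policy's real implementation)."""

    def __init__(self, send_batch):
        # send_batch(batch) is a hypothetical stand-in for one call
        # to the 3scale back-end server.
        self.send_batch = send_batch
        self.pending = Counter()

    def report(self, credentials, metric, value=1):
        """Accumulate usage locally instead of calling the back end."""
        self.pending[(credentials, metric)] += value

    def flush(self):
        """Send all accumulated usage in one call; return the entry count."""
        if not self.pending:
            return 0
        batch = dict(self.pending)
        self.pending.clear()
        self.send_batch(batch)
        return len(batch)


sent = []
batcher = BatcherSketch(sent.append)
for _ in range(3):
    batcher.report("app-1", "hits")
batcher.report("app-2", "hits")
batcher.flush()
print(sent)
```

Here four requests result in a single back-end call. The trade-off is that usage is reported with a delay, so limits may be enforced less precisely between flushes.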

Section 4.1.2, “3scale Batcher” provides details about the 3scale Batcher policy. The diagram below illustrates how the policy works.

3scale Batcher policy behavior