Rate limits

We have several safeguards in place against bursts of incoming traffic to help maximize the stability of the API. Clients that send many requests in quick succession may receive error responses with status code 429.

For most APIs, Mnemonic allows up to 30 requests per second. This limit applies broadly to all APIs, regardless of the protocol used (gRPC or REST).

Treat these limits as maximums and don't generate unnecessary load. If you suddenly see a rising number of rate-limited requests, please contact support.

We may increase limits to enable high-traffic applications for Enterprise-tier customers. To request an increased rate limit, please contact support.

Common causes and mitigations

Rate limiting can occur under a variety of conditions, but it's most common in these scenarios:

Running a large volume of closely spaced requests can lead to rate limiting. You should control the request rate on the client side.
Issuing many long-lived requests can trigger rate limiting. Requests vary in the amount of Mnemonic’s server resources they use. More resource-intensive requests tend to take longer, and they run the risk of causing new requests to be rate limited. Resource requirements vary widely across the Mnemonic API, but in general, requests that include large offsets take longer to run. We suggest profiling the duration of API requests and watching for timeouts to spot requests that are unexpectedly slow.
If you encounter a rising number of long-running requests, please contact support.
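
As a rough illustration of the profiling suggestion above, the following Python sketch records how long each call takes and lists the labels whose slowest call exceeded a threshold. The DurationTracker class, the label names, and the 2-second threshold are hypothetical choices made for this example; they are not part of the Mnemonic API.

```python
import time
from collections import defaultdict


class DurationTracker:
    """Records call durations per label (hypothetical helper, not a Mnemonic API)."""

    def __init__(self, slow_threshold=2.0, clock=time.perf_counter):
        self.slow_threshold = slow_threshold  # seconds; an assumed value, tune as needed
        self.clock = clock
        self.durations = defaultdict(list)

    def record(self, label, fn):
        """Run fn(), store its duration under label, and return fn's result."""
        start = self.clock()
        try:
            return fn()
        finally:
            # Recorded even when fn raises, so timeouts are captured too.
            self.durations[label].append(self.clock() - start)

    def slow_labels(self):
        """Return the labels whose slowest recorded call exceeded the threshold."""
        return sorted(
            label
            for label, values in self.durations.items()
            if max(values) > self.slow_threshold
        )
```

In practice, fn would wrap a single API request, and the label could identify the endpoint or the offset range being requested, which makes it easier to see whether large offsets are the slow ones.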

Handling limiting gracefully

A basic technique for integrations to handle rate limiting gracefully is to watch for 429 status codes and build in a retry mechanism. The retry mechanism should follow an exponential backoff schedule to reduce request volume when necessary. We also recommend building some randomness into the backoff schedule to avoid a thundering herd effect. Refer to the Best practices guide to learn more.
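
The retry approach described above might look like the following Python sketch. It assumes a hypothetical zero-argument callable, send, that performs one request and returns a (status_code, body) pair. The function names, the 0.5-second base delay, the 30-second cap, and the retry count are illustrative assumptions, not values recommended by Mnemonic.

```python
import random
import time


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Yield one randomized delay per retry: exponential backoff with full jitter."""
    for attempt in range(max_retries):
        # The ceiling doubles on every attempt, up to the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Pick a random point below the ceiling so clients do not retry in lockstep.
        yield rng() * ceiling


def call_with_retry(send, max_retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call send() and retry while it returns status 429.

    send is a hypothetical zero-argument callable returning (status_code, body).
    """
    delays = backoff_delays(max_retries, base, cap)
    while True:
        status, body = send()
        if status != 429:
            return status, body
        delay = next(delays, None)
        if delay is None:
            # Retries exhausted: surface the 429 to the caller.
            return status, body
        sleep(delay)
```

Choosing each delay at random between zero and the exponential ceiling spreads retries from many clients over time, which is what counters the thundering herd effect.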

You can only optimize individual requests to a limited degree, so a more sophisticated approach is to control traffic to the Mnemonic API at a global level and throttle it back if you detect substantial rate limiting. A common technique for controlling the request rate is to implement something like a token bucket rate limiting algorithm on the client side. Mature, ready-made token bucket implementations are available in almost any programming language.
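
For illustration only, a minimal token bucket could be sketched in Python as follows. The TokenBucket class and its method names are hypothetical, and the sketch is not thread-safe; a production integration would more likely use one of the existing library implementations. The rate of 30 tokens per second in the usage note mirrors the general limit stated earlier on this page.

```python
import time


class TokenBucket:
    """Minimal single-threaded token bucket (illustrative sketch only)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = clock()

    def _refill(self):
        """Add tokens for the time elapsed since the last update, up to capacity."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self, n=1):
        """Take n tokens if available and return True; otherwise return False."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n=1):
        """Return the seconds to wait until n tokens will be available."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)
```

A client would create one shared bucket, for example TokenBucket(rate=30, capacity=30), and call try_acquire() before each request. When it returns False, the client can sleep for wait_time() seconds before trying again. Lowering the rate when 429 responses become frequent is one way to throttle traffic back globally.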