If you’ve used the OpenAI API, you may have come across the term “rate limits” without being sure exactly what it refers to. This quick guide explains what ChatGPT rate limits are and why they matter. Rate limits can be a bit tricky to navigate if you’re new to them. If you find yourself frequently hitting the limit, you may need to evaluate your usage and adjust accordingly, or even consider submitting a request for a rate limit increase.
What are ChatGPT rate limits?
Rate limits are restrictions an API imposes on the maximum number of times a user or client can access the server within a set period.
Rate limits are a common practice across APIs and they are implemented for a number of reasons:
- To protect against abuse or misuse: This comes in handy to deter a rogue actor from overloading the API with requests, which could disrupt the service.
- To ensure fair access: This ensures that no single person or organization can hog the service by making an excessive number of requests, thereby slowing down the API for everyone else.
- To manage load on infrastructure: An API can be taxed if requests increase dramatically. This can cause performance issues. Thus, rate limits help maintain a smooth and consistent experience for all users.
OpenAI rate limits
OpenAI enforces rate limits at the organization level, based on the specific endpoint used and the type of account you have. You can view the rate limits for your organization on the account management page. Rate limits are measured in two ways: RPM (requests per minute) and TPM (tokens per minute). The table below shows the default rate limits:
- Free trial users
- Text & Embedding: 3 RPM, 150,000 TPM
- Chat: 3 RPM, 40,000 TPM
- Edit: 3 RPM, 150,000 TPM
- Image: 5 images / min
- Audio: 3 RPM
- Pay-as-you-go users (first 48 hours)
- Text & Embedding: 60 RPM, 250,000 TPM
- Chat: 60 RPM, 60,000 TPM
- Edit: 20 RPM, 150,000 TPM
- Image: 50 images / min
- Audio: 50 RPM
- Pay-as-you-go users (after 48 hours)
- Text & Embedding: 3,500 RPM, 350,000 TPM
- Chat: 3,500 RPM, 90,000 TPM
- Edit: 20 RPM, 150,000 TPM
- Image: 50 images / min
- Audio: 50 RPM
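For programmatic pacing, the defaults in the table above can be captured in a small lookup table. This is only a sketch: the tier and endpoint keys are illustrative labels of my own, not official API identifiers, and the real limits for your organization are shown on the account management page.

```python
# Default OpenAI rate limits from the table above, expressed as a simple
# lookup so a client can pace itself. Tier/endpoint names are illustrative,
# not official API identifiers.
DEFAULT_LIMITS = {
    "free_trial": {
        "chat": {"rpm": 3, "tpm": 40_000},
        "text_embedding": {"rpm": 3, "tpm": 150_000},
    },
    "paid_first_48h": {
        "chat": {"rpm": 60, "tpm": 60_000},
        "text_embedding": {"rpm": 60, "tpm": 250_000},
    },
    "paid": {
        "chat": {"rpm": 3_500, "tpm": 90_000},
        "text_embedding": {"rpm": 3_500, "tpm": 350_000},
    },
}

def min_seconds_between_requests(tier: str, endpoint: str) -> float:
    """Smallest request interval that stays under the RPM cap."""
    rpm = DEFAULT_LIMITS[tier][endpoint]["rpm"]
    return 60.0 / rpm
```

For example, a free-trial chat limit of 3 RPM works out to one request every 20 seconds.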
The rate limits can be increased based on your use case after you fill out a Rate Limit increase request form.
The size of a TPM (tokens per minute) unit varies depending on the model family. One TPM unit corresponds to:
- Davinci: 1 token
- Curie: 25 tokens
- Babbage: 100 tokens
- Ada: 200 tokens
In simple terms, this means you can send approximately 200x more tokens per minute to an Ada model versus a Davinci model.
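The arithmetic behind that 200x figure can be sketched directly. Assuming the unit sizes listed above, the raw number of tokens you can send per minute is the TPM limit multiplied by the model family’s unit size:

```python
# Per-model TPM unit sizes from the list above: one TPM unit counts as
# 1 davinci token, 25 curie tokens, 100 babbage tokens, or 200 ada tokens.
TPM_UNIT_TOKENS = {"davinci": 1, "curie": 25, "babbage": 100, "ada": 200}

def effective_tpm(tpm_limit: int, model_family: str) -> int:
    """Raw tokens per minute you can actually send to a model family."""
    return tpm_limit * TPM_UNIT_TOKENS[model_family]
```

With a 350,000 TPM limit, that is 350,000 raw tokens per minute for Davinci but 70,000,000 for Ada.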
GPT-4 rate limits
During the limited beta rollout of GPT-4, the model has more stringent rate limits to keep up with demand. For pay-as-you-go users, the default rate limits for gpt-4/gpt-4-0613 are 40k TPM and 200 RPM. For gpt-4-32k/gpt-4-32k-0613, the limits are 150k TPM and 1k RPM. OpenAI is currently unable to accommodate requests for rate limit increases due to capacity constraints.
If your rate limit is 60 requests per minute and 150k davinci tokens per minute, you’ll be constrained by either reaching the requests/min cap or running out of tokens, whichever happens first. If you hit your rate limit, you’ll need to pause your program slightly to allow for the next request. For example, if your max requests per minute is 60, that equates to sending one request per second. If you send one request every 800 milliseconds, once you reach your rate limit, you would only need to pause your program for 200 milliseconds before you could send another request.
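That pacing logic can be written as a small loop that spaces calls out to stay under the RPM cap. This is a generic sketch, not official client code; `request_fn` is a placeholder for whatever function actually sends your API request:

```python
import time

def paced_calls(request_fn, n_requests: int, rpm_limit: int = 60):
    """Call request_fn n_requests times without exceeding rpm_limit.

    request_fn is a placeholder for whatever sends your API request.
    """
    min_interval = 60.0 / rpm_limit   # e.g. 1.0 s between calls at 60 RPM
    last_sent = 0.0
    results = []
    for _ in range(n_requests):
        # Sleep only for the remainder of the interval, e.g. the 200 ms
        # top-up described above when requests arrive every 800 ms.
        wait = min_interval - (time.monotonic() - last_sent)
        if wait > 0:
            time.sleep(wait)
        last_sent = time.monotonic()
        results.append(request_fn())
    return results
```

Because the loop only sleeps for the leftover part of the interval, a request that takes 800 ms to prepare is followed by just a 200 ms pause at 60 RPM.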
However, hitting a rate limit does come with consequences. You might encounter an error that looks like this:
Rate limit reached for default-text-davinci-002 in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min.
This means you’ve made too many requests in a short period, and the API is refusing to fulfill further requests until enough time has passed.
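A common way to handle this error is to retry with exponential backoff: wait a short, growing, slightly randomized delay after each failure. The sketch below is generic so it stands alone; in real code using the OpenAI Python library, `retry_on` would be the library’s rate-limit exception rather than this placeholder:

```python
import random
import time

def with_backoff(fn, retry_on=Exception, max_retries=5, base_delay=1.0):
    """Retry fn with exponential backoff and jitter on rate-limit errors.

    retry_on is a stand-in for the API client's rate-limit exception.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Double the delay each attempt, plus random jitter so many
            # clients don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```

Jitter matters here: without it, a fleet of clients that hit the limit together would retry in lockstep and hit it again.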
Tokens and Rate Limits
Each model offered has a maximum number of tokens that can be passed in as input when making a request. For instance, if you are using text-ada-001, the maximum number of tokens you can send to this model is 2,048 tokens per request. You cannot increase the maximum number of tokens a model takes in.
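A rough pre-flight check can catch over-long prompts before they reach the API. The 4-characters-per-token ratio below is a common rule of thumb, not an exact count; a real tokenizer (such as the `tiktoken` library) gives the true number:

```python
# Per-request input limits from the text above; only text-ada-001 is
# listed here, other models would be added as needed.
MAX_TOKENS = {"text-ada-001": 2_048}

def fits_context(prompt: str, model: str, chars_per_token: float = 4.0) -> bool:
    """Estimate whether a prompt fits the model's per-request token limit.

    chars_per_token is a rough heuristic, not an exact tokenizer.
    """
    estimated_tokens = len(prompt) / chars_per_token
    return estimated_tokens <= MAX_TOKENS[model]
```

If the check fails, the prompt must be shortened or split, since the per-model maximum cannot be raised.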
While rate limits can seem complex, they are critical to maintaining the smooth operation of APIs and ensuring everyone gets fair access. By understanding and working within your allocated limits, you’ll be able to use the OpenAI API efficiently and without disruption. And remember, you’re not alone in this – support is always available if you run into any difficulties. For more specific information on OpenAI rate limits, jump over to the official documentation.