
Google Vertex AI / Gemini API Reference

Status: Partial

Verified against official sources on: 2026-03-23

Confidence: Medium. Endpoint and schema coverage are broadly aligned with Google documentation, but several advanced capability claims still require tighter verification.

Official sources:


Endpoints

Vertex AI

POST https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}:generateContent
POST https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}:streamGenerateContent

Gemini AI Studio (Google AI)

POST https://generativelanguage.googleapis.com/v1/models/{MODEL_ID}:generateContent?key={API_KEY}
POST https://generativelanguage.googleapis.com/v1/models/{MODEL_ID}:streamGenerateContent?key={API_KEY}
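The two surfaces differ only in host and path shape. A minimal sketch (hypothetical helper names; the placeholder values are yours to fill in) that builds both URL styles:

```python
# Sketch: assemble the endpoint URLs shown above.
# PROJECT_ID, LOCATION, MODEL_ID, and the API key are placeholders.

def vertex_url(project_id: str, location: str, model_id: str, stream: bool = False) -> str:
    """Vertex AI: regional host, project and location in the path."""
    method = "streamGenerateContent" if stream else "generateContent"
    return (f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
            f"/locations/{location}/publishers/google/models/{model_id}:{method}")

def gemini_url(model_id: str, api_key: str, stream: bool = False) -> str:
    """Gemini API (AI Studio): global host, key as a query parameter."""
    method = "streamGenerateContent" if stream else "generateContent"
    return (f"https://generativelanguage.googleapis.com/v1"
            f"/models/{model_id}:{method}?key={api_key}")
```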

Request Structure

{
  "contents": [...],
  "systemInstruction": {...},
  "tools": [...],
  "toolConfig": {...},
  "safetySettings": [...],
  "generationConfig": {...},
  "cachedContent": "cachedContents/{id}",
  "labels": {"key": "value"}
}
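Only contents is required; everything else is optional. A minimal request builder as a sketch (the function name is hypothetical):

```python
import json
from typing import Optional

def build_request(user_text: str, system_text: Optional[str] = None) -> dict:
    """Assemble a minimal generateContent body; only 'contents' is required."""
    body = {"contents": [{"role": "user", "parts": [{"text": user_text}]}]}
    if system_text:
        # systemInstruction takes a Content object; a role is not needed here.
        body["systemInstruction"] = {"parts": [{"text": system_text}]}
    return body

payload = json.dumps(build_request("Hello", "Be brief."))
```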

Contents & Parts

Content Object

{
  "role": "user",
  "parts": [...]
}

Roles: "user", "model".

Part Types

Text

{"text": "Hello, world!"}

Inline Data (images, audio, video)

{
  "inlineData": {
    "mimeType": "image/jpeg",
    "data": "<base64>"
  }
}

Supports up to 3,000 images per request (Gemini 2.0 Flash).
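The data field must be base64-encoded text, not raw bytes. A small sketch of wrapping image bytes as a part (helper name is hypothetical):

```python
import base64

def inline_image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw bytes as an inlineData part; 'data' is base64 text."""
    return {"inlineData": {"mimeType": mime_type,
                           "data": base64.b64encode(image_bytes).decode("ascii")}}
```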

File Data (Cloud Storage, URLs)

{
  "fileData": {
    "mimeType": "video/mp4",
    "fileUri": "gs://bucket/video.mp4"
  }
}

Also supports https:// URLs and YouTube video URLs.

Function Call (model output)

{
  "functionCall": {
    "name": "get_weather",
    "args": {"city": "Paris"}
  }
}

Function Response (user input)

{
  "functionResponse": {
    "name": "get_weather",
    "response": {"temperature": 22}
  }
}
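A complete tool-use turn alternates these two part types: the model emits a functionCall, the caller executes the tool and replies with a functionResponse under the user role. A sketch of the resulting contents history (helper name is hypothetical):

```python
def append_tool_turn(contents: list, call: dict, result: dict) -> list:
    """Echo the model's functionCall, then supply the matching functionResponse."""
    contents.append({"role": "model", "parts": [{"functionCall": call}]})
    contents.append({"role": "user", "parts": [{"functionResponse": {
        "name": call["name"], "response": result}}]})
    return contents

history = [{"role": "user", "parts": [{"text": "Weather in Paris?"}]}]
append_tool_turn(history,
                 {"name": "get_weather", "args": {"city": "Paris"}},
                 {"temperature": 22})
```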

Executable Code (code execution output)

{
  "executableCode": {
    "language": "PYTHON",
    "code": "print('hello')"
  }
}

Code Execution Result

{
  "codeExecutionResult": {
    "outcome": "OUTCOME_OK",
    "output": "hello"
  }
}

Thought (model reasoning)

{
  "thought": true,
  "text": "Let me think about this..."
}

Boolean flag indicating this part contains model reasoning.

Thought Signature

{
  "thoughtSignature": "<opaque_base64>"
}

Opaque signature for replaying thoughts in multi-turn conversations.

Video Metadata

{
  "videoMetadata": {
    "startOffset": {"seconds": 0},
    "endOffset": {"seconds": 30}
  }
}

System Instruction

{
  "systemInstruction": {
    "parts": [
      {"text": "You are a helpful assistant."}
    ]
  }
}

Text-only content for system-level directives.


GenerationConfig

| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | float | 1.0 | Sampling randomness (0.0–2.0) |
| topP | float | 0.95 | Nucleus sampling threshold (0.0–1.0) |
| topK | integer | | Limits token selection to the top-K candidates |
| maxOutputTokens | integer | | Maximum response length in tokens |
| candidateCount | integer | 1 | Number of response variations (1–8 for Gemini 2.0+) |
| stopSequences | string[] | | Up to 5 case-sensitive stop strings |
| presencePenalty | float | 0 | Penalize already-present tokens (-2.0 to 2.0) |
| frequencyPenalty | float | 0 | Penalize repeated tokens (-2.0 to 2.0) |
| responseMimeType | string | text/plain | Output format: text/plain, application/json, text/x.enum |
| responseSchema | Schema | | JSON Schema for structured output (requires a non-plain MIME type) |
| seed | integer | | Reproducible output (best-effort) |
| responseLogprobs | boolean | false | Enable token probability logging |
| logprobs | integer | | Return top candidate tokens (1–20) |
| audioTimestamp | boolean | false | Enable timestamp understanding for audio-only files |
| thinkingConfig | object | | Control the reasoning process (Gemini 2.5+) |
| mediaResolution | enum | | Reduce token usage per media item: HIGH, MEDIUM, LOW |
| speechConfig | object | | Voice output config: {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Kore"}}} |
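Structured output ties two of these parameters together: responseSchema only takes effect alongside a non-plain responseMimeType. A sketch of a JSON-mode config (helper name is hypothetical):

```python
def json_output_config(schema: dict, temperature: float = 0.0) -> dict:
    """responseSchema only takes effect with a non-plain responseMimeType."""
    return {"temperature": temperature,
            "responseMimeType": "application/json",
            "responseSchema": schema}

cfg = json_output_config({"type": "OBJECT",
                          "properties": {"city": {"type": "STRING"}},
                          "required": ["city"]})
```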

ThinkingConfig

Gemini 2.5 Models (Token Budget)

{
  "thinkingConfig": {
    "thinkingBudget": 8192,
    "includeThoughts": true
  }
}

| Field | Type | Description |
|---|---|---|
| thinkingBudget | integer | Token budget for reasoning. 0 = disabled, -1 = dynamic. |
| includeThoughts | boolean | Include thought text in response parts. |

Special cases:

  • gemini-2.5-pro: Cannot disable thinking (budget=0 is invalid). Use -1 for dynamic.
  • gemini-2.5-flash: budget=0 disables thinking.
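The model-specific special cases above can be enforced client-side. A sketch that clamps an invalid budget=0 to dynamic on Pro (helper name is hypothetical):

```python
def thinking_budget_config(model_id: str, budget: int,
                           include_thoughts: bool = False) -> dict:
    """gemini-2.5-pro cannot disable thinking, so map budget=0 to dynamic (-1)."""
    if model_id.startswith("gemini-2.5-pro") and budget == 0:
        budget = -1
    return {"thinkingConfig": {"thinkingBudget": budget,
                               "includeThoughts": include_thoughts}}
```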

Gemini 3+ Models (Thinking Level)

{
  "thinkingConfig": {
    "thinkingLevel": "HIGH",
    "includeThoughts": true
  }
}

| Field | Type | Description |
|---|---|---|
| thinkingLevel | string | Reasoning depth enum. |
| includeThoughts | boolean | Include thought text in response parts. |

ThinkingLevel values by model variant:

| Level | Flash / Flash-Lite | Pro |
|---|---|---|
| MINIMAL | Supported | Not supported (use LOW) |
| LOW | Supported | Supported |
| MEDIUM | Supported | Not supported (use HIGH) |
| HIGH | Supported (default) | Supported (default) |

Important: For Gemini 3 models, specifying both thinkingLevel and thinkingBudget in the same request returns an error. thinkingBudget is still accepted on Gemini 3 for backwards compatibility, but may produce unexpected results on Pro.
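That mutual exclusion can be caught before sending the request. A client-side sketch that mirrors the server-side error (function name is hypothetical):

```python
def validate_gemini3_thinking(config: dict) -> dict:
    """Reject configs that set both fields, mirroring the server-side error."""
    tc = config.get("thinkingConfig", {})
    if "thinkingLevel" in tc and "thinkingBudget" in tc:
        raise ValueError("Gemini 3: set thinkingLevel or thinkingBudget, not both")
    return config
```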


Tool Types

Each tool should contain exactly one type.

Function Declarations

{
  "functionDeclarations": [
    {
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "OBJECT",
        "properties": {
          "city": {"type": "STRING", "description": "City name"}
        },
        "required": ["city"]
      }
    }
  ]
}
Google Search

{
  "googleSearch": {}
}

For Gemini 2.0+ models. Enables grounded web search.

Google Search Retrieval (Legacy)

{
  "googleSearchRetrieval": {
    "dynamicRetrievalConfig": {
      "mode": "MODE_DYNAMIC",
      "dynamicThreshold": 0.7
    }
  }
}

Code Execution

{
  "codeExecution": {}
}

Model can write and execute Python code.

URL Context

{
  "urlContext": {}
}

Fetches live web page content for URLs in the prompt. Two-stage process: Google index first, then direct fetch if needed.

Google Maps

{
  "googleMaps": {}
}

Enables location-aware responses with Google Maps data.


ToolConfig

{
  "toolConfig": {
    "functionCallingConfig": {
      "mode": "AUTO",
      "allowedFunctionNames": ["get_weather"]
    }
  }
}

| Mode | Description |
|---|---|
| AUTO | Model decides whether to call functions |
| ANY | Must call at least one function |
| NONE | Never call functions |

allowedFunctionNames: When mode is ANY, restrict to these specific functions.
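Putting tools and toolConfig together: ANY mode forces a call, and allowedFunctionNames narrows which declarations qualify. A request-fragment sketch (helper name is hypothetical):

```python
def forced_tool_call(declarations: list, names: list) -> dict:
    """ANY mode forces a call; allowedFunctionNames narrows which ones qualify."""
    return {"tools": [{"functionDeclarations": declarations}],
            "toolConfig": {"functionCallingConfig": {
                "mode": "ANY", "allowedFunctionNames": names}}}

req = forced_tool_call([{"name": "get_weather",
                         "description": "Get current weather"}],
                       ["get_weather"])
```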


Response Schema

{
  "candidates": [...],
  "usageMetadata": {...},
  "modelVersion": "gemini-2.5-flash-001",
  "createTime": "2026-03-20T...",
  "responseId": "abc123",
  "promptFeedback": {...}
}

Candidate Object

| Field | Type | Description |
|---|---|---|
| index | integer | Position in response list |
| content | Content | Generated content (parts array) |
| finishReason | string | Why generation stopped |
| safetyRatings | array | Per-category harm ratings |
| citationMetadata | object | Source attributions with URI, title, date |
| groundingMetadata | object | Grounding sources |
| urlContextMetadata | object | Retrieved URL information |
| avgLogprobs | float | Average log probability (confidence score) |
| logprobsResult | object | Detailed token probability data |
| finishMessage | string | Human-readable stop reason |

FinishReason Values

| Value | Description |
|---|---|
| STOP | Natural stop or hit stop sequence |
| MAX_TOKENS | Hit maxOutputTokens limit |
| SAFETY | Blocked by safety filter |
| RECITATION | Blocked due to recitation/copyright |
| BLOCKLIST | Blocked by term blocklist |
| PROHIBITED_CONTENT | Blocked prohibited content |
| SPII | Blocked sensitive personally identifiable info |
| MALFORMED_FUNCTION_CALL | Invalid function call generated |
| OTHER | Other/unspecified reason |
| FINISH_REASON_UNSPECIFIED | Not set |
| MODEL_ARMOR | Blocked by Model Armor |
| IMAGE_SAFETY | Image blocked by safety |
| IMAGE_PROHIBITED_CONTENT | Image contains prohibited content |
| IMAGE_RECITATION | Image blocked for recitation |
| IMAGE_OTHER | Image blocked for other reason |
| UNEXPECTED_TOOL_CALL | Unexpected tool call |
| NO_IMAGE | No image generated |
| TOOL_CALL | Model wants to invoke a tool |
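A common consumer task is pulling the visible answer out of a candidate while skipping thought parts. A sketch against the schema above (helper name and sample response are illustrative):

```python
def candidate_text(response: dict, index: int = 0) -> str:
    """Join text parts of one candidate, skipping parts flagged as thoughts."""
    parts = response["candidates"][index]["content"]["parts"]
    return "".join(p["text"] for p in parts
                   if "text" in p and not p.get("thought"))

resp = {"candidates": [{"content": {"parts": [
            {"thought": True, "text": "reasoning..."},
            {"text": "Final answer."}]},
        "finishReason": "STOP"}]}
```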

UsageMetadata

{
  "promptTokenCount": 100,
  "candidatesTokenCount": 50,
  "totalTokenCount": 180,
  "cachedContentTokenCount": 30,
  "thoughtsTokenCount": 20,
  "toolUsePromptTokenCount": 10,
  "promptTokensDetails": [
    {"modality": "TEXT", "tokenCount": 80},
    {"modality": "IMAGE", "tokenCount": 20}
  ],
  "candidatesTokensDetails": [
    {"modality": "TEXT", "tokenCount": 50}
  ],
  "cacheTokensDetails": [
    {"modality": "TEXT", "tokenCount": 30}
  ],
  "toolUsePromptTokensDetails": [
    {"modality": "TEXT", "tokenCount": 10}
  ],
  "trafficType": "ON_DEMAND"
}

Token Count Fields

| Field | Type | Description |
|---|---|---|
| promptTokenCount | integer | Total input tokens |
| candidatesTokenCount | integer | Output tokens (Vertex AI: excludes thoughts; Gemini API: includes thoughts) |
| totalTokenCount | integer | Sum of prompt + candidates + thoughts + toolUse |
| cachedContentTokenCount | integer | Tokens served from cache |
| thoughtsTokenCount | integer | Tokens used for model reasoning |
| toolUsePromptTokenCount | integer | Tokens from tool execution results fed back to model |
| promptTokensDetails | array | Per-modality breakdown of input tokens |
| candidatesTokensDetails | array | Per-modality breakdown of output tokens |
| cacheTokensDetails | array | Per-modality breakdown of cached tokens |
| toolUsePromptTokensDetails | array | Per-modality breakdown of tool use tokens |
| trafficType | string | ON_DEMAND, PROVISIONED_THROUGHPUT, etc. |

Modality Values

TEXT, IMAGE, AUDIO, VIDEO, DOCUMENT.

Important difference: On Vertex AI, candidatesTokenCount does NOT include thinking tokens (they are separate in thoughtsTokenCount). On the Gemini API (AI Studio), candidatesTokenCount INCLUDES thinking tokens.
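That discrepancy matters when comparing usage across platforms. A normalization sketch (function name is hypothetical):

```python
def output_tokens_incl_thoughts(usage: dict, platform: str) -> int:
    """Normalize output token counts across the two platforms."""
    out = usage.get("candidatesTokenCount", 0)
    if platform == "vertex":
        # Vertex reports thoughts separately; fold them in.
        out += usage.get("thoughtsTokenCount", 0)
    return out  # Gemini API already includes thoughts in candidatesTokenCount
```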


LogprobsResult

{
  "topCandidates": [
    {
      "candidates": [
        {"token": "Hello", "tokenId": 12345, "logProbability": -0.1},
        {"token": "Hi", "tokenId": 67890, "logProbability": -2.3}
      ]
    }
  ],
  "chosenCandidates": [
    {"token": "Hello", "tokenId": 12345, "logProbability": -0.1}
  ]
}

Two parallel arrays:

  • topCandidates[] — alternative tokens considered at each position
  • chosenCandidates[] — actual tokens selected
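The logProbability values are natural-log probabilities; exponentiating recovers the token probability:

```python
import math

def token_probability(logprob: float) -> float:
    """Log probabilities are natural logs; exponentiate to recover p."""
    return math.exp(logprob)
```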

Safety Settings

{
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE",
      "method": "PROBABILITY"
    }
  ]
}

Categories

Category
HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_DANGEROUS_CONTENT
HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_SEXUALLY_EXPLICIT

Thresholds

| Threshold | Description |
|---|---|
| BLOCK_NONE / OFF | No blocking |
| BLOCK_ONLY_HIGH | Block only high probability |
| BLOCK_MEDIUM_AND_ABOVE | Block medium and above |
| BLOCK_LOW_AND_ABOVE | Block low and above (strictest) |

Methods

| Method | Description |
|---|---|
| PROBABILITY | Probability-based (default) |
| SEVERITY | Severity-based |
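Safety settings are supplied per category, so applying one threshold across the board means one entry per category. A sketch (helper name is hypothetical):

```python
HARM_CATEGORIES = [
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
]

def uniform_safety(threshold: str = "BLOCK_ONLY_HIGH") -> list:
    """One setting per category at the same threshold (PROBABILITY method is the default)."""
    return [{"category": c, "threshold": threshold} for c in HARM_CATEGORIES]
```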

Streaming

The streamGenerateContent endpoint returns chunks as they are generated. Each chunk has the same GenerateContentResponse structure but with partial content. On the Gemini API, append alt=sse to the query string to receive server-sent events; without it, the chunks arrive as a single JSON array.

Token usage (usageMetadata) is typically included in the final chunk only. Thought parts (thought: true) appear in streaming chunks as they are generated, followed by thoughtSignature parts.
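A consumer therefore concatenates text across chunks and keeps the last usageMetadata it sees. A sketch over already-parsed chunk dicts (helper name and sample chunks are illustrative):

```python
def aggregate_stream(chunks: list):
    """Concatenate visible text across chunks; usage arrives on the final chunk."""
    text, usage = [], None
    for chunk in chunks:
        for part in chunk["candidates"][0]["content"]["parts"]:
            if "text" in part and not part.get("thought"):
                text.append(part["text"])
        usage = chunk.get("usageMetadata", usage)
    return "".join(text), usage

full, usage = aggregate_stream([
    {"candidates": [{"content": {"parts": [{"text": "Hel"}]}}]},
    {"candidates": [{"content": {"parts": [{"text": "lo"}]}}],
     "usageMetadata": {"totalTokenCount": 5}},
])
```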


Supported MIME Types

Images

image/jpeg, image/png, image/gif, image/webp

Video

video/mp4, video/mpeg, video/mov, video/avi, video/x-flv, video/mpg, video/webm, video/wmv, video/3gpp

Audio

audio/wav, audio/mp3, audio/mpeg, audio/aiff, audio/aac, audio/ogg, audio/flac

Documents

application/pdf, text/plain
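The lists above can be checked client-side before uploading. A lookup-table sketch mirroring them:

```python
# Supported MIME types per the lists above.
SUPPORTED_MIME = {
    "image": {"image/jpeg", "image/png", "image/gif", "image/webp"},
    "video": {"video/mp4", "video/mpeg", "video/mov", "video/avi", "video/x-flv",
              "video/mpg", "video/webm", "video/wmv", "video/3gpp"},
    "audio": {"audio/wav", "audio/mp3", "audio/mpeg", "audio/aiff",
              "audio/aac", "audio/ogg", "audio/flac"},
    "document": {"application/pdf", "text/plain"},
}

def is_supported(mime_type: str) -> bool:
    """True if the MIME type appears in any supported modality group."""
    return any(mime_type in group for group in SUPPORTED_MIME.values())
```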