
Advanced Troubleshooting

Beyond basic errors. This page covers the harder-to-diagnose PI Web API issues that surface in production: Kerberos delegation, batch edge cases, memory pressure, IIS tracing, and performance degradation.

Start with the basics

For common errors like SSL, 401, 403, 404, and timeouts, see the Common Errors & Troubleshooting page first.

Kerberos double-hop / delegation failures

Kerberos delegation fails when PI Web API tries to access a back-end PI Data Archive on behalf of the user, but the server does not have delegation rights. This is one of the most common production issues in enterprise PI deployments.

Symptoms

  • Authentication works when hitting PI Web API directly
  • Requests that touch PI Data Archive return 401 or empty results
  • The same request works when run directly on the PI Web API server
  • Event Viewer on the PI Web API server shows Kerberos audit failures (Event ID 4625 or 4771)
  • Works with Basic auth but fails with Kerberos

Root cause

Kerberos does not forward your credentials to a second server by default. PI Web API authenticates you, but when it connects to PI Data Archive on your behalf, the Data Archive sees an anonymous connection. This is the "double-hop" problem -- your credentials hop from your machine to PI Web API (hop 1) but cannot hop from PI Web API to Data Archive (hop 2) without explicit delegation configuration.
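From the client side, you can often tell the two hops apart by comparing a request that never leaves the PI Web API server (such as GET /system/versions) with one that must reach the Data Archive (such as GET /streams/{webId}/value). A minimal classifier sketch -- the helper below is illustrative, not part of PI Web API or any client library:

```python
def classify_double_hop(direct_status: int, archive_status: int) -> str:
    """Name the likely failing hop from two probe requests:
    direct_status  -- GET {base}/system/versions (stays on the Web API server)
    archive_status -- GET {base}/streams/{webId}/value (requires hop 2)
    """
    if direct_status == 401:
        return "hop 1: client-to-PI-Web-API authentication is failing"
    if direct_status == 200 and archive_status in (401, 403):
        return "hop 2: delegation to the Data Archive is likely broken"
    if direct_status == 200 and archive_status == 200:
        return "both hops OK -- look elsewhere (identity mapping, empty results)"
    return f"inconclusive: direct={direct_status}, archive={archive_status}"

# Example probe (assumes a requests session with Kerberos auth configured):
# direct  = session.get(f"{BASE_URL}/system/versions").status_code
# archive = session.get(f"{BASE_URL}/streams/{WEB_ID}/value").status_code
# print(classify_double_hop(direct, archive))
```

If the direct probe succeeds while the archive probe returns 401 or 403, proceed to the diagnostic steps below.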

Diagnostic steps

Diagnose delegation on PI Web API server (PowerShell)
# 1. Check the service account's SPN registration
setspn -L PI-WEB-API-SERVICE-ACCOUNT

# Expected output should include:
#   HTTP/pi-web-api-server.domain.com
#   HTTP/pi-web-api-server

# 2. Verify delegation is configured in AD
# Open Active Directory Users and Computers > find the service account
# > Properties > Delegation tab
# Should show: "Trust this computer for delegation to specified services only"
# With the PI Data Archive SPN listed

# 3. Check for duplicate SPNs (common cause of silent failures)
setspn -X
# Any duplicates will break Kerberos entirely

# 4. On the PI Web API server, list the cached Kerberos tickets
klist
# Look for a service ticket to the PI Data Archive (piserver/...)

# 5. Check Windows Event Viewer for Kerberos failures
# Event Viewer > Windows Logs > Security
# Filter for Event IDs: 4625, 4768, 4769, 4771
# Event 4771 failure code 0x7 = server principal unknown (often a missing or duplicate SPN)

Fix

  1. Configure constrained delegation in Active Directory for the PI Web API service account. The account needs "Trust this computer for delegation to specified services only" with the PI Data Archive SPN listed (typically piserver/DATA-ARCHIVE-HOST).
  2. Set the correct SPN for the PI Web API service. Run setspn -S HTTP/pi-web-api-host PI-SERVICE-ACCOUNT. Use -S (not -A) to check for duplicates first.
  3. Verify with klist on the PI Web API server to confirm the service ticket includes a forwarded TGT.
  4. Use protocol transition if PI Web API also accepts Basic auth. Set "Use any authentication protocol" in the delegation tab to allow protocol transition from Basic to Kerberos for the back-end hop.
  5. Restart IIS after making delegation changes. Kerberos ticket caches are not refreshed automatically.

Delegation is a domain admin task

Configuring Kerberos delegation requires Active Directory permissions. You will need your domain administrator to make these changes. Provide them the specific SPN values and the service account name to minimize back-and-forth.

PI Web API log files

PI Web API writes detailed logs that are essential for diagnosing issues that do not produce clear HTTP error messages. Knowing where to find them and what to look for saves hours of guessing.

| Log file | Location | What it contains |
| --- | --- | --- |
| PI Web API log | %ProgramData%\OSIsoft\WebAPI\Logs\ | Application-level errors, AF SDK exceptions, configuration warnings |
| IIS log | %SystemDrive%\inetpub\logs\LogFiles\ | HTTP request/response log with status codes, timing, and client IPs |
| Windows Event Log | Event Viewer > Application | Service start/stop, unhandled exceptions, configuration errors |
| HTTPERR log | %SystemRoot%\System32\LogFiles\HTTPERR\ | Requests rejected by HTTP.sys before reaching IIS (connection resets, queue overflows) |
| AF SDK log | %ProgramData%\OSIsoft\AF\Logs\ | AF Server connection failures, SDK-level exceptions |
Quick log search for recent errors (PowerShell)
# Search PI Web API logs for errors in the last hour
$logPath = "$env:ProgramData\OSIsoft\WebAPI\Logs"
Get-ChildItem $logPath -Filter "*.log" |
  Sort-Object LastWriteTime -Descending |
  Select-Object -First 1 |
  Get-Content -Tail 200 |
  Select-String -Pattern "ERROR|Exception|WARN"

# Search IIS logs for non-200 responses
$iisPath = "$env:SystemDrive\inetpub\logs\LogFiles"
Get-ChildItem $iisPath -Recurse -Filter "*.log" |
  Sort-Object LastWriteTime -Descending |
  Select-Object -First 1 |
  Get-Content -Tail 500 |
  Select-String -Pattern " (4|5)\d{2} "

# Check HTTPERR for connection-level failures
Get-Content "$env:SystemRoot\System32\LogFiles\HTTPERR\httperr*.log" -Tail 50

IIS request tracing (Failed Request Tracing)

When PI Web API returns errors but the logs do not explain why, IIS Failed Request Tracing (FREB) captures the complete request pipeline -- every module, every handler, every authentication step. This is the most powerful debugging tool for PI Web API server-side issues.

Enable Failed Request Tracing (PowerShell)
# 1. Install the IIS tracing feature (if not already installed)
Install-WindowsFeature Web-Http-Tracing

# 2. Enable tracing for the PI Web API site via appcmd
#    (appcmd lives in %windir%\System32\inetsrv if it is not on your PATH)
#    Replace "Default Web Site" with your site name if different
$site = "Default Web Site"
appcmd configure trace "$site" /enablesite

# 3. Add a tracing rule for status codes 400-599
appcmd configure trace "$site/piwebapi" /enable /path:"*" /statusCodes:"400-599" /timeTaken:"00:00:30"

# Traces are written to:
# %SystemDrive%\inetpub\logs\FailedReqLogFiles\
# Open the XML files in a browser for a formatted timeline view

# 4. IMPORTANT: Disable tracing when done (performance impact)
appcmd configure trace "$site" /disablesite

Performance impact

Failed Request Tracing adds overhead to every matching request. Enable it only during active debugging and disable it when done. Use specific status code filters to limit the scope.

Batch requests: partial failures

A batch request can return 200 OK overall while individual sub-requests inside it fail. If you only check the top-level status, you will miss errors.

batch_error_check.py
from dataclasses import dataclass

@dataclass
class BatchSubResult:
    key: str
    status: int
    content: dict | None
    errors: list[str]
    is_success: bool

def parse_batch_results(response_json: dict) -> list[BatchSubResult]:
    """Parse each sub-request in a batch response, surfacing failures."""
    results = []
    for key, result in response_json.items():
        status = result.get("Status", 0)
        content = result.get("Content")
        errors = []
        if status >= 400:
            if isinstance(content, dict):
                errors = content.get("Errors", [])
            elif isinstance(content, str):
                errors = [content]

        results.append(BatchSubResult(
            key=key,
            status=status,
            content=content if status < 400 else None,
            errors=errors,
            is_success=status < 400,
        ))

    failed = [r for r in results if not r.is_success]
    if failed:
        print(f"Batch: {len(results)} total, {len(failed)} failed:")
        for r in failed:
            print(f"  {r.key}: HTTP {r.status} - {r.errors}")
    else:
        print(f"Batch: all {len(results)} sub-requests succeeded")

    return results

# Usage
resp = session.post(f"{BASE_URL}/batch", json=batch_body)
resp.raise_for_status()  # Only checks top-level 200
results = parse_batch_results(resp.json())

# Access successful results
for r in results:
    if r.is_success:
        value = r.content.get("Value")
        # process value...

Common partial failure causes

| Sub-request status | Likely cause | Resolution |
| --- | --- | --- |
| 404 | WebID is invalid or the point was deleted | Re-resolve the WebID using a path-based lookup |
| 403 | Permission denied on specific points | Check PI identity mapping for the service account |
| 409 | Write conflict (existing value at timestamp) | Use updateOption=Replace to overwrite |
| 500 | Server-side error for specific resources | Check PI Web API logs for the stack trace |
| 502 | PI Data Archive connection failed mid-batch | Retry the failed sub-requests only |
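For the 404 case, re-resolving by path can be scripted. A sketch, assuming you still know the \\server\tag path for each failed point -- the GET /points?path= lookup is a standard PI Web API call, but the helper itself is hypothetical:

```python
def reresolve_webids(session, base_url, failed_paths):
    """Look failed points up again by path via GET /points?path=...
    Returns {path: fresh WebId}, with None for points that no longer exist."""
    fresh = {}
    for path in failed_paths:
        resp = session.get(f"{base_url}/points", params={"path": path})
        # 200 -> point still exists, take the new WebId; anything else -> gone
        fresh[path] = resp.json().get("WebId") if resp.status_code == 200 else None
    return fresh
```

Feed the fresh WebIDs back into a rebuilt batch body; paths that come back as None point at genuinely deleted tags.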
retry_failed_subrequests.py
import time

def batch_with_retry(session, base_url, batch_body, max_retries=2):
    """Execute a batch request with automatic retry for transient failures."""
    retryable_statuses = {502, 503, 504}
    remaining = dict(batch_body)

    all_results = {}
    for attempt in range(max_retries + 1):
        if not remaining:
            break

        resp = session.post(f"{base_url}/batch", json=remaining)
        resp.raise_for_status()
        results = parse_batch_results(resp.json())

        retry_batch = {}
        for r in results:
            if r.is_success:
                all_results[r.key] = r
            elif r.status in retryable_statuses:
                retry_batch[r.key] = remaining[r.key]  # retry
            else:
                all_results[r.key] = r  # permanent failure

        remaining = retry_batch
        if remaining and attempt < max_retries:
            time.sleep(2 ** attempt)
            print(f"Retrying {len(remaining)} failed sub-requests (attempt {attempt + 2})")

    all_results.update({k: BatchSubResult(k, 503, None, ["Max retries exceeded"], False)
                        for k in remaining})
    return all_results

Memory pressure and 503 errors

When PI Web API runs low on memory, it starts returning 503 (Service Unavailable). This happens most often when:

  • Too many concurrent large queries. Each recorded value request with a large time range or high maxCount allocates significant memory. A single request for 1M recorded values can consume hundreds of MB.
  • Batch requests with hundreds of sub-requests. The server assembles all responses in memory before returning. A batch of 500 recorded-value reads can exceed the application pool memory limit.
  • Multiple clients hammering the server. PI Web API is often shared across teams; one heavy client can affect everyone.
  • IIS application pool recycling. The default memory limit may be too low for heavy PI Web API usage. Check IIS Manager > Application Pools > Advanced Settings > Private Memory Limit.

Server-side limit configuration

| Setting | Location | Default | Recommendation |
| --- | --- | --- | --- |
| Application pool memory limit | IIS Manager | 1,843,200 KB | Increase to 4-8 GB for heavy usage |
| Maximum concurrent requests | PI Web API Admin | Varies | Monitor and set based on server capacity |
| BatchLimit | PI Web API Configuration | Varies by version | Keep at 100-200 sub-requests per batch |
| Queue length | IIS App Pool > Advanced | 1,000 | Increase if seeing HTTP 503 with "queue full" |

Client-side mitigation

memory_mitigation.py
import time
from concurrent.futures import ThreadPoolExecutor

# 1. Limit concurrent requests with a thread pool
# Use max_workers=5, not 50 -- PI Web API is not designed for
# dozens of concurrent large queries from a single client
pool = ThreadPoolExecutor(max_workers=5)

# 2. Chunk batch requests (100 sub-requests per batch is a safe default)
def chunked_batch(session, base_url, all_requests, chunk_size=100, delay=0.2):
    """Execute batch requests in chunks with a delay between them."""
    results = {}
    keys = list(all_requests.keys())
    for i in range(0, len(keys), chunk_size):
        chunk_keys = keys[i:i+chunk_size]
        chunk = {k: all_requests[k] for k in chunk_keys}
        resp = session.post(f"{base_url}/batch", json=chunk)
        resp.raise_for_status()
        results.update(resp.json())
        if i + chunk_size < len(keys):
            time.sleep(delay)  # Give the server breathing room
    return results

# 3. Use smaller time ranges and lower maxCount
params = {
    "startTime": "*-1h",   # Not *-30d
    "endTime": "*",
    "maxCount": 1000,      # Not 150000
    "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
}

# 4. Exponential backoff on 503
def request_with_backoff(session, url, params=None, max_retries=3):
    for attempt in range(max_retries):
        resp = session.get(url, params=params)
        if resp.status_code != 503:
            return resp
        wait = min(2 ** attempt, 30)  # Cap at 30 seconds
        print(f"503 received, waiting {wait}s before retry {attempt + 1}...")
        time.sleep(wait)
    return resp  # Return the last response even if still 503

Slow queries: finding the bottleneck

When PI Web API queries are slow, the bottleneck is usually one of three places: network, PI Web API server, or PI Data Archive. This profiler helps you identify which layer is causing the slowdown.

query_profiler.py
import time
import json

def timed_request(session, url, params=None, label=""):
    """Measure request timing with detailed breakdown."""
    start = time.perf_counter()
    resp = session.get(url, params=params)
    elapsed = time.perf_counter() - start

    # Build timing report
    report = {
        "label": label or url.split("/")[-1],
        "status": resp.status_code,
        "total_time_s": round(elapsed, 3),
        "response_bytes": len(resp.content),
    }

    # requests' resp.elapsed measures time from sending the request until the
    # response headers are parsed -- a rough proxy for server-side time; the
    # remainder is body transfer plus client overhead
    if hasattr(resp, 'elapsed'):
        server_time = resp.elapsed.total_seconds()
        report["server_time_s"] = round(server_time, 3)
        report["network_overhead_s"] = round(elapsed - server_time, 3)
        report["bottleneck"] = (
            "network" if (elapsed - server_time) > server_time
            else "server"
        )

    # Check for truncation (silent data loss)
    if resp.status_code == 200:
        data = resp.json()
        items = data.get("Items", [])
        links = data.get("Links", {})
        report["item_count"] = len(items)
        if "Next" in links:
            report["truncated"] = True
            report["warning"] = "Data was truncated -- more pages available"

    print(json.dumps(report, indent=2))
    return resp

# Example: Compare different query strategies for the same data
print("=== Strategy 1: Recorded values, large range ===")
timed_request(session, f"{BASE_URL}/streams/{WEB_ID}/recorded",
    {"startTime": "*-7d", "maxCount": 50000},
    label="recorded-7d-50k")

print("\n=== Strategy 2: Interpolated values, same range ===")
timed_request(session, f"{BASE_URL}/streams/{WEB_ID}/interpolated",
    {"startTime": "*-7d", "interval": "1h"},
    label="interpolated-7d-1h")

print("\n=== Strategy 3: Summary statistics ===")
timed_request(session, f"{BASE_URL}/streams/{WEB_ID}/summary",
    {"startTime": "*-7d", "summaryType": "Average,Minimum,Maximum,StdDev"},
    label="summary-7d")

Bottleneck decision table

| Symptom | Bottleneck | Fix |
| --- | --- | --- |
| High network overhead, fast server time | Network / serialization | Use selectedFields to reduce payload, use batch to reduce round trips |
| Large response, fast server time | JSON serialization | Request fewer fields, use summary instead of raw recorded values |
| Slow server time, small response | Data Archive query | Check PI Data Archive performance counters, verify point is not a formula/calc tag |
| Slow server time, large response | Query too broad | Reduce time range, lower maxCount, use interpolated instead of recorded |
| First request slow, subsequent fast | Cold start / auth negotiation | Session reuse, connection pooling, keep-alive |
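The decision table can be wired to the profiler's numbers so the suggestion comes back automatically. A sketch with illustrative thresholds -- the byte cutoffs are assumptions for this example, not PI Web API constants:

```python
def suggest_fix(total_s: float, server_s: float, response_bytes: int) -> str:
    """Map profiler timings onto the bottleneck decision table."""
    network_s = total_s - server_s
    if network_s > server_s and response_bytes > 1_000_000:
        # big payload dominated by transfer time
        return "network/serialization: trim payload with selectedFields, batch round trips"
    if network_s > server_s:
        # transfer-dominated but small payload: latency, proxies, no session reuse
        return "network: reuse the session, check proxies and latency"
    if response_bytes < 100_000:
        # server slow despite a small answer: the Archive query itself is slow
        return "Data Archive query: check Archive performance counters and point type"
    return "query too broad: reduce time range/maxCount or use interpolated/summary"
```

Feed it total_time_s, server_time_s, and response_bytes from the timed_request report above.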

COMException errors

PI Web API uses the AF SDK internally to connect to PI Data Archive and AF Server. When these connections fail, the error surfaces as a COMException -- which can be confusing since you are calling a REST API.

| COMException message | Meaning |
| --- | --- |
| [-10722] PI Data Archive connection lost | PI Web API cannot reach the Data Archive. Check network, firewall (port 5450), and Data Archive service status. |
| [-10401] No access | The PI Web API service account does not have PI Identity permissions on the Data Archive. |
| [-11091] Point not found | The PI point exists in the AF hierarchy but the underlying Data Archive point was deleted. |
| AF Server connection error | PI Web API cannot reach the AF Server. Check SQL Server connectivity and AF Server service status. |

Where to find COMExceptions

COMExceptions appear in the PI Web API log files (%ProgramData%\OSIsoft\WebAPI\Logs\), not in the HTTP response body. The HTTP response typically shows a generic 500 error. Always check the server logs when you see unexplained 500s.
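The code-to-meaning table above can double as a small log scanner. A sketch -- the [-NNNNN] pattern matches the bracketed codes as shown in the table, but verify it against your actual log line format:

```python
import re

COM_ERROR_MEANINGS = {
    "-10722": "PI Data Archive connection lost (network, firewall port 5450, service down)",
    "-10401": "No access: service account lacks PI Identity permissions",
    "-11091": "Point not found on the Data Archive",
}

def scan_comexceptions(log_text: str) -> list[tuple[str, str]]:
    """Extract bracketed COMException codes from PI Web API log text
    and annotate them with the meanings in the table above."""
    return [
        (code, COM_ERROR_MEANINGS.get(code, "unknown code -- check vendor docs"))
        for code in re.findall(r"\[(-\d{4,6})\]", log_text)
    ]

# Example:
# with open(log_path) as f:
#     for code, meaning in scan_comexceptions(f.read()):
#         print(code, meaning)
```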

Stale data and caching behavior

PI Web API caches some responses. If you read a value, write a new one, and immediately read again, you may get the old value back. Understanding the caching layers helps avoid confusing stale data issues.

| Cache layer | What is cached | How to bypass |
| --- | --- | --- |
| PI Web API response cache | Current values, config data | Cache-Control: no-cache header |
| AF SDK object cache | AF elements, attributes, templates | Call the refresh endpoint or wait for cache expiry |
| Client-side HTTP cache | Any response with cache headers | Add a unique query parameter or Cache-Control: no-store |
cache_bypass.py
# Force PI Web API to bypass its cache
resp = session.get(
    f"{BASE_URL}/streams/{WEB_ID}/value",
    headers={"Cache-Control": "no-cache"},
)

# For environments with aggressive proxy caching, add a cache buster
import time
resp = session.get(
    f"{BASE_URL}/streams/{WEB_ID}/value",
    params={"_nocache": int(time.time() * 1000)},
)

# Verify write-then-read consistency
def write_and_verify(session, base_url, web_id, value, max_wait=5.0):
    """Write a value and verify it was persisted."""
    # Write
    payload = {"Value": value, "Timestamp": "*"}
    resp = session.post(f"{base_url}/streams/{web_id}/value", json=payload)
    resp.raise_for_status()

    # Read back with cache bypass, retrying for consistency
    read_value = None
    start = time.time()
    while time.time() - start < max_wait:
        resp = session.get(
            f"{base_url}/streams/{web_id}/value",
            headers={"Cache-Control": "no-cache"},
        )
        read_value = resp.json().get("Value")
        if read_value == value:
            return True
        time.sleep(0.5)

    print(f"Warning: wrote {value} but read back {read_value} after {max_wait}s")
    return False

When caching helps

For dashboard-style reads where you poll the same points every few seconds, caching actually helps by reducing PI Data Archive load. Only bypass the cache when you need to verify a write or need guaranteed freshness.

Production logging strategy

When something goes wrong in production, you need enough logging to diagnose the issue without drowning in noise. This logging setup captures the right level of detail for PI Web API pipelines.

production_logging.py
import logging
import time

from logging.handlers import RotatingFileHandler

def setup_pi_logger(
    name: str = "piwebapi",
    log_file: str = "piwebapi.log",
    max_bytes: int = 10 * 1024 * 1024,  # 10 MB
    backup_count: int = 5,
) -> logging.Logger:
    """Configure a production-grade logger for PI Web API operations."""
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured -- avoid duplicate handlers on repeat calls
        return logger
    logger.setLevel(logging.INFO)

    formatter = logging.Formatter(
        "%(asctime)s %(levelname)-8s [%(name)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    # Rotating file handler -- prevents unbounded log growth
    file_handler = RotatingFileHandler(
        log_file, maxBytes=max_bytes, backupCount=backup_count
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    # Console handler for interactive debugging
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    console_handler.setLevel(logging.WARNING)  # Only warnings+ to console
    logger.addHandler(console_handler)

    return logger

logger = setup_pi_logger()

def logged_request(session, method, url, **kwargs):
    """Wrapper that logs PI Web API requests with timing and error context."""
    start = time.perf_counter()
    resp = getattr(session, method)(url, **kwargs)
    elapsed = time.perf_counter() - start

    # Extract the endpoint path for cleaner logs
    path = url.split("/piwebapi/")[-1] if "/piwebapi/" in url else url

    if resp.status_code >= 500:
        logger.error(
            "%s %s -> %d (%.3fs) body=%s",
            method.upper(), path, resp.status_code, elapsed,
            resp.text[:500],
        )
    elif resp.status_code >= 400:
        logger.warning(
            "%s %s -> %d (%.3fs) body=%s",
            method.upper(), path, resp.status_code, elapsed,
            resp.text[:300],
        )
    elif elapsed > 10.0:
        logger.warning(
            "%s %s -> %d VERY SLOW (%.3fs, %d bytes)",
            method.upper(), path, resp.status_code, elapsed,
            len(resp.content),
        )
    elif elapsed > 5.0:
        logger.warning(
            "%s %s -> %d SLOW (%.3fs)",
            method.upper(), path, resp.status_code, elapsed,
        )
    else:
        logger.info(
            "%s %s -> %d (%.3fs, %d bytes)",
            method.upper(), path, resp.status_code, elapsed,
            len(resp.content),
        )

    return resp

# Usage
resp = logged_request(session, "get", f"{BASE_URL}/streams/{WEB_ID}/value")

PI identity mapping errors

A common production issue: your HTTP authentication succeeds (no 401) but data reads return empty results or 403 on specific points. This usually means the Windows identity is not mapped to a PI identity with the correct permissions.

check_identity_mapping.py
def check_pi_identity(session, base_url):
    """Check what PI identity the current user is mapped to."""
    # The system/userinfo endpoint reveals the effective identity
    resp = session.get(f"{base_url}/system/userinfo",
                       params={"selectedFields": "Name;IdentityType;IsAuthenticated"})
    if resp.status_code == 200:
        info = resp.json()
        print(f"Authenticated as: {info.get('Name')}")
        print(f"Identity type: {info.get('IdentityType')}")
        print(f"Is authenticated: {info.get('IsAuthenticated')}")
        return info
    else:
        print(f"Cannot check identity: HTTP {resp.status_code}")
        return None

# Common issue: user maps to "piworld" (read-only) instead of
# a custom identity with write permissions.
# Fix: PI System Management Tools > PI Identities & Mappings

Service account vs interactive user

In production, PI Web API calls usually run under a service account, not an interactive user. Verify that the service account has the correct PI identity mapping -- testing with your personal credentials is not sufficient because your personal Windows account likely has different PI mappings.

Next steps