Advanced Troubleshooting
Beyond basic errors. This page covers the harder-to-diagnose PI Web API issues that surface in production: Kerberos delegation, batch edge cases, memory pressure, IIS tracing, and performance degradation.
Start with the basics
For common errors like SSL, 401, 403, 404, and timeouts, see the Common Errors & Troubleshooting page first.
Kerberos double-hop / delegation failures
Kerberos delegation fails when PI Web API tries to access a back-end PI Data Archive on behalf of the user, but the server does not have delegation rights. This is one of the most common production issues in enterprise PI deployments.
Symptoms
- Authentication works when hitting PI Web API directly
- Requests that touch PI Data Archive return 401 or empty results
- The same request works when run directly on the PI Web API server
- Event Viewer on the PI Web API server shows Kerberos audit failures (Event ID 4625 or 4771)
- Works with Basic auth but fails with Kerberos
Root cause
Kerberos does not forward your credentials to a second server by default. PI Web API authenticates you, but when it connects to PI Data Archive on your behalf, the Data Archive sees an anonymous connection. This is the "double-hop" problem -- your credentials hop from your machine to PI Web API (hop 1) but cannot hop from PI Web API to Data Archive (hop 2) without explicit delegation configuration.
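The symptom pattern above can be turned into a quick triage rule. The sketch below is a hedged illustration, not an official diagnostic: `diagnose_double_hop` and its return labels are hypothetical names, and it assumes you have already probed one request that PI Web API answers by itself and one that needs the PI Data Archive.

```python
def diagnose_double_hop(front_status: int, data_status: int, data_item_count: int) -> str:
    """Classify two probe results against the double-hop symptom pattern.

    front_status: HTTP status of a request PI Web API can answer on its own
                  (hypothetical probe; pick any endpoint that needs no back end).
    data_status / data_item_count: status and item count of a request that
                  has to reach the PI Data Archive.
    """
    if front_status in (401, 403):
        return "front-end-auth"      # hop 1 itself is failing
    if front_status >= 400:
        return "other"
    if data_status == 401 or (data_status == 200 and data_item_count == 0):
        return "suspect-double-hop"  # hop 1 fine, hop 2 denied or anonymous
    if data_status >= 400:
        return "other"
    return "ok"


# Example: authentication succeeds, but the data request comes back empty
print(diagnose_double_hop(200, 200, 0))
```

A "suspect-double-hop" result is only a hint; an empty result can also mean the time range simply contains no data, so confirm with the diagnostic steps.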
Diagnostic steps
# 1. Check the service account's SPN registration
setspn -L PI-WEB-API-SERVICE-ACCOUNT
# Expected output should include:
# HTTP/pi-web-api-server.domain.com
# HTTP/pi-web-api-server
# 2. Verify delegation is configured in AD
# Open Active Directory Users and Computers > find the service account
# > Properties > Delegation tab
# Should show: "Trust this computer for delegation to specified services only"
# With the PI Data Archive SPN listed
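# 3. Check for duplicate SPNs (common cause of silent failures)
setspn -X
# A duplicate SPN breaks Kerberos authentication for the affected service
# 4. On the PI Web API server, check the cached Kerberos tickets
klist
# Look for the service ticket to the PI Data Archive.
# "klist sessions" lists logon sessions; "klist -li <LogonId>" shows the tickets of another session, such as the service account's
# 5. Check Windows Event Viewer for Kerberos failures
# Event Viewer > Windows Logs > Security
# Filter for Event IDs: 4625, 4768, 4769, 4771 (4768, 4769, and 4771 are logged on the domain controller)
# Event 4769 with failure code 0x7 (KDC_ERR_S_PRINCIPAL_UNKNOWN) = the requested SPN is not registered
# Failure code 0xD (KDC_ERR_BADOPTION) typically means delegation to that service is not permitted
Fix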
- Configure constrained delegation in Active Directory for the PI Web API service account. The account needs "Trust this computer for delegation to specified services only" with the PI Data Archive SPN listed (typically piserver/DATA-ARCHIVE-HOST).
- Set the correct SPN for the PI Web API service. Run setspn -S HTTP/pi-web-api-host PI-SERVICE-ACCOUNT. Use -S (not -A) so that setspn checks for duplicates before adding the SPN.
- Verify with klist on the PI Web API server that a service ticket for the PI Data Archive SPN has been issued.
- Use protocol transition if PI Web API also accepts Basic auth. Set "Use any authentication protocol" in the delegation tab to allow protocol transition from Basic to Kerberos for the back-end hop.
- Restart IIS after making delegation changes. Kerberos ticket caches are not refreshed automatically.
Delegation is a domain admin task
Configuring Kerberos delegation requires Active Directory permissions. You will need your domain administrator to make these changes. Provide them the specific SPN values and the service account name to minimize back-and-forth.
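To keep the hand-off to the domain administrator unambiguous, it can help to generate the exact values rather than type them by hand. The following is a minimal sketch; `delegation_request` is a hypothetical helper, the host and account names in the usage line are placeholders, and the SPN formats are the ones described in the fix steps above.

```python
def delegation_request(web_api_host: str, domain: str,
                       service_account: str, data_archive_host: str) -> dict:
    """Collect the values a domain admin needs to configure delegation."""
    short = web_api_host.split(".")[0]          # tolerate an FQDN as input
    fqdn = f"{short}.{domain}"
    spns = [f"HTTP/{fqdn}", f"HTTP/{short}"]    # both forms, as in the setspn -L check
    return {
        "service_account": service_account,
        "web_api_spns": spns,
        # -S makes setspn check for duplicates before registering
        "setspn_commands": [f"setspn -S {spn} {service_account}" for spn in spns],
        # SPN to list on the Delegation tab (format taken from the text above)
        "delegate_to_spn": f"piserver/{data_archive_host}",
    }


request = delegation_request("pi-web-api-server", "domain.com",
                             "PI-SERVICE-ACCOUNT", "DATA-ARCHIVE-HOST")
for command in request["setspn_commands"]:
    print(command)
print("Delegate to:", request["delegate_to_spn"])
```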
PI Web API log files
PI Web API writes detailed logs that are essential for diagnosing issues that do not produce clear HTTP error messages. Knowing where to find them and what to look for saves hours of guessing.
| Log file | Location | What it contains |
|---|---|---|
| PI Web API log | %ProgramData%\OSIsoft\WebAPI\Logs\ | Application-level errors, AF SDK exceptions, configuration warnings |
| IIS log | %SystemDrive%\inetpub\logs\LogFiles\ | HTTP request/response log with status codes, timing, and client IPs |
| Windows Event Log | Event Viewer > Application | Service start/stop, unhandled exceptions, configuration errors |
| HTTPERR log | %SystemRoot%\System32\LogFiles\HTTPERR\ | Requests rejected by HTTP.sys before reaching IIS (connection resets, queue overflows) |
| AF SDK log | %ProgramData%\OSIsoft\AF\Logs\ | AF Server connection failures, SDK-level exceptions |
# Search PI Web API logs for errors in the last hour
$logPath = "$env:ProgramData\OSIsoft\WebAPI\Logs"
Get-ChildItem $logPath -Filter "*.log" |
Sort-Object LastWriteTime -Descending |
Select-Object -First 1 |
Get-Content -Tail 200 |
Select-String -Pattern "ERROR|Exception|WARN"
# Search IIS logs for non-200 responses
$iisPath = "$env:SystemDrive\inetpub\logs\LogFiles"
Get-ChildItem $iisPath -Recurse -Filter "*.log" |
Sort-Object LastWriteTime -Descending |
Select-Object -First 1 |
Get-Content -Tail 500 |
Select-String -Pattern " (4|5)\d{2} "
# Check HTTPERR for connection-level failures
Get-Content "$env:SystemRoot\System32\LogFiles\HTTPERR\httperr*.log" -Tail 50
IIS request tracing (Failed Request Tracing)
When PI Web API returns errors but the logs do not explain why, IIS Failed Request Tracing (FREB) captures the complete request pipeline -- every module, every handler, every authentication step. This is the most powerful debugging tool for PI Web API server-side issues.
# 1. Install the IIS tracing feature (if not already installed)
Install-WindowsFeature Web-Http-Tracing
# 2. Enable tracing for the PI Web API site via appcmd
# Replace "Default Web Site" with your site name if different
$site = "Default Web Site"
appcmd configure trace "$site" /enablesite
# 3. Add a tracing rule for status codes 400-599
appcmd configure trace "$site/piwebapi" /enable /path:"*" /statusCodes:"400-599" /timeTaken:"00:00:30"
# Traces are written to:
# %SystemDrive%\inetpub\logs\FailedReqLogFiles\
# Open the XML files in a browser for a formatted timeline view
# 4. IMPORTANT: Disable tracing when done (performance impact)
appcmd configure trace "$site" /disablesite
Performance impact
Failed Request Tracing adds overhead to every matching request. Enable it only during active debugging and disable it when done. Use specific status code filters to limit the scope.
Batch requests: partial failures
A batch request can return 200 OK overall while individual sub-requests inside it fail. If you only check the top-level status, you will miss errors.
from dataclasses import dataclass
@dataclass
class BatchSubResult:
    key: str
    status: int
    content: dict | None
    errors: list[str]
    is_success: bool
def parse_batch_results(response_json: dict) -> list[BatchSubResult]:
    """Parse each sub-request in a batch response, surfacing failures."""
    results = []
    for key, result in response_json.items():
        status = result.get("Status", 0)
        content = result.get("Content")
        errors = []
        if status >= 400:
            if isinstance(content, dict):
                errors = content.get("Errors", [])
            elif isinstance(content, str):
                errors = [content]
        results.append(BatchSubResult(
            key=key,
            status=status,
            content=content if status < 400 else None,
            errors=errors,
            is_success=status < 400,
        ))
    failed = [r for r in results if not r.is_success]
    if failed:
        print(f"Batch: {len(results)} total, {len(failed)} failed:")
        for r in failed:
            print(f"  {r.key}: HTTP {r.status} - {r.errors}")
    else:
        print(f"Batch: all {len(results)} sub-requests succeeded")
    return results
# Usage
resp = session.post(f"{BASE_URL}/batch", json=batch_body)
resp.raise_for_status()  # Only checks the top-level status
results = parse_batch_results(resp.json())
# Access successful results
for r in results:
    # A successful sub-request can still have no JSON body, so guard the lookup
    if r.is_success and isinstance(r.content, dict):
        value = r.content.get("Value")
        # process value...
Common partial failure causes
| Sub-request status | Likely cause | Resolution |
|---|---|---|
| 404 | WebID is invalid or the point was deleted | Re-resolve the WebID using a path-based lookup |
| 403 | Permission denied on specific points | Check PI identity mapping for the service account |
| 409 | Write conflict (existing value at timestamp) | Use updateOption=Replace to overwrite |
| 500 | Server-side error for specific resources | Check PI Web API logs for the stack trace |
| 502 | PI Data Archive connection failed mid-batch | Retry the failed sub-requests only |
import time
def batch_with_retry(session, base_url, batch_body, max_retries=2):
    """Execute a batch request with automatic retry for transient failures."""
    retryable_statuses = {502, 503, 504}
    remaining = dict(batch_body)
    all_results = {}
    for attempt in range(max_retries + 1):
        if not remaining:
            break
        resp = session.post(f"{base_url}/batch", json=remaining)
        resp.raise_for_status()
        results = parse_batch_results(resp.json())
        retry_batch = {}
        for r in results:
            if r.is_success:
                all_results[r.key] = r
            elif r.status in retryable_statuses:
                retry_batch[r.key] = remaining[r.key]  # retry
            else:
                all_results[r.key] = r  # permanent failure
        remaining = retry_batch
        if remaining and attempt < max_retries:
            time.sleep(2 ** attempt)
            print(f"Retrying {len(remaining)} failed sub-requests (attempt {attempt + 2})")
    # Anything still pending after the last attempt is reported as a failure
    all_results.update({k: BatchSubResult(k, 503, None, ["Max retries exceeded"], False)
                        for k in remaining})
    return all_results
Memory pressure and 503 errors
When PI Web API runs low on memory, it starts returning 503 (Service Unavailable). This happens most often when:
- Too many concurrent large queries. Each recorded value request with a large time range or high maxCount allocates significant memory. A single request for 1M recorded values can consume hundreds of MB.
- Batch requests with hundreds of sub-requests. The server assembles all responses in memory before returning. A batch of 500 recorded-value reads can exceed the application pool memory limit.
- Multiple clients hammering the server. PI Web API is often shared across teams; one heavy client can affect everyone.
- IIS application pool recycling. The default memory limit may be too low for heavy PI Web API usage. Check IIS Manager > Application Pools > Advanced Settings > Private Memory Limit.
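To get a feel for how quickly large reads add up, you can estimate the JSON payload before sending the request. This is a rough sketch under stated assumptions: `estimate_payload_mb` is a hypothetical helper, and `SAMPLE_ITEM` is an illustrative recorded-value item, so substitute an item copied from one of your own responses. The figure is only the size on the wire; server-side memory use while building the response is typically higher.

```python
import json

# Illustrative item shape; replace with an item from a real response
SAMPLE_ITEM = {
    "Timestamp": "2024-01-01T00:00:00Z",
    "Value": 123.456,
    "UnitsAbbreviation": "degC",
    "Good": True,
    "Questionable": False,
    "Substituted": False,
    "Annotated": False,
}


def estimate_payload_mb(item_count: int, sample_item: dict) -> float:
    """Approximate JSON body size in MB for item_count items like sample_item."""
    # +1 byte per item for the separating comma in the Items array
    bytes_per_item = len(json.dumps(sample_item).encode("utf-8")) + 1
    return item_count * bytes_per_item / (1024 * 1024)


for count in (1_000, 150_000, 1_000_000):
    print(f"{count:>9} items ~ {estimate_payload_mb(count, SAMPLE_ITEM):.1f} MB")
```

Trimming the item with selectedFields shrinks `bytes_per_item` directly, which is why field selection is one of the cheapest mitigations.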
Connection limit configuration
| Setting | Location | Default | Recommendation |
|---|---|---|---|
| Application pool memory limit | IIS Manager | 1,843,200 KB | Increase to 4-8 GB for heavy usage |
| Maximum concurrent requests | PI Web API Admin | Varies | Monitor and set based on server capacity |
| BatchLimit | PI Web API Configuration | Varies by version | Keep at 100-200 sub-requests per batch |
| Queue length | IIS App Pool > Advanced | 1,000 | Increase if seeing HTTP 503 with "queue full" |
Client-side mitigation
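Because large time ranges are one of the causes listed above, one option is to split a long range into fixed windows and request them one at a time. The sketch below is a minimal illustration; `split_time_range` is a hypothetical helper, and the one-hour window in the usage lines is an arbitrary choice, not a recommended value.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start: datetime, end: datetime, window: timedelta):
    """Return consecutive (window_start, window_end) pairs covering start..end."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + window, end)  # last window may be shorter
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 2, 30, tzinfo=timezone.utc)
for window_start, window_end in split_time_range(start, end, timedelta(hours=1)):
    # Each pair can be passed as startTime / endTime in ISO 8601 form
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Depending on how boundaries are handled, a value exactly on a window edge may be returned by two adjacent windows, so deduplicate by timestamp when merging results.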
import time
from concurrent.futures import ThreadPoolExecutor
# 1. Limit concurrent requests with a thread pool
# Use max_workers=5, not 50 -- PI Web API is not designed for
# dozens of concurrent large queries from a single client
pool = ThreadPoolExecutor(max_workers=5)
# 2. Chunk batch requests (100 sub-requests per batch is a safe default)
def chunked_batch(session, base_url, all_requests, chunk_size=100, delay=0.2):
    """Execute batch requests in chunks with a delay between them."""
    results = {}
    keys = list(all_requests.keys())
    for i in range(0, len(keys), chunk_size):
        chunk_keys = keys[i:i+chunk_size]
        chunk = {k: all_requests[k] for k in chunk_keys}
        resp = session.post(f"{base_url}/batch", json=chunk)
        resp.raise_for_status()
        results.update(resp.json())
        if i + chunk_size < len(keys):
            time.sleep(delay)  # Give the server breathing room
    return results
# 3. Use smaller time ranges and lower maxCount
params = {
    "startTime": "*-1h",  # Not *-30d
    "endTime": "*",
    "maxCount": 1000,  # Not 150000
    "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
}
# 4. Exponential backoff on 503
def request_with_backoff(session, url, params=None, max_retries=3):
    """GET with exponential backoff; returns the last response even if still 503."""
    for attempt in range(max_retries):
        resp = session.get(url, params=params)
        # Return immediately on success, or on the final attempt without sleeping again
        if resp.status_code != 503 or attempt == max_retries - 1:
            return resp
        wait = min(2 ** attempt, 30)  # Cap at 30 seconds
        print(f"503 received, waiting {wait}s before retry {attempt + 1}...")
        time.sleep(wait)
Slow queries: finding the bottleneck
When PI Web API queries are slow, the bottleneck is usually one of three places: network, PI Web API server, or PI Data Archive. This profiler helps you identify which layer is causing the slowdown.
import time
import json
def timed_request(session, url, params=None, label=""):
    """Measure request timing with a rough server/transfer breakdown."""
    start = time.perf_counter()
    resp = session.get(url, params=params)
    elapsed = time.perf_counter() - start
    # Build timing report
    report = {
        "label": label or url.split("/")[-1],
        "status": resp.status_code,
        "total_time_s": round(elapsed, 3),
        "response_bytes": len(resp.content),
    }
    # With the requests library, resp.elapsed covers sending the request until
    # the response headers are parsed. It approximates server time plus one
    # round trip; the remainder is mostly body transfer. Treat both as estimates.
    if hasattr(resp, 'elapsed'):
        server_time = resp.elapsed.total_seconds()
        report["server_time_s"] = round(server_time, 3)
        report["network_overhead_s"] = round(elapsed - server_time, 3)
        report["bottleneck"] = (
            "network" if (elapsed - server_time) > server_time
            else "server"
        )
    # Check for truncation (silent data loss)
    if resp.status_code == 200:
        data = resp.json()
        items = data.get("Items", [])
        links = data.get("Links", {})
        report["item_count"] = len(items)
        if "Next" in links:
            report["truncated"] = True
            report["warning"] = "Data was truncated -- more pages available"
    print(json.dumps(report, indent=2))
    return resp
# Example: Compare different query strategies for the same data
print("=== Strategy 1: Recorded values, large range ===")
timed_request(session, f"{BASE_URL}/streams/{WEB_ID}/recorded",
              {"startTime": "*-7d", "maxCount": 50000},
              label="recorded-7d-50k")
print("\n=== Strategy 2: Interpolated values, same range ===")
timed_request(session, f"{BASE_URL}/streams/{WEB_ID}/interpolated",
              {"startTime": "*-7d", "interval": "1h"},
              label="interpolated-7d-1h")
print("\n=== Strategy 3: Summary statistics ===")
# A list makes requests send one summaryType parameter per value,
# rather than a single comma-separated string
timed_request(session, f"{BASE_URL}/streams/{WEB_ID}/summary",
              {"startTime": "*-7d",
               "summaryType": ["Average", "Minimum", "Maximum", "StdDev"]},
              label="summary-7d")
Bottleneck decision table
| Symptom | Bottleneck | Fix |
|---|---|---|
| High network overhead, fast server time | Network / serialization | Use selectedFields to reduce payload, use batch to reduce round trips |
| Large response, fast server time | JSON serialization | Request fewer fields, use summary instead of raw recorded values |
| Slow server time, small response | Data Archive query | Check PI Data Archive performance counters, verify point is not a formula/calc tag |
| Slow server time, large response | Query too broad | Reduce time range, lower maxCount, use interpolated instead of recorded |
| First request slow, subsequent fast | Cold start / auth negotiation | Session reuse, connection pooling, keep-alive |
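The first four rows of the table can be applied mechanically to the timing numbers you collect. The sketch below is one possible encoding, not a definitive rule set: `classify_bottleneck` is a hypothetical helper, and the `slow_s` and `large_bytes` thresholds are arbitrary assumptions to tune for your environment.

```python
def classify_bottleneck(total_s: float, server_s: float, response_bytes: int,
                        slow_s: float = 2.0, large_bytes: int = 5_000_000) -> str:
    """Map timing measurements onto the rows of the decision table."""
    network_s = max(total_s - server_s, 0.0)   # time not attributed to the server
    large = response_bytes >= large_bytes
    if server_s >= slow_s:
        # Slow server time: distinguish by response size
        return "query too broad" if large else "data archive query"
    if network_s > server_s:
        return "network / serialization"
    if large:
        return "json serialization"
    return "no obvious bottleneck"


print(classify_bottleneck(total_s=5.0, server_s=0.5, response_bytes=1_000))
print(classify_bottleneck(total_s=8.0, server_s=6.0, response_bytes=10_000_000))
```

The cold-start row is not covered here because it needs a comparison between the first and later requests rather than a single measurement.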
COMException errors
PI Web API uses the AF SDK internally to connect to PI Data Archive and AF Server. When these connections fail, the error surfaces as a COMException -- which can be confusing since you are calling a REST API.
| COMException message | Meaning |
|---|---|
| [-10722] PI Data Archive connection lost | PI Web API cannot reach the Data Archive. Check network, firewall (port 5450), and Data Archive service status. |
| [-10401] No access | The PI Web API service account does not have PI Identity permissions on the Data Archive. |
| [-11091] Point not found | The PI point exists in the AF hierarchy but the underlying Data Archive point was deleted. |
| AF Server connection error | PI Web API cannot reach the AF Server. Check SQL Server connectivity and AF Server service status. |
Where to find COMExceptions
COMExceptions appear in the PI Web API log files (%ProgramData%\OSIsoft\WebAPI\Logs\), not in the HTTP response body. The HTTP response typically shows a generic 500 error. Always check the server logs when you see unexplained 500s.
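When a log contains many of these entries, counting the bracketed error codes shows which failure dominates. The following is a minimal sketch that assumes each relevant log line contains the text "COMException" and a code in the `[-NNNNN]` form shown in the table; `count_pi_error_codes` and the sample lines are hypothetical, and the real log layout may differ.

```python
import re
from collections import Counter

CODE_RE = re.compile(r"\[(-\d+)\]")   # matches codes such as [-10722]


def count_pi_error_codes(lines) -> Counter:
    """Count bracketed PI error codes on lines that mention COMException."""
    counts = Counter()
    for line in lines:
        if "COMException" not in line:
            continue
        for code in CODE_RE.findall(line):
            counts[int(code)] += 1
    return counts


# Hypothetical log lines for illustration only
sample_lines = [
    "ERROR COMException: [-10722] PI Data Archive connection lost",
    "INFO  Request completed",
    "ERROR COMException: [-10401] No access",
    "ERROR COMException: [-10722] PI Data Archive connection lost",
]
for code, count in count_pi_error_codes(sample_lines).most_common():
    print(code, count)
```

To run it against a real file, pass an open file object as `lines`; iterating the file yields one line at a time without loading the whole log.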
Stale data and caching behavior
PI Web API caches some responses. If you read a value, write a new one, and immediately read again, you may get the old value back. Understanding the caching layers helps avoid confusing stale data issues.
| Cache layer | What is cached | How to bypass |
|---|---|---|
| PI Web API response cache | Current values, config data | Cache-Control: no-cache header |
| AF SDK object cache | AF elements, attributes, templates | Call the refresh endpoint or wait for cache expiry |
| Client-side HTTP cache | Any response with cache headers | Add a unique query parameter or Cache-Control: no-store |
# Force PI Web API to bypass its cache
resp = session.get(
    f"{BASE_URL}/streams/{WEB_ID}/value",
    headers={"Cache-Control": "no-cache"},
)
# For environments with aggressive proxy caching, add a cache buster
import time
resp = session.get(
    f"{BASE_URL}/streams/{WEB_ID}/value",
    params={"_nocache": int(time.time() * 1000)},
)
# Verify write-then-read consistency
def write_and_verify(session, base_url, web_id, value, max_wait=5.0):
    """Write a value and verify it was persisted."""
    # Write
    payload = {"Value": value, "Timestamp": "*"}
    resp = session.post(f"{base_url}/streams/{web_id}/value", json=payload)
    resp.raise_for_status()
    # Read back with cache bypass, retrying for consistency
    read_value = None
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        resp = session.get(
            f"{base_url}/streams/{web_id}/value",
            headers={"Cache-Control": "no-cache"},
        )
        resp.raise_for_status()
        read_value = resp.json().get("Value")
        # Exact comparison; use a tolerance if the point stores floats at lower precision
        if read_value == value:
            return True
        time.sleep(0.5)
    print(f"Warning: wrote {value} but read back {read_value} after {max_wait}s")
    return False
When caching helps
For dashboard-style reads where you poll the same points every few seconds, caching actually helps by reducing PI Data Archive load. Only bypass the cache when you need to verify a write or need guaranteed freshness.
Production logging strategy
When something goes wrong in production, you need enough logging to diagnose the issue without drowning in noise. This logging setup captures the right level of detail for PI Web API pipelines.
import logging
import time
from logging.handlers import RotatingFileHandler
def setup_pi_logger(
    name: str = "piwebapi",
    log_file: str = "piwebapi.log",
    max_bytes: int = 10 * 1024 * 1024,  # 10 MB
    backup_count: int = 5,
) -> logging.Logger:
    """Configure a production-grade logger for PI Web API operations."""
    logger = logging.getLogger(name)
    # Already configured: avoid attaching duplicate handlers on repeated calls
    if logger.handlers:
        return logger
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)-8s [%(name)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    # Rotating file handler -- prevents unbounded log growth
    file_handler = RotatingFileHandler(
        log_file, maxBytes=max_bytes, backupCount=backup_count
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    # Console handler for interactive debugging
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    console_handler.setLevel(logging.WARNING)  # Only warnings+ to console
    logger.addHandler(console_handler)
    return logger
logger = setup_pi_logger()
def logged_request(session, method, url, **kwargs):
    """Wrapper that logs PI Web API requests with timing and error context."""
    start = time.perf_counter()
    resp = getattr(session, method)(url, **kwargs)
    elapsed = time.perf_counter() - start
    # Extract the endpoint path for cleaner logs
    path = url.split("/piwebapi/")[-1] if "/piwebapi/" in url else url
    if resp.status_code >= 500:
        logger.error(
            "%s %s -> %d (%.3fs) body=%s",
            method.upper(), path, resp.status_code, elapsed,
            resp.text[:500],
        )
    elif resp.status_code >= 400:
        logger.warning(
            "%s %s -> %d (%.3fs) body=%s",
            method.upper(), path, resp.status_code, elapsed,
            resp.text[:300],
        )
    elif elapsed > 10.0:
        logger.warning(
            "%s %s -> %d VERY SLOW (%.3fs, %d bytes)",
            method.upper(), path, resp.status_code, elapsed,
            len(resp.content),
        )
    elif elapsed > 5.0:
        logger.warning(
            "%s %s -> %d SLOW (%.3fs)",
            method.upper(), path, resp.status_code, elapsed,
        )
    else:
        logger.info(
            "%s %s -> %d (%.3fs, %d bytes)",
            method.upper(), path, resp.status_code, elapsed,
            len(resp.content),
        )
    return resp
# Usage
resp = logged_request(session, "get", f"{BASE_URL}/streams/{WEB_ID}/value")
PI identity mapping errors
A common production issue: your HTTP authentication succeeds (no 401) but data reads return empty results or 403 on specific points. This usually means the Windows identity is not mapped to a PI identity with the correct permissions.
def check_pi_identity(session, base_url):
    """Check what PI identity the current user is mapped to."""
    # The system/userinfo endpoint reveals the effective identity
    resp = session.get(f"{base_url}/system/userinfo",
                       params={"selectedFields": "Name;IdentityType;IsAuthenticated"})
    if resp.status_code == 200:
        info = resp.json()
        print(f"Authenticated as: {info.get('Name')}")
        print(f"Identity type: {info.get('IdentityType')}")
        print(f"Is authenticated: {info.get('IsAuthenticated')}")
        return info
    else:
        print(f"Cannot check identity: HTTP {resp.status_code}")
        return None
# Common issue: user maps to "piworld" (read-only) instead of
# a custom identity with write permissions.
# Fix: PI System Management Tools > PI Identities & Mappings
Service account vs interactive user
In production, PI Web API calls usually run under a service account, not an interactive user. Verify that the service account has the correct PI identity mapping -- testing with your personal credentials is not sufficient because your personal Windows account likely has different PI mappings.
Next steps
Common errors
Start with the basics: SSL, 401, 403, 404, timeouts, and connection errors.
Advanced recipes
Production-grade patterns for resilient PI Web API integrations.