
Reading Values
PI Web API provides four ways to read data: current snapshot, recorded history, interpolated values at regular intervals, and statistical summaries. This guide covers each method in depth, including compression behavior, quality flags, digital states, and performance optimization.
Read types at a glance
| Type | Endpoint | Use case | Returns |
|---|---|---|---|
| Current value | /streams/{webId}/value | Latest snapshot | Single value |
| Recorded | /streams/{webId}/recorded | Actual stored events (compression-filtered) | Variable count |
| Interpolated | /streams/{webId}/interpolated | Evenly-spaced values (charts, ML, exports) | Fixed count |
| Summary | /streams/{webId}/summary | Statistics (min, max, avg, count) | One per summary type |
Understanding PI compression
Before reading values, you must understand how PI stores data. PI Data Archive uses exception reporting and compression: raw sensor readings are filtered, and only values that change significantly are stored. This means:
- Recorded values are not raw sensor readings. They are the values that passed the compression filter. A slowly-changing point might store 20 values per hour; a volatile one might store 500.
- Timestamps are not evenly spaced. Recorded values have irregular timestamps because they are event-driven.
- Use interpolated values when you need regular intervals. The /interpolated endpoint calculates values at exact timestamps using the surrounding recorded values.
Why this matters
Many developers new to PI assume "recorded" means "raw" and are surprised when they get fewer values than expected. If you need every raw sensor reading, ask your PI administrator about the point's exception deviation and compression deviation settings. These parameters control how aggressively data is filtered.
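If you have read access to point configuration, you can inspect those settings yourself via the point-attributes endpoint. A minimal sketch; the attribute names (ExcDev, ExcDevPercent, CompDev, CompDevPercent) are the conventional PI point attributes, but treat them as assumptions and confirm with your administrator:

```python
def get_compression_settings(session, base_url, web_id):
    """Fetch a point's exception/compression attributes for inspection.

    The attribute names filtered below are assumptions -- verify them
    against your PI Data Archive version.
    """
    response = session.get(f"{base_url}/points/{web_id}/attributes")
    response.raise_for_status()
    wanted = {"excdev", "excdevpercent", "compdev", "compdevpercent"}
    return {
        item["Name"]: item["Value"]
        for item in response.json().get("Items", [])
        if item["Name"].lower() in wanted
    }
```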
Current value
Returns the most recent value for a PI point. This is the value currently displayed in PI ProcessBook, PI Vision, and other PI clients.
```python
POINT_WEB_ID = "your-web-id"

response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/value",
    params={
        "selectedFields": "Timestamp;Value;Good;UnitsAbbreviation",
    },
)

data = response.json()
print(f"Value: {data['Value']}")
print(f"Timestamp: {data['Timestamp']}")
print(f"Good: {data['Good']}")
print(f"Units: {data.get('UnitsAbbreviation', 'N/A')}")
```
The selectedFields parameter
Use selectedFields to request only the fields you need. This reduces response size and improves performance, especially for batch reads of hundreds of points. Separate field names with semicolons. The default response includes many fields you may not need (Substituted, Annotated, Questionable, etc.).
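The same trimming pays off for batch reads. A sketch using the ad-hoc stream set endpoint (/streamsets/value with repeated webId query parameters, which requests produces from a list value); the exact selectedFields string here is an assumption to adapt:

```python
def read_current_values(session, base_url, web_ids):
    """Read the current value of several points in one HTTP call.

    requests encodes the list as ?webId=...&webId=... -- one repeated
    query parameter per point.
    """
    response = session.get(
        f"{base_url}/streamsets/value",
        params={
            "webId": list(web_ids),
            "selectedFields": "Items.WebId;Items.Value.Timestamp;Items.Value.Value",
        },
    )
    response.raise_for_status()
    return {item["WebId"]: item["Value"] for item in response.json()["Items"]}
```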
Handling digital state values
Not all PI points store numeric values. Digital state points (e.g., pump on/off, valve open/closed, sensor status) return an object instead of a number in the Value field. This breaks code that assumes Value is always a float.
```python
response = session.get(f"{BASE_URL}/streams/{WEB_ID}/value")
data = response.json()
value = data["Value"]

# Check if the value is a digital state (dict) or numeric (int/float)
if isinstance(value, dict):
    # Digital state: {"Name": "Active", "Value": 1}
    state_name = value.get("Name", "Unknown")
    state_value = value.get("Value", None)
    print(f"Digital state: {state_name} (code: {state_value})")
elif isinstance(value, (int, float)):
    print(f"Numeric value: {value}")
elif isinstance(value, str):
    # String point (rare, but possible)
    print(f"String value: {value}")
else:
    print(f"Unexpected value type: {type(value)} = {value}")


# Helper function for safe value extraction
def extract_numeric_value(pi_value):
    """Extract a numeric value from a PI Web API value response.

    Returns None for digital states and bad quality values.
    """
    value = pi_value.get("Value")
    if not pi_value.get("Good", True):
        return None
    if isinstance(value, dict):
        return None  # Digital state
    if isinstance(value, (int, float)):
        return float(value)
    return None
```
Digital states break naive code
If you do float(data["Value"]) on a digital state value, you get a TypeError. Always check the type of the Value field before processing. This is the single most common runtime error in PI Web API integrations.
Recorded values
Returns the actual stored events within a time range. The number of values depends on compression settings and how often the value changed significantly.
```python
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "*-1d",  # 1 day ago
        "endTime": "*",       # Now
        "maxCount": 1000,     # Limit results
        "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
    },
)

items = response.json()["Items"]
print(f"Retrieved {len(items)} recorded values")

# Warning: if len(items) == maxCount, data may be truncated
if len(items) == 1000:
    print("Results may be truncated! Use pagination for complete data.")
```
Boundary type
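When truncation is possible, the usual workaround is to page by timestamp: re-issue the query starting at the last timestamp received and drop the duplicated boundary event. A sketch of that pattern (it assumes no two stored events share an identical timestamp; if they can, you need extra handling):

```python
def read_all_recorded(session, base_url, web_id, start, end, page_size=1000):
    """Read a complete recorded history by paging on timestamps.

    /recorded has no continuation token, so after each full page we
    restart the query at the last timestamp received.
    """
    items, start_time = [], start
    while True:
        response = session.get(
            f"{base_url}/streams/{web_id}/recorded",
            params={"startTime": start_time, "endTime": end, "maxCount": page_size},
        )
        response.raise_for_status()
        page = response.json().get("Items", [])
        new = page
        if items and page and page[0]["Timestamp"] == items[-1]["Timestamp"]:
            new = page[1:]  # drop the boundary duplicate
        items.extend(new)
        if len(page) < page_size:
            break  # short page: no more data in the range
        start_time = items[-1]["Timestamp"]
    return items
```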
The boundaryType parameter controls what happens at the start and end of your time range. This affects data accuracy, especially for calculations.
| boundaryType | Behavior |
|---|---|
| Inside (default) | Returns only values with timestamps strictly within the range. You may miss the value at exactly startTime. |
| Outside | Includes the value immediately before startTime and after endTime. Useful when you need context around the boundaries. |
| Interpolated | Adds interpolated values at exactly startTime and endTime. Best for accurate calculations over a precise time window. |
```python
# Use Interpolated boundary for accurate time-window calculations
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "2026-03-15T08:00:00Z",
        "endTime": "2026-03-15T16:00:00Z",
        "boundaryType": "Interpolated",
        "maxCount": 10000,
    },
)
# Now the first value is interpolated AT 08:00:00 and the last AT 16:00:00
```
Server-side filtering
Use filterExpression to filter values on the server before they are sent to you. This is much more efficient than downloading all values and filtering in Python.
```python
# Only return values above 50
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        "filterExpression": "'.' > 50",  # '.' refers to the current value
        "maxCount": 10000,
    },
)

# Only return good quality values
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        "filterExpression": "IsGood('.')",
        "maxCount": 10000,
    },
)
```
Interpolated values
Returns values at evenly-spaced intervals. The PI server calculates each value by interpolating between the surrounding recorded values. Use this when you need consistent time spacing for charts, machine learning features, or data exports.
```python
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/interpolated",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        "interval": "1h",  # One value per hour
        "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
    },
)

items = response.json()["Items"]
for item in items:
    print(f"{item['Timestamp']}: {item['Value']}")
```
Interpolation does not work for all point types
Interpolation only works for numeric points (Float16, Float32, Float64, Int16, Int32). For digital state points and string points, the interpolated endpoint returns the last recorded value at each interval (step interpolation), which may not be what you expect. If you need digital state values at regular intervals, consider using recorded values and resampling client-side.
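One way to do that resampling client-side, sketched with pandas (the resample_states helper is hypothetical, not part of PI Web API; it forward-fills each state onto a regular grid, which matches step semantics):

```python
import pandas as pd


def resample_states(items, interval="5min"):
    """Resample recorded digital-state events onto a regular time grid
    using step (forward-fill) semantics: each state holds until the
    next recorded change.

    `items` is the Items list from a /recorded response where Value is
    a digital-state dict like {"Name": "Active", "Value": 1}.
    """
    df = pd.DataFrame(
        {
            "State": [
                i["Value"].get("Name") if isinstance(i["Value"], dict) else i["Value"]
                for i in items
            ],
        },
        index=pd.to_datetime([i["Timestamp"] for i in items], utc=True),
    ).sort_index()
    # Regular grid from the first event to the last, stepping by `interval`
    grid = pd.date_range(df.index[0].floor(interval), df.index[-1], freq=interval)
    # For each grid point, take the most recent state at or before it
    return df.reindex(grid, method="ffill")
```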
Aligning multiple streams
When reading interpolated values from multiple points, use the same startTime, endTime, and interval to ensure timestamps are aligned. You can also use syncTime and syncTimeBoundaryType for precise alignment.
```python
# Read aligned interpolated values for multiple points
params = {
    "startTime": "t",  # Today at midnight
    "endTime": "*",    # Now
    "interval": "5m",  # 5-minute intervals
}

temp_items = session.get(
    f"{BASE_URL}/streams/{TEMP_WEB_ID}/interpolated", params=params
).json()["Items"]
pressure_items = session.get(
    f"{BASE_URL}/streams/{PRESSURE_WEB_ID}/interpolated", params=params
).json()["Items"]

# Because we used the same params, timestamps are aligned
# and we can zip them together
for temp, pressure in zip(temp_items, pressure_items):
    # temp["Timestamp"] == pressure["Timestamp"]
    print(f"{temp['Timestamp']}: T={temp['Value']:.1f} P={pressure['Value']:.2f}")
```
Performance: interpolated vs recorded
Interpolated queries are typically much faster than recorded queries for large time ranges because the server returns a predictable number of values. Querying 30 days of recorded values might return 500,000 events; the same range with interval=1h returns exactly 720 values. Use interpolated values whenever you do not need every raw event.
Summary values
Returns statistical summaries over a time range. Useful for dashboards, reports, and KPI calculations.
```python
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/summary",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        # summaryType is a repeated query parameter; requests encodes
        # the list as ?summaryType=Average&summaryType=Minimum&...
        "summaryType": [
            "Average", "Minimum", "Maximum", "StdDev",
            "Count", "PercentGood", "Range",
        ],
        "calculationBasis": "TimeWeighted",
    },
)

summaries = response.json()["Items"]
for summary in summaries:
    stype = summary["Type"]
    value = summary["Value"]["Value"]
    good = summary["Value"]["Good"]
    print(f"{stype}: {value} (Good: {good})")
```
Available summary types
| Summary type | Description |
|---|---|
| Average | Time-weighted or event-weighted average |
| Minimum | Minimum value in the range |
| Maximum | Maximum value in the range |
| Total | Sum of values (time-weighted integral) |
| Count | Number of recorded values in the range |
| StdDev | Standard deviation |
| Range | Maximum minus minimum |
| PercentGood | Percentage of time the value had good quality |
| All | Returns all summary types at once |
Calculation basis
| Basis | Description |
|---|---|
| TimeWeighted | Weights each value by how long it was held. A value that lasted 8 hours contributes more to the average than one that lasted 5 seconds. This is the correct choice for most process data. |
| EventWeighted | Each recorded event has equal weight regardless of duration. Use this for discrete events (batch counts, alarm counts) rather than continuous process values. |
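The difference is easy to see with a toy series: suppose a point held 10.0 for almost 8 hours with a brief 5-second spike to 100.0. This is plain arithmetic to illustrate the two bases, not PI's exact algorithm:

```python
# (value, seconds held) pairs covering an 8-hour window
events = [(10.0, 8 * 3600 - 5), (100.0, 5)]

# Event-weighted: every recorded event counts equally
event_weighted = sum(v for v, _ in events) / len(events)

# Time-weighted: each value weighted by how long it was held
total_seconds = sum(d for _, d in events)
time_weighted = sum(v * d for v, d in events) / total_seconds

print(f"Event-weighted average: {event_weighted:.1f}")  # 55.0
print(f"Time-weighted average:  {time_weighted:.2f}")   # 10.02
```

The 5-second spike drags the event-weighted average to 55.0, while the time-weighted average stays near 10 -- which is why TimeWeighted is the default choice for continuous process data.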
Loading into pandas
A production-grade pattern for loading PI data into pandas, with proper handling of quality flags, digital states, and timezones.
```python
import pandas as pd


def pi_recorded_to_dataframe(
    session, base_url, web_id, start="*-7d", end="*",
    max_count=10000, column_name="Value"
):
    """Load PI recorded values into a pandas DataFrame.

    Handles:
    - Timezone-aware timestamps (UTC)
    - Digital state filtering (non-numeric values become NaN)
    - Quality flag filtering (bad values become NaN)
    - Proper DatetimeIndex for time-series analysis
    """
    response = session.get(
        f"{base_url}/streams/{web_id}/recorded",
        params={
            "startTime": start,
            "endTime": end,
            "maxCount": max_count,
            "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
        },
    )
    response.raise_for_status()
    items = response.json().get("Items", [])
    if not items:
        return pd.DataFrame(columns=[column_name])

    # Build DataFrame
    rows = []
    for item in items:
        value = item["Value"]
        good = item.get("Good", True)
        # Handle digital states (value is a dict like {"Name": "Active", "Value": 1})
        if isinstance(value, dict):
            numeric_value = None  # or value.get("Value") if you want the integer code
        elif isinstance(value, (int, float)):
            numeric_value = float(value) if good else None
        else:
            numeric_value = None
        rows.append({
            "Timestamp": item["Timestamp"],
            column_name: numeric_value,
        })

    df = pd.DataFrame(rows)
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], utc=True)
    df = df.set_index("Timestamp").sort_index()
    return df


# Usage
df = pi_recorded_to_dataframe(session, BASE_URL, POINT_WEB_ID)
print(f"Shape: {df.shape}")
print(f"Time range: {df.index.min()} to {df.index.max()}")
print(f"NaN count: {df.isna().sum().iloc[0]} (bad quality or digital states)")
print("\nStatistics:")
print(df.describe())

# Multiple points into one aligned DataFrame
points = {
    "Temperature": TEMP_WEB_ID,
    "Pressure": PRESSURE_WEB_ID,
    "Flow": FLOW_WEB_ID,
}
frames = {}
for name, wid in points.items():
    frames[name] = pi_recorded_to_dataframe(
        session, BASE_URL, wid, column_name=name
    )[name]
combined = pd.concat(frames, axis=1)
print(combined.head())
```
Use interpolated values for aligned DataFrames
Recorded values from different points have different timestamps (event-driven). When you need aligned rows for multi-variate analysis, machine learning, or correlation plots, use the /interpolated endpoint with the same interval for all points. This gives you perfectly aligned timestamps without needing to resample client-side.
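An interpolated counterpart to the helper above could look like this sketch (the pi_interpolated_frame name is our own; it skips digital-state handling for brevity):

```python
import pandas as pd


def pi_interpolated_frame(session, base_url, points,
                          start="*-1d", end="*", interval="5m"):
    """Load interpolated values for several points into one aligned
    DataFrame. `points` maps column names to WebIDs; because every
    request shares the same startTime/endTime/interval, the server
    returns identical timestamps for each point.
    """
    columns = {}
    for name, web_id in points.items():
        response = session.get(
            f"{base_url}/streams/{web_id}/interpolated",
            params={
                "startTime": start, "endTime": end, "interval": interval,
                "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
            },
        )
        response.raise_for_status()
        items = response.json().get("Items", [])
        columns[name] = pd.Series(
            [i["Value"] if i.get("Good", True) else None for i in items],
            index=pd.to_datetime([i["Timestamp"] for i in items], utc=True),
            dtype="float64",  # bad-quality values become NaN
        )
    return pd.DataFrame(columns)
```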
Performance tips
| Technique | Impact | When to use |
|---|---|---|
| selectedFields | Reduces response size 50-80% | Always. Request only the fields you need. |
| Interpolated instead of recorded | Predictable response size, faster for large ranges | When you do not need every raw event |
| Summary instead of raw data | Single value instead of thousands | Dashboards, KPIs, reports |
| Batch requests | One HTTP call instead of N | Reading multiple points simultaneously |
| filterExpression | Server-side filtering, less data transferred | When you only need values matching a condition |
| Session reuse | Eliminates TCP/TLS/auth overhead per request | Always. Never create a new connection per request. |
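Batch requests deserve a sketch of their own. PI Web API's batch controller accepts one POST whose body maps your chosen ids to sub-requests and returns the results keyed the same way; the payload and response shapes below follow that pattern but should be verified against your server version:

```python
def batch_read_values(session, base_url, web_ids):
    """Read current values for many points with a single POST to the
    batch controller instead of N separate GETs.
    """
    payload = {
        str(i): {"Method": "GET", "Resource": f"{base_url}/streams/{web_id}/value"}
        for i, web_id in enumerate(web_ids)
    }
    response = session.post(f"{base_url}/batch", json=payload)
    response.raise_for_status()
    results = response.json()
    # Keep only sub-requests that succeeded
    return {
        web_id: results[str(i)]["Content"]
        for i, web_id in enumerate(web_ids)
        if results[str(i)].get("Status") == 200
    }
```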