
Reading Values
PI Web API provides four ways to read data: current snapshot, recorded history, interpolated values at regular intervals, and statistical summaries. This guide covers each method in depth, including compression behavior, quality flags, digital states, and performance optimization.
Read types at a glance
| Type | Endpoint | Use case | Returns |
|---|---|---|---|
| Current value | /streams/{webId}/value | Latest snapshot | Single value |
| Recorded | /streams/{webId}/recorded | Actual stored events (compression-filtered) | Variable count |
| Interpolated | /streams/{webId}/interpolated | Evenly-spaced values (charts, ML, exports) | Fixed count |
| Summary | /streams/{webId}/summary | Statistics (min, max, avg, count) | One per summary type |
Understanding PI compression
Before reading values, you must understand how PI stores data. PI Data Archive uses exception reporting and compression: raw sensor readings are filtered, and only values that change significantly are stored. This means:
- Recorded values are not raw sensor readings. They are the values that passed the compression filter. A slowly-changing point might store 20 values per hour; a volatile one might store 500.
- Timestamps are not evenly spaced. Recorded values have irregular timestamps because they are event-driven.
- Use interpolated values when you need regular intervals. The /interpolated endpoint calculates values at exact timestamps using the surrounding recorded values.
Why this matters
Many developers new to PI assume "recorded" means "raw" and are surprised when they get fewer values than expected. If you need every raw sensor reading, ask your PI administrator about the point's exception deviation and compression deviation settings. These parameters control how aggressively data is filtered.
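If you have read access to point configuration, you can inspect those settings yourself via the point-attributes endpoint. A minimal sketch; the attribute names (ExcDev, ExcDevPercent, CompDev, CompDevPercent) are the conventional PI point attributes, but treat them as assumptions and confirm with your administrator:

```python
def get_compression_settings(session, base_url, web_id):
    """Fetch a point's exception/compression attributes for inspection.

    The attribute names filtered below are assumptions -- verify them
    against your PI Data Archive version.
    """
    response = session.get(f"{base_url}/points/{web_id}/attributes")
    response.raise_for_status()
    wanted = {"excdev", "excdevpercent", "compdev", "compdevpercent"}
    return {
        item["Name"]: item["Value"]
        for item in response.json().get("Items", [])
        if item["Name"].lower() in wanted
    }
```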
Current value
Returns the most recent value for a PI point. This is the value currently displayed in PI ProcessBook, PI Vision, and other PI clients.
```python
POINT_WEB_ID = "your-web-id"

response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/value",
    params={
        "selectedFields": "Timestamp;Value;Good;UnitsAbbreviation",
    },
)

data = response.json()
print(f"Value: {data['Value']}")
print(f"Timestamp: {data['Timestamp']}")
print(f"Good: {data['Good']}")
print(f"Units: {data.get('UnitsAbbreviation', 'N/A')}")
```
The selectedFields parameter
Use selectedFields to request only the fields you need. This reduces response size and improves performance, especially for batch reads of hundreds of points. Separate field names with semicolons. The default response includes many fields you may not need (Substituted, Annotated, Questionable, etc.).
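The same trimming pays off for batch reads. A sketch using the ad-hoc stream set endpoint (/streamsets/value with repeated webId query parameters, which requests produces from a list value); the exact selectedFields string here is an assumption to adapt:

```python
def read_current_values(session, base_url, web_ids):
    """Read the current value of several points in one HTTP call.

    requests encodes the list as ?webId=...&webId=... -- one repeated
    query parameter per point.
    """
    response = session.get(
        f"{base_url}/streamsets/value",
        params={
            "webId": list(web_ids),
            "selectedFields": "Items.WebId;Items.Value.Timestamp;Items.Value.Value",
        },
    )
    response.raise_for_status()
    return {item["WebId"]: item["Value"] for item in response.json()["Items"]}
```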
Handling digital state values
Not all PI points store numeric values. Digital state points (e.g., pump on/off, valve open/closed, sensor status) return an object instead of a number in the Value field. This breaks code that assumes Value is always a float.
```python
response = session.get(f"{BASE_URL}/streams/{WEB_ID}/value")
data = response.json()
value = data["Value"]

# Check if the value is a digital state (dict) or numeric (int/float)
if isinstance(value, dict):
    # Digital state: {"Name": "Active", "Value": 1}
    state_name = value.get("Name", "Unknown")
    state_value = value.get("Value", None)
    print(f"Digital state: {state_name} (code: {state_value})")
elif isinstance(value, (int, float)):
    print(f"Numeric value: {value}")
elif isinstance(value, str):
    # String point (rare, but possible)
    print(f"String value: {value}")
else:
    print(f"Unexpected value type: {type(value)} = {value}")


# Helper function for safe value extraction
def extract_numeric_value(pi_value):
    """Extract a numeric value from a PI Web API value response.

    Returns None for digital states and bad quality values.
    """
    value = pi_value.get("Value")
    if not pi_value.get("Good", True):
        return None
    if isinstance(value, dict):
        return None  # Digital state
    if isinstance(value, (int, float)):
        return float(value)
    return None
```
Digital states break naive code
If you do float(data["Value"]) on a digital state value, you get a TypeError. Always check the type of the Value field before processing. This is the single most common runtime error in PI Web API integrations.
Recorded values
Returns the actual stored events within a time range. The number of values depends on compression settings and how often the value changed significantly.
```python
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "*-1d",  # 1 day ago
        "endTime": "*",       # Now
        "maxCount": 1000,     # Limit results
        "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
    },
)

items = response.json()["Items"]
print(f"Retrieved {len(items)} recorded values")

# Warning: if len(items) == maxCount, data may be truncated
if len(items) == 1000:
    print("Results may be truncated! Use pagination for complete data.")
```
Boundary type
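When truncation is possible, the usual workaround is to page by timestamp: re-issue the query starting at the last timestamp received and drop the duplicated boundary event. A sketch of that pattern (it assumes no two stored events share an identical timestamp; if they can, you need extra handling):

```python
def read_all_recorded(session, base_url, web_id, start, end, page_size=1000):
    """Read a complete recorded history by paging on timestamps.

    /recorded has no continuation token, so after each full page we
    restart the query at the last timestamp received.
    """
    items, start_time = [], start
    while True:
        response = session.get(
            f"{base_url}/streams/{web_id}/recorded",
            params={"startTime": start_time, "endTime": end, "maxCount": page_size},
        )
        response.raise_for_status()
        page = response.json().get("Items", [])
        new = page
        if items and page and page[0]["Timestamp"] == items[-1]["Timestamp"]:
            new = page[1:]  # drop the boundary duplicate
        items.extend(new)
        if len(page) < page_size:
            break  # short page: no more data in the range
        start_time = items[-1]["Timestamp"]
    return items
```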
The boundaryType parameter controls what happens at the start and end of your time range. This affects data accuracy, especially for calculations.
| boundaryType | Behavior |
|---|---|
| Inside (default) | Returns only values with timestamps strictly within the range. You may miss the value at exactly startTime. |
| Outside | Includes the value immediately before startTime and after endTime. Useful when you need context around the boundaries. |
| Interpolated | Adds interpolated values at exactly startTime and endTime. Best for accurate calculations over a precise time window. |
```python
# Use Interpolated boundary for accurate time-window calculations
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "2026-03-15T08:00:00Z",
        "endTime": "2026-03-15T16:00:00Z",
        "boundaryType": "Interpolated",
        "maxCount": 10000,
    },
)
# Now the first value is interpolated AT 08:00:00 and the last AT 16:00:00
```
Server-side filtering
Use filterExpression to filter values on the server before they are sent to you. This is much more efficient than downloading all values and filtering in Python.
```python
# Only return values above 50
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        "filterExpression": "'.' > 50",  # '.' refers to the current value
        "maxCount": 10000,
    },
)

# Only return good quality values
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/recorded",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        "filterExpression": "IsGood('.')",
        "maxCount": 10000,
    },
)
```
Interpolated values
Returns values at evenly-spaced intervals. The PI server calculates each value by interpolating between the surrounding recorded values. Use this when you need consistent time spacing for charts, machine learning features, or data exports.
```python
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/interpolated",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        "interval": "1h",  # One value per hour
        "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
    },
)

items = response.json()["Items"]
for item in items:
    print(f"{item['Timestamp']}: {item['Value']}")
```
Interpolation does not work for all point types
Interpolation only works for numeric points (Float16, Float32, Float64, Int16, Int32). For digital state points and string points, the interpolated endpoint returns the last recorded value at each interval (step interpolation), which may not be what you expect. If you need digital state values at regular intervals, consider using recorded values and resampling client-side.
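One way to do that resampling client-side, sketched with pandas (the resample_states helper is hypothetical, not part of PI Web API; it forward-fills each state onto a regular grid, which matches step semantics):

```python
import pandas as pd


def resample_states(items, interval="5min"):
    """Resample recorded digital-state events onto a regular time grid
    using step (forward-fill) semantics: each state holds until the
    next recorded change.

    `items` is the Items list from a /recorded response where Value is
    a digital-state dict like {"Name": "Active", "Value": 1}.
    """
    df = pd.DataFrame(
        {
            "State": [
                i["Value"].get("Name") if isinstance(i["Value"], dict) else i["Value"]
                for i in items
            ],
        },
        index=pd.to_datetime([i["Timestamp"] for i in items], utc=True),
    ).sort_index()
    # Regular grid from the first event to the last, stepping by `interval`
    grid = pd.date_range(df.index[0].floor(interval), df.index[-1], freq=interval)
    # For each grid point, take the most recent state at or before it
    return df.reindex(grid, method="ffill")
```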
Aligning multiple streams
When reading interpolated values from multiple points, use the same startTime, endTime, and interval to ensure timestamps are aligned. You can also use syncTime and syncTimeBoundaryType for precise alignment.
```python
# Read aligned interpolated values for multiple points
params = {
    "startTime": "t",  # Today at midnight
    "endTime": "*",    # Now
    "interval": "5m",  # 5-minute intervals
}

temp_items = session.get(
    f"{BASE_URL}/streams/{TEMP_WEB_ID}/interpolated", params=params
).json()["Items"]
pressure_items = session.get(
    f"{BASE_URL}/streams/{PRESSURE_WEB_ID}/interpolated", params=params
).json()["Items"]

# Because we used the same params, timestamps are aligned
# and we can zip them together
for temp, pressure in zip(temp_items, pressure_items):
    # temp["Timestamp"] == pressure["Timestamp"]
    print(f"{temp['Timestamp']}: T={temp['Value']:.1f} P={pressure['Value']:.2f}")
```
Performance: interpolated vs recorded
Interpolated queries are typically much faster than recorded queries for large time ranges because the server returns a predictable number of values. Querying 30 days of recorded values might return 500,000 events; the same range with interval=1h returns exactly 720 values. Use interpolated values whenever you do not need every raw event.
Summary values
Returns statistical summaries over a time range. Useful for dashboards, reports, and KPI calculations.
```python
response = session.get(
    f"{BASE_URL}/streams/{POINT_WEB_ID}/summary",
    params={
        "startTime": "*-1d",
        "endTime": "*",
        # summaryType is a repeated query parameter; requests encodes
        # the list as ?summaryType=Average&summaryType=Minimum&...
        "summaryType": [
            "Average", "Minimum", "Maximum", "StdDev",
            "Count", "PercentGood", "Range",
        ],
        "calculationBasis": "TimeWeighted",
    },
)

summaries = response.json()["Items"]
for summary in summaries:
    stype = summary["Type"]
    value = summary["Value"]["Value"]
    good = summary["Value"]["Good"]
    print(f"{stype}: {value} (Good: {good})")
```
Available summary types
| Summary type | Description |
|---|---|
| Average | Time-weighted or event-weighted average |
| Minimum | Minimum value in the range |
| Maximum | Maximum value in the range |
| Total | Sum of values (time-weighted integral) |
| Count | Number of recorded values in the range |
| StdDev | Standard deviation |
| Range | Maximum minus minimum |
| PercentGood | Percentage of time the value had good quality |
| All | Returns all summary types at once |
Calculation basis
| Basis | Description |
|---|---|
| TimeWeighted | Weights each value by how long it was held. A value that lasted 8 hours contributes more to the average than one that lasted 5 seconds. This is the correct choice for most process data. |
| EventWeighted | Each recorded event has equal weight regardless of duration. Use this for discrete events (batch counts, alarm counts) rather than continuous process values. |
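The difference is easy to see with a toy series: suppose a point held 10.0 for almost 8 hours with a brief 5-second spike to 100.0. This is plain arithmetic to illustrate the two bases, not PI's exact algorithm:

```python
# (value, seconds held) pairs covering an 8-hour window
events = [(10.0, 8 * 3600 - 5), (100.0, 5)]

# Event-weighted: every recorded event counts equally
event_weighted = sum(v for v, _ in events) / len(events)

# Time-weighted: each value weighted by how long it was held
total_seconds = sum(d for _, d in events)
time_weighted = sum(v * d for v, d in events) / total_seconds

print(f"Event-weighted average: {event_weighted:.1f}")  # 55.0
print(f"Time-weighted average:  {time_weighted:.2f}")   # 10.02
```

The 5-second spike drags the event-weighted average to 55.0, while the time-weighted average stays near 10 -- which is why TimeWeighted is the default choice for continuous process data.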
Loading into pandas
A production-grade pattern for loading PI data into pandas, with proper handling of quality flags, digital states, and timezones.
```python
import pandas as pd


def pi_recorded_to_dataframe(
    session, base_url, web_id, start="*-7d", end="*",
    max_count=10000, column_name="Value"
):
    """Load PI recorded values into a pandas DataFrame.

    Handles:
    - Timezone-aware timestamps (UTC)
    - Digital state filtering (non-numeric values become NaN)
    - Quality flag filtering (bad values become NaN)
    - Proper DatetimeIndex for time-series analysis
    """
    response = session.get(
        f"{base_url}/streams/{web_id}/recorded",
        params={
            "startTime": start,
            "endTime": end,
            "maxCount": max_count,
            "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
        },
    )
    response.raise_for_status()
    items = response.json().get("Items", [])
    if not items:
        return pd.DataFrame(columns=[column_name])

    # Build DataFrame
    rows = []
    for item in items:
        value = item["Value"]
        good = item.get("Good", True)
        # Handle digital states (value is a dict like {"Name": "Active", "Value": 1})
        if isinstance(value, dict):
            numeric_value = None  # or value.get("Value") if you want the integer code
        elif isinstance(value, (int, float)):
            numeric_value = float(value) if good else None
        else:
            numeric_value = None
        rows.append({
            "Timestamp": item["Timestamp"],
            column_name: numeric_value,
        })

    df = pd.DataFrame(rows)
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], utc=True)
    df = df.set_index("Timestamp").sort_index()
    return df


# Usage
df = pi_recorded_to_dataframe(session, BASE_URL, POINT_WEB_ID)
print(f"Shape: {df.shape}")
print(f"Time range: {df.index.min()} to {df.index.max()}")
print(f"NaN count: {df.isna().sum().iloc[0]} (bad quality or digital states)")
print("\nStatistics:")
print(df.describe())

# Multiple points into one aligned DataFrame
points = {
    "Temperature": TEMP_WEB_ID,
    "Pressure": PRESSURE_WEB_ID,
    "Flow": FLOW_WEB_ID,
}
frames = {}
for name, wid in points.items():
    frames[name] = pi_recorded_to_dataframe(
        session, BASE_URL, wid, column_name=name
    )[name]
combined = pd.concat(frames, axis=1)
print(combined.head())
```
Use interpolated values for aligned DataFrames
Recorded values from different points have different timestamps (event-driven). When you need aligned rows for multi-variate analysis, machine learning, or correlation plots, use the /interpolated endpoint with the same interval for all points. This gives you perfectly aligned timestamps without needing to resample client-side.
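An interpolated counterpart to the helper above could look like this sketch (the pi_interpolated_frame name is our own; it skips digital-state handling for brevity):

```python
import pandas as pd


def pi_interpolated_frame(session, base_url, points,
                          start="*-1d", end="*", interval="5m"):
    """Load interpolated values for several points into one aligned
    DataFrame. `points` maps column names to WebIDs; because every
    request shares the same startTime/endTime/interval, the server
    returns identical timestamps for each point.
    """
    columns = {}
    for name, web_id in points.items():
        response = session.get(
            f"{base_url}/streams/{web_id}/interpolated",
            params={
                "startTime": start, "endTime": end, "interval": interval,
                "selectedFields": "Items.Timestamp;Items.Value;Items.Good",
            },
        )
        response.raise_for_status()
        items = response.json().get("Items", [])
        columns[name] = pd.Series(
            [i["Value"] if i.get("Good", True) else None for i in items],
            index=pd.to_datetime([i["Timestamp"] for i in items], utc=True),
            dtype="float64",  # bad-quality values become NaN
        )
    return pd.DataFrame(columns)
```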
Performance tips
| Technique | Impact | When to use |
|---|---|---|
| selectedFields | Reduces response size 50-80% | Always. Request only the fields you need. |
| Interpolated instead of recorded | Predictable response size, faster for large ranges | When you do not need every raw event |
| Summary instead of raw data | Single value instead of thousands | Dashboards, KPIs, reports |
| Batch requests | One HTTP call instead of N | Reading multiple points simultaneously |
| filterExpression | Server-side filtering, less data transferred | When you only need values matching a condition |
| Session reuse | Eliminates TCP/TLS/auth overhead per request | Always. Never create a new connection per request. |
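Batch requests deserve a sketch of their own. PI Web API's batch controller accepts one POST whose body maps your chosen ids to sub-requests and returns the results keyed the same way; the payload and response shapes below follow that pattern but should be verified against your server version:

```python
def batch_read_values(session, base_url, web_ids):
    """Read current values for many points with a single POST to the
    batch controller instead of N separate GETs.
    """
    payload = {
        str(i): {"Method": "GET", "Resource": f"{base_url}/streams/{web_id}/value"}
        for i, web_id in enumerate(web_ids)
    }
    response = session.post(f"{base_url}/batch", json=payload)
    response.raise_for_status()
    results = response.json()
    # Keep only sub-requests that succeeded
    return {
        web_id: results[str(i)]["Content"]
        for i, web_id in enumerate(web_ids)
        if results[str(i)].get("Status") == 200
    }
```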