Background
The BrainVision Core Data Format
BrainVision is a widely used file format for electrophysiological time-series data (EEG, EMG, EOG, and similar biosignals). A complete recording consists of three files that share the same base name:
| File | Extension | Contents |
|---|---|---|
| Header file | .vhdr | INI-style text file; describes channel count, sampling rate, binary format, and links to the other two files |
| Data file | .eeg | Raw binary samples (INT_16 or IEEE_FLOAT_32) |
| Marker file | .vmrk | INI-style text file; contains a timestamped event list |
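To make the role of the header file concrete, here is a minimal Python sketch that parses a small header with the standard-library `configparser`. It is an illustration of the file layout only, not this package's API. The sample header contents, the `parse_vhdr_text` name, and the choice to treat an empty resolution as 1.0 are assumptions made for the example.

```python
import configparser

# Hypothetical minimal header, written for illustration only.
SAMPLE_VHDR = """\
Brain Vision Data Exchange Header File Version 1.0
; comment lines start with a semicolon

[Common Infos]
Codepage=UTF-8
DataFile=rec01.eeg
MarkerFile=rec01.vmrk
DataFormat=BINARY
DataOrientation=MULTIPLEXED
NumberOfChannels=2
SamplingInterval=2000

[Binary Infos]
BinaryFormat=INT_16

[Channel Infos]
Ch1=Fp1,,0.5,µV
Ch2=Fp2,,0.1,µV
"""


def parse_vhdr_text(text):
    """Parse header text into a plain dict (sketch, not a full parser)."""
    # The first line is a version identifier, not INI syntax, so drop it.
    _, _, body = text.partition("\n")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key capitalisation
    parser.read_string(body)

    common = parser["Common Infos"]
    channels = []
    for _key, value in parser["Channel Infos"].items():
        # Channel entries are: name, reference, resolution, unit.
        name, _ref, resolution, unit = (value.split(",") + ["", "", ""])[:4]
        channels.append({
            "name": name,
            # Assumption: an empty resolution field means a factor of 1.0.
            "resolution": float(resolution or 1.0),
            "unit": unit,
        })

    return {
        "data_file": common["DataFile"],
        "marker_file": common["MarkerFile"],
        "orientation": common["DataOrientation"],
        "binary_format": parser["Binary Infos"]["BinaryFormat"],
        # SamplingInterval is given in microseconds per sample.
        "sampling_rate_hz": 1e6 / float(common["SamplingInterval"]),
        "channels": channels,
    }


header = parse_vhdr_text(SAMPLE_VHDR)
print(header["data_file"], header["sampling_rate_hz"], header["channels"])
```

Real headers can contain additional sections and keys; the sketch reads only the ones needed to locate and decode the data file.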
The data file stores samples in one of two orientations:
- MULTIPLEXED (samples are interleaved across channels):
  ch1[t0], ch2[t0], …, ch1[t1], ch2[t1], …
- VECTORIZED (all samples for each channel are contiguous):
  ch1[t0], ch1[t1], …, ch2[t0], ch2[t1], …
Each channel has its own resolution (a scale factor) stored in the header. The physical value (in the channel's unit, e.g. µV) is obtained by multiplying the raw sample by this factor.
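The two orientations and the per-channel scaling can be sketched in a few lines of Python. This is an illustration of the layout, not this package's implementation: the `decode_int16` name is hypothetical, and the sketch assumes little-endian INT_16 samples.

```python
import struct


def decode_int16(raw, n_channels, orientation, resolutions):
    """Return one list of physical values per channel (sketch).

    Assumes little-endian 16-bit integers; `resolutions` holds one
    scale factor per channel, as read from the header.
    """
    n_values = len(raw) // 2
    if n_values % n_channels != 0:
        raise ValueError("byte count does not match the channel count")
    values = struct.unpack("<%dh" % n_values, raw[: n_values * 2])
    n_samples = n_values // n_channels

    rows = []
    for ch in range(n_channels):
        if orientation == "MULTIPLEXED":
            # Every n_channels-th value belongs to this channel.
            ints = values[ch::n_channels]
        elif orientation == "VECTORIZED":
            # This channel occupies one contiguous run of n_samples values.
            ints = values[ch * n_samples:(ch + 1) * n_samples]
        else:
            raise ValueError("unknown orientation: %r" % orientation)
        # Physical value = raw sample * channel resolution.
        rows.append([v * resolutions[ch] for v in ints])
    return rows


# The same two-channel recording stored in both orientations.
multiplexed = struct.pack("<6h", 1, 10, 2, 20, 3, 30)
vectorized = struct.pack("<6h", 1, 2, 3, 10, 20, 30)
print(decode_int16(multiplexed, 2, "MULTIPLEXED", [0.5, 2.0]))
print(decode_int16(vectorized, 2, "VECTORIZED", [0.5, 2.0]))
```

Both calls yield the same per-channel values, which is why a reader can normalise either orientation to a single in-memory layout.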
The marker file records discrete events such as stimulus presentations, responses, and recording-segment boundaries. Each marker has a type, a description, a sample position, an optional channel index, and an optional date/time stamp.
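As an illustration of the marker fields, the following Python sketch parses a single marker entry. It assumes the line layout `Mk<n>=<type>,<description>,<position>,<size>,<channel>[,<date>]`, which is my reading of the BrainVision specification: it includes a size field (in samples) not listed above, and a channel value of 0 means the marker applies to all channels. The `Marker` type and `parse_marker_line` name are hypothetical, and escaped commas inside descriptions are not handled.

```python
from typing import NamedTuple, Optional


class Marker(NamedTuple):
    type: str
    description: str
    position: int          # sample position in the data file
    size: int              # marker length in samples
    channel: int           # 0 is assumed to mean "all channels"
    date: Optional[str]    # raw date/time string, if present


def parse_marker_line(line):
    """Parse one 'Mk<n>=...' entry from a marker file (sketch)."""
    _key, _, value = line.partition("=")
    fields = value.strip().split(",")
    if len(fields) < 5:
        raise ValueError("expected at least 5 comma-separated fields")
    # The date field is optional and may also be present but empty.
    date = fields[5] if len(fields) > 5 and fields[5] else None
    return Marker(
        type=fields[0],
        description=fields[1],
        position=int(fields[2]),
        size=int(fields[3]),
        channel=int(fields[4]),
        date=date,
    )


print(parse_marker_line("Mk1=New Segment,,1,1,0,20131113161403794232"))
print(parse_marker_line("Mk2=Stimulus,S  1,1000,1,0"))
```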
For the full format specification see the BrainVision Core Data Format documentation.
The Onda Format
Onda is an open, Arrow-based format for multi-channel LPCM data, including biosignal data. It separates signal metadata from the raw binary samples, making it easy to work with large datasets without loading all data into memory.
The two main Onda data structures are:
SignalV2 — a row in an Arrow table that describes one group of channels from a single recording. Key fields include:
| Field | Meaning |
|---|---|
| recording | UUID identifying the recording |
| channels | ordered list of channel names |
| sample_unit | physical unit as a lowercase snake_case string (e.g. "microvolt") |
| sample_resolution_in_unit | scale factor from raw integer to physical unit |
| sample_type | Julia numeric type as a string ("int16" or "float32") |
| sample_rate | samples per second |
| span | half-open [start, stop) time interval in nanoseconds |
| file_path | path to the binary data file |
| file_format | string describing the binary layout |
onda.annotation@1 — a row in an Arrow table representing a discrete event associated with a recording. Required columns are recording (UUID), id (UUID), and span (TimeSpan).
Use Onda.load(signal) to load the sample data described by a SignalV2 into memory as a Samples object (a channels × samples matrix paired with its signal metadata).
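The span and sample_rate fields of a SignalV2 together determine how many samples a signal holds. The relationship can be sketched in Python as follows. This is plain arithmetic for illustration, not the Onda.jl API; the function names are hypothetical, and the sketch assumes the span covers exactly the recorded samples.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def span_for_samples(n_samples, sample_rate_hz, start_ns=0):
    """Return the half-open [start, stop) span in nanoseconds."""
    duration_ns = round(n_samples * NANOSECONDS_PER_SECOND / sample_rate_hz)
    return (start_ns, start_ns + duration_ns)


def sample_count(span, sample_rate_hz):
    """Return the number of whole samples that fit inside the span."""
    start_ns, stop_ns = span
    return int((stop_ns - start_ns) * sample_rate_hz // NANOSECONDS_PER_SECOND)


span = span_for_samples(1000, 500)   # 1000 samples at 500 Hz
print(span, sample_count(span, 500))
```

Because the interval is half-open, the stop value is the time at which the next sample would begin, so adjacent spans can be concatenated without overlap.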
Package Structure and API Layers
OndaVision exposes three layers of abstraction, from lowest to highest level:
| Layer | Functions | Use when |
|---|---|---|
| Raw parsers | read_vhdr, read_vmrk, read_brainvision | You need raw Dict or Matrix access to BrainVision data, or you are building a custom pipeline |
| Mid-level converters | brainvision_to_signal, brainvision_annotations | You need Onda objects but want control over how the files are read |
| High-level integrated | read_brainvision_onda, write_brainvision | Standard BrainVision ↔ Onda round-trip |
Most users should start with read_brainvision_onda and write_brainvision.
Supported BrainVision Format Combinations
| Feature | Read | Write |
|---|---|---|
| DataFormat: BINARY | ✓ | ✓ |
| BinaryFormat: INT_16 | ✓ | ✓ |
| BinaryFormat: IEEE_FLOAT_32 | ✓ | ✓ |
| DataOrientation: MULTIPLEXED | ✓ | ✓ |
| DataOrientation: VECTORIZED | ✓ | converted to MULTIPLEXED on write |
| DataType: TIMEDOMAIN | ✓ | ✓ |
| Character encoding: UTF-8 | ✓ | ✓ |
| Character encoding: Latin-1 | ✓ | — (always written as UTF-8) |
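One common way to read text files that may be in either encoding is to try UTF-8 first and fall back to Latin-1. The Python sketch below shows that approach; it is an assumption about a reasonable strategy, not a description of how this package detects the encoding, and the `decode_header_bytes` name is hypothetical.

```python
def decode_header_bytes(raw):
    """Decode header or marker file bytes, preferring UTF-8 (sketch)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


line = "Ch1=Fp1,,0.5,µV"
print(decode_header_bytes(line.encode("utf-8")))
print(decode_header_bytes(line.encode("latin-1")))
```

Both calls recover the same text, because the Latin-1 byte for "µ" is not valid UTF-8 on its own and triggers the fallback.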