This is an HDF5 file containing Visium HD Spatial Gene Expression data, stored so that 2 µm resolution "image" slices for a single gene or for multiple genes can be fetched easily and efficiently. Data can be binned to arbitrary scales or plotted against the microscope tissue image.
An HDF5 file is organized into groups, which can contain other groups or datasets. Datasets are arrays stored in binary, optionally compressed, form. Each group or dataset can also have attributes associated with it.
An hd_feature_slice.h5 file contains five top-level groups:
$ h5ls hd_feature_slice.h5
feature_slices Group
features Group
images Group
masks Group
umis Group
Each group is documented below.
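The same hierarchy can be inspected from Python. The sketch below is a minimal, duck-typed walker rather than an official utility: the helper name `describe` is hypothetical, and it assumes only that group-like nodes expose `.items()` and dataset-like nodes expose `.shape`, which holds for h5py groups and datasets as well as for plain nested dictionaries.

```python
def describe(node, name="/", depth=0):
    """Return indented lines describing a group-like node and its children.

    `describe` is a hypothetical helper, not part of h5py or the pipeline.
    """
    lines = []
    indent = "  " * depth
    if hasattr(node, "items"):
        # Group-like node: list it, then recurse into its named children.
        lines.append(f"{indent}{name} (group)")
        for child_name, child in node.items():
            lines.extend(describe(child, child_name, depth + 1))
    else:
        # Dataset-like node: report its shape if it has one.
        shape = getattr(node, "shape", None)
        lines.append(f"{indent}{name} (dataset, shape={shape})")
    return lines


# Usage with a real file (requires h5py; prints one line per group/dataset):
# import h5py
# with h5py.File("hd_feature_slice.h5", "r") as h5_file:
#     print("\n".join(describe(h5_file)))
```

Because feature_slices holds one subgroup per observed feature, the full listing can be long; calling `describe` on a single subgroup keeps the output short.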
The features group contains datasets related to the feature (gene) names and IDs in the reference transcriptome. It is identical to the features group stored in the HDF5 raw/filtered feature-barcode matrix output from the pipeline.
$ h5ls hd_feature_slice.h5/features
_all_tag_keys Dataset {1}
feature_type Dataset {32285}
genome Dataset {32285}
id Dataset {32285}
name Dataset {32285}
target_sets Group
$ h5ls hd_feature_slice.h5/features/target_sets
Visium\ Mouse\ Transcriptome\ Probe\ Set\ v2.0 Dataset
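Since the per-gene groups are keyed by feature index, a common first step is to translate a gene name into its index. The following is a minimal sketch, assuming the name dataset holds one string per feature in index order; the helper name `feature_index_by_name` is hypothetical. It accepts any mapping-like group, so it works with an h5py group or a plain dictionary.

```python
def feature_index_by_name(features_group, gene_name):
    """Return the index of the first feature named gene_name, or None.

    Hypothetical helper; assumes features_group["name"] lists names in
    feature-index order.
    """
    names = features_group["name"][:]
    for index, raw in enumerate(names):
        # h5py usually returns HDF5 strings as bytes, so decode when needed.
        name = raw.decode("utf-8") if isinstance(raw, bytes) else str(raw)
        if name == gene_name:
            return index
    return None


# Usage with a real file (requires h5py; "Xkr4" is only an example name):
# import h5py
# with h5py.File("hd_feature_slice.h5", "r") as h5_file:
#     index = feature_index_by_name(h5_file["features"], "Xkr4")
```

A return value of `None` means the name is not in the reference at all, which is different from a feature that exists but had no UMIs in the sample.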
The feature_slices group contains one subgroup per feature (gene), stored at hd_feature_slice.h5/feature_slices/{id}, where {id} is the index of the feature in the /features group. Only features that have at least one total UMI are stored here; if a specific feature is missing, no UMIs were observed for that feature in this sample.
$ h5ls hd_feature_slice.h5/feature_slices
0 Group
10 Group
100 Group
1000 Group
10007 Group
.
.
999 Group
9990 Group
9991 Group
Each gene-specific group contains row, col, and data datasets, which together hold that gene's expression slice as a sparse matrix in coordinate (COO) form.
$ h5ls hd_feature_slice.h5/feature_slices/0
col Dataset {48/Inf}
data Dataset {48/Inf}
row Dataset {48/Inf}
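Because a slice is stored as row/col/data triplets, a rectangular window can be cut out without building the full array. The sketch below is one way to do that, assuming the values in data are integer counts and that row and col index the 2 µm grid; the helper name `crop_slice` is hypothetical.

```python
import numpy as np


def crop_slice(slice_group, row_start, row_stop, col_start, col_stop):
    """Return a dense window [row_start:row_stop, col_start:col_stop] of a slice.

    Hypothetical helper; slice_group is any mapping with "row", "col", "data".
    """
    row = np.asarray(slice_group["row"][:], dtype=np.int64)
    col = np.asarray(slice_group["col"][:], dtype=np.int64)
    data = np.asarray(slice_group["data"][:]).astype(np.int64)

    # Keep only the entries that fall inside the requested window.
    keep = (
        (row >= row_start) & (row < row_stop)
        & (col >= col_start) & (col < col_stop)
    )
    window = np.zeros((row_stop - row_start, col_stop - col_start), dtype=np.int64)
    # np.add.at accumulates correctly even if a coordinate appears twice.
    np.add.at(window, (row[keep] - row_start, col[keep] - col_start), data[keep])
    return window


# Usage with a real file (requires h5py):
# import h5py
# with h5py.File("hd_feature_slice.h5", "r") as h5_file:
#     window = crop_slice(h5_file["feature_slices"]["0"], 0, 500, 0, 500)
```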
The umis group contains the total UMI spatial matrix, organized like the per-gene groups (hd_feature_slice.h5/feature_slices/{id}).
$ h5ls hd_feature_slice.h5/umis/total
col Dataset {5451603/Inf}
data Dataset {5451603/Inf}
row Dataset {5451603/Inf}
The images group contains the grayscale microscope and CytAssist images, projected onto the grid of 2 µm squares.
$ h5ls hd_feature_slice.h5/images/microscope
col Dataset {11785262/Inf}
data Dataset {11785262/Inf}
row Dataset {11785262/Inf}
The masks group contains binary image masks marking bins under tissue. There are four subgroups:
$ h5ls hd_feature_slice.h5/masks
filtered Group
square_008um Group
square_020um Group
square_050um Group
The filtered group corresponds to the raw-resolution bins (square_002um).
Each subgroup stores its mask as a matrix in the same row/col/data layout:
$ h5ls hd_feature_slice.h5/masks/square_008um
col Dataset {391917/Inf}
data Dataset {391917/Inf}
row Dataset {391917/Inf}
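A mask stored this way can be expanded into a boolean array and applied to a count array of the same shape. This is a sketch under stated assumptions: nonzero data values are taken to mean "under tissue", and the caller supplies the array shape, because the text above does not say which coordinate grid the binned masks (square_008um and so on) use. The helper name `mask_to_bool` is hypothetical.

```python
import numpy as np


def mask_to_bool(mask_group, nrows, ncols):
    """Expand a row/col/data mask into a dense boolean array.

    Hypothetical helper; assumes nonzero data marks a bin under tissue and
    that (nrows, ncols) matches the grid the mask coordinates refer to.
    """
    row = np.asarray(mask_group["row"][:], dtype=np.int64)
    col = np.asarray(mask_group["col"][:], dtype=np.int64)
    data = np.asarray(mask_group["data"][:])
    mask = np.zeros((nrows, ncols), dtype=bool)
    mask[row, col] = data != 0
    return mask


def keep_under_tissue(counts, mask):
    """Zero out every bin of counts that the mask does not mark."""
    return np.where(mask, counts, 0)
```

For the filtered mask, which corresponds to the raw-resolution bins, the shape would presumably be the nrows and ncols values from the file metadata.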
Here is some example Python code showing how to bin a feature slice.
import json
from dataclasses import dataclass

import h5py as h5
import numpy as np

ROW_DATASET_NAME = "row"
COL_DATASET_NAME = "col"
DATA_DATASET_NAME = "data"
METADATA_JSON_ATTR_NAME = "metadata_json"
UMIS_GROUP_NAME = "umis"
TOTAL_UMIS_GROUP_NAME = "total"


@dataclass
class CooMatrix:
    """Sparse matrix in coordinate (COO) form, as stored in each group."""

    row: np.ndarray
    col: np.ndarray
    data: np.ndarray

    @classmethod
    def from_hdf5(cls, group):
        """Read the row, col and data datasets of an HDF5 group into memory."""
        return cls(
            row=group[ROW_DATASET_NAME][:],
            col=group[COL_DATASET_NAME][:],
            data=group[DATA_DATASET_NAME][:],
        )

    def to_ndarray(self, nrows, ncols, binning_scale=1):
        """Convert the COO matrix representation to a dense ndarray at the specified binning scale."""
        ncols_binned = int(np.ceil(ncols / binning_scale))
        nrows_binned = int(np.ceil(nrows / binning_scale))
        result = np.zeros((nrows_binned, ncols_binned), dtype="int32")
        # Accumulate each 2 µm entry into the larger bin that contains it.
        for row, col, data in zip(self.row, self.col, self.data):
            result[row // binning_scale, col // binning_scale] += data
        return result


# Load total UMIs at 8 µm bin size (4 x 4 blocks of 2 µm squares)
with h5.File("hd_feature_slice.h5", "r") as h5_file:
    metadata = json.loads(h5_file.attrs[METADATA_JSON_ATTR_NAME])
    umis_8um = CooMatrix.from_hdf5(h5_file[UMIS_GROUP_NAME][TOTAL_UMIS_GROUP_NAME]).to_ndarray(
        nrows=metadata["nrows"], ncols=metadata["ncols"], binning_scale=4
    )
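The Python loop in to_ndarray visits every stored entry one at a time, which can be slow for the total UMI matrix with its millions of entries. As an alternative, the same accumulation can be expressed with NumPy's `np.add.at`. This is a sketch, not the pipeline's implementation; the function name `bin_coo` is hypothetical, and it assumes integer counts that fit in int32, as in the example above.

```python
import numpy as np


def bin_coo(row, col, data, nrows, ncols, binning_scale=1):
    """Vectorized equivalent of a per-entry binning loop (hypothetical helper)."""
    # Map each 2 µm coordinate to the index of the larger bin containing it.
    row_binned = np.asarray(row, dtype=np.int64) // binning_scale
    col_binned = np.asarray(col, dtype=np.int64) // binning_scale

    # Ceiling division, so a partial bin at the edge still gets a cell.
    nrows_binned = -(-nrows // binning_scale)
    ncols_binned = -(-ncols // binning_scale)

    result = np.zeros((nrows_binned, ncols_binned), dtype=np.int32)
    # Unlike result[r, c] += data, np.add.at sums repeated (r, c) pairs.
    np.add.at(result, (row_binned, col_binned), np.asarray(data).astype(np.int32))
    return result
```

Plain fancy-indexed assignment (`result[r, c] += data`) would silently keep only one contribution per repeated bin index, which is why `np.add.at` is used here.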
Each hd_feature_slice.h5 file also contains metadata in the following format. It is useful, for example, for translating between the original microscope image and the barcoded array space, or for obtaining the full dimensions of the barcoded array:
$ h5dump -a /metadata_json hd_feature_slice.h5
HDF5 "hd_feature_slice.h5" {
ATTRIBUTE "metadata_json" {
   DATATYPE H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE SCALAR
   DATA { (content reformatted below for readability)
   (0): {
      "sample_id": "10001_10002",
      "sample_desc": "SJ0118_SJ118-7-D_100_100_10_Brain_5_mouse_Brain_D",
      "slide_name": "visium_hd_rc1",
      "nrows": 3350,
      "ncols": 3350,
      "spot_pitch": 2.0,
      "hd_layout_json": {
         "slide_uid": "UNKNOWN",
         "file_format": "n/a",
         "aligner_version": "n/a",
         "input_hash": "n/a",
         "slide_design": "n/a",
         "transform": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
      },
      "transform_matrices": {
         "spot_colrow_to_microscope_colrow": [
            [0.041985535298565406, -7.298152402766004, 25585.255831045426],
            [7.298152402766004, 0.041985535298565406, 2956.752264439149],
            [0.0, 0.0, 1.0]
         ],
         "microscope_colrow_to_spot_colrow": [
            [0.000788241806459152, 0.13701644608939972, -425.29105551521997],
            [-0.13701644608939972, 0.000788241806459152, 3503.2701905117615],
            [0.0, 0.0, 1.0]
         ],
         "spot_colrow_to_cytassist_colrow": null,
         "cytassist_colrow_to_spot_colrow": null
      }
   }
   }
}
}
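The 3 x 3 entries under transform_matrices can be applied to coordinates as homogeneous transforms. The sketch below assumes, based only on the key names, that each matrix multiplies a column vector ordered (col, row, 1); the helper name `apply_transform` is hypothetical, and the two matrices are copied from the example metadata above. Whether an integer (col, row) refers to a bin corner or a bin center is not stated here, so treat sub-bin offsets with care.

```python
import numpy as np

# Matrices copied from the example metadata_json above.
SPOT_TO_MICROSCOPE = [
    [0.041985535298565406, -7.298152402766004, 25585.255831045426],
    [7.298152402766004, 0.041985535298565406, 2956.752264439149],
    [0.0, 0.0, 1.0],
]
MICROSCOPE_TO_SPOT = [
    [0.000788241806459152, 0.13701644608939972, -425.29105551521997],
    [-0.13701644608939972, 0.000788241806459152, 3503.2701905117615],
    [0.0, 0.0, 1.0],
]


def apply_transform(matrix, col, row):
    """Apply a 3x3 homogeneous transform to one (col, row) point.

    Hypothetical helper; assumes the (col, row, 1) vector convention.
    """
    x, y, w = np.asarray(matrix, dtype=float) @ np.array([col, row, 1.0])
    # Divide by w so the helper also works for projective matrices.
    return x / w, y / w


# Map a position on the 2 µm grid to microscope image pixels and back.
m_col, m_row = apply_transform(SPOT_TO_MICROSCOPE, 100, 200)
s_col, s_row = apply_transform(MICROSCOPE_TO_SPOT, m_col, m_row)
```

In this example the two matrices are inverses of each other, so the round trip returns the starting coordinates up to floating-point error.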