Viewing Output Files with Example Code

On this page, we provide some example code from a few commonly used software packages to help you start exploring several Xenium Onboard Analysis (XOA) output file formats. Read this Analysis Guide for an introduction to a variety of analysis tools for data exploration and downstream analysis.

10x Genomics does not provide support for community-developed tools and makes no guarantees regarding their function or performance. Please contact tool developers with any questions.

Multi-file OME-TIFF images can be viewed with the commonly used Python package, Tifffile. However, it does not open downsampled multi-file images by default. Downsampled images may be useful for converting to other image file types or for viewing the image if the full resolution is too large. The code snippet below illustrates how to open either full or downsampled resolution images.

To open all of the multi-tissue stain images together, all four morphology_focus_xxxx.ome.tif files must be present in the morphology_focus/ directory.

# Import Python libraries
# This code uses python v3.12.0, tifffile v2023.9.26, matplotlib v3.8.2
import tifffile
import matplotlib.pyplot as plt

# Option 1: Load full resolution image channels
# The following may produce a warning: 'OME series cannot read multi-file pyramids'.
# This is because tifffile does not support loading a pyramidal multi-file OME-TIFF file.
# Only the full resolution (level=0) data will load for all channels in the directory.
fullres_multich_img = tifffile.imread(
    "morphology_focus/morphology_focus_0000.ome.tif",
    is_ome=True,
    level=0,
    aszarr=False)

# Examine shape of array (number of channels, height, width), e.g. (4, 40867, 31318)
fullres_multich_img.shape

# Extract number of channels, e.g. 4
n_ch = fullres_multich_img.shape[0]

# Plot each channel
fig, axes = plt.subplots(ncols=n_ch, nrows=1, squeeze=False)
for i in range(n_ch):
    axes[0, i].imshow(fullres_multich_img[i], cmap="gray")
    axes[0, i].set_title(f"Channel: {i}")
plt.savefig('tifffile_fullres_four_channels.png')

# Option 2: Load a single channel image at any resolution, e.g., level=0 or level=1.
# Note 'is_ome' is set to False.
# Load one of the multi-file OME-TIFF files as a regular TIFF file at full resolution.
fullres_img_tiff = tifffile.imread(
    "morphology_focus/morphology_focus_0000.ome.tif",
    is_ome=False,
    level=0)

# Now load the file at downsampled resolution
downsampled_img = tifffile.imread(
    "morphology_focus/morphology_focus_0000.ome.tif",
    is_ome=False,
    level=1)

# Plot the full resolution and downsampled images side-by-side
fig, axes = plt.subplots(ncols=2, nrows=1, squeeze=False)
axes[0, 0].imshow(fullres_img_tiff, cmap="gray")
axes[0, 0].set_title(f"Full resolution: {fullres_img_tiff.shape}")
axes[0, 1].imshow(downsampled_img, cmap="gray")
axes[0, 1].set_title(f"Downsampled: {downsampled_img.shape}")
plt.savefig('example_fullres_downsample.png')
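
Beyond the pixel data, it can also be useful to inspect the embedded OME-XML metadata, for example to confirm channel order or pixel size. The snippet below is a minimal sketch, assuming the same tifffile version as above; ome_metadata returns the OME-XML as a plain string that can be printed or parsed with any XML library.

# Minimal sketch: inspect the embedded OME-XML metadata (channels, pixel sizes)
# Assumes tifffile as imported above
with tifffile.TiffFile("morphology_focus/morphology_focus_0000.ome.tif") as tif:
    ome_xml = tif.ome_metadata   # OME-XML metadata as a string
    print(ome_xml[:500])         # print the beginning of the metadata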

Some Xenium Onboard Analysis output files are provided in the Parquet file format to enable faster loading and reading of data. The code below can be used to explore any of the XOA cell Parquet files (cell_boundaries.parquet, cells.parquet, and nucleus_boundaries.parquet) or transcript Parquet file (transcripts.parquet).
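
For a quick interactive look at any of these files, pandas can also read Parquet directly. The snippet below is a minimal sketch, assuming pandas with the pyarrow engine is installed; the file path is a placeholder to edit for your output directory.

# Minimal sketch: quick look at a cell-level Parquet file with pandas
# Assumes pandas with the pyarrow engine; edit the placeholder path
import pandas as pd

cells_df = pd.read_parquet('path/to/your/cells.parquet')
cells_df.columns   # column names
cells_df.head()    # first 5 rows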

We highly recommend working with the Parquet file format itself. However, if the CSV format is needed, the examples below demonstrate how to use either Python or R packages to view and convert the transcripts.parquet file to CSV format. For larger datasets, you may encounter memory limits for reading and converting the data to CSV. In these situations, we suggest writing the CSV file in chunks.

The example for Python uses dask. See the documentation for installation and usage guidance.

# Import Python libraries
# Examples tested with python v3.12.0, dask v2024.5.2
import dask.dataframe as dd

# Read in the parquet file, edit path to where parquet file saved
df = dd.read_parquet('path/to/your/transcripts.parquet')

# Print information about the data frame
df.info()

# Print first 5 rows of the dask data frame
df.head()

# Optional: convert parquet data frame to CSV, edit path and output name for new file
# dask automatically figures out chunk size
df.to_csv('path/to/your/transcripts.csv', index=False)
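
If you prefer explicit control over the chunked CSV writing suggested above, a similar conversion can be sketched with pyarrow alone. This is a minimal sketch, assuming pyarrow is installed; the batch size and paths are placeholders to adjust for your dataset.

# Minimal sketch: chunked Parquet-to-CSV conversion with pyarrow
# Assumes pyarrow is installed; batch size and paths are placeholders
import pyarrow.parquet as pq

parquet_path = 'path/to/your/transcripts.parquet'
csv_path = 'path/to/your/transcripts.csv'

pf = pq.ParquetFile(parquet_path)
for i, batch in enumerate(pf.iter_batches(batch_size=1_000_000)):
    # Convert each record batch to pandas and append it to the CSV
    batch.to_pandas().to_csv(csv_path, mode='a' if i else 'w',
                             header=(i == 0), index=False)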

The example for R uses the arrow package. See the documentation for installation and usage guidance.

A note on installation:

  • For Windows or Linux, install in R with install.packages('arrow')
  • The arrow v16.1.0 package for macOS on CRAN is not feature complete as of this writing (June 2024); use conda to install instead: conda create -n r -c conda-forge r-base r-arrow
# For Windows or Linux, install arrow with this command:
# install.packages('arrow')

# Import R package
library(arrow)

# Path to your parquet file, edit path to where parquet file saved
PATH <- 'transcripts.parquet'

# Edit path and output name for new file
OUTPUT <- gsub('\\.parquet$', '.csv', PATH)

# Specify chunk size
CHUNK_SIZE <- 1e6

# Read in the parquet file
parquet_file <- arrow::read_parquet(PATH, as_data_frame = FALSE)

start <- 0

# Optional: convert parquet data frame to CSV in chunks
while(start < parquet_file$num_rows) {
  end <- min(start + CHUNK_SIZE, parquet_file$num_rows)
  chunk <- as.data.frame(parquet_file$Slice(start, end - start))
  data.table::fwrite(chunk, OUTPUT, append = start != 0)
  start <- end
}

if(require('R.utils', quietly = TRUE)) {
  R.utils::gzip(OUTPUT)
}

Python or other tools can be used to parse the gene panel JSON file. Here is example Python code to extract gene name, Ensembl ID, and gene coverage information for each gene target in a given panel:

# Import Python libraries
# Example with python v3.12, pandas v2.1.1
import json
import pandas as pd

# Open JSON file
f = open('gene_panel.json')  # Edit file name here

# Return JSON object as a dictionary
data = json.load(f)

# Create lists to store extracted information
gene = []
ensembl = []
cov = []

# Iterate through the JSON list to extract information
for i in data['payload']['targets']:
    if (i['type']['descriptor'] == "gene"):  # Only collect info for genes, not controls
        gene_name = i['type']['data']['name']
        ensembl_id = i['type']['data']['id']
        coverage = str(i['info']['gene_coverage'])
        gene.append(gene_name)
        ensembl.append(ensembl_id)
        cov.append(coverage)

# Create output CSV file
out_df = pd.DataFrame(list(zip(gene, ensembl, cov)),
                      columns=['Gene name', 'Ensembl ID', 'Gene coverage'])
out_df.to_csv('my_panel_gene_info.csv', index=False)

# Close file
f.close()
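
As a quick sanity check on the result, a couple of lines can confirm how many gene targets were extracted relative to all panel entries. This sketch simply reuses the data dictionary and out_df data frame created above.

# Minimal sketch: quick checks on the extracted panel information
print("Total entries in panel:", len(data['payload']['targets']))
print("Gene targets extracted:", len(out_df))
out_df.head()   # preview the first rows of the output table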

The zarr Python library documentation and tutorials are available here. The code snippets below show how to read Zarr arrays into numpy N-dimensional arrays. This code can be used to explore any of the XOA Zarr output files: cells.zarr.zip, analysis.zarr.zip, cell_feature_matrix.zarr.zip, and transcripts.zarr.zip.

In this first example, we read in the cells.zarr.zip file and take a look at the file's structure:

# Import Python libraries
# This script was tested with zarr v2.13.6
import zarr
import numpy as np

# Function to open a Zarr file
def open_zarr(path: str) -> zarr.Group:
    store = (zarr.ZipStore(path, mode="r")
             if path.endswith(".zip")
             else zarr.DirectoryStore(path))
    return zarr.group(store=store)

# For example, use the above function to open the cells Zarr file,
# which contains segmentation mask Zarr arrays
root = open_zarr("cells.zarr.zip")

# Look at group array info and structure
root.info
root.tree()  # shows structure, array dimensions, data types

# Create cell and nucleus segmentation mask np array objects to read or modify
cellseg_mask = np.array(root["masks"][1])
nucseg_mask = np.array(root["masks"][0])

# Show dimensions of the 2D segmentation mask arrays (also shown in .tree())
# .ndim shows the number of dimensions
# The shape should match the number of pixels in the morphology image.
cellseg_mask.shape
nucseg_mask.shape

# Show the maximum label value in the masks (value=0 marks background pixels)
# Cell labels are assigned sequentially starting at 1, so .max() should equal
# the total cells detected in the dataset (reported in e.g., the
# analysis_summary.html summary tab metric).
cellseg_mask.max()
nucseg_mask.max()

# Examples for exploring file contents
# How to show array
root["masks"][0][0:9]  # or root["masks/0"]
root["cell_summary"][0:9]

# How to show attribute values
root.attrs["major_version"]
root.attrs["segmentation_methods"]

# How to list out attribute names and values
dict(root.attrs.items())
dict(root['cell_summary'].attrs.items())
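
To sanity check the masks visually, a crop can be plotted with matplotlib. The snippet below is a minimal sketch, assuming matplotlib is installed; it reuses the cellseg_mask array from above, and the crop coordinates are arbitrary placeholders.

# Minimal sketch: visualize a crop of the cell segmentation mask
# Assumes matplotlib is installed; crop coordinates are arbitrary placeholders
import matplotlib.pyplot as plt

crop = cellseg_mask[0:2000, 0:2000]
plt.imshow(crop > 0, cmap="gray")   # show segmented (non-background) pixels in white
plt.title("Cell segmentation mask (crop)")
plt.savefig("cellseg_mask_crop.png")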

Using the same open_zarr Python function as above to read in the files, here are a few example lines to view the analysis.zarr.zip and transcripts.zarr.zip arrays and attributes:

# Read in secondary analysis Zarr arrays
root = open_zarr("analysis.zarr.zip")

# Examples for exploring file contents
# How to show a slice of the clustering_index arrays
root["cell_groups"][0]["indices"][0:9]

# How to show attributes
root["cell_groups"].attrs["group_names"]

# Read in transcripts Zarr arrays
root = open_zarr("transcripts.zarr.zip")

# Examples for exploring file contents
# How to show array info
root['grids'][0]['0,0']['gene_identity'].shape
root['grids'][0]['0,0']['quality_score'][0:9]
root['grids'][0]['0,0']['location'][0:9,]

# How to show array attributes
root.attrs['major_version']
root['density']['gene'].attrs['gene_names'][0:9]
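
As one example of working with these arrays, the per-transcript quality scores in a grid tile can be summarized with numpy. This is a minimal sketch that reuses the open_zarr function and numpy import from above; the Q20 cutoff is only an illustrative threshold.

# Minimal sketch: summarize per-transcript quality scores for one grid tile
# Reuses open_zarr() and numpy from above; the Q20 cutoff is illustrative
qv = np.asarray(root['grids'][0]['0,0']['quality_score'])
print("Transcripts in tile:", qv.size)
print("Fraction with QV >= 20:", (qv >= 20).mean())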