ETF Intelligence · Research

The Overnight ETF Tape, Mapped

There are roughly four thousand exchange-traded products in the United States, and almost no one can tell you what any two of them actually share once you look past the name. We built the machine that can.

Every night, while the regular session is dark, exchange-traded funds keep trading on alternative trading systems. Sapinover has been collecting that overnight tape across three venues since September 2025. Separately, four times a year, every fund discloses what it holds in a regulatory filing called N-PORT. Those two datasets have never been joined in a structured, queryable way by anyone outside the issuers themselves. Joining them is the foundation of an ETF intelligence product we have spent the last several weeks building in the open.

This report is a walk through what now exists: the data, the pages, the visuals, and a plain reading of what each one shows. It closes with the roadmap for what ships next.

1,137 ETF profiles · 794 funds with holdings · 27 feature dimensions · 8 structural families

01 The data nobody else joins

The product sits on two datasets that take real work to assemble. The first is the overnight tape: every trade printed on BlueOcean, Bruce Markets, and Moon during extended hours, captured daily and enriched with the next session's open and close so each print carries a measurable timing differential. The second is holdings. We pull the SEC's quarterly N-PORT bulk filings, match each ticker to its specific fund series inside a trust filing, and resolve real portfolios for 794 funds. Matching to the series rather than the trust is the hard part. A single filer like iShares Trust contains hundreds of funds, and a naive join on the trust hands every one of them the same holdings. We resolve to the series, so IWM gets small caps and TLT gets long bonds.
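The difference between a trust-level and a series-level join can be sketched in a few lines. This is a toy illustration, not the production pipeline: the field names and the `SERIES_IWM` / `SERIES_TLT` identifiers are placeholders invented for the example, and real N-PORT records carry many more fields.

```python
# Toy records with hypothetical field names and placeholder series IDs.
HOLDINGS = [
    {"trust": "iShares Trust", "series": "SERIES_IWM", "asset": "small-cap equity"},
    {"trust": "iShares Trust", "series": "SERIES_TLT", "asset": "long Treasury bond"},
]
TICKERS = {
    "IWM": {"trust": "iShares Trust", "series": "SERIES_IWM"},
    "TLT": {"trust": "iShares Trust", "series": "SERIES_TLT"},
}


def holdings_by_trust(ticker):
    """Naive join: every fund in the trust receives the trust's pooled holdings."""
    trust = TICKERS[ticker]["trust"]
    return [h["asset"] for h in HOLDINGS if h["trust"] == trust]


def holdings_by_series(ticker):
    """Series-level join: only rows filed under the fund's own series match."""
    series = TICKERS[ticker]["series"]
    return [h["asset"] for h in HOLDINGS if h["series"] == series]
```

With the naive join, `holdings_by_trust("IWM")` and `holdings_by_trust("TLT")` return the same list, which is the failure described above. The series-level join returns a different portfolio for each ticker.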

From those holdings we build a 27-dimensional description of each fund: the share of assets in common equity, preferred equity, debt, short-term instruments, asset-backed securities, and derivatives; geographic exposure across fourteen countries; a Herfindahl index of how concentrated that geography is; fund characteristics including log assets, leverage factor, inverse and leveraged flags, and overnight notional; and a measure of portfolio breadth. Every feature is standardized before anything downstream touches it.
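Two of the steps above have simple closed forms: the Herfindahl index of geographic weights and the z-score standardization of each feature column. The sketch below assumes the index is computed on weights normalized to sum to one and that standardization uses the population standard deviation. The report does not specify either detail, so treat both as illustrative choices.

```python
from statistics import mean, pstdev


def herfindahl(weights):
    """Sum of squared shares: 1.0 for a single country, 1/n for n equal countries."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum((w / total) ** 2 for w in weights)


def zscore_columns(rows):
    """Standardize each column to mean 0 and unit variance.

    A column with no variation is mapped to 0.0 (an assumption made here
    to avoid dividing by zero).
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, mus, sds)]
        for row in rows
    ]
```

A single-country fund scores `herfindahl([1.0]) == 1.0`, while a fund split evenly across four countries scores 0.25, which is why the index works as a concentration axis.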

[Figure: pipeline diagram. Inputs: SEC N-PORT (794 ETFs, holdings) and Overnight ATS (3-venue tape, daily). These feed a 27-D feature vector (asset mix, geography, concentration, structure; z-score standardized), then Ward clustering (k=8, silhouette-tuned) into 8 structural families, then three projections: PCA (interpretable axes), t-SNE (local structure), LDA (max separation).]
The pipeline. Two datasets that have never been joined at scale become one 27-dimensional feature vector per fund, then a clustering, then three different projections for visualization.
Why this is hard to copy. The holdings data is free, but resolving it to the series level across thousands of funds is not. The overnight tape is not public at all. No competitor exposes structured N-PORT holdings alongside after-hours trading activity, because assembling either side is most of the work.

02 A profile for every fund

Each of the 1,137 funds has its own page. The free portion shows the fund's identity, family, category, leverage, and a structural summary. Behind a retail subscription sits the overnight history: the full record of how that fund has traded after hours, session by session, with volume-weighted prices, timing differentials, and directional consistency.
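For readers unfamiliar with the metrics named above, here is a minimal sketch of a volume-weighted price and one plausible reading of a timing differential. The VWAP formula is standard. The timing-differential definition (next open relative to overnight VWAP, in basis points) is an assumption made for illustration; the report does not define the measure.

```python
def vwap(prints):
    """Volume-weighted average price for an iterable of (price, shares) prints."""
    prints = list(prints)
    volume = sum(shares for _, shares in prints)
    if volume == 0:
        raise ValueError("no volume in session")
    return sum(price * shares for price, shares in prints) / volume


def timing_differential_bps(session_vwap, next_open):
    """ASSUMED definition: next session's open versus overnight VWAP, in bps."""
    return (next_open - session_vwap) / session_vwap * 1e4
```

Two prints of 100 shares at 10.00 and 300 shares at 11.00 give a VWAP of 10.75. If the next session opens at 10.965, the differential under this assumed definition is about 200 basis points.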

The newest addition to every profile is the Cluster DNA card. It places the fund inside the structural map you will see in the next section, draws its eight-axis fingerprint, names its nearest structural neighbors by distance in the model's coordinate space, and links straight into the full cluster explorer pre-filtered to that fund. A reader who lands on the page for a single leveraged product can see, in one card, the other 215 funds that share its structure and how far apart they sit.

03 Mapping the universe

With every fund reduced to a comparable 27-number vector, the funds that hold similar things end up near each other in that 27-dimensional space. The trouble is that nobody can see in 27 dimensions. The map below is a principal component projection: a linear compression of all 27 features down to the two directions that carry the most variation. Each dot is one of 152 funds from a representative sample, colored by the family the model assigned it.

[Figure: scatter plot. Horizontal axis: PC1, Portfolio Breadth vs Leveraged Structure (21.0% variance). Vertical axis: PC2, Geographic concentration (8.5%). Region labels: Leveraged, International, Income / Debt, US Broad. Legend: Leveraged / Inverse, International Developed, Securitized Bonds, Derivative Income, India Equity, Emerging Markets, Precious Metals, US Equity Broad.]
152 ETFs projected onto the top two principal components. Leveraged structures pull hard to the left, broad equity sits center-right, income and debt funds drop to the bottom, international funds rise to the upper right. The separation is the model working.

This is not a decorative scatter. Position carries meaning, because the axes are linear combinations of the original features. The horizontal axis alone explains 21 percent of all the variation across the 27 features, and it runs from broad, diversified portfolios on one end to leveraged single-name structures on the other. The vertical axis separates funds by how geographically concentrated they are.
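To make "linear combination" and "explains 21 percent of the variation" concrete, here is a small principal component sketch in plain Python. It uses power iteration with deflation on the covariance matrix, chosen only so the example needs no dependencies; nothing in the report says how the production projection is computed. The three-feature `sample` is invented toy data, not fund data.

```python
import math


def pca(rows, k=2, iters=500):
    """Return ([(explained_variance_ratio, loadings), ...], projected_coords)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_var = sum(cov[i][i] for i in range(d))  # assumes data is not constant
    components = []
    for _component in range(k):
        # Power iteration from a fixed, non-uniform unit start vector.
        start = [float(j + 1) for j in range(d)]
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append((lam / total_var, v))
        # Deflate: remove the direction just found before seeking the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    coords = [[sum(c[j] * v[j] for j in range(d)) for _ratio, v in components]
              for c in centered]
    return components, coords


# Toy data: nearly all variation lies along the first feature.
sample = [[-2.0, 0.1, 0.0], [-1.0, -0.1, 0.0], [0.0, 0.0, 0.0],
          [1.0, 0.1, 0.0], [2.0, -0.1, 0.0]]
components, coords = pca(sample)
```

Each entry in `components` pairs an explained-variance ratio with a loading vector. The loadings are the weights of the linear combination, so a point's coordinate on an axis can always be traced back to the input features.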

04 Why the axes mean something

A common way to draw maps like this uses an algorithm called t-SNE, which produces attractive clusters but axes that mean nothing. We default instead to principal component analysis precisely because every position traces back to the input features. The explorer shows the loadings behind each axis, so a user can read exactly why a fund sits where it does. The first component loads most strongly on portfolio breadth and common equity at one end and on leveraged structure at the other. The second loads on geographic concentration. The third, which you can rotate into view, separates debt-heavy funds from concentrated Asia exposure.

For completeness the explorer also offers t-SNE for readers who want to see local neighborhoods, and a supervised projection called LDA that stretches the view along the directions of maximum separation between the families. The same underlying clustering shows through all three. A user who wants to test whether the families are real can switch to the supervised view and watch them pull apart.

Honest about the limits. The top three components together capture 35.5 percent of the total variation. That is the nature of genuinely high-dimensional data, and the explorer says so out loud with a scree plot rather than hiding it. A map that claimed to capture everything in two dimensions would be lying.

05 Every family has a fingerprint

The clearest way to see what separates the families is to draw each one's average composition as an eight-axis radar. Below are three families that could not be more different. The leveraged group is almost all derivatives and short-term collateral with the leverage axis pushed out. The broad US group is nearly pure equity with heavy domestic weighting. The India group is fully international equity with extreme geographic concentration. Three shapes, three businesses.

[Figure: three radar charts sharing eight axes: Equity, Debt, Short-Term, Other, US, Intl, Geo HHI, Lev. Panels: Leveraged / Inverse (216 ETFs · TQQQ, SQQQ); US Equity Broad (303 ETFs · SPY, QQQ); India Equity (5 ETFs · INDA, INDY).]
Average feature fingerprint for three contrasting families. Each axis runs zero to one. The shapes are computed from real holdings, not assigned by hand.

06 The shape of the whole map

Two views give the structure of the entire map at once. The first is the inter-family similarity matrix: the cosine between each pair of family centers in the standardized feature space. Values near one mean two families point the same way; values near negative one mean they are structurally opposed. The leveraged family and the broad US equity family sit at negative 0.80, about as opposed as two groups can be. That is the model confirming, in one number, that a three-times-levered Nasdaq product and a plain S&P fund are built from opposite parts.

                               C1     C2     C3     C4     C5     C6     C7     C8
C1 Leveraged / Inverse       1.00  -0.52  -0.09  -0.06  -0.16  -0.44  -0.33  -0.80
C2 International Developed  -0.52   1.00  -0.05  -0.31   0.04   0.17   0.27   0.25
C3 Securitized Bonds        -0.09  -0.05   1.00   0.04  -0.01  -0.03  -0.05   0.00
C4 Derivative Income        -0.06  -0.31   0.04   1.00  -0.09  -0.26  -0.22  -0.26
C5 India Equity             -0.16   0.04  -0.01  -0.09   1.00   0.18   0.02   0.07
C6 Emerging Markets         -0.44   0.17  -0.03  -0.26   0.18   1.00   0.10   0.27
C7 Precious Metals          -0.33   0.27  -0.05  -0.22   0.02   0.10   1.00   0.21
C8 US Equity Broad          -0.80   0.25   0.00  -0.26   0.07   0.27   0.21   1.00
Cosine similarity between the eight family centers. Green is aligned, blue is opposed. The strong blue cell between the leveraged family and the US broad family is the headline.
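Each cell in the matrix is the cosine of the angle between two family centers. A minimal sketch, assuming a center is the plain mean of its members' standardized feature vectors; the two-feature toy families below are invented for illustration and are not the report's data.

```python
import math


def centroid(rows):
    """Mean feature vector of a family's members."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(a, b):
    """Cosine similarity: 1 aligned, 0 orthogonal, -1 opposed."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # convention chosen here for a degenerate all-zero center
    return dot / (na * nb)


# Hypothetical standardized features: [equity share, derivative share].
families = {
    "leveraged": [[-1.0, 1.2], [-0.8, 1.0]],
    "broad": [[0.9, -1.1], [1.1, -0.9]],
}
centers = {name: centroid(rows) for name, rows in families.items()}
similarity = cosine(centers["leveraged"], centers["broad"])
```

Because the features are standardized, the centers are measured relative to the universe average, so a strongly negative cosine means two families deviate from the average in opposite directions.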

The second view is simply how large and how well-defined each family is. Size is membership count. The marker on the right of each bar is the silhouette score, a standard measure of how cleanly a group separates from its neighbors. India and the securitized-bond group score very high because they are small and distinct. The emerging-markets group scores negative, which is the model telling the truth: that group overlaps heavily with the developed-international group and is the weakest seam in the map.

Family                    Funds   Silhouette (σ)
Leveraged / Inverse         216    0.20
International Developed      47    0.24
Securitized Bonds             8    0.79
Derivative Income           127    0.41
India Equity                  5    0.82
Emerging Markets             37   -0.18
Precious Metals              15    0.18
US Equity Broad             303    0.30
The eight families by membership and cohesion. A negative silhouette, as on emerging markets, is surfaced rather than buried.
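A negative family silhouette has a precise meaning: on average, members sit closer to some other family than to their own. The sketch below computes the standard silhouette with Euclidean distance and averages it by family. The distance metric and the one-dimensional toy points are assumptions for illustration; the report does not state which metric its scores use.

```python
import math


def family_silhouettes(points, labels):
    """Mean silhouette per label, with s = (b - a) / max(a, b) for each point.

    a is the mean distance to the point's own family, b the mean distance to
    the nearest other family. Requires at least two distinct labels.
    """
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    sums = {lab: 0.0 for lab in groups}
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            continue  # singleton families score 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in groups.items() if other != lab
        )
        denom = max(a, b)
        sums[lab] += (b - a) / denom if denom > 0 else 0.0
    return {lab: sums[lab] / len(idx) for lab, idx in groups.items()}


# Toy layout: "india" is tight and far away; "em" is interleaved with "dev".
points = [(0.0, 0.0), (0.1, 0.0),
          (10.0, 0.0), (10.2, 0.0), (10.4, 0.0), (10.6, 0.0),
          (10.1, 0.0), (10.5, 0.0)]
labels = ["india", "india", "dev", "dev", "dev", "dev", "em", "em"]
scores = family_silhouettes(points, labels)
```

In this toy layout the isolated pair scores close to one, while the interleaved family scores below zero. The chart shows the same pattern for India Equity and Emerging Markets.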

07 What the eight families are

Leveraged and inverse (216 funds). Daily-reset products built from swaps and collateral rather than stock. TQQQ and SQQQ sit on top of each other here because the model reads structure, not direction.

US equity broad (303 funds). The largest family: plain domestic equity from SPY and QQQ outward.

Derivative income (127 funds). Debt-heavy and option-overlay income strategies, from TLT to the single-stock yield products.

International developed (47 funds). VXUS, VEA, and the developed-markets core.

Emerging markets (37 funds) and precious metals (15 funds) are smaller satellites. Securitized bonds (8 funds) is a tight, distinct pocket of structured credit. India equity (5 funds) is the cleanest group in the entire map, five funds that share a single-country mandate and cluster with near-perfect cohesion. None of these labels were assigned by hand. The model found the groups; we read them back from the data and let a language model write each family a short, compliance-reviewed description grounded in its real fingerprint.

08 Reverse lookup and comparison

The map is also a lookup tool. In the explorer, a reader can click any fund as a reference and recolor the entire universe by distance from it, with the twelve nearest structural neighbors ranked underneath. Ask for the funds most like a given product and the answer is a ranked list, not a guess. A separate comparison view takes any two funds and lays out their holdings overlap, category and geographic breakdowns, and overnight behavior side by side. The whole state of the explorer lives in the page address, so any view a user builds is a link they can send.
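The reverse lookup amounts to ranking every other fund by distance from the reference, and the shareable link amounts to serializing the view state into a query string. A minimal sketch of both follows, assuming Euclidean distance in the feature space. The tickers, vectors, and query parameter names (`ref`, `view`, `k`) are hypothetical and not the explorer's actual scheme.

```python
import math
from urllib.parse import parse_qs, urlencode


def nearest(vectors, ref, k=12):
    """Rank the k funds closest to `ref`, returned as (ticker, distance) pairs."""
    ranked = sorted(
        (math.dist(vectors[ref], vec), ticker)
        for ticker, vec in vectors.items() if ticker != ref
    )
    return [(ticker, dist) for dist, ticker in ranked[:k]]


def view_to_query(state):
    """Encode the explorer state as a query string so the view can be shared."""
    return urlencode(state)


# Hypothetical tickers and two-dimensional feature vectors.
vectors = {"AAA": (0.0, 0.0), "BBB": (1.0, 0.0), "CCC": (3.0, 4.0), "DDD": (0.0, 0.5)}
neighbors = nearest(vectors, "AAA", k=2)
query = view_to_query({"ref": "AAA", "view": "pca", "k": 2})
```

Keeping the state in the address means the ranked list is reproducible: anyone who opens the link recomputes the same neighbors from the same reference fund.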

09 What ships next

The structural map is the foundation. The next releases turn it into a cost-and-flow product and then into a service the issuers themselves will pay for.

ETF Roadmap

Next: fund economics. Ingest the annual N-CEN filings for expense ratios, authorized participants, custodians, and securities-lending activity. This adds the true cost-of-ownership layer and powers a screener that ranks funds on fee against overnight activity.

Then: the issuer desk. The top twenty-five issuers account for the overwhelming majority of overnight notional. A distribution-analytics dashboard built for their capital-markets desks turns the overnight tape into a service, with venue breakdowns and peer comparison they cannot get anywhere else.

Standing: the ETF DNA report. An automated weekly read on the overnight tape and a quarterly refresh of this structural map as each new N-PORT cycle publishes, so the families update themselves as the market changes.

Everything in this report is live today at the fund profiles, the comparison pages, and the cluster explorer. The map refreshes every quarter as new holdings publish. The overnight tape refreshes every night.

Alpha writes on market structure and the plumbing of overnight trading for Sapinover. The figures in this report are a snapshot of the live N-PORT 2026Q1 clustering run. Correspondence: alpha@sapinover.com.

Data & Method

  • SEC Form N-PORT quarterly portfolio holdings bulk data sets (2026 Q1).
  • Overnight ATS prints: BlueOcean, Bruce Markets, and Moon, collected daily since September 2025.
  • Feature standardization and Ward's minimum-variance agglomerative clustering; cluster count selected by maximizing the silhouette score over k in [6, 20].
  • Projections: principal component analysis (primary, interpretable), t-SNE (local structure), and linear discriminant analysis (supervised separation).
  • Family descriptions generated by a language model from each cluster's real fingerprint and loadings, then reviewed against FINRA communication standards.
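As a reference for the clustering step named above, Ward's criterion can be sketched in plain Python: at each step, merge the two clusters whose union adds the least within-cluster variance. This naive version rescans all pairs on every merge and is meant only to show the criterion. A production run would normally use a library implementation, and the selection of k by silhouette is not reproduced here. The demo points are invented.

```python
def ward(points, k):
    """Agglomerate points into k clusters using Ward's minimum-variance rule."""
    # Each cluster is (members, centroid); start with one cluster per point.
    clusters = [([p], list(p)) for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                ni, nj = len(mi), len(mj)
                # Increase in within-cluster sum of squares if i and j merge.
                cost = ni * nj / (ni + nj) * sum((a - b) ** 2 for a, b in zip(ci, cj))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _cost, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        merged = mi + mj
        center = [(len(mi) * a + len(mj) * b) / len(merged) for a, b in zip(ci, cj)]
        clusters[i] = (merged, center)
        del clusters[j]  # j > i, so index i is unaffected
    return [members for members, _center in clusters]


# Invented demo points: two tight pairs and one outlier.
demo_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
demo_families = ward(demo_points, 3)
```

The size-weighted merge cost is what distinguishes Ward from single or average linkage: a large cluster pays more to absorb a distant point, which tends to produce compact families.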