- Published on
Data Access After Protocol 20: How and Why Mercury (and Zephyr) Addresses Evolving Data Needs.
- Authors
- Name
- Tommaso
- @heytdep
The recent "Working with Data on Stellar" event featuring our co-founder tdep event delved into the impact of Protocol 20 on data access within the Stellar ecosystem and the role of indexing services, with a special focus on how the Mercury indexer is designed to fit at best among other network components.
The recording of the event should be shared soon, so you will be able to replay it.
Introduction
The event kicked off with a high-level overview of Stellar Network's data management. We explored both the network's current data, as handled by Stellar-Core, and its historical data, focusing on the role of archivers.
Next, we dove into how the introduction of smart contracts on the Stellar Network created the need for indexing services.
Before Protocol 20
Before Protocol 20, accessing data on Stellar followed well-established patterns due to the predefined structure of data and transaction effects.
For simpler applications and wallets, explorer services provided efficient retrieval of transactions, user data, and other relevant information. The predefined structure of data and linear logic flows made complex queries unnecessary. More complex applications employed custom backend solutions: for instance, if non-custodial, the backend would generate transactions for the user to sign and send, while also fetching data and storing it in separate databases. Alternatively, running your own Stellar infrastructure allowed complete control over the data ingestion flow and organization.
What Changes with Protocol 20
The introduction of Protocol 20, particularly when working with smart contracts and decentralized environments, presents new challenges and needs around data access. Existing methods like block explorers and RPCs can be used, but are often insufficient for the deep, granular, and complex queries that are frequently required.
This is because of the added on-chain complexity, where we have contract logic instead of pre-defined logic. Furthermore, user-protocol (DeFi) interactions now happen entirely on-chain, eliminating the involvement of external backends and databases. As a result, protocols and developers who need to retrieve data efficiently with advanced querying needs can either build and maintain their own infrastructure or leverage an indexing service.
The Role of an Indexer
An ideal indexing service acts as an intermediary between applications and the network's data, fulfilling several criteria. We talked specifically about four characteristics:
- Efficiency: A fast API that can work efficiently even with large amounts of data.
- Granularity and complexity: The ability to construct highly specific or complex queries, targeting precise elements/groups of the data.
- Flexibility: Customizable and able to work out of the box with clients.
- Cost-Effectiveness: Maintaining a light infrastructure means lower costs, which also means lower costs for users.
Theoretically, the ideal indexer would employ and maintain a minimal infrastructure, storing only the data demonstrably utilized.
This aligns with findings from the Horizon developers, who found that over 90% of user access pertains to data from the past year, and who were considering trimming older history to make the service more sustainable.
However, managed indexers don't know in advance which data will require the efficiency offered by indexers (a lot of data can easily be obtained through explorers without, for example, forming a complex query). Consequently, they store all data and maintain its full history (for instance, they can also power block explorers), making minimal storage and infrastructure impractical.
Moreover, within the Stellar ecosystem, existing services like Horizon and Stellar Expert already excel at providing access to historical data such as transactions, user data, etc. This eliminates the need for indexers to replicate their functionality or store data unlikely to ever be queried through an indexer (e.g., much classic Stellar data, unless you need the functionality of an indexer or are working with both Soroban and classic).
Introducing the Mercury Indexer
Mercury emerges as an indexing service specifically designed for the Stellar ecosystem. Recognizing the strengths of the existing infrastructure, Mercury avoids any overlap with its functionality. Instead, it leverages the concept of subscriptions, empowering clients to target and store only the data segments they require. This aligns with the inherent determinism of the chain, where developers possess prior knowledge of the data they need before deployment. For instance, subscribing to specific ledger entries allows Mercury to store precisely that data, optimizing storage and query performance.
Through this design, Mercury can easily achieve maximum efficiency without needlessly overlapping with existing functionality.
Example
For example, let's assume that I want to index transfer events for my token because I need an efficient way to query them for my dapp or service. In Mercury, I can subscribe to events for my token contract that have ScVal::Symbol(ScSymbol("transfer")) as their first topic, and from that moment start indexing all related events.
Note: you can also subscribe to all of a contract's events.
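To make the topic value concrete, here is a minimal sketch of how that first topic could be built and serialized with the stellar_xdr types re-exported by the Zephyr SDK. The subscription request itself (endpoint and field names) is omitted, and this assumes that the topic filter is passed in base64-encoded XDR form and that the stellar-xdr version in use has a to_xdr_base64 method taking a Limits argument; check the Mercury documentation for the exact request format.

use rs_zephyr_sdk::stellar_xdr::next::{Limits, ScSymbol, ScVal, WriteXdr};

// Build the ScVal used as the first topic of standard token transfer events
// and serialize it to base64 XDR. Assumption: a recent stellar-xdr release
// where `to_xdr_base64` takes a `Limits` argument.
fn transfer_topic_b64() -> String {
    let topic = ScVal::Symbol(ScSymbol("transfer".try_into().unwrap()));
    topic.to_xdr_base64(Limits::none()).unwrap()
}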
The above approach, however, can sometimes lead to a bad user experience: even though the developer already knows before deploying the contract that they will need to index those events, creating a Mercury subscription is rarely the first thing that comes to mind. As a result, once the contract has been deployed and tested on-chain, the developer would need to re-emit the events to populate the Mercury database.
Solving the UX dilemma
To prevent this inconvenience, we have introduced data retention policies targeting specific data structures. For example, we currently have a retention policy of two months for contract events without an associated subscription (retention is indefinite for events with an associated subscription). This way, you can access events that were emitted up to two months before making the subscription.
This is, however, not a replacement for indexing historical data.
Solving the historical data problem
Let's say that you need to index data for a project that has been active for 4 months and want to start relying on Mercury. In that case, you cannot obtain full information through data retention. Currently, there would be no way to populate the database with past data.
However, we recognize the importance of such a feature, and we plan to soon implement data catchups on request, where we allow clients to catch up their subscriptions with historical data. We have already conducted a testnet catchup for a client, and plan to standardize this activity in the next releases.
Tackling customizability
So far, we've explored the functionality of Mercury "classic"/"vanilla", where, while you have some degree of customizability in the subscriptions, as with any other managed indexing service you can't directly modify the ingestion flow.
However, some clients might need to have very personalized indexes with aggregations, multi-step workflows, etc.
To allow for this degree of customization without leaving the comfort of working with a managed infrastructure, we have built and integrated the Zephyr Virtual Machine into Mercury. This allows us to power a cloud execution environment where users can safely deploy programs that define custom ingestion logic. These programs can interact with the database (read, write, update) and access the ledger meta, ledger by ledger.
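As a minimal sketch of what this looks like in practice, a Zephyr program exports an on_close entry point that runs at every ledger close, reads the ledger meta through the environment's reader, and writes rows to its own tables. The snippet below reuses the same SDK calls as the full program shown later; the "envcount" table name and its single column are purely illustrative.

use rs_zephyr_sdk::{DatabaseDerive, DatabaseInteract, EnvClient};

// Illustrative table: one row per ledger holding the number of transaction envelopes.
#[derive(DatabaseDerive, Clone)]
#[with_name("envcount")]
pub struct EnvCount {
    pub envelopes: i128,
}

#[no_mangle]
pub extern "C" fn on_close() {
    let env = EnvClient::new();
    // Read the envelopes (with their result meta) of the ledger being closed.
    let count = env.reader().envelopes_with_meta().len() as i128;
    // Write a new row to the "envcount" table.
    env.put(&EnvCount { envelopes: count });
}

The full program from the live coding session below follows exactly this structure, with more bookkeeping on top.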
Live Coding Session
During the live coding session part of the event, we coded a Zephyr program that counts Stellar classic operations and contract operations, and computes the exponential moving average of fees for classic and Soroban transactions.
The index is public and can be accessed by anyone; follow the instructions here.
Currently, we only support the Rust SDK, though any WASM-compatible language can work with Zephyr. Note that this is a very early version of the SDK, and we plan on improving the developer experience. Below is the code for the above program:
// Note: `ZephyrVal` (used below when building the update condition) also needs to be imported from the SDK.
use rs_zephyr_sdk::{log, stellar_xdr::next::{FeeBumpTransactionInnerTx, Operation, OperationBody, TransactionEnvelope, TransactionResultResult}, Condition, DatabaseDerive, DatabaseInteract, EnvClient, ZephyrVal};
use ta::{indicators::ExponentialMovingAverage, Next};
#[derive(DatabaseDerive, Clone)]
#[with_name("avgfee")]
pub struct Stats {
    pub classic: i128,
    pub contracts: i128,
    pub other: i128,
    pub fee_sor: ExponentialMovingAverage,
    pub fee_clas: ExponentialMovingAverage,
}
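// PERIOD is roughly one day's worth of ledgers, assuming ~5-second ledger close times (12 per minute * 60 * 24).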
const PERIOD: usize = 12 * 60 * 24;
// Slightly updated version for precision correctness.
// Please note that this design is not the most efficient (and it hasn't been thought through much) and can definitely be improved. This
// is the result of an on-the-fly coded program in the Stellar Event "Working with Data on Stellar, the Role of Indexers and Live-Coding a ZephyrVM Program".
#[no_mangle]
pub extern "C" fn on_close() {
    let env = EnvClient::new();
    let reader = env.reader();

    // Aggregate this ledger's operation counts and average fees.
    let (contract, classic, other, avg_soroban, avg_classic) = {
        let mut contract_invocations = 0;
        let mut classic = 0;
        let mut other_soroban = 0;
        let mut tot_soroban_fee = 0;
        let mut tot_classic_fee = 0;
        let envelopes = reader.envelopes_with_meta();
        let mut successful_envelopes = 0;

        for (envelope, meta) in &envelopes {
            let charged = meta.result.result.fee_charged;
            // Only count operations and fees of successful transactions.
            let success = match meta.result.result.result {
                TransactionResultResult::TxSuccess(_) => true,
                TransactionResultResult::TxFeeBumpInnerSuccess(_) => true,
                _ => false,
            };

            if success {
                successful_envelopes += 1;
                match envelope {
                    TransactionEnvelope::Tx(v1) => {
                        count_ops_and_fees(v1.tx.operations.to_vec(), charged, &mut classic, &mut contract_invocations, &mut other_soroban, &mut tot_soroban_fee, &mut tot_classic_fee)
                    },
                    TransactionEnvelope::TxFeeBump(feebump) => {
                        let FeeBumpTransactionInnerTx::Tx(v1) = &feebump.tx.inner_tx;
                        count_ops_and_fees(v1.tx.operations.to_vec(), charged, &mut classic, &mut contract_invocations, &mut other_soroban, &mut tot_soroban_fee, &mut tot_classic_fee)
                    },
                    TransactionEnvelope::TxV0(v0) => {
                        count_ops_and_fees(v0.tx.operations.to_vec(), charged, &mut classic, &mut contract_invocations, &mut other_soroban, &mut tot_soroban_fee, &mut tot_classic_fee)
                    }
                }
            }
        }

        // Average fee per Soroban transaction and per classic transaction in this ledger.
        (contract_invocations as i128, classic as i128, other_soroban as i128, tot_soroban_fee as f64 / (contract_invocations) as f64, tot_classic_fee as f64 / (successful_envelopes - contract_invocations) as f64)
    };

    let current = env.read::<Stats>();

    if let Some(row) = current.last() {
        // A row already exists: update the running totals and feed the EMAs
        // with this ledger's averages (only when they are finite and non-zero).
        let mut row = row.clone();
        if avg_classic.is_normal() {
            row.fee_clas.next(avg_classic as f64);
        }
        if avg_soroban.is_normal() {
            row.fee_sor.next(avg_soroban as f64);
        }
        let previous_classic = row.classic;
        row.classic += classic;
        row.contracts += contract;
        row.other += other;
        // The update condition value is serialized with bincode, so the bincode crate
        // must be among the program's dependencies.
        env.update(&row, &[Condition::ColumnEqualTo("classic".into(), bincode::serialize(&ZephyrVal::I128(previous_classic)).unwrap())]);
    } else {
        // First ledger processed: create the row and initialize the EMAs.
        let mut fee_soroban = ExponentialMovingAverage::new(PERIOD).unwrap();
        if avg_soroban.is_normal() {
            fee_soroban.next(avg_soroban);
        }
        let mut fee_classic = ExponentialMovingAverage::new(PERIOD).unwrap();
        if avg_classic.is_normal() {
            fee_classic.next(avg_classic);
        }
        env.put(&Stats {
            classic,
            contracts: contract,
            other,
            fee_sor: fee_soroban,
            fee_clas: fee_classic,
        })
    }
}
fn count_ops_and_fees(ops: Vec<Operation>, txfee: i64, classic: &mut i32, contract_invocations: &mut i32, other_soroban: &mut i32, tot_soroban_fee: &mut i64, tot_classic_fee: &mut i64) {
    // Only one InvokeHostFunction operation can appear in a transaction, so the
    // first operation is enough to attribute the whole transaction fee to either
    // the Soroban or the classic bucket.
    if let Some(op) = ops.get(0) {
        if let OperationBody::InvokeHostFunction(_) = op.body {
            *tot_soroban_fee += txfee;
        } else {
            *tot_classic_fee += txfee
        }
    }

    for op in ops.iter() {
        match op.body {
            OperationBody::InvokeHostFunction(_) => {
                *contract_invocations += 1;
            },
            OperationBody::ExtendFootprintTtl(_) => {
                *other_soroban += 1;
            },
            OperationBody::RestoreFootprint(_) => {
                *other_soroban += 1;
            },
            _ => {
                *classic += 1;
            }
        }
    }
}