We want to share the thought process behind how we set up the Cardano Welfare Stake Pool (CWSP) infrastructure. This should give readers and our delegators confidence in the infrastructure we run, its security and scalability, and its energy impact, as well as an idea of the effort it took to implement and of the day-to-day operations and monitoring involved.
CWSP infrastructure is split into three categories:
a. The Mainnet - Our production environment: three virtual compute resources running in Azure cloud datacenters. One Block Producer and one Relay node run in the Australia East DC, and a second Relay node runs in the Singapore DC.
b. The Testnet - Our test environment: two virtual compute resources running in Google Cloud Platform. One Block Producer and one Relay node run in the Australia East DC.
c. The Air-gapped systems - An Apple MacBook Pro reserved for all the offline activities that a pool operator needs to perform occasionally. All keys and configuration files are stored offline on two USB-C drives, kept in two different locations (Sydney and Melbourne).
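The layout above can be written down as a small inventory, which makes it easy to check things like how many regions the relays cover. This is only an illustrative sketch: the node names are hypothetical labels rather than real hostnames, and the region strings simply repeat the wording used in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str      # hypothetical label, not a real hostname
    role: str      # "block-producer" or "relay"
    network: str   # "mainnet" or "testnet"
    provider: str  # cloud provider named in the text
    region: str    # datacenter location as described in the text


# Inventory mirroring categories (a) and (b); the air-gapped laptop is
# deliberately left out because it is never connected to a network.
NODES = [
    Node("mainnet-bp", "block-producer", "mainnet", "Azure", "Australia East"),
    Node("mainnet-relay-1", "relay", "mainnet", "Azure", "Australia East"),
    Node("mainnet-relay-2", "relay", "mainnet", "Azure", "Singapore"),
    Node("testnet-bp", "block-producer", "testnet", "GCP", "Australia East"),
    Node("testnet-relay-1", "relay", "testnet", "GCP", "Australia East"),
]


def nodes_in(network: str) -> list:
    """All nodes that belong to the given network."""
    return [n for n in NODES if n.network == network]


def relay_regions(network: str) -> list:
    """Sorted list of distinct regions that host a relay for the network."""
    return sorted({n.region for n in nodes_in(network) if n.role == "relay"})


if __name__ == "__main__":
    for network in ("mainnet", "testnet"):
        print(network, len(nodes_in(network)), relay_regions(network))
```

Keeping the inventory as plain data like this also makes it simple to extend if relays are added in further regions later.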
The key driving factors behind the CWSP architecture decisions were enterprise-level scalability, security, availability and energy efficiency. We elaborate on each of these factors at a high level below.
Scalability: Initially, we considered running an Intel server as an ESXi host with multiple VMs, or even two such servers in two different cities. We found this setup to be neither energy efficient nor easy to scale. We would not be able to run Relay servers in multiple geographies, and we would carry the responsibility for server maintenance and uptime ourselves, even when we go on vacation. Because Cardano is built on the "Proof of Stake" (PoS) concept, it does not require high compute power compared with infrastructure driven by the "Proof of Work" concept (for example Bitcoin and Ethereum). Therefore, our preferred choice was virtual servers in a cloud datacenter.
There is no doubt that the cloud brings an entirely new set of value propositions to enterprise computing environments, offering benefits such as application scalability, operational flexibility, improved economies of scale, reduced costs, resource efficiency, improved agility and more. It is harder to find all these benefits in the traditional computing model alone.
Security: If we hosted the services on servers running from our own residence, as a few stake pool operators do, with or without redundancy, physical security would be limited to CCTV cameras and/or 24/7 alarm monitoring. By hosting in cloud datacenters, with our production service in Azure, we can assure Cardano and our delegators that both our mainnet and testnet infrastructure are protected from potential physical harm at several levels: access request and control, facility perimeter monitoring, building entrance procedures and background checks of individuals, and two-factor authentication and biometric access control inside the building, in compliance with ISO 27001, HIPAA, FedRAMP, SOC 1, and SOC 2.
Cloud datacenters such as Azure also provide infrastructure-layer security, such as data-link layer encryption by default, two-factor authentication, private IP networks, cross-region secure private network links, etc.
Availability: Uptime is vital for our Core and Relay nodes, so we treat our stake pool as a mission-critical application. A reliable cloud partner provides an SLA and uptime assurance, and has Level 1-3 support services working 24/7. Running our Mainnet virtual machines in Azure Availability Zones gives us high availability as part of our business continuity and disaster recovery strategy, with built-in security and a scalable, high-performance architecture.
We also run hourly backups of the Core node and nightly backups of the Relay nodes into a different cloud datacenter, for orchestrated disaster recovery or to meet any ad hoc restore requirement. For our Core node, we try to maintain an RPO (Recovery Point Objective) of 1 hour and an RTO (Recovery Time Objective) of 1 hour or less.
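To make the RPO idea concrete, here is a minimal sketch of the kind of check a monitoring job could run: it compares the age of the last successful backup against the objective for that node type. It is an illustration under stated assumptions, not our actual tooling. The function names and example timestamps are made up, and the 24-hour relay value is an assumption derived from the nightly schedule; only the 1-hour Core node RPO comes from the text above.

```python
from datetime import datetime, timedelta

# Recovery point objectives per node role.
# "core": 1 hour, as stated in the text.
# "relay": 24 hours, an ASSUMPTION based on the nightly backup schedule.
RPO = {
    "core": timedelta(hours=1),
    "relay": timedelta(hours=24),
}


def backup_age(last_backup: datetime, now: datetime) -> timedelta:
    """Time elapsed since the last successful backup."""
    return now - last_backup


def rpo_met(role: str, last_backup: datetime, now: datetime) -> bool:
    """True if a failure at `now` would lose no more data than the RPO allows."""
    return backup_age(last_backup, now) <= RPO[role]


if __name__ == "__main__":
    # Arbitrary example timestamps, chosen only for illustration.
    now = datetime(2021, 1, 1, 12, 0)
    print(rpo_met("core", datetime(2021, 1, 1, 11, 15), now))  # backup is 45 minutes old
    print(rpo_met("core", datetime(2021, 1, 1, 10, 30), now))  # backup is 90 minutes old
```

With hourly backups, the age of the latest Core node backup should normally stay under one hour, so a check like this only fails when a scheduled backup has been missed or has not completed.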
Energy Efficiency: The "Proof of Stake" concept is all about being energy efficient compared with the "Proof of Work" concept. We researched which cloud service providers use green energy, in full or in part, and which is the most energy efficient on the market. A WIRED report, among various others, brings Google Cloud Platform and Microsoft Azure to the forefront. The efficiency of a datacenter comes from how smart and energy efficient its infrastructure design is: the choice of lights and cooling systems, the energy source, the efficiency of storage devices and servers, etc.
WIRED’s rating for Google Cloud Platform: We chose GCP to run our stake pool testnet environment.
- Overall Greenness: B+
- Energy Efficiency: A+
- Transparency: A
- Technological Innovation: A
- Total Renewable Energy Portfolio: 5.5 GW
WIRED’s rating for Microsoft Azure: We chose Azure to run our stake pool mainnet environment. With GCP being new in the ANZ region, we had some technical challenges running our stake pool there the way we wanted to. Azure also has a cost benefit over GCP, which currently works in our favor. We intend to move the mainnet services to GCP in 1-2 years' time.
- Overall Greenness: B
- Energy Efficiency: A
- Transparency: A
- Technological Innovation: A+
- Total Renewable Energy Portfolio: 1.9 GW
WIRED’s rating for Amazon Web Services:
- Overall Greenness: C-
- Energy Efficiency: B
- Transparency: F
- Technological Innovation: Unknown
- Total Renewable Energy Portfolio: 1.6 GW