Architecting applications to scale in the cloud
Speed, agility and resilience are just some of the perks of building applications in the cloud.
Amazon Web Services (AWS) and similar cloud-service offerings have revolutionized how organizations approach application development and deployment. Through cloud services, organizations can obtain on-demand, fast and modular infrastructures for deploying virtual instances of networking components, storage repositories, compute platforms and management frameworks. As cloud services continue to mature, the platform-as-a-service (PaaS) model has steadily gained recognition and favor.
The abstraction layers provided when using a PaaS model offer a particularly effective means for developers to focus on coding and to implement a variety of business applications efficiently on public clouds.
These abstractions also relieve administrators of many configuration and deployment challenges. Administrators no longer need to maintain traditional hardware servers, and they can easily configure the PaaS environment through administrative tools and control panels to address changing requirements.
Between IaaS and SaaS
In a typical implementation, PaaS serves as a partial operating system and middleware, residing in the technology stack between an underlying infrastructure-as-a-service (IaaS) layer and the software-as-a-service (SaaS) layer that provides a user interface.
The combination of AWS and PaaS provides many advantages, as well as mechanisms for developers and architects to design, plan and implement new technologies, and prototype new solutions without the cost and commitment of traditional computing. Software developers following agile development processes have the flexibility to move from ideas and concepts through rapid development to a marketable product — swiftly and cost effectively — within a responsive PaaS environment.
Hyland’s Nuxeo Platform and AWS
Using a PaaS model optimized for the capabilities and features of AWS, Hyland's Nuxeo Platform provides an extensible modular framework for handling high-volume document management and complex digital asset management tasks with high scalability and reliability. This design framework makes it possible to efficiently leverage AWS cloud services and customize the platform for a wide variety of requirements.
Let’s explore, from an architectural overview, the fundamental mechanisms for building and deploying content management applications on AWS using Nuxeo Platform.
AWS for content management
Since the introduction of Amazon Simple Storage Service (S3) in 2006, AWS has evolved into a powerful, responsive infrastructure that has helped shape the way in which applications are developed and how enterprises handle IT requirements by providing commodity-level access to a full range of cloud services. Several AWS characteristics make it well suited for content management tasks, including:
Fast, low-latency content distribution
Delivery of content over the web to end users using Amazon CloudFront is a fast, cost-effective way to handle everything from streaming video to entire dynamic websites. Through integration with other AWS offerings, such as S3 and EC2, content delivery can be optimized to take advantage of Amazon’s global network of edge locations, minimizing latency and boosting performance. Because CloudFront is a usage-based service with no commitment, the only costs accrued derive from the actual volume of content delivered. Content collaboration can be handled efficiently with full access for workgroups spread out across multiple locations. Even extremely large files, such as those encountered in engineering projects and digital asset management environments, can be delivered securely worldwide at high speed to support collaborative efforts.
Inexpensive, reliable storage
S3, Amazon’s pioneering cloud storage service, continues to provide value to enterprises that require a reliable, scalable method for storing many kinds of digital assets in the cloud. Developers gain secure access to object containers, referred to as buckets, that are addressable by URL, located in a specified geographical region, and scalable to accommodate escalating storage demands.
Nuxeo Platform supports AWS storage types (Standard S3, S3 Object Lock, S3 Glacier), and it has the ability to define where content gets stored based on the document type, retention policy and/or the lifecycle state.
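To make the idea of placement rules concrete, here is a minimal sketch of how a storage target could be chosen from a document's type, retention status and lifecycle state. The `Document` class, the `choose_storage` function, the `COLD_TYPES` set and the target names are all hypothetical and exist only for illustration; they are not Nuxeo Platform configuration or API.

```python
from dataclasses import dataclass

# Hypothetical document types that are rarely read after creation.
COLD_TYPES = {"AuditLog"}


@dataclass(frozen=True)
class Document:
    doc_type: str          # e.g. "Contract", "Picture", "AuditLog"
    lifecycle_state: str   # e.g. "draft", "approved", "archived"
    under_retention: bool  # True when a retention policy applies


def choose_storage(doc: Document) -> str:
    """Return an illustrative storage target name for a document."""
    # Retention takes priority: such content must not be altered or deleted.
    if doc.under_retention:
        return "s3-object-lock"
    # Archived content and rarely read types go to cold storage.
    if doc.lifecycle_state == "archived" or doc.doc_type in COLD_TYPES:
        return "s3-glacier"
    # Everything else stays in standard storage.
    return "s3-standard"
```

The order of the checks is a design choice in this sketch: retention is evaluated first so that a retained document never lands in a tier where it could be modified.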
Elastic compute resources
Content-centric applications often place varying demands on compute resources for media processing, handling different levels of user traffic, data migration, sorting and indexing, and other tasks. AWS provides a number of elastic capabilities that can scale to meet demands, including the capability to scale out virtual servers (EC2), transcode media (Elastic Transcoder), perform Auto Scaling and adjust load balancing (ELB).
Proven work environment
AWS has refined and enhanced its cloud service offerings over many years to ensure a secure and reliable work environment for the mission-critical applications encountered in content management operations. Important features are available and accessible, including database snapshots, automated load balancing, key performance metrics, high-volume bandwidth availability, monitoring and so on.
Other AWS capabilities support content-centric applications very effectively. For example, disaster recovery can be accomplished rapidly using failover techniques, minimizing downtime and potential enterprise losses from business interruptions. In digital asset management scenarios, back-end processing nodes can be separated from front-end servers so the scaling of images and video processing can be handled more efficiently.
AWS provides a capable IaaS framework for new-generation digital asset management projects, distributed content collaboration and global content distribution.
Architecting applications for the cloud
Simply porting traditional applications over to a cloud platform is usually not the best option: Without leveraging the PaaS offering, developers would have to build and manage their own services. To take full advantage of a cloud-services environment, whether IaaS, PaaS or SaaS, developers must understand the limitations of the environment and then effectively leverage its built-in advantages. Designing an application from the ground up, the best way to build apps for the cloud, requires that developers adopt a different mindset and master a new paradigm.
For example, architecting an application that can rapidly adapt to high volumes of transactions and a varying number of users typically requires that developers create a large amount of code to handle the demands of scaling, which might include caching, database scaling, asynchronous messaging and so on. A well-designed PaaS platform will already be optimized with built-in capabilities for these functions. Instead of writing code, the developer can simply tap into the appropriate functions as needed.
General guidelines for architecting applications for availability in the cloud include:
Anticipate failure
Be aware of the possibility that parts of the cloud will sometimes fail. Design and test applications for resiliency and the ability to respond to failures. Componentize applications so multiple modules that communicate with each other through an API can recover independently, if necessary. Data replication and multiple deployments of critical components are useful in this regard.
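One common way to respond to transient failures between modules is to retry a call with exponential backoff. The following is a minimal sketch of that technique, not a prescribed implementation; the `call_with_retry` name and its parameters are assumptions made for illustration, and only `ConnectionError` is treated as retryable here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            # Give up after the last attempt and let the caller handle it.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the behavior can be exercised in tests without real waiting.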
Employ stateless computing techniques
Using stateless protocols eliminates the possibility that a state stored in memory will be lost due to an outage or service interruption. Because in-memory state is lost when an instance fails or is replaced, store application state in an object store, database or message queue instead.
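As a rough illustration of this guideline, the sketch below keeps no state inside the request handler: every read and write goes through an external store, so any instance can serve any request. The `StateStore` protocol, `InMemoryStore` class and `handle_upload` function are hypothetical names; `InMemoryStore` merely stands in for a real database or object store.

```python
from typing import Dict, Protocol


class StateStore(Protocol):
    """Minimal interface an external state store is assumed to offer."""

    def get(self, key: str) -> int: ...
    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for a database or object store, for illustration only."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def handle_upload(store: StateStore, session_id: str) -> int:
    """Count uploads per session without keeping state in the handler."""
    count = store.get(session_id) + 1
    store.put(session_id, count)
    return count
```

Because the handler holds nothing between calls, a replacement instance that receives the next request for the same session sees the same count.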
Scale up and out
When it comes to pure processing, scaling out horizontally is much more effective, offering essentially unlimited scalability that takes advantage of the elasticity of cloud resources. However, for the persistence layer (i.e., databases), there are many cases in which scaling up makes sense and is the right solution. If you're using a SQL PaaS like RDS, scaling up is the option you should look at. If you're using NoSQL and natively distributed storage, you gain the possibility to scale out by adding nodes, but doing so may involve data migration (even if automated).
Keep data consistency in mind
Because there can be multiple instances of an application residing in different geographic regions, some changes in the database or the application may not be reflected for several milliseconds. To maintain a model where data is replicated and highly available within this environment, developers must devise an approach that handles potential inconsistencies when different application instances draw from the same database.
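One possible approach to such inconsistencies is optimistic concurrency control: each record carries a version number, and a write is rejected if it was based on an outdated read. The sketch below shows the idea with a hypothetical `VersionedStore` class; it is an assumption for illustration, not a description of how Nuxeo Platform or any AWS service handles consistency.

```python
from typing import Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on an outdated read."""


class VersionedStore:
    """Toy key-value store that tracks a version number per key."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[int, str]] = {}

    def read(self, key: str) -> Tuple[int, str]:
        # Unknown keys are reported as version 0 with an empty value.
        return self._rows.get(key, (0, ""))

    def write(self, key: str, expected_version: int, value: str) -> int:
        current_version, _ = self.read(key)
        # Reject the write if another instance updated the key in between.
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._rows[key] = (new_version, value)
        return new_version
```

An application instance that receives `VersionConflict` would typically re-read the record, reapply its change and try again.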
Investigate the specifications of the PaaS that you select to fully understand the capabilities and feature set as you begin development. In the case of Nuxeo Platform, even if your application requirements are modest, such as basic document management or digital asset management that includes workflow processes, built-in cloud-aware capabilities that have been architected into the product can save considerable time and effort, as well as improve the reliability of your application. Hyland also frequently updates Nuxeo Platform to include new features and capabilities, so as cloud service technologies evolve, architects and developers can take advantage of the latest enhancements.
Introducing Hyland’s Nuxeo Platform
Hyland’s Nuxeo Platform, when coupled with AWS and the available toolset, provides a comprehensive, extensible platform, readily adaptable to business application development. Through the efficiencies of operating tightly with AWS, Nuxeo Platform enables architects and developers to easily build and run content-focused business applications that can handle extremely large document sets (even at volumes ranging into the billions).
Several prebuilt applications are included, allowing development teams to quickly deploy and launch a number of fully featured content management tools or customize these applications to meet specific requirements. Its modern technologies, powerful plug-in model, integrated development environment and flexible packaging capabilities make the Nuxeo Platform an ideal environment to rapidly design, develop and deploy applications, on premises or within a cloud environment.
Nuxeo Platform supports the creation of end-to-end workflows — through a graphical interface — for performing content management processes. Alternatively, applications can be built within the integrated development environment, accessing the exposed functionality in an API that supports the representational state transfer (REST) model.
Tailoring apps to individual PaaS capabilities
Capabilities and functionality of individual IaaS offerings vary. For maximum efficiency, interoperability and performance, when building applications to run on top of an IaaS framework, developers gain many advantages by exploiting the built-in features, components and capabilities available.
The infrastructure offered by the cloud provider typically:
- Includes components that are fully integrated and tested for interoperability
- Features mechanisms for manual or automated scaling of virtual servers, storage, network resources and other system resources
- Provides an easy way to monitor ongoing costs by usage, with billing limited to the resources used
- Costs substantially less to use than comparable on-premises systems
As a developer, when architecting a solution to deploy in an IaaS environment, you should:
- Rely as much as possible on open standards, as well as industry standards that are widely accepted
- Build solutions based on a pluggable architecture
The Nuxeo Platform supports both a standards-based architecture and pluggable component model (Extension Point). Nuxeo Platform is well suited for leveraging the AWS infrastructure, featuring:
- Meta-Data Store: AWS RDS for PostgreSQL, MongoDB Atlas
- Binary Store: S3 Binary Store
- ElastiCache
- AWS Elasticsearch
- AWS MSK (for Kafka)
- AWS Lambda
- Conversions
- AI services
These same basic concepts apply to a number of cloud-specific services, including provisioning and monitoring.
As much as possible, the solution should fit in the model as defined by the IaaS provider and exploit those capabilities that make it easy to perform operations without the need for extensive coding. Ideally, this includes the capability to configure automated processes that would otherwise need to be manually managed or configured.
By default, Nuxeo Platform is packaged as Debian packages and also has available installers for a number of other environments.
Nuxeo Platform can also be set up using an Amazon Machine Image (AMI).
For monitoring and managing resource use, Nuxeo exposes its metrics by means of Java Management Extensions (JMX).
These metrics can be accessed by Amazon CloudWatch and used to engage Auto Scaling to respond rapidly to changing application demands.
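The link between a metric and a scaling decision can be illustrated with a simplified target-tracking calculation: the node count is adjusted in proportion to how far the observed metric is from its target. This is a sketch under stated assumptions, not the algorithm AWS Auto Scaling uses; the `desired_capacity` function and its parameters are hypothetical.

```python
import math


def desired_capacity(
    current_nodes: int,
    metric_value: float,
    target_value: float,
    min_nodes: int = 1,
    max_nodes: int = 10,
) -> int:
    """Return the node count that would bring the metric back to target."""
    if current_nodes < 1 or target_value <= 0:
        raise ValueError("current_nodes must be >= 1 and target_value > 0")
    # Scale the fleet proportionally to the metric/target ratio.
    wanted = math.ceil(current_nodes * metric_value / target_value)
    # Never go below the minimum or above the maximum fleet size.
    return max(min_nodes, min(max_nodes, wanted))
```

For example, four nodes averaging 90% CPU against a 60% target yield six nodes, while the same four nodes at 30% yield two.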
Scale out and distributed architecture
Scaling fluidly to meet the demands of business is a key advantage of cloud services. The AWS IaaS includes a number of features that make it possible to react as fast as possible and adapt to the demand, removing the need for administrative monitoring and manual actions to address resource issues.
The underlying architecture of cloud services makes it easy to quickly provision new servers (scaling out), but much more difficult to quickly increase the basic resources of an existing virtual server (scaling up). However, you can increase available processing power using clustering of virtual servers to scale out, effectively achieving the same processing gains as would be achieved by scaling up. Alternatively, to meet specialized application requirements, you can select virtual machines (VMs) that are preprovisioned with certain characteristics, such as VMs optimized for handling heavy I/O or providing substantial amounts of memory.
The bottom line is: Your application will be able to scale in the cloud if it supports scale out. If you are limited to scale up, you won’t be able to gain much benefit from cloud services.
Nuxeo Platform architecture can scale out along different axes, adapting to whatever type of demands are to be absorbed, as described in the following sections.
CPU scale out
Nuxeo Platform processing demands can easily be scaled out using the built-in clustering model. In support, AWS includes specific features to accommodate high-performance computing in the cloud, using Cluster Compute. This lets you scale out applications across thousands of cores to more effectively handle massive throughput demands with tightly coupled I/O across a high-bandwidth network.
Using several midsize VMs, you can build a high-performance Nuxeo Platform application and then use Auto Scaling to automatically add one or more VMs when the load increases.
Dedicated resources and specific processing requirements
Certain types of processing operations require specific resources and hardware. AWS offers several types of VM configurations to meet a range of requirements, including:
- I/O provisioned to handle I/O-intensive operations
- Graphics processing unit (GPU) or High CPU to take care of processor-intensive tasks, such as video transcoding or artificial intelligence
- High-memory VM to accommodate applications that require large blocks of contiguous memory space to operate efficiently
Nuxeo Platform architecture provides the flexibility to dedicate nodes for specific types of processing operations so you can:
- Use a general-purpose VM to accomplish basic tasks
- Ensure good response time for the interactive users
- Ensure cost-effective processing
- Leverage specific VMs for particular demanding tasks
Dedicated processing nodes can be useful for:
- Performing video transcoding
- Processing high-resolution digital images
- Scanning large files for viruses
- Performing optical character recognition on scanned images
- Running cryptographic algorithms to encrypt and decrypt files
- Indexing large collections of documents
Nuxeo Platform product experts recommend two approaches:
- Deploy exactly the same image on each node. The only exceptions are the hardware and the work-queue configuration. This ensures that if the dedicated nodes become unavailable, the standard nodes can continue with the processing.
- Isolate the different types of processing inside the application: It will be easier to monitor and scale out the part that is needed and will avoid the "noisy neighbors" issues.
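The two recommendations above can be sketched as a small routing rule: a task goes to a dedicated node when one is healthy and otherwise falls back to a standard node, which is possible because every node runs the same image. The `pick_node` function and the node names are hypothetical; Nuxeo Platform's actual work-queue configuration is not shown here.

```python
from typing import Dict, List, Set


def pick_node(
    task_type: str,
    dedicated: Dict[str, List[str]],
    standard: List[str],
    healthy: Set[str],
) -> str:
    """Prefer a healthy dedicated node, else fall back to a standard node."""
    # Dedicated nodes for this task type come first, then standard nodes.
    candidates = dedicated.get(task_type, []) + standard
    for node in candidates:
        if node in healthy:
            return node
    raise RuntimeError(f"no healthy node available for {task_type!r}")
```

Keeping interactive work on the standard nodes and heavy tasks on the dedicated ones is what limits the "noisy neighbors" effect in this sketch.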
Storage scale out
Typically, solutions rely on scaling up the data tier. Although this is possible to some extent with solutions like AWS RDS, it is usually not the optimal approach:
- Scaling up is not cost effective.
- Scaling up cannot be progressive and transparent.
- Scaling up cannot be continued indefinitely.
To address this, Nuxeo Platform includes a number of options that support scaling out, to improve processing at the data level.
Query scale out
When using the default storage back end, Nuxeo Platform makes extensive use of the database. Queries in particular create a great deal of database activity with a potential for bottlenecks. In certain configurations, the Nuxeo Visible Content Store (VCS) generates complex SQL queries that keep the database server very busy.
When the volume of data queries increases, the number of concurrent accesses increases as well. Queries typically present the primary bottleneck that diminishes the database server performance, slowing query response times.
At the SQL level, you essentially have two options for reducing the bottleneck:
- Use a database server with greater capacity
- Denormalize the data to make the query run faster
The use of Elasticsearch as the primary query engine lets you transparently distribute queries across multiple nodes and maintain high-end performance, even on massive volumes, using midrange hardware. By nature, Elasticsearch does not try to enforce the same kind of integrity guarantees that an ACID-compliant database does. Because of that, it can be easily distributed across several nodes, providing a very simple and efficient scale-out solution for queries.
As the number of concurrent users increases, the number of requests handled per second scales very effectively. Hyland’s internal platform testing showed an impressive degree of scaling to handle concurrent requests even at volume levels of one billion documents.
Scaling out with multiple data stores
In terms of storage, if a database server reaches its upper limits for store-and-retrieve operations, Nuxeo Platform supports data sharding across several repositories. With this capability, you can exceed the scale-up limitations of one database server, because the application can distribute data across several repositories, each of them associated with a single database. This effectively boosts both database performance and scalability.
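One simple way to distribute documents across several repositories is to route each document by a stable hash of its identifier. The sketch below illustrates that technique only; it is an assumption, not how Nuxeo Platform assigns documents to repositories, and the `repository_for` function and repository names are hypothetical.

```python
import hashlib
from typing import List


def repository_for(doc_id: str, repositories: List[str]) -> str:
    """Map a document ID to one repository using a stable hash."""
    if not repositories:
        raise ValueError("at least one repository is required")
    # SHA-256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(repositories)
    return repositories[index]
```

Note that with plain modulo routing, adding a repository changes the mapping for most existing documents, so a real deployment would need a migration plan or a different placement scheme.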
Elasticsearch makes it possible to maintain a unique index for performing federated search operations across multiple repositories. From a document and asset management perspective, this mechanism can be extremely useful, providing a single interface to a diverse range of information resources and returning query results in a single list. It also makes it possible to link directly to each resource to further expand the search.
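The idea of a single index spanning several repositories can be shown with a toy inverted index in which each entry records both the repository and the document ID. This is a conceptual sketch only: the `UnifiedIndex` class is hypothetical and does not reflect the Elasticsearch API or Nuxeo Platform's indexing code.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Hit = Tuple[str, str]  # (repository name, document ID)


class UnifiedIndex:
    """Toy inverted index covering documents from several repositories."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[Hit]] = defaultdict(set)

    def add(self, repository: str, doc_id: str, text: str) -> None:
        # Index every lowercase word with a pointer back to its source.
        for term in text.lower().split():
            self._postings[term].add((repository, doc_id))

    def search(self, term: str) -> List[Hit]:
        # One query returns a single sorted list across all repositories.
        return sorted(self._postings.get(term.lower(), set()))
```

Because each hit carries its repository name, a caller can link straight back to the resource in the repository that holds it.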
Scaling out with NoSQL
If your database requirements approach the level of billions of documents, traditional SQL databases may not be the right choice, even with the help of Elasticsearch. Nuxeo Platform support for MongoDB, a NoSQL database, extends high-volume database capabilities with powerful scaling. The NoSQL database movement originated in response to the well-recognized limitations of traditional relational databases and the difficulties in storing and analyzing massive volumes of data encountered in many of today’s web applications.
MongoDB offers these benefits in a content management environment:
- Relies on a native distributed architecture
- Scales out very easily
With the pluggable Nuxeo Platform architecture, switching from an SQL backend (VCS) to a MongoDB backend (DBS) becomes a basic configuration task that can be accomplished during deployment.
About Hyland and its intelligent content solutions
Hyland, an industry-leading intelligent content solutions provider, offers enterprise content management platforms and hundreds of solutions that are cloud-ready to empower customers to deliver exceptional experiences for the people they serve.
Trusted by thousands of organizations worldwide, including more than half of the Fortune 100, Hyland’s solutions connect systems and manage high volumes of diverse content to improve, accelerate and automate processes and workflows.