Making choices: what kind of relationship are you seeking with your database?

There are many choices available for the database tier of the modern application, particularly in the cloud. In this webinar, I reviewed the database landscape, provided practical recommendations for making sense of an overwhelming number of options, and discussed the best uses of MySQL and the various NoSQL flavors (especially MongoDB, Redis, and Hadoop).

The Hadoop Ecosystem and a Modern Data Architecture

Our friends at Hortonworks are making huge progress on delivering Hadoop for the Enterprise and making the vision of the Modern Data Architecture a reality, with a solid ecosystem that includes Rackspace.

Download their “Hadoop and a Modern Data Architecture Whitepaper” and learn more at http://hortonworks.com/blog/hadoop-ecosystem-modern-data-architecture/ 

 

Cloud Orchestration: Automating Deployments Of Full Stack Configurations

This blog first appeared on my Rackspace.com blog.

Over the past few months we have been working hard to provide more automation capabilities for the Rackspace Cloud. In November we told you about Auto Scale, which allows you to define rules to grow or shrink your cloud. Back in August we told you about Deployments, a Control Panel capability that allows you to deploy common applications with a point-and-click interface, based on our best practices.

A common request we have heard from you is for a programmatic approach to creating and deploying full stack configurations. Today we are releasing the Cloud Orchestration API to help you configure and deploy your cloud stack topologies and applications.

WHAT IS CLOUD ORCHESTRATION?

Automating your deployment helps you be more efficient, saves you time that you can use more productively, and helps you reduce the possibility of manually introducing configuration errors. Many of you are familiar with the Nova API. There is really no easier way to programmatically create a single server on the Rackspace Cloud. But what about deploying more complex configurations? What if you need to install software or provision load balancers, servers and databases, and wire them all together?

This new Cloud Orchestration API makes automation easy for you. Cloud Orchestration is a service that allows you to create, update and manage groups of cloud resources and their software components as a single unit and then deploy them in an automated, repeatable fashion via a template.

Before today, you had to write separate API call code against each service to get your application up and running, and you had to worry about the order in which you instantiate and connect cloud resources. With the Cloud Orchestration API, you can now declare the resources you want, how you want them configured, and what software to install by simply editing a text file (the configuration template) instead of writing API call code. The Cloud Orchestration service implements the automation for you. Configuration updates can be done by editing the configuration template and making a single API call to Cloud Orchestration, regardless of how many changes you made (say, adding more nodes to your auto scaling policy or adding a new node to your database tier). Even deleting the entire stack can be done with a single API call.
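
As a minimal sketch of that workflow, assuming the python-heatclient command line (which wraps the Orchestration API) is installed and your Rackspace credentials are set as environment variables; the stack and file names below are purely illustrative:

# Apply any number of template changes with one call
heat stack-update my-stack --template-file=my-stack.yaml

# Tear down every resource in the stack with one call
heat stack-delete my-stack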

WHAT IS SUPPORTED?

Cloud Orchestration today supports declaration and configuration of:

  • Cloud Servers
  • Cloud Load Balancers
  • Cloud Databases
  • Cloud Block Storage
  • Cloud DNS
  • Auto Scaling
  • Bash scripts (a minimal user_data sketch follows this list)
  • Chef Cookbooks and Berkshelf
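
As a sketch of the Bash script support, the fragment below passes a bootstrap script to a Cloud Server through the standard user_data property; the flavor, image and script contents are illustrative only:

heat_template_version: 2013-05-23

resources:
  web_server:
    type: "OS::Nova::Server"
    properties:
      flavor: 1GB Standard Instance
      image: CentOS 6.4
      name: bootstrap-example
      user_data: |
        #!/bin/bash
        # Illustrative bootstrap: install and start Apache on first boot
        yum -y install httpd
        service httpd start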

We are working on additional features in the open, jointly with other contributors to OpenStack. Here is a list of some future features that are in development here at Rackspace or are in planning within the open community:

  • Support for additional cloud resources, including Cloud Monitoring and object storage through Cloud Files, among others.
  • Automatic error handling and retry logic to ensure you always get exactly what you declare despite underlying system errors.
  • Self-healing, for automatic repair or re-provisioning of new resources when a stack becomes unhealthy or a resource is not performing as expected.
  • Integration with additional software configuration management tools, such as Ansible.
  • Multi-region deployment from within a single API call or template.
  • Catalog functionality to help you deploy Rackspace’s or the community’s best-practice templates.

BENEFITS OF CLOUD ORCHESTRATION

Declarative Syntax and Portability

With Cloud Orchestration, you first specify declaratively the set of cloud resources that needs to be deployed, using the OpenStack Heat Orchestration Template (HOT) format. A declarative template format (as opposed to other imperative approaches) ensures that you don’t have to be concerned with how the provisioning will happen. You just specify what needs to happen. Your declaration is also separated from any input you provide on the how (e.g. scripts and recipes). This principle of Separation of Concerns helps simplify the maintenance of your infrastructure and allows you to more easily port your templates to other OpenStack clouds running the Heat service.

Take a look at the example below:

heat_template_version: 2013-05-23

resources:
  compute_instance: 
    type: "OS::Nova::Server"
    properties:
      flavor: 1GB Standard Instance
      image: CentOS 6.4
      name: A very simple stack

That simple example shows how we specify that the deployment requires a 1GB server instance with CentOS 6.4.
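
If you save that template as simple.yaml, deploying it is a single call. Here is a sketch using the python-heatclient command line with an illustrative stack name:

heat stack-create simple-stack --template-file=simple.yaml

# Watch the stack move to CREATE_COMPLETE
heat stack-list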

More productivity with reusable, repeatable and intelligently ordered resource provisioning

Cloud Orchestration takes care of sequencing the provisioning operations (“orderability”) of your stack. For example, imagine that you have to set up three servers and one load balancer. Cloud Orchestration will ensure that the servers are up and working and the appropriate IP addresses get added to the load balancer before completing the stack. Cloud Orchestration has the intelligence to determine what needs to be provisioned first and in which order each task in the provisioning workflow must be executed. You don’t have to worry about ordering tasks.
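
To illustrate, here is a sketch of that scenario reduced to one server and one load balancer. Because the load balancer references the server’s IP address through get_attr, Cloud Orchestration knows the server must be built first. The Rackspace::Cloud::LoadBalancer property names below are illustrative, so verify them against the resource reference before using them:

resources:
  app_server:
    type: "OS::Nova::Server"
    properties:
      flavor: 1GB Standard Instance
      image: CentOS 6.4
      name: app-server

  app_lb:
    type: "Rackspace::Cloud::LoadBalancer"
    properties:
      name: app-lb
      protocol: HTTP
      port: 80
      virtualIps:
        - type: PUBLIC
          ipVersion: IPV4
      nodes:
        # get_attr creates the dependency: the server is provisioned first,
        # then its IP address is wired into the load balancer
        - addresses: [ { get_attr: [ app_server, accessIPv4 ] } ]
          port: 80
          condition: ENABLED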

Cloud Orchestration templates ensure repeatable deployments. New configurations based on the same template are deployed in exactly the same manner, reducing errors by avoiding potential configuration deltas.

Finally, Cloud Orchestration templates are reusable, nestable, and portable, which can help improve your productivity. You can utilize templates that you have previously created or those created by the community. The rich orchestration capabilities will ensure that you always get a full working stack. Cloud Orchestration templates are also portable across private and public cloud deployments of OpenStack.

HOW ARE HEAT ORCHESTRATION TEMPLATES (HOT) DIFFERENT FROM CHEF AND PUPPET?

Cloud Orchestration is not a replacement for server configuration tools such as Puppet and Chef. They are very complementary. You will continue to use Chef and Puppet to “template-ize” your server software configurations, while Cloud Orchestration and its HOT templates will help you create a full stack that includes all the infrastructure resources required for your stack. Cloud Orchestration allows you to quickly bootstrap your preferred software configuration management solution. With Chef, we have gone several steps further and provided direct support to specify the cookbooks and Berksfile you want deployed.
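
As a rough illustration of that Chef support, a template fragment along these lines bootstraps a server and then runs a cookbook against it. The OS::Heat::ChefSolo resource type and its property names here are assumptions for illustration; verify them against the Rackspace orchestration template repositories before relying on them:

parameters:
  private_key:
    type: string
    description: Private SSH key used to bootstrap Chef on the server

resources:
  app_server:
    type: "OS::Nova::Server"
    properties:
      flavor: 1GB Standard Instance
      image: CentOS 6.4
      key_name: my-ssh-key

  app_chef_run:
    type: "OS::Heat::ChefSolo"
    properties:
      host: { get_attr: [ app_server, accessIPv4 ] }
      username: root
      private_key: { get_param: private_key }
      # The kitchen repository holds your cookbooks and a Berksfile,
      # which is resolved with Berkshelf
      kitchen: https://github.com/example-org/example-kitchen
      node:
        run_list: [ "recipe[apache2]" ]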

Heat is the future of automation in our cloud

We are making a big commitment to the OpenStack Heat project here at Rackspace. We have been making great strides with Heat as a community over the past few months. Today, we are making the capabilities of Cloud Orchestration available via API, but we are already working to provide this capability directly in the Control Panel. You will hear from us soon. We will also be integrating the current Deployments feature of the Control Panel into Cloud Orchestration.

Best Practices from Rackspace, backed by Fanatical Support

One of the greatest benefits of helping thousands of customers through our support and DevOps services is that the templates we produce here at Rackspace represent best practices that you can take advantage of. We see each template we produce as the implementation of months and years of experience for a specific application or scenario. You can just “borrow” these templates and customize them for your own purposes. With Cloud Orchestration, you get reliable and repeatable deployments every time.

We have created a GitHub organization at http://github.com/rackspace-orchestration-templates. In there you will find a list of our available orchestration templates. These templates contain tried-and-true application and resource topologies to provide an optimized experience for a particular type of application or infrastructure configuration. While these templates may not be suitable for every use case, they serve as a trusted foundation for many common use cases and are an ideal reference from which you can build a custom template or use as-is. To start, you will find templates for WordPress (Single and Multi-node options), Minecraft server, Ghost, and PHP. We will continue to add new templates regularly. Please check back often. If you would like to see something there that you don’t see today, leave a comment below or in the Cloud Orchestration community post.

Finally, in the Rackspace Orchestration Templates organization in GitHub, you will also find the sample-templates repository, which includes examples that you can use when learning how to write your own templates. These templates may not always capture best-practice application or resource topology configurations, but will serve as a frame of reference on how applications and resources can be constructed and deployed with Cloud Orchestration templates.

WHAT SHOULD I DO NEXT?

Take a look at the Getting Started Guide, the API Reference and the Heat Orchestration Template Authoring Guide. Create your first template and deploy it. It is really easy to create, change and finally delete your stack with a few command line calls. Again, visit the Cloud Orchestration community post and let us know what else you would like to see or just drop a comment below. As usual, don’t forget to let us know what great applications you are building.

WEBINAR: Pre-Aggregated Analytics And Social Feeds Using MongoDB

This week we hosted a webinar with Appboy and Untappd to discuss their use of MongoDB.

Jon Hyman, co-founder and CIO of Appboy, an engagement platform for mobile apps, talked about using pre-aggregated analytics on top of the aggregation framework. Greg Avola, co-founder and developer at Untappd, a social network for beer lovers, talked about how MongoDB helped make its social feed faster and how it used location indexes to enable geo-location search.

Check out the video recording and the PDF of the presentation slides:

Cloud Big Data Platform Ready For More Hadoop Apps

This blog first appeared on my Rackspace.com blog.

A few weeks ago we told you about our two Data Services for Hadoop-based applications, the Managed Big Data Platform service (in Unlimited Availability) and the Cloud Big Data Platform (in Early Access). Working hand in hand with Hortonworks, we are giving you a choice of architectures for your Hadoop applications, whether you need a custom-built Hadoop architecture based on specific dedicated hardware, or a dynamic, API-driven programmable Hadoop cluster in our public cloud.

Today, we are pushing the ball further as we move our Cloud Big Data Platform from Early Access into Limited Availability.

WHAT HAVE YOU TOLD US?

The users of our Cloud Big Data Platform in Early Access come from a variety of industries and backgrounds, from online marketing and ecommerce working on recommendation engines and user sentiment analysis, to hospitality and retail looking at product performance, and to education and science working to improve people’s lives. We have learned a lot from you. These three statements summarize what we are hearing:

  • “Big data” technologies are better understood: while many of you were just “testing the water” in the past two years or so, today you are taking a really good look at how to utilize Hadoop in your applications. There is more clarity and understanding of the technology, its uses and limitations, as the tooling around Hadoop continues to evolve and mature.
  • “Big data” projects are more actionable:  we see more pragmatic approaches in implementations, with a focus on visible value. We see less and less abstract, ambiguous “big data discussions” with unclear goals and hyped expectations, and more practical uses of the technology for projects that drive real business value, as it should be.
  • Doing “big data” is still hard: From a technical perspective, we hear that these projects are still hard for you. There is still a lot of work that we as an industry need to do to make sure the technology is manageable, performant and scalable to make these initiatives less difficult to carry out. We are glad to be doing our part to help you with these projects.

FROM EARLY ACCESS TO LIMITED AVAILABILITY

Today, our Cloud Big Data Platform is moving from Early Access to Limited Availability. Limited Availability is the last phase we use here at Rackspace before a service is delivered in Unlimited Availability in a few months. We hope to get as many customers on the service as we can, but we will unfortunately not be able to accept everybody yet. However, let us know what you are working on and we will definitely consider your application. The most important aspect for us is making sure we deliver on our promise of being Fanatical across the lifecycle of your initiative, which is why we have our Early Access and now our Limited Availability programs.

What does this mean to you beyond the new capabilities we have built in?

It means three things:

  • We are broadening the pool of customers we are accepting into our service.
  • We now offer the full Fanatical Support and Rackspace Cloud SLA for your workload to enable you to bring production applications to the service.
  • We are going to start billing you for the resources you consume. This is important for YOUR accounts payable team or personal credit card!

To sign up for Limited Availability, visit the Cloud Big Data Platform page at http://www.rackspace.com/cloud/big-data. Click on the “CONTACT US” button and fill out that simple contact form. A Racker will contact you to get you on board. Once we grant you access to the service, you will then simply provision a cluster right from Cloud Control Panel or using the API.

CHOICES OF NODES

You will have two options for your deployment, depending on your data and compute requirements. See below. It is worth emphasizing here that the 1.3TB instances share the compute node, but the 11TB instances are all single tenant.

[Chart: Cloud Big Data Platform node options]

Remember that Hadoop keeps three copies of your data by default. To account for your total storage footprint, multiply your estimated source data volume by three, and then decide how many nodes you will need to provision.
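
As a quick back-of-the-envelope sketch (the 10 TB source volume is just an example, and this ignores headroom for intermediate and temporary data):

import math

# Hadoop's default replication factor is 3, so raw HDFS capacity
# must be at least three times the source data volume.
source_data_tb = 10
replication_factor = 3
raw_needed_tb = source_data_tb * replication_factor  # 30 TB of raw capacity

for node_size_tb in (1.3, 11):
    nodes = math.ceil(raw_needed_tb / node_size_tb)
    print("%.1f TB nodes: provision at least %d" % (node_size_tb, nodes))

# Prints:
# 1.3 TB nodes: provision at least 24
# 11.0 TB nodes: provision at least 3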

And yes, to repeat, the free period is over and you will start getting billed for the resources you consume.

WE ARE AVAILABLE IN LONDON!

Cloud Big Data Platform is now available in London!

The hourly and equivalent monthly prices will be as follows:

  • £0.27 per hour (£197.10 per month) for the 1.3TB instance
  • £2.16 per hour (£1,576.80 per month) for the 11TB instance

NEW CAPABILITIES FOR LIMITED AVAILABILITY

We have added a number of new capabilities to the Cloud Big Data Platform over the past few weeks to prepare it for Limited Availability. Here are some of them:

  • Rackspace Cloud SLA for your production workload: Limited Availability environments are production-ready. The Rackspace Cloud SLA applies, which includes 99.9% instance availability per month (excluding scheduled maintenance) and 100% network availability (also excluding scheduled or emergency maintenance).
  • Fanatical Support: We have the Hadoop teams ready to help you in your design and deployment. As we said above, “doing big data” can be difficult, and our Hadoop Support teams are ready to help you be successful in your application.
  • Network Performance: This is significant. For example, in a simple network throughput test, our Early Access environment in DFW saw 480Mbps. We repeated that test in our Limited Availability environment and now see about 5.2Gbps in DFW, ORD and LON. That is an improvement of about 10 times.
  • More storage: We raised the limit of node storage to 11TB per node.
  • Single tenancy available: Our 11TB instances are single-tenant. Your application will have the whole node all to itself.
  • Cloud Files Connector: We have improved the integration between the service and our object storage in Cloud Files (see the sketch below).
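
The sketch below assumes the Cloud Files Connector is surfaced through Hadoop’s Swift filesystem support, which is an assumption for illustration; the container name and the “rack-dfw” provider label are placeholders that depend on your cluster’s core-site.xml configuration:

# Illustrative only: copy input data from a Cloud Files container into HDFS
hadoop distcp swift://my-container.rack-dfw/wikipedia/ hdfs:///user/analysis/wikipedia/

# Or list the container contents directly
hadoop fs -ls swift://my-container.rack-dfw/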

WHAT DID WE NOT GET TO FOR LIMITED AVAILABILITY?

One thing we have not made available in Limited Availability is Hortonworks Data Platform 2.0. In Limited Availability, we support Hortonworks Data Platform 1.3. Some of you are interested in the new goodies in the newest codebase. In particular, we hear that you want:

  • YARN to move beyond batch into online and interactive queries: You want to do more than MapReduce (batch) queries. We heard that you are exploring Giraph, Storm, HBase and Tez for your applications requiring interactive, online and streaming patterns.
  • Better SQL semantics: We heard that you are looking for improved SQL semantics when querying your data, and are interested in exploring how Hive is evolving through the Stinger initiative.
  • Higher Hadoop availability: through improvements in the latest HDFS bits, including better handling of NameNode failures, snapshots and NFS support, among others.

Rest assured that our engineers are hard at work to make this available in our Managed and Cloud Big Data Platform services. Expect more news from us as we learn more from you in this Limited Availability phase and work towards Unlimited Availability. We are making sure that there is an ample supply of storage and compute nodes available for your needs in our datacenters, and we want to enhance the API, UI and overall programmability.

LET ME SEE MORE!

If you have one hour to spare, watch the video below entitled “Apache Hadoop on the Open Cloud” with Nirmal Ranganathan from Rackspace and Steve Loughran from Hortonworks. In it, they cover:

  • An overview of the service, and its OpenStack architecture
  • Using the Control Panel and API to provision and manage a cluster of Hadoop nodes
  • Processing location data from Wikipedia, off of Cloud Files object storage
  • Rendering the data on a simple Google Map

You can click on the video below to hear Nirmal and Steve.

Choosing The Right Cloud Provider For Your MongoDB Database

This blog first appeared on my Rackspace.com blog.

In yesterday’s post I considered some of the limitations of running MongoDB on the public cloud. In the event that you decide to host MongoDB with a cloud provider, below are some thoughts on how to choose the right one. The framework is actually applicable to many other data services, but we will continue to use MongoDB for the discussion.

The choice of a cloud provider affects how you will be able to apply your development and operation resources for a given application. In a world of limited resources, any effort applied to the management of the database engine is not necessarily being applied to the development of your application.

Let’s discuss four progressively more sophisticated levels:

  1. Do-it-yourself database
  2. Provisioned database
  3. Automated database
  4. Managed database service

In the graphic below we see those four levels. I also show a number of database-related tasks (from architecture to management and administration). In the top half of the chart (in green), we see activities that are managed by the MongoDB service provider FOR the developer, while in the bottom half of the chart (in red) we see the activities that have to be performed BY the developer. As we move to the right in the chart, we see that the MongoDB service provides more capabilities to you as a development team. Clearly, the less “database management time” you have to invest, the more “application development time” is available for your business goal.

[Chart: the four levels of MongoDB cloud providers]

Let’s now discuss each level.

Level 1 (Do-it-yourself database)

At this level, you are able to procure a generic hosted server from the MongoDB cloud provider. You are free to select the specifications of the hardware, but you are also burdened by that decision. The architecture can be difficult to scale later on. Additionally, you must install and configure MongoDB, and are also required to implement backups, compacting, sharding, monitoring and all other maintenance activities to keep MongoDB running. Achieving availability and performance on the cloud is also something that you must develop from scratch. All of these activities take resources away from developing the actual application. This level may be well suited for your development or testing needs because of its initial low cost.

Level 2 (Provisioned database)

This level is better than Level 1. The cloud provider makes some standard hardware choices available to you and automatically installs MongoDB, typically on another IaaS hosting provider. This level may be initially attractive because of the speed of provisioning, but the level of sophistication is not great, and that choice has real consequences for production workloads. You must live with the hardware choices and their effect on the performance and scale of MongoDB. Maintenance and administration may still fall back on you, along with other complex responsibilities such as clustering and cluster management, which have a big impact on availability, scale and performance. Some MongoDB hosting startups fall within this level.

Level 3 (Automated database)

This level represents a significant improvement. Here, the MongoDB provider delivers a set of automation features, on top of provisioning and configuration choices. Backups are scheduled, monitoring runs automatically and other maintenance activities such as version upgrades are performed. While the level of automation may vary, you start to feel that you can outsource many management tasks to the automation suite. In a perfect world, this would be sufficient. But in reality, you are still in charge of “dealing with the database,” particularly as the application grows and the initial architecture starts showing signs of stress under the new level of load, affecting the performance and scalability of the application.

Level 4 (Managed database service)

Cloud vendors who deliver Level 4 capabilities offer a key benefit over Level 3: you no longer deal with the database engine from a management perspective, and instead see the database engine as a service that just works. You connect to a hostname and port, consume a data service that requires virtually no maintenance and ideally never need to consider what’s sitting behind it (other than to build the application itself). What drives that service is an architecture that is fundamentally different from that available in previous levels, one that places scale, performance and availability at the core of the engineering choices. In the case of MongoDB, the architecture should deal with sharding transparently, for example, for vertical and horizontal scaling. It should also place particular emphasis on performance across the stack, from the network, to the storage choices and the MongoDB configuration. Finally, it should take care of the complexities of achieving uptime by, for example, automatically allocating instances across hardware in a way that ensures redundancy and avoids single points of failure. Vendors at this level may also offer an enhanced level of analytics and tooling that helps you optimize your application proactively. For example, profiling along with I/O and query analytics can help you tune your application, and background tasks such as shard balancing or defragmentation are not only automated but also scheduled at appropriate times to reduce the impact on production. Clearly, there are always some tasks that cannot be automated, but these should be application-specific, such as migrations or integrations with other systems.
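
To make that “hostname and port” point concrete, here is a minimal sketch of what the application side of a Level 4 service looks like, using pymongo with an obviously hypothetical connection string:

from pymongo import MongoClient

# Hypothetical hostname, port and credentials supplied by the provider;
# sharding, failover and tuning all happen behind this connection string.
client = MongoClient("mongodb://appuser:secret@db.example-provider.com:27017/mydb")
db = client["mydb"]

db.events.insert_one({"type": "signup", "source": "web"})
print(db.events.count_documents({"type": "signup"}))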

THE ROLE OF OPERATIONAL EXPERTISE

A MongoDB cloud provider should deliver technical expertise to help you architect the best application possible. This kind of operational expertise should help your developer and application experts and bring them into a collaborative partnership across all phases of development—from data migration to production and code refreshes.

FINAL THOUGHTS

Obviously, non-technical issues are also important considerations when choosing a provider. Pricing and financial terms, the technical and operational reputation of the provider, compliance and regulatory restrictions as well as SLAs should not be overlooked.

As with any other platform choice, the decision of a MongoDB provider should not be taken lightly, particularly when a company’s main application depends squarely on MongoDB for its data tier. Development teams need a vendor with the flexibility to offer solutions to fit the different app needs and stages of the development lifecycle. Discuss your data design strategy early and often with the MongoDB experts of your cloud vendor. The strength of that expertise—not just the quality of the infrastructure—will be a key factor in any evaluation of a cloud provider, as your application will surely evolve and grow over time, and require that level of technical partnership for your application.

At Rackspace, our Data Service for MongoDB, ObjectRocket, was built as a Level 4 MongoDB managed service from the ground up: a unique architecture to make MongoDB perform fast, with high availability and scale. Let us know how we can help make your next MongoDB app deployment a success.