Amazon Redshift Spectrum vs Athena

9/16/2023

First of all, you should choose between Redshift and Athena based on your use case, since they are two very different services: Redshift is an enterprise-grade MPP data warehouse, while Athena is a SQL layer on top of S3 with limited performance. If performance is a key factor, users are going to execute unpredictable queries, and direct and management costs are not a problem, I'd definitely go for Redshift. If performance is not so critical and queries will be somewhat predictable, I'd go for Athena.

Once you select the technology, you'll need to optimize your data so that queries execute as fast as possible. In both cases you may need to adapt the data model to fit your queries better. If you go for Athena, you'd also probably need to change your file format to Parquet or Avro and review your partition strategy depending on your most frequent type of query. If you choose Redshift, you'll need to ingest the data from your files into it and maybe carry out some tuning tasks for performance gains.

I'll recommend Redshift for now since it can address a wider range of use cases, but we could give you better advice if you described your use case in depth.

This is a guest post by Rafi Ton, founder and CEO of NUVIAD. NUVIAD is, in their own words, “a mobile marketing platform providing professional marketers, agencies and local businesses state of the art tools to promote their products and services through hyper targeting, big data analytics and advanced machine learning tools.”

At NUVIAD, we’ve been using Amazon Redshift as our main data warehouse solution for more than 3 years. We store massive amounts of ad transaction data that our users and partners analyze to determine ad campaign strategies.

When running real-time bidding (RTB) campaigns at large scale, data freshness is critical so that our users can respond rapidly to changes in campaign performance. We chose Amazon Redshift because of its simplicity, scalability, performance, and ability to load new data in near real time. Over the past three years, our customer base grew significantly and so did our data. We saw our Amazon Redshift cluster grow from three nodes to 65 nodes. To balance cost and analytics performance, we looked for a way to store large amounts of less-frequently analyzed data at a lower cost. Yet, we still wanted to have the data immediately available for user queries and to meet their expectations for fast performance.

In this post, I explain the reasons why we extended Amazon Redshift with Redshift Spectrum as our modern data warehouse. I cover how our data growth and the need to balance cost and performance led us to adopt Redshift Spectrum. I also share key performance metrics in our environment, and discuss the additional AWS services that provide a scalable and fast environment, with data available for immediate querying by our growing user base.

The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability.

However, this approach required Amazon Redshift to store a lot of data for long periods, and our data grew substantially. At our peak, we maintained a cluster running 65 DC1.large nodes. The impact on our Amazon Redshift cluster was evident, and we saw our CPU utilization grow to 90%.

Why we extended Amazon Redshift to Redshift Spectrum

Redshift Spectrum gives us the ability to run SQL queries using the powerful Amazon Redshift query engine against data stored in Amazon S3, without needing to load the data. With Redshift Spectrum, we store data where we want, at the cost that we want. We have the data available for analytics when our users need it with the performance they expect.

Seamless scalability, high performance, and unlimited concurrency
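To make two ideas from this post a little more concrete (querying data that stays in Amazon S3, and choosing a partition strategy for Parquet files), here is a minimal Python sketch that only builds SQL strings. It is an illustration under stated assumptions, not how NUVIAD set things up: the schema name `spectrum`, the table `ad_events`, the columns, the `dt` partition key, and the bucket `example-bucket` are all hypothetical. It also assumes an external schema has already been created in Redshift, and that the names passed in are trusted constants rather than user input.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Return a Hive-style S3 prefix for one day, e.g. .../dt=2023-09-16/."""
    return f"{base.rstrip('/')}/dt={day.isoformat()}/"


def create_external_table_sql(schema: str, table: str, columns: dict, base: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement over Parquet files in S3.

    Assumes one partition column named dt (hypothetical choice).
    Identifiers are interpolated as-is, so pass trusted constants only.
    """
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        "PARTITIONED BY (dt date)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{base.rstrip('/')}/';"
    )


def add_partition_sql(schema: str, table: str, base: str, day: date) -> str:
    """Build the statement that registers one day's prefix as a partition."""
    return (
        f"ALTER TABLE {schema}.{table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') "
        f"LOCATION '{partition_prefix(base, day)}';"
    )


if __name__ == "__main__":
    base = "s3://example-bucket/ad_events"  # hypothetical bucket and prefix
    columns = {"campaign_id": "bigint", "event_type": "varchar(32)", "price": "double precision"}
    print(create_external_table_sql("spectrum", "ad_events", columns, base))
    print(add_partition_sql("spectrum", "ad_events", base, date(2023, 9, 16)))
```

Partitioning by day is only one option. If most queries filter on something else, such as a campaign or a region, that column may be the better partition key, which is the kind of review of the partition strategy suggested above.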
