
Cloudera Data Warehouse Public Cloud
AWS environments
Date published: 2020-02-20
Date modified: 2021-04-27
https://docs.cloudera.com/

Legal Notice

Cloudera Inc. 2022. All rights reserved.

The documentation is and contains Cloudera proprietary information protected by copyright and other intellectual property rights. No license under copyright or any other intellectual property right is granted herein.

Unless otherwise noted, scripts and sample code are licensed under the Apache License, Version 2.0.

Copyright information for Cloudera software may be found within the documentation accompanying each component in a particular release.

Cloudera software includes software from various open source or other third party projects, and may be released under the Apache Software License 2.0 (“ASLv2”), the Affero General Public License version 3 (AGPLv3), or other license terms. Other software included may be released under the terms of alternative open source licenses. Please review the license and notice files accompanying the software for additional licensing information.

Please visit the Cloudera software product page for more information on Cloudera software. For more information on Cloudera support services, please visit either the Support or Sales page. Feel free to contact us directly to discuss your specific needs.

Cloudera reserves the right to change any products at any time, and without notice. Cloudera assumes no responsibility nor liability arising from the use of products, except as expressly agreed to in writing by Cloudera.

Cloudera, Cloudera Altus, HUE, Impala, Cloudera Impala, and other Cloudera marks are registered or unregistered trademarks in the United States and other countries. All other trademarks are the property of their respective owners.

Disclaimer: EXCEPT AS EXPRESSLY PROVIDED IN A WRITTEN AGREEMENT WITH CLOUDERA, CLOUDERA DOES NOT MAKE NOR GIVE ANY REPRESENTATION, WARRANTY, NOR COVENANT OF ANY KIND, WHETHER EXPRESS OR IMPLIED, IN CONNECTION WITH CLOUDERA TECHNOLOGY OR RELATED SUPPORT PROVIDED IN CONNECTION THEREWITH. CLOUDERA DOES NOT WARRANT THAT CLOUDERA PRODUCTS NOR SOFTWARE WILL OPERATE UNINTERRUPTED NOR THAT IT WILL BE FREE FROM DEFECTS NOR ERRORS, THAT IT WILL PROTECT YOUR DATA FROM LOSS, CORRUPTION NOR UNAVAILABILITY, NOR THAT IT WILL MEET ALL OF CUSTOMER’S BUSINESS REQUIREMENTS. WITHOUT LIMITING THE FOREGOING, AND TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, CLOUDERA EXPRESSLY DISCLAIMS ANY AND ALL IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, QUALITY, NON-INFRINGEMENT, TITLE, AND FITNESS FOR A PARTICULAR PURPOSE AND ANY REPRESENTATION, WARRANTY, OR COVENANT BASED ON COURSE OF DEALING OR USAGE IN TRADE.

Contents

AWS environments overview
Activating an AWS environment
AWS environment requirements checklist
Standard required IAM permissions for activating AWS environments
Standard JSON IAM permissions policy template
Reduced permissions mode for AWS environments
Minimum set of IAM permissions required for reduced permissions mode
Reduced permissions mode JSON IAM permissions policy template
Required tags for CloudFormation stacks created with reduced permissions mode
Activating AWS environments in reduced permissions mode
Deactivating AWS environments created with reduced permissions mode
Retaining PostgreSQL backups in AWS environments
Viewing and editing AWS environment details
Deactivating AWS environments
Adding access to external S3 buckets for Cloudera Data Warehouse clusters on AWS
Adding Cloudera Data Warehouse cluster access to external S3 buckets in the same AWS account
Adding Cloudera Data Warehouse cluster access to external S3 buckets in a different AWS account
Remote access
Granting remote access to Kubernetes clusters on Amazon EKS
Revoking remote access to Kubernetes clusters on Amazon EKS
Restricting access to endpoints in AWS environments
Editing the IP CIDRs in the trusted list for endpoints in AWS environments
Networking
Overlay networks for AWS environments in Cloudera Data Warehouse service
Enabling overlay networks in AWS environments
Use a non-transparent proxy with Cloudera Data Warehouse on AWS environments
Configure non-transparent proxies for Cloudera Data Warehouse on AWS environments
Setting up private networking in AWS environments
Supported deployment modes for private networking in AWS
Prerequisites for private networking in AWS environments
Activating an AWS environment with private subnet support
Architecture for Private Load Balancer, Private Worker Nodes deployment on AWS
Custom tags in AWS environments
Upgrades and Helm migration
Upgrading PostgreSQL 9.6 before EOL
Validate the upgrade to PostgreSQL 10.16
Upgrade to PostgreSQL 11.12
Helm 2 to Helm 3 migration on AWS environments for Cloudera Data Warehouse
Migrate AWS environments from Helm 2 to Helm 3
Upgrading Cloudera Data Warehouse on AWS environments to Amazon EKS Kubernetes cluster updates
Upgrading using your own AMI or reduced permissions
Setting the scratch space limit for spilling Impala queries in AWS environments
Configure Impala Virtual Warehouses on AWS environments to spill to S3

AWS environments overview

Learn about environments on AWS in CDP Public Cloud, which you can use for Database Catalogs and Virtual Warehouses in Cloudera Data Warehouse (CDW).

The "environment" concept of CDP is closely related to the virtual private network in your cloud provider account. Registering an environment with Management Console provides CDP with access to your cloud provider account and identifies resources in your account that CDP services can access, including Cloudera Data Warehouse. A single environment is contained within a single cloud provider region, so all resources deployed by CDP are deployed within that region within one specific virtual network. After you have registered an environment with Management Console, you can activate the environment in CDW so you can use it to create Database Catalogs, which enables CDW to access the associated Data Lake. Then you can create Virtual Warehouses in CDW that use the Database Catalog and its underlying environment.

The following diagram shows the components of an AWS environment:

For more information about AWS environments in CDP, see the links in the "Related information" section at the bottom of this page.

Activating an AWS environment

You need to know how to start the Cloudera Data Warehouse (CDW) service to use the Database Catalog in the CDW Public Cloud environment.

About this task

Before you can create a Database Catalog to use in a Virtual Warehouse, you must activate an environment that has been registered in Management Console. How you activate the environment determines key capabilities of CDW, such as what data you can access.

A Database Catalog can use different Data Lake types, including the following ones:
• Shared Data Experience (SDX) Data Lake types
• Cloudera Data Warehouse (CDW) Data Lake types

The type of Data Lake used by the Database Catalog for your Virtual Warehouse determines whether or not you can access data in Data Hubs, and other clusters, from CDW.

If you start (activate) the environment from Environments, which you might be inclined to do immediately after registering the environment, the Database Catalog gives you access from CDW to an SDX Data Lake. If you navigate to Cloudera Data Warehouse and activate an environment from the CDW service, the Database Catalog gives you access from CDW to a CDW Data Lake.

Required role: DWAdmin

Before you begin
• Determine which environment (and related data lake) you want to activate for use with a Database Catalog and Virtual Warehouse.
• Review the requirements checklist for your environment.

Procedure
1. In the Data Warehouse service, expand the Environments column by clicking More.
2. In Environments, click the search icon and locate the environment that you want to activate.
3. Click the start icon to activate the environment.
4. In Activation Settings, enable environment features:
   • Specify the Deployment Mode:
     Important: Cloudera recommends that you use the Private Load Balancer, Private Worker Nodes deployment mode if possible for security.
     To view the public and private subnets that have been specified for your CDP environment, click Advanced Settings.
   • Whitelist IP CIDR(s): Add a comma-separated list of IP CIDRs on your network that need access to Kubernetes endpoints and services endpoints of the Kubernetes cluster. Kubernetes endpoints are used to control the deployment and maintenance of workload components, such as Virtual Warehouses and Database Catalogs. Service endpoints are endpoints of services like Hive, Impala, Data Analytics Studio (DAS), or Hue.
   • Use Overlay Network: Select this option if IP address exhaustion is a concern for your deployment.
5. Click Activate.

Related Information
Supported deployment modes
Setting up private networking
Restricting access to endpoints
Overlay networks for AWS environments

AWS environment requirements checklist

To successfully activate environments that have been registered with CDP on AWS VPCs with Cloudera Data Warehouse service, your AWS VPC must meet these requirements.

1. VPC has DNS resolution and DNS hostnames enabled

Ensure that your AWS VPC has DNS Resolution and DNS Hostnames enabled. For example, in the VPC Dashboard, click Your VPCs in the left navigation menu, and select the VPC you want to use for your Data Warehouse service environment on CDP. View configuration details to make sure DNS resolution and DNS hostnames are Enabled. The AWS screen looks something like this:

2. DHCP option set uses default domain name with one domain

When you create your VPC to use for the Data Warehouse service, ensure that the DHCP option set attached to the VPC uses only one domain and uses the default domain name:

domain-name region.compute.internal;

Important: If your VPC is in the us-east-1 region [U.S. East (N. Virginia)], the default domain name is ec2.internal.

You can verify the setting in the VPC Dashboard of the AWS Console. Click the DHCP options set ID of the default DHCP options set (always named "-" by AWS) to view details, including the associated domain:

A details page appears:
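
The default domain name rule from requirement 2 can be sketched as a small check. This is a minimal illustration, not part of any Cloudera or AWS tooling: the function names `default_dhcp_domain_name` and `uses_single_default_domain` are hypothetical, and the input is assumed to be the raw value of the DHCP option set's domain name as copied from the VPC Dashboard.

```python
def default_dhcp_domain_name(region: str) -> str:
    """Return the default DHCP domain name for an AWS region.

    us-east-1 uses ec2.internal; every other region uses
    <region>.compute.internal, as described in requirement 2.
    """
    region = region.strip().lower()
    if not region:
        raise ValueError("region must not be empty")
    if region == "us-east-1":
        return "ec2.internal"
    return f"{region}.compute.internal"


def uses_single_default_domain(domain_name_option: str, region: str) -> bool:
    """Check that the option value holds exactly one domain and that it is the default.

    Hypothetical helper: domain_name_option is the text of the domain name
    field; multiple domains are assumed to be separated by whitespace.
    """
    domains = domain_name_option.split()
    return len(domains) == 1 and domains[0].lower() == default_dhcp_domain_name(region)
```

For example, `uses_single_default_domain("us-west-2.compute.internal", "us-west-2")` passes, while a value that lists a second domain after the default does not.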

3. DHCP option set uses AmazonProvidedDNS

When you create the VPC for the Data Warehouse service, AWS automatically creates a set of DHCP options and associates them with the VPC. This set of options specifies the Amazon DNS Server as the default domain name server:

domain-name-server AmazonProvidedDNS;

Use this setting for VPCs for the Data Warehouse service shown in the AWS Console VPC Dashboard above.

4. Ensure the correct subnets in VPC are specified

When you activate an environment for the Data Warehouse service, ensure that the subnets are correct. If there are more than three private subnets in the VPC, only the top three are selected. However, they may not be the subnets you intend to use for the Data Warehouse service.

5. Ensure private subnets have outbound internet connectivity

Your private subnets must have outbound internet connectivity. Check the route tables of private subnets to verify the internet routing. Worker nodes must be able to download Docker images for Kubernetes, billing and metering information, and to perform API server registration.

6. Ensure the Amazon Security Token Service (STS) is activated

To successfully activate an environment in the Data Warehouse service, you must ensure the Amazon STS is activated in your AWS VPC:
1. In the AWS Management Console home page, select IAM under Security, Identity, & Compliance.
2. In the Identity and Access Management (IAM) dashboard, select Account settings in the left navigation menu.
3. On the Account settings page, scroll down to the section for Security Token Service (STS).
4. In the Endpoints section, locate the region in which your environment is located and make sure that the STS service is activated.

Related Information
DHCP Options Sets in the Amazon documentation
Managing AWS STS in an AWS Region in the Amazon documentation
Activating environments

Standard required IAM permissions for activating AWS environments

Review the list of IAM permissions required for activating Cloudera Data Warehouse (CDW) environments where CDW automatically creates and tags all of the resources in your AWS account for you.

The following permissions are required in your IAM policy for standard deployments of CDW where all AWS cloud resources are automatically created for you in your AWS account:

Table 1: Standard IAM policy permissions required for environment activation in CDW

AWS service | "Allow" actions
Certificate Manager … | …
…Formation (cloudformation) | …, CreateStack

…tch (logs) | PutRetentionPolicy
DynamoDB (dynamodb) | DeleteTable
EC2 | …GroupIngress, RunInstances
EC2 Auto Scaling | …lingGroups, DeleteAutoScalingGroup
EFS | …tems, DescribeMountTargets
EKS | …ig, UpdateClusterVersion
IAM | …ceProfile, SimulatePrincipalPolicy
KMS | …cribeKey

RDS | …DBInstances, DescribeDBSubnetGroups
S3 | …ryptionConfiguration, PutObjectAcl, PutObject

* Needed only when you upgrade your CDW environment.

Related Concepts
Minimum set of IAM permissions required for reduced permissions mode
Required tags for CloudFormation stacks created with reduced permissions mode

Standard JSON IAM permissions policy template

To activate an AWS environment for Cloudera Data Warehouse (CDW) and have CDW automatically create all of the necessary cloud resources, you can use this sample JSON template when you register an environment in CDP.

The following template contains all of the IAM permissions needed to create a credential for registering an environment in CDP that you plan to use for CDW. You can use it to create your own IAM policy to upload to the AWS console.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      …
        "…ectAcl",
        "s3:PutObject"
      ],
      "Resource": "*"
    }
  ]
}

Reduced permissions mode for AWS environments

If you cannot provide the standard set of IAM permissions required by Cloudera Data Warehouse (CDW) for environment activation, you can use the reduced permissions mode.

About this task

You can activate an AWS environment for CDW with fewer than half of the standard required IAM permissions on your AWS cross-account IAM role. When activating an AWS environment in CDW, if the system detects that your account does not have the standard set of required IAM permissions (that is, it has a restricted policy), the following dialog appears:

If you select the option Check to activate environment with reduced permissions mode, the environment is activated in reduced permissions mode and you must create the cloud resources manually. If you leave the option unselected, CDW creates the resources automatically.

Procedure
1. In Environment Validations, check the option Check to activate environment with reduced permissions mode.
   If you do not want to activate the environment in reduced permissions mode, uncheck the option, and click Activate. Skip the rest of the steps in this procedure. CDW automatically creates the cloud resources in your AWS account for you.
2. Manually create the cloud resources in your AWS account and tag them appropriately, as described later.
   CDW pre-populates the required CloudFormation template for you within the AWS console, and you perform the manual steps to create the stack.
3. When you are finished using the stack, manually delete it in the AWS console.

Related Concepts
Minimum set of IAM permissions required for reduced permissions mode

Related Information
Standard required IAM permissions for activating AWS environments

Minimum set of IAM permissions required for reduced permissions mode

Review a list of the minimum IAM permissions required to activate AWS environments for Cloudera Data Warehouse (CDW) in reduced permissions mode.

The following is a list of the minimum permissions that are required for your IAM policy to activate environments for CDW in reduced permissions mode. In this mode you must manually create your CloudFormation stack from a template that CDW pre-populates in the AWS console for you. When you are finished using the stack, you must manually delete its resources in your AWS account.

Table 2: Minimum set of IAM policy permissions required for environment activation in CDW in reduced permissions mode

AWS service | "Allow" actions
Certificate Manager … | …
…tion | …pdateStack
CloudWatch (logs) | PutRetentionPolicy
EC2 | …hcpOptions, DescribeKeyPairs, DescribeRouteTables, …Vpcs
EC2 Auto Scaling | …sses
EKS | …eClusterConfig, UpdateClusterVersion
IAM | …licy
S3 | …cl, PutObject

Related Information
Standard required IAM permissions for activating AWS environments

Reduced permissions mode JSON IAM permissions policy template

To activate an AWS environment for Cloudera Data Warehouse (CDW) using reduced permissions mode, you can use this sample JSON template when you register an environment in CDP.

In this mode you must manually create your CloudFormation stack from a template that CDW pre-populates in the AWS console for you. When you are finished using the stack, you must manually delete its resources in your AWS account.

To use this JSON policy to create your cross-account IAM role for CDP, see the procedure "Create a cross-account IAM role" that is linked to at the bottom of this page. The following JSON policy can be used in Step 6 of that procedure:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "iam:SimulatePrincipalPolicy",
      "Resource": "arn:aws:iam::<aws account id>:role/*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        …
        "…c2:DescribeDhcpOptions",
        …
        "…y",
        "iam:PutRolePolicy"
      ],
      "Resource": "*"
    }
  ]
}

Related Information
Create a cross-account IAM role

Required tags for CloudFormation stacks created with reduced permissions mode

This is a list of tags you must manually apply to AWS CloudFormation stack resources when you use the reduced permissions mode to activate environments for Cloudera Data Warehouse (CDW).

Table 3: Required tags for CloudFormation stacks created with reduced permissions mode in CDW

Tag key | Tag value
clusterName | Name of the CDP environment registered with Management Console.
data-warehouse-env-owner | Email ID of the user who owns the stack and who has access to the CDP environment. This is the email address that is listed for Email on the Users page of the User Management module of Management Console.
stackName | Name of the CDW CloudFormation stack, in the format env-<environment-identifier>-dwx-stack. For example: env-6g8dsf-dwx-stack
clusterId | CDW environment ID that is displayed in the environment tile in the CDW UI. For example: env-hmrt2z
envId | CDP environment ID (CRN [Cloudera Resource Name]), in the format crn:cdp:environments:<region>:<account-ID>:environment:<identifier>. For example: …4-48f7-a243-97348939becd
actorCrn | CRN (Cloudera Resource Name) from the user's profile in the User Management module of CDP. For example: …15306ae143
accountId | Tenant ID. In the above CRN example, the tenant ID is the GUID listed immediately after the AWS region: 9d74eee4-1cad-45d7-b645-7ccf9edbb73d

Related Information
Standard required IAM permissions for activating AWS environments

Activating AWS environments in reduced permissions mode

Learn how to activate environments on AWS using the reduced permissions mode in Cloudera Data Warehouse (CDW). In this mode, you must manually create and delete the CloudFormation stack in the AWS Console.

About this task

Required role: EnvironmentAdmin or PowerUser

When you activate an AWS environment for CDW, if you do not have the standard required IAM permissions, the following message displays in the environment tile of the CDW UI, which provides a link to the AWS Console:

Click the link and perform the steps listed below to navigate to the AWS Console and create the CloudFormation stack.

Before you begin
• Because you need to use the AWS Console to manually create your CloudFormation stack for CDW environment activation, in another browser tab, log into your AWS account before you begin. Make sure that the IAM entity logged in has all the permissions listed in "Standard required IAM permissions," which is linked to at the bottom of this page.
• You must also have the AWS CLI and the kubectl CLI configured and available on your system to apply the kubeconfig that CDW provides in Step 10 below.

Important: Make sure you have the t

(CDW) service clusters running on AWS environments. When you create a Virtual Warehouse in the CDW service, a cluster is created in your AWS account. This cluster has two buckets. One bucket is used for managed data and the other is used for external data. The naming convention for these two S3 buckets