Monitoring and Logging

This section shows how to view application logs in CloudWatch Logs and S3, and how to inspect a job through the Spark History Server.

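Configuring CloudWatch Logs and S3 log paths

The role created in the first section does not have permission to create a log group, so create the log group manually first: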

image-20240620102632950

Name it /emr-containers/jobs:

image-20240620102715461
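
The same log group can also be created from the command line instead of the console. The following is a minimal sketch, assuming the AWS CLI is installed and the credentials in use are allowed to create log groups. The function name create_job_log_group is a hypothetical helper and not part of the workshop.

```shell
# Log group name used throughout this section.
LOG_GROUP_NAME="/emr-containers/jobs"

# Hypothetical helper: create the log group that the job run will write to.
# The call fails if a log group with this name already exists.
create_job_log_group() {
  aws logs create-log-group --log-group-name "${LOG_GROUP_NAME}"
}
```

Run create_job_log_group once before submitting the job.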

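When you run a job (start-job-run), you can specify the CloudWatch Logs and S3 log locations in the parameters. Replace S3_BUCKET in the command below with the name of your own bucket: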

aws emr-containers start-job-run \
  --virtual-cluster-id ${EMR_EKS_CLUSTER_ID} \
  --name spark-pi-logging \
  --execution-role-arn ${EMR_EKS_EXECUTION_ARN} \
  --release-label emr-6.2.0-latest \
  --job-driver '{
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://aws-data-analytics-workshops/emr-eks-workshop/scripts/pi.py",
      "sparkSubmitParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"
    }
  }' \
  --configuration-overrides '{
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.driver.memory": "2G"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "emr-eks-workshop"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://S3_BUCKET/logs/"
      }
    }
  }'
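
To avoid editing the JSON by hand, the monitoringConfiguration fragment can be generated with the bucket name filled in. This is only a sketch: monitoring_config is a hypothetical helper, and the log group name and stream prefix are copied from the command above.

```shell
# Hypothetical helper: print the monitoringConfiguration fragment
# with the S3 bucket name (first argument) substituted into logUri.
monitoring_config() {
  bucket="$1"
  cat <<EOF
{
  "cloudWatchMonitoringConfiguration": {
    "logGroupName": "/emr-containers/jobs",
    "logStreamNamePrefix": "emr-eks-workshop"
  },
  "s3MonitoringConfiguration": {
    "logUri": "s3://${bucket}/logs/"
  }
}
EOF
}
```

For example, `monitoring_config my-bucket` (my-bucket is a placeholder) prints a fragment that can be pasted in as the value of "monitoringConfiguration" in the --configuration-overrides argument.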

Viewing S3 logs

Go to the S3 bucket you specified to check the logs. Log data is sent to the following Amazon S3 locations:

  • Controller logs - /logUri/virtual-cluster-id/jobs/job-id/containers/pod-name/(stderr.gz/stdout.gz)

  • Driver logs - /logUri/virtual-cluster-id/jobs/job-id/containers/spark-application-id/spark-job-id-driver/(stderr.gz/stdout.gz)

  • Executor logs - /logUri/virtual-cluster-id/jobs/job-id/containers/spark-application-id/executor-pod-name/(stderr.gz/stdout.gz)
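
The path layout above can also be browsed from the command line. The sketch below assumes the AWS CLI and gunzip are available. The helper names job_log_prefix, list_job_logs and show_log are hypothetical, and the virtual cluster ID and job run ID must be supplied by you (the job run ID is the id returned by start-job-run).

```shell
# Hypothetical helper: build the S3 prefix that holds all container logs of one job run.
# Arguments: log URI (as passed in logUri), virtual cluster ID, job run ID.
job_log_prefix() {
  log_uri="${1%/}"   # drop a trailing slash so the path has no double slash
  printf '%s/%s/jobs/%s/containers/\n' "${log_uri}" "$2" "$3"
}

# Hypothetical helper: list every log object of one job run.
# Takes the same three arguments as job_log_prefix.
list_job_logs() {
  aws s3 ls --recursive "$(job_log_prefix "$1" "$2" "$3")"
}

# Hypothetical helper: print one gzip-compressed log object to the terminal.
# Argument: full S3 URI of a stdout.gz or stderr.gz object.
show_log() {
  aws s3 cp "$1" - | gunzip -c
}
```

A typical flow would be to call list_job_logs to find the driver's stdout.gz, then pass its full S3 URI to show_log.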

image-20240620095105909

Download the driver log. It contains the computed value of Pi:

image-20240620095211565

image-20240620095235715

Viewing CloudWatch logs

In the StartJobRun request, logGroupName is the name of the CloudWatch log group and logStreamNamePrefix is the prefix of the CloudWatch log stream names. You can view and search these logs in the AWS Management Console.

  • Controller logs - logGroup/logStreamPrefix/virtual-cluster-id/jobs/job-id/containers/pod-name/(stderr/stdout)

  • Driver logs - logGroup/logStreamPrefix/virtual-cluster-id/jobs/job-id/containers/spark-application-id/spark-job-id-driver/(stderr/stdout)

  • Executor logs - logGroup/logStreamPrefix/virtual-cluster-id/jobs/job-id/containers/spark-application-id/executor-pod-name/(stderr/stdout)
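
The stream naming scheme above can be used to find a job's log streams without the console. This is a sketch under the assumption that the AWS CLI is installed. The helpers job_stream_prefix and list_job_streams are hypothetical names, and the virtual cluster ID and job run ID are values you supply.

```shell
# Hypothetical helper: build the log stream name prefix for one job run.
# Arguments: log stream prefix (logStreamNamePrefix), virtual cluster ID, job run ID.
job_stream_prefix() {
  printf '%s/%s/jobs/%s/containers/\n' "$1" "$2" "$3"
}

# Hypothetical helper: list the names of a job run's log streams.
# Arguments: log group name, then the same three arguments as job_stream_prefix.
list_job_streams() {
  group="$1"
  shift
  aws logs describe-log-streams \
    --log-group-name "${group}" \
    --log-stream-name-prefix "$(job_stream_prefix "$@")" \
    --query 'logStreams[].logStreamName' \
    --output text
}
```

With the values from this section, the call would look like `list_job_streams /emr-containers/jobs emr-eks-workshop <virtual-cluster-id> <job-id>`, where the last two arguments are placeholders.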

image-20240620104211256

The result looks like this:

image-20240620104303322

Viewing logs in the Spark History Server

Open the EMR on EKS console and go to the Spark UI from the jobs page:

image-20240620104537098

Click the app ID to open the application:

image-20240620104716279

In the Spark History Server, you can view the job's metrics and detailed information:

image-20240620104803795