Monitoring Spark Jobs

Adding S3 and CloudWatch Logs to a Spark Job

Submit a new job from the EMR console. Click "Clone job":

Update the result output path to "«S3_Bucket»/wordcount_output_console_logs/".

In the Application logs section, enable both S3 logging and CloudWatch logging:

Finally, submit the job.

If you submit the job with the CLI instead, set the S3 logging parameters as follows:

aws emr-serverless start-job-run \
    --application-id ${APPLICATION_ID} \
    --execution-role-arn ${JOB_ROLE_ARN} \
    --name "Spark-WordCount-CLI" \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": [
                "'"$S3_BUCKET"'/wordcount_output/"
            ],
            "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1 --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {
                "logUri": "'"$S3_BUCKET"'/logs/"
            }
        }
    }'
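
The console run above also enabled CloudWatch logging. To do the same from the CLI, monitoringConfiguration additionally accepts a cloudWatchLoggingConfiguration block. A minimal sketch, assuming the same variables as above; note the execution role must be allowed to write to CloudWatch Logs, and logGroupName is optional (EMR Serverless defaults to /aws/emr-serverless when it is omitted):

# Sketch: enable S3 and CloudWatch logs in one submission.
# logGroupName here is illustrative; omit it to use the default group.
aws emr-serverless start-job-run \
    --application-id ${APPLICATION_ID} \
    --execution-role-arn ${JOB_ROLE_ARN} \
    --name "Spark-WordCount-CLI-CW" \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": ["'"$S3_BUCKET"'/wordcount_output/"]
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {
                "logUri": "'"$S3_BUCKET"'/logs/"
            },
            "cloudWatchLoggingConfiguration": {
                "enabled": true,
                "logGroupName": "/aws/emr-serverless"
            }
        }
    }'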

Checking Job Logs - S3

Go to the specified S3 bucket to check the logs. Log data is delivered to the following S3 locations:

Driver Logs: /logs/applications/«application-id»/jobs/«job-id»/SPARK_DRIVER/(stderr.gz/stdout.gz)
Executor Logs: /logs/applications/«application-id»/jobs/«job-id»/SPARK_EXECUTOR/(stderr.gz/stdout.gz)
Spark Event Logs: /logs/applications/«application-id»/jobs/«job-id»/sparklogs/
Job Metadata: /logs/applications/«application-id»/jobs/«job-id»/job-metadata.log
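
These objects can also be inspected directly with the AWS CLI. A small sketch, assuming $S3_BUCKET includes the s3:// prefix and $JOB_RUN_ID holds the job run ID (both variable names are illustrative):

# List the driver log objects for a run
aws s3 ls "$S3_BUCKET/logs/applications/$APPLICATION_ID/jobs/$JOB_RUN_ID/SPARK_DRIVER/"

# The files are gzip-compressed; stream stdout.gz to the terminal
aws s3 cp "$S3_BUCKET/logs/applications/$APPLICATION_ID/jobs/$JOB_RUN_ID/SPARK_DRIVER/stdout.gz" - | gunzip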

Open the SPARK_DRIVER directory, where you will find stdout.gz. Run an S3 Select query against it:

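The equivalent S3 Select query can be issued from the CLI with s3api select-object-content. A sketch, assuming $BUCKET_NAME is the bare bucket name (no s3:// prefix) and $KEY is the key of the stdout.gz object:

# Query the gzip-compressed driver stdout with S3 Select
aws s3api select-object-content \
    --bucket "$BUCKET_NAME" \
    --key "$KEY" \
    --expression "SELECT * FROM s3object s" \
    --expression-type SQL \
    --input-serialization '{"CSV": {}, "CompressionType": "GZIP"}' \
    --output-serialization '{"CSV": {}}' \
    /dev/stdout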

Checking Job Logs - CloudWatch Logs

Go to the /aws/emr-serverless log group (the EMR Serverless default) in CloudWatch Logs:

The Spark driver's output is visible here as well:

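The same streams can be followed from the CLI with aws logs tail (AWS CLI v2). A sketch, assuming the default log group; the stream-name prefix below is an assumption, so verify it against an actual stream name in the console first:

# Follow every stream in the default EMR Serverless log group
aws logs tail /aws/emr-serverless --follow

# Narrow to a single run's streams (prefix format is an assumption)
aws logs tail /aws/emr-serverless --follow \
    --log-stream-name-prefix "applications/$APPLICATION_ID/jobs/$JOB_RUN_ID"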

Spark UI

On the EMR Serverless console, click one of the jobs, then click "View application UIs":

The Spark UI appears:

From the Spark UI you can navigate through and inspect the job's various metrics and details.

Once the job completes or fails, the Spark UI (Running jobs) is disabled and the Spark History Server (Completed jobs) is enabled instead.
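
These UIs can also be reached without the console: get-dashboard-for-job-run returns a short-lived pre-signed URL that points at the live Spark UI while the job is running and at the Spark History Server after it finishes. A minimal sketch:

# Request a pre-signed URL for the job run's application UI
aws emr-serverless get-dashboard-for-job-run \
    --application-id ${APPLICATION_ID} \
    --job-run-id ${JOB_RUN_ID}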