Submit a new job from the EMR console by clicking "Clone job":
Update the results output path to «S3_Bucket»/wordcount_output_console_logs/
In the Application logs section, enable both S3 logs and CloudWatch logs:
Finally, click to submit the job.
If you submit the job with the CLI instead, set the S3 log parameters as follows:
aws emr-serverless start-job-run \
    --application-id ${APPLICATION_ID} \
    --execution-role-arn ${JOB_ROLE_ARN} \
    --name "Spark-WordCount-CLI" \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": [
                "'"$S3_BUCKET"'/wordcount_output/"
            ],
            "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1 --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {
                "logUri": "'"$S3_BUCKET"'/logs/"
            }
        }
    }'
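The console's CloudWatch Logs option has a CLI counterpart as well. Below is a minimal sketch that assumes the cloudWatchLoggingConfiguration block of monitoringConfiguration; the log group name and stream prefix shown are illustrative, not values from this walkthrough:

# Sketch: the same WordCount run with CloudWatch logging enabled.
# The cloudWatchLoggingConfiguration fields here are assumptions based on the
# EMR Serverless monitoringConfiguration schema; adjust names to your setup.
aws emr-serverless start-job-run \
    --application-id ${APPLICATION_ID} \
    --execution-role-arn ${JOB_ROLE_ARN} \
    --name "Spark-WordCount-CLI-CW" \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": ["'"$S3_BUCKET"'/wordcount_output/"]
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "cloudWatchLoggingConfiguration": {
                "enabled": true,
                "logGroupName": "/aws/emr-serverless",
                "logStreamNamePrefix": "wordcount"
            }
        }
    }'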
Go to the specified S3 bucket to check the logs. The log data is delivered to the following S3 locations:
Driver Logs      | /logs/applications/«application-id»/jobs/«job-id»/SPARK_DRIVER/(stderr.gz/stdout.gz)
Executor Logs    | /logs/applications/«application-id»/jobs/«job-id»/SPARK_EXECUTOR/(stderr.gz/stdout.gz)
Spark Event Logs | /logs/applications/«application-id»/jobs/«job-id»/sparklogs/
Job Metadata     | /logs/applications/«application-id»/jobs/«job-id»/job-metadata.log
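To confirm the files are there, you can list them from the CLI; JOB_RUN_ID below is an illustrative variable holding the ID returned by start-job-run:

# Sketch: list the Spark driver logs for one job run.
# Assumes S3_BUCKET includes the s3:// prefix, as in the commands above.
aws s3 ls ${S3_BUCKET}/logs/applications/${APPLICATION_ID}/jobs/${JOB_RUN_ID}/SPARK_DRIVER/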
Open the SPARK_DRIVER directory; you will see stdout.gz there. Run an S3 Select query against it:
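The same query can be issued from the CLI with select-object-content; in this sketch BUCKET_NAME is the bucket name without the s3:// prefix, and the key mirrors the driver-log path from the table above:

# Sketch: run S3 Select against the gzipped driver stdout from the CLI.
# BUCKET_NAME, APPLICATION_ID, and JOB_RUN_ID are placeholders to substitute.
aws s3api select-object-content \
    --bucket ${BUCKET_NAME} \
    --key "logs/applications/${APPLICATION_ID}/jobs/${JOB_RUN_ID}/SPARK_DRIVER/stdout.gz" \
    --expression "SELECT * FROM s3object s" \
    --expression-type SQL \
    --input-serialization '{"CSV": {}, "CompressionType": "GZIP"}' \
    --output-serialization '{"CSV": {}}' \
    /dev/stdout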
In CloudWatch Logs, open the /aws/emr-serverless log group:
Here you can also see the Spark driver output:
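The same stream can be followed from the CLI; this sketch assumes the default log group name, so substitute any custom logGroupName you configured:

# Sketch: tail recent EMR Serverless log events from CloudWatch Logs.
aws logs tail /aws/emr-serverless --since 1h --follow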
On the EMR Serverless console, click one of the jobs, then click View application UIs:
You can see the Spark UI:
You can navigate the Spark UI to view the job's various metrics and details.
Once a job completes or fails, the Spark UI (Running jobs) is disabled, and the Spark History Server (Completed jobs) is enabled instead.
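The same UI links can also be fetched without the console via get-dashboard-for-job-run, which returns a pre-signed URL pointing at the live Spark UI while the job runs and at the Spark History Server afterwards; JOB_RUN_ID is an illustrative variable:

# Sketch: retrieve a pre-signed dashboard URL for a job run.
aws emr-serverless get-dashboard-for-job-run \
    --application-id ${APPLICATION_ID} \
    --job-run-id ${JOB_RUN_ID}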