Troubleshooting Notes: The DEM Host Page Fails to Load

卖女孩的小废柴 2023/06/25

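I. Background

While adding a cluster to DEM on site, we found that the DEM host page failed to load and stayed blank, so we started troubleshooting.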
image.png

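II. Problem Analysis Log

  1. At first I suspected that the page was failing to render after it had received the data, but refreshing several times did not help. Inspecting the requests in the browser developer tools (F12) showed that the request sent when the page is opened returned no data, which caused the failure.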

image.png

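  2. Tracing through the DEM source code (covered in section III and skipped here) showed that a slow SQL statement was responsible. I then captured the slow SQL on the DEM database with the query below, which lists the active sessions and their current SQL, and identified the statement that caused the problem.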

select datediff(ss,last_recv_time,sysdate) ss,dbms_lob.substr(sf_get_session_sql(sess_id)),sess_id,substr(clnt_ip,8,13) from v$sessions where state='ACTIVE' order by 1 desc;

image.png

  3. I then analyzed that SQL statement.

image.png
Working through the SQL step by step revealed that the DMA_MAINFRAME_STAT table held more than 30 million rows.
image.png

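  4. The definition of DMA_MAINFRAME_STAT shows that it stores host monitoring information. The table has been written to ever since DEM was first started. This DEM instance had been running for three and a half years, and DEM has no scheduled cleanup for the table, so queries against it only get slower over time.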
CREATE TABLE "DEM"."DMA_MAINFRAME_STAT" (
  "MF_ID" VARCHAR(100),
  "TS" BIGINT,
  "CPU_USER_P" DECIMAL(22,2),
  "CPU_SYS_P" DECIMAL(22,2),
  "CPU_WAIT_P" DECIMAL(22,2),
  "CPU_USED_P" DECIMAL(22,2),
  "MEM_TOTAL" BIGINT,
  "MEM_USED" BIGINT,
  "SWAP_TOTAL" BIGINT,
  "SWAP_USED" BIGINT,
  "SWAP_PAGE_IN" BIGINT,
  "SWAP_PAGE_OUT" BIGINT,
  "DISK_DATA" TEXT,
  "NET_DATA" TEXT,
  CLUSTER PRIMARY KEY ("MF_ID", "TS")
);
COMMENT ON TABLE "DEM"."DMA_MAINFRAME_STAT" IS 'Host monitoring information';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."CPU_SYS_P" IS 'System process CPU usage, in %';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."CPU_USED_P" IS 'Total CPU usage, in %';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."CPU_USER_P" IS 'User process CPU usage, in %';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."CPU_WAIT_P" IS 'CPU wait, in %';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."MEM_TOTAL" IS 'Total memory size, in bytes';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."MEM_USED" IS 'Used memory size, in bytes';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."MF_ID" IS 'Host ID';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."SWAP_PAGE_IN" IS 'Swap pages read';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."SWAP_PAGE_OUT" IS 'Swap pages written';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."SWAP_TOTAL" IS 'Total swap size, in bytes';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."SWAP_USED" IS 'Used swap size, in bytes';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."TS" IS 'Information collection time';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."DISK_DATA" IS 'Disk information';
COMMENT ON COLUMN "DEM"."DMA_MAINFRAME_STAT"."NET_DATA" IS 'Network information';
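  5. We then contacted the user and, filtering on the TS column of DMA_MAINFRAME_STAT, kept only the last six months of data and regathered statistics for the DEM schema. After that the page loaded normally again.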
SQL> delete from DMA_MAINFRAME_STAT where TS < 1600000000;
SQL> dbms_stats.GATHER_SCHEMA_stats ('DEM');
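
The cutoff literal in the DELETE statement above depends on when the cleanup is run and on the unit stored in TS, which the table comment only describes as the collection time. As a minimal sketch, assuming TS holds a Unix epoch value, the cutoff for a six-month retention window could be computed as follows. The class and method names are hypothetical and not part of DEM, and whether TS is in seconds or milliseconds should be checked against real rows before running anything.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper, not part of DEM: computes a TS cutoff for a retention window.
public class RetentionCutoff {

    // Returns the instant `months` calendar months before `now`, evaluated in UTC.
    static Instant cutoff(Instant now, int months) {
        return now.atOffset(ZoneOffset.UTC).minusMonths(months).toInstant();
    }

    // Builds the DELETE statement text. The unit of TS (seconds or milliseconds)
    // is an assumption that the caller must verify against the actual data.
    static String deleteSql(Instant now, int months, boolean tsInMillis) {
        Instant c = cutoff(now, months);
        long value = tsInMillis ? c.toEpochMilli() : c.getEpochSecond();
        return "delete from DEM.DMA_MAINFRAME_STAT where TS < " + value;
    }

    public static void main(String[] args) {
        // Print the statement for a six-month window ending now, with TS assumed to be in seconds.
        System.out.println(deleteSql(Instant.now(), 6, false));
    }
}
```

On a table with tens of millions of rows, it may be safer to delete in several smaller TS ranges rather than in one statement, so that each transaction stays small.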

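III. DEM Call Flow Analysis

Reading the DEM source shows that client-server requests are implemented with Google's GWT RPC. A brief introduction to GWT RPC follows.
GWT RPC is the mechanism GWT provides for remote procedure calls (RPC) between client and server. With GWT RPC, developers define a service as an ordinary Java interface and transfer serialized Java objects between client and server.
GWT RPC provides transparent data transfer, type-safe parameter validation, and an asynchronous callback mechanism. Client code only needs to create an asynchronous service proxy object to call server-side methods, without dealing with the underlying network protocol or data transfer details, which makes client development simpler and more efficient.
The GWT RPC workflow is roughly as follows:

  1. The client creates an asynchronous service proxy object with GWT.create(); the proxy implements the asynchronous counterpart of the service interface.
  2. The client calls the server-side method through the proxy, passing the required arguments as method parameters.
  3. GWT RPC packages the method call and its arguments into an HTTP POST request and sends it to the server.
  4. The server parses the method name and arguments from the request, invokes the corresponding method on the service implementation class, and returns the result to the client.
  5. The client receives the returned data in the asynchronous callback method and processes it.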
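
The calling pattern in these steps can be sketched in plain Java. This is only an illustration: `AsyncCallback` below is a local stand-in that mirrors the two methods of GWT's `com.google.gwt.user.client.rpc.AsyncCallback`, `HostService` and `HostServiceAsync` are hypothetical names, and the hand-written proxy replaces the one that `GWT.create()` would generate, so no HTTP request is actually sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GwtRpcSketch {

    // Local stand-in for GWT's AsyncCallback (same two methods).
    interface AsyncCallback<T> {
        void onSuccess(T result);
        void onFailure(Throwable caught);
    }

    // Synchronous service interface, implemented on the server side.
    interface HostService {
        List<String> getHosts(String keyword);
    }

    // Asynchronous counterpart used by the client: same arguments plus a callback, returning void.
    interface HostServiceAsync {
        void getHosts(String keyword, AsyncCallback<List<String>> callback);
    }

    // Hypothetical server-side implementation with hard-coded sample data.
    static class HostServiceImpl implements HostService {
        public List<String> getHosts(String keyword) {
            List<String> all = Arrays.asList("192.168.1.10", "192.168.1.11", "10.0.0.5");
            List<String> matched = new ArrayList<>();
            for (String host : all) {
                if (host.contains(keyword)) {
                    matched.add(host);
                }
            }
            return matched;
        }
    }

    // Hand-written proxy standing in for the generated one. A real proxy would
    // serialize the call into an HTTP POST instead of calling the server object directly.
    static HostServiceAsync createProxy(HostService server) {
        return (keyword, callback) -> {
            try {
                callback.onSuccess(server.getHosts(keyword));
            } catch (RuntimeException e) {
                callback.onFailure(e);
            }
        };
    }

    // Runs one call through the proxy and returns what the callback received.
    static List<String> demo(String keyword) {
        List<String> received = new ArrayList<>();
        createProxy(new HostServiceImpl()).getHosts(keyword, new AsyncCallback<List<String>>() {
            public void onSuccess(List<String> result) {
                received.addAll(result);
            }

            public void onFailure(Throwable caught) {
                System.err.println("call failed: " + caught);
            }
        });
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo("192.168"));
    }
}
```

The DEM code in the following subsections has the same shape: `mfService.getMainframes(...)` on the client takes an `AsyncCallback<List<Mainframe>>` as its last argument, while `IMainframeService.getMainframes(...)` on the server side returns `List<Mainframe>` directly.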

Clicking the host page

Clicking the host page runs the reloadData method of the MainframesPanel class, which issues the query through MainframesPanel.this.mfService.getMainframes().

package com.dameng.dem.client.panel.dma;

public class MainframesPanel extends BaseDashboardPanel {

    private void createToolStrip(Layout layout) {
        ToolStrip toolStrip = new ToolStrip();
        toolStrip.setWidth100();
        toolStrip.setHeight(30);
        layout.addMember((Canvas)toolStrip);
        this.refreshButton = new CommonButton(1);
        // The title string literal was lost in the source text.
        this.refreshButton.setTitle("");
        this.refreshButton.setIcon(Images.REFRESH);
        this.refreshButton.addClickHandler(new ClickHandler() {
            public void onClick(ClickEvent event) {
                MainframesPanel.this.refresh();
            }
        });
    }

    // Decompiled excerpt. In the decompiler output this method belongs to an
    // anonymous inner class of MainframesPanel, which is why the panel itself is
    // referenced as MainframesPanel.this. The decompiler's
    // "MainframesPanel.null.access$0(...)" accessors are written as MainframesPanel.this here.
    protected void reloadData(long pageSize, long curPageNumber) {
        String searchKey = CStringUtil.trimToEmpty(MainframesPanel.this.ipTextItem.getValueAsString());
        List<MainframeFilter> filters = new ArrayList<>();
        // Filter type 1: keyword search entered in the IP text box.
        if (CStringUtil.isNotEmpty(searchKey)) {
            MainframeFilter filter = new MainframeFilter();
            filter.setFilterType(1);
            filter.setFilterValue(searchKey);
            filters.add(filter);
        }
        // Filter type 6: carries the ID of the current user.
        MainframeFilter subscriptionFilter = new MainframeFilter();
        subscriptionFilter.setFilterType(6);
        subscriptionFilter.setFilterValue(String.valueOf(DEM.currentUser.getId()));
        filters.add(subscriptionFilter);
        // GWT RPC call: the result arrives asynchronously in the callback.
        MainframesPanel.this.mfService.getMainframes(pageSize, curPageNumber, filters, Mainframe.PROP_ALL.intValue(), new AsyncCallback<List<Mainframe>>() {
            public void onSuccess(List<Mainframe> result) {
                MainframesPanel.this.refreshSimpleTime(result);
                // Called on the enclosing anonymous instance that defines reloadData.
                afterReloadData(result.toArray());
                MainframesPanel.this.optMenu.hideContextMenu();
                MainframesPanel.this.refreshSummary(result);
                MainframesPanel.this.refreshAlerts(MainframesPanel.this.alertLayout, (List)result);
            }

            public void onFailure(Throwable caught) {
                // The message string literal was lost in the source text.
                SimpleMessageDialog.openError("", caught);
            }
        });
    }
}

web.xml configuration

Because the request layer is built on Google's GWT RPC, the service classes are registered in web.xml in advance. There we can see that the URL is routed to com.dameng.dem.server.service.dma.MainframeService.
image.png

The IMainframeService interface

The request finally arrives at the **getMainframes()** method.
image.png

package com.dameng.dem.client.service.dma;

@RemoteServiceRelativePath("dma_mainframe")
public interface IMainframeService extends IRemoteService {
    @ServiceAnnotation(module = DemModule.RESOURCE, operation = ServiceAnnotation.Operation.GET, session = ServiceAnnotation.Session.DEM_SESSION)
    List<Mainframe> getMainframes(long paramLong1, long paramLong2, List<MainframeFilter> paramList, int paramInt);
}

Definition of the getMainframes method

The SQL assembly logic of the getMainframes method is shown below.
image.png

package com.dameng.dem.server.dao.dma.impl;

// Excerpt: the enclosing DAO class declaration is omitted in the source.
public List<Mainframe> getMainframes(long pageSize, long pageNum, List<MainframeFilter> filters, final int fillProps) {
    MiscUtil.execute(() -> call("call dem.clear_dem_plan_cache();"));

    // Maps each result row to a Mainframe. Columns are read in the same order
    // in which they are appended to the SELECT list further down.
    ResultSetProcessor<List<Mainframe>> rsProcessor = new ResultSetProcessor<List<Mainframe>>() {
        public List<Mainframe> process(ResultSet rs) throws SQLException {
            List<Mainframe> list = new ArrayList<>();
            int index = 0;
            while (rs.next()) {
                index = 0;
                Mainframe mf = new Mainframe();
                mf.setId(rs.getString(++index));
                mf.setName(rs.getString(++index));
                mf.setHasName(rs.getBoolean(++index));
                mf.setGmtCreate(rs.getString(++index));
                mf.setGmtModify(rs.getString(++index));
                MainframeInfo mfInfo = new MainframeInfo();
                mfInfo.setMfId(mf.getId());
                mfInfo.setOuterIp(rs.getString(++index));
                mfInfo.setInnerIp(rs.getString(++index));
                mfInfo.setIpList(JsonUtil.arrayFromJson(rs.getString(++index), MainframeIp.class));
                mfInfo.setNetCfg(rs.getBoolean(++index));
                mfInfo.setAgentVersion(rs.getString(++index));
                mfInfo.setAgentServicePort(rs.getInt(++index));
                mfInfo.setAgentHome(rs.getString(++index));
                mfInfo.setHostName(rs.getString(++index));
                mfInfo.setOsName(rs.getString(++index));
                mfInfo.setOsVersion(rs.getString(++index));
                mfInfo.setOsVendor(rs.getString(++index));
                mfInfo.setOsArch(rs.getString(++index));
                mfInfo.setOsDataModel(rs.getString(++index));
                mfInfo.setMemSize(rs.getLong(++index));
                mfInfo.setCpuCount(rs.getInt(++index));
                mfInfo.setCpuDesc(rs.getString(++index));
                mfInfo.setTs(rs.getLong(++index));
                mf.setMfInfo(mfInfo);
                if ((fillProps & Mainframe.PROP_ALERT_INFO.intValue()) != 0) {
                    ResourceAlert alertInfo = new ResourceAlert(mf.getName());
                    alertInfo.fillAlertCounts(rs.getString(++index));
                    alertInfo.fillTopAlerts(rs.getString(++index));
                    mf.alertInfo = alertInfo;
                }
                // The host counts as alive if its last report is newer than mf_invalid_time seconds.
                // Both status string literals were lost in the source text.
                if (Math.abs(System.currentTimeMillis() - mfInfo.getTs()) < (((Integer)DEM.config.mf_invalid_time.value).intValue() * 1000)) {
                    mf.setAliveStatus("");
                } else {
                    mf.setAliveStatus("");
                }
                if ((fillProps & Mainframe.PROP_SUBSCRIPTION.intValue()) != 0)
                    mf.setHasSubscription((rs.getInt(++index) == 1));
                if ((fillProps & Mainframe.PROP_MFSTAT.intValue()) != 0) {
                    MainframeStat mfStat = new MainframeStat();
                    mfStat.setTs(rs.getLong(++index));
                    mfStat.setMfID(rs.getString(++index));
                    mfStat.setCpuUserPercent(rs.getDouble(++index));
                    mfStat.setCpuSysPercent(rs.getDouble(++index));
                    mfStat.setCpuWaitPercent(rs.getDouble(++index));
                    mfStat.setCpuUsedPercent(rs.getDouble(++index));
                    mfStat.setMemTotal(rs.getLong(++index));
                    mfStat.setMemUsed(rs.getLong(++index));
                    mfStat.setSwapTotal(rs.getLong(++index));
                    mfStat.setSwapUsed(rs.getLong(++index));
                    mfStat.setSwapPageIn(rs.getLong(++index));
                    mfStat.setSwapPageOut(rs.getLong(++index));
                    mfStat.setDiskData((DiskData)JsonUtil.fromJson(rs.getString(++index), DiskData.class));
                    mfStat.setNetData((NetData)JsonUtil.fromJson(rs.getString(++index), NetData.class));
                    mf.setMfStat(mfStat);
                }
                list.add(mf);
            }
            return list;
        }
    };

    // SELECT list: optional paging prefix followed by the base host columns.
    StringBuilder sql = new StringBuilder();
    if (pageSize > 0L && pageNum > 0L) {
        sql.append("select top ").append(pageSize * (pageNum - 1L)).append(", ").append(pageSize);
    } else {
        sql.append("select");
    }
    sql.append(" mf.id, mf.name, mf.has_name, mf.gmt_create, mf.gmt_modify,");
    sql.append(" mf.outer_ip,mf.inner_ip,mf.ip_list, mf.net_config_flag,mf.dmagent_version,mf.dmagent_service_port,mf.dmagent_home,mf.host_name,");
    sql.append(" mf.os_name, mf.os_version, mf.os_vendor, mf.os_arch, mf.os_data_model,");
    sql.append(" mf.mem_size,");
    sql.append(" mf.cpu_count, mf.cpu_desc,");
    sql.append(" mf.ts");
    if ((fillProps & Mainframe.PROP_ALERT_INFO.intValue()) != 0) {
        sql.append(String.format(
            ", (select listagg(level || '%s' || count(1), '%s') within group(order by level) from dem.dma_alert_his where res_id = mf.id and flag = 1 and valid = 1 group by level) alert_counts",
            new Object[] { "&!&", "&-&" }));
        sql.append(String.format(
            ", (select listagg(top_alerts, '%s') within group(order by top_alerts) from (select gmt_create || '%s' || level || '%s' || message top_alerts from dem.dma_alert_his where res_id = mf.id and flag = 1 and valid = 1 order by gmt_create desc limit %s)) top_alerts",
            new Object[] { "&-&", "&!&", "&!&", Integer.valueOf(3) }));
    }
    if ((fillProps & Mainframe.PROP_SUBSCRIPTION.intValue()) != 0)
        sql.append(",case when mf_subscription.resource_id is not null then 1 else 0 end");
    if ((fillProps & Mainframe.PROP_MFSTAT.intValue()) != 0) {
        sql.append(" ,mf_stat.ts,");
        sql.append(" mf_stat.mf_id,");
        sql.append(" mf_stat.cpu_user_p, mf_stat.cpu_sys_p, mf_stat.cpu_wait_p, mf_stat.cpu_used_p,");
        sql.append(" mf_stat.mem_total, mf_stat.mem_used,");
        sql.append(" mf_stat.swap_total, mf_stat.swap_used, mf_stat.swap_page_in, mf_stat.swap_page_out,");
        sql.append(" mf_stat.disk_data, mf_stat.net_data");
    }

    // FROM clause and optional joins.
    sql.append(" from dem.dma_valid_mf_view mf");
    if ((fillProps & Mainframe.PROP_SUBSCRIPTION.intValue()) != 0) {
        long userId = 0L;
        for (MainframeFilter filter : filters) {
            if (filter.getFilterType() == 6)
                userId = Long.valueOf(filter.getFilterValue()).longValue();
        }
        sql.append(" left join (select resource_id, user_id from dem.dma_subscription_resource where flag = 1 and resource_type = 2 and user_id = "
            + userId + ")mf_subscription on mf.id = mf_subscription.resource_id ");
    }
    if ((fillProps & Mainframe.PROP_MFSTAT.intValue()) != 0) {
        // Join to the latest DMA_MAINFRAME_STAT row of each host. This is the part
        // that reads the table found to hold more than 30 million rows.
        sql.append(" left join");
        sql.append("(");
        sql.append("select /*+adaptive_npln_flag(0) enable_rq_to_nonref_spl(1)*/ * from dem.dma_mainframe_stat where (mf_id, ts) in ");
        sql.append("(select id, (select max(ts) from dem.dma_mainframe_stat where mf_id = id) from dem.dma_valid_mf_view)");
        sql.append(")mf_stat ");
        sql.append("on mf_stat.mf_id = mf.id");
    }

    // WHERE clause: one condition per filter, with values bound as parameters.
    sql.append(" where true ");
    List<Object> paramList = new ArrayList<>();
    if (filters != null && filters.size() > 0)
        for (MainframeFilter filter : filters) {
            if (filter.getFilterType() == 1) {
                sql.append(" and (mf.outer_ip like ? or mf.host_name like ? or mf.ip_list like ?)");
                paramList.add("%" + StringUtil.trimToEmpty(filter.getFilterValue()) + "%");
                paramList.add("%" + StringUtil.trimToEmpty(filter.getFilterValue()) + "%");
                paramList.add("%" + StringUtil.trimToEmpty(filter.getFilterValue()) + "%");
            } else if (filter.getFilterType() == 8) {
                String[] valueArr = filter.getFilterValue().split(",");
                if (valueArr.length > 0) {
                    sql.append(" and mf.outer_ip in ( ");
                    for (int i = 0; i < valueArr.length; i++) {
                        sql.append(" ? ");
                        paramList.add(valueArr[i]);
                        if (i != valueArr.length - 1)
                            sql.append(" , ");
                    }
                    sql.append(" )");
                }
            }
            if (filter.getFilterType() == 5) {
                sql.append(" and mf.outer_ip = ?");
                paramList.add(filter.getFilterValue());
                continue;
            }
            if (filter.getFilterType() == 2) {
                sql.append(" and mf.id = ?");
                paramList.add(filter.getFilterValue());
                continue;
            }
            if (filter.getFilterType() == 3) {
                sql.append(" and mf.id in (select res_id from DEM.DMA_VALID_ALERT_RES_VIEW where alert_id = ? and res_type in(");
                sql.append(StringUtil.join((Object[])Resource.MF_TYPES, ",", "'"));
                sql.append("))");
                paramList.add(filter.getFilterValue());
                continue;
            }
            if (filter.getFilterType() == 4) {
                sql.append(" and mf.NET_CONFIG_FLAG=? ");
                paramList.add(filter.getFilterValue());
                continue;
            }
            if (filter.getFilterType() == 7) {
                String[] valueArr = filter.getFilterValue().split(",");
                if (valueArr.length > 0) {
                    sql.append(" and mf.id in ( ");
                    for (int i = 0; i < valueArr.length; i++) {
                        sql.append(" ? ");
                        paramList.add(valueArr[i]);
                        if (i != valueArr.length - 1)
                            sql.append(" , ");
                    }
                    sql.append(" )");
                }
                continue;
            }
            if (filter.getFilterType() == 9) {
                sql.append(" and mf.id in (select mf_id from DEM.DMA_VALID_UD_SCRIPT_MF_VIEW where ud_script_id = ?)");
                paramList.add(filter.getFilterValue());
            }
        }
    sql.append(" order by mf.gmt_create;");
    return (List<Mainframe>)select(sql.toString(), paramList, rsProcessor);
}
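
Two of the string-building steps in getMainframes are easy to isolate: the paging prefix `select top <offset>, <size>` and the expansion of a comma-separated filter value into `?` placeholders. The sketch below reproduces only those two steps as standalone helpers so that their output can be inspected. The class and method names are hypothetical, and the spacing of the generated SQL is simplified compared with the DEM code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers mirroring two fragments of the SQL assembly in getMainframes.
public class SqlFragments {

    // Paging prefix: skip pageSize * (pageNum - 1) rows, then return pageSize rows.
    // Falls back to a plain "select" when paging is not requested.
    static String selectPrefix(long pageSize, long pageNum) {
        if (pageSize > 0L && pageNum > 0L) {
            return "select top " + (pageSize * (pageNum - 1L)) + ", " + pageSize;
        }
        return "select";
    }

    // Expands "a,b,c" into "column in (?, ?, ?)" and appends each value to params,
    // so the values are bound as parameters instead of being concatenated into the SQL.
    static String inClause(String column, String csv, List<Object> params) {
        String[] values = csv.split(",");
        StringBuilder sql = new StringBuilder(column).append(" in (");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append("?");
            params.add(values[i]);
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        List<Object> params = new ArrayList<>();
        System.out.println(selectPrefix(20, 3));
        System.out.println(inClause("mf.id", "a,b,c", params));
        System.out.println(params);
    }
}
```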

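IV. Summary

  1. The DEM loading failure was caused mainly by a slow SQL statement behind the query interface.
  2. The slowness comes from the way the interface's SQL is written, but the statement is built into DEM and hard to change, so cleaning up the oversized table is the more practical fix.
  3. This DEM system had been running for more than three years, and DEM has no scheduled cleanup of table data, so the tables need periodic manual maintenance to remove data that is no longer needed and keep the system running normally.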