ピボットテーブル

Converting Multiple CSV Rows to Individual Columns – StackOverflow

やりながら書いていて,途中で効率悪い部分に気付いたけど,
やり直すのも面倒なのでそのまま.

%%writefile file
# hostname,interval,timestamp,CPU,%user,%nice,%system,%iowait,%steal,%idle
hostname,600,2018-07-24 00:10:01 UTC,-1,5.19,0,1.52,0.09,0.13,93.07
hostname,600,2018-07-24 00:10:01 UTC,0,5.37,0,1.58,0.15,0.15,92.76
hostname,600,2018-07-24 00:10:01 UTC,1,8.36,0,1.75,0.08,0.1,89.7
hostname,600,2018-07-24 00:10:01 UTC,2,3.87,0,1.38,0.07,0.12,94.55
hostname,600,2018-07-24 00:10:01 UTC,3,3.16,0,1.36,0.05,0.14,95.29
hostname,600,2018-07-24 00:20:01 UTC,-1,5.13,0,1.52,0.08,0.13,93.15
hostname,600,2018-07-24 00:20:01 UTC,0,4.38,0,1.54,0.13,0.15,93.8
hostname,600,2018-07-24 00:20:01 UTC,1,5.23,0,1.49,0.07,0.11,93.09
hostname,600,2018-07-24 00:20:01 UTC,2,5.26,0,1.53,0.07,0.12,93.03
hostname,600,2018-07-24 00:20:01 UTC,3,5.64,0,1.52,0.04,0.12,92.68
import pandas as pd


df = pd.read_csv('file')
df
    # hostname  interval    timestamp   CPU %user   %nice   %system %iowait %steal  %idle
0   hostname    600 2018-07-24 00:10:01 UTC -1  5.19    0   1.52    0.09    0.13    93.07
1   hostname    600 2018-07-24 00:10:01 UTC 0   5.37    0   1.58    0.15    0.15    92.76
2   hostname    600 2018-07-24 00:10:01 UTC 1   8.36    0   1.75    0.08    0.10    89.70
3   hostname    600 2018-07-24 00:10:01 UTC 2   3.87    0   1.38    0.07    0.12    94.55
4   hostname    600 2018-07-24 00:10:01 UTC 3   3.16    0   1.36    0.05    0.14    95.29
5   hostname    600 2018-07-24 00:20:01 UTC -1  5.13    0   1.52    0.08    0.13    93.15
6   hostname    600 2018-07-24 00:20:01 UTC 0   4.38    0   1.54    0.13    0.15    93.80
7   hostname    600 2018-07-24 00:20:01 UTC 1   5.23    0   1.49    0.07    0.11    93.09
8   hostname    600 2018-07-24 00:20:01 UTC 2   5.26    0   1.53    0.07    0.12    93.03
9   hostname    600 2018-07-24 00:20:01 UTC 3   5.64    0   1.52    0.04    0.12    92.68

というデータについて,キーに沿って行方向のデータを列方向へ纏める.
つまり,ピボットテーブル.

lst = ['# hostname', 'interval', 'timestamp', 'CPU']
col = [s for s in df.columns if s not in lst]
pt = df.pivot_table(index=lst, values=col, aggfunc='sum')
pt

FireShot Capture 823 - 20180728.ipynb - Colaboratory_ - https___colab.research.google.com_

期待される出力と同じ形式にするため,アンスタック.

res = pt.unstack(level=3)
res

FireShot Capture 824 - 20180728.ipynb - Colaboratory_ - https___colab.research.google.com_

カラム名を整える.

res.columns = [f'{x[0]} {x[-1]}'  if x[-1]!=-1 else f'{x[0]} ALL' for x in res.columns.values.tolist()]
res

FireShot Capture 825 - 20180728.ipynb - Colaboratory_ - https___colab.research.google.com_

順番を合わせる.

res.reindex(sorted(res.columns, key=lambda x: [x.split()[-1] if x.split()[-1] != 'ALL' else '-1']), axis=1)

FireShot Capture 826 - 20180728.ipynb - Colaboratory_ - https___colab.research.google.com_

纏めると,

import pandas as pd


df = pd.read_csv('file')

lst = ['# hostname', 'interval', 'timestamp', 'CPU']
col = [s for s in df.columns if s not in lst]
pt = df.pivot_table(index=lst, values=col, aggfunc='sum')

res = pt.unstack(level=3)
res.columns = [f'{x[0]} {x[-1]}' for x in res.columns.values.tolist()]
res = res.reindex(sorted(res.columns, key=lambda x: x.split()[-1]), axis=1)
res.columns = res.columns.str.replace('-1', 'ALL')

res

FireShot Capture 827 - 20180728.ipynb - Colaboratory_ - https___colab.research.google.com_

カテゴリー: 未分類 パーマリンク