Use printf to format list that is uneven – StackOverflow
Consider the following transformation:

```
Name Gender Mark1 Mark2 Mark3
AA   M      20    15    35
BB   F      22    17    44
CC   F      19    14    25
DD   M      15    20    42
EE   F      18    22    30
FF   M      0     20    45
```

↓

```
Male        Female
20 15 35    22 17 44
15 20 42    19 14 25
0 20 45     18 22 30
```
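```python
import io

import pandas as pd

strings = """Name Gender Mark1 Mark2 Mark3
AA M 20 15 35
BB F 22 17 44
CC F 19 14 25
DD M 15 20 42
EE F 18 22 30
FF M 0 20 45"""

vals = 'Mark1 Mark2 Mark3'.split()
cols = ['Male', 'Female']
d = dict(zip(['M', 'F'], cols))  # {'M': 'Male', 'F': 'Female'}

pt = (
    pd.read_csv(io.StringIO(strings), sep=r'\s+')
    # running row number within each gender: 0, 1, 2, ...
    .assign(ind=lambda df: df.groupby('Gender').cumcount())
    # one row per ind; columns are (Mark, Gender)
    .pivot(index='ind', columns='Gender', values=vals)
    # move Gender to the outer column level, then sort F/M and Mark1..3
    .swaplevel(0, 1, axis=1)
    .sort_index(axis=1)
    # applied to every column label: F/M -> Female/Male, Mark1..3 -> ''
    .rename(columns=lambda x: d.get(x, ''))
    .rename_axis(index=lambda x: None, columns=lambda x: None)[cols]
)
pt
```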
```
  Male      Female
0 20 15 35  22 17 44
1 15 20 42  19 14 25
2  0 20 45  18 22 30
```
Insisting on exactly this shape makes it a bit tedious, to put it mildly.
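Awk is simpler. In terms of the picture above, it just stores the values grouped by the cumcount in two associative arrays (one per gender) and prints them side by side.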
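```bash
%%bash
time {
  echo "Name Gender Mark1 Mark2 Mark3
AA M 20 15 35
BB F 22 17 44
CC F 19 14 25
DD M 15 20 42
EE F 18 22 30
FF M 0 20 45" |
  awk '
    # collect each gender'"'"'s marks in its own array, keyed 0, 1, 2, ...
    $2 == "M" { m[i++] = $3 OFS $4 OFS $5; next }
    $2 == "F" { f[j++] = $3 OFS $4 OFS $5; next }
    END {
      print "Male \t Female"
      n = (i > j) ? i : j   # number of rows, even if the groups differ in size
      for (k = 0; k < n; k++) print m[k] OFS f[k]
    }'
}
```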
```
Male 	 Female
20 15 35 22 17 44
15 20 42 19 14 25
0 20 45 18 22 30

real	0m0.004s
user	0m0.001s
sys	0m0.003s
```
If all you need is formatting, awk is really handy.
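For comparison, here is a minimal plain-Python sketch of the same two-array idea, without pandas. It assumes whitespace-separated input with a header row and only `M`/`F` genders. It uses printf-style `%` formatting (in the spirit of the question title) and `itertools.zip_longest`, so groups of different sizes still line up. `side_by_side` and its `width` parameter are hypothetical names chosen for illustration.

```python
from itertools import zip_longest

# Same sample data as above (assumed: whitespace-separated, header row first).
DATA = """Name Gender Mark1 Mark2 Mark3
AA M 20 15 35
BB F 22 17 44
CC F 19 14 25
DD M 15 20 42
EE F 18 22 30
FF M 0 20 45"""


def side_by_side(text, width=10):
    """Return the marks as two columns, Male | Female.

    Each gender gets its own list, so a mark's list index plays the role of
    the cumcount. The shorter group is padded with empty strings.
    Assumes only 'M' and 'F' appear in the Gender column.
    """
    groups = {"M": [], "F": []}
    for line in text.splitlines()[1:]:          # skip the header row
        _, gender, *marks = line.split()
        groups[gender].append(" ".join(marks))  # position in list == cumcount
    # %-*s left-justifies the first column to `width` characters
    rows = ["%-*s%s" % (width, "Male", "Female")]
    for m, f in zip_longest(groups["M"], groups["F"], fillvalue=""):
        rows.append("%-*s%s" % (width, m, f))
    return "\n".join(rows)


print(side_by_side(DATA))
```

Padding the first column to a fixed width is what keeps the "Female" column aligned even when a row such as `0 20 45` is shorter than the others.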