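pandas Advanced Processing: Merging
If your data is spread across several tables, you will sometimes need to combine them before analyzing them together.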
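1 Merging data with pd.concat
- pd.concat([data1, data2], axis=1)
- Concatenates along rows or columns: axis=0 (the default) stacks the frames vertically, aligning on the column index; axis=1 places them side by side, aligning on the row index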
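For example, we can merge the one-hot encoding we just produced with the original data
# Align on the row index and join the columns side by side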
pd.concat([data, dummies], axis=1)
[Note: this continues with the data from the previous post.]
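Since `data` and `dummies` come from the previous post, here is a minimal self-contained sketch of both axis settings. The frames below are hypothetical stand-ins invented for illustration, not the original data.

```python
import pandas as pd

# Hypothetical stand-ins for `data` and `dummies` from the previous post
data = pd.DataFrame({"price": [10.5, 20.1, 30.7]})
dummies = pd.get_dummies(pd.Series(["low", "mid", "high"]), prefix="level")

# axis=1: align on the row index and place the columns side by side
wide = pd.concat([data, dummies], axis=1)

# axis=0: align on the column labels and stack the rows
more = pd.DataFrame({"price": [40.2]}, index=[3])
tall = pd.concat([data, more], axis=0)

print(wide.shape)  # (3, 4): one price column plus three dummy columns
print(tall.shape)  # (4, 1): four rows, one column
```

With axis=1, rows whose index labels do not match on both sides are filled with NaN rather than dropped, so it is worth checking that the two frames share the same index before concatenating.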
2 pd.merge
- pd.merge(left, right, how='inner', on=None)
- Merges two frames either on the key columns they share, or on separate keys specified for the left and right frames
- left -- a DataFrame
- right -- another DataFrame
- on -- the shared key column(s) to join on
- how -- how to join:
  - left -- LEFT OUTER JOIN: use keys from left frame only
  - right -- RIGHT OUTER JOIN: use keys from right frame only
  - outer -- FULL OUTER JOIN: use union of keys from both frames
  - inner -- INNER JOIN: use intersection of keys from both frames
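The "separate keys for the left and right frames" case uses the `left_on` and `right_on` parameters of `pd.merge`. A minimal sketch, assuming two hypothetical frames (`orders` and `users`) whose key columns have different names:

```python
import pandas as pd

# Hypothetical frames: the key is called user_id on one side and id on the other
orders = pd.DataFrame({"user_id": [1, 2, 3], "amount": [30, 45, 12]})
users = pd.DataFrame({"id": [1, 2, 4], "name": ["Ann", "Bob", "Dan"]})

# left_on / right_on name the key column on each side
merged = pd.merge(orders, users, how="inner", left_on="user_id", right_on="id")

print(merged)
```

Only the keys present on both sides (1 and 2) survive the inner join, and both key columns are kept in the result.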
2.1 Merging with pd.merge
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
# Inner join is the default (how='inner')
result = pd.merge(left, right, on=['key1', 'key2'])
- Left join
result = pd.merge(left, right, how='left', on=['key1', 'key2'])
- Right join
result = pd.merge(left, right, how='right', on=['key1', 'key2'])
- Outer join
result = pd.merge(left, right, how='outer', on=['key1', 'key2'])
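To see how the four join types differ on this data, the sketch below rebuilds the same `left` and `right` frames and counts the rows each `how` value produces. The `indicator=True` argument is an addition not used above: it adds a `_merge` column recording which side each row came from.

```python
import pandas as pd

# Same frames as in this section, repeated so the block runs on its own
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Count the rows produced by each join type
counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(left, right, how=how, on=['key1', 'key2'], indicator=True)
    counts[how] = len(result)

print(counts)  # {'inner': 3, 'left': 5, 'right': 4, 'outer': 6}

# After the loop, `result` holds the outer join; _merge shows each row's origin
print(result['_merge'].value_counts())
```

The key pair (K1, K0) appears once in `left` and twice in `right`, so it contributes two rows to every join. That is why the inner join has three rows even though only two distinct key pairs match.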
- pd.concat([data1, data2], axis=**)
- pd.merge(left, right, how=, on=)
- how -- the type of join to perform
- on -- which key column(s) to join on
