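This was my first attempt at connecting to a ClickHouse database from Python, and I hit quite a few pitfalls along the way. I am writing them down here so that others can avoid the same mistakes.
Environment: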
- python 3.8.3
- clickhouse_driver==0.2.3
- clickhouse_sqlalchemy==0.2.0
- sqlalchemy==1.4.32
First attempt, following an approach found online:

    from clickhouse_driver import Client

    client = Client(host=host, port=8123, database=database, user=user, password=pw)
    sql = 'SHOW TABLES'
    res = client.execute(sql)
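Error: UnexpectedPacketFromServerError: Code: 102
Cause: a port mismatch. ClickHouse serves its HTTP protocol on port 8123 by default and its native TCP protocol on port 9000 by default. clickhouse_driver speaks the native TCP protocol, so it has to connect to port 9000. DBeaver uses the HTTP protocol, which is why it works with port 8123.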
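To make the port rule above concrete, here is a minimal sketch that maps each client to the port it should use. It assumes the server still uses the default ports; `DEFAULT_PORTS`, `CLIENT_PROTOCOL` and `expected_port` are hypothetical names made up for this illustration, not part of any library.

```python
# Default ports of a ClickHouse server (assumption: the server config was not changed).
DEFAULT_PORTS = {"http": 8123, "native": 9000}

# Protocol spoken by the two clients mentioned in this post.
CLIENT_PROTOCOL = {"clickhouse_driver": "native", "DBeaver": "http"}


def expected_port(client_name: str) -> int:
    """Hypothetical helper: default port a given client should connect to."""
    return DEFAULT_PORTS[CLIENT_PROTOCOL[client_name]]


print(expected_port("clickhouse_driver"))
print(expected_port("DBeaver"))
```

If the server administrator changed the ports, the values in `DEFAULT_PORTS` would need to be adjusted accordingly.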
After changing the port:

    from clickhouse_driver import Client

    client = Client(host=host, port=9000, database=database, user=user, password=pw)
    sql = 'SHOW TABLES'
    res = client.execute(sql)
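Error: SocketTimeoutError: Code: 209.
Cause: see the solution given by the author on GitHub (link).
According to that answer, this error also comes from port 9000 not being set up? That left me rather confused, so I gave up on Client and tried a different way of connecting.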
    from clickhouse_driver import connect

    # URL format: user:password@host:port/database
    conn = connect(f'clickhouse://{user}:{pw}@{host}:9000/{database}')
    cursor = conn.cursor()
    cursor.execute('SHOW TABLES')
It raised exactly the same error. In the end I gave up on clickhouse_driver and got the connection working with clickhouse_sqlalchemy and sqlalchemy instead.
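Since both attempts timed out on port 9000, one thing worth checking is whether that port is reachable from the client machine at all. The following is only a diagnostic sketch using the standard library. `tcp_port_open` is a hypothetical helper name, and a successful TCP connection only shows that something is listening, not that it is ClickHouse.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (host is a placeholder for your own server):
# print(tcp_port_open("xx.xxx.xx.xxx", 9000))
# print(tcp_port_open("xx.xxx.xx.xxx", 8123))
```

If port 8123 is reachable but port 9000 is not, the native port is probably closed or filtered between the client and the server, which would be consistent with the timeout described above.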
clickhouse_sqlalchemy: here is the code that connected successfully.
    from clickhouse_sqlalchemy import make_session
    from sqlalchemy import create_engine
    import pandas as pd

    conf = {
        "user": "xxx",
        "password": "xxx",
        "server_host": "xx.xxx.xx.xxx",
        "port": "8123",  # HTTP port
        "db": "xxx"
    }
    connection = 'clickhouse://{user}:{password}@{server_host}:{port}/{db}'.format(**conf)
    engine = create_engine(connection, pool_size=100, pool_recycle=3600, pool_timeout=20)

    sql = 'SHOW TABLES'
    session = make_session(engine)
    cursor = session.execute(sql)
    try:
        # Column names of the result set (public API instead of cursor._metadata.keys)
        fields = cursor.keys()
        # Build one dict per row, then load the rows into a DataFrame
        df = pd.DataFrame([dict(zip(fields, item)) for item in cursor.fetchall()])
    finally:
        cursor.close()
        session.close()
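One more pitfall with the connection string built above: if the password contains characters such as `@` or `/`, formatting it into the URL as-is produces a URL that cannot be parsed correctly. A minimal sketch of one way to avoid this is to percent-encode the credentials with `urllib.parse.quote_plus` first. `build_clickhouse_url` is a hypothetical helper name, and the values below are placeholders.

```python
from urllib.parse import quote_plus


def build_clickhouse_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Hypothetical helper: build a connection URL with percent-encoded credentials."""
    return f"clickhouse://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"


# Placeholder values; the '@' and '/' in the password get encoded.
url = build_clickhouse_url("xxx", "p@ss/word", "xx.xxx.xx.xxx", 8123, "xxx")
print(url)
```

The resulting string can be passed to `create_engine` in place of the `connection` variable used above.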