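Spark-TFRecord is a library for reading and writing TensorFlow TFRecord data from Apache Spark. The implementation is based on the Spark TensorFlow Connector (https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector), but is rewritten on top of the Spark FileFormat trait to provide partitioning support.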
linkedin / spark-tfrecord
- Prerequisites
- Building the library with Maven
- spark-tfrecord usage example
Prerequisites
Apache Spark 2.0 (or later)
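Before building, it can help to confirm that the local toolchain meets the stated minimums. The following is a minimal sketch that checks the installed Maven against the 3.3.9 minimum required for the build. The helper name `version_ge` is hypothetical and not part of spark-tfrecord, and the sketch assumes a `sort` that supports `-V` (as in GNU coreutils) and that `mvn -version` prints a first line of the form `Apache Maven <version> ...`.

```shell
#!/bin/sh
# Hypothetical helper (not part of spark-tfrecord):
# succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required_maven="3.3.9"

if command -v mvn >/dev/null 2>&1; then
    # Assumption: the version is the third field of the "Apache Maven ..." line.
    installed_maven="$(mvn -version 2>/dev/null | awk '/^Apache Maven/ {print $3; exit}')"
    if version_ge "$installed_maven" "$required_maven"; then
        echo "Maven $installed_maven satisfies the $required_maven minimum"
    else
        echo "Maven $installed_maven is older than $required_maven"
    fi
else
    echo "mvn not found on PATH"
fi
```

The comparison relies on version sort rather than string comparison, so that a release such as 3.10.0 is correctly treated as newer than 3.3.9.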
Building the library with Maven
Build the library using Maven 3.3.9 or newer, as shown below:
# Build Spark-TFRecord
git clone https://github.com/linkedin/spark-tfrecord.git
cd spark-tfrecord
mvn clean install
# one can specify the spark v