ISpout是实现Spout的核心接口, Spout负责将数据送到topology中处理, Storm 会跟踪Spout发出的tuple的DAG:
- 当Storm发现tuple的DAG成功的执行处理, 会发送一个ack message给spout,
- 当执行失败, 会发送以恶fail message 给spout;
Spout每次释放tuple, 都会用一个id标记该tuple, 这个id可以是任何类型, 当storm ack 或fail一个message, 就会通过id来追溯到和那个Spout相关联,如果spout省略了id,或设置一个null, 那么storm就不追溯到这个tuple信息, 那就无法反馈ack或fail信息,spout也无法收到信息。 Storm在相同的线程中执行ack , fail , nextTuple,这意味着ISpout的实现者不用考虑这些方法的并发性问题, 但是,同时要保证nextTuple方法不能阻塞,否侧导致ack, fail被阻塞,等待执行,然而fail的timeout决定不能被阻塞。
ISpout接口提供如下方法以供实现者来实现:在Spout初始化时被调用, 提供Spout的执行环境
/**
* Called when a task for this component is initialized within a worker on the cluster.
* It provides the spout with the environment in which the spout executes.
*
* This includes the:
*
* @param conf The Storm configuration for this spout. This is the configuration provided to the topology merged in with cluster configuration on this machine.
* @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
* @param collector The collector is used to emit tuples from this spout. Tuples can be emitted at any time, including the open and close methods. The collector is thread-safe and should be saved as an instance variable of this spout object.
*/
void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
1.2 close()
ISpout被杀死时被调用, 当时也不保证该方法被调用, 因为在集群中supervisor通过kill -9
杀死worker进程。 当topology在Storm的本地模式运行总杀死,该方法一定会调用。
/**
* Called when an ISpout is going to be shutdown. There is no guarentee that close
* will be called, because the supervisor kill -9's worker processes on the cluster.
*
* The one context where close is guaranteed to be called is a topology is
* killed when running Storm in local mode.
*/
void close();
1.3 activate()
/**
* Called when a spout has been activated out of a deactivated mode.
* nextTuple will be called on this spout soon. A spout can become activated
* after having been deactivated when the topology is manipulated using the
* `storm` client.
*/
void activate()
1.4 deactivate()
/**
* Called when a spout has been deactivated. nextTuple will not be called while
* a spout is deactivated. The spout may or may not be reactivated in the future.
*/
void deactivate();
1.5 nextTuple()
当nextTuple被调用, Strom被请求, Spout会发射Tuple到output collecter, 如果没有Tuple可发射, 该方法就会返回, nextTuple, ack , fail 三个方法在Spout的任务中, 必须是在一个线程中,一个紧密的循环。 如果没有tuple可发射该方法会休眠短暂的时间。
/**
* When this method is called, Storm is requesting that the Spout emit tuples to the
* output collector. This method should be non-blocking, so if the Spout has no tuples
* to emit, this method should return. nextTuple, ack, and fail are all called in a tight
* loop in a single thread in the spout task. When there are no tuples to emit, it is courteous
* to have nextTuple sleep for a short amount of time (like a single millisecond)
* so as not to waste too much CPU.
*/
void nextTuple();
1.6 ack()
tuple处理执行成功,storm 会反馈给spout一个成功消息, 通过该tuple的spout指定的id,并将该id踢出队列,防止再次执行。
/**
* Storm has determined that the tuple emitted by this spout with the msgId identifier
* has been fully processed. Typically, an implementation of this method will take that
* message off the queue and prevent it from being replayed.
*/
void ack(Object msgId);
1.7 fail()
tuple执行失败, 被调用,反馈给spout,重新执行。
/**
* The tuple emitted by this spout with the msgId identifier has failed to be
* fully processed. Typically, an implementation of this method will put that
* message back on the queue to be replayed at a later time.
*/
void fail(Object msgId);