Event Stream Intelligence: Esper & NEsper
FAQ
How does it work?
- How does Esper work? How does Esper allow you to search and match patterns on temporal events?
Esper is an event stream processing (ESP) and event correlation (complex event processing, CEP) engine written in Java. Rather than working like a database, where you put data in and later poll it using SQL queries, Esper works as a real-time engine that triggers actions when event conditions occur among event streams. A tailored Event Processing Language (EPL) allows registering queries in the engine, using Java objects (POJO, JavaBean) to represent events. A listener class, which is basically also a POJO, is then called by the engine when the EPL condition is matched as events come in. EPL can express complex matching conditions that include temporal windows and joins across different event streams, as well as filtering and sorting.
A simple example is computing the average stock price of the BEA tick over a sliding window of 30 seconds. Given a StockTick event bean with price and symbol properties and the EPL statement "select avg(price) from StockTick.win:time(30 sec) where symbol='BEA'", a POJO listener is notified as ticks come in. In the real world millions of ticks can come in, so there is no practical way to store them all and query them later using a classical database architecture. Statements can be much more complex, and can also be combined with "followed by" conditions.
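As a rough illustration of what that statement computes (not of how Esper implements it), the plain-Java sketch below keeps a 30-second window of BEA ticks and maintains the average incrementally. The StockTick class, the caller-supplied timestamps, and expiring ticks only when a new tick arrives are simplifying assumptions; the engine itself also expires events as time passes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: mimics the result of
//   select avg(price) from StockTick.win:time(30 sec) where symbol='BEA'
// in plain Java. It is not Esper's internal implementation.
class SlidingAverage {

    // Hypothetical stand-in for the StockTick event bean described in the text.
    static final class StockTick {
        final String symbol;
        final double price;
        final long timestampMillis; // event time supplied by the caller (assumption)

        StockTick(String symbol, double price, long timestampMillis) {
            this.symbol = symbol;
            this.price = price;
            this.timestampMillis = timestampMillis;
        }
    }

    private final long windowMillis;
    private final Deque<StockTick> window = new ArrayDeque<>();
    private double sum; // maintained incrementally: add on enter, subtract on leave

    SlidingAverage(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Processes one tick and returns the current average (NaN while the window is empty). */
    double onEvent(StockTick tick) {
        // The where-clause: only BEA ticks enter the window.
        if ("BEA".equals(tick.symbol)) {
            window.addLast(tick);
            sum += tick.price;
        }
        // Expire ticks at least windowMillis older than the tick just received.
        // (This sketch expires only on arrival; a real engine also expires on timer events.)
        long cutoff = tick.timestampMillis - windowMillis;
        while (!window.isEmpty() && window.peekFirst().timestampMillis <= cutoff) {
            sum -= window.removeFirst().price;
        }
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }

    static double demo() {
        SlidingAverage avg = new SlidingAverage(30_000);
        avg.onEvent(new StockTick("BEA", 10.0, 0));
        avg.onEvent(new StockTick("BEA", 20.0, 10_000));
        avg.onEvent(new StockTick("IBM", 99.0, 20_000)); // filtered out by symbol
        // The tick from t=0 expires when this one arrives at t=35s.
        return avg.onEvent(new StockTick("BEA", 30.0, 35_000));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 25.0
    }
}
```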
The internals of Esper are made up of fairly complex algorithms primarily relying on state machines and delta networks in which only changes to data are communicated across object boundaries when required.
- What is the difference between Esper and an in-memory database?
The Esper engine works a bit like a database turned upside-down. Instead of storing the data and running queries against stored data, the Esper engine allows applications to store queries and run the data through. The engine responds in real time when conditions that match the queries occur. The execution model is thus continuous, rather than running only when a query is submitted.
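To make the inversion concrete, here is a toy model in plain Java: queries are registered once and every event is pushed through them, with nothing stored. ToyEngine, registerQuery and sendEvent are hypothetical names chosen for this sketch and are not Esper's API; the filter mirrors the Quote(price>100) example used elsewhere in this FAQ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Toy model of "store the queries, run the data through".
// ToyEngine, registerQuery and sendEvent are hypothetical names, not Esper's API.
class UpsideDownDatabase {

    static final class ToyEngine<E> {
        private final List<Predicate<E>> queries = new ArrayList<>();
        private final List<Consumer<E>> listeners = new ArrayList<>();

        // A standing query: stored once, evaluated against every later event.
        void registerQuery(Predicate<E> condition, Consumer<E> listener) {
            queries.add(condition);
            listeners.add(listener);
        }

        // Events are not stored; each one is matched against the stored queries and dropped.
        void sendEvent(E event) {
            for (int i = 0; i < queries.size(); i++) {
                if (queries.get(i).test(event)) {
                    listeners.get(i).accept(event);
                }
            }
        }
    }

    static List<Double> demo() {
        ToyEngine<Double> engine = new ToyEngine<>();
        List<Double> matches = new ArrayList<>();
        // Similar in spirit to: select * from Quote(price>100)
        engine.registerQuery(price -> price > 100, matches::add);
        engine.sendEvent(95.0);
        engine.sendEvent(101.5);
        engine.sendEvent(120.0);
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [101.5, 120.0]
    }
}
```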
- How does the engine discern which data to retain? Is it based solely on the queries registered with the engine at the time the event comes in?
Yes, the data that the engine retains is based solely on the queries registered with the engine. The engine retains the minimum needed events and/or derived data to satisfy any started statements. Thus, if one has an engine running and consuming events, but has no queries registered with the engine, the engine does not retain any data.
- What happens on engine start? I assume that if I have a time based query, since there's no history within the engine, there's no way to get any events to fire until the time and events have been consumed?
The named window feature allows starting queries from a prior event history. Named windows behave more like traditional tables and can be used to initialize new queries with data.
Currently the Esper engine itself does not provide state persistence, fail-over or recovery, or an event replay mechanism. The Esper engine does not write engine state to disk, cluster it, or persist it in any other way. Thus, if your application process or your hardware system goes down, the engine state is lost.
If you need fail-over and/or recovery capability, then the EsperHA (Esper High Availability) product by EsperTech can be a good solution. Please contact us for further information.
Alternatively, your application could replay events into the engine; however, this replay must currently be performed by the application or middleware, as Esper has no facilities for it.
- Can queries be added to the system only on engine start, or can they be added dynamically? Will those queries work with any internally stored historical data when they're started?
Queries can be added, stopped, started, or destroyed while the engine is running in a multithread-safe fashion. The facility to explicitly attach or initialize a new query from a prior query or data window is a feature of named windows.
- The engine can be configured to work with DOM objects and many other event representations. Is that configuration static, or can those configurations be changed or added to on a running system?
The runtime configuration API allows most configuration actions to be performed at runtime. This API is available via the administrative interface of the engine.
- When working with composite streams, i.e. when using the 'insert into' mechanism, does the entity being inserted into have to be a registered object in the system or are those created simply by registering the query?
No, there is no registration required. The creation of the query that contains the insert-into clause creates the new stream and makes it available to use in further statements.
- Could you explain the concept of data windows for a database programmer?
One could think of a table with a timestamp column containing the time when each row was inserted. We could create a view that sorts by timestamp descending and selects all rows inserted between, say, 1 minute ago and now. Every time we fire a query against this view, it returns the rows added in the last minute: the contents of a 1-minute time window. Each time the query is fired we get a new window, so older rows seem to leave the window while new rows seem to enter it.
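The analogy can be sketched in plain Java: instead of re-running a "last minute" query, the window below is updated as rows arrive and reports the rows that leave it. OneMinuteWindow, Row and the millisecond timestamps are illustrative assumptions, not Esper classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the 1-minute window from the database analogy. Rather than re-running a
// "rows inserted in the last minute" query, the window is maintained incrementally and
// reports which rows leave it as time advances. All names here are illustrative.
class OneMinuteWindow {
    static final long WINDOW_MILLIS = 60_000;

    static final class Row {
        final String id;
        final long insertedAt; // plays the role of the timestamp column

        Row(String id, long insertedAt) {
            this.id = id;
            this.insertedAt = insertedAt;
        }
    }

    private final Deque<Row> rows = new ArrayDeque<>();

    /** A new row enters the window; returns the ids of rows that left it at that moment. */
    List<String> insert(Row row) {
        rows.addLast(row);
        return advanceTo(row.insertedAt);
    }

    /** Moves the clock to 'now' and evicts rows inserted one minute or more before it. */
    List<String> advanceTo(long now) {
        List<String> left = new ArrayList<>();
        while (!rows.isEmpty() && rows.peekFirst().insertedAt <= now - WINDOW_MILLIS) {
            left.add(rows.removeFirst().id);
        }
        return left;
    }

    /** Current window contents, oldest first: the equivalent of querying the view now. */
    List<String> contents() {
        List<String> ids = new ArrayList<>();
        for (Row r : rows) {
            ids.add(r.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        OneMinuteWindow w = new OneMinuteWindow();
        w.insert(new Row("a", 0));
        w.insert(new Row("b", 30_000));
        System.out.println(w.insert(new Row("c", 70_000))); // prints [a]
        System.out.println(w.contents());                   // prints [b, c]
        System.out.println(w.advanceTo(95_000));            // prints [b]
    }
}
```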
Technical
- What additional components does Esper require to run?
Please see the feature list for this information. In addition, the "lib" folder in the source distribution contains a readme file that describes the dependencies.
- How has this been tested? What guarantees do I have that the next release works just as well?
Esper uses the JUnit testing framework to automate regression testing of the system. Each build runs the current set of around 2055 unit test methods in 550 JUnit test classes, many of them running multiple test scenarios. The test classes test the Esper engine from an end-to-end perspective as well as test individual classes and components. Tests are very detailed in asserting the expected results, allowing us to detect most problems before they can reach you. There is also a good set of tests for multithread-safety, as well as tests that assert the performance of statements and the configurability of the engine. See the package "com.espertech.esper.regression" in the test source folder as the entry point for end-to-end testing against the client API.
The Esper team follows the practice of test-driven development (TDD) rigorously, ensuring that each feature added has automated test coverage. We develop and evolve the tests for each feature along with the feature that is currently being developed.
We have received a recent test coverage result that we quote here: "I used Cobertura for this, the line coverage is 85% and the branch coverage is 93%. This gives me a good feeling, but one concern is that the tests take long time to run, but I can live with that."
- Can I run it with multiple threads? What, if anything, is multithread-safe?
All administrative and runtime operations are multithread-safe as of release 1.5 for all types of statements. Applications can perform multithreaded sends of events into the engine as well as create, start and stop statements during operation, while retaining full control over threading and efficiently sharing resources between statements.
Additionally, Esper supports multiple independent Esper engines per Java VM. Thus applications can segregate work to multiple engine instances allocating one or more threads to each engine instance.
Iterating over result data (the pull model) using a statement's safeIterator method, concurrently with the statement's processing of events, is also thread-safe.
The following are not thread-safe: iterating via the iterator method (use safeIterator instead), the configuration API, and the SODA API when a statement object model instance is shared between threads.
- What operating systems has it been tested on?
For each release we test the engine on Windows XP and on Red Hat Linux. Please see the change history notes, which indicate the JDK version(s) used for building the release and certified to pass all tests.
- What is the footprint of Esper in a typical installation, i.e. what is the RAM, disk and CPU usage?
The kernel itself is very lightweight and fits in a few MB of heap (RAM). Disk usage is also limited: logs, jar files of about 2 MB including third-party jars, and a few KB for configuration files.
CPU consumption depends on the events entering and exiting the system, and on the actual listeners you register with the statements.
The heap consumption (RAM) is proportional to the number of streams and statements you deal with and the window sizes (correlating / computing average over 100 events or 100000 events, or for 10 seconds or 10 days).
For more information, please check the performance section in the reference documentation.
- If one overloads Esper with events through a JMS queue, will Esper queue events internally until it processes them, or will events stay in the JMS queue until Esper processes them?
Esper does not have an internal event queue. Threading is completely driven by the application that embeds Esper; see the relevant chapter of the documentation.
Therefore, if your application uses the listener model to read messages that arrive from a JMS queue, your JMS provider (the message-oriented middleware, or MOM, such as ActiveMQ and many others) provides the threads that deliver messages. Say you add N listeners (javax.jms.MessageListener) to a JMS destination: the MOM then supplies N threads that each deliver a message asynchronously on message arrival. If your JMS listener sends one or more events per message into Esper, the thread blocks until Esper has processed each event; only then does your listener code return from the onMessage method, and the MOM delivers the next message to that listener.
The Esper distribution includes an example "JMS Server Shell" and client: a configurable JMS-based server and client that process events from JMS destinations.
One can of course also use the receive model (blocking calls) to receive from a JMS queue with N application threads each performing a blocking receive (see javax.jms.MessageConsumer to receive with timeout). When the consumer receives a message it can send the event(s) into Esper from the same thread. Again the application thread will call the next JMS receive after Esper has processed the event(s) completely.
Another alternative is to decouple JMS reads from Esper event processing by putting a blocking queue and a thread pool between the JMS receivers or listeners and Esper event processing. The disadvantage is a larger number of thread context switches (not within Esper, but within your application and the MOM), which can lower overall performance.
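A minimal sketch of that decoupled arrangement, assuming a bounded java.util.concurrent queue and a fixed worker pool. No JMS classes are used: the loop that calls put() stands in for the JMS listener or receiver, and the processEvent callback stands in for sending the event into Esper. Both are placeholders for illustration.

```java
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Sketch of decoupling message intake from event processing with a bounded blocking
// queue and a worker pool. 'processEvent' is a placeholder for sending into the engine.
class DecoupledIntake {

    static void run(int messages, int workers, Consumer<String> processEvent) throws InterruptedException {
        // Bounded queue: if the workers fall behind, put() blocks the intake thread (back-pressure).
        // Optional.empty() serves as the shutdown marker.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(100);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    for (Optional<String> msg = queue.take(); msg.isPresent(); msg = queue.take()) {
                        processEvent.accept(msg.get());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Intake side (stands in for onMessage/receive): enqueue and move on to the next message.
        for (int i = 0; i < messages; i++) {
            queue.put(Optional.of("message-" + i));
        }
        // One marker per worker, queued behind all real messages, so every worker exits its loop.
        for (int w = 0; w < workers; w++) {
            queue.put(Optional.empty());
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    static int demo() {
        AtomicInteger processed = new AtomicInteger();
        try {
            run(1_000, 4, msg -> processed.incrementAndGet());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1000
    }
}
```

One consequence worth weighing: the intake thread returns as soon as the message is queued, so with a JMS listener in auto-acknowledge mode the message is typically acknowledged before the event has actually been processed.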
The EsperIO package has JMS adapters based on the Spring framework and its JMS template, to configure JMS connectivity without the need to write code.
- Is there a way to send a bunch of events into Esper and get back a notification when it is done with processing all events?
As discussed before, there is no internal event queue, and the application threads process all events. Esper does not have a built-in mechanism to batch up messages; however, the Java concurrency library provides very good infrastructure for this. For example, a batch of messages can be collected into a list and handed to a Callable that sends the events to Esper, with the Callable submitted to a thread pool (see java.util.concurrent.Executors).
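The approach described above might look like the sketch below. sendEvent is a placeholder Consumer for whatever call the application uses to send one event into the engine; the sketch assumes that call returns only after the event has been processed (there is no internal queue), so completion of the Future signals that the whole batch is done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Sketch: send a batch of events from a pool thread and learn when the batch is done.
// 'sendEvent' is a placeholder for the application's call into the engine and is assumed
// to return only after the event has been fully processed.
class BatchSend {

    static Future<Integer> submitBatch(ExecutorService pool, List<String> batch, Consumer<String> sendEvent) {
        Callable<Integer> task = () -> {
            for (String event : batch) {
                sendEvent.accept(event);
            }
            return batch.size(); // reached only after every event in the batch was processed
        };
        return pool.submit(task);
    }

    static int demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        try {
            Future<Integer> done = submitBatch(pool, Arrays.asList("e1", "e2", "e3"), processed::add);
            return done.get(); // blocks until the whole batch is processed: the "notification"
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3
    }
}
```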
Our benchmark kit provides an example built upon concurrent queues and thread pools.
- What is the policy you recommend on UpdateListener (or Observer) if UpdateListener does long processing? For example, why don't you create a thread or attach a JMS queue for output in some examples?
A blocking UpdateListener or subscriber blocks event processing for that thread. Decoupling output processing via a further threadpool may have an advantage if output listener processing can be very slow, but incurs the cost of further threads and context switching.
One factor to consider is the number of output events. A second factor is the action that your application may perform for each output event, such as whether the event needs to be communicated to another system or simply displayed.
The examples generally leave the transport and threading out of the picture, since that is specific to any application and particular integration environment as well as event stream density. We also want to keep examples simple.
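If output processing is slow, one way to decouple it is a decorator that hands each result to a single dedicated thread, sketched below. ResultListener is a hypothetical interface standing in for an UpdateListener or subscriber, and the 20 ms sleep simulates slow work such as a remote call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of decoupling slow output processing from the event-processing thread.
// 'ResultListener' is hypothetical, standing in for an UpdateListener or subscriber.
class AsyncOutput {

    interface ResultListener<T> {
        void onResult(T result);
    }

    // Decorator: the calling (engine) thread only enqueues; slow work runs on one dedicated
    // thread, which also preserves the order in which results were produced.
    static final class AsyncListener<T> implements ResultListener<T>, AutoCloseable {
        private final ResultListener<T> delegate;
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        AsyncListener(ResultListener<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void onResult(T result) {
            executor.execute(() -> delegate.onResult(result));
        }

        @Override
        public void close() throws InterruptedException {
            executor.shutdown(); // results already queued are still delivered
            executor.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    static List<String> demo() {
        List<String> delivered = Collections.synchronizedList(new ArrayList<>());
        ResultListener<String> slow = r -> {
            try {
                Thread.sleep(20); // simulates slow output work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            delivered.add(r);
        };
        try (AsyncListener<String> async = new AsyncListener<>(slow)) {
            for (String r : Arrays.asList("alert-1", "alert-2", "alert-3")) {
                async.onResult(r); // returns immediately
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [alert-1, alert-2, alert-3]
    }
}
```

The trade-off is the extra thread and context switches mentioned above, plus an unbounded backlog (the single-thread executor's queue) if output stays slower than input.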
- It claims to be fast...how does it do that? Has this claim been tested?
We have published benchmark kit results and have the benchmark kit itself available for download. The results are summarized with further links on the performance page.
Esper internally builds all the indexes and uses many optimization techniques hidden from your application. These techniques are verified by the performance-asserting regression tests that are part of the source code and are executed as part of our build process.
There is also currently no standard performance test for CEP/ESP engines.
- Do you have any benchmarks available for Esper? Such as results from Linear Road?
We have published benchmark kit results and have the benchmark kit itself available for download. The results are summarized with further links on the performance page.
There is no industry-wide benchmark that would allow easy comparison.
We also have an RFID demo example that is designed for performance testing. For example, we can run about 100,000 events per second against 2000 statements on a single dual-core CPU on commodity hardware, though one could argue this does not mean a lot without looking at event and statement complexity, underlying resiliency, etc.
Compared to other software in the CEP space, (N)Esper can run on a very large number of platforms: basically any platform that has a Java or .NET runtime, either 32-bit or 64-bit, with no lock-in to any operating system (slightly more true for Esper than NEsper, of course). It is also possible to run Esper on modern compute appliances, such as those from Azul(R), in the field of high-performance computing (hundreds of cores, real-time capabilities, etc.). This is strictly not possible for other CEP engines.
- What or how many events does Esper keep in memory? Does Esper keep matching events of a query in memory?
Esper offers built-in data windows as part of the event processing language that instruct the engine how many or how long events must be considered. For example, a time window with an interval length of 10 seconds instructs the engine to retain the last 10 seconds of events as a moving data window.
There are quite a few different eviction policies for different situations. Most are row-count based or time-based, but there is also a multi-policy option. A sort-based eviction policy is also possible, as is combining different data windows. The engine can also process events without any data windows.
The policy for holding events in memory depends on your statement, especially the data window used (sliding, tumbling and time or length etc), and whether it uses patterns, output rate limiting or is a join.
For example, if your statement employs no data window, Esper keeps no events in memory. This is a sample query that keeps no events in memory:
select * from Quote(price>100)
Or if your statement uses a unique window on a property then Esper will keep the last event per property value in memory. Here the last event per 'assetId' is kept in memory:
select * from RfidEvent.std:unique(assetId)
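What that unique window retains can be pictured as a map from assetId to the latest event, as in this plain-Java sketch (not Esper's implementation); the RfidEvent class and its zone property are assumptions for illustration. Memory grows with the number of distinct assetId values, not with the number of events.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of what a unique window such as RfidEvent.std:unique(assetId) retains:
// the most recent event per assetId. Plain Java, not Esper's implementation.
class UniqueWindow {

    // Hypothetical stand-in for the RfidEvent type; 'zone' is an assumed extra property.
    static final class RfidEvent {
        final String assetId;
        final String zone;

        RfidEvent(String assetId, String zone) {
            this.assetId = assetId;
            this.zone = zone;
        }
    }

    private final Map<String, RfidEvent> lastByAsset = new LinkedHashMap<>();

    /** Keeps the new event; returns the event it replaced (the one leaving the window), or null. */
    RfidEvent onEvent(RfidEvent event) {
        return lastByAsset.put(event.assetId, event);
    }

    /** assetId -> zone of the retained event, in first-seen order of assetId. */
    Map<String, String> zonesByAsset() {
        Map<String, String> view = new LinkedHashMap<>();
        lastByAsset.forEach((id, e) -> view.put(id, e.zone));
        return view;
    }

    public static void main(String[] args) {
        UniqueWindow w = new UniqueWindow();
        w.onEvent(new RfidEvent("a1", "dock"));
        w.onEvent(new RfidEvent("a2", "dock"));
        RfidEvent replaced = w.onEvent(new RfidEvent("a1", "warehouse"));
        System.out.println(replaced.zone);     // prints dock
        System.out.println(w.zonesByAsset()); // prints {a1=warehouse, a2=dock}
    }
}
```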
Note that Esper also re-uses (shares) data windows between statements, if possible. For views that derive values from an event stream, no events are kept in memory. For aggregations, only the aggregation values are kept in memory. For patterns, the events that participate in the pattern are kept in memory only if tagged. For output rate limiting, events are buffered unless you use the "snapshot" keyword; a section in the output rate limiting documentation explains what is buffered and when. For joins, if no data window is specified, the keep-all data window applies.
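For the aggregation case, a sketch of the idea: an average over a stream with no data window, as in a hypothetical select avg(price) from Quote, only needs a counter and a running sum, however many events arrive. The class below is illustrative plain Java.

```java
// Sketch: an aggregate over a stream with no data window only needs the running values,
// not the events themselves. Illustrative plain Java, not Esper's implementation.
class RunningAverage {
    private long count; // state retained: one counter...
    private double sum; // ...and one running sum, however many events arrive

    /** Folds one price into the aggregate and returns the current average. */
    double onEvent(double price) {
        count++;
        sum += price;
        return sum / count;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage();
        avg.onEvent(10.0);
        avg.onEvent(20.0);
        System.out.println(avg.onEvent(60.0)); // prints 30.0
        System.out.println(avg.count());       // prints 3
    }
}
```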
General
- What's the advantage of using Esper over other products?
Comparing Esper to some of the leading commercial products, we think Esper has several unique features. Esper's pull API is noteworthy; one of our customers recently remarked: "Indeed one of the important feature of a Real-Time analytics is to be able to connect to CEP on-demand, basically if the analytic is off-line, the server is continuing to calculate."
Another unique feature is the Statement Object Model API, a set of classes to directly construct, manipulate or interrogate EPL statements. We think Esper is feature-equivalent to, if not ahead of, many commercial products, also considering performance.
Esper can easily be embedded in any Java runtime, including J2EE application servers or standalone Java applications. Events in Esper can be represented as plain Java objects (POJO), eliminating the need to format events, feed them into a separate, possibly remote, engine process, and receive them back. Esper is an entirely free open-source component available under the GNU General Public License (GPL, also known as GPL v2). The open-source nature of Esper helps in tailoring the event processing language and other community-driven features.
Esper and NEsper are embeddable components written in Java and C#, and are therefore suitable for integration into any Java process or .NET-based process. Esper and NEsper are not servers themselves, but are designed to hook into any sort of server, ranging from market-standard JEE servers (WebLogic, WebSphere, JBoss, etc.), service buses, or lightweight solutions (OSGi-based, grid, etc.) to Microsoft .NET technologies. NEsper is also suitable for use on desktop end-user stations.
Another advantage of this model is that the components can run standalone in your development environment, making development and testing much easier, while in the target production environment the deployment can be tailored to what you really need or already have in place. End-to-end performance and latency are also improved, as your application may not need to transport events to a dedicated remote server process but can process events at the event source, saving marshalling, unmarshalling and network overhead.
- What are the advantages of Esper's Event Processing Language (EPL)?
The Esper event processing language (EPL) converges event stream processing (filtering, joins, aggregation) and complex event processing (causality) into one single language. The core language is SQL-like, ensuring rapid learning, but is also highly oriented toward modern technologies: for example, it is object-oriented (more than table-oriented), enabling simple extension. The language, of course, includes event windows and causality patterns as first-class citizens. EPL natively supports several event formats, from Java/.NET objects and maps to XML documents.
- What business areas/problems is Esper best suited for?
Esper is best suited for real-time event driven applications. Typical application areas are business process management and automation, finance, network and application monitoring, and sensor network applications. Esper takes much of the complexity out of developing applications that detect patterns among events, filter events, aggregate time or length windows of events, join event streams, trigger based on the absence of events, etc.
A primary difference from systems relying on classical SQL databases is that we do not query a repository for events matching some conditions; instead, we trigger customized actions as the flow of events comes in matching event conditions, drastically reducing latency.
- What might be some misuses for it?
Esper is not designed for storing and retrieving fairly static data; that is better left to conventional databases. In-memory databases may be better suited to CEP applications than traditional relational databases, as they generally have good query performance. Yet they are not optimized to provide the immediate, real-time query results required for CEP and event stream analysis.
- What is the intended audience and what is their interface?
Esper currently has no GUI; access is via the API only. The intended audience is developers of CEP or ESP applications.
- How does Esper hook into an existing event-based system?
Today, event-based systems are often confused with message-based systems such as JMS and similar back-ends, or an Enterprise Service Bus (ESB). Messaging systems are event-based but usually do not correlate events or process event streams at all. They simply process individual events, and it quickly becomes complex if you need to correlate messages.
Esper can hook into any Java-based system as a message consumer, whose main purpose would be to listen efficiently to everything while sending back other events when a matching condition has occurred. Hooking Esper into a classic messaging system also allows it to correlate events even if those come from non-Java systems. This is just an example. Esper is completely container-agnostic, can run as a standalone component, and the 1.0 release simply assumes events to be POJOs. We are currently working on providing native XML format support as well.
- How would you position Esper next to existing solutions (or hacks) in the industry?
Esper is the only open source event stream processing engine out there, and it is not to be confused with classical rules engines. Compared to commercial ESP engine implementations, Esper may lack high availability options in the 1.0 release, but its open source nature is a driver to get the event processing language right and to get the Java community up to speed on those concepts, before tackling enterprise-grade requirements in upcoming versions.
Generally, Rete-based production rule engines can indeed be used to address part of what CEP covers, especially triggering by correlating events, possibly including a temporal relationship between events. That said, the ESP side (Event Stream Processing) is a different beast, where what matters is the "S" for Stream. On this side, the Esper engine provides several language facilities to build expressions that use time not for temporal relationships (happened before, etc.) but for sliding windows. As an example, this makes it very valuable for computing things like the volume-weighted average price (VWAP) of ticks, which would likely be awkward to express with something Rete-based like a classical rule engine.
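For reference, VWAP over the ticks i currently in a window W, with price p_i and volume v_i, is the standard ratio below. Over a sliding window both sums can be maintained incrementally: when tick k enters and tick j leaves, add k's terms and subtract j's, instead of re-scanning the window.

```latex
\mathrm{VWAP}_W = \frac{\sum_{i \in W} p_i \, v_i}{\sum_{i \in W} v_i} = \frac{S_{pv}}{S_{v}},
\qquad
S_{pv} \leftarrow S_{pv} + p_k v_k - p_j v_j,
\qquad
S_{v} \leftarrow S_{v} + v_k - v_j
```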
- What is the concept or philosophy behind the design?
Esper was developed using test-driven development and excellent automated test coverage. Esper design evolved by re-factoring with courage towards higher design quality. Favorite patterns are dependency injection/inversion of control by context injection, Immutable, Specification, GOF patterns (except Singleton :).
- What is the history? Where did it start and come from?
Thomas Bernhardt is the project founder and project lead. He works as a software architect for a major financial institution. On a project in 2004, Thomas was asked to evaluate different rules engines for use in an application for monitoring a trading system. The rules engines proved cumbersome and slow. There was no budget for a CEP product, so a custom application to solve the monitoring problem was developed. The project lead's interest in CEP and ESP technologies led to further independent research into the knowledge space, followed by prototypes built for no particular organization or purpose. The prototypes evolved and the Esper project started.
- How difficult is it to modify the Esper Event Processing Language (EPL)? For example, adding aggregate functions, or modifying the language altogether?
Esper allows plug-in of aggregation functions, views, pattern objects and user functions as described in the documentation. Extending the language is thus easy and custom extensions directly surface in the EPL.
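As an illustration of why a plug-in aggregation function is typically written against enter and leave callbacks when it runs over a data window, here is a sketch of a distinct count. The Aggregation interface is hypothetical and is not Esper's extension API; the reference documentation describes the actual base class and how to register a function so that it surfaces in EPL.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the enter/leave shape of an aggregation over a data window.
// The Aggregation interface is hypothetical, NOT Esper's actual extension API.
class PluginAggregationSketch {

    interface Aggregation {
        void enter(Object value); // an event's value entered the data window
        void leave(Object value); // an event's value left the data window
        Object getValue();        // current result reported to the statement
    }

    // Distinct count over the window: multiplicities must be kept so that leave() is exact.
    static final class CountDistinct implements Aggregation {
        private final Map<Object, Integer> counts = new HashMap<>();

        @Override
        public void enter(Object value) {
            counts.merge(value, 1, Integer::sum);
        }

        @Override
        public void leave(Object value) {
            // Decrement, removing the key once its last occurrence has left.
            counts.computeIfPresent(value, (k, n) -> n == 1 ? null : n - 1);
        }

        @Override
        public Object getValue() {
            return counts.size();
        }
    }

    static Object demo() {
        Aggregation agg = new CountDistinct();
        agg.enter("IBM");
        agg.enter("BEA");
        agg.enter("IBM"); // distinct values {IBM, BEA}: 2
        agg.leave("IBM"); // one IBM still in the window: 2
        agg.leave("BEA"); // only IBM left: 1
        return agg.getValue();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1
    }
}
```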
For even more flexibility, it is also possible to contribute your own extensions to the project in other areas if you comply with the open source GPL terms. The grammar is specified in ANTLR and the well-componentized code constructs an internal representation of statements that allows for easy extension.
The statement object model API also deserves a mention here, since it allows an application to compile, interrogate and build EPL statements not just as strings but as comprehensive object representations.
Last Published: May 11, 2008
Version: 2.1.0
