Deep | A Discussion about RAG, Vector Databases, and Their Monetization Potential, as well as an Analysis of Elastic as a Company (Part 2)
Introducing recent changes in Elastic NV, particularly the new opportunities in vector databases and artificial intelligence
1. Key Technologies of Elastic NV
Elastic's core technology stack is commonly referred to as E-L-K, where E stands for Elasticsearch, L for Logstash, and K for Kibana. The three fundamental components support different upper layer functionalities.
1.1 Logstash is a core component of the ELK stack. It collects data from a variety of sources, transforms it, and sends the result to the desired destination. It is well suited to complex pipelines that handle multiple data formats.
1.2 Elasticsearch is a full-text search and analytics engine based on Apache Lucene. Elasticsearch makes it easier to aggregate and integrate data from multiple sources and to run unstructured queries, such as fuzzy searches, on the stored data. It stores data as documents, similar to MongoDB, and serializes them in JSON. This gives it a non-relational nature, so it can also be used as a NoSQL database.
1.3 Kibana is an open-source visualization tool. It lets users visualize their Elasticsearch data and navigate the Elastic Stack, presenting live data through charts, tables, maps, and other tools. It is used for time-series analysis, log analysis, and application monitoring. It also offers a presentation tool, Canvas, for creating slide decks or dashboards that pull live data directly from Elasticsearch.
2. Major Solutions of Elastic NV
Elastic has developed three main solutions around its three core technologies. The first is Enterprise Search. The second is Observability, which is usually used to monitor the health of server clusters. The third is Security, which is closely related to log analysis and mainly serves as the data infrastructure for SIEM (Security Information and Event Management), facilitating cybersecurity operations. As open-source software, Elastic offers a highly flexible deployment environment, supporting on-premises, hybrid cloud, and pure cloud solutions.
Next, I will briefly introduce the current status of Elastic's three major solutions.
2.1 Enterprise Search:
This solution primarily serves web and app developers. The search capabilities of many websites and applications are powered by Elasticsearch. Elastic's dominance in the Enterprise Search field is largely due to its position as an open-source project that has practically become the standard in this domain and is the first name users think of. Although Google and other open-source projects made early attempts, Elastic has emerged as the sole leader. Its open-source project has also brought paying customers to the commercial version of Enterprise Search.
Within the search scenario, customers' index data and search workflows are retained in the product, which enhances customer retention. The data mainly consists of index data for internal enterprise documents. When uploaded to Elasticsearch, these documents are converted into Elastic's own index format, which is built around an inverted index, the mechanism behind search. For example, to serve a search for a keyword like "open source," Elasticsearch builds a dictionary of the terms that appear in the documents and records, for each term, which documents contain it and where. Creating the inverted index therefore leads to data retention: the more data customers need to search, the larger the index stored in Elasticsearch, and the greater the data retention.
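To make the inverted index idea concrete, here is a minimal sketch in Python. It is an illustration of the general data structure only, not Elasticsearch's or Lucene's actual implementation; the function names (tokenize, build_inverted_index, search) and the sample documents are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, for illustration only.
docs = {
    "doc1": "Elastic started as an open source project",
    "doc2": "The source of the log is a web server",
    "doc3": "Open standards and open source software",
}
index = build_inverted_index(docs)
print(sorted(search(index, "open source")))
```

The lookup never scans the documents themselves: it only intersects the posting lists stored under "open" and "source". The index grows with the amount of text indexed, which is the data retention effect described above.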
Regarding workflows, Elasticsearch differs from common search tools like Windows' built-in search. Basic searches often use simple brute-force algorithms, such as strict keyword matching, and display results in the order they are found. In contrast, Elasticsearch uses the term statistics held in its inverted index to score search results and rank them by relevance. For customers, Elasticsearch can prioritize the most relevant results based on how well the keywords match each document, and it allows customized configurations to meet specific needs. The retention of data and workflows forms a moat for Elasticsearch.
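As a rough illustration of relevance ranking, the sketch below scores documents with the textbook BM25 formula, the family of scoring function Elasticsearch uses by default. It is a simplified stand-in, not Elasticsearch's code: the function name bm25_rank and the sample documents are hypothetical, and the parameters k1=1.2 and b=0.75 are the commonly cited defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    if not docs:
        return []
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(terms) for terms in tokenized.values()) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for terms in tokenized.values():
        doc_freq.update(set(terms))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, terms in tokenized.items():
        term_counts = Counter(terms)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            length_norm = 1 - b + b * len(terms) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, for illustration only.
docs = {
    "doc1": "open source search engine",
    "doc2": "the source of the log file",
    "doc3": "closed commercial database",
}
ranked = bm25_rank(docs, "open source")
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Unlike strict keyword matching, the document that matches both query terms is ranked above the one that matches only the more common term, and documents with no match are dropped.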
However, Elastic faces challenges. Its biggest drawback lies in the saturation of the enterprise search market, where Elastic holds an absolute leadership position, leading to the obsolescence of many old search products, such as Google's search products. Moreover, for most applications, search is too common a feature to directly attract external user traffic or generate direct revenue. Currently, it is more of a cost item for applications.
Additionally, Elastic's paid Enterprise version faces competition from its own open-source product, as Elasticsearch users are often technically skilled developers capable of building their tech stack with open-source software. Paying customers tend to come from traditional industries (such as finance and retail), where companies are not in cutting-edge tech sectors but still need to build applications. Because their technical capabilities are relatively weaker, they may opt to purchase ready-made solutions instead of building their own tech stack from the ground up with open-source tools. Aside from the open-source product, Elastic also contends with competition from public clouds, especially Amazon. Elastic's open-source licensing history is tied to its competition with Amazon's OpenSearch, which will be discussed later. Azure also has its own search solution, though its performance is subpar, which allows Elastic to maintain a relatively good relationship with Azure in this regard. Recently, vector search was added to Elastic's Enterprise Search as a new feature, aimed at introducing more monetization opportunities.
2.2 Observability
Next, let's talk about Observability, which is primarily aimed at application operations personnel who track the status of clusters. Typically, operations personnel need to focus on three types of data: metrics, traces, and logs.
Elastic initially entered the market through logs. Logs are text files generated by machines, and their format is relatively standard. As a text search engine, Elasticsearch is very well suited to log search. Elastic has already established a strong market position in this field, and Elasticsearch is often the first tool users think of for log search.
In the field of Observability, Elastic also has a price advantage. Currently, products from other listed companies (such as Dynatrace, Datadog, and the now-privatized New Relic) are often criticized by users for being too expensive, whereas Elastic is rarely blamed for pricing issues. This is closely related to Elastic's business model: the existence of its open-source products limits the possibility of arbitrary price increases; at the same time, Elastic adopts a subscription model based on infrastructure and software, rather than pricing each feature separately like Datadog. Therefore, in cases where operational workloads are heavy, Elastic offers better value for money.
Elastic's main issue in Observability is that it has not performed well enough in production systems. While it holds a strong position in the logging domain, there is still significant room for improvement in the usability of its Observability products. It is worth mentioning that the current trend in the Observability field is to unify the three types of observability data on a single platform. Datadog stands out in this regard. Elastic has begun to pay attention to this trend, but its progress in platformization is relatively slow, and it may take two or more years to catch up.
Additionally, Elastic's strengths in the traditional log search domain are also facing challenges from Datadog. Datadog has excelled in decoupling computation and storage for log queries, addressing Elastic's previous shortcomings in this area. Data is typically generated continuously while a program runs, but users only query logs when they encounter issues, rather than needing to query them continuously. Therefore, the growth in the demand for computation and storage is imbalanced. Datadog has made a series of improvements based on open-source Elastic, creating a superior log analysis tool that successfully decouples computation from storage. Despite Elastic's improvements over the past two years, there remains a gap compared to Datadog. At the same time, Datadog has also made numerous innovations in platformization, developing many small features that enrich its platform, such as dedicated modules and dashboards for specific use cases (like tracking Snowflake and Oracle consumption), which far exceed Elastic in richness.
Our research has found that Datadog and Dynatrace compete in a differentiated way at the top tier. Dynatrace primarily targets the large enterprise market, while Datadog started with small and medium-sized companies and is now expanding into the large enterprise sector. Elastic's product changes, by contrast, are mainly focused on enhancing usability. Elastic has recognized its gaps and is working to improve platformization and the decoupling of computation and storage. The "Serverless" product introduced by Elastic is a signal of this improvement, indicating that the binding between software usage and infrastructure will be significantly reduced.
According to related reports from Gartner, the leaders in this field are Datadog, Dynatrace, and New Relic. In 2023, Datadog experienced a slight decline in performance due to decreased cloud consumption, but overall, these three companies remain at the forefront. This year's situation shows that New Relic's ranking has dropped, which aligns with its somewhat stagnated product development. Dynatrace and Datadog have emerged as clear leaders. In the Observability space, Elastic's products have a solid foundation, and if improvements are made in sales strategy and usability, there should be certain opportunities for growth.
2.3 Security
In the security domain, Elastic primarily offers SIEM (Security Information and Event Management) as its key security product. The input for SIEM mainly consists of logs, while the output involves detecting hacker activities through these logs.
Elastic's advantage in the SIEM market is its relatively low cost, although Splunk is the market leader. Many companies use Elastic alongside Splunk: Splunk is more powerful and user-friendly, while Elastic, being cost-effective, can be used to pre-filter all log data, with the results then sent to Splunk for in-depth analysis. Elastic has the potential to progressively close the gap with Splunk in the security field, although completely replacing Splunk in the short term is challenging. Since Cisco announced its acquisition of Splunk in 2023 and took the company private in 2024, Splunk's development has stalled, with no particularly valuable new features released recently. While it is feasible for Elastic to catch up with Splunk in functionality, the migration cost from Splunk remains high. The SIEM module requires log analysis, typically done by security personnel who write query scripts based on experience. Different SIEM tools use their own proprietary query languages, such as Splunk's SPL. If a client has built numerous query rules in Splunk, replicating the original workflows when migrating to other tools is difficult (similar to an Oracle migration). Therefore, while Elastic proposes replacing Splunk outright, a more pragmatic approach might be to erode Splunk's market share gradually by replacing it one functional module at a time.
Elastic is currently taking several actions in cybersecurity aimed at competing with Splunk and migrating Splunk's customers to Elastic. These efforts include the introduction of a dedicated query language, ES|QL, to lower the learning curve for users. In the field of Generative AI, following the example of Microsoft Copilot, Elastic has also launched an AI assistant (in July 2023) capable of describing attack events and providing handling suggestions. Additionally, this AI assistant supports converting queries from Splunk's SPL language to ES|QL to address migration costs.
On the commercial side, Elastic offers incentives during the migration period, allowing customers to avoid paying both vendors simultaneously.
Elastic's new feature "Attack Discovery" is a cutting-edge attempt that can automatically merge alerts, simplifying the work of security analysts, similar to the autonomous agent capabilities many companies have recently introduced. This feature aims to alleviate the challenges faced by security analysts when dealing with a large volume of alerts.
According to Gartner's ranking, Elastic has a certain market position in the SIEM field, but it is not among the leading players. While having a strong data platform is important, having templates and expertise that effectively address security issues creates a significant barrier to entry. To surpass Splunk in industry standing will likely take a considerable amount of time.
Elastic's business model employs a virtual machine pricing approach, rather than the more commonly seen consumption-based pricing among publicly listed companies. When users purchase Elastic, they essentially buy a cloud instance that has the complete Elastic software installed.
This pricing model is relatively outdated, as it restricts user consumption to a single cloud instance and fails to maximize the potential for users to utilize the services they may need most. It is a less flexible pricing structure. However, it does have certain advantages; compared to a consumption-based pricing model, this approach is relatively straightforward, making pricing more transparent for customers.
In contrast, customers using a consumption-based pricing model often face challenges when utilizing multiple modules, as each module may have different pricing units. This can lead to unpredictable expenses, and customers frequently find themselves unintentionally overspending (not realizing that certain configurations could incur significant costs), requiring negotiations with vendors to seek relief from substantial charges.
Elastic's pricing is relatively straightforward, primarily based on the volume of logs. Users can estimate the required cluster size based on log volume, along with the added value of the software to determine the price. When choosing Elastic, customers can clearly understand their infrastructure costs and software fees.
The drawback of this pricing model is that when users first start using it, if their usage is low, the scaling process with Elastic can be more complex. However, once users reach a stable usage level, Elastic offers better value for money. Additionally, Datadog's pricing can be quite complex, especially when enabling different feature modules; customers need to clearly identify which components require scaling and which do not.
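The trade-off between the two pricing models can be sketched with a small calculation. Every number below is a made-up placeholder chosen only to show the shape of the two cost curves. None of them are Elastic or Datadog list prices, and the function names are hypothetical.

```python
import math

# All values are hypothetical placeholders, not vendor prices.
NODE_CAPACITY_GB_PER_DAY = 100   # assumed log volume one instance can absorb
NODE_PRICE_PER_MONTH = 1_000     # assumed infrastructure + software fee per instance
PRICE_PER_MILLION_EVENTS = 2.0   # assumed usage-based rate
EVENTS_PER_GB = 1_000_000        # assumed: roughly 1 KB per log event


def capacity_based_cost(gb_per_day):
    """Monthly cost when paying for whole instances sized by log volume."""
    nodes = max(1, math.ceil(gb_per_day / NODE_CAPACITY_GB_PER_DAY))
    return nodes * NODE_PRICE_PER_MONTH


def usage_based_cost(gb_per_day, days=30):
    """Monthly cost when paying per million ingested events."""
    millions_of_events = gb_per_day * days * EVENTS_PER_GB / 1_000_000
    return millions_of_events * PRICE_PER_MILLION_EVENTS


for volume in (5, 50, 100, 101):
    print(volume, capacity_based_cost(volume), usage_based_cost(volume))
```

Under these assumed numbers, the capacity-based bill is a step function: it is flat until the instance is full and then jumps by one instance, which makes it predictable but relatively expensive at low usage. The usage-based bill grows linearly with volume, so it starts cheaper and overtakes the capacity-based bill as volume rises. Where the crossover falls depends entirely on the placeholder values.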
The differences between the two pricing models make direct comparisons with Datadog challenging. Datadog charges for log analysis per million events rather than by log storage volume, which provides more flexibility when log volume fluctuates. In stable conditions, on the other hand, Elastic's pricing may be more advantageous.
According to the DB-Engines ranking, Elasticsearch (marked with a red ellipse) is regarded as one of the most popular databases, even surpassing some well-known projects.
It is important to note that these statistics primarily measure the popularity of projects. The DB-Engines popularity score relies on various weighted metrics, such as Google Trends and data crawled from major job-hunting websites, which reflects hiring demand related to each database.
It is worth mentioning that since 2022, both Elasticsearch and other document search projects (like Apache Solr and Microsoft's Azure AI Search) have shown a downward trend. This is because demand for software has dropped significantly, and these software companies have also gone through layoffs.
Additionally, OpenSearch grew out of Amazon's managed search service, which launched in 2015 under the name "Amazon Elasticsearch Service." Following licensing and trademark disputes with Elastic, Amazon forked the code and rebranded the project as OpenSearch in 2021. The rebranding put the project back on a growth trajectory and may have some impact on Elasticsearch, especially among Amazon's users.
2.4 Other Solutions of Elastic
The other components of Elastic's tech stack include Kibana, which is a UI for creating dashboards. While Kibana itself does not have particularly outstanding features, it is worth noting a competitor, Grafana, whose maker, Grafana Labs, focuses on dashboards. Grafana launched its open-source dashboard product relatively early on, and Grafana Labs has rapidly risen as a commercial entity in the observability space. It made its first appearance in the Gartner Magic Quadrant in 2023, where its position already surpassed that of Elastic, and in Gartner's Magic Quadrant for 2024 it is placed in the Leaders quadrant. Over the past two years, its sales team has grown rapidly, and it has made significant progress in productization and usability. Grafana's primary goal is to compete with Datadog, meaning that Datadog will face greater competitive pressure in the next two years.
2.5 Elastic's Challenges in Observability and Security
Elastic is relatively weak at the data collection layer of its entire tech stack. Initially, Elastic's stack was known as ELK, with data collection handled by Logstash. While Logstash was designed specifically for log processing, it falls short in a broader environment that requires unified observability.
Competitors like Datadog and CrowdStrike have adopted the "One Agent" concept for data collection, allowing users to gather all data with just one agent, which is clearly more convenient. It is important to note that the agent here is different from an AI agent; it is merely a front-end probe for data collection.
Minimizing the number of agents means less overhead on system resources, but the issue lies in the fact that integrating multiple agents together significantly reduces stability. Generally, agents tend to have relatively high system permissions. If an agent is unstable and experiences a crash, it can lead to incidents such as the blue screen events seen recently with CrowdStrike.
Elastic recognized this issue and began to improve its own agent, but the initial improvements did not achieve the expected results. The Beats project in particular was not successful. In response, Elastic launched a new agent system called Fleet, which, as the name suggests, manages multiple agents as a fleet. A stable unified agent requires a long period of refinement, and it is clear that Elastic still has a long way to go in this regard.
2.6 A Summary of Elastic’s Fundamental Businesses
Elastic's core business encompasses three main areas (Enterprise Search, Observability, and Security), all of which are relatively mature markets. While Elastic maintains a leading position in Enterprise Search, this market is comparatively small and does not represent a particularly large commercial opportunity.
In terms of observability and security, Elastic also possesses some advantages; however, its ease of use remains insufficient, and product maturity needs improvement.
These issues have been discussed for some time in the stock market, leading to a gradual decline in attention to Elastic's traditional business. The new perspective now is that with the rise of AI and the emergence of RAG (retrieval-augmented generation) and vector databases, new opportunities are opening up for Elastic.
3. What Is Elastic's Opportunity in RAG and Vector Databases?