Databricks recently announced its Series J funding round, successfully raising $10 billion at a valuation of $62 billion. Led by Thrive Capital alongside high-profile investors such as Andreessen Horowitz and Insight Partners, the round will fund new artificial intelligence (AI) products, acquisitions and significant expansion of the company's international operations. In the announcement, Databricks reported that it expects to achieve an annual revenue run rate of $3 billion in the quarter ending January 31, 2025.
Founded in 2013, Databricks initially gained prominence for its cloud-based Apache Spark services, aimed at enhancing big data processing and providing an alternative to MapReduce. Spark was created by Databricks co-founder Matei Zaharia at UC Berkeley's AMPLab in 2009, and in 2013 the project was donated to the Apache Software Foundation. The Spark framework includes Spark SQL, DataFrames, Spark Streaming, MLlib and GraphX. By providing robust analytics over massive data sets, Spark became the de facto standard for big data processing. While Databricks' commercial operations focus on providing managed cloud services, the company has also continued to support the broader open-source Spark community.
Over time, the worlds of data lakes and data warehouses collided. Databricks introduced the concept of a data lakehouse, adding Databricks SQL as well as open table formats. My colleague Matt Aslett has written about the importance of open table formats and Databricks’ support for both Delta Lake and, more recently, Apache Iceberg, including its acquisition of Tabular.
As the years progressed, Databricks evolved beyond its initial offerings. The company's Data Intelligence Platform is now positioned as providing a lakehouse-based environment for the full range of data, analytics and AI workloads.
One of the company's hallmark events, the Data + AI Summit, has established itself as a nexus for industry stakeholders, offering insight into cutting-edge applications of data and AI technologies. The most recent Summit focused on generative AI (GenAI) integrations, including functionality that streamlines the development of AI-based applications within the lakehouse environment, with announcements of new capabilities such as the Mosaic AI Agent Framework and enhancements to the platform for AI and data. The launch of Databricks AI/BI also represents the company's entry into the self-service analytics market with two new AI-powered capabilities: low-code dashboarding and a conversational interface.
Databricks also unveiled LakeFlow to bring together all types of data engineering processes into a unified platform. Building on Databricks’ acquisition of Arcion as well as its existing Delta Live Tables and Databricks Workflows functionality, LakeFlow automates ingestion, transformation and monitoring, simplifying the overall experience for data teams. Such innovations are vital as businesses continue to grapple with the complexities of managing and optimizing data pipelines across various environments. Databricks was also rated Exemplary in our Data Intelligence, Data Integration and Data Governance Buyers Guides.
Shortly after the event, Databricks announced the general availability of Databricks Assistant, which can be used to generate code, provide help and troubleshoot errors. It can also create visualizations and dashboards, although that capability remains in preview.
In conclusion, Databricks has made significant strides from its initial focus on Spark processing for big data, becoming an industry leader that unifies data and AI under a single platform. The current push for GenAI and the substantial funding acquired provide the resources for Databricks’ continued investments. As enterprises consider their data and AI strategies, I recommend they include Databricks in their evaluations.
Regards,
David Menninger