Data Format

What is a Data Format?

Data format is the definition of the structure of data within a database or file system, and that structure is what gives the information its meaning. Structured data is usually organized into rows and columns. Columns represent fields, such as name, address, and phone number, and each field has a defined type, such as integer, floating-point number, character, or Boolean. Rows represent individual records, each of which supplies a value for every column. Unstructured data includes audio or video objects stored in a format that software capable of decoding them can recognize and play back.
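To make the row-and-column idea concrete, here is a minimal sketch in plain Python. It is not tied to any particular database or to the C3 AI Platform. The `SCHEMA` dictionary, the `validate_row` helper, and the customer fields are hypothetical. The schema plays the role of the data format: it names each column and declares its type, and a row is meaningful only if it conforms to that format.

```python
# Hypothetical schema for a customer table: column name -> declared field type.
SCHEMA = {
    "name": str,
    "address": str,
    "phone": str,
    "age": int,
    "balance": float,
    "active": bool,
}


def validate_row(row: dict) -> bool:
    """Return True if the row has exactly the schema's columns and each value matches its declared type."""
    if set(row) != set(SCHEMA):
        return False
    for column, expected_type in SCHEMA.items():
        value = row[column]
        # In Python, bool is a subclass of int, so reject True/False in integer columns explicitly.
        if expected_type is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected_type):
            return False
    return True


# Each row is one record that supplies a value for every column.
good_row = {"name": "Ada Lovelace", "address": "12 Example St", "phone": "555-0100",
            "age": 36, "balance": 1250.5, "active": True}
# Same columns, but "age" holds a string, so this row does not conform to the format.
bad_row = dict(good_row, age="36")

print(validate_row(good_row))  # True
print(validate_row(bad_row))   # False
```

The type check is deliberately strict (for example, an integer in the `balance` column is rejected). A real system might instead coerce compatible values.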


Why is Data Format Important?

Source data can arrive in many different formats. To run analytics effectively, a data scientist must first convert that source data into a common format that each model can process. With many data sources and many analytic routines, this data wrangling can take 80 to 90 percent of the time spent developing a new model. A model-driven architecture that simplifies converting source data into a standard, analytics-ready format reduces the overall time required and lets the data scientist focus on machine learning model development and the training life cycle.
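The conversion step described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the C3 AI Platform's mechanism. The two sources, their column names (`customer_name`, `fullName`, and so on), and the `from_csv`/`from_json` helpers are all hypothetical. Each source describes the same kind of record in a different format, and each helper maps its records onto one shared set of fields with consistent types.

```python
import csv
import io
import json

# Two hypothetical sources describing the same kind of record in different formats.
CSV_SOURCE = "customer_name,phone_number,is_active\nAda Lovelace,555-0100,yes\n"
JSON_SOURCE = '[{"fullName": "Alan Turing", "phone": "555-0199", "active": false}]'


def from_csv(text: str) -> list:
    """Map CSV rows (all values arrive as strings) onto the common record format."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "name": row["customer_name"],
            "phone": row["phone_number"],
            # CSV has no Boolean type, so interpret common truthy strings.
            "active": row["is_active"].strip().lower() in ("yes", "true", "1"),
        })
    return records


def from_json(text: str) -> list:
    """Map JSON objects (which already carry typed values) onto the common record format."""
    return [
        {"name": item["fullName"], "phone": item["phone"], "active": bool(item["active"])}
        for item in json.loads(text)
    ]


# After conversion, downstream analytics sees one uniform list of records.
common = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)
for record in common:
    print(record)
```

Writing one small adapter per source keeps the source-specific details (column names, string-encoded Booleans) in one place, so every model can consume the same record shape.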


Data Format in the C3 AI Platform

The model-driven architecture of the C3 AI® Platform makes it easier and more intuitive to integrate new data sources in any data format into the platform and quickly prepare them for analysis. The C3 AI Platform offers a choice of full-code, low-code, and no-code methods to view and analyze the source data. It includes more than 25 prebuilt connectors for accessing cloud and on-premises data sources, as well as prebuilt object models that can accelerate development of enterprise AI applications for industries such as oil and gas, utilities, financial services, aerospace and defense, and manufacturing.