Unstructured Core Library

The unstructured library is designed to help preprocess and structure unstructured text documents for use in downstream machine learning tasks. Documents it can process include PDF, XML, and HTML files.

Library Documentation


Installation

Instructions on how to install the unstructured library on your system.

Unstructured API

Access all the power of unstructured through the unstructured-api or learn to host it locally.


Core Functionality

Learn more about partitioning, cleaning, and staging bricks, including advanced usage patterns.

Source Connectors

Connect to your favorite data storage platforms for effortless batch processing of your files.

Destination Connectors

Connect to your favorite data storage platforms to write your ingest results to.


Metadata

Learn more about how metadata is tracked in the unstructured library.


Examples

Examples of other types of workflows within the unstructured package.


Integrations

We make it easy for you to connect your output with other popular ML services.

Best Practices

Learn best practices to optimize document information extraction using the unstructured library.