Unstructured Core Library
The unstructured library is designed to help preprocess and structure unstructured text documents
for use in downstream machine learning tasks. Examples of documents that can be processed
using the unstructured library include PDFs, XML and HTML documents.
Library Documentation
- Installation
Instructions on how to install the unstructured library on your system.
- Unstructured API
Access all the power of unstructured through the unstructured-api or learn to host it locally.
- Bricks
Learn more about partitioning, cleaning, and staging bricks, including advanced usage patterns.
- Source Connectors
Connect to your favorite data storage platforms for effortless batch processing of your files.
- Destination Connectors
Connect to your favorite data storage platforms to write your ingest results to.
- Metadata
Learn more about how metadata is tracked in the unstructured library.
- Examples
Examples of other types of workflows within the unstructured package.
- Integrations
We make it easy for you to connect your output with other popular ML services.
- Best Practices
Learn best practices to optimize document information extraction using the unstructured library.