Using the Unified Conversion API, Core ML Tools first converts a third-party model into Model Intermediate Language (MIL), a more malleable representation, and then translates it to the Core ML format. The intermediate step allows adjustments, such as tuning for specific hardware capabilities and applying quantization to reduce model size, before the final, optimized Core ML model is deployed in an iOS app.
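As a rough illustration of that conversion step, here is a minimal sketch that wraps `ct.convert` from the `coremltools` package. The helper name `convert_to_coreml`, its parameters, and the default output path are hypothetical. The sketch also assumes a PyTorch model that has already been traced with `torch.jit.trace` and takes a single tensor input.

```python
def convert_to_coreml(traced_model, input_shape, output_path="Model.mlpackage"):
    """Convert a traced PyTorch model to a Core ML model package.

    Assumes `traced_model` came from torch.jit.trace and that the model
    takes one tensor input of shape `input_shape` (both are assumptions
    made for this sketch, not requirements stated in the article).
    """
    # Imported inside the function so this file can be loaded even on a
    # machine where coremltools is not installed.
    import coremltools as ct

    mlmodel = ct.convert(
        traced_model,
        inputs=[ct.TensorType(shape=input_shape)],
        convert_to="mlprogram",  # ML program format, saved as .mlpackage
    )
    mlmodel.save(output_path)
    return mlmodel
```

A call might look like `convert_to_coreml(torch.jit.trace(model.eval(), example), tuple(example.shape))`, where `model` and `example` stand in for your own network and a sample input tensor.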
Core ML Tools also supports compression workflows for reducing the size of models. Core ML Tools 7 supports post-training compression (for all models) and training-time compression (for PyTorch models). The first approach is faster and doesn't require data, whereas the second offers better accuracy retention by leveraging data for fine-tuning.
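To see why post-training compression can work without data, consider a simplified sketch of 8-bit linear weight quantization in plain Python. It is not the Core ML Tools implementation, and the function names and sample weights are made up for illustration. The point is that the scale and offset are derived from the weights alone, so no training samples are involved.

```python
def quantize_linear_8bit(weights):
    """Map float weights to integer codes in [0, 255] plus a scale and offset."""
    lo, hi = min(weights), max(weights)
    # Fall back to 1.0 when all weights are equal, to avoid dividing by zero.
    scale = (hi - lo) / 255 or 1.0
    zero_point = round(-lo / scale)
    # Clamp so every code fits in one unsigned byte.
    codes = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float weights from the 8-bit codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-0.51, -0.02, 0.0, 0.27, 0.76]  # illustrative values only
codes, scale, zero_point = quantize_linear_8bit(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in 8 bits instead of 32, at the cost of a small rounding error (`max_error`). Training-time compression exists to recover the accuracy that this kind of rounding can cost.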
For more information, check out Apple’s Core ML Tools guide.