Most image analysis tools aren't aligned with how the best R&D teams actually work with images. This creates friction and undesired outcomes.
- Images that look quite different from, or are more difficult than, "model" images or cells, due to differences in acquisition, treatment, or gene variation.
- Limited coding/scripting resources, since team members may or may not know how to code at a high level.
- Objectives that depend on how mature your application is:
  - Early experiments and creation: Add new, accurate analyses quickly as R&D direction shifts, extracting early results. You might not need massive compute at this stage.
  - Optimization: Maximize accuracy and adapt easily to different imaging conditions, phenotypes, and objects. You may need to run large-scale analysis, and should only pay when it's used (serverless architecture). Multiple team members need easy access to data, analyses, and results.
  - As analyses mature: Quality control, downstream processing, streamlined workflows, and economies of scale.
If any of the above statements resonate with you, we built Biodock around them!
Biodock is an end-to-end image analysis platform built around the way that R&D teams actually work. Scientists on our platform tackle even very difficult images, all in a no-code, intuitive package. See Path 2: Train a fully-automated AI model for some examples of what scientists on our platform accomplish.
Our platform brings together:
- A best-in-class training dashboard, including versioned, powerful image transformer models and AI-assisted labeling. Always be able to reproduce runs and access results.
- A results dashboard with instant statistics, plots, visualizations, and QC for every analysis. Download anything, including raw predictions, per-object metrics, and more.
- A run engine that automatically handles annoying problems, like cropping and stitching for large images, serverless runs, progress, parallelization, and more.
- Massive compute to handle the largest images (pathology) or largest scale (high throughput).
- Collaborative sharing of files, projects, and models with access roles.
- Data integrations with AWS S3, Google Drive, Dropbox, OneDrive, and Box.
- An open, documented REST API to automate analysis runs or integrate them into a larger pipeline.
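As a rough illustration of automating a run over that REST API, here is a minimal sketch using only the Python standard library. The endpoint path, field names, and auth header below are hypothetical placeholders, not Biodock's actual schema; consult the API documentation for the real endpoints and parameters.

```python
import json
import urllib.request

# Hypothetical base URL and field names, for illustration only;
# see Biodock's REST API documentation for the real schema.
API_BASE = "https://app.biodock.ai/api/v1"

def build_run_request(api_key: str, model_id: str, file_ids: list[str]):
    """Construct (but do not send) a POST request that starts an analysis run."""
    payload = json.dumps({"modelId": model_id, "fileIds": file_ids}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/runs",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def start_run(api_key: str, model_id: str, file_ids: list[str]) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_run_request(api_key, model_id, file_ids)) as resp:
        return json.load(resp)
```

Separating request construction from sending makes the automation easy to test and to drop into a larger pipeline script or scheduler.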
After working with many of the best teams in bio, we've seen several approaches to image analysis that often lead to bad outcomes. While these approaches aren't necessarily invalid, we generally advise against them unless you know your application is especially well suited to them.
Below, we've compiled a list of these not-recommended approaches and why, in our experience, they generally don't lead to the outcomes teams expect when they embark on those paths.
It's not too difficult for a talented engineer to get good performance on a subset of images by fine-tuning an open-source model. However, this is only the tip of the iceberg. Once you get past initial model training, there are far more difficult problems, including:
- Labeling data to train the model (do you pay a service, or use commercial or free software?).
- Training optimally, with the newest architectures and best practices for bio.
- Doing QC on results, and making sure those efforts translate into better model performance.
- Keeping different model versions and managing your ground-truth data for reproducibility.
- Running inference on images of different sizes, including those that may not fit in memory.
- Running inference at scale and stitching together objects across crop boundaries.
- Sharing the model in a way that can be run by your team and beyond, including people who are not computational.
- Having a tested and scalable API to automate analysis.
- Scaling and deploying your compute in a way that works for larger analyses.
- Adapting your model to new data or variations.
- And many, many more considerations!
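To make the cropping-and-stitching item concrete, here is a toy sketch of the kind of logic involved: tiling a large image with overlap, then greedily merging detections whose bounding boxes overlap across tile boundaries. This is an illustrative simplification, not Biodock's implementation; a production system must also handle class labels, confidence scores, and segmentation masks.

```python
def make_tiles(width, height, tile, overlap):
    """Cover a width x height image with overlapping tiles, clipped at the borders."""
    stride = tile - overlap
    tiles = []
    for y in range(0, max(height - overlap, 1), stride):
        for x in range(0, max(width - overlap, 1), stride):
            tiles.append((x, y, min(x + tile, width), min(y + tile, height)))
    return tiles

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def merge_boxes(boxes, thr=0.5):
    """Greedily fuse boxes that overlap heavily, e.g. duplicates from adjacent tiles."""
    merged = []
    for box in sorted(boxes):
        for i, m in enumerate(merged):
            if iou(box, m) > thr:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged
```

Even this toy version has subtle edge cases (border clipping, transitive merges, objects larger than a tile), which is part of why stitching at scale is a real engineering problem rather than a one-off script.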
If none of these concerns apply to you, building in-house might be a good option. However, these are fairly large, pure software problems, each of which could be a full-time job even for the best engineers with significant software and AI expertise. Our entire team works only on these problems so that you don't have to.
In reality, the best investment in your computational team is letting them work on novel problems that drive your R&D forward, rather than making them reinvent the wheel.
At first glance, this may seem like the easiest option. However, it can become an issue due to poor generalization. You may have experienced this before: an analysis built for one type of image stops working well as soon as your images look a little different.
Generally, these licenses also tend to be inflexible when research direction changes, and are often gated in ways that work against your computational team members.
We've found that most other "AI" software products have you annotate patches of data and train models like DenseNet, VGG, and other architectures that don't perform well on images by modern standards. While convenient, their "pixel classification" architectures tend to fail on all but the easiest analyses.
We don't want to take away from the amazing tools that are available open source, like FIJI (ImageJ), CellProfiler, etc. If your analysis is low-volume and simple, extending them with macros might be a low-barrier way to do analysis, especially if your team has experience with them.
However, we generally find that, in addition to the licensing problems above, teams "get stuck" at the capability ceiling of these software packages. It's difficult to adapt when you run into problems (images too difficult for good performance, images too large, analysis too slow, too many images, etc.). Teams often end up with pipelines that only work on simple images and need to be tweaked for each new batch of images. They're left without a reproducible ledger of results, and still need to build the rest of their pipeline: integrations, data outputs, and more.
Despite the host of open-source and commercial tools available, we believe that Biodock is the right choice for a large and expanding array of image analysis applications and organizations. However, Biodock isn't made for everything. If your application fits the list below and you need to do analysis now, you should probably evaluate other tools.
- Applications that require local deployment (low latency or realtime applications, on-premise server communication, etc)
- 3D volumes (Z-stack 2D segmentation is supported)
- Object tracking that keeps the same ID across frames (non-tracking time series can be done with image groups); tracking support is coming soon.