Twine helps life science companies apply the latest big data technologies to capitalize on the explosion in available data.


The Twine Life Science Intelligence Platform is built on innovative methodologies, tool libraries, and data models that are specific to life science data and that leverage advances in big data technologies.

  • In addition to the data scalability inherited from Spark, our proprietary architecture and tool set provide code and team scalability through the following features:

    • Multi-level modular design that allows developers to work on large-scale projects and enables easy code and data reuse
    • Multi-grain variable traceability to give developers and data users full-scope knowledge transparency
    • Interfaces to multiple languages (Scala, R, and Python) for easy integration into existing code and to leverage existing developer experience
    • Pure-text code that works with modern configuration management (CM) tools to track and merge changes among team members
    • Automatic data and code version synchronization to enable coordination at both the code and data levels
    • Data publishing mechanism to support inter-team coordination
    • Built-in data quality management to ensure data quality on a continuous basis
    • Tools for data discovery (e.g., Schema Discovery, Enhanced Data Dictionary, Primary Key Discovery)
  • Twine has combined expertise from a variety of different methodologies into a flexible fuzzy matching library to integrate life science data. Specific steps include:

    • Data standardization and tokenization
    • Spelling and Sound based Fuzzy logic
    • Statistical matching for disambiguation
    • Multi-level matching framework
    • Life science data are highly variable in terms of the quality of the source and data management, and the typical effort required to process and clean the data on an ongoing basis is significant.
    • Twine has developed an automatic online data monitoring and management system configured to handle the wide variety of public, third-party, and proprietary data sources commonly used for life science commercial analytics.
  • Our proprietary data model framework is an industry first and is based on a number of concepts, including:

    • Entity – the smallest unit of a class of similar items captured in data. Entities can form groups such as hierarchies, segments, and clusters, or groups defined by shared attributes
    • Event – something that happens at a point in time, involving an individual entity or a relationship between entities
    • State – the relatively stable status of an entity in a given time period
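As an illustration of the data discovery tools mentioned above, primary key discovery can be sketched as a scan for columns whose values uniquely identify each row. This is a minimal, hypothetical sketch, not Twine's proprietary implementation; the function name and record layout are illustrative.

```python
# Illustrative sketch of primary-key discovery: a column qualifies as a
# candidate key when its values are non-null and unique across all rows.

def discover_primary_keys(rows):
    """Return column names whose values uniquely identify each row."""
    if not rows:
        return []
    candidates = []
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        if None not in values and len(set(values)) == len(values):
            candidates.append(col)
    return candidates

records = [
    {"patient_id": "P001", "site": "A", "visit": 1},
    {"patient_id": "P002", "site": "A", "visit": 1},
    {"patient_id": "P003", "site": "B", "visit": 2},
]
print(discover_primary_keys(records))  # → ['patient_id']
```

In practice such a scan would run over Spark DataFrames at scale; the pure-Python version above only conveys the idea.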
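The first two fuzzy matching steps listed above (data standardization and tokenization, followed by spelling-based fuzzy logic) can be sketched as follows. The function names and threshold are hypothetical illustrations, not the API of Twine's matching library, and the spelling comparison here uses Python's standard difflib rather than any proprietary technique.

```python
# Hypothetical sketch: standardize and tokenize names, then apply a
# spelling-based fuzzy comparison with difflib.SequenceMatcher.
import re
from difflib import SequenceMatcher

def standardize(name):
    """Lowercase the name and replace punctuation with spaces."""
    return re.sub(r"[^a-z0-9 ]", " ", name.lower())

def tokenize(name):
    """Split a standardized name into word tokens."""
    return standardize(name).split()

def fuzzy_match(a, b, threshold=0.85):
    """Compare two names on standardized token strings.

    Returns (matched, similarity score in [0, 1]).
    """
    score = SequenceMatcher(
        None, " ".join(tokenize(a)), " ".join(tokenize(b))
    ).ratio()
    return score >= threshold, score

matched, score = fuzzy_match("Pfizer Inc.", "PFIZER INC")  # matched is True
```

A production pipeline would add sound-based (phonetic) comparison and statistical disambiguation on top of this spelling step, as the list above indicates.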
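The Entity, Event, and State concepts above can be made concrete with a minimal sketch. The class and field names below are illustrative assumptions; the actual Twine data model framework is proprietary.

```python
# Minimal sketch of the Entity / Event / State concepts (names are
# hypothetical, not Twine's actual data model).
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Entity:
    """Smallest unit of a class of similar items captured in data."""
    entity_id: str
    entity_type: str  # e.g. "physician", "account", "product"

@dataclass(frozen=True)
class Event:
    """Something that happens at a point in time, on or between entities."""
    event_type: str   # e.g. "prescription", "sales_call"
    occurred_on: date
    subject: Entity
    target: Optional[Entity] = None  # set when the event links two entities

@dataclass(frozen=True)
class State:
    """Relatively stable status of an entity over a time period."""
    entity: Entity
    status: str       # e.g. "high_prescriber"
    valid_from: date
    valid_to: Optional[date] = None  # None means the period is still open

# A prescription event linking a physician entity to a product entity:
doc = Entity("NPI-1234", "physician")
drug = Entity("NDC-5678", "product")
rx = Event("prescription", date(2024, 3, 1), subject=doc, target=drug)
```

Groupings of entities (hierarchies, segments, clusters) would then be modeled as collections or shared attributes over `Entity` records.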


We work with companies ranging from those bringing their first product to market without a data warehouse to large global players with extensive data management needs. We typically work with clients across the functional-area spectrum:

  • Data management
  • Technology
  • Market analytics
  • Client sales
  • Marketing


A unique and powerful combination of leading big data scientists and technologies with life science analytics and business consultants.