Combination Product Stability Testing: Attribute Selection and Acceptance Logic for Drug–Device Systems

Designing Stability Programs for Drug–Device Combination Products: Selecting Attributes and Setting Acceptance Criteria That Hold Up Globally

Regulatory Frame & Scope for Combination Products

Stability programs for drug–device combination product platforms must integrate two regulatory grammars: medicinal product stability under ICH Q1A(R2)/Q1E (and Q1B where photolability is relevant) and device-centric considerations that arise from materials, delivery mechanics, and human factors. The dossier must demonstrate that the drug product maintains quality, safety, and efficacy through the labeled shelf life and, where applicable, through in-use or on-body wear time; and that the device constituent does not compromise the medicinal product through sorption, permeation, or leachables, nor lose functional performance (e.g., dose delivery, actuation force, flow or spray pattern) as the system ages. Authorities in the US, UK, and EU take a harmonized view of the drug component—long-term, intermediate (if triggered), and accelerated data at label-relevant conditions with evaluation per ICH Q1E—while expecting device-relevant evidence that is commensurate with risk and mechanism. Thus, stability scope is broader than for a stand-alone drug: chemical/physical quality attributes are necessary but not sufficient; delivery-system attributes and material interactions are part of the same totality of evidence.

Practically, the “frame” starts with a structured mapping of the combination product: (1) route and modality (e.g., prefilled syringe, autoinjector, metered-dose inhaler, dry-powder inhaler, nasal spray, ophthalmic dropperette, transdermal patch, on-body injector, topical pump), (2) container/closure and fluid path materials (glass, cyclic olefin polymer, elastomers, adhesives, polyolefins, silicones), (3) user-interface and functional elements (springs, valves, meters, dose counters), and (4) drug product mechanisms susceptible to material or device influences (oxidation, hydrolysis, potency drift, particulate, rheology). Each mechanism informs attribute selection and acceptance logic. The program remains anchored in ICH Q1A(R2): long-term at 25 °C/60 % RH or 30 °C/75 % RH as appropriate to target markets; accelerated at 40 °C/75 % RH; intermediate when accelerated shows significant change; refrigerated or frozen regimes where the label requires. But beyond that, the plan explicitly ties in device performance testing at end-of-shelf-life states, container-closure integrity (CCI) verification for sterile or microbiologically sensitive products, and extractables and leachables (E&L) linkages when material contact could alter drug quality. In short, the scope is integrated: one stability argument, two constituent types, and multiple mechanisms addressed with proportionate evidence.

Attribute Selection by Platform: From Chemical Quality to Device Performance

Attribute selection begins with the drug product’s critical quality attributes (CQAs)—assay, related substances, dissolution (or aerodynamic performance for inhalation), particulates, pH, osmolality, appearance, water content, and microbiological endpoints as applicable. For combination platforms, expand the attribute set to include those that reflect device-influenced risks and delivery consistency at aged states. For prefilled syringes and autoinjectors, include delivered volume, glide force/activation force profiles, needle shield removal force, dose accuracy, and silicone oil or subvisible particles that may increase with aging or agitation. For nasal and ophthalmic pumps/sprays, test priming/re-priming, spray pattern and plume geometry, droplet size distribution, shot weight, and dose content uniformity after storage at long-term and accelerated conditions. For metered-dose and dry-powder inhalers, include delivered dose uniformity, aerodynamic particle size distribution (APSD), valve/actuator integrity, and counter function; storage may alter propellant composition or device seals, affecting performance. For transdermal systems, monitor adhesive tack/peel, drug content uniformity, residual drug after wear, and release rate as rheology or backing permeability changes with aging. Each platform has a signature set of functional attributes that must be aged and tested in the worst-case configuration.

Acceptance logic flows from intended clinical performance and relevant standards. Delivered dose accuracy, spray plume metrics, or actuation forces require quantitative acceptance criteria aligned to compendial or product-specific guidance (e.g., dose within a defined percentage of label claim across a specified number of actuations; force within ergonomic and functional bounds; spray morphology within validated ranges linked to deposition). Chemical and microbiological criteria remain specification-driven (lower/upper limits for assay/impurities, micro limits or sterility assurance), and must be met at shelf-life horizons under ICH Q1E’s prediction-bound logic. Attribute selection should also reflect material-interaction risks: where sorption to elastomers threatens potency or preservative free fraction, include relevant chemical surrogates (e.g., free preservative assay) and, if applicable, antimicrobial effectiveness at end of shelf life. Importantly, design choices should be explicit about which attributes are “governing” for expiry—the ones likely to run closest to limits (e.g., impurity X growth in highest-permeability blister; delivered dose drift at low canister fill) and thus require complete long-term arcs across lots. The attribute canvas is therefore stratified: universal drug CQAs, platform-specific device metrics, and mechanism-driven interaction indicators, each with clear acceptance definitions.

Acceptance Criteria & Decision Rules: How to Set, Justify, and Apply Them

Acceptance criteria must be coherent across constituents and defensible against variability expected at aged states. For chemical CQAs, criteria typically align with release specifications and are evaluated using ICH Q1E: expiry is assigned at the time where the one-sided 95 % prediction bound for a future lot remains within specification. For device performance, acceptance is a blend of fixed thresholds and distribution-based criteria. Delivered dose or volume typically uses two-sided tolerances around label claim with unit-to-unit coverage (e.g., 95 % of units within ±X %), while actuation force may use limits linked to validated usability/human-factors thresholds. Spray/plume metrics, APSD, or release rates may use ranges justified by clinically relevant deposition or pharmacokinetic targets. Where standards exist (e.g., specific inhalation or ophthalmic compendial tests), adopt their acceptance language and tie your internal ranges to development data; where standards are absent, derive limits from clinical performance envelopes, process capability, and risk analysis, then confirm with aged performance during stability.

Decision rules must be stated prospectively. For drug CQAs, follow ICH Q1E modeling with poolability tests across lots and pack configurations; guardband expiry if prediction bounds approach limits. For device metrics, adopt unit-aware rules that reflect the geometry of data (e.g., n actuations per container, n containers per lot). Define when a container is a unit of analysis and when a container contributes multiple units (e.g., multiple actuations), and declare how non-independence is handled in summary statistics. For borderline device metrics, require confirmation on replicate containers to avoid false accepts/rejects stemming from a single unit anomaly. Across all attributes, specify OOT/OOS criteria aligned to evaluation logic: for chemical trends, use projection-based OOT rules; for device metrics, use drift or variance expansion beyond predefined control bands across ages. Replacement rules—single confirmatory run from pre-allocated reserve only under documented laboratory invalidation—apply to both chemical and device tests. Acceptance is thus not merely numerical; it is a system of prospectively declared logic that transforms aged measurements into shelf-life conclusions for complex, drug–device systems.

Conditions, Storage Scenarios & Worst-Case Selection (ICH Zone-Aware)

Condition architecture follows ICH Q1A(R2) but must reflect device-specific risks and user environments. For room-temperature products, long-term at 25 °C/60 % RH is standard; for tropical deployment, long-term at 30 °C/75 % RH anchors labels; accelerated at 40 °C/75 % RH reveals mechanisms and triggers intermediate conditions when significant change is observed. Refrigerated or frozen labels require 2–8 °C or colder long-term, with carefully justified excursions and thaw/equilibration SOPs before testing. Device risks often hinge on humidity and temperature: elastomer permeability, adhesive tack, spring performance, and propellant behavior are all temperature-sensitive; moisture uptake drives dissolution drift or spray consistency. Therefore, worst-case selection must combine pack/permeability extremes with device tolerances: smallest strength with highest surface-area-to-volume ratio; thinnest or most permeable barrier; lowest fill fraction for canisters or cartridges at late life; and user-relevant angles or orientations for sprays at the end of canister life.

Stability chambers and execution details matter. Samples are stored in qualified chambers with mapping at storage locations and robust alarm/recovery policies; for device-heavy programs, physical positioning and restraints prevent unintended mechanical stress. Pulls must capture realistic in-use states at shelf life: for multidose presentations, prime/re-prime cycles are executed on aged containers; for autoinjectors, actuation force is tested on aged devices under temperature-controlled conditions that reflect user environments; for patches, peel/tack at end-of-shelf life mirrors skin-temperature conditions. If the label allows CRT excursions for refrigerated products, a targeted excursion arm with device performance checks (e.g., dose accuracy post-excursion) can be decisive. Photolabile systems incorporate ICH Q1B studies (either standalone or integrated) and, where transparent reservoirs are used, photoprotection claims align with real-world light exposures. Through zone-aware design plus worst-case selection, the program ensures that the governing combination—chemically and functionally—appears at the long-term anchors that determine expiry and usability.

Materials, E&L, and Container-Closure Integrity: Linking to Stability Claims

Combination products are uniquely exposed to material interactions because device constituents create extended fluid paths or contact areas. The E&L program must be risk-based and integrated with stability. Extractables and leachables plans identify critical contact materials (e.g., elastomeric plungers, gaskets, adhesives, inked components, polymeric reservoirs, lubricants), map process and sterilization conditions, and characterize chemical risks (monomers, oligomers, antioxidants, plasticizers, catalyst residues, silicone derivatives). Extractables studies (often at exaggerated conditions) define potential migrants; targeted leachables studies on aged, real-time samples confirm presence/absence and quantify relevant analytes. Acceptance hinges on toxicological assessment and thresholds of toxicological concern, but stability data must also show absence of analytical confounding (e.g., chromatographic interferences) and chemical impact on CQAs (e.g., assay drift from sorption). The E&L narrative should directly connect to aged states: “At 24 months, no target leachable exceeded acceptance, and no impact observed on potency or impurities.”

For sterile or microbiologically sensitive products, container-closure integrity (CCI) is vital. USP <1207> families (deterministic methods such as helium leak, vacuum decay, high-voltage leak detection) or validated probabilistic tests demonstrate integrity at initial and aged states. Aging may embrittle polymers or relax seals; therefore, CCI at end-of-shelf life for worst-case packs is compelling. Acceptance is binary (pass/fail within method sensitivity), but the method’s detection limit must be appropriate to the microbial ingress risk model; stability pulls should coordinate so that destructive CCI consumption does not cannibalize chemical/device testing. For preservative-containing multidose systems, E&L/CCI are complemented by antimicrobial effectiveness testing at end-of-shelf life if the contact path or packaging could diminish free preservative. In total, E&L and CCI are not peripheral—they are mechanistic pillars that explain why the combination remains safe and functional as it ages, and they must be explicitly tied to the stability claims in the dossier.

Analytics & Method Readiness for Integrated Drug–Device Programs

Analytical methods must be fit for both drug and device data geometries. For chemical CQAs, validated stability-indicating methods with forced-degradation specificity, robust integration rules, and system suitability tuned to detect meaningful drift are prerequisites; evaluation uses ICH Q1E modeling with poolability assessments across lots and presentations. For device metrics, methods are often standard-operating procedures with calibrated rigs and traceable metrology: force gauges for actuation/glide, automated spray analyzers for plume geometry and droplet size, delivered volume/dose rigs, leak/flow apparatus for on-body injectors, APSD instrumentation for inhalation, peel/tack testers for patches. Readiness means that these methods are not lab curiosities but production-ready: calibrated, cross-site comparable where necessary, and exercised on aged samples during method shake-down. Data integrity expectations apply equally: unit-level data captured with immutable IDs; sample-to-measurement traceability; rounding/reportable arithmetic fixed in controlled templates; and predefined rules for invalidation and single confirmatory testing from reserve when a laboratory assignable cause exists.

Integration across constituents is critical in reporting. For example, a nasal spray stability table at 24 months should display chemical potency/impurities alongside delivered dose per actuation, spray pattern metrics, and shot weight, with footnotes that clearly link units and containers. Where a chemical attribute appears pressured (e.g., rising leachable near threshold), present orthogonal evidence (toxicological assessment, absence of impact on potency/impurities, constant device performance) that supports continued acceptability. For multi-lot datasets, show that device metrics do not degrade across lots as materials age, and that variability is within acceptance envelopes established at release. Finally, coordinate micro/in-use where relevant: aged multidose ophthalmics should pair chemical data with antimicrobial effectiveness and device dose accuracy to support “use within X days after opening.” By operationalizing analytics across both worlds, the program produces a coherent, reviewer-friendly data package.

Risk Controls, Trending & OOT/OOS Handling Tailored to Combo Platforms

Trending must be tuned to attribute geometry. For chemical CQAs, model-based projections and residual-based out-of-trend (OOT) rules work well: trigger when the one-sided prediction bound at the claim horizon crosses a limit, or when a point lies >3σ from the fitted line without assignable cause. For device metrics, use trend bands around functional thresholds and monitor both central tendency and dispersion across units. Examples: delivered dose mean within ±X % and % units within spec; actuation force mean and 95th percentile below the usability ceiling; APSD metrics within bounds; peel/tack medians within adhesive acceptance. Flags are meaningful only if unit-level data are captured and summarized consistently across ages; avoid over-averaging that hides tails, because it is usually the tail (worst-case units) that affects patient performance.

OOT/OOS handling must preserve dataset integrity. OOT for device metrics should trigger verification (calibration, fixture checks, operator technique review) and, if a laboratory cause is plausible and documented, may justify a single confirmatory set on pre-allocated reserve devices. OOS for device metrics—true failure of acceptance—requires investigation akin to chemical OOS, with root cause across materials (aging elastomer force relaxation, adhesive degradation), process capability (component variability), and test execution. Replacement rules are the same across constituents: one confirmed, predeclared path; no serial retesting. Crucially, do not “manufacture” on-time points with reserve when a pull misses its window; stability modeling tolerates sparse data better than manipulated chronology. For high-risk platforms, install early-signal designs (e.g., mid-shelf-life device checks on worst-case packs) so that drift is detected while corrective levers (component changes, lubricant management, label refinements) remain available. This disciplined approach keeps combination-product stability evidence defensible even when mechanisms are multi-factorial.

Operational Playbook & Templates: Making the Program Executable

Execution quality determines credibility. Publish a combination-product stability playbook containing: (1) a Platform Attribute Matrix that lists drug CQAs and device metrics per platform, with acceptance/units/replicate plans; (2) a Worst-Case Map identifying strength×pack×device configurations that must appear at all late long-term anchors; (3) a Reserve Budget per age for both chemical and device tests (e.g., extra vials for assay/impurities; extra canisters or pumps for functional tests) tied to single-use, predeclared confirmation rules; (4) synchronized Pull Schedules that integrate chemical pulls and device functional testing to prevent cannibalization of units; and (5) Data Templates with unit-level tables, summary fields, and fixed rounding/reportable logic. For multi-site programs, include a Comparability Module: a short, pre-study exercise using retained material that demonstrates cross-site equivalence on key device and chemical methods, locking fixtures and operator technique before first real pull.

On the shop floor, the playbook becomes a set of checklists. Device checklists include fixture calibration, environmental set-points for testing, pre-test conditioning of aged units, and operator steps (e.g., priming profiles). Chemical checklists mirror standard method readiness (SST, calibration, integration rules). Chain-of-custody forms carry unique IDs that bind aged containers/devices to results, and separate reserve from primary units. Reporting templates include a Coverage Grid (lot × condition × age × configuration) that marks which combinations were tested at each age, and clearly identifies the governing path for expiry. When the program runs on rails—predefined attributes, fixed acceptance, synchronized calendars, and controlled templates—combination-product stability testing looks and feels like a single, coherent system, which is exactly how reviewers will read it.

Reviewer Pushbacks & Model Answers Specific to Combination Products

Typical pushbacks reflect integration gaps. “Where is the link between E&L and stability?” Answer by pointing to targeted leachables on aged lots at long-term anchors and showing absence below toxicological thresholds, alongside demonstration that no analytical interference or potency drift occurred. “Why were device metrics tested only on fresh units?” Respond with the schedule showing device functional testing on aged units at end-of-shelf life, with acceptance tied to clinical performance envelopes. “How did you choose worst-case?” Provide the worst-case map and rationale (highest permeability pack, lowest fill, smallest strength), and the coverage grid showing these combinations at 24/36-month anchors. “Why is expiry based on chemical attribute X when device metric Y looks marginal?” Explain that expiry is controlled by chemical attribute X per ICH Q1E; device metric Y remained within acceptance across aged units with guardbanded margins, and risk analysis indicates no clinical impact; commit to lifecycle monitoring if needed.

Model language that consistently clears assessment is precise and traceable. Examples: “Expiry is assigned when the one-sided 95 % prediction bound for a future lot at 24 months remains ≤ specification for Impurity A; pooled slope across three lots is supported by tests of slope equality; the worst-case configuration (Strength 5 mg, COP syringe with elastomer B) governs the bound.” Or: “Delivered dose accuracy on aged canisters at 30/75 met predefined acceptance (mean within ±10 %, ≥90 % units within range) across the shelf life; actuation force at 25 °C remained below the usability ceiling with 95th percentile < X N; together these support consistent dose delivery.” Avoid narrative that separates drug and device into unrelated silos; instead, present a single argument where each component reinforces the other. Reviewers are not opposed to complexity; they are opposed to ambiguity. A well-structured, integrated response earns confidence and speeds assessment.

Lifecycle Management & Multi-Region Alignment

Combination products evolve post-approval—component suppliers change, device sub-assemblies are optimized, new strengths or packs are added, and markets with different climatic zones are entered. Lifecycle stability must preserve the integrated grammar. For component changes that could affect E&L or device performance (e.g., alternative elastomer, lubricant, adhesive), run targeted E&L confirmation and device functional tests on aged states of the new configuration, and bridge chemical CQAs with pooled ICH Q1E evaluation; if margins thin, temporarily guardband expiry or limit distribution while more data accrue. For new strengths or packs, use ICH Q1D bracketing/matrixing to reduce test burden but keep the governing worst-case in full long-term arcs across at least two lots. For zone expansion (e.g., adding 30/75 labeling), run complete long-term arcs for two lots in the new zone and re-verify device metrics at those aged states; present side-by-side evaluation demonstrating that both chemical and device attributes remain controlled.

Multi-region dossiers benefit from consistent structure even when tests differ slightly by compendia or local preferences. Keep acceptance language stable across US/UK/EU submissions; map any regional nuances (e.g., preferred device metrics or reporting formats) explicitly without changing the underlying logic. Maintain a living Change Index that ties each post-approval change to its confirmatory stability/E&L/device evidence and to any label modifications. Finally, institutionalize cross-product learning: trend device metric drift, E&L detections, and CCI outcomes across platforms; feed these insights into supplier controls, design refinements, and future attribute selection. The result is a resilient, extensible stability capability for combination products that delivers coherent, globally portable evidence from development through lifecycle.