SSRN Author: Boxiao Chen
Boxiao Chen SSRN Content
https://privwww.ssrn.com/author=2481276
https://privwww.ssrn.com/rss/en-us
Wed, 30 Jun 2021 01:19:50 GMT
editor@ssrn.com (Editor)
Wed, 30 Jun 2021 01:19:50 GMT
webmaster@ssrn.com (WebMaster)
SSRN RSS Generator 1.0

REVISION: Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems with Demand Learning
We consider a periodic-review dual-sourcing inventory system, in which the expedited supplier is faster and more costly, while the regular supplier is slower and cheaper. Under full demand distributional information, it is well-known that the optimal policy is extremely complex but the celebrated Tailored Base-Surge (TBS) policy performs near optimally. Under such a policy, a constant order is placed at the regular source in each period, while the order placed at the expedited source follows a simple order-up-to rule. In this paper, we assume that the firm does not know the demand distribution a priori, and makes adaptive inventory ordering decisions in each period based only on the past sales (a.k.a. censored demand) data. The standard performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal Tailored Base-Surge policy, we ...
https://privwww.ssrn.com/abstract=3456834
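The Tailored Base-Surge rule described in the abstract above (a constant regular order plus an order-up-to rule at the expedited source) can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not taken from the paper: the expedited order arrives immediately, the regular order arrives after `lead_regular` periods, unmet demand is backlogged, and the demand distribution is ignored (a fixed demand sequence is passed in). The function and parameter names are hypothetical, and the sketch does not implement the paper's learning algorithm.

```python
def simulate_tbs(demands, regular_qty, surge_level, lead_regular=2):
    """Simulate a Tailored Base-Surge (TBS) policy on a demand sequence.

    Illustrative assumptions (not from the paper): expedited orders arrive
    immediately, regular orders arrive after `lead_regular` periods, and
    unmet demand is backlogged, so inventory may become negative.
    Returns a list of (expedited_order, regular_order, end_inventory).
    """
    if lead_regular < 1:
        raise ValueError("lead_regular must be at least 1")
    inventory = 0.0
    pipeline = [0.0] * lead_regular  # regular orders still in transit
    history = []
    for demand in demands:
        # Receive the regular order placed `lead_regular` periods ago.
        inventory += pipeline.pop(0)
        # Surge: expedite just enough to raise inventory to `surge_level`.
        expedited = max(0.0, surge_level - inventory)
        inventory += expedited
        # Base: place the same constant regular order in every period.
        pipeline.append(regular_qty)
        # Demand is realized; any shortfall is carried as backlog.
        inventory -= demand
        history.append((expedited, regular_qty, inventory))
    return history
```

For example, `simulate_tbs([3, 3, 3], regular_qty=2, surge_level=5, lead_regular=1)` expedites 5 units in the first period and 1 unit in each later period, because the constant regular order covers only 2 of the 3 units demanded.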
https://privwww.ssrn.com/2038061.html
Mon, 28 Jun 2021 22:04:19 GMT

REVISION: Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems with Demand Learning
https://privwww.ssrn.com/abstract=3456834
https://privwww.ssrn.com/2037499.html
Mon, 28 Jun 2021 09:30:54 GMT

REVISION: Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems with Demand Learning
https://privwww.ssrn.com/abstract=3456834
https://privwww.ssrn.com/2037498.html
Mon, 28 Jun 2021 09:30:54 GMT

REVISION: Pricing and Positioning of Horizontally Differentiated Products with Incomplete Demand Information
We consider the problem of determining the optimal prices and product configurations of horizontally differentiated products when customers purchase according to a locational (Hotelling) choice model, and where the problem parameters are initially unknown to the decision maker. For both the single-product and multiple-product settings, we propose a data-driven algorithm that learns the optimal prices and product configurations from accumulating sales data, and we show that their regret -- the expected cumulative loss caused by not using optimal decisions -- after T time periods is O(T^{1/2 + o(1)}). We accompany this result by showing that, even in the single-product setting, the regret of any algorithm is bounded from below by a constant times T^{1/2}, implying that our algorithms are asymptotically near-optimal. A numerical study that compares our algorithms to a benchmark shows that our algorithms are also competitive on a finite time horizon.
https://privwww.ssrn.com/abstract=3682921
https://privwww.ssrn.com/1985483.html
Thu, 28 Jan 2021 12:45:31 GMT

REVISION: Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information
We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated (s,S,p) policy is proved to be optimal, where s and S are the reorder point and order-up-to level for the ordering strategy, and p, a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal (s,S,p) policy with a tight Õ(√T) regret rate.

A number of salient features differentiate our work from the existing online learning research in the OM literature. First, computing the optimal (s,S,p) policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in operations management that only require solving ...
https://privwww.ssrn.com/abstract=3632475
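The structure of the (s,S,p) policy named in the abstract above can be written as a single decision step. This is a minimal sketch with assumptions that go beyond the abstract: an order is triggered when inventory is at or below s (some texts use a strict inequality), and the price is evaluated at the inventory level after ordering. The name `ssp_decision` and the `price_fn` callback are hypothetical. The sketch takes s, S and p as given and does not compute them, which is the hard part the paper addresses.

```python
from typing import Callable, Tuple


def ssp_decision(inventory: float, s: float, S: float,
                 price_fn: Callable[[float], float]) -> Tuple[float, float]:
    """Return (order_quantity, price) for one period of an (s, S, p) policy.

    Assumptions (illustrative only): ordering is triggered when inventory
    is at or below the reorder point s, and the price is a function of the
    inventory level after the order is placed.
    """
    if s > S:
        raise ValueError("reorder point s must not exceed order-up-to level S")
    # Order up to S only when inventory has fallen to the reorder point.
    order = S - inventory if inventory <= s else 0.0
    level_after_order = inventory + order
    return order, price_fn(level_after_order)
```

With a hypothetical linear price function such as `lambda y: 10.0 - 0.1 * y`, a starting inventory of 2 with s = 5 and S = 20 triggers an order of 18 units, while a starting inventory of 8 triggers no order and only sets the price.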
https://privwww.ssrn.com/1981711.html
Mon, 18 Jan 2021 10:24:38 GMT

REVISION: Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands
We study the fundamental model in joint pricing and inventory replenishment control under the learning-while-doing framework, with T consecutive review periods and the firm not knowing the demand curve a priori. At the beginning of each period, the retailer makes both a price decision and an inventory order-up-to level decision, and collects revenues from consumers' realized demands while suffering costs from either holding unsold inventory items, or lost sales from unsatisfied customer demands. We make the following contributions to this fundamental problem:

1. We propose a novel inversion method based on empirical measures to consistently estimate the difference of the instantaneous reward functions at two prices, directly tackling the fundamental challenge brought by censored demands, without raising the order-up-to levels to unnaturally high levels to collect more demand information. Based on this technical innovation, we design bisection and trisection search ...
https://privwww.ssrn.com/abstract=3750413
https://privwww.ssrn.com/1974862.html
Sat, 26 Dec 2020 10:31:41 GMT

New: Self-adapting Robustness in Demand Learning
We study dynamic pricing over a finite number of periods in the presence of demand model ambiguity. Departing from the typical no-regret learning environment, where price changes are allowed at any time, pricing decisions are made at pre-specified points in time and each price can be applied to a large number of arrivals. In this environment, which arises in retailing, a pricing decision based on an incorrect demand model can significantly impact cumulative revenue. We develop an adaptively-robust-learning (ARL) pricing policy that learns the true model parameters from the data while actively managing demand model ambiguity. It optimizes an objective that is robust with respect to a self-adapting set of demand models, where a given model is included in this set only if the sales data revealed from prior pricing decisions makes it "probable". As a result, it gracefully transitions from being robust when demand model ambiguity is high to minimizing regret when this ambiguity diminishes.
https://privwww.ssrn.com/abstract=3734591
https://privwww.ssrn.com/1972874.html
Thu, 17 Dec 2020 10:33:38 GMT

REVISION: Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands
https://privwww.ssrn.com/abstract=3750413
https://privwww.ssrn.com/1972873.html
Thu, 17 Dec 2020 10:30:01 GMT

REVISION: Nonparametric Learning Algorithms for Joint Pricing and Inventory Control with Lost-Sales and Censored Demand
We consider a joint pricing and inventory control problem in which the customer's response to selling price and the demand distribution are not known a priori. Unsatisfied demand is lost and unobserved, and the only available information for decision-making is the observed sales data (a.k.a. censored demand). Conventional approaches, such as stochastic approximation, online convex optimization, and continuum-armed bandit algorithms, cannot be employed since neither the realized values of the profit function nor its derivatives are known. A major challenge of this problem lies in that the estimated profit function constructed from observed sales data is multimodal in price. We develop a nonparametric learning algorithm based on spline approximation. The algorithm separates the planning horizon into a disjoint exploration phase and an exploitation phase. During the exploration phase, the price space is discretized, and each price is offered for an equal number of periods together with a ...
https://privwww.ssrn.com/abstract=2836057
https://privwww.ssrn.com/1949348.html
Fri, 09 Oct 2020 11:02:57 GMT

REVISION: Pricing and Positioning of Horizontally Differentiated Products with Incomplete Demand Information
https://privwww.ssrn.com/abstract=3682921
https://privwww.ssrn.com/1940143.html
Thu, 10 Sep 2020 16:26:48 GMT

REVISION: Parametric Demand Learning with Limited Price Explorations in a Backlog Stochastic Inventory System
In this paper we study a multi-period stochastic inventory system with backlogs. Demand in each period is random and price sensitive, but the firm has little or no prior knowledge about the demand distribution and how each customer responds to the selling price, so the firm needs to make periodic pricing and inventory replenishment decisions to maximize expected total profit. We consider the scenario where the firm is faced with a business constraint that prevents it from conducting extensive price exploration, and develop parametric data-driven algorithms for pricing and inventory decisions. We measure the performance of the algorithms by regret, which is the profit loss compared to a clairvoyant who has complete information about the demand distribution. We analyze the cases where the number of price changes is restricted to a given number or a small number relative to the planning horizon, and show that the regrets for the corresponding learning algorithms converge at the best ...
https://privwww.ssrn.com/abstract=3157347
https://privwww.ssrn.com/1937047.html
Tue, 01 Sep 2020 08:22:46 GMT

REVISION: Parametric Demand Learning with Limited Price Explorations in a Backlog Stochastic Inventory System
https://privwww.ssrn.com/abstract=3157347
https://privwww.ssrn.com/1936718.html
Mon, 31 Aug 2020 09:34:04 GMT

REVISION: Multi-Modal Dynamic Pricing
We consider a stylized question of dynamic pricing of a single product with demand learning. The candidate prices belong to a wide price interval, and the modeling of the demand functions is nonparametric in nature, imposing only smoothness regularity conditions. One important aspect of our modeling is the possibility of the expected reward function to be non-convex and indeed multi-modal, which leads to many conceptual and technical challenges. Our proposed algorithm is inspired by both the Upper-Confidence-Bound (UCB) algorithm for multi-armed bandit and the Optimism-in-Face-of-Uncertainty (OFU) principle arising from linear contextual bandits. Through rigorous regret analysis, we demonstrate that our proposed algorithm achieves optimal worst-case regret over a wide range of smooth function classes. More specifically, for k-times smooth functions and T selling periods, the regret of our proposed algorithm is O(T^{(k+1)/(2k+1)}), which is shown to be optimal via information ...
https://privwww.ssrn.com/abstract=3489355
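The abstract above says the proposed algorithm is inspired by the Upper-Confidence-Bound (UCB) idea. As background only, the following sketch runs the textbook UCB1 rule over a finite grid of candidate prices. It is not the paper's algorithm, which works with smooth nonparametric demand functions rather than a fixed grid. The function name, the `sample_revenue` callback, and the assumption that revenues lie in [0, 1] are all hypothetical choices made for illustration.

```python
import math
import random


def ucb_pricing(prices, sample_revenue, horizon, rng=None):
    """Run UCB1 over a finite price grid and return pull counts per price.

    `sample_revenue(price, rng)` is a hypothetical callback that returns
    the (possibly noisy) revenue, assumed to lie in [0, 1], observed when
    `price` is offered for one period.
    """
    rng = rng or random.Random(0)
    counts = [0] * len(prices)
    means = [0.0] * len(prices)
    for t in range(1, horizon + 1):
        if t <= len(prices):
            # Offer each price once to initialize the estimates.
            arm = t - 1
        else:
            # Optimism: choose the price with the largest upper bound.
            arm = max(
                range(len(prices)),
                key=lambda i: means[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = sample_revenue(prices[arm], rng)
        counts[arm] += 1
        # Incremental update of the running mean revenue for this price.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts
```

Because the rule only compares upper confidence bounds across prices, it makes no use of convexity or unimodality, so a revenue curve with several local maxima over the grid does not by itself prevent it from concentrating on the best grid price.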
https://privwww.ssrn.com/1922766.html
Mon, 20 Jul 2020 09:55:47 GMT

REVISION: Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information
https://privwww.ssrn.com/abstract=3632475
https://privwww.ssrn.com/1917181.html
Tue, 07 Jul 2020 07:55:18 GMT