This blog post is part of a series that explains the study [“APCs — Mirroring the impact factor or legacy of the subscription-based model?”](https://dx.doi.org/10.4119/unibi/2931061) of OA2020-DE.
In the [last blog post](/en/blog/2019/01/08/APCregressionsanalyse_descriptivestatistics/), we analyzed the relationship between the variables with the help of simple statistical measures. The results indicate that there is a positive relationship between APCs, the citation impact (SNIP) and whether the journal is hybrid (costlier and with more impact) or open-access (cheaper and with less impact). However, to state a causal relationship, one needs to run a careful regression analysis.
## Data set
To avoid misleading results, the sample at hand has to be representative of the population. However, much of the [OpenAPC](https://treemaps.intact-project.org/apcdata/openapc/) data does not meet this condition. In our case, we observe a sample of APCs, but for some countries, the sample is not a random draw from the population, as high APCs are systematically under-reported to the OpenAPC project. In Germany, the Deutsche Forschungsgemeinschaft (DFG), a funding organization, supports publication funds at some universities. If a member of the university is the submitting or corresponding author of an article in an open-access journal, the publication fund can take over the obligation to pay the APC up to EUR 2,000. The APC must not be above this limit to be covered by the DFG-supported publication funds. Otherwise, the author has to pay the APC out of departmental, third-party or private funds. Publication funds systematically report to the OpenAPC project, whereas there is almost no way to report APCs funded from other sources. To make things worse, authors could choose not to publish in expensive open-access journals at all, but to publish in subscription-based journals instead. The stricter the conditions (e.g. a price cap), the less representative the sample is likely to be. To our knowledge, the conditions for APC funding are least restrictive in the United Kingdom. Fortunately, the OpenAPC data set contains plenty of UK data from 2014 to 2016, so that we can base the entire analysis on the UK sub-sample.
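To illustrate why a price cap distorts the sample, here is a minimal simulation sketch in Python. All numbers are synthetic and chosen for illustration only: the linear pricing rule, the noise level and the function names `ols_slope` and `simulate` are assumptions, not OpenAPC data or the study's method. Only the EUR 2,000 cap is taken from the text above.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope b of the simple least-squares line y = a + b * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(cap=2000.0, n=5000, seed=42):
    """Compare a synthetic APC population with the part reported under a cap."""
    rng = random.Random(seed)
    # Entirely synthetic journals: illustrative values, not real data.
    snip = [rng.uniform(0.0, 3.0) for _ in range(n)]
    apc = [500.0 + 700.0 * s + rng.gauss(0.0, 400.0) for s in snip]
    # Assumption: only payments at or below the cap end up in the data set.
    reported = [(s, a) for s, a in zip(snip, apc) if a <= cap]
    rep_snip, rep_apc = zip(*reported)
    return {
        "share_reported": len(reported) / n,
        "mean_all": statistics.fmean(apc),
        "mean_reported": statistics.fmean(rep_apc),
        "slope_all": ols_slope(snip, apc),
        "slope_reported": ols_slope(rep_snip, rep_apc),
    }


result = simulate()
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

In this toy setting, both the average APC and the estimated SNIP slope come out lower in the reported sub-sample than in the full population, which is the kind of bias that motivates restricting the analysis to a country with few funding restrictions.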
## Regression equation
The level of the APCs actually paid is explained by the citation impact (SNIP) of a journal, whether the journal is hybrid or not, which publisher issues the journal, the subject area the journal belongs to, and the year of the payment. Moreover, we check whether the effect of SNIP differs for open-access vs. hybrid journals. The following equation is estimated by ordinary least squares (rounded coefficients below):
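In generic notation, the specification described above can be sketched as follows. The symbols (the Greek coefficient letters and the dummy-variable names) are assumptions made for this illustration and are not necessarily the study's own notation.

```latex
\mathrm{APC}_i = \beta_0
  + \beta_1 \,\mathrm{SNIP}_i
  + \beta_2 \,\mathrm{hybrid}_i
  + \beta_3 \,(\mathrm{SNIP}_i \times \mathrm{hybrid}_i)
  + \sum_{p} \gamma_p \,\mathrm{publisher}_{p,i}
  + \sum_{s} \delta_s \,\mathrm{subject}_{s,i}
  + \sum_{t} \tau_t \,\mathrm{year}_{t,i}
  + \varepsilon_i
```

Here `hybrid` equals 1 for hybrid journals and 0 for open-access journals, and the three sums run over publisher, subject-area and year dummies with one reference category omitted in each. The interaction coefficient β₃ is what lets the SNIP slope differ between the two publication modes.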
We arrive at the following estimated APC-equation. Except for the “Wiley-Blackwell” coefficient, all coefficients are highly statistically significant.
The isolated effect (i.e. the effect that corrects for other factors) of citation impact on APCs amounts to EUR 728 at open-access journals and to just about EUR 188 (= 728 − 540) at hybrid journals for each additional SNIP point. That means that “on average” an open-access journal with a SNIP value of two charges about EUR 728 more than an open-access journal with a SNIP value of one (other things being equal), whereas a hybrid journal charges only about EUR 188 more. The finding that hybrid journals are less sensitive to their impact does not mean that they are cheaper than open-access journals. The “fixed part”, i.e. the part that cannot be explained by the considered variables but is still included, is EUR 519 for publications in open-access journals and EUR 1,915 for publications in hybrid journals (again, other things being equal). Considering that an author could choose to publish an article in one of two otherwise similar journals, a hybrid one for EUR 1,915 or an open-access one for EUR 519, I wonder what kind of “value” hybrid journals deliver to researchers for the additional EUR 1,396.
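The arithmetic in this paragraph can be retraced with a short Python sketch. It uses only the four rounded figures quoted above (519, 728, 1,915 − 519 = 1,396 and −540). The publisher, subject-area and year effects are not given in the text and are assumed to sit at their reference level, and the function name `predicted_apc` is hypothetical.

```python
# Rounded coefficients quoted in the text, in EUR.
INTERCEPT = 519.0        # fixed part for open-access journals
HYBRID = 1396.0          # hybrid premium: 1,915 - 519
SNIP = 728.0             # SNIP effect at open-access journals
SNIP_X_HYBRID = -540.0   # change in the SNIP effect at hybrid journals


def predicted_apc(snip: float, hybrid: bool) -> float:
    """Predicted APC in EUR, with all other dummies at their reference level (assumption)."""
    h = 1.0 if hybrid else 0.0
    return INTERCEPT + HYBRID * h + SNIP * snip + SNIP_X_HYBRID * snip * h


# Effect of one additional SNIP point under each publication mode.
oa_step = predicted_apc(2.0, hybrid=False) - predicted_apc(1.0, hybrid=False)
hybrid_step = predicted_apc(2.0, hybrid=True) - predicted_apc(1.0, hybrid=True)
print(oa_step, hybrid_step)  # 728.0 188.0
```

Evaluating `predicted_apc(0.0, hybrid=True)` returns the hybrid fixed part of EUR 1,915.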
## Estimated, linear relationship
For the vast bulk of articles, which appear in zero- to average-impact journals, APCs in hybrid journals are much higher than in their open-access counterparts. In the figure below, you can see at which SNIP value journals with different publication modes charge comparable APCs (the red and blue lines cross at approx. SNIP = 2.5). The figure visualizes the estimated linear relationship between SNIP and APC. The red line shows the hybrid pricing pattern of journals in the life sciences in 2016 at the “other” publishers; the blue line shows the pricing pattern of their open-access counterparts. If a journal is published in another subject area, by another publisher or in another year, the lines shift up or down accordingly. Other things being equal, publications are most expensive at Elsevier and least expensive at PLoS, and costlier in the life sciences than in the social sciences and humanities.
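The crossing point can also be derived from the rounded coefficients reported in the previous section, assuming they describe the same baseline as the two lines in the figure. Setting the open-access line equal to the hybrid line, 519 + 728 · SNIP = 1,915 + 188 · SNIP, gives SNIP = 1,396 / 540. A minimal Python sketch (the function name `break_even_snip` is hypothetical):

```python
def break_even_snip(hybrid_premium: float, interaction: float) -> float:
    """SNIP value at which the hybrid and open-access lines intersect.

    The two lines differ by hybrid_premium + interaction * snip,
    so they cross where that difference is zero.
    """
    if interaction == 0:
        raise ValueError("parallel lines never cross")
    return -hybrid_premium / interaction


# Rounded values from the text: premium 1,915 - 519 = 1,396, interaction -540.
crossing = break_even_snip(1396.0, -540.0)
print(round(crossing, 2))  # 2.59
```

The result is in line with the value of roughly 2.5 read off the figure. Shifts for publisher, subject area or year move both lines by the same amount and therefore leave the crossing point unchanged in this specification.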
We can calculate the estimated APC for each journal by inserting the journal characteristics into the equation above, as you can see in these examples:
## Conclusion
The multivariate regression analysis largely confirms the insights from the descriptive statistics ([last blog post](/en/blog/2019/01/08/APCregressionsanalyse_descriptivestatistics/)), which represented the state of knowledge in information science about APCs so far. Provided that the APC-equation is correctly specified, we can now
+ confirm that the relationship between APCs and the other variables is not random,
+ show the magnitude (in euro) of the isolated effect of each variable on the APC-level,
+ identify two pricing patterns depending on the publication mode (open-access vs. hybrid),
+ use the estimated equation to predict APCs (in euro) of closed-access journals, or of journals for which we lack APC information,
+ predict how much hybrid journals would charge if they flipped to open-access or adopted the open-access price-setting behavior,
+ predict how much open-access journals would charge if they adopted the hybrid price-setting behavior.
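The last two points can be illustrated with a hedged sketch. It compares what a journal with a given SNIP would be predicted to charge under the hybrid and under the open-access pricing pattern, using only the rounded hybrid and interaction coefficients from the regression section and holding publisher, subject area and year fixed. The function name `flip_difference` is hypothetical.

```python
def flip_difference(snip: float) -> float:
    """Predicted hybrid APC minus predicted open-access APC (EUR) at a given SNIP.

    The intercept and the SNIP main effect cancel out, leaving the hybrid
    premium (1,915 - 519 = 1,396) plus the interaction term (-540 per SNIP point).
    """
    hybrid_premium = 1396.0
    interaction = -540.0
    return hybrid_premium + interaction * snip


for snip in (0.5, 1.0, 2.0, 3.0):
    print(f"SNIP {snip}: hybrid minus open-access = EUR {flip_difference(snip):.0f}")
```

A positive value is the amount by which a hybrid journal's predicted APC would fall if it adopted the open-access pattern; a negative value, which occurs at high SNIP values, means the open-access pattern would be the more expensive one.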
We will discuss the conclusions and their implications for the financial aspects of the open-access transformation in the next, and for the time being last, blog post of this series.
## More information
Schönfelder, Nina (2018). *APCs — Mirroring the impact factor or legacy of the subscription-based model?*. Universität Bielefeld. doi:[10.4119/unibi/2931061](https://dx.doi.org/10.4119/unibi/2931061)
Blogpost 1 - [APCs — Mirroring the impact factor or legacy of the subscription-based model? An introduction.](/en/blog/2018/11/26/APCregressionanalysis_introduction/)
Blogpost 2 - [APCs — Mirroring the impact factor or legacy of the subscription-based model? The database.](/en/blog/2018/12/10/APCregressionsanalysis_database/)
Blogpost 3 - [APCs — Mirroring the impact factor or legacy of the subscription-based model? Descriptive statistics.](/en/blog/2019/01/08/APCregressionsanalyse_descriptivestatistics/)