@@ -135,3 +135,117 @@ names in parentheses are python variable names
135135- useSpheroid (use_spheroid) - whether to use a cartesian or spheroidal distance calculation. Default is false
136136
137137In both cases the output is the input DataFrame with the weights column added to each row.
138+
139+ ## Moran I
140+
141+ Moran I is a measure of spatial autocorrelation that combines the spatial
142+ location of each feature with a non-spatial attribute. A value close to 1
143+ means that there is positive spatial correlation; a value close to 0 means
144+ that there is no correlation and the data is randomly distributed. A
145+ value close to -1 means that there is negative correlation, that is,
146+ features that are close to each other have dissimilar values.
147+
148+ The figure below shows these three cases:
149+
150+ - on the left there is negative correlation (-1)
151+ - in the middle correlation is positive (1)
152+ - on the right the correlation is close to zero and data is random.
153+
154+ ![moranI.png](../../image/moranI.png)
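
To make the interpretation of the values above concrete, here is a minimal pure-Python sketch of the textbook global Moran's I formula (the weighted cross-products of deviations from the mean, scaled by the number of features, the sum of weights, and the sum of squared deviations). It is an illustration only and is not taken from Apache Sedona; the names `morans_i`, `chain`, `alternating` and `clustered` are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I from the textbook formula (illustration only).

    values:  dict mapping feature id -> attribute value
    weights: dict mapping feature id -> list of (neighbor id, weight)
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {fid: v - mean for fid, v in values.items()}

    # S0: sum of all weights.
    s0 = sum(w for nbrs in weights.values() for _, w in nbrs)
    # Cross-products of deviations over every weighted pair (i, j).
    num = sum(w * dev[i] * dev[j] for i, nbrs in weights.items() for j, w in nbrs)
    # Sum of squared deviations from the mean.
    den = sum(d * d for d in dev.values())
    return n * num / (s0 * den)


# Four features on a line; each one is a neighbor of the adjacent features
# (binary weights, similar in spirit to a distance band of one unit).
chain = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}

alternating = {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0}  # neighbors always differ
clustered = {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0}    # similar values sit together

print(morans_i(alternating, chain))  # -1.0
print(morans_i(clustered, chain))    # about 0.33
```

The alternating pattern gives the strongest possible negative correlation, while the clustered pattern gives a positive value. With only four features the statistic is coarse; on larger datasets strongly clustered values move closer to 1.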
155+
156+ Moran statistics are available as Scala/Java and Python functions.
157+ The function takes a weight DataFrame as input. You can create the
158+ weight DataFrame using the Apache Sedona weighting functions. Keep
159+ in mind that the input must have an id column that uniquely identifies
160+ each feature and a value column. The minimal schema required by the Apache Sedona
161+ Moran I function is:
162+
163+ ```
164+ |-- id: integer (nullable = true)
165+ |-- value: double (nullable = true)
166+ |-- weights: array (nullable = false)
167+ |    |-- element: struct (containsNull = false)
168+ |    |    |-- neighbor: struct (nullable = false)
169+ |    |    |    |-- id: integer (nullable = true)
170+ |    |    |    |-- value: double (nullable = true)
171+ |    |    |-- value: double (nullable = true)
172+ ```
173+
174+ You can change the names of the id and value columns using the function parameters (`idColumn` and `valueColumnName` in Scala, `id_column` and `value_column` in Python).
175+
176+ To use the [Apache Sedona weight functions](#adddistancebandcolumn), pass the id and value columns in the saved attributes parameter (`savedAttributes` in Scala, `saved_attributes` in Python) so that they are kept in the output.
177+
178+ === "Scala"
179+
180+ ```scala
181+ val weights = Weighting.addDistanceBandColumn(
182+   positiveCorrelationFrame,
183+   1.0,
184+   savedAttributes = Seq("id", "value")
185+ )
186+
187+ val moranResult = Moran.getGlobal(weights, idColumn = "id")
188+
189+ // result fields
190+ moranResult.getPNorm
191+ moranResult.getI
192+ moranResult.getZNorm
193+ ```
194+
195+ === "Python"
196+
197+ ```python
198+ from sedona.spark.stats.autocorrelation.moran import Moran
199+ from sedona.spark.stats.weighting import add_binary_distance_band_column
200+
201+ result = add_binary_distance_band_column(
202+     df,
203+     1.0,
204+     saved_attributes=["id", "value"]
205+ )
206+
207+ moran_i_result = Moran.get_global(result)
208+
209+ # result fields
210+ moran_i_result.p_norm
211+ moran_i_result.i
212+ moran_i_result.z_norm
213+ ```
214+
215+ The result contains three values: the Moran I statistic, the Z norm, and the P norm.
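
As a rough guide to how the Z norm and P norm relate to the Moran I value, the sketch below derives a z-score and p-value from the conventional normality-assumption formulas for the expected value and variance of Moran's I. This is an assumption about what these fields represent, not a confirmed description of Sedona's internals; `moran_z_and_p` and `chain` are hypothetical names used only for illustration.

```python
import math


def moran_z_and_p(i_value, weights, two_tailed=True):
    """z-score and p-value of a Moran's I value under the normality assumption.

    weights: dict mapping feature id -> list of (neighbor id, weight);
             every neighbor id is assumed to also appear as a key.
    """
    n = len(weights)
    # Flat lookup w[(i, j)] so that w_ij and w_ji can be combined.
    w = {(i, j): wij for i, nbrs in weights.items() for j, wij in nbrs}

    s0 = sum(w.values())
    # S1 = 1/2 * sum over all ordered pairs of (w_ij + w_ji)^2
    pairs = set(w) | {(j, i) for i, j in w}
    s1 = 0.5 * sum((w.get((i, j), 0.0) + w.get((j, i), 0.0)) ** 2 for i, j in pairs)
    # S2 = sum over features of (row sum + column sum)^2
    row = {i: 0.0 for i in weights}
    col = {i: 0.0 for i in weights}
    for (i, j), wij in w.items():
        row[i] += wij
        col[j] += wij
    s2 = sum((row[i] + col[i]) ** 2 for i in weights)

    expected = -1.0 / (n - 1)
    variance = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2
    z = (i_value - expected) / math.sqrt(variance)

    # Upper-tail probability of |z| under the standard normal distribution.
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, (2 * tail if two_tailed else tail)


# Four features on a line with binary weights between adjacent features.
chain = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}

z, p = moran_z_and_p(-1.0, chain)
print(round(z, 3), round(p, 3))  # z is about -1.732, p is roughly 0.08
```

In this sketch the `two_tailed` flag only decides whether the tail probability is doubled, which mirrors the role the `twoTailed` / `two_tailed` parameter name suggests.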
216+
217+ The full signatures of the functions are:
218+
219+ === "Scala"
220+
221+ ```scala
222+ def getGlobal(
223+   dataframe: DataFrame,
224+   twoTailed: Boolean = true,
225+   idColumn: String = ID_COLUMN,
226+   valueColumnName: String = VALUE_COLUMN): MoranResult
227+
228+ // java interface
229+ public interface MoranResult {
230+     public double getI();
231+     public double getPNorm();
232+     public double getZNorm();
233+ }
234+ ```
235+
236+ === "Python"
237+
238+ ```python
239+ def get_global(
240+     df: DataFrame,
241+     two_tailed: bool = True,
242+     id_column: str = "id",
243+     value_column: str = "value",
244+ ) -> MoranResult
245+
246+ @dataclass
247+ class MoranResult:
248+     i: float
249+     p_norm: float
250+     z_norm: float
251+ ```